Bug 627509 – ODF import: no text:tab, text:s handling




Summary:	ODF import: no text:tab, text:s handling


Status:	RESOLVED FIXED

Product:	Gnumeric
Classification:	Applications
Component:	import/export OOo / OASIS
Version:	git master
Hardware:	Other All

Importance:	Normal trivial
Target Milestone:	---
Assigned To:	Andreas J. Guelzow
QA Contact:	Jody Goldberg

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2010-08-20 15:49 UTC by Morten Welinder
Modified:	2011-11-01 01:40 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Rich text sample (7.63 KB, application/vnd.oasis.opendocument.spreadsheet) 2011-10-29 19:04 UTC, Morten Welinder
debug patch (4.96 KB, patch) 2011-10-29 19:05 UTC, Morten Welinder

Description Morten Welinder 2010-08-20 15:49:53 UTC

Unexpected element 'text:tab' in state : 
        document-content -> body -> spreadsheet -> table -> table-row -> table-cell -> annotation -> p -> span

https://bugzilla.gnome.org/attachment.cgi?id=168413

Comment 1 Andreas J. Guelzow 2010-08-20 16:39:26 UTC

This exposes the fact that we don't handle whitespace correctly on import. Currently we cannot handle mixed content, i.e. an element that contains both child elements and character data. I am pretty sure there are other bug reports related to this, but I can't find them.

Essentially  gsf_xml_in_doc_parse  needs to be extended to handle things like:
<span> a <text:tab /> b </span>.

Comment 2 Andreas J. Guelzow 2011-10-28 17:32:37 UTC

Other things we can't handle at this time include text interleaved with spans, such as:

<text:p>
<text:span text:style-name="T1">xxx</text:span>
yyy
<text:span text:style-name="T2">zzz</text:span>
</text:p>

Comment 3 Morten Welinder 2011-10-29 19:04:35 UTC

Created attachment 200259
Rich text sample

A1 contains "text is bold and italic and underlined" where the three
adjectives are styled as they say.

Comment 4 Morten Welinder 2011-10-29 19:05:25 UTC

Created attachment 200260
debug patch

Patch for debugging

Comment 5 Morten Welinder 2011-10-29 19:09:05 UTC

I get...

Start of span: [text is ]
End of span: [text is bold]
Start of span: [text is bold]
End of span: [text is bold and ]
Start of span: [text is bold and ]
End of span: [text is bold and italic]
Start of span: [text is bold and italic]
End of span: [text is bold and italic and ]
Start of span: [text is bold and italic and ]
End of span: [text is bold and italic and underlined]

From the first two lines (and style info not printed) we deduce that characters
8-11 should be bold.

I don't think any information is missing, i.e., we should be able to read this.

Comment 6 Morten Welinder 2011-10-29 19:19:58 UTC

Relatedly, this "vile hack" from gsf_xml_in_doc_add_nodes seems to be
what enables it to work by having a bug.  The code and the comment
certainly do not agree and the code effectively does nothing.

			node = g_new0 (GsfXMLInNodeInternal, 1);
			node->pub = *e_node;
			/* WARNING VILE HACK :
			 * The api in 1.8.2 passed has_content as a boolean.
			 * Too many things still use that to change yet.  We
			 * edit the bool here to be GSF_CONTENT_NONE or
			 * GSF_XML_CONTENT and try to ignore SHARED_CONTENT */
			if (node->pub.has_content != 0 &&
			    node->pub.has_content != GSF_XML_SHARED_CONTENT)
				node->pub.has_content = GSF_XML_CONTENT;

Comment 7 Andreas J. Guelzow 2011-10-31 23:33:41 UTC

text:tab and text:s are now correctly handled on import. (So is text:line-break.)
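
As a rough illustration of what handling these elements involves (a minimal sketch, not the actual Gnumeric change): each element stands for literal characters that have to be appended to the accumulated text. In ODF, text:tab is a tab, text:line-break is a newline, and text:s is one or more spaces, with the count given by its text:c attribute (1 if absent). The enum and function names below are hypothetical and do not come from libgsf or Gnumeric.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element kinds; these names are illustrative only. */
typedef enum { ODF_WS_TAB, ODF_WS_SPACE, ODF_WS_LINE_BREAK } OdfWsKind;

/* Append the characters an ODF whitespace element stands for to buf,
 * a NUL-terminated string with total capacity cap (including the NUL).
 * count is only used for ODF_WS_SPACE and mirrors the text:c attribute;
 * values below 1 are treated as 1, matching the default for a missing text:c.
 * Returns the number of characters appended, or -1 if they do not fit. */
static int
odf_ws_append (char *buf, size_t cap, OdfWsKind kind, int count)
{
	size_t len;
	char c;
	int n, i;

	assert (buf != NULL);
	len = strlen (buf);

	switch (kind) {
	case ODF_WS_TAB:        c = '\t'; n = 1; break;
	case ODF_WS_LINE_BREAK: c = '\n'; n = 1; break;
	case ODF_WS_SPACE:      c = ' ';  n = count > 0 ? count : 1; break;
	default:                return -1;
	}

	if (len + (size_t) n + 1 > cap)
		return -1;
	for (i = 0; i < n; i++)
		buf[len + i] = c;
	buf[len + n] = '\0';
	return n;
}
```

With this, content such as `a<text:tab/>b<text:s text:c="3"/>` would be built by appending "a", calling odf_ws_append with ODF_WS_TAB, appending "b", then calling it with ODF_WS_SPACE and a count of 3.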

text:span is bug #663135. text:a is bug #603533.

This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.

Comment 8 Andreas J. Guelzow 2011-11-01 01:40:50 UTC

I guess that

Start of span: [text is ]
End of span: [text is bold]

(plus style info) does not necessarily mean that characters
8-11 should be bold.

The content shown does not include other elements, such as text:s, that may contribute additional characters.

But nevertheless we can build a stack indicating where we need to start...
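
One way to picture such a stack (a minimal sketch under assumptions, not the Gnumeric implementation): when a span opens, push the current length of the accumulated text; when it closes, pop that offset and pair it with the current length to get the styled range. The type and function names are hypothetical. The lengths passed in must already include any characters contributed by elements such as text:s, and the sketch counts bytes, which equals characters only for ASCII text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPAN_STACK_MAX 16

/* Hypothetical tracker for open spans; illustrative only. */
typedef struct {
	size_t starts[SPAN_STACK_MAX];	/* start offsets of still-open spans */
	int depth;			/* number of open spans */
} SpanStack;

/* Half-open range [start, end) of offsets covered by a span's style. */
typedef struct {
	size_t start;
	size_t end;
} SpanRange;

/* Call when a span opens: remember how much text exists so far.
 * Returns 0 on success, -1 if spans are nested too deeply. */
static int
span_stack_open (SpanStack *s, size_t text_len)
{
	assert (s != NULL);
	if (s->depth >= SPAN_STACK_MAX)
		return -1;
	s->starts[s->depth++] = text_len;
	return 0;
}

/* Call when a span closes: the style applies to everything that was
 * appended since the matching open. */
static SpanRange
span_stack_close (SpanStack *s, size_t text_len)
{
	SpanRange r;

	assert (s != NULL && s->depth > 0);
	r.start = s->starts[--s->depth];
	r.end = text_len;
	return r;
}
```

For the log in comment 5, opening at strlen ("text is ") and closing at strlen ("text is bold") gives the range [8, 12), i.e. offsets 8-11, provided no extra characters were inserted in between.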