GNOME Bugzilla – Bug 725852
ods format roundtrip problems
Last modified: 2014-05-07 15:41:32 UTC
New t6512: It looks like invisible alignment characters -- "_)" here -- are lost. Is that our problem or the format's problem? - <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="0.0000_);[Red](0.0000)"> + <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="0.0000;[Red](0.0000)">
It also looks like it cannot tell how many digits are required in scientific formats' exponent part.
ODF has number:min-exponent-digits so the problem of comment #1 should be ours. The "_)" may require some workaround.
I have fixed the "_)" issue.
There seems to be an off-by-one problem with regard to the number:min-exponent-digits
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report. Note that some of the fixes are in goffice.
On the read side for invisible text, can we get more than one character? Or an empty string for that matter? If so, we need to create something like _T_e_x_t. Unrelatedly, I added more tests. I hope ods can handle it, because it's a fairly common class of format. "00000-0000" would be an extended US zip code, for example. - <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="000-000-000"> + <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="000"--"">
ODF does not support invisible text. (LO converts invisible text that one can define in their interface into blanks when writing files.) So we are using a foreign element (gnm:invisible). Since we are writing one such element per invisible character, _T_E_X_T should happen automatically. I am still thinking about how to best export it in a way that LO sees a space. ODF data formats are very different (and much more restricted) than the format strings we use. We of course could just hide our format string in teh ODF file and round trips would all work magically, but I do not think that is the point. So I really think we should restrict round-trip testing to formats that Gnumeric has an interface for (i.e. 'custom' does not count).
I'm all for restricting roundtrip to a reasonable set. That is, in fact, why I didn't just take formats.xls for this. Feel free to complain if any particular one is too strange and too much work. But there are sensible formats that we call "custom" I think we should handle. One such is 000-000-0000 which LO seems to handle just fine: <number:number-style style:name="N120"> <number:number number:decimal-places="0" number:min-integer-digits="10"> <number:embedded-text number:position="7">-</number:embedded-text> <number:embedded-text number:position="4">-</number:embedded-text> </number:number> </number:number-style> Basically t6512 only makes sense for ods. In the case of gnumeric/xls/xlsx we just store the format string raw in the file, so it's no wonder it roundtrip just fine. I agree that we do not want to do that for ods because it doesn't help us read LO-generated files.
There are ~98k google hits on just this example: https://www.google.com/#q=%22000-000-0000%22+excel+format&safe=off Note also the use of "(000) 000-0000" for phone numbers.
So if these formats are used that often, why don't we have a proper interface for them? I believe Excel has a localized special formats list. (This of course has little to do with the round trip issue.)
Because we pretty much suck at GUIs. Also, I'm not sure how we would communicate what we wanted to the translators. Regrettably we cannot just call an all-hands meeting with the translation department. Excel/US has four formats in the "Special" section: 00000 [zip] 00000-0000 [zip+4] [<=9999999]###-####;(###) ###-#### [Phone] 000-00-0000 [ssn] The format dialog is a particularly difficult beast because of its size and the hard balance between completeness and usability. I sometimes feel half the stuff ought to be hidden until an "Advanced" button is pressed. Fractions would certain go in there.
And in Canada we have only 2 items :-( 000 000 000 [sin] [<=9999999]###-####;###-###-#### [Phone]
The use of plain %g causes truncation of values used in conditions. - <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="[=0.0009765625][Red]General;General"> + <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="[=0.000976562]0.00;[<0]"−"0.00">
Created attachment 272456 [details] [review] goffice patch This patch to goffice together with the next patch for gnumeric fixes the use of %g in format conditions problem. I haven't committed the since they change the goffice api.
Created attachment 272457 [details] [review] gnumeric patch corresponding gnumeric patch
Hmm... I would rather that goffice figured out the required number of digits by itself.
The %g in format conditions problem is fixed in goffice (we are always using 'double' in format conditions).
Something broken here. A literal "%g" is left in the condition. "value()%g>=0.12345" cannot be right.
And dropping that %g somehow causes loss of some conditions in t6512. Strange.
I'll have a go at the conditional formatting. There's some misunderstanding of the formats in there.
What problem do you see. I think the issue raised by t6512 has mostly to do with 'General'.
The code was assuming that the first part of a conditional format was for positive numbers, the second for negative numbers, the third for zero, and the forth for everything else. That's the default, but only the default. Things like "[>1000][Red]0.00;0.00" are not that uncommon. I have simplified stuff a lot, but letting GOFormat take care of all the conditional stuff. 1. We do not handle colours for general: "[Red]General" 2. Something is strange with inhibiting minus. I *think* LO does it when a style map has been used. We have code to add a display factor of -1, but that isn't right for formats like "[>-12]0.00;0.00". 3. We might have issues with no parts matching. More tests needed.
Created attachment 272954 [details] sample gnumeric file LO seems to like the export of this file to ODF using the new code even less than the previous. Gnumeric does not seem to prefer one over the other but still doesn't do a great job at reading this.
We'll see if I can sort it out; otherwise the wonders of version control makes it possible to roll back.
We are in better shape now and the sample from comment 23 is handled with a result that looks like what LO generates in this way: cond.gnumeric ---(ssconvert)---> cond.xls ---(localc)---> cond.ods
I copied the handling of "General" from LO. The only thing failing now is the 000-00-0000 type of format, although that could change as I improve tests to include more of the standard formats.
Hmm, running t6512 for me now results in: t6512-format.pl: Check format gnumeric roundtrip. --- format-tests.xml 2014-03-31 01:41:46.873571588 -0600 +++ format-tests-new.xml 2014-03-31 01:41:46.885571589 -0600 @@ -40,7 +40,7 @@ <gnm:Sheet DisplayFormulas="0" HideZero="0" HideGrid="0" HideColHeader="0" HideRowHeader="0" DisplayOutlines="1" OutlineSymbolsBelow="1" OutlineSymbolsRight="1" Visibility="GNM_SHEET_VISIBILITY_VISIBLE" GridColor="0:0:0"> <gnm:Name>Sheet1</gnm:Name> <gnm:MaxCol>3</gnm:MaxCol> - <gnm:MaxRow>95</gnm:MaxRow> + <gnm:MaxRow>94</gnm:MaxRow> <gnm:Zoom>1</gnm:Zoom> <gnm:Names> <gnm:Name> diff died due to signal 1
Ah, will fix. It's irrelevant and just requires a re-save of the .gnumeric
We now handle 000-000-000. On the downside, I have temporarily broken invisible text support. Andreas: is it too late to change how we store that? I'd like to simply use a gnm:invisible attribute on number:text or number:embedded text.
I don't think it is to late to change how we store invisible text, I am not sure what you mean with simply using a gnm:invisible attribute on number:text or number:embedded text: If you think about just flagging the text as invisible that would be a really bad idea: other ODF consumers will ignore that attribute and so show the text. We basically have to store the invisible text in a way that other consumers see blanks (or so) while we know what the true characters should be.
I was thinking of something like this: <number:text gnm:invisible="("> </number:text> or <number:embedded-text number:position="7" gnm:invisible="-"> </number:embedded-text> (I hope that survives bugzilla. I both cases we have just whitespace between the tags.) The idea is that LO can process it and just see " " while we can get the attribute and make the right format.
I do not think that the ODF schema allows adjacent number:text in case one has visible text next to invisible text, so that may not work. In the embedded text situation, the ODF schema does not provide any mechanism to order the effect of several embedded-text elements pointing to the same position. So again there are difficulties with invisible and visible text that is adjacent.
Speaking of which... our current import code only handles multiple embedded-text items if they happen to arrive in the right order. A number format, according to the schema, can have precisely two number:text items: one before and one after. So the current invisible handling must be wrong too. That means we really do have to specify this differentlu. For example, <number:text gnm:invisible="4,&(&,10,&)&">moo, oink, baa</number:text> meaning the space at character offset 4 is an invisible "(" and the space at position 10 is an invisible ")". Alternatively: <number:text> moo, oink, baa <gnm:invisible position="4" text="("/> <gnm:invisible position="10" text=")"/> </number:text> (with extra whitespace only for readability here). If you ignore the gnm:invisible you get space.
FAIL: t6104-finfuns-ods.pl (I'm guessing caused by embedded text changes.)
Regarding: A number format, according to the schema, can have precisely two number:text items: one before and one after. So the current invisible handling must be wrong too. The current gnm-invisible handling might have been incompletely or incorrectly implemented, but the idea works in general: The gnm:invisible element acts by replacing the characters preceding it in the surrounding number:text with invisible characters, ie. it acts like <number:text> moo, oink, baa <gnm:invisible position="4" text="("/> <gnm:invisible position="10" text=")"/> </number:text> except that the position is determined by its location inside number:text.
I don't see any failures with t6104: $ jhbuild run ./t6104-finfuns-ods.pl ------------------------------------------------------------------------------- t6104-finfuns-ods.pl: Check the ods exporter. Pass
Of course t6512 is failing because the invisible text part is not working anymore.
Right. That fell off the radar somehow.
We now handle invisible characters again. That leaves this: - [...] Format="_(* 0.00_);_(* (0.00);_(* "-"??_);_(@_)" + [...] Format="_(0.00_);_((0.00);_(-# ;@" Problem 1: we have no idea what to do with "* " here Problem 2: we cannot have invisible characters for text format. Problem 3: quoting "-" or not. (Not a functional problem.)
Problem 2: we cannot have invisible characters for text format. Why not? number:text-style can have number:text as a child element. I assume we are still having the invisible characters stored inside number:text. Problem 3: quoting "-" or not. (Not a functional problem.) inside ODF this should not be an issue.
RE problem 2: we will shortly have code to generate this: <number:text-style style:name="ND-19"> <number:text> <gnm:invisible gnm:char="(" office:process-content="true"> </gnm:invisible> </number:text> <number:text-content/> <number:text> <gnm:invisible gnm:char=")" office:process-content="true"> </gnm:invisible> </number:text> <style:map style:condition="value()>0" style:apply-style-name="ND-19-0"/> <style:map style:condition="value()<0" style:apply-style-name="ND-19-1"/> <style:map style:condition="value()=0" style:apply-style-name="ND-19-2"/> </number:text-style> We seem to read the text-part of this as " @ ", though.
- [...] Format="_(* 0.00_);_(* (0.00);_(* "-"??_);_(@_)" + [...] Format="_(0.00_);_((0.00);_(-# ; @ "
And now we read it: - [...] Format="_(* 0.00_);_(* (0.00);_(* "-"??_);_(@_)" + [...] Format="_(0.00_);_((0.00);_(-# ;_(@_)"
I can't find any support for '*' in ODF, so at this time we may want to add an extension gnm:repeat-char or so...
Unfortunately LO does not seem to understand office:process-content=TRUE.
Hmm... How can we make both LO and something that does understand office:process-content happy?
That's how I ended up with putting what should be the 'content' of the element in front of the element, so everybody will process it and when we handle the gnm:invisible element we know to delete the appropriate number of characters that preceded the element.
I have moved the space back outside. That's a fairly gross hack.
I agree that that is awful, but I couldn't come up with a better way that LO still understood.
Why are we checking: case TOK_INVISIBLE_CHAR: { size_t len = g_utf8_next_char(token + 1) - (token + 1); if (len > 0) { if (phase != 1 && with_extension) { We should write the invisible characters alsowhen we are in phase =1. With this check we get: '_(* "-"??_)' ---> '_(-# '
The problem here is failure to properly detect end of phase 1/2. "_)" should be in phase 3. A true phase-1 case would be 000_-000_-000 and I really don't know what to write for that.
I think the obvious choice would be to use the gnm:invisible element inside the number:embedded-text element in exactly the way we are already using it inside number:text. In both cases we have character content and some of those symbols should be invisible.
We're down to this: - [...] Format="_(* 0.00_);_(* (0.00);_(* "-"??_);_(@_)" + [...] Format="_(0.00_);_((0.00);_(-#_);_(@_)"
LibreOffice saves ?? as #, so currently we are matching that. Nevertheless I think we can do better by adding a gnm:min-integer-characters attribute to number:number. In connection with number:min-integer-digits this would allow us to store things like ??000 as gnm:min-integer-characters="5" number:min-integer-digits="3"
- [...] Format="_(* 0.00_);_(* (0.00);_(* "-"??_);_(@_)" + [...] Format="_(* 0.00_);_(* (0.00);_(* -#_);_(@_)" That's a match for anything but zero.
We now have: </gnm:Style> - Format="_(* 0.00_);_(* (0.00);_(* "-"??_);_(@_)"> + Format="_(* 0.00_);_(* (0.00);_(* -??_);_(@_)"> So the only difference is whether we quote - or not.
The only effect of having quotes or not is to make us not recognize this as an accounting format in the gui. (I think -- haven't tested.)
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report. If more problems arise I will open a new bug.