After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 725852 - ods format roundtrip problems
ods format roundtrip problems
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export OOo / OASIS
git master
Other All
: Normal normal
: ---
Assigned To: Andreas J. Guelzow
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2014-03-06 21:31 UTC by Morten Welinder
Modified: 2014-05-07 15:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
goffice patch (2.24 KB, patch)
2014-03-20 03:52 UTC, Andreas J. Guelzow
none Details | Review
gnumeric patch (1.25 KB, patch)
2014-03-20 03:52 UTC, Andreas J. Guelzow
none Details | Review
sample gnumeric file (2.16 KB, application/x-gnumeric)
2014-03-26 04:03 UTC, Andreas J. Guelzow
  Details

Description Morten Welinder 2014-03-06 21:31:49 UTC
New t6512:

It looks like invisible alignment characters -- "_)" here -- are lost.
Is that our problem or the format's problem?


-          <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="0.0000_);[Red](0.0000)">
+          <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="0.0000;[Red](0.0000)">
Comment 1 Morten Welinder 2014-03-06 21:40:45 UTC
It also looks like it cannot tell how many digits are required in scientific
formats' exponent part.
Comment 2 Andreas J. Guelzow 2014-03-06 23:39:32 UTC
ODF has number:min-exponent-digits so the problem of comment #1 should be ours.

The "_)" may require some workaround.
Comment 3 Andreas J. Guelzow 2014-03-07 05:58:39 UTC
I have fixed the "_)" issue.
Comment 4 Andreas J. Guelzow 2014-03-07 06:03:29 UTC
There seems to be an off-by-one problem with regard to the number:min-exponent-digits
Comment 5 Andreas J. Guelzow 2014-03-07 07:39:44 UTC
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.

Note that some of the fixes are in goffice.
Comment 6 Morten Welinder 2014-03-07 14:03:47 UTC
On the read side for invisible text, can we get more than one character?
Or an empty string for that matter?

If so, we need to create something like _T_e_x_t.


Unrelatedly, I added more tests.  I hope ods can handle it, because
it's a fairly common class of format.  "00000-0000" would be an
extended US zip code, for example.


-          <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="000-000-000">
+          <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="000&quot;--&quot;">
Comment 7 Andreas J. Guelzow 2014-03-07 15:41:17 UTC
ODF does not support invisible text. (LO converts invisible text that one can define in their interface into blanks when writing files.) So we are using a foreign element (gnm:invisible). Since we are writing one such element per invisible character, _T_E_X_T should happen automatically. I am still thinking about how to best export it in a way that LO sees a space.

ODF data formats are very different (and much more restricted) than the format strings we use. We of course could just hide our format string in teh ODF file and round trips would all work magically, but I do not think that is the point. So I really think we should restrict round-trip testing to formats that Gnumeric has an interface for (i.e. 'custom' does not count).
Comment 8 Morten Welinder 2014-03-07 15:56:37 UTC
I'm all for restricting roundtrip to a reasonable set.  That is, in fact,
why I didn't just take formats.xls for this.  Feel free to complain if
any particular one is too strange and too much work.

But there are sensible formats that we call "custom" I think we should handle.
One such is 000-000-0000 which LO seems to handle just fine:

    <number:number-style style:name="N120">
      <number:number number:decimal-places="0" number:min-integer-digits="10">
        <number:embedded-text number:position="7">-</number:embedded-text>
        <number:embedded-text number:position="4">-</number:embedded-text>
      </number:number>
    </number:number-style>


Basically t6512 only makes sense for ods.  In the case of gnumeric/xls/xlsx
we just store the format string raw in the file, so it's no wonder it
roundtrip just fine.  I agree that we do not want to do that for ods because
it doesn't help us read LO-generated files.
Comment 9 Morten Welinder 2014-03-07 16:01:38 UTC
There are ~98k google hits on just this example:

https://www.google.com/#q=%22000-000-0000%22+excel+format&safe=off

Note also the use of "(000) 000-0000" for phone numbers.
Comment 10 Andreas J. Guelzow 2014-03-07 16:04:39 UTC
So if these formats are used that often, why don't we have a proper interface for them? I believe Excel has a localized special formats list. (This of course has little to do with the round trip issue.)
Comment 11 Morten Welinder 2014-03-07 16:51:17 UTC
Because we pretty much suck at GUIs.  Also, I'm not sure how we would
communicate what we wanted to the translators.  Regrettably we cannot just
call an all-hands meeting with the translation department.

Excel/US has four formats in the "Special" section:

00000                                  [zip]
00000-0000                             [zip+4]
[<=9999999]###-####;(###) ###-####     [Phone]
000-00-0000                            [ssn]

The format dialog is a particularly difficult beast because of its size
and the hard balance between completeness and usability.  I sometimes feel
half the stuff ought to be hidden until an "Advanced" button is pressed.
Fractions would certain go in there.
Comment 12 Andreas J. Guelzow 2014-03-07 20:26:49 UTC
And in Canada we have only 2 items :-(

000 000 000                          [sin]
[<=9999999]###-####;###-###-####     [Phone]
Comment 13 Morten Welinder 2014-03-08 16:09:39 UTC
The use of plain %g causes truncation of values used in conditions.


-          <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="[=0.0009765625][Red]General;General">
+          <gnm:Style HAlign="GNM_HALIGN_GENERAL" VAlign="GNM_VALIGN_BOTTOM" WrapText="0" ShrinkToFit="0" Rotation="0" Shade="0" Indent="0" Locked="1" Hidden="0" Fore="0:0:0" Back="FFFF:FFFF:FFFF" PatternColor="0:0:0" Format="[=0.000976562]0.00;[&lt;0]&quot;−&quot;0.00">
Comment 14 Andreas J. Guelzow 2014-03-20 03:52:19 UTC
Created attachment 272456 [details] [review]
goffice patch

This patch to goffice together with the next patch for gnumeric fixes the use of %g in format conditions problem. I haven't committed the since they change the goffice api.
Comment 15 Andreas J. Guelzow 2014-03-20 03:52:54 UTC
Created attachment 272457 [details] [review]
gnumeric patch

corresponding gnumeric patch
Comment 16 Morten Welinder 2014-03-20 13:03:26 UTC
Hmm...  I would rather that goffice figured out the required number of
digits by itself.
Comment 17 Andreas J. Guelzow 2014-03-21 04:46:23 UTC
The %g in format conditions problem is fixed in goffice (we are always using 'double' in format conditions).
Comment 18 Morten Welinder 2014-03-24 17:16:39 UTC
Something broken here.  A literal "%g" is left in the condition.

"value()%g>=0.12345" cannot be right.
Comment 19 Morten Welinder 2014-03-24 17:52:45 UTC
And dropping that %g somehow causes loss of some conditions in t6512.
Strange.
Comment 20 Morten Welinder 2014-03-25 23:18:27 UTC
I'll have a go at the conditional formatting.  There's some misunderstanding
of the formats in there.
Comment 21 Andreas J. Guelzow 2014-03-25 23:22:33 UTC
What problem do you see. I think the issue raised by t6512 has mostly to do with 'General'.
Comment 22 Morten Welinder 2014-03-26 02:22:09 UTC
The code was assuming that the first part of a conditional format was
for positive numbers, the second for negative numbers, the third for zero,
and the forth for everything else.

That's the default, but only the default.  Things like "[>1000][Red]0.00;0.00"
are not that uncommon.

I have simplified stuff a lot, but letting GOFormat take care of all the
conditional stuff.

1. We do not handle colours for general: "[Red]General"

2. Something is strange with inhibiting minus.  I *think* LO does it
   when a style map has been used.  We have code to add a display factor
   of -1, but that isn't right for formats like "[>-12]0.00;0.00".

3. We might have issues with no parts matching.  More tests needed.
Comment 23 Andreas J. Guelzow 2014-03-26 04:03:00 UTC
Created attachment 272954 [details]
sample gnumeric file

LO seems to like the export of this file to ODF using the new code even less than the previous.

Gnumeric does not seem to prefer one over the other but still doesn't do a great job at reading this.
Comment 24 Morten Welinder 2014-03-26 11:21:22 UTC
We'll see if I can sort it out; otherwise the wonders of version control
makes it possible to roll back.
Comment 25 Morten Welinder 2014-03-26 12:20:55 UTC
We are in better shape now and the sample from comment 23 is handled with
a result that looks like what LO generates in this way:
              
cond.gnumeric ---(ssconvert)---> cond.xls ---(localc)---> cond.ods
Comment 26 Morten Welinder 2014-03-27 13:04:41 UTC
I copied the handling of "General" from LO.  The only thing failing now
is the 000-00-0000 type of format, although that could change as I
improve tests to include more of the standard formats.
Comment 27 Andreas J. Guelzow 2014-03-31 07:42:08 UTC
Hmm, running t6512 for me now results in:

t6512-format.pl: Check format gnumeric roundtrip.
--- format-tests.xml	2014-03-31 01:41:46.873571588 -0600
+++ format-tests-new.xml	2014-03-31 01:41:46.885571589 -0600
@@ -40,7 +40,7 @@
     <gnm:Sheet DisplayFormulas="0" HideZero="0" HideGrid="0" HideColHeader="0" HideRowHeader="0" DisplayOutlines="1" OutlineSymbolsBelow="1" OutlineSymbolsRight="1" Visibility="GNM_SHEET_VISIBILITY_VISIBLE" GridColor="0:0:0">
       <gnm:Name>Sheet1</gnm:Name>
       <gnm:MaxCol>3</gnm:MaxCol>
-      <gnm:MaxRow>95</gnm:MaxRow>
+      <gnm:MaxRow>94</gnm:MaxRow>
       <gnm:Zoom>1</gnm:Zoom>
       <gnm:Names>
         <gnm:Name>
diff died due to signal 1
Comment 28 Morten Welinder 2014-03-31 11:55:39 UTC
Ah, will fix.  It's irrelevant and just requires a re-save of the .gnumeric
Comment 29 Morten Welinder 2014-04-01 01:13:14 UTC
We now handle 000-000-000.

On the downside, I have temporarily broken invisible text support.
Andreas: is it too late to change how we store that?

I'd like to simply use a gnm:invisible attribute on number:text or
number:embedded text.
Comment 30 Andreas J. Guelzow 2014-04-01 04:18:08 UTC
I don't think it is to late to change how we store invisible text, I am not sure what you mean with simply using a gnm:invisible attribute on number:text or number:embedded text:
If you think about just flagging the text as invisible that would be a really bad idea: other ODF consumers will ignore that attribute and so show the text. We basically have to store the invisible text in a way that other consumers see blanks (or so) while we know what the true characters should be.
Comment 31 Morten Welinder 2014-04-01 14:58:41 UTC
I was thinking of something like this:

    <number:text gnm:invisible="("> </number:text>

or

    <number:embedded-text number:position="7" gnm:invisible="-"> </number:embedded-text>

(I hope that survives bugzilla.  I both cases we have just whitespace
between the tags.)

The idea is that LO can process it and just see " " while we can get the
attribute and make the right format.
Comment 32 Andreas J. Guelzow 2014-04-01 15:41:08 UTC
I do not think that the ODF schema allows adjacent number:text in case one has visible text next to invisible text, so that may not work.

In the embedded text situation, the ODF schema does not provide any mechanism to order the effect of several embedded-text elements pointing to the same position. So again there are difficulties with invisible and visible text that is adjacent.
Comment 33 Morten Welinder 2014-04-01 17:15:42 UTC
Speaking of which...  our current import code only handles multiple
embedded-text items if they happen to arrive in the right order.

A number format, according to the schema, can have precisely two number:text
items: one before and one after.  So the current invisible handling must
be wrong too.  That means we really do have to specify this differentlu.
For example,

<number:text gnm:invisible="4,&amp;(&amp;,10,&amp;)&amp;">moo, oink, baa</number:text>

meaning the space at character offset 4 is an invisible "(" and the space
at position 10 is an invisible ")".

Alternatively:

  <number:text>
    moo, oink, baa
    <gnm:invisible position="4" text="("/>
    <gnm:invisible position="10" text=")"/>
  </number:text>

(with extra whitespace only for readability here).  If you ignore the gnm:invisible you get space.
Comment 34 Morten Welinder 2014-04-03 00:50:33 UTC
FAIL: t6104-finfuns-ods.pl

(I'm guessing caused by embedded text changes.)
Comment 35 Andreas J. Guelzow 2014-04-03 03:21:29 UTC
Regarding: A number format, according to the schema, can have precisely two number:text items: one before and one after.  So the current invisible handling must be wrong too.

The current gnm-invisible handling might have been incompletely or incorrectly implemented, but the idea works in general: The gnm:invisible element acts by replacing the characters preceding it in the surrounding number:text with invisible characters, ie. it acts like

  <number:text>
    moo, oink, baa
    <gnm:invisible position="4" text="("/>
    <gnm:invisible position="10" text=")"/>
  </number:text>

except that the position is determined by its location inside number:text.
Comment 36 Andreas J. Guelzow 2014-04-19 15:26:55 UTC
I don't see any failures with t6104:

$ jhbuild run ./t6104-finfuns-ods.pl 
-------------------------------------------------------------------------------
t6104-finfuns-ods.pl: Check the ods exporter.
Pass
Comment 37 Andreas J. Guelzow 2014-04-19 15:28:50 UTC
Of course t6512 is failing because the invisible text part is not working anymore.
Comment 38 Morten Welinder 2014-04-19 15:35:18 UTC
Right.  That fell off the radar somehow.
Comment 39 Morten Welinder 2014-04-19 19:11:46 UTC
We now handle invisible characters again.  That leaves this:

- [...] Format="_(* 0.00_);_(* (0.00);_(* &quot;-&quot;??_);_(@_)"
+ [...] Format="_(0.00_);_((0.00);_(-# ;@"

Problem 1: we have no idea what to do with "* " here
Problem 2: we cannot have invisible characters for text format.
Problem 3: quoting "-" or not.  (Not a functional problem.)
Comment 40 Andreas J. Guelzow 2014-04-19 19:24:13 UTC
Problem 2: we cannot have invisible characters for text format.

Why not? number:text-style can have number:text as a child element. I assume we are still having the invisible characters stored inside number:text.


Problem 3: quoting "-" or not.  (Not a functional problem.)

inside ODF this should not be an issue.
Comment 41 Morten Welinder 2014-04-20 22:12:38 UTC
RE problem 2: we will shortly have code to generate this:

    <number:text-style style:name="ND-19">
      <number:text>
        <gnm:invisible gnm:char="(" office:process-content="true"> </gnm:invisible>
      </number:text>
      <number:text-content/>
      <number:text>
        <gnm:invisible gnm:char=")" office:process-content="true"> </gnm:invisible>
      </number:text>
      <style:map style:condition="value()&gt;0" style:apply-style-name="ND-19-0"/>
      <style:map style:condition="value()&lt;0" style:apply-style-name="ND-19-1"/>
      <style:map style:condition="value()=0" style:apply-style-name="ND-19-2"/>
    </number:text-style>

We seem to read the text-part of this as " @ ", though.
Comment 42 Morten Welinder 2014-04-20 22:16:45 UTC
- [...] Format="_(* 0.00_);_(* (0.00);_(* &quot;-&quot;??_);_(@_)"
+ [...] Format="_(0.00_);_((0.00);_(-# ; @ "
Comment 43 Morten Welinder 2014-04-20 22:26:25 UTC
And now we read it:

- [...] Format="_(* 0.00_);_(* (0.00);_(* &quot;-&quot;??_);_(@_)"
+ [...] Format="_(0.00_);_((0.00);_(-# ;_(@_)"
Comment 44 Andreas J. Guelzow 2014-04-20 22:43:04 UTC
 I can't find any support for '*' in ODF, so at this time we may want to add an extension gnm:repeat-char or so...
Comment 45 Andreas J. Guelzow 2014-04-20 22:52:18 UTC
Unfortunately LO does not seem to understand office:process-content=TRUE.
Comment 46 Morten Welinder 2014-04-20 23:42:41 UTC
Hmm...  How can we make both LO and something that does understand
office:process-content happy?
Comment 47 Andreas J. Guelzow 2014-04-21 00:39:44 UTC
That's how I ended up with putting what should be the 'content' of the element in front of the element, so everybody will process it and when we handle the gnm:invisible element we know to delete the appropriate number of characters that preceded the element.
Comment 48 Morten Welinder 2014-04-21 01:13:00 UTC
I have moved the space back outside.  That's a fairly gross hack.
Comment 49 Andreas J. Guelzow 2014-04-21 01:14:47 UTC
I agree that that is awful, but I couldn't come up with a better way that LO still understood.
Comment 50 Andreas J. Guelzow 2014-04-21 17:45:05 UTC
Why are we checking:

		case TOK_INVISIBLE_CHAR: {
			size_t len = g_utf8_next_char(token + 1) - (token + 1);
			if (len > 0) {
				if (phase != 1 && with_extension) {


We should write the invisible characters alsowhen we are in phase =1. With this check we get:

'_(* &quot;-&quot;??_)'   --->    '_(-# '
Comment 51 Morten Welinder 2014-04-21 18:12:32 UTC
The problem here is failure to properly detect end of phase 1/2.  "_)"
should be in phase 3.  A true phase-1 case would be 000_-000_-000 and
I really don't know what to write for that.
Comment 52 Andreas J. Guelzow 2014-04-21 19:13:33 UTC
I think the obvious choice would be to use the gnm:invisible element inside the number:embedded-text element in exactly the way we are already using it inside number:text. In both cases we have character content and some of those symbols should be invisible.
Comment 53 Morten Welinder 2014-04-22 01:10:52 UTC
We're down to this:

- [...] Format="_(* 0.00_);_(* (0.00);_(* &quot;-&quot;??_);_(@_)"
+ [...] Format="_(0.00_);_((0.00);_(-#_);_(@_)"
Comment 54 Andreas J. Guelzow 2014-04-22 01:59:41 UTC
LibreOffice saves ?? as #, so currently we are matching that.

Nevertheless I think we can do better by adding a
gnm:min-integer-characters
attribute to number:number. In connection with 
number:min-integer-digits
this would allow us to store things like
??000
as
gnm:min-integer-characters="5" number:min-integer-digits="3"
Comment 55 Morten Welinder 2014-04-22 12:04:39 UTC
- [...] Format="_(* 0.00_);_(* (0.00);_(* &quot;-&quot;??_);_(@_)"
+ [...] Format="_(* 0.00_);_(* (0.00);_(* -#_);_(@_)"

That's a match for anything but zero.
Comment 56 Andreas J. Guelzow 2014-04-24 06:42:13 UTC
We now have:
           </gnm:Style>
- Format="_(* 0.00_);_(* (0.00);_(* &quot;-&quot;??_);_(@_)">
+ Format="_(* 0.00_);_(* (0.00);_(* -??_);_(@_)">

So the only difference is whether we quote - or not.
Comment 57 Morten Welinder 2014-04-24 13:01:58 UTC
The only effect of having quotes or not is to make us not recognize this as
an accounting format in the gui.  (I think -- haven't tested.)
Comment 58 Morten Welinder 2014-05-07 15:41:32 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.

If more problems arise I will open a new bug.