GNOME Bugzilla – Bug 131585
Problems with XML attributes containing dashes
Last modified: 2004-12-22 21:47:04 UTC
I've been seeing problems with CVS intltool; the intltool-merge seems to be losing some XML attributes when merging. There seemed to be problems with attributes containing either internal dashes or underscores. So I hacked the test file tests/cases/merge4.xml.in to read: <sect testatt1="foo" test_att2="bar" test-att3="baz"> <_title.-_:>Fax</_title.-_:> </sect> The result, when I run tests/selftest is: <?xml version="1.0" encoding="UTF-8"?> <sect testatt1="foo" test_att2="bar"> <title.-_:>Fax</title.-_:> <title.-_: xml:lang="test">Translated Fax</title.-_:> </sect> i.e. the attribute test-att3 has been lost. I reversed the order of test_att2 and test-att3 and tried again; this time both attributes were lost (though testatt1 came through OK). So it looks like it's throwing away all attributes after (and including) one containing a dash in its name. I believe this is a bug: it's well-formed XML AFAIK; I can't read Perl so hopefully someone else can look into this and check that I'm not going mad. I believe it worked a month or so ago (last time I grabbed intltool from CVS) My doctype (for Conglomerate) uses attributes with names like these, so it's a major pain for me, currently blocking a release.
Some extra related information: On Tue, 2004-01-20 at 13:43 +0100, Alexander Larsson wrote: I'm having some pretty bad problems with the new XML::Parser version of intltool. Its seems that its not really using an XML parser, but instead doing things like: # Grab attribute key/value pairs and push onto @origlist array. # while ($source =~ s|^[ \t]*([\w:]+)[\t]*[=][\t]*["]([^"]*)["]||s) { push @origlist, $1; push @origlist, $2; This has some issues for me when building shared-mime-info: a) It removes any attributes using ' as quote b) It strips out multiple whitespaces inside an attribute value, breaking the mime file c) (part of a) really), doesn't handle " quotes inside '-quoted attributes
Created attachment 23609 [details] [review] patch with fix
This patch fixes the following problems: 1) Now allows attributes to be quoted with a single quote (') 2) Works fine if double-quotes are inside single-quotes ('text "text" text') or if single-quotes are inside double-quotes ("text 'text' text") 3) Allows attributes to have the dash (-) character.
The patch is straigh forward - looks fine to commit, so please go ahead! :-)
comitted
It still: * Removes comments * reorders attributes (not horrible, but strange) * adds space before \>, i.e. foo="bar"\> turns into foo="bar" \> (also just strange) * Compresses whitespace in attributes. E.g: <match type="string" mask="0xffffffff000000000000000000000000ff" value="\177ELF \004" offset="0"/> turns into: <match type="string" offset="0" mask="0xffffffff000000000000000000000000ff" value="\177ELF \004" /> Which is pretty bad.
Maybe it could just use the whole original string for tags that don't start with underscore?
That would be an idea yes - but won't that just hide the issues. They still need to get fixed. Then you could still write <_tag a1="string" a2="\177ELF \004"/> though, that might not be very well designed XML.
Created attachment 23646 [details] [review] patch fixing problem with spaces getting lost
I've attached a patch that fixes the problem of spaces getting lost in attributes and CDATA., which seems to be Alexander's only remaining serious problem.
Looks good - please commit.
Just committed fix so that intltool-merge no longer compresses whitespace in attribute values or CDATA sections. I think this bug can probably be marked as fixed. The only remaining problems (reordering of attributes and loss of comments) are probably not significant enough to keep this bug open. Whatever makes the most sense, though.
With CVS intltool files in the conglomerate tree, I get a clean output from the validate script in the examples directory. ( previous I got lots of element property-dialog: validity error : Element property-dialog does not carry attribute plugin-id those are gone ) I consider the bug also FIXED and marking it so. This bug has seen a REOPENED before, but I expect a VERIFIED next.
Second of the patches breaks intltool-merge when building gnome-desktop/gnome-version.xml (no translations are merged). Discussed at http://pdx.freedesktop.org/pipermail/intltool/2004-February/000029.html as well.
Hmmm. Is the problem with not compressing spaces in attribute values or in CDATA sections, or both? Kenneth's original problem was just with spaces being compressed in attribute values, but my patch perhaps went a little too far by also removing them from CDATA sections. So it would be nice if we only need to replace such compression in CDATA sections. Then I think everyone will be happy. Let me know and I am happy to fix the code.
Adding keywords. This needs to be fixed for 2.6.0.
The problem here seems to be twofold: - intltool-extract is not updated in the same way (so it extracts "compressed" strings into PO files) - this change is *incompatible* change, so doing this at this point is very bad -- we don't know what could all this affect, and string-freeze time in Gnome 2.6 is really not the right time for that (basically, untranslated XML files will pop out from all places) My suggestion would be to simply revert this change, and reconsider after 2.6.0 has been released (with a large *WARNING* sign to all developers using intltool for translating XML files).
Danilo can you answer Brians question? > Is the problem with not compressing spaces in attribute values or in CDATA sections, or both?
Now that I reread it, perhaps I didn't state the question too clearly. Kenneth wanted the code to not compress spaces in attributes, so I created a patch so spaces are no longer compressed in both attributes and CDATA sections. We can put back the space compression for CDATA sections and I think both Kenneth and Danilo would be happy. It's simply a matter of reversing this part of the last patch. @@ -850,9 +849,8 @@ sub parseTree # my $string = $sub; - $string =~ s/\s+/ /g; - $string =~ s/^ //; - $string =~ s/ $//; + $string =~ s/^[\s]+//; + $string =~ s/[\s]+$//; push(@translation_strings, $string);
Please try that Danilo and if everything works for you feel free to commit it to cvs.
Yes, that would do the work for CDATA sections (that's exactly what I posted in the mail referenced above). The thing is that I *do* know that it breaks CDATA translations, but I *don't* know if it breaks attribute translations. That's why I wasn't "brave" enough to say a definite "yes, only CDATA should not be compressed", so I said "at least CDATA should not be compressed". So, YES, we should revert that small part as soon as possible. (I got a collision with Kenneth when posting to Bugzilla [first time I saw this happen], so yes this does work -- I already said that on intltool list :)
I believe that reverting the portion of the patch quoted above will return xml-merge to the behavior that it had before the XML parser was added to the code. So people who want it to behave like "it used to" will appreciate the change. It was my error to "correct" the problem in both the CDATA and attribute sections when the prior bug was only complaining about a problem in the attribute section.
Brian, it's not necessarilly an error assuming that, but it's an error not changing intltool-extract to preserve spaces as well. For example, try the following with intltool-extract and -merge: <test> <_something>This bails out</_something> </test> intltool-extract would put "This bails out" into PO file, yet intltool-merge would look for translations of "This bails\nout" and wouldn't find them -- that's where the problem was (and, adjusting intltool-extract now would seriously damage the string freeze). Do you want me to commit this, or you'll go ahead? ;)
I wasn't suggesting that this needs to be fixed *right now*. Obviously, the commit may need to wait until various freezes are over and the time is right for such a change.
Brian, I think we're misunderstanding each other -- this (reverting those two/three lines) must be fixed right *NOW*, before 2.6 is released. intltool, as it stands, doesn't work correctly for gnome-desktop module, file gnome-version.xml.in.in. This means that Gnome 2.6 would have untranslated gnome-version.xml, which means that data in program gnome-about (right click on panel, then choose "About Gnome") will be untranslated. There are probably more instances of where this bug (compressing spaces in intltool-extract, but not in intltool-merge) comes up, but this is one very visible example.
Could you please commit this. I do not really have time at the moment. Thanks!
No problem, it's now fixed in CVS.
Is this supposed to be fixed now? HEAD still compresses whitespace for me, breaking the build of shared-mime-info. I.e. turning: <match mask="0xffffffff000000000000000000000000ff" offset="0" type="string" value="\177ELF \004" /> Into: <match mask="0xffffffff000000000000000000000000ff" offset="0" type="string" value="\177ELF \004" />
Alexander - I am confused. The merge6.xml test (run "make check" in intltool/tests directory) does a test to check this. The cases/merge6.xml.in file has this line: <foo value="<!DOCTYPE\ xbel" value-dash='testme' quotetest='a "quote" test' valuetesttwo="\177ELF \004"> And I notice that the spaces are not compressed in the output. I'm not sure why you are seeing this problem. Are you sure you are using CVS head? Does this test pass for you?
Oh, I had HEAD built, but the intltool copy in the module i was building was an old version. Sorry about that.
So is this resolved then? [It looks resolved to me, at least with the gnome-about problem.]
Yes, I think it's safe to mark it as "resolved/fixed" (I don't have enough Bugzilla privileges to do that myself).
Whoa! No way. Marking the bug FIXED, and giving you the permissions for bugzilla. Should have just asked, dude. :)