After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 127218 - new parser does not re-escape entities when merging
new parser does not re-escape entities when merging
Status: RESOLVED FIXED
Product: intltool
Classification: Deprecated
Component: general
unspecified
Other All
: Normal critical
: ---
Assigned To: intltool maintainers
intltool maintainers
Depends on: 116526
Blocks:
 
 
Reported: 2003-11-17 18:18 UTC by Jody Goldberg
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
now uses OrigTree style (1.50 KB, patch)
2003-11-18 15:54 UTC, Brian Cameron
none Details | Review
OrigTree.pm file (2.28 KB, text/plain)
2003-11-18 15:55 UTC, Brian Cameron
  Details
Use the standard Tree style instead of XML::Parser::Style::OrigTree (1.56 KB, patch)
2003-12-11 10:10 UTC, Denis Barbier
none Details | Review
patch fixing bug reported by jrb (26.61 KB, patch)
2003-12-18 19:08 UTC, Brian Cameron
none Details | Review
updated patch (9.25 KB, patch)
2004-01-05 19:13 UTC, Brian Cameron
none Details | Review

Description Jody Goldberg 2003-11-17 18:18:36 UTC
If the original text to translate contains an entity

eg 'Latex & HTML'

The parser unescapes things for translation but does not re-escape for merging.
Which results in
    <foo>Latex & HTML</foo>

and parse warnings
Comment 1 Jody Goldberg 2003-11-17 18:19:18 UTC
It seems caused by the patch for #116526
Comment 2 Kenneth Rohde Christiansen 2003-11-17 18:24:56 UTC
Okay, I am delaying the release before this is fixed. Will you take a
look Brian?
Comment 3 Jody Goldberg 2003-11-17 18:28:32 UTC
On a semi-related note.  I've noticed that

- intltool did not branch for gnome-2-4 before adding this major change
- intltool does not seem to tag CVS for its releases

Both are good release practices that you may want to adopt.
Comment 4 Kenneth Rohde Christiansen 2003-11-17 18:39:20 UTC
You're right. Next release will be CVS-tagged.
Comment 5 Brian Cameron 2003-11-17 23:23:14 UTC
Okay, there are a few ways we could address this problem.

1. We could preprocess the file and change Entity tags to
   something else before we parse it, and then change them
   back after the XML parser has processed the file. 

   For example, we could change &amp; to something like
   INTLTOOLENTITYSTARTampINTLTOOLENTITYEND (or somesuch)
   via regular expression before parsing the file.  Then
   change it back after the XML parser is done parsing.
   This is obviously a bit of a hack, but simple to implement
   if we can come up with keywords that we are sure would
   never be found in a real XML document.

2. If we are happy handling just the 5 Entities that are 
   Predefined in XML (&lt, &gt, &amp, &apos, &quot)., then
   we could just use the Expat::xml_escape function to
   convert them back into the &lt, &gt, &amp, &apos, &quot
   format.

3. If we are happy handling just unsafe HTML Entities, then
   we could use the HTML::Entities.encode_entities method on
   CDATA elements.  This does what #2 does and also handles
   some additional HTML Entities.

4. The most robust solution would be to define our own 
   XML::Filter Start, End, and Char handlers which work much
   the same way as the Tree style, but use 
   $expat->original_string() rather than using the UTF-8 output.
   This could be done by writing our own custom XML::Style based
   on the Tree style.  This means a slight variation of the
   Tree.pm file would become a part of the intltool project.

   However this technique works on all Entities *except* for
   the 5 Entities that are a part of the XML standard (&lt,
   &gt, &amp, &apos, &quot), so it would need to be used in 
   conjunction with solution #2.  

Let me know what solution works best for intltool-merge's needs,
and I'll go ahead and fix the code.

Comment 6 Kenneth Rohde Christiansen 2003-11-18 00:07:00 UTC
I would think that 2 is enought for our needs. If not 4 seems like the
best sollution. the pm modules are written in C and compiled right?

Kenneth
Comment 7 Brian Cameron 2003-11-18 15:53:35 UTC
Here is patch for solution #4.  I realize that my earlier analysis
was a bit wrong.  Solution #4 does properly handle the Predefined
XML tags, so it is not necessary to use xml_escape().  

Also, note that I needed to comment out the call to entity_decode()
on line 814.  This logic was in the code before I ever added the
XML::Parser logic.  I'm not sure how "&amp" was ever preserved in
the output with this function call.   I think that &apos, &quot,
&amp, and &lt were never preserved in escape-form.  Not sure what
is right here.

I'm also attaching the new "OrigTree.pm" file.  This is not c-code,
this is just a perl module that must be installed somewhere in
a directory named XML/Parser/Style/OrigTree.pm.  intltool-merge
will need to set @INC to include whatever directory we decide to
place this custom XML::Parser style.  The OrigTree.pm uses the
original_string() function so that it doesn't change any Entity
values.



Comment 8 Brian Cameron 2003-11-18 15:54:12 UTC
Created attachment 21591 [details] [review]
now uses OrigTree style
Comment 9 Brian Cameron 2003-11-18 15:55:02 UTC
Created attachment 21592 [details]
OrigTree.pm file
Comment 10 Kenneth Rohde Christiansen 2003-11-18 16:12:37 UTC
Jody, are you giving this a test run?

Brian, you should probably add a few tests so we won't break this
functionality again in the future.

Kenneth
Comment 11 Brian Cameron 2003-11-18 17:03:11 UTC
Kenneth, you never answered my question.  I believe that the old
logic was calling entity_decode() which would have always 
converted "&amp;" to "&" anyway.  So, Jody's original situation
would have failed with the old intltool-merge, unless I am missing
something.

What is wrong here.  Should the entity_decode function calls be
removed from intltool-merge?  They were there before the 
XML::Parser logic was added.
Comment 12 Kenneth Rohde Christiansen 2003-11-18 17:28:49 UTC
What I did was:

I looked at the xml.in file. Decoded &amp; to & etc - since that is
how they are store in the po files. ie. the xml string "blah &amp;
blah" is shows as "blah & blah" in the po file.

Then I lookup the translated string.

Then I encode the & to &amp; etc and store the "translatedblah &amp;
translatedblah" string in the xml output file.

If the Parser outputs escaped special letters (&amp; etc) then you
don't need this function and can safely remove it from the XML output.

Jody's example wouldn't have failed with the old method.

Does this enlighten you? Feel free to ask more.

Kenneth
Comment 13 Brian Cameron 2003-11-19 18:18:29 UTC
Okay, I just committed the fix for bug #127250 which will cause
my above patch to fail.  However, the change is so simple to make
I'm not sure a patch is really necessary.

You just need to change the line that says:

  my $xp = new XML::Parser(Style => 'Tree');

to

  my $xp = new XML::Parser(Style => 'OrigTree');

Install the OrigTree.pm file to a directory named 
(base_dir)/XML/Parser/Style, and add the following
line to the beginning of intltool-merge:

push(@INC, "(base_dir)/lib/");

Please test this, and verify that this fixes the problem.

Also, could somebody tell me where the XML/Parser/Style/OrigTree.pm
file should be installed?  Once this has been established, I can 
set up the Makefile to install it to the proper location.

Note that I am going on vacation for the next 1.5 weeks, so 
someone else may need to integrate this patch into intltool-merge
if you need it done sooner.

Comment 14 Brian Cameron 2003-11-19 18:21:56 UTC
I should mention that the issue with entity_decode was
fixed in bug #127250. 

Also, it would be useful if someone told me where the OrigTree.pm
file should be stored within the intltool project.  Should it
go in intltool/lib/perl/XML/Parser/Style?  I assume that
@INC should also include "./lib/perl" so that it works when 
running from the source directory as well?
Comment 15 Kenneth Rohde Christiansen 2003-11-19 18:49:47 UTC
we normally install intltool resources in PREFIX/share/intltool/

Kenneth
Comment 16 Brian Cameron 2003-11-19 19:00:21 UTC
Okay, but where should it go in the CVS module?
Comment 17 Kenneth Rohde Christiansen 2003-11-19 19:13:00 UTC
intltool/module ? intltool/perlmodule ? intltool/parser ?

It doesn't really matter so much - if you have better suggestions
please write them here.

Kenneth
Comment 18 Brian Cameron 2003-11-19 23:09:50 UTC
Note, I am going on vacation for the next 1.5 weeks, so someone
else will need to integrate this change into the code if it
needs to be included sooner than that.
Comment 19 Jody Goldberg 2003-11-20 16:16:44 UTC
Is it in CVS already ?

I tried cvs last night and things did not work.  ChangeLog suggested that some
of the patch was in.
Comment 20 Kenneth Rohde Christiansen 2003-11-20 16:21:35 UTC
No it is not in cvs yet. If it works for you we will commit it.

Kenneth
Comment 21 Brian Cameron 2003-12-02 18:15:38 UTC
Kenneth:

I understand from reading bug 116526 that you are having trouble
getting the OrigTree XML:Parser Style module to work.  What are
you seeing that is wrong, specifically?
Comment 22 Kenneth Rohde Christiansen 2003-12-02 18:38:27 UTC
I figured it out. It is all in cvs now and working AFAICT.

Please test this Jody - so we can close this bug.
Comment 23 Peter O'Shea 2003-12-09 16:40:19 UTC
The 0.28 release seems to almost fix this, except that the OrigTree.pm
module is installed where my Perl doesn't see it.

I have perl (along with all of gnome) installed locally, under $USER.
 The 0.28 release of intltool installed OrigTree.pm under
$USER/share/intltool/XML/Parser/Style, but perl doesn't find it, and I
get problems compiling gnome-vfs when it tries to use
intltool-merge...it complains about OrigTree.pm not being installed
properly.

If I put OrigTree.pm into:
$USER/lib/perl5/site_perl/5.8.1/sun4-solaris/XML/Parser/Style/, then
everything works fine.
Comment 24 Denis Barbier 2003-12-11 10:06:05 UTC
Hi,

I believe that OrigTree is useless, the Char handler can be redefined
with the Tree style with
  my $xp = new XML::Parser(Style => 'Tree');
  $xp->setHandlers(Char => \&intltool_tree_char);
where intltool_tree_char is the Char function in OrigTree.pm.

Denis
Comment 25 Denis Barbier 2003-12-11 10:10:26 UTC
Created attachment 22327 [details] [review]
Use the standard Tree style instead of XML::Parser::Style::OrigTree
Comment 26 Brian Cameron 2003-12-11 10:42:25 UTC
That's clever.  That is a much better solution to the same problem.
However, the intltoolize.in and modules/Makefile.am also contain
references to OrigTree.pm which need to be removed.
Comment 27 Kenneth Rohde Christiansen 2003-12-11 13:54:42 UTC
And you are saying that now :-) Please supply a patch that removes all
traces of OrigTree. Please check the ChangeLog to see what files that
has been modified.

This probably requires a specific version of XML::Parser. A check for
this has to be added to intltool.m4
Comment 28 Denis Barbier 2003-12-11 16:28:57 UTC
The XML::Parser changelog begins with
  2.33
      - Fixed Tree style (grantm)

I do not know whether these fixes are important.
XML::Parser::Style::Tree is less than 50 lines of code, maybe it
could be embedded into intltool-merge?

Denis
Comment 29 Jody Goldberg 2003-12-12 20:01:18 UTC
CVS still appears to generate raw '&' in spots.
Was this patch in 0.28 ?
Comment 30 Kenneth Rohde Christiansen 2003-12-12 20:16:12 UTC
it was
Comment 31 Jody Goldberg 2003-12-13 02:13:41 UTC
Something is still unhappy.

1) Before updating this morning raw ampersands were still being
generated in gnumeric/plugins/excel

2) After updating the source text was escaped, but no translations
appeared. (again in gnumeric/plugins/excel).
Comment 32 Jonathan Blandford 2003-12-16 07:23:32 UTC
I'm hitting intltool unescaping entities in attributes in the
freedesktop mime xml file.  For example:

      <match type="string" value="&lt;!DOCTYPE\ xbel" offset="0:64"/>
is going to:
      <match type="string" value="<DOCTYPE\ xbel" offset="0:64"/>

This is preventing a release of the new Mime database.
Comment 33 Kenneth Rohde Christiansen 2003-12-16 12:43:13 UTC
Brian will you please look into this? I am going on christmas holidays
tomorrow, so if you fix this later I will allow you to do a new
release afterward, since I wont be able to do it

Kenneth
Comment 34 Brian Cameron 2003-12-18 19:07:56 UTC
jrb can you test the attached patch.  I think this resolves your
issues.
Comment 35 Brian Cameron 2003-12-18 19:08:36 UTC
Created attachment 22549 [details] [review]
patch fixing bug reported by jrb
Comment 36 Denis Barbier 2003-12-20 15:22:32 UTC
Brian, why did you reindent the whole file?
When applied, I will have to fix my patch sent to #123911 :(

I tried to reproduce Jody's problems with gnumeric/plugins/excel.
My test shows that it worked fine with intltool 0.28, but CVS is
broken (with or without your latest patch).  Translations are not
incorporated, but if I manually replace
  msgid "MS Excel (tm) 97/2000/XP & 5.0/95"
by
  msgid "MS Excel (tm) 97/2000/XP &amp; 5.0/95"
in PO files, translated tag is displayed again.  This means that
entities should be decoded before being compared to msgids.
Comment 37 Brian Cameron 2003-12-22 18:40:17 UTC
Okay, I'm pretty sure that I just fixed the bug with intltool-merge
not properly translating strings.  Please verify.

Sorry about the re-indenting.  I noticed that there was a lot
of inconsistancy in the code regarding using space vs. tab for
indentation.  Since tabs were only used in a few places, I replaced
them with spaces.  Is it okay if you rebuild the patch for #123911?
I think the code is much cleaner looking with consistant spacing.
Comment 38 Brian Cameron 2003-12-22 18:41:19 UTC
Note that I am going on vacation starting tomorrow, and will not
be back into the office until January 5th.  Therefore, let me know
if my patch above works okay.  If so, I'll apply it to CVS.  
Otherwise, someone else will need to integrate it.
Comment 39 Denis Barbier 2003-12-23 00:01:05 UTC
AFAICT you fixed the bug reported by jrb, not Jody's one.

To test it, I downloaded gnumeric-1.2.2 and removed all po/*.po
(except po/fr.po), otherwise test case took a long time on my old
box.  Eventually I ran
  $ intltool-merge -x . ../plugins/excel/plugin.xml.in out
and looked into 'out' file, with several intltool versions.
With CVS (your patch being applied or not), translations are discarded
when text contains an ampersand, as in the last <description> tag:
    <service type="file_saver" id="excel_biff8" file_extension="xls">
      <information>
        <description>MS Excel (tm) 97/2000/XP</description>
        <description xml:lang="fr">MS Excel(tm) 97/2000/XP</description>
      </information>
    </service>
    <service type="file_saver" id="excel_dsf" file_extension="xls">
      <information>
        <description>MS Excel (tm) 97/2000/XP &amp; 5.0/95</description>
      </information>
    </service>
This entry is translated in fr.po.

About indentation, of course I agree that being consistent is much
better, but IMO this can be fixed when no patches are sleeping in
bugzilla.  But if this is necessary, I will provide an updated patch.
Comment 40 Geert Stappers 2004-01-05 18:36:01 UTC
Added the conglomerate bugtracking mailing list to the CC list.
Comment 41 Brian Cameron 2004-01-05 19:13:26 UTC
I updated the patch that fixes jrb's problem so it applies to CVS
head.  This patch should really go in, since it fixes some serious
problems:

1) It retains the DOCTYPE from the original document in the
   output.
2) It translates entities in attributes properly (see the
   intltool_tree_start function
3) No longer uses the OrigTree Style.  Now just uses Tree and
   overrides the start and char handlers.  Much cleaner.

I updated the patch so it no longer changes white-space.

Denis, I am a bit confused by your problem.  I copied the following
lines into the intltool/tests/cases/merge6.po file:

msgid "MS Excel (tm) 97/2000/XP & 5.0/95"
msgstr "worked - MS Excel(tm) 97/2000/XP & 5.0/95"

And then I added the following test line to the
intltool/tests/cases/merge6.xml.in file:

  <_test>MS Excel (tm) 97/2000/XP &amp; 5.0/95</_test>

I then run "make check" in the intltool/tests directory.  Note
that it complains that test #10 (merge6.xml) fails if you don't
also update the intltool/tests/results/merge6.xml file that
it compares against, but you can just ignore the "failure"
message.

Anyway, after running "make check", I see the following 2 lines
in the intltool/tests/cases/merge6.xml output file 

  <test>MS Excel (tm) 97/2000/XP &amp; 5.0/95</test>
  <test xml:lang="merge6">worked - MS Excel(tm) 97/2000/XP &amp;
5.0/95</test>

This seems to look right to me.  Are you sure that the problem
isn't on your end, or that one of the recent fixes to intltool
didn't correct your problem?
Comment 42 Brian Cameron 2004-01-05 19:13:57 UTC
Created attachment 22950 [details] [review]
updated patch
Comment 43 Kenneth Rohde Christiansen 2004-01-05 19:32:23 UTC
Please commit - it looks fine. Are you going to do anything about
comments?

Kenneth
Comment 44 Denis Barbier 2004-01-05 19:50:49 UTC
Brian, I am all for your patch.  It fixes some bugs, but from my tests not this one.
Your test seems to show that I am wrong, I will try again after you commit your patch.

Denis
Comment 45 Brian Cameron 2004-01-05 20:16:17 UTC
patch applied.  Denis, please test.
Comment 46 Brian Cameron 2004-01-05 20:32:47 UTC
Note that my latest patch removed all references to OrigTree.pm, but
I did not delete the OrigTree.pm file from the module.  Should this
be done?
Comment 47 Kenneth Rohde Christiansen 2004-01-05 20:36:05 UTC
I guess so...too bad, now a lot of modules need to removed OrigTree
from EXTRA_DIST. Can you up the intltool version after this change?

Didn't Jrb make a test case? If so, maybe this one should be committed
Comment 48 Denis Barbier 2004-01-05 23:30:31 UTC
The test mentioned above works now just fine, so I believe that you fix all bugs mentioned in this bugreport.

Denis
Comment 49 Kenneth Rohde Christiansen 2004-01-06 15:57:46 UTC
I am trusting you and closing this. Re-open if I am closing this too
early ;-)
Comment 50 Geert Stappers 2004-01-09 00:45:27 UTC
Will there be tarball released with the patch included?
Comment 51 Jody Goldberg 2004-01-09 02:21:59 UTC
gnumeric head and 1.2.x are happy with this patch, as is libgnomeprint.
Comment 52 Kenneth Rohde Christiansen 2004-01-09 13:51:48 UTC
Ja, maar je moet even een paar weekjes wachten.
(Yes, but you have to wait a couple of weeks.)