Bug 127250 – The new parser adds extra newlines




Summary:	The new parser adds extra newlines


Status:	RESOLVED FIXED

Product:	intltool
Classification:	Deprecated
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal critical
Target Milestone:	---
Assigned To:	intltool maintainers
QA Contact:	intltool maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2003-11-18 01:05 UTC by Jody Goldberg
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
patch fixing problem (52.46 KB, patch) 2003-11-18 19:45 UTC, Brian Cameron

Description Jody Goldberg 2003-11-18 01:05:01 UTC

<_Name>Generic PostScript</_Name>
is being converted to
<Name>
Generic PostScript
</Name>

This breaks libgnomeprint's data/models and gnumeric's plugin.xml files.

Comment 1 Kenneth Rohde Christiansen 2003-11-18 05:42:57 UTC

I really thought that wouldn't be an issue. Why does your application
depend on the "layout" of the XML and not just the actual data? How
are you parsing this?

Kenneth

Comment 2 Jody Goldberg 2003-11-18 05:49:53 UTC

libxml2 returns the value of the element as

\nfoo\n rather than foo

Which confuses the heck out of things that link to the data.
The first things that I noticed with this problem are gnome-print's
models, and gnumeric's plugins.
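
For illustration, the behaviour described above can be reproduced with a short sketch. It uses Python's standard-library xml.etree.ElementTree rather than libxml2 itself (an assumption made only to keep the example self-contained); both parsers hand the character data between the tags to the application unchanged, newlines included.

```python
import xml.etree.ElementTree as ET

# The element as it would look if written on a single line.
inline = ET.fromstring("<Name>Generic PostScript</Name>")

# The element as the reporter shows it, with the text on its own line.
split = ET.fromstring("<Name>\nGeneric PostScript\n</Name>")

# The parser passes the character data through as-is, so the two
# documents yield different values for the same element.
print(repr(inline.text))
print(repr(split.text))
```

An application that compares the value against a fixed string, or uses it as a lookup key, sees two different strings here.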

Comment 3 Kenneth Rohde Christiansen 2003-11-18 05:54:46 UTC

OK, we will have to see what we can do then. This needs to be fixed;
but apparently that wasn't so easy with the XML::Parser according to
Brian ;-( Any input, Brian?

Kenneth

Comment 4 Brian Cameron 2003-11-18 14:27:25 UTC

This cannot be fixed in intltool without doing evil and painful
hackery.  Obviously when you read XML into a parser, all 
whitespace is lost.  

This should be fixed in libxml2 so it does not return needless
whitespace.

Comment 5 Daniel Veillard 2003-11-18 15:42:20 UTC

> Obviously when you read XML into a parser, all whitespace is lost.

  hum, someone didn't do his homework
  http://www.w3.org/TR/REC-xml#sec-white-space

----------
An XML processor must always pass all characters in a document
that are not markup through to the application.
----------

  If you expect libxml2 to remove those \n, sorry this will never
happen. And no *compliant* XML parser should remove them either!
This is application specific, your application must handle this ...

Daniel
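
As a rough sketch of what handling this in the application could look like, the helper below trims surrounding whitespace after parsing. The name element_value is hypothetical, and the example again assumes Python's xml.etree.ElementTree rather than libxml2; it is not taken from libgnomeprint or gnumeric.

```python
import xml.etree.ElementTree as ET


def element_value(elem):
    """Return an element's text with leading/trailing whitespace removed.

    Hypothetical helper: the parser keeps the whitespace, so the
    application decides whether it is significant.
    """
    return (elem.text or "").strip()


doc = ET.fromstring("<Model><Name>\nGeneric PostScript\n</Name></Model>")
name = element_value(doc.find("Name"))
print(repr(name))
```

Stripping is only appropriate where the whitespace carries no meaning for the application, which is why the parser cannot make that choice on its behalf.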

Comment 6 Brian Cameron 2003-11-18 16:08:25 UTC

I agree that the intltool-merge script is adding the \n character
at the end.  I could change this so that XML::Parser always displays
tags with CDATA as follows:

<tag>value</tag>

This makes the code more complicated since intltool-merge currently
doesn't know if a given tag has CDATA or internal tags, and you 
would only want to avoid printing the \n in the case where the
tag contains CDATA.  But if it makes things easier, I can go ahead
and make the change to intltool-merge.

Does this sound good?
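
The idea in this comment can be sketched roughly as follows. This is not the intltool-merge code (which is Perl and uses XML::Parser); it is a hypothetical Python writer, with the assumed name serialize, that keeps an element on one line when it holds only character data and adds newlines only around elements that contain child tags. Attributes, tail text and mixed content are ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def serialize(elem, indent=""):
    """Hypothetical writer: text-only elements stay on a single line."""
    children = list(elem)
    text = elem.text or ""
    if not children:
        # Leaf element: emit <tag>value</tag> with no added newlines,
        # so the value the reader gets back is unchanged.
        if text:
            return f"{indent}<{elem.tag}>{escape(text)}</{elem.tag}>\n"
        return f"{indent}<{elem.tag}/>\n"
    # Element with child tags: newlines here are layout only.
    out = f"{indent}<{elem.tag}>\n"
    for child in children:
        out += serialize(child, indent + "  ")
    out += f"{indent}</{elem.tag}>\n"
    return out


root = ET.fromstring("<Model><Name>Generic PostScript</Name></Model>")
print(serialize(root), end="")
```

The check on whether an element has children is the extra bookkeeping the comment refers to: the writer has to know what a tag contains before deciding whether a newline is safe.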

Comment 7 Kenneth Rohde Christiansen 2003-11-18 16:15:03 UTC

Yes, that sounds good. Remember to update the tests so that 'make
distcheck' will still pass.

Comment 8 Brian Cameron 2003-11-18 19:45:17 UTC

Created attachment 21600
patch fixing problem

Comment 9 Brian Cameron 2003-11-18 19:46:37 UTC

The attached patch fixes this problem, as can be seen in the
test output in the patch.  It turned out not to be so bad to
handle the data in this way.

Also, I am now properly handling decoding/encoding the string
before/after translating the string.  This wasn't being done
properly before.

Comment 10 Kenneth Rohde Christiansen 2003-11-18 20:23:16 UTC

Looks good - feel free to check in.

Kenneth

Comment 11 Brian Cameron 2003-11-18 22:46:12 UTC

checked into CVS head