Bug 309215 - xmllint garbles output from UTF-16 DocBook xml-files
Status: VERIFIED NOTABUG
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.19
Hardware: Other All
Importance: Normal blocker
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2005-06-30 15:54 UTC by Kenneth Johansson
Modified: 2009-08-15 18:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Our main build makefile which includes xmllint usage (16.77 KB, text/plain)
2005-07-01 14:16 UTC, Kenneth Johansson
Zip-file containing the Japanese UTF-16 files and log files (274.81 KB, application/x-compressed-tar)
2005-07-01 14:42 UTC, Kenneth Johansson
Broken UTF-16 content (5.81 KB, text/xml)
2005-07-04 14:19 UTC, Kenneth Johansson
The correct UTF-16 output. (5.72 KB, text/xml)
2005-07-04 14:19 UTC, Kenneth Johansson

Description Kenneth Johansson 2005-06-30 15:54:22 UTC
Please describe the problem:
I use xmllint as part of a toolchain in a DocBook CHM and PDF build environment.
xmllint is used in two parts, first to expand all xincluded xml-modules, and
second to validate the xml-content. 

When expanding xincludes from files with UTF-16 content, xmllint produces output
which is clearly broken Unicode. We (Bob Stayton and I) have verified that
libxml2 itself works, since Bob tested with xsltproc --xinclude and it produced
correct content; hence we've assumed that xmllint is flawed somewhere.

This problem is a showstopper for us and I'd appreciate if I can get fast help
with this.

Steps to reproduce:
1. By using xmllint together with my customized stylesheets and UTF-16 xml-files.


Actual results:
I get an error message further down the toolchain:

Error on line 1 column 36 of
file:/c:/work/10.1/PACS/PACS_IDS/si_doc/pms_japanese/viewer/expanded.xml:
  Error reported by XML parser: unexpected character after document prolog
(found "?") (expected "<")

Transformation failed: Run-time errors were reported

Saxon which handles profiling cannot understand the file expanded.xml since the
content is broken.

Expected results:
An xml-file with correct UTF-16 content should be produced which should be
profiled and built into a PDF or CHM.

Does this happen every time?
Yes.

Other information:
Yes, I have a screenshot of the broken unicode content. Here's the xincluded
file with broken content (part of the file):

<?xml version="1.0" encoding="utf-16"?>਍㰀℀䐀伀䌀吀夀倀䔀 戀漀漀欀 倀唀䈀䰀䤀䌀 
∀ⴀ⼀⼀伀䄀匀䤀匀⼀⼀䐀吀䐀 䐀漀挀䈀漀漀欀 堀䴀䰀 嘀㐀⸀㈀⼀⼀䔀一∀ ∀栀琀琀瀀㨀
⼀⼀眀眀眀⸀漀愀猀椀猀ⴀ漀瀀攀渀⸀漀爀最⼀搀漀挀戀漀漀欀⼀砀洀氀⼀㐀⸀㈀⼀搀漀挀戀漀
漀欀砀⸀搀琀搀∀ 嬀ഀ
<!ENTITY uArr "⇑">਍㰀℀䔀一吀䤀吀夀 栀挀椀爀挀 ∀─∁㸀ഀ
<!ENTITY icirc "î">਍㰀℀䔀一吀䤀吀夀 攀焀甀愀氀猀 ∀㴀∀㸀ഀ
<!ENTITY cong "≅">਍㰀℀䔀一吀䤀吀夀 䠀䄀刀䐀挀礀 ∀⨀∄㸀ഀ
<!ENTITY icy "и">਍㰀℀䔀一吀䤀吀夀 䔀挀愀爀漀渀 ∀ᨀ∁㸀ഀ
<!ENTITY clubs "♣">਍㰀℀䔀一吀䤀吀夀 瀀栀洀洀愀琀 ∀㌀∡㸀ഀ
<!ENTITY sqcap "⊓">਍㰀℀䔀一吀䤀吀夀 琀栀漀爀渀 ∀︀∀㸀ഀ
<!ENTITY Lcedil "Ļ">਍㰀℀䔀一吀䤀吀夀 爀愀爀爀 ∀鈀∡㸀ഀ
<!ENTITY verbar "|">਍㰀℀䔀一吀䤀吀夀 挀椀爀攀 ∀圀∢㸀ഀ
<!ENTITY DZcy "Џ">਍㰀℀䔀一吀䤀吀夀 戀⸀搀攀氀琀愀 ∀㔀엘⋞㸀ഀ
<!ENTITY Gcirc "Ĝ">਍㰀℀䔀一吀䤀吀夀 漀挀椀爀 ∀騀∢㸀ഀ
<!ENTITY circ "ˆ">਍㰀℀䔀一吀䤀吀夀 䤀最爀 ∀餀∃㸀ഀ
<!ENTITY udigr "ϋ">਍㰀℀䔀一吀䤀吀夀 瀀爀椀洀攀 ∀㈀∠㸀ഀ
<!ENTITY npr "⊀">਍㰀℀䔀一吀䤀吀夀 戀⸀瀀椀 ∀㔀퇘⋞㸀ഀ
<!ENTITY frac58 "⅝">਍㰀℀䔀一吀䤀吀夀 氀搀焀甀漀爀 ∀Ḁ∠㸀ഀ
<!ENTITY sqsup "⊐">਍㰀℀䔀一吀䤀吀夀 戀漀砀䐀刀 ∀吀∥㸀ഀ
<!ENTITY kcedil "ķ">਍㰀℀䔀一吀䤀吀夀 瘀䐀愀猀栀 ∀ꠀ∢㸀ഀ
<!ENTITY Scedil "Ş">਍㰀℀䔀一吀䤀吀夀 瀀攀爀瀀 ∀ꔀ∢㸀ഀ
<!ENTITY b.Gamma "
Comment 1 Daniel Veillard 2005-06-30 16:16:38 UTC
In the current state, your bug report does not give me anything to try
to reproduce and assert a problem. It's completely unworkable for me,
I can't even look at the bug, so sorry: in that state, and without
   - all the needed input, provided as bug attachments
   - the command line used in xmllint to generate the problem
there is no way I will look further at this bug report, because there
is just nothing I can do.
 Guidelines are there for reporting bugs:
    http://xmlsoft.org/XSLT/bugs.html
if you don't follow the guidelines, I usually can't debug the problem.

Daniel
Comment 2 Kenneth Johansson 2005-07-01 14:16:55 UTC
Created attachment 48501 [details]
Our main build makefile which includes xmllint usage

This file contains the make rules we use in our docbook build environment. Take
a look at the target inst_chm for a closer look at how we use xmllint.
Comment 3 Kenneth Johansson 2005-07-01 14:42:49 UTC
Created attachment 48503 [details]
Zip-file containing the Japanese UTF-16 files and log files

This file contains the Japanese UTF-16 encoded xml-files. It also contains the
log-files produced by target inst_chm during each step in Make.Docbook.
Comment 4 Kenneth Johansson 2005-07-01 14:56:13 UTC
I've uploaded the main makefile which runs the docbook build environment. The
interesting parts are in the inst_chm target.

I've also uploaded the Japanese UTF-16 encoded xml-files and the logfiles. The
file expanded.xml contains the garbled UTF-16 content. During XInclude expansion
no messages were generated by xmllint.
Comment 5 William M. Brack 2005-07-04 00:37:58 UTC
Sorry, but I really can't make any progress on the information you have submitted so far.  I can't go 
through your Make.docbook and attempt to extract the important points, nor can I go through all your 
files and try to guess which ones are causing problems.

Your report states that xmllint somehow fails when processing an xincluded file with UTF-16 content.  If 
this is correct, please prepare two files - a small xml file, together with a small file in UTF-16 to be 
included, which demonstrate the problem.  Once I have those, I should be able to trace through where the 
problem occurs and fix it.
Comment 6 Daniel Veillard 2005-07-04 08:24:22 UTC
Agreed, I'm lost too, I don't even know how to reproduce this bug. And trying
to decipher the Makefile to extract what is really at fault just sounds like
lost time for us. Same as comment #1, provide us what we need without us having
to guess how and what is the problem.
  - what is the command line command used for xmllint ?
Also give more information about the XInclude process:
  - is that text xinclude ?
  - where is the included resource ?
  - where is the including resource, and at what line is the inclusion done ?
  - is there an xmllint error ?

Daniel
Comment 7 Kenneth Johansson 2005-07-04 14:17:54 UTC
Hi,

I think I've solved the problem. When I tried to build a small test file for you
guys I saw that the developers who made parts of our DocBook build environment
had  used this xmllint command line command:
xmllint --xinclude IDS5webUGbook.xml > xmllint_test_ok.xml 
which gave the output as seen in the attached file xmllint_test_broken.xml.

When I run the following command
xmllint --xinclude IDS5webUGbook.xml -o xmllint_test.xml 
I got the output as seen in the attached file xmllint_test_ok.xml.

I did of course trust the developers to use correct syntax, and not output to
"standard out". It's weird that we haven't noticed this a long time ago! Maybe a
blurb about this situation should be put into the manual, just to make sure no
other poor souls like me tear their hair out in frustration. :D

Thanks for your help.

/Kenneth

  
Comment 8 Kenneth Johansson 2005-07-04 14:19:16 UTC
Created attachment 48627 [details]
Broken UTF-16 content
Comment 9 Kenneth Johansson 2005-07-04 14:19:53 UTC
Created attachment 48628 [details]
The correct UTF-16 output.
Comment 10 Daniel Veillard 2005-08-26 08:46:02 UTC
xmllint -o just fopen()s the output file instead of using stdout. The relevant code
is:

                FILE *out;
                if (output == NULL)
                    out = stdout;
                else {
                    out = fopen(output,"wb");
                }
around line 2460. If you got errors in the output using stdout instead
of a filename I really don't see how this could relate to libxml2. I
think we can safely close this bug then.

Daniel