GNOME Bugzilla – Bug 309215
xmllint garbles output from UTF-16 DocBook xml-files
Last modified: 2009-08-15 18:40:50 UTC
Please describe the problem:
I use xmllint as part of a toolchain in a DocBook CHM and PDF build environment. xmllint is used in two steps: first to expand all XIncluded XML modules, and second to validate the XML content. When expanding XIncludes from files with UTF-16 content, xmllint produces output that is clearly broken Unicode. We (Bob Stayton and I) have verified that libxml2 works, since Bob tested with xsltproc --xinclude and it produced correct content; hence we assume that xmllint is flawed somewhere. This problem is a showstopper for us, and I'd appreciate fast help with it.
Steps to reproduce:
1. Run xmllint together with my customized stylesheets and UTF-16 XML files.
Actual results:
I get an error message further down the toolchain:
    Error on line 1 column 36 of file:/c:/work/10.1/PACS/PACS_IDS/si_doc/pms_japanese/viewer/expanded.xml: Error reported by XML parser: unexpected character after document prolog (found "?") (expected "<")
    Transformation failed: Run-time errors were reported
Saxon, which handles profiling, cannot parse the file expanded.xml because its content is broken.
Expected results:
An XML file with correct UTF-16 content should be produced, which can then be profiled and built into a PDF or CHM.
Does this happen every time? Yes.
Other information: Yes, I have a screenshot of the broken Unicode content.
Here's the xincluded file with broken content (part of the file): <?xml version="1.0" encoding="utf-16"?>㰀℀䐀伀䌀吀夀倀䔀 戀漀漀欀 倀唀䈀䰀䤀䌀 ∀ⴀ⼀⼀伀䄀匀䤀匀⼀⼀䐀吀䐀 䐀漀挀䈀漀漀欀 堀䴀䰀 嘀㐀⸀㈀⼀⼀䔀一∀ ∀栀琀琀瀀㨀 ⼀⼀眀眀眀⸀漀愀猀椀猀ⴀ漀瀀攀渀⸀漀爀最⼀搀漀挀戀漀漀欀⼀砀洀氀⼀㐀⸀㈀⼀搀漀挀戀漀 漀欀砀⸀搀琀搀∀ 嬀ഀ <!ENTITY uArr "⇑">㰀℀䔀一吀䤀吀夀 栀挀椀爀挀 ∀─∁㸀ഀ <!ENTITY icirc "î">㰀℀䔀一吀䤀吀夀 攀焀甀愀氀猀 ∀㴀∀㸀ഀ <!ENTITY cong "≅">㰀℀䔀一吀䤀吀夀 䠀䄀刀䐀挀礀 ∀⨀∄㸀ഀ <!ENTITY icy "и">㰀℀䔀一吀䤀吀夀 䔀挀愀爀漀渀 ∀ᨀ∁㸀ഀ <!ENTITY clubs "♣">㰀℀䔀一吀䤀吀夀 瀀栀洀洀愀琀 ∀㌀∡㸀ഀ <!ENTITY sqcap "⊓">㰀℀䔀一吀䤀吀夀 琀栀漀爀渀 ∀︀∀㸀ഀ <!ENTITY Lcedil "Ļ">㰀℀䔀一吀䤀吀夀 爀愀爀爀 ∀鈀∡㸀ഀ <!ENTITY verbar "|">㰀℀䔀一吀䤀吀夀 挀椀爀攀 ∀圀∢㸀ഀ <!ENTITY DZcy "Џ">㰀℀䔀一吀䤀吀夀 戀⸀搀攀氀琀愀 ∀㔀엘⋞㸀ഀ <!ENTITY Gcirc "Ĝ">㰀℀䔀一吀䤀吀夀 漀挀椀爀 ∀騀∢㸀ഀ <!ENTITY circ "ˆ">㰀℀䔀一吀䤀吀夀 䤀最爀 ∀餀∃㸀ഀ <!ENTITY udigr "ϋ">㰀℀䔀一吀䤀吀夀 瀀爀椀洀攀 ∀㈀∠㸀ഀ <!ENTITY npr "⊀">㰀℀䔀一吀䤀吀夀 戀⸀瀀椀 ∀㔀퇘⋞㸀ഀ <!ENTITY frac58 "⅝">㰀℀䔀一吀䤀吀夀 氀搀焀甀漀爀 ∀Ḁ∠㸀ഀ <!ENTITY sqsup "⊐">㰀℀䔀一吀䤀吀夀 戀漀砀䐀刀 ∀吀∥㸀ഀ <!ENTITY kcedil "ķ">㰀℀䔀一吀䤀吀夀 瘀䐀愀猀栀 ∀ꠀ∢㸀ഀ <!ENTITY Scedil "Ş">㰀℀䔀一吀䤀吀夀 瀀攀爀瀀 ∀ꔀ∢㸀ഀ <!ENTITY b.Gamma "
In its current state, your bug report gives me nothing I can use to reproduce and confirm a problem. It's completely unworkable for me; I can't even look at the bug. Sorry, but in that state, and without
- all the needed input provided as bug attachments
- the command line used with xmllint to generate the problem
there is no way I will look further at this bug report, because there is just nothing I can do. The guidelines for reporting bugs are at http://xmlsoft.org/XSLT/bugs.html; if you don't follow the guidelines, I usually can't debug the problem. Daniel
Created attachment 48501 Our main build makefile which includes xmllint usage. This file contains the make rules we use in our DocBook build environment. Take a look at the target inst_chm for a closer look at how we use xmllint.
Created attachment 48503 Zip-file containing the Japanese UTF-16 files and log files. This file contains the Japanese UTF-16 encoded xml-files. It also contains the log-files produced by target inst_chm during each step in Make.Docbook.
I've uploaded the main makefile which runs the DocBook build environment. The interesting part is the inst_chm target. I've also uploaded the Japanese UTF-16 encoded xml-files and the log files. The file expanded.xml contains the garbled UTF-16 content. During XInclude expansion, no messages were generated by xmllint.
Sorry, but I really can't make any progress on the information you have submitted so far. I can't go through your Make.docbook and attempt to extract the important points, nor can I go through all your files and try to guess which ones are causing problems. Your report states that xmllint somehow fails when processing an xincluded file with UTF-16 content. If this is correct, please prepare two files - a small xml file, together with a small file in UTF-16 to be included, which demonstrate the problem. Once I have those, I should be able to trace through where the problem occurs and fix it.
Agreed, I'm lost too; I don't even know how to reproduce this bug. And trying to decipher the Makefile to extract what is really at fault just sounds like lost time for us. Same as comment #1: provide us with what we need, without us having to guess how and what the problem is.
- what is the command line used for xmllint?
Also give more information about the XInclude process:
- is that a text xinclude?
- where is the included resource?
- where is the including resource, and on what line is the inclusion done?
- is there an xmllint error?
Daniel
Hi, I think I've solved the problem. When I tried to build a small test file for you guys, I saw that the developers who made parts of our DocBook build environment had used this xmllint command line:
    xmllint --xinclude IDS5webUGbook.xml > xmllint_test_ok.xml
which gave the output seen in the attached file xmllint_test_broken.xml. When I ran the following command:
    xmllint --xinclude IDS5webUGbook.xml -o xmllint_test.xml
I got the output seen in the attached file xmllint_test_ok.xml. I did of course trust the developers to use correct syntax, and not to output to "standard out". It's weird that we didn't notice this a long time ago! Maybe a blurb about this situation should be put into the manual, so that no other poor souls like me tear their hair out in frustration. :D Thanks for your help. /Kenneth
Created attachment 48627 Broken UTF-16 content
Created attachment 48628 The correct UTF-16 output.
xmllint -o just fopen()s the output file instead of using stdout. The relevant code, around line 2460, is:
    FILE *out;
    if (output == NULL)
        out = stdout;
    else {
        out = fopen(output,"wb");
    }
If you got errors in the output when using stdout instead of a filename, I really don't see how this could relate to libxml2. I think we can safely close this bug then. Daniel