GNOME Bugzilla – Bug 608333
Different output for HTML and XML when xincluding a text file with \r\n end line markers
Last modified: 2010-02-01 13:10:30 UTC
When a text file is included with <include href="text.txt" parse="text" xmlns="http://www.w3.org/2001/XInclude"/> and the text file uses \r\n line endings (as is common on Windows), the output generated by xmllint and xsltproc contains a &#13; character reference for every \r. There is one exception: when xsltproc generates HTML output (with a style sheet that contains <xsl:output method="html">), no character references are inserted.

According to http://article.gmane.org/gmane.comp.gnome.lib.xslt/3917, libxml2 does indeed treat HTML output differently. A style sheet that generates XHTML must contain <xsl:output method="xml">. Since XHTML is an XML grammar and libxml2 treats only HTML differently, XHTML output again contains these character references.

The question is whether it makes sense at all to generate these character references when a text file is xincluded. Why is HTML treated differently from XML? Because the character references are a problem for at least one popular reading system for eBook files (ePub format), it might make sense not to generate them at all for included text files. However, I don't know whether there are other use cases that require them in XML output.

The problem was first discussed on the DocBook mailing list. This is the message that made me turn to the libxml2 project: http://lists.oasis-open.org/archives/docbook/201001/msg00065.html
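To make the reported difference concrete, here is a minimal Python sketch of the two serialization behaviors described above. The functions escape_xml_text and escape_html_text are hypothetical: they only mirror the reported behavior (XML output writes each \r as &#13;, HTML output leaves it raw) and are not libxml2's actual code.

```python
def escape_xml_text(text: str) -> str:
    """Hypothetical XML text escaper: like the reported XML output, it writes each CR as &#13;."""
    return (text.replace("&", "&amp;")   # must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\r", "&#13;"))  # CR becomes a character reference


def escape_html_text(text: str) -> str:
    """Hypothetical HTML text escaper: like the reported HTML output, it leaves CR untouched."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


included = "line 1\r\nline 2\r\n"  # contents of a Windows-style text file
print(repr(escape_xml_text(included)))   # 'line 1&#13;\nline 2&#13;\n'
print(repr(escape_html_text(included)))  # 'line 1\r\nline 2\r\n'
```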
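http://www.w3.org/TR/REC-xml/#sec-line-ends

An XML parser must translate every \r\n pair, and every \r that is not followed by \n, into a single \n before passing the text to the application. The only way for \r\n to survive parsing of the resulting XML document is therefore to escape the \r. This is a mandatory rule in XML parsing.

Not a bug,

Daniel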
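The rule cited above can be checked with any conforming XML parser. The following sketch uses Python's xml.etree.ElementTree (which parses with expat), chosen here only for illustration: a literal \r\n or a lone \r comes back as \n, while &#13; is expanded after line-end normalization and therefore keeps the \r.

```python
import xml.etree.ElementTree as ET

raw_crlf = "<pre>line 1\r\nline 2</pre>"      # CR LF written literally
raw_cr = "<pre>line 1\rline 2</pre>"          # lone CR written literally
escaped = "<pre>line 1&#13;\nline 2</pre>"    # CR written as a character reference

# Literal line ends are normalized to a single LF (XML 1.0, section 2.11).
print(repr(ET.fromstring(raw_crlf).text))  # 'line 1\nline 2'
print(repr(ET.fromstring(raw_cr).text))    # 'line 1\nline 2'

# The character reference is expanded after normalization, so the CR survives.
print(repr(ET.fromstring(escaped).text))   # 'line 1\r\nline 2'
```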