GNOME Bugzilla – Bug 651925
<br/> being transformed into <br></br>
Last modified: 2021-07-05 11:00:18 UTC
My XSL stylesheet adds a <br/> tag which is being converted to <br></br>, thus producing TWO line breaks instead of one.
Created attachment 192559: Test set which demonstrates the problem. The primary XSL file is bug651925.xsl, but it includes other XSL files which contain callable templates.
Created attachment 192602: Test set which demonstrates the problem. This version has a smaller XML file so that extraneous fields are excluded from the HTML output. Look for the field labelled 'Password Change'. This is a radio group with three options, each of which ends with <br> so that they are displayed vertically instead of horizontally. The output clearly shows '<br></br>' instead of '<br />'.
This was also reported as <https://bugs.webkit.org/show_bug.cgi?id=76707>. This used to be less of an issue in practice because HTML parsers would ignore "</br>" in strict mode, but that is no longer the case with HTML5-compliant parsers, such as those in Gecko and WebKit. According to comments in bug 100114, this previously worked correctly in libxslt, but no longer does.
Note that the test case has multiple xsl:output elements with different methods:

  bug651925.xsl: <xsl:output method='xml'
  disable-output-escaping.xsl: <xsl:output method="html" encoding="UTF-8"/>
  std.data_field.xsl: <xsl:output method="html" encoding="UTF-8"/>

The XSLT spec says: "A stylesheet may contain multiple xsl:output elements and may include or import stylesheets that also contain xsl:output elements. All the xsl:output elements occurring in a stylesheet are merged into a single effective xsl:output element. For the cdata-section-elements attribute, the effective value is the union of the specified values. For other attributes, the effective value is the specified value with the highest import precedence. It is an error if there is more than one such value for an attribute. An XSLT processor may signal the error; if it does not signal the error, if should recover by using the value that occurs last in the stylesheet. The values of attributes are defaulted after the xsl:output elements have been merged; different output methods may have different default values for an attribute."

Since xsl:include is used, all xsl:output elements have the same import precedence. The xsl:output from disable-output-escaping.xsl comes last, so the effective method is "html".

The <br/> comes from std.data_field.xsl, which has a default namespace of xmlns="http://www.w3.org/1999/xhtml". So the special handling of <br/> doesn't apply (it only applies to elements with a null namespace).

If you actually want to use the "html" output method, remove the xmlns="http://www.w3.org/1999/xhtml" from std.data_field.xsl and you'll get a single <br>. If you want to use the "xml" output method, then use it everywhere or use xsl:import.
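The merge rule quoted above can be modeled in a few lines of Python using only the standard library. This is a hypothetical sketch, not libxslt code: the helper names (stylesheet, collect_outputs, merge_outputs) are made up, it only covers xsl:include (every xsl:output has the same import precedence), it skips attribute defaulting, and the include order shown is assumed apart from disable-output-escaping.xsl coming last, as stated above. The attributes of bug651925.xsl's xsl:output are truncated in this report, so only method="xml" is modeled.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def stylesheet(output_attrs):
    """Build a minimal stylesheet whose only top-level element is one xsl:output."""
    return ET.fromstring(
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">'
        f"<xsl:output {output_attrs}/>"
        "</xsl:stylesheet>"
    )


def collect_outputs(stylesheets):
    """Gather top-level xsl:output elements, in stylesheet (include) order."""
    outputs = []
    for root in stylesheets:
        outputs.extend(root.findall(f"{{{XSL_NS}}}output"))
    return outputs


def merge_outputs(outputs):
    """Merge xsl:output elements that all share the same import precedence."""
    effective = {}
    cdata = set()
    for out in outputs:
        for name, value in out.attrib.items():
            if name == "cdata-section-elements":
                # The effective value is the union of all specified values.
                cdata.update(value.split())
            else:
                # Conflicting values at equal precedence are an error; this
                # sketch recovers by keeping the value that occurs last.
                effective[name] = value
    if cdata:
        effective["cdata-section-elements"] = " ".join(sorted(cdata))
    return effective


# Assumed include order: only "disable-output-escaping.xsl comes last" is
# taken from the report; the position of std.data_field.xsl is a guess.
merged = merge_outputs(collect_outputs([
    stylesheet('method="xml"'),                    # bug651925.xsl
    stylesheet('method="html" encoding="UTF-8"'),  # std.data_field.xsl
    stylesheet('method="html" encoding="UTF-8"'),  # disable-output-escaping.xsl
]))
print(merged)  # {'method': 'html', 'encoding': 'UTF-8'}
```

With xsl:import, each xsl:output would also carry an import precedence and the highest-precedence value would win regardless of position; the sketch leaves that case out.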
This test case <http://www.ExampleOnly.com/br-tag/test.html> has a single xsl:output tag, and produces <br><br> for me.
Like I said: If you want to use the html output method, you have to remove the default namespace xmlns="http://www.w3.org/1999/xhtml" from your stylesheet. Quoting the spec: "The html output method should not output an element differently from the xml output method unless the expanded-name of the element has a null namespace URI; an element whose expanded-name has a non-null namespace URI should be output as XML."
There is some confusion as to which test case we are talking about. You are probably looking at the test case attached here, while I'm looking at the more reduced test at <http://www.ExampleOnly.com/br-tag/test.html>. Removing the namespace from the stylesheet doesn't help with the minimal test. Removing it from the source document does, but at the cost of making it invalid XHTML. Anyway, this is not as simple as it looks in the spec. HTML elements in HTML documents always have the XHTML namespace now, so requiring a null namespace for HTML output does not make sense. An XSLT processor that wants to be usable for producing HTML needs to recognize this (as Mozilla does).
> There is some confusion as to which test case we are talking about. You are
> probably looking at the test case attached here, while I'm looking at the more
> reduced test at <http://www.ExampleOnly.com/br-tag/test.html>. Removing
> namespace from the stylesheet doesn't help with the minimal test. Removing it
> from source document does, but at the cost of making it invalid XHTML.

If you want to use XHTML, you have to use the XML output method.

> Anyway, this is not as simple as it looks in the spec. HTML elements in HTML
> documents always have XHTML namespace now,

To which specification are you referring?

> so requiring a null namespace for
> HTML output does not make sense.

But that's what the XSLT spec says.
> If you want to use XHTML, you have to use the XML output method.

I was talking about the source document.

> To which specification are you referring?

Either <http://whatwg.org/c> or <http://www.w3.org/TR/html5>. All previous versions are fully obsolete.

Browsers do implement this: HTML elements in HTML documents have the XHTML namespace, not a null one.

> > so requiring a null namespace for
> > HTML output does not make sense.
>
> But that's what the XSLT spec says.

Right, this is why I said that it's not as simple. For HTML, we need a reasonable way to output <br>, and other implementations (e.g. Mozilla Firefox) do what Web developers need. Relying on libxslt puts WebKit at a disadvantage, which is why I'm requesting a fix.

Special-casing the XHTML namespace in addition to the null namespace would be well within the spirit of the XSLT spec, I believe. In practice, there is already no interoperability among XSLT processors in this regard: I believe that many ignore the requirement to check the namespace entirely, rather than just special-casing XHTML.
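The spec rule quoted earlier in this thread (html output treatment only for elements with a null namespace URI), together with the relaxation proposed here (treat the XHTML namespace the same way), can be modeled with a small Python function. This is a hypothetical sketch, not libxslt's implementation: serialize_empty and treat_xhtml_as_html are made-up names, it only handles elements without content, and it omits namespace declarations. The empty-element list is the HTML 4.0 one given in the XSLT 1.0 spec, matched case-insensitively as that spec asks. For the "output as XML" branch it writes a start and end tag to mirror the <br></br> in this report; <br/> would be equally valid XML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Empty elements that XSLT 1.0's html output method writes without an end tag.
HTML_EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
              "input", "isindex", "link", "meta", "param"}


def split_tag(tag):
    """Split an ElementTree tag '{uri}local' into (uri, local); uri is None if absent."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def serialize_empty(elem, treat_xhtml_as_html=False):
    """Serialize a content-less element under the html output method (simplified)."""
    if len(elem) or elem.text:
        raise ValueError("sketch only handles elements without content")
    uri, local = split_tag(elem.tag)
    # Spec rule: html treatment only for a null namespace URI.
    # Proposed relaxation: also apply it to the XHTML namespace.
    html_rules = uri is None or (treat_xhtml_as_html and uri == XHTML_NS)
    if html_rules:
        # No end tag for the HTML empty elements; other elements keep theirs.
        if local.lower() in HTML_EMPTY:
            return f"<{local}>"
        return f"<{local}></{local}>"
    # Non-null namespace: "should be output as XML" (namespace declaration
    # omitted here; start+end tags mirror the <br></br> in this report).
    return f"<{local}></{local}>"


plain_br = ET.fromstring("<br/>")
xhtml_br = ET.fromstring(f'<br xmlns="{XHTML_NS}"/>')
print(serialize_empty(plain_br))                            # <br>
print(serialize_empty(xhtml_br))                            # <br></br>
print(serialize_empty(xhtml_br, treat_xhtml_as_html=True))  # <br>
```

The first two calls correspond to the advice to drop the default namespace (a single <br>) versus keeping it (<br></br>); the third shows what the proposed special case would change.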
> > To which specification are you referring?
>
> Either <http://whatwg.org/c> or <http://www.w3.org/TR/html5>. All
> previous versions are fully obsolete.
>
> Browsers do implement this - HTML elements in HTML documents have XHTML
> namespace, not a null one.

No, HTML5 documents (not XHTML5) shouldn't use namespaces at all, default or otherwise. They're not XML, after all.

> For HTML, we need a reasonable way to output <br>, and other implementations
> (e.g. Mozilla Firefox) do what Web developers need.

Then simply drop the namespaces. They're not needed.
Nick, if you have an opinion about how HTML should work, please join the HTML working group. There are reasons why HTML works the way it does, and there is no point debating those here. This bug report is about making libxslt usable for producing HTML as it stands. Can you help with that?
I'm really trying to help you, and I pointed out that you should simply remove the default namespace (from your stylesheet and from the source document in your example). This will result in a single <br> element in the output document. If you insist on using the XHTML namespace for plain HTML documents, I simply can't help you. libxslt won't use the special HTML output rules in that case, because that's what the spec says.
So you repeated what I said in comment 7, and ignored other comments? Thank you for your help.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a long time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxslt/-/issues/ Thank you for your understanding and your help.