GNOME Bugzilla – Bug 652185
xmllint does not emit fatal error whenever encoding declaration differs from UTF-8 BOM
Last modified: 2021-07-05 13:27:18 UTC
For a UTF-8 encoded file with a BOM, if the document contains an XML encoding declaration that names anything other than "UTF-8", this is a 'fatal error' per XML 1.0. See: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897#c13
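The condition described above can be sketched as a small byte-level check. This is only an illustration of the rule as stated in this report, not how xmllint or libxml2 detects it; the function name `bom_conflicts_with_declaration` and the simplified regular expression are assumptions made for the example.

```python
import codecs
import re

# Simplified pattern for an XML declaration that carries an encoding
# pseudo-attribute. It is an approximation, not a full XML parser.
_ENCODING_RE = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def bom_conflicts_with_declaration(data: bytes) -> bool:
    """Return True if data starts with a UTF-8 BOM but declares another encoding.

    Hypothetical helper: it only looks at the first bytes of the document.
    """
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no UTF-8 BOM, so this particular conflict cannot occur
    match = _ENCODING_RE.match(data[len(codecs.BOM_UTF8):])
    if match is None:
        return False  # no encoding declaration, nothing to disagree with
    declared = match.group(1).decode("ascii").lower()
    # Encoding names are compared case-insensitively.
    return declared != "utf-8"
```

A caller could read a file with `open(path, "rb").read()` and pass the bytes to this helper to see whether the document falls into the case this bug describes.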
Created attachment 189543: XHTML doc in UTF-8 with BOM + disagreeing XML encoding declaration.
Added test case. To check, run:
$ xmllint xml9.xhtml
Successful means that xmllint emits a fatal error. Unsuccessful means that xmllint does not emit an error.
PS: Note that my proposal in bug 12897 against HTML5 is that XML 1.0 be changed so that the UTF-8 BOM overrides the XML encoding declaration without any error. I also suggest that the BOM should override the HTTP charset parameter. This would be in line with the non-normative Appendix F.2 in XML 1.0: <http://www.w3.org/TR/xml/#sec-guessing-with-ext-info> That option is also discussed in RFC 3023: <http://tools.ietf.org/html/rfc3023#section-3.2>
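For anyone without access to the attachment, a comparable input can be produced along these lines. The attachment's actual contents are not reproduced in this report, so the document body and the declared encoding ("ISO-8859-1") below are assumptions, and `build_conflicting_sample` is a hypothetical helper name.

```python
import codecs


def build_conflicting_sample(declared: str = "ISO-8859-1") -> bytes:
    """Build a small XHTML document: UTF-8 BOM plus a disagreeing declaration.

    The markup is an illustrative stand-in, not a copy of attachment 189543.
    """
    body = (
        f'<?xml version="1.0" encoding="{declared}"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>BOM test</title></head>"
        "<body><p>test</p></body></html>\n"
    )
    # The bytes really are UTF-8 and start with the UTF-8 BOM, while the
    # declaration claims a different encoding.
    return codecs.BOM_UTF8 + body.encode("utf-8")


if __name__ == "__main__":
    # Write the sample next to the script; the file name is arbitrary.
    with open("bom-conflict-sample.xhtml", "wb") as handle:
        handle.write(build_conflicting_sample())
```

Running xmllint on the resulting file should then exercise the same case as the attached test.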
I should add that I am not certain that this bug should be fixed. I now tend to think that this condition, which is currently a fatal error, should at most be an ordinary error. So the XML spec should change.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.