GNOME Bugzilla – Bug 655218
HTMLParser does not support HTML5-like <meta charset> encoding declaration
Last modified: 2012-05-10 07:39:21 UTC
From: http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element
"""
The charset attribute specifies the character encoding used by the document. This is a character encoding declaration. If the attribute is present in an XML document, its value must be an ASCII case-insensitive match for the string "UTF-8" (and the document is therefore forced to use UTF-8 as its encoding).
"""
However, while <meta http-equiv="Content-Type" content="text/html; charset=utf8"> works, <meta charset="utf8"> does not.
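To make the two declaration forms concrete, here is a minimal Python sketch of a sniffer that accepts both the http-equiv form and the HTML5-style charset attribute. It is only an illustration of the requested behaviour, not libxml2's code; the function name sniff_meta_charset and the regular expressions are assumptions made up for this example, and a real parser would tokenize the tag's attributes rather than pattern-match on bytes.

```python
import re

# Hypothetical helper for illustration only; this is not how libxml2 does it.
# Find each <meta ...> tag in the raw bytes, case-insensitively.
_META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
# Match "charset=" followed by an optionally quoted encoding label. This covers
# both charset="utf8" and content="text/html; charset=utf8".
_CHARSET = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def sniff_meta_charset(html: bytes):
    """Return the lowercased label from the first <meta> charset declaration, or None."""
    for tag in _META_TAG.finditer(html):
        match = _CHARSET.search(tag.group(0))
        if match:
            return match.group(1).decode("ascii").lower()
    return None


legacy = b'<meta http-equiv="Content-Type" content="text/html; charset=utf8">'
html5 = b'<meta charset="utf8">'
print(sniff_meta_charset(legacy), sniff_meta_charset(html5))
```

With a sniffer like this, both declarations yield the same label, which is the behaviour this report asks HTMLParser to provide.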
Created attachment 199769: Add html5 charset meta tag
The charset attribute exists in both the latest draft and the development version of the spec:
http://www.w3.org/TR/2011/WD-html5-author-20110809/the-meta-element.html#the-meta-element
http://dev.w3.org/html5/spec/Overview.html#the-meta-element
While the libxml2 HTML parser is not tuned for HTML5, this is a simple addition. I made some indentation changes, added a test case for completeness, and committed:
http://git.gnome.org/browse/libxml2/commit/?id=868d92da8915fc5dc5e329d93cc7882370a28475
Thanks for the suggestion and the patch!
Daniel