GNOME Bugzilla – Bug 687301
[PATCH] Tokenizer doesn't recognize some valid HTML attributes
Last modified: 2012-12-26 16:09:10 UTC
Created attachment 227769: Fix html attribute parsing

Doxygen rejects many characters that are valid in HTML attribute names, notably the hyphen. As a result you can't use data attributes (which always take the form data-foo="bar") in HTML. That is quite annoying, since data attributes are frequently used to pass data to JavaScript apps.

The attached patch modifies doctokenizer.l to be more liberal about which characters are allowed in attribute names. The regular expression for HTMLATTID was derived from this part of the WHATWG spec: http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#attributes-0

"Attributes have a name and a value. Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode."
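To make the quoted rule concrete, here is a minimal Python sketch of it. This is not the flex pattern from the patch, which isn't reproduced here. The function name `is_valid_attr_name` is hypothetical. "Control characters" is taken to mean U+0000 to U+001F plus U+007F to U+009F. "Characters not defined by Unicode" is approximated as code points whose general category is `Cn` (unassigned or noncharacter), as reported by the interpreter's `unicodedata` module.

```python
import re
import unicodedata

# Characters excluded by the quoted rule: NULL, the other C0 controls and the
# HTML space characters (all within U+0000..U+0020), DEL and C1 controls
# (U+007F..U+009F), plus the literal characters " ' > / =
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f-\x9f\"'>/=]")

def is_valid_attr_name(name: str) -> bool:
    """Hypothetical check: does `name` satisfy the WHATWG attribute-name rule?"""
    if not name:
        return False  # the rule requires "one or more characters"
    if _FORBIDDEN.search(name):
        return False
    # Approximate "characters that are not defined by Unicode" as category Cn.
    return all(unicodedata.category(ch) != "Cn" for ch in name)
```

With this rule, `data-foo` is accepted and `data foo` or `x=y` is rejected. So are many names the 4.01 attribute list never mentions, such as `a:b`.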
I don't mind adding the '-', but allowing even more characters will probably lead to cases where ordinary text is suddenly parsed as an attribute. Besides that, arbitrary attribute names are not part of the HTML standard. The 4.01 standard, for instance, lists only these as valid: http://www.w3.org/TR/REC-html40/index/attributes.html
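The narrower option of adding only '-' can be sketched as an identifier-like pattern. This is a hypothetical illustration, not Doxygen's actual HTMLATTID definition. The name `is_conservative_attr_name` and the exact character set (a letter or underscore first, then letters, digits, underscores and hyphens) are assumptions chosen only to show the trade-off.

```python
import re

# Hypothetical conservative attribute-name pattern (NOT Doxygen's real rule):
# an identifier-like name that additionally permits '-' after the first
# character, which is enough for data-* attributes.
CONSERVATIVE_ATTR = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

def is_conservative_attr_name(name: str) -> bool:
    """Return True only if the whole of `name` matches the pattern."""
    return CONSERVATIVE_ATTR.fullmatch(name) is not None
```

Here `class`, `data-foo` and `data-user-id` match, while `a.b`, `x?` and `-lead` do not. Punctuation that commonly appears in prose therefore has less chance of being mistaken for part of an attribute name.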
Changed version 'latest' to '1.8.2-SVN' so that I can remove 'latest' as an option, since it is a moving target.
This bug was previously marked ASSIGNED, which means it should be fixed in doxygen version 1.8.3. Please verify if this is indeed the case. Reopen the bug if you think it is not fixed, and please include any additional information that you think may be relevant.