Bug 687301 - [PATCH] Tokenizer doesn't recognize some valid HTML attributes
Status: RESOLVED FIXED
Product: doxygen
Classification: Other
Component: general
Version: 1.8.2-SVN
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Dimitri van Heesch
QA Contact: Dimitri van Heesch
Depends on:
Blocks:
 
 
Reported: 2012-11-01 01:02 UTC by mason malone
Modified: 2012-12-26 16:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix html attribute parsing (959 bytes, patch)
2012-11-01 01:02 UTC, mason malone

Description mason malone 2012-11-01 01:02:20 UTC
Created attachment 227769
Fix html attribute parsing

Many characters that are valid in HTML attribute names are rejected by Doxygen, notably the hyphen. This means you can't use data attributes (which always take the form data-foo="bar") in HTML, and that can be pretty annoying since data attributes are frequently used to pass data to JavaScript apps.

The attached patch modifies doctokenizer.l to be more liberal in what characters are allowed in attribute names. The regular expression for HTMLATTID was derived from this: 
http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#attributes-0
"Attributes have a name and a value. Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode."
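
As an illustration of the rule quoted above, here is a minimal Python sketch of a name check. It is not the HTMLATTID expression from the attached patch, which is written for flex in doctokenizer.l; the function name is_valid_attr_name is made up for this example. "Control characters" and "characters not defined by Unicode" are approximated with the Unicode general categories Cc and Cn, so the result for unassigned code points depends on the Unicode database shipped with your Python.

```python
import unicodedata

# Characters the quoted WHATWG text excludes explicitly: the HTML space
# characters, NULL, the two quote characters, '>', '/', and '='.
FORBIDDEN = set(" \t\n\f\r\x00\"'>/=")


def is_valid_attr_name(name: str) -> bool:
    """Return True if `name` satisfies the quoted attribute-name rule.

    Assumption: control characters are category Cc and "not defined by
    Unicode" is category Cn in Python's unicodedata module.
    """
    if not name:
        return False  # the rule requires one or more characters
    for ch in name:
        if ch in FORBIDDEN:
            return False
        if unicodedata.category(ch) in ("Cc", "Cn"):
            return False
    return True


if __name__ == "__main__":
    for candidate in ["data-foo", "href", "a=b", "on>click"]:
        print(candidate, is_valid_attr_name(candidate))
```

Under this check, data-foo and href are accepted, while a=b and on>click are rejected.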
Comment 1 Dimitri van Heesch 2012-11-17 10:06:09 UTC
I don't mind adding the '-', but allowing even more characters will probably lead to cases where text will suddenly be parsed as an attribute.
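
To make the trade-off concrete, here is a small Python sketch comparing two hypothetical patterns. Neither is taken from doctokenizer.l: CONSERVATIVE assumes "an ASCII letter followed by letters, digits, or hyphens", and LIBERAL approximates the WHATWG wording by excluding only whitespace, control characters, quotes, '>', '/', and '='. It only shows which strings each pattern would accept as a name, not how Doxygen actually tokenizes them.

```python
import re

# Hypothetical conservative rule: letters, digits, and '-' only.
CONSERVATIVE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Hypothetical liberal rule, approximating the WHATWG text.
LIBERAL = re.compile(r"""[^\s\x00-\x1f\x7f-\x9f"'>/=]+""")


def accepted_by(name: str) -> dict:
    """Report which of the two sketch patterns accept `name` in full."""
    return {
        "conservative": CONSERVATIVE.fullmatch(name) is not None,
        "liberal": LIBERAL.fullmatch(name) is not None,
    }


if __name__ == "__main__":
    # 'data-foo' is the case from the report; the others look like
    # fragments of ordinary prose or expressions.
    for sample in ["data-foo", "x<y", "(see", "a,b"]:
        print(repr(sample), accepted_by(sample))
```

Both patterns accept data-foo, but only the liberal one also accepts fragments such as x<y or (see, which is the kind of over-matching this comment is concerned about.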

Besides that, using arbitrary names for attributes is not part of the HTML standard. For instance, the HTML 4.01 standard lists only these as valid:
http://www.w3.org/TR/REC-html40/index/attributes.html
Comment 2 Dimitri van Heesch 2012-11-18 11:07:25 UTC
Changed version 'latest' to '1.8.2-SVN' so I can remove 'latest' as an option as it is a moving target.
Comment 3 Dimitri van Heesch 2012-12-26 16:09:10 UTC
This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.3. Please verify if this is indeed the case. Reopen the
bug if you think it is not fixed and please include any additional information
that you think can be relevant.