Bug 675373 - Incorrect Name and NCName validation for non-ASCII characters (with fix)
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Mac OS
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2012-05-03 14:33 UTC by A Developer
Modified: 2021-07-05 13:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Test file with ASCII IDs that pass validation (821 bytes, application/octet-stream)
2012-05-03 20:55 UTC, A Developer
Test file with Japanese IDs that fail validation (821 bytes, application/octet-stream)
2012-05-03 21:00 UTC, A Developer
Test file with ASCII IDs that pass validation (800 bytes, application/octet-stream)
2012-05-03 21:01 UTC, A Developer
Modified parserInternals.h (18.15 KB, application/octet-stream)
2012-05-03 21:05 UTC, A Developer
Modified tree.c (249.03 KB, application/octet-stream)
2012-05-03 21:06 UTC, A Developer

Description A Developer 2012-05-03 14:33:39 UTC
The validation functions in the 'Check Name, NCName and QName strings' section of 'tree.c' do not seem to conform to the W3C XML 1.0 (Fifth Edition) and Namespaces in XML 1.0 (Third Edition) specifications. As a result, xmllint reports false errors for atomic types when IDs and Names contain non-ASCII characters, such as Japanese ideographs.

It seems that the validation code following the 'try_complex:' labels in these functions is still based on the orphaned character-class definitions from earlier editions of XML 1.0. My suggestion is to add some new macros to 'parserInternals.h' that cover the current definitions, and then modify the 'try_complex:' sub-sections to use them, e.g.

/**
 * IS_NAMESTARTCHAR:
 * @c: a Unicode code point (int), e.g. as returned by CUR_SCHAR
 *
 * [4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
 *
 */

#define IS_NAMESTARTCHAR(c) (\
    (0x3a == (c)) ||\
    ((0x41 <= (c)) && ((c) <= 0x5a)) ||\
    (0x5f == (c)) ||\
    ((0x61 <= (c)) && ((c) <= 0x7a)) ||\
    ((0xc0 <= (c)) && ((c) <= 0xd6)) ||\
    ((0xd8 <= (c)) && ((c) <= 0xf6)) ||\
    ((0xf8 <= (c)) && ((c) <= 0x2ff))||\
    ((0x370 <= (c)) && ((c) <= 0x37d)) ||\
    ((0x37f <= (c)) && ((c) <= 0x1fff)) ||\
    ((0x200c <= (c)) && ((c) <= 0x200d)) ||\
    ((0x2070 <= (c)) && ((c) <= 0x218f)) ||\
    ((0x2c00 <= (c)) && ((c) <= 0x2fef)) ||\
    ((0x3001 <= (c)) && ((c) <= 0xd7ff)) ||\
    ((0xf900 <= (c)) && ((c) <= 0xfdcf)) ||\
    ((0xfdf0 <= (c)) && ((c) <= 0xfffd)) ||\
    ((0x10000<= (c)) && ((c) <= 0xeffff))\
)

/**
 * IS_NAMECHAR:
 * @c: a Unicode code point (int), e.g. as returned by CUR_SCHAR
 *
 * [4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
 *
 */
    
#define IS_NAMECHAR(c)	(\
    (0x2d == (c)) ||\
    (0x3a == (c)) || /* ':' is a NameStartChar, so it is also a NameChar */\
    (0x2e == (c)) ||\
    ((0x30 <= (c)) && ((c) <= 0x39)) ||\
    ((0x41 <= (c)) && ((c) <= 0x5a)) ||\
    (0x5f == (c)) ||\
    ((0x61 <= (c)) && ((c) <= 0x7a)) ||\
    (0xb7 == (c)) ||\
    ((0xc0 <= (c)) && ((c) <= 0xd6)) ||\
    ((0xd8 <= (c)) && ((c) <= 0xf6)) ||\
    ((0xf8 <= (c)) && ((c) <= 0x2ff))||\
    ((0x300 <= (c)) && ((c) <= 0x36f)) ||\
    ((0x370 <= (c)) && ((c) <= 0x37d)) ||\
    ((0x37f <= (c)) && ((c) <= 0x1fff)) ||\
    ((0x200c <= (c)) && ((c) <= 0x200d)) ||\
    ((0x203f <= (c)) && ((c) <= 0x2040)) ||\
    ((0x2070 <= (c)) && ((c) <= 0x218f)) ||\
    ((0x2c00 <= (c)) && ((c) <= 0x2fef)) ||\
    ((0x3001 <= (c)) && ((c) <= 0xd7ff)) ||\
    ((0xf900 <= (c)) && ((c) <= 0xfdcf)) ||\
    ((0xfdf0 <= (c)) && ((c) <= 0xfffd)) ||\
    ((0x10000<= (c)) && ((c) <= 0xeffff))\
)


and then (in, for example, xmlValidateName)...

try_complex:
    /*
     * Second check for chars outside the ASCII range
     */
    cur = value;
    c = CUR_SCHAR(cur, l);
    if (space) {
        while (IS_BLANK(c)) {
            cur += l;
            c = CUR_SCHAR(cur, l);
        }
    }
    if (!IS_NAMESTARTCHAR(c))
        return(1);
    cur += l;
    c = CUR_SCHAR(cur, l);
    while (IS_NAMECHAR(c) || (c == ':')) {
        cur += l;
        c = CUR_SCHAR(cur, l);
    }

    
//    if ((!IS_LETTER(c)) && (c != '_') && (c != ':'))
//        return(1);
//    cur += l;
//    c = CUR_SCHAR(cur, l);    
//    while (IS_LETTER(c) || IS_DIGIT(c) || (c == '.') || (c == ':') ||
//	   (c == '-') || (c == '_') || IS_COMBINING(c) || IS_EXTENDER(c)) {
//	cur += l;
//	c = CUR_SCHAR(cur, l);
//    }
    if (space) {
        while (IS_BLANK(c)) {
            cur += l;
            c = CUR_SCHAR(cur, l);
        }
    }
    if (c != 0)
        return(1);
    return(0);
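
For anyone who wants to try the character ranges without building libxml2, here is a minimal standalone sketch of the same check. It is not libxml2 code: cp_range, in_ranges, decode_utf8, validate_name_utf8 and name_validation_self_check are hypothetical names made up for this illustration, the UTF-8 decoder is deliberately simplistic (it does not reject overlong forms or surrogates), and the optional leading/trailing blank handling of the 'space' flag is left out. It follows the Name production, so ':' is accepted; an NCName check would have to exclude it.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of Unicode code points (hypothetical helper type). */
typedef struct { unsigned int lo, hi; } cp_range;

/* XML 1.0 (Fifth Edition), production [4] NameStartChar. */
static const cp_range name_start_ranges[] = {
    {0x3A, 0x3A}, {0x41, 0x5A}, {0x5F, 0x5F}, {0x61, 0x7A},
    {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
    {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
    {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
    {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
};

/* Characters that production [4a] NameChar adds to NameStartChar. */
static const cp_range name_extra_ranges[] = {
    {0x2D, 0x2E}, {0x30, 0x39}, {0xB7, 0xB7},
    {0x300, 0x36F}, {0x203F, 0x2040}
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Linear scan: returns 1 if c lies in any of the n ranges, else 0. */
static int in_ranges(unsigned int c, const cp_range *r, size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        if (c >= r[i].lo && c <= r[i].hi)
            return 1;
    return 0;
}

static int is_name_start_char(unsigned int c) {
    return in_ranges(c, name_start_ranges, COUNT(name_start_ranges));
}

static int is_name_char(unsigned int c) {
    return is_name_start_char(c) ||
           in_ranges(c, name_extra_ranges, COUNT(name_extra_ranges));
}

/*
 * Simplistic UTF-8 decoder (assumption: input is NUL-terminated).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 on a malformed sequence. A NUL byte is never a valid continuation
 * byte, so the loop cannot read past the terminator.
 */
static int decode_utf8(const unsigned char *s, unsigned int *cp) {
    int len, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Returns 0 if value is a valid Name and 1 otherwise, using the same
 * return convention as the snippet above. */
int validate_name_utf8(const char *value) {
    const unsigned char *cur = (const unsigned char *)value;
    unsigned int c;
    int l = decode_utf8(cur, &c);
    if (l == 0 || !is_name_start_char(c))
        return 1;
    cur += l;
    while (*cur != 0) {
        l = decode_utf8(cur, &c);
        if (l == 0 || !is_name_char(c))
            return 1;
        cur += l;
    }
    return 0;
}

/* Usage: the bytes below are U+65E5 U+672C, two Japanese ideographs. */
void name_validation_self_check(void) {
    assert(validate_name_utf8("id1") == 0);
    assert(validate_name_utf8("\xE6\x97\xA5\xE6\x9C\xAC") == 0);
    assert(validate_name_utf8("1abc") == 1);  /* digit cannot start a Name */
}
```

Note that the range checks only make sense on decoded code points, not on individual UTF-8 bytes, which is why the sketch decodes each character before classifying it (the role CUR_SCHAR plays in the snippet above).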
Comment 1 A Developer 2012-05-03 20:55:44 UTC
Created attachment 213409 [details]
Test file with ASCII IDs that pass validation

Test against schema:
    http://www.collada.org/2005/11/COLLADASchema.xsd

Note that this test file does produce one validation error for a missing source element, which can be ignored. Only the IDs are being tested here.
Comment 2 A Developer 2012-05-03 21:00:33 UTC
Created attachment 213411 [details]
Test file with Japanese IDs that fail validation 

Test against schema:
    http://www.collada.org/2005/11/COLLADASchema.xsd

Note that this test file does produce one validation error for a missing source element, which can be ignored. Only the IDs are being tested here.

This file simply substitutes some valid UTF-8 Japanese characters for the ASCII IDs of file test1.dae.
Comment 3 A Developer 2012-05-03 21:01:23 UTC
Created attachment 213412 [details]
Test file with ASCII IDs that pass validation 

Test against schema:
    http://www.collada.org/2005/11/COLLADASchema.xsd

Note that this test file does produce one validation error for a missing source element, which can be ignored. Only the IDs are being tested here.
Comment 4 A Developer 2012-05-03 21:05:06 UTC
Created attachment 213413 [details]
Modified parserInternals.h

Added macros used by validation fixes for non-ASCII IDs and names
Comment 5 A Developer 2012-05-03 21:06:49 UTC
Created attachment 213414 [details]
Modified tree.c

Made changes to some (but not all) validation code for non-ASCII characters in names and IDs, using new macros from parserInternals.h
Comment 6 GNOME Infrastructure Team 2021-07-05 13:26:43 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.