Bug 625850 - Not able to decode special characters during parsing
Status: RESOLVED INVALID
Product: libxml2
Classification: Platform
Component: general
Version: 2.7.3
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Reported: 2010-08-02 16:17 UTC by Avijeet Gupta
Modified: 2021-06-18 16:03 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Avijeet Gupta 2010-08-02 16:17:47 UTC
Hi,

This is the sample document I am trying to parse:

<?xml version="1.0" encoding="iso-8859-1"?>
<statuslineupdatereq>
    <action>notify_display</action>
    <statustext>^^$</statustext>
    <priority>1</priority>
</statuslineupdatereq>

It looks like the parser does not handle the element <statustext>^^$</statustext> properly. It returns only '$', but I need the whole content of the element, which is '^^$'.

I have another case where the content of the statustext element is '^^%' and the parser returns only '%'. I have spent a lot of time trying to figure out what is going on here, but without success.

I am doing the following to get the element content:
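/* code snippet */
    .................
    .................
    for (cur_node = node; cur_node; cur_node = cur_node->next) {
      if (cur_node->type == XML_ELEMENT_NODE) {
        /* xmlStrcmp() returns 0 when the names match */
        if (!xmlStrcmp(cur_node->name, (const xmlChar *) "statustext")) {
          /* Concatenate the element's text children; 1 = substitute entities.
             The returned string must be released with xmlFree(). */
          data = xmlNodeListGetString(doc, cur_node->xmlChildrenNode, 1);
        }
      }
      parseXMLNode(.......);
    }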

Also, in this case I get a crash when I call xmlCtxtReadMemory() without recover mode. It looks like the parser crashes when it encounters the '^' character. In recover mode there is no crash, but the parser ignores the '^' characters and returns only the '$' or '%' to me.

I am not sure if the problem lies in the parser or at my end. Any help/clue would be really appreciated. 
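
One way to narrow down which side the problem is on, sketched here under the assumption that the document reaches the parser as a plain memory buffer, is to inspect the bytes just before they are handed to xmlCtxtReadMemory(). The helpers count_byte and dump_hex are hypothetical names and use only the C standard library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count occurrences of `byte` in the first `len`
   bytes of `buf`. */
static size_t count_byte(const char *buf, size_t len, char byte)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == byte)
            n++;
    }
    return n;
}

/* Hypothetical helper: print every byte as two hex digits, 16 per line,
   so that stripped or unexpected characters become visible. */
static void dump_hex(FILE *out, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        fprintf(out, "%02x%c", (unsigned char) buf[i],
                (i + 1) % 16 == 0 ? '\n' : ' ');
    fputc('\n', out);
}
```

For the sample document, count_byte(buf, len, '^') should be 2 right before the parse call. If it is already 0 there, the '^' characters (0x5e in the dump) were dropped before the buffer ever reached libxml2.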

Thanks,
Avijeet
Comment 1 André Klapper 2021-06-18 16:03:20 UTC
Please ask support questions about code development in support forums. Thanks!