After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 304020 - better parsing error handling in domparser
better parsing error handling in domparser
Status: RESOLVED FIXED
Product: libxml++
Classification: Bindings
Component: DOM Parser
2.9.x
Other Linux
: Normal enhancement
: ---
Assigned To: Christophe de Vienne
Christophe de Vienne
Depends on:
Blocks:
 
 
Reported: 2005-05-13 09:16 UTC by virgile devaux
Modified: 2012-02-15 09:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
targzipped file containing the two diffs beetween domparser.cc and domparser.h and the versions i use (prefixed with new_) (1.32 KB, patch)
2005-05-17 11:36 UTC, virgile devaux
none Details | Review
a tar bzipped file of patches for exception and parser_error classes (935 bytes, patch)
2005-06-13 12:47 UTC, virgile devaux
none Details | Review
patch: Parser: Throw more detailed error messages. (23.70 KB, patch)
2012-01-04 16:21 UTC, Kjell Ahlstedt
none Details | Review

Description virgile devaux 2005-05-13 09:16:16 UTC
Version details: libxml++-2.9.2
Distribution/Version: debian

to obtain a message more precise than "Document not
well formed." if a parsing error occured in DomParser i
slightly modified the parse_context function of the
pbject and add a fonction to format an error message
given a xmlErrorPtr.
Modified parse_context function:
void DomParser::parse_context()
{
  KeepBlanks k(KeepBlanks::Default);

  //The following is based on the implementation of xmlParseFile(), in
xmlSAXParseFileWithData():
  //and the implementation of xmlParseMemory(), in xmlSaxParseMemoryWithData().
  initialize_context();

  xmlParseDocument(context_);

  check_for_exception();

  if((!context_->wellFormed)||(context_->errNo != 0)) {
      std::string errStr="Document not well-formed.";
      xmlErrorPtr	tmpXmlErrorPtr=xmlCtxtGetLastError(context_);
      errStr+=" "+xml_format_error(tmpXmlErrorPtr);
      xmlpp::Parser::release_underlying();
      throw xmlpp::parse_error(errStr);
  }
  
  doc_ = new Document(context_->myDoc);
  // This is to indicate to release_underlying that we took the
  // ownership on the doc.
  context_->myDoc = NULL;

  //Free the parse context, but keep the document alive so people can navigate
the DOM tree:
  //TODO: Why not keep the context alive too?
  Parser::release_underlying();

  check_for_exception();
}


added xml_format_error:
std::string DomParser::xml_format_error(xmlErrorPtr err) {
    if(err==NULL) return "";
    std::ostringstream returned_error_string;
    switch(err->domain) {
        case XML_FROM_NONE: returned_error_string << "XML_FROM_NONE";break;
        case XML_FROM_PARSER: returned_error_string << "XML_FROM_PARSER";break;
        case XML_FROM_TREE: returned_error_string << "XML_FROM_TREE";break;
        case XML_FROM_NAMESPACE: returned_error_string <<
"XML_FROM_NAMESPACE";break;
        case XML_FROM_DTD: returned_error_string << "XML_FROM_DTD";break;
        case XML_FROM_HTML: returned_error_string << "XML_FROM_HTML";break;
        case XML_FROM_MEMORY: returned_error_string << "XML_FROM_MEMORY";break;
        case XML_FROM_OUTPUT: returned_error_string << "XML_FROM_OUTPUT";break;
        case XML_FROM_IO: returned_error_string << "XML_FROM_IO";break;
        case XML_FROM_FTP: returned_error_string << "XML_FROM_FTP";break;
        case XML_FROM_HTTP: returned_error_string << "XML_FROM_HTTP";break;
        case XML_FROM_XINCLUDE: returned_error_string << "XML_FROM_XINCLUDE";break;
        case XML_FROM_XPATH: returned_error_string << "XML_FROM_XPATH";break;
        case XML_FROM_XPOINTER: returned_error_string << "XML_FROM_XPOINTER";break;
        case XML_FROM_REGEXP: returned_error_string << "XML_FROM_REGEXP";break;
        case XML_FROM_DATATYPE: returned_error_string << "XML_FROM_DATATYPE";break;
        case XML_FROM_SCHEMASP: returned_error_string << "XML_FROM_SCHEMASP";break;
        case XML_FROM_SCHEMASV: returned_error_string << "XML_FROM_SCHEMASV";break;
        case XML_FROM_RELAXNGP: returned_error_string << "XML_FROM_RELAXNGP";break;
        case XML_FROM_RELAXNGV: returned_error_string << "XML_FROM_RELAXNGV";break;
        case XML_FROM_CATALOG: returned_error_string << "XML_FROM_CATALOG";break;
        case XML_FROM_C14N: returned_error_string << "XML_FROM_C14N";break;
        case XML_FROM_XSLT: returned_error_string << "XML_FROM_XSLT";break;
        case XML_FROM_VALID: returned_error_string << "XML_FROM_VALID";break;
        case XML_FROM_CHECK: returned_error_string << "XML_FROM_CHECK";break;
        case XML_FROM_WRITER: returned_error_string << "XML_FROM_WRITER";break;
    }
    switch (err->level) {
        case XML_ERR_NONE:
            returned_error_string << "(): ";
            break;
        case XML_ERR_WARNING:
            returned_error_string << "(warning): ";
            break;
        case XML_ERR_ERROR:
            returned_error_string << "(error): ";
            break;
        case XML_ERR_FATAL:
            returned_error_string << "(fatal): ";
            break;
    }
    if(err->message!=NULL) returned_error_string  << err->message;
    else returned_error_string  << "unknown error\n";
    if(err->file!=NULL) {
        returned_error_string << " in file " << err->file ;
    } else {
        returned_error_string << " in Entity";
    }
    returned_error_string << " on line " << err->line;
    return returned_error_string.str();
}

in domparser.h:
...
protected:
  virtual std::string xml_format_error(xmlErrorPtr err);
...
Comment 1 Murray Cumming 2005-05-15 11:45:23 UTC
Could we have an actual patch please? That is much easier to review.
Comment 2 virgile devaux 2005-05-17 11:36:18 UTC
Created attachment 46538 [details] [review]
targzipped file containing the two diffs beetween domparser.cc and domparser.h and the versions i use (prefixed with new_)
Comment 3 Christophe de Vienne 2005-06-13 10:36:39 UTC
The patch looks ok, minus these 2 points : 
 
- The domparser.cc patch does not apply properly on cvs head version. 
- The domparser.h patch make domparser.h include a libxml header. This is not 
good : none of the libxml++ header does include a libxml header. The things we 
do when we need a pointer on libxml type in the header is writting a forward 
declaration of the needed structure. 
 
Otherwise it looks good. 
 
Comment 4 virgile devaux 2005-06-13 10:57:59 UTC
ok, could you give me some more hint about the "forward declaration" stuff?
I download the cvs head version now and make a new patch for it.
Comment 5 Christophe de Vienne 2005-06-13 11:02:22 UTC
for example, in parser.h : 
     25 extern "C" { 
     26   struct _xmlParserCtxt; 
     27 } 
... 
     99   _xmlParserCtxt* context_; 
 
instead of using directly xmlParserCtxtPtr. 
Comment 6 Murray Cumming 2005-06-13 11:06:39 UTC
It should also use Glib::ustring instead of std::string and the white space
should be cleaned up to match the existing code.

I'd also prefer to see human-readable errors in libxml rather than libxml++, if
possible.
Comment 7 Christophe de Vienne 2005-06-13 11:23:47 UTC
Having a closer look to the xmlError api 
(http://www.xmlsoft.org/html/libxml-xmlerror.html), I think there is something 
we can do to extend the error-reporting to the whole library. 
xmlpp::exception, or one of it's children, should take a xmlErrorPtr as 
parameter, and do the error message formatting in what(). This would put the 
error formatting code in a better place than domparser. 
This exception, in it's constructor/destructor, would use xmlCopyError and 
xmlResetError, so there is no memory handling to do with the result of 
getlasterror. 
 
Thoughts ? 
 
Comment 8 virgile devaux 2005-06-13 11:48:15 UTC
yes, this would be the better way, as this error handling would be available for
all the parsers. But doesn't it means that every call to throw exception will
have to be changed?
Comment 9 Christophe de Vienne 2005-06-13 11:55:45 UTC
Sooner or later yes. But as long as it's not breaking the ABI, it's not a 
problem (I count 45 occurences of the token "throw" in the source code). 
 
Comment 10 virgile devaux 2005-06-13 12:47:38 UTC
Created attachment 47710 [details] [review]
a tar bzipped file of patches for exception and parser_error classes

I modify the exception and parse_error file following your idea; i cant test it
now but it compiles. can you tell me if it'ok for you
Comment 11 Christophe de Vienne 2005-06-13 13:16:04 UTC
I don't have time for a good review right now. 
 
However, 
- I see in exception.cc the string "Document not well-formed.". This is 
probably incorrect in most uses of xmlpp::exception. 
- The exception contructor takes a _xmlParserCtxt*. Again, if the goal is to 
make it generic, we should be able to use it on any error, not only parse 
errors. This means it should take a _xmlError*. 
  
Last thing. If we are going to make all the error management a better wrapper 
around libxml2 error handling, I think a free function like 
"wrap_libxmlpp_error(_xmlError const *)" should be usefull. It would, 
considering the content of the error, choose the right exception, instanciate 
it and throw it. 
 
Thoughts ? 
 
 
Comment 12 Murray Cumming 2005-07-09 18:52:57 UTC
I'd rather not add something that should be in libxml without even trying to add
it to libxml. We'd just have to retrofit it again later.
Comment 13 Christophe de Vienne 2005-07-10 14:32:51 UTC
libxml has a pretty good error handling system. The idea is just to make a
better wrapping of it. So I don't really see what you mean by adding something
that should be in libxml.
Comment 14 Murray Cumming 2005-07-11 11:03:24 UTC
Oh, I see, this seems to be mostly about using xmlCtxtGetLastError(), which we
do not use yet. However, this needs to be called at the time the expection is
thrown,  not some time later when the exception is examined.

And this would be much easier to understand as a regular cvs patch. Please
resubmit it using
  cvs diff -up > something.patch
Comment 15 Christophe de Vienne 2005-07-11 11:46:20 UTC
Exactly. 
As said in comment #11, I'd except a function like 
wrap_libxmlpp_error(_xmlError const *), or a throw_ctxt_last_error() to throw 
the correct libxml++ exception. 
 
Comment 16 Murray Cumming 2008-08-02 16:16:11 UTC
Christophe, do you feel like updating your patch with this improvement?
Comment 17 Murray Cumming 2008-12-18 16:18:14 UTC
Christophe?
Comment 18 Christophe de Vienne 2008-12-18 16:31:51 UTC
(In reply to comment #17)
> Christophe?
> 

Hi Murray,

The patch is not mine in the first place, and to be franc I have not touched libxml++ nor a c++ line in the past 3 years.

Moreover I would hardly find some time to do this.

So I do not think I am the right person to do this (and although I am officially still maintainer of libxml++ you have been doing the job since I announce my will to resign from this role - a big thank for that by the way).

Best regards,

Christophe
Comment 19 Murray Cumming 2010-06-14 07:45:28 UTC
I'd still really like someone to provide a patch.
Comment 20 Kjell Ahlstedt 2012-01-04 16:21:18 UTC
Created attachment 204589 [details] [review]
patch: Parser: Throw more detailed error messages.

Here's a fairly big patch.

It turned out to be a bigger job than I thought at first (as often happens
with software jobs).

Formatting the last error after parsing is certainly an improvement, but it's
still not nearly as good as it can be.

libxml contains quite an advanced error handling. It can report all errors, if
there are several errors in a parsed file. Certainly you want to see all errors
at once, not just the last one. The problem with libxml's error reports is that
everything is written to stderr. That's not bad for a program that is started
from a command-line, but if it's started from a GUI, the user will probably not
see stderr.

Messages from validation are caught by callback functions
Parser::callback_validity_error() and callback_validity_warning(). I have added
similar callback functions to catch parser messages. All these callbacks get
more information than callback_validity_*() have previously saved. I improved
that a bit. Unfortunately it's not possible to take advantage of the functions
in libxml that format the messages. E.g. xmlReportError() is a local (static)
function in error.c. Formatting is not clearly separated from printing. (Should
be in separate functions.) And some of the contents of struct _xmlError are
poorly documented.

The result is that the callbacks in xmlpp::Parser save messages that are not
quite as good as what libxml can write to stderr, but it's still a great
improvement from just "Document not well formed."

--- An example of output from libxml++/examples/dom_parser:

Exception caught: 
Parser error:
File example_invalid2.xml, line 12, column 33 (fatal):
EntityRef: expecting ';'

Validity error:
File example_invalid2.xml, line 10, column 5 (error):
Element examplechild content does not follow the DTD, expecting (child_of_child)+, got (CDATA child_of_child )

--- If libxml writes to stderr, the result is:

example_invalid2.xml:10: element examplechild: validity error : Element examplechild content does not follow the DTD, expecting (child_of_child)+, got (CDATA child_of_child )
  </examplechild>
                 ^
example_invalid2.xml:12: parser error : EntityRef: expecting ';'
       Some content. &wwwmurrayc
                                ^
Exception caught: Document not well-formed.
File example_invalid2.xml, line 12, column 33 (fatal):
EntityRef: expecting ';'
Comment 21 Murray Cumming 2012-01-30 10:09:18 UTC
OK. Many thanks. Please push and let's see how it feels.
Comment 23 Kjell Ahlstedt 2012-02-15 09:54:33 UTC
Forgot to mark new public functions with @newin{2,36}. Fixed with this commit
http://git.gnome.org/browse/libxml++/commit/?id=c32f340e9c78c4cf795df12c67432c3c77cedc15