GNOME Bugzilla – Bug 143739
incorrect line numbers in well-formedness error message
Last modified: 2009-08-15 18:40:50 UTC
I use libxml2 as part of an error checking program for the Open Directory Project data dumps. These are large files: 1GB+, containing millions of lines of XML. Libxml2 v2.5 handled well-formedness errors by printing out an error message that indicated the line on which the error occurred, like so:

    Well-formedness Error [line 29664810]: Char 0xD96B out of allowed range

After upgrading to libxml2 2.6.9, the line number always shows the value 65535 as the location of errors, like so:

    Well-formedness Error [line 65535]: Char 0xD96B out of allowed range

To reproduce, take a large XML file known to contain errors, such as the Open Directory Project data dumps ( http://rdf.dmoz.org/rdf/ ) and process it like so:

    xmlTextReaderPtr reader;
    int ret = 1;

    reader = xmlNewTextReaderFilename(filename);
    if (reader != NULL) {
        /* wfErrorFunc is my error callback (not shown here) */
        xmlTextReaderSetErrorHandler(reader, wfErrorFunc, NULL);
        while (ret == 1) {
            ret = xmlTextReaderRead(reader);
        }
        xmlFreeTextReader(reader);
    }
The line number is now a field in the node structure; it used to be stored in a shared area. To limit the space each node takes, line numbers got restricted to a short integer:

    include/libxml/tree.h line 456
    unsigned short line; /* line number */

It wasn't clear that reporting the line number for very, very large files would be useful anyway: editing such files manually is hazardous, they are usually generated, and it's simpler to fix the problem in the generator than to try to post-process the instances. I'm afraid that fixing this won't be possible. This change was made in 2.6.0 and discussed on the list at the time, I guess, and trying to increase the field now would lead to ABI breakage.

Daniel
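To illustrate why every error past line 65535 ends up reported at the same line, here is a minimal sketch of storing a large line counter in a 16-bit field. The helper name `node_line_from_parser_line` and the constant `NODE_LINE_MAX` are hypothetical and not part of libxml2; the sketch assumes the stored value saturates at the maximum rather than wrapping around, which matches the constant 65535 seen in the report.

```c
#include <assert.h>

/* Hypothetical constant: largest value a 16-bit line field can hold. */
#define NODE_LINE_MAX 65535L

/* Hypothetical helper, not a libxml2 function: squeezes a parser's
 * running line counter into an unsigned short, saturating at
 * NODE_LINE_MAX when the counter no longer fits. */
static unsigned short node_line_from_parser_line(long parser_line)
{
    assert(parser_line >= 0);           /* line counters are never negative */
    if (parser_line >= NODE_LINE_MAX)
        return (unsigned short)NODE_LINE_MAX;
    return (unsigned short)parser_line;
}
```

With this scheme, line 29664810 from the original report and any other line beyond 65535 map to the same stored value, so the node alone cannot tell them apart.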
Discussed on the mailing list (http://mail.gnome.org/archives/xml/2004-July/msg00009.html et seq), found to be due to the workings of xmlTextReaderLocatorLineNumber() and fixed with a small user routine to directly fetch the line number instead.
This should be fixed in release libxml2-2.6.12. thanks, Daniel
I'm using 2.6.27 and still seeing this. (Library packaged by Ubuntu, 2.6.27.dfsg-1ubuntu3). I think this should be WONTFIX rather than FIXED.
I am still seeing this bug in 2.6.30.dfsg-2 on Debian unstable. To me, as to others reporting here, the correct line number of the problem is an important feature.
Simple patch to resolve problem can be found in bug 325533.
The reasons why the patch suggested in comment #6 won't be applied have been explained in that bug. For the reader API, since the reader keeps a parser context which has an input line count in 32 bits, it should be possible to get the full line number, as William Brack apparently fixed, cf. comment #2. This was discussed on the mailing list too, not that long ago.

Daniel
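The idea of taking the line from the parser context rather than from the node can be sketched as follows. All types and names here (`mock_input`, `mock_node`, `mock_reader`, `mock_reader_error_line`) are hypothetical stand-ins and do not reflect libxml2's actual structures. The sketch only assumes what the comment states: the parser input carries a full-width line count, while the node carries a 16-bit one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT libxml2 types. */
struct mock_input { int line; };             /* parser's running line count */
struct mock_node  { unsigned short line; };  /* per-node line, 16 bits only */
struct mock_reader {
    const struct mock_input *input;  /* current parser input, may be NULL */
    const struct mock_node  *node;   /* current node, may be NULL */
};

/* Prefer the parser's own counter, which is not clamped to 16 bits;
 * fall back to the node's field, and return -1 if neither is known. */
static long mock_reader_error_line(const struct mock_reader *r)
{
    assert(r != NULL);
    if (r->input != NULL)
        return r->input->line;
    if (r->node != NULL)
        return r->node->line;
    return -1;
}
```

In real code, as far as I know, later libxml2 releases expose the parser's line count to reader users through xmlTextReaderGetParserLineNumber(); check the xmlreader.h shipped with your version before relying on it.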