Bug 143739 - incorrect line numbers in well-formedness error message
Status: VERIFIED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.9
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Reported: 2004-06-04 20:19 UTC by Steve Rainwater
Modified: 2009-08-15 18:40 UTC

Description Steve Rainwater 2004-06-04 20:19:03 UTC
I use libxml2 as part of an error checking program for the Open Directory
Project data dumps. These are large files, 1GB+, containing millions of lines of
XML. Libxml2 v2.5 handled well-formedness errors by printing out an error
message that indicated the line on which the error occurred, like so:

 Well-formedness Error [line 29664810]: Char 0xD96B out of allowed range

After upgrading to libxml2 2.6.9, the line number always shows the value 65535
as the location of errors, like so:

 Well-formedness Error [line 65535]: Char 0xD96B out of allowed range

To reproduce, take a large XML file known to contain errors, such as the Open
Directory Project data dumps ( http://rdf.dmoz.org/rdf/ ) and process them like so:

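#include <libxml/xmlreader.h>

/* filename: path to the XML file; wfErrorFunc: error callback, not shown */
xmlTextReaderPtr reader;
int ret = 1;

reader = xmlNewTextReaderFilename(filename);
if (reader != NULL) {
  /* Route well-formedness errors to the callback. */
  xmlTextReaderSetErrorHandler(reader, wfErrorFunc, NULL);
  /* xmlTextReaderRead() returns 1 while nodes remain, 0 at end of input, -1 on error. */
  while (ret == 1) {
    ret = xmlTextReaderRead(reader);
  }
  xmlFreeTextReader(reader);
}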
Comment 1 Daniel Veillard 2004-06-04 22:47:31 UTC
The line number is now a field in the node structure; it used to be stored
in a shared area. To limit the space allocated per node, line numbers were
restricted to a short integer:
  include/libxml/tree.h line 456
      unsigned short   line;      /* line number */

 It wasn't clear that reporting the line number for very large files
would be useful anyway, since editing such files manually is hazardous, they
are usually generated, and it's simpler to fix the problem in the generator
than to try to post-process the instances.

  I'm afraid that fixing this won't be possible. This change was made in 2.6.0
and, I guess, discussed on the list at the time; trying to increase the field
now would lead to ABI breakage.

Daniel
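
For illustration, the effect of a 16-bit line field can be reproduced without libxml2. The sketch below is not libxml2 source; demo_node, demo_set_line and demo_check are hypothetical names, and the saturating store is an assumption about how a large parser line count ends up as 65535 in an unsigned short field.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a node that keeps its line number in 16 bits. */
struct demo_node {
    unsigned short line;
};

/* Store a parser line count in the 16-bit field, saturating at 65535
 * (assumed behaviour, chosen to match the value seen in the report). */
static void demo_set_line(struct demo_node *node, long parser_line)
{
    if (parser_line < 0)
        parser_line = 0;
    node->line = (parser_line < 65535) ? (unsigned short)parser_line : 65535;
}

/* Self-check using the line number quoted in the description. */
static void demo_check(void)
{
    struct demo_node n;

    demo_set_line(&n, 29664810L);
    assert(n.line == 65535);
    printf("stored line: %u\n", (unsigned)n.line);
}
```

Any line past 65534 collapses to the same stored value, which is why every error in a multi-million-line dump is reported at line 65535.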
 
Comment 2 William M. Brack 2004-07-26 16:13:20 UTC
Discussed on the mailing list
(http://mail.gnome.org/archives/xml/2004-July/msg00009.html et seq), found to
be due to the workings of xmlTextReaderLocatorLineNumber() and fixed with a
small user routine to directly fetch the line number instead.
Comment 3 Daniel Veillard 2004-08-22 20:34:38 UTC
 This should be fixed in release libxml2-2.6.12.

   thanks,

Daniel
Comment 4 Luke Plant 2007-07-12 12:38:33 UTC
I'm using 2.6.27 and still seeing this.  (Library packaged by Ubuntu, 2.6.27.dfsg-1ubuntu3).  I think this should be WONTFIX rather than FIXED.
Comment 5 Sven Neuhaus 2007-10-10 14:36:42 UTC
I am still seeing this bug in 2.6.30.dfsg-2 on Debian unstable.
To me, as to others reporting here, the correct line number of the problem is an important feature.
Comment 6 Starlight 2007-11-25 02:37:51 UTC
A simple patch that resolves the problem can be found in bug 325533.
Comment 7 Daniel Veillard 2007-11-26 16:25:54 UTC
The reasons why the patch suggested in comment #6 won't be applied have been
explained in that bug.
For the reader API, since the reader keeps a parser context
which has an input line count in 32 bits, it should be possible
to get the full line number, as William Brack apparently fixed,
cf. comment #2. This was discussed on the mailing list too,
not that long ago.


Daniel