GNOME Bugzilla – Bug 654567
xmlTextReader bails too quickly on error
Last modified: 2012-07-18 10:37:12 UTC
Created attachment 191898: Possible fix
I use xmlTextReader to parse files that might be incomplete. Each file is the beginning of a well-formed document, but its end is missing, so the file as a whole is not well-formed. The problem is that xmlTextReader starts returning errors as soon as it encounters the early EOF, even though I haven't finished reading all of the valid data in the file. It would be helpful if xmlTextReader kept working until the very end. The fix might look like the attached patch, which passes all regression tests.
I have been thinking about that one for a week or so. On one hand, the XML specification sides with "stop reporting data as soon as a fatal error is detected" (http://www.w3.org/TR/REC-xml/#dt-fatal), *but* the missing end is not, strictly speaking, detected by the parser itself; it is detected by the surrounding I/O that feeds the parser in reader mode. So I think continuing to parse until the parser itself detects the fatal error is okay. The patch looks reasonable. I hope there won't be nasty side effects, but I can't think of any at this point, so I committed it; I hope testing won't raise trouble :)
http://git.gnome.org/browse/libxml2/commit/?id=9d9685ad88c17d35b6688695af3ceba7c7309b13
Thanks!
Daniel
The problem is that the patch allowed some well-formedness errors to go unreported, i.e. a false positive. I had to fix this and pushed http://git.gnome.org/browse/libxml2/commit/?id=c508fa3f0b40ba232e00ed8d514e0ba37ed602ab as a follow-up. We really can't exit xmlTextReaderPushData knowing that the XML is not well-formed without raising an error there. Hopefully your own use case is not affected, but accuracy there is paramount.
Daniel