GNOME Bugzilla – Bug 605740
1 byte HTML files are not parsed
Last modified: 2012-05-11 20:09:58 UTC
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/HTMLparser.h>
#include <stdio.h>

int main(void)
{
    char buffer[] = "123";
    int len = 1; /* push only the first byte of buffer */

    /* Create an HTML push parser with the first len bytes as the initial chunk */
    htmlParserCtxtPtr ctx = htmlCreatePushParserCtxt(NULL, NULL, buffer, len,
                                                     NULL, XML_CHAR_ENCODING_NONE);

    /* Push an empty final chunk (terminate = 1) to signal end of input */
    htmlParseChunk(ctx, NULL, 0, 1);

    xmlDocDump(stdout, ctx->myDoc);

    /* htmlFreeParserCtxt() does not free the resulting document */
    xmlFreeDoc(ctx->myDoc);
    htmlFreeParserCtxt(ctx);
    return 0;
}

This prints:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

The '1' from the buffer is missing from the output. If 'len' in the C code is set to 2, it works as expected:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>12</p></body></html>
Created attachment 199756: Fix for this case
Adds statements to handle the case where only one character remains in the buffer and the input has ended.
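To illustrate why a one-byte document is a corner case, here is a minimal, hypothetical sketch in C. It is not libxml2's parser or the attached patch: `toy_parser`, `toy_parse_chunk` and `run` are made-up names, and the sketch assumes a parser that waits for at least two bytes of lookahead before consuming character data. Under that assumption a lone byte is never consumed unless the "input has ended" path handles it explicitly, which is the kind of case the attached fix describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy push parser (NOT libxml2 code). It only consumes
 * character data once at least two bytes are pending, mimicking a
 * parser that needs lookahead before deciding what the input is. */
typedef struct {
    char buf[64];   /* pending, not yet consumed input */
    size_t len;     /* number of pending bytes */
    char out[64];   /* text emitted so far, NUL-terminated */
    size_t outlen;  /* number of emitted bytes */
} toy_parser;

static void emit(toy_parser *p, char c)
{
    if (p->outlen + 1 < sizeof(p->out)) {
        p->out[p->outlen++] = c;
        p->out[p->outlen] = '\0';
    }
}

/* Push a chunk; terminate != 0 means no more input will follow.
 * fix != 0 enables the extra handling for a single leftover byte. */
void toy_parse_chunk(toy_parser *p, const char *chunk, size_t size,
                     int terminate, int fix)
{
    /* Append the new bytes to the pending buffer (truncate if full). */
    if (size > sizeof(p->buf) - p->len)
        size = sizeof(p->buf) - p->len;
    if (size > 0) {
        memcpy(p->buf + p->len, chunk, size);
        p->len += size;
    }

    /* The corner case: input has ended and exactly one byte is left.
     * No more lookahead can ever arrive, so emit it as text. */
    if (fix && terminate && p->len == 1) {
        emit(p, p->buf[0]);
        p->len = 0;
        return;
    }

    /* Fewer than two bytes: wait for more input. If the input has
     * already ended, that never happens and the byte is dropped. */
    if (p->len < 2)
        return;

    /* Enough lookahead: consume all pending bytes as character data. */
    for (size_t i = 0; i < p->len; i++)
        emit(p, p->buf[i]);
    p->len = 0;
}

/* Mirror the report: the first n bytes go in the initial chunk, then an
 * empty chunk with terminate = 1. Returns a static buffer. */
const char *run(const char *input, size_t n, int fix)
{
    static toy_parser p;
    memset(&p, 0, sizeof(p));
    toy_parse_chunk(&p, input, n, 0, fix);
    toy_parse_chunk(&p, NULL, 0, 1, fix);
    return p.out;
}
```

With `fix` disabled, `run("123", 1, 0)` returns an empty string while `run("123", 2, 0)` returns "12", mirroring the two outputs in the report; with `fix` enabled, `run("123", 1, 1)` returns "1".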
That's really a corner case, but okay :-) Thanks for the patch, Denis!
http://git.gnome.org/browse/libxml2/commit/?id=fdf990c2ef2ccf1b4fadf24ded562857d187be78
Daniel
Thanks :-)