Bug 605740 - 1 byte HTML files are not parsed
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2009-12-30 14:01 UTC by Arnold Hendriks
Modified: 2012-05-11 20:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix for this case (1.46 KB, patch)
2011-10-23 11:09 UTC, Denis Pauk

Description Arnold Hendriks 2009-12-30 14:01:50 UTC
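#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/HTMLparser.h>
#include <stdio.h>

int main(void)
{
        char buffer[] = "123";
        int len = 1;    /* feed only the first byte, "1" */
        htmlParserCtxtPtr ctx = htmlCreatePushParserCtxt(NULL, NULL,
                                                         buffer, len, NULL,
                                                         XML_CHAR_ENCODING_NONE);
        if (ctx == NULL)
                return 1;
        /* no further data; terminate = 1 marks the end of the document */
        htmlParseChunk(ctx, NULL, 0, 1);
        xmlDocDump(stdout, ctx->myDoc);
        xmlFreeDoc(ctx->myDoc);  /* freeing the context does not free the document */
        htmlFreeParserCtxt(ctx);
        return 0;
}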

Returns

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

There is no '1' in the output.

If 'len' in the C code is set to 2, it works as expected:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>12</p></body></html>
Comment 1 Denis Pauk 2011-10-23 11:09:06 UTC
Created attachment 199756
Fix for this case

Adds handling for the case where only one character is left in the buffer and the content has ended.
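
To illustrate the kind of condition described above, here is a small toy model. It is not libxml2 code and does not reproduce the attached patch. It assumes, purely for illustration, a push-style consumer that waits until at least two bytes are buffered before it starts on character data. The function name `consume_text` and the `handle_last_byte` switch are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model, NOT libxml2 code. A push-style consumer that normally wants at
 * least two bytes buffered before it starts on character data.
 *
 *   in, avail          bytes currently buffered
 *   terminate          non-zero when the caller says no more input will come
 *   handle_last_byte   0 = model of the reported behaviour,
 *                      1 = model with the extra "one byte left at end" case
 *   out                receives the consumed text, always NUL-terminated
 *
 * Returns the number of bytes consumed.
 */
static size_t consume_text(const char *in, size_t avail, int terminate,
                           int handle_last_byte, char *out)
{
    out[0] = '\0';

    if (avail == 0)
        return 0;

    if (avail < 2) {
        /* One byte buffered: wait for more input, unless the input has
         * ended and the extra case is enabled. */
        if (!(terminate && handle_last_byte))
            return 0;
    }

    /* Enough data (or the final byte): consume everything buffered. */
    memcpy(out, in, avail);
    out[avail] = '\0';
    return avail;
}
```

In this model, `consume_text("123", 1, 1, 0, out)` consumes nothing, which mirrors the missing '1' in the report, while the same call with `handle_last_byte` set to 1 consumes the single byte. With two bytes available both variants consume "12", matching the `len = 2` behaviour in the description.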
Comment 2 Daniel Veillard 2012-05-10 12:43:40 UTC
That's really a corner case, but okay :-)
Thanks for the patch, Denis!

http://git.gnome.org/browse/libxml2/commit/?id=fdf990c2ef2ccf1b4fadf24ded562857d187be78

Daniel
Comment 3 Denis Pauk 2012-05-11 20:09:58 UTC
Thanks :-)