GNOME Bugzilla – Bug 700840
Incorrect parsing of an HTML document that begins with 2 closing tags
Last modified: 2021-07-05 13:25:29 UTC
Admittedly, the document itself is also invalid. I think the problem is in the function htmlParseTryOrFinish (file: HTMLparser.c, line: 5859). After the parser finds the first closing tag, its instate is XML_PARSER_END_TAG. When it processes this state, it cannot find the name of an open tag and sets its instate to XML_PARSER_EPILOG:

    /*...*/
    htmlParseEndTag(ctxt);
    if (ctxt->nameNr == 0) {
        ctxt->instate = XML_PARSER_EPILOG;
    } else {
        ctxt->instate = XML_PARSER_CONTENT;
    }
    /*...*/

After this, the parser ignores the rest of the HTML buffer. In my program I used this code instead:

    /*...*/
    htmlParseEndTag(ctxt);
    if (ctxt->nameNr == 0) {
        ctxt->instate = XML_PARSER_MISC;
    } else {
        ctxt->instate = XML_PARSER_CONTENT;
    }
    /*...*/

With this change the parser works well on the broken files, and it seems to work fine on other files too. Thank you for your work, and sorry for my English :)
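To illustrate the state transition described in the report above, here is a toy model in C. It is not libxml2 code: the `toy_` names, the enum, and the tag scanning are hypothetical simplifications, and the "epilog discards the rest of the input" behavior is taken from the report rather than verified against the library. The `use_misc` flag selects between the original transition (to an epilog state) and the one proposed in the report (to a misc state).

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the parser states named in the report.
 * These are hypothetical; they are not libxml2's xmlParserInputState. */
typedef enum { TOY_MISC, TOY_CONTENT, TOY_EPILOG } toy_state;

/* State chosen after an end tag has been handled.
 * open_elems plays the role of ctxt->nameNr in the report;
 * use_misc != 0 selects the change proposed in the report. */
static toy_state toy_after_end_tag(int open_elems, int use_misc)
{
    if (open_elems == 0)
        return use_misc ? TOY_MISC : TOY_EPILOG;
    return TOY_CONTENT;
}

/* Count the start tags the toy parser accepts in html.
 * Every '<' not followed by '/' is treated as a start tag. */
static int toy_count_start_tags(const char *html, int use_misc)
{
    toy_state state = TOY_MISC;
    int open_elems = 0;
    int started = 0;
    const char *p = html;

    while ((p = strchr(p, '<')) != NULL) {
        if (state == TOY_EPILOG)
            break; /* assumption from the report: remaining input is ignored */
        if (p[1] == '/') {
            /* End tag: close an element if one is open, then pick the next state. */
            if (open_elems > 0)
                open_elems--;
            state = toy_after_end_tag(open_elems, use_misc);
        } else {
            /* Start tag: open an element and move to content. */
            open_elems++;
            started++;
            state = TOY_CONTENT;
        }
        p++; /* step past this '<' before searching for the next one */
    }
    return started;
}
```

For an input such as `</a></b><p>hi</p>`, calling `toy_count_start_tags(input, 0)` stops at the first stray closing tag and never reaches `<p>`, while `toy_count_start_tags(input, 1)` continues and counts it. The model leaves out everything else the real parser does, so it only shows why the choice of state after a stray end tag matters.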
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.