GNOME Bugzilla – Bug 616241
HTMLparser wrongly repairs unclosed td, tr and font tags.
Last modified: 2021-07-05 13:26:47 UTC
Copy of my email.

As written at http://xmlsoft.org/html/libxml-HTMLparser.html, it should be able to parse "real world" HTML, even if severely broken from a specification point of view.

The example is based on http://www.voicenews.ca/

Running xmllint --html on a file with this input:

<table> <tr><td><font size=1><a class=menu href="1125.pdf"> 1125.<tr><td><font size=1><a class=menu href="1124.pdf"> 1124.</table>

gives this output:

<table><tr><td><font size="1"><a class="menu" href="1125.pdf"> 1125.<tr><td><font size="1"><a class="menu" href="1124.pdf"> 1124. </a></font></td></tr></a></font></td></tr></table>

Schematically, the parser nests each new row inside the previous, still open one:

<tr><td><font> <tr><td><font> <tr><td><font> </font></td></tr> </font></td></tr> </font></td></tr>

That is the wrong fix for this HTML input. The correct repair is clearly to close each row before the next one starts:

<tr><td><font> </font></td></tr> <tr><td><font> </font></td></tr> <tr><td><font> </font></td></tr>

This is how any browser renders the HTML.
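To make the requested behaviour concrete, here is a minimal sketch of the repair rule described above: a new tr closes everything still open inside the enclosing table, and a new td closes everything still open inside the enclosing tr. It uses Python's standard html.parser, not libxml2, and it is neither libxml2's code nor the full HTML5 tree-building algorithm. The names IMPLIED_CLOSE_SCOPE, VOID_TAGS, RowRepairer and repair_rows are hypothetical, and the rule table is an assumption covering only the tags in this report.

```python
from html import escape
from html.parser import HTMLParser

# Assumed rule table (not taken from libxml2): opening KEY first closes
# every element still open inside the nearest enclosing VALUE element.
IMPLIED_CLOSE_SCOPE = {"tr": "table", "td": "tr", "th": "tr"}

# Elements that never take an end tag, so they are not tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class RowRepairer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of currently open element names
        self.out = []        # serialized output fragments

    def _pop_one(self):
        # Emit the end tag for the innermost open element.
        self.out.append("</%s>" % self.open_tags.pop())

    def handle_starttag(self, tag, attrs):
        scope = IMPLIED_CLOSE_SCOPE.get(tag)
        if scope in self.open_tags:
            # Close leftovers (a, font, td, tr, ...) down to the scope element.
            while self.open_tags[-1] != scope:
                self._pop_one()
        rendered = "".join(
            " %s" % name if value is None
            else ' %s="%s"' % (name, escape(value, quote=True))
            for name, value in attrs
        )
        self.out.append("<%s%s>" % (tag, rendered))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close inner elements until the named element itself is closed.
        while True:
            closed = self.open_tags[-1]
            self._pop_one()
            if closed == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def repair_rows(markup):
    parser = RowRepairer()
    parser.feed(markup)
    parser.close()
    while parser.open_tags:  # close whatever is still open at end of input
        parser._pop_one()
    return "".join(parser.out)


if __name__ == "__main__":
    broken = ('<table> <tr><td><font size=1><a class=menu href="1125.pdf"> 1125.'
              '<tr><td><font size=1><a class=menu href="1124.pdf"> 1124.</table>')
    print(repair_rows(broken))
```

Run on the input from this report, the sketch emits the two rows as siblings, each closed with </a></font></td></tr> before the next tr begins, instead of nesting the second row inside the first.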
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.