Bug 317253 - Request for "Not converting names to lower-case" in HTML parsing
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: htmlparser
Version: 2.6.22
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2005-09-26 15:34 UTC by GPN
Modified: 2021-07-05 13:27 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement



Description GPN 2005-09-26 15:34:59 UTC
Email text which led to this request for enhancement:

Subject: Re: [xml] XML/HTML Mixed mode parsing
From: Daniel Veillard <veillard@redhat.com>
Date: Mon, 26 Sep 2005 11:15:48 -0400
To: GPN <gpn.libxml@gmail.com>
CC: xml@gnome.org

On Mon, Sep 26, 2005 at 08:30:13PM +0530, GPN wrote:
> Daniel Veillard wrote:
>>> and hence most browsers do not complain about a page even
>>> if it has errors. (This can be turned on though, but the
>>> page display does not stop if there was an error).
>>
>>   right, that's how browsers interpret HTML 4.x based on SGML with
>> a text/html MIME type. If there is an XML MIME type they must
>> use a real XML parser and fail on fatal errors.
>
> I am seeing if there is a viable solution for this. I need to parse
> html pages, which will have xml content.
> a) If I use an XML parser, then the parsing process will stop
> even if there was an error in html tags.
> b) If I use a html parser, then the tags/attributes will be converted
> to lower case (breaking XML rules).


  b) is a bit extreme and should probably be fixed, *but* any XML
passed through an HTML parser loses all the guarantees of portability
that drove the use of XML in the first place; this is broken. Islands
of foreign vocabularies in XHTML make sense, but not in SGML HTML.
  Add a request for enhancement in Bugzilla about not converting the names
to lower case; that could be added as an HTML parsing option
and is probably not too hard to add.

Daniel
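
The trade-off between a) and b) in the email above can be reproduced in a small sketch. It uses Python's standard-library parsers as stand-ins, not libxml2, so it only illustrates the general behaviour discussed here; the helper names (`TagCollector`, `html_names`, `xml_names`) and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records the element and attribute names the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # By the time this callback runs, Python's HTML parser has already
        # lower-cased both the tag name and the attribute names.
        self.names.append((tag, [name for name, _ in attrs]))


def html_names(markup):
    """Case b): lenient HTML parsing, but name case is lost."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def xml_names(markup):
    """Case a): name case is kept, but malformed input raises ParseError."""
    root = ET.fromstring(markup)
    return [(el.tag, list(el.attrib)) for el in root.iter()]


# Hypothetical sample: an HTML page with an embedded XML "island".
broken = '<P><myData ItemId="1">x</myData>'          # <P> is never closed
well_formed = '<P><myData ItemId="1">x</myData></P>'

print(html_names(broken))      # tolerant, names come back lower-cased
print(xml_names(well_formed))  # strict, names keep their original case
try:
    xml_names(broken)
except ET.ParseError as err:
    print("XML parser stopped:", err)
```

The HTML parser accepts the unclosed `<P>` but reports `mydata` and `itemid`, while the XML parser preserves `myData` and `ItemId` only as long as the whole input is well-formed.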
Comment 1 jcarlosgarciasegovia 2013-02-18 11:15:22 UTC
I do not understand what you are trying to do.
Could you explain what that HTML document containing XML is?
Even better, could you attach it?
Comment 2 GNOME Infrastructure Team 2021-07-05 13:27:12 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.