Bug 362552 - html entities in attribute values get corrupted
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.x
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2006-10-16 09:45 UTC by Olaf Walkowiak
Modified: 2006-10-17 15:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
html file that triggers the error (267 bytes, text/html)
2006-10-16 12:25 UTC, Olaf Walkowiak
simple testcase (95 bytes, text/html)
2006-10-17 06:30 UTC, Olaf Walkowiak
Output of /usr/bin/xmllint --html --output result.html ./simple.html (194 bytes, text/html)
2006-10-17 06:31 UTC, Olaf Walkowiak

Description Olaf Walkowiak 2006-10-16 09:45:19 UTC
Please describe the problem:
Some entities in attribute values get corrupted, for example š (&scaron;), œ (&oelig;) and Ÿ (&Yuml;).

In normal textnodes everything is OK.

Steps to reproduce:
1. Let xmllint parse the document below in html mode


Actual results:
Entities in "value" attribute get corrupted.

Expected results:


Does this happen every time?
Yes

Other information:
<html>
<body>

scaron: &scaron;, nbsp: &nbsp;, auml: &auml;, oelig: &oelig;, Yuml: &Yuml;, yuml: &yuml;, rarr: &rarr;

<input type="text" name="hae" value="scaron:  &scaron; .... nbsp: &nbsp; auml: &auml; oelig: &oelig;  Yuml: &Yuml;  yuml: &yuml;"/>
</body>
</html>
Comment 1 Olaf Walkowiak 2006-10-16 12:25:11 UTC
Created attachment 74801
html file that triggers the error
Comment 2 Daniel Veillard 2006-10-16 12:53:18 UTC
Not corrupted: they are output as their UTF-8 code points, and the value and
content are exact. It just doesn't have the form you expect, and in
general that can't be guaranteed. Not a bug, at best a request for enhancement.

Daniel
Comment 3 Olaf Walkowiak 2006-10-16 13:01:57 UTC
I get:

<input type="text" name="hae" value="scaron:  a .... nbsp: &nbsp; auml: &auml; oelig: S  Yuml: x  yuml: &yuml;">

and 
 xmllint --debug --html ./x.html
HTML DOCUMENT
URL=./x.html
standalone=true
  DTD(html), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
      TEXT
        content=  scaron: #C5#A1, nbsp: #C2#A0, auml: #C3#A4, oelig:...
      ELEMENT input
        ATTRIBUTE type
          TEXT
            content=text
        ATTRIBUTE name
          TEXT
            content=hae
        ATTRIBUTE value
          TEXT
            content=scaron:  a .... nbsp: #C2#A0 auml: #C3#A4 oelig:...


Some entities (those with a code point > 255) are broken.
Comment 4 Daniel Veillard 2006-10-16 15:57:48 UTC
Can you be more specific: which entity? Broken how?
Remember that
  1/ cut and paste of terminal output means *nothing*: it depends on
     what encoding the terminal expects its output in and how it handles
     something different
  2/ --debug dumps the internal form, i.e. UTF-8, so one character is
     encoded with 2 bytes, sometimes 3 or 4, depending on the code point.

Daniel 
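
As an illustration of point 2/, the following minimal sketch prints the UTF-8 byte sequences for some of the characters in the test document, in the #XX style that the --debug dump above uses. The helper name utf8_hex is made up for this example and is not part of xmllint or libxml2.

```python
# Sketch only: utf8_hex is a hypothetical helper, not an xmllint/libxml2 API.
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch formatted as '#XX#YY...'."""
    return "".join(f"#{b:02X}" for b in ch.encode("utf-8"))


# Characters from the test document, written as escapes to avoid
# depending on the encoding of this file.
samples = [
    ("scaron", "\u0161"),
    ("nbsp", "\u00a0"),
    ("auml", "\u00e4"),
    ("rarr", "\u2192"),
]

for name, ch in samples:
    print(f"{name}: U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

The first three results match the #C5#A1, #C2#A0 and #C3#A4 sequences in the text node of the dump, while rarr needs three bytes.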
Comment 5 Olaf Walkowiak 2006-10-17 06:30:12 UTC
Created attachment 74852
simple testcase

When parsing this file the value of the "input" element gets corrupted.
&scaron; => a 

  <input type="text" name="test" value="&scaron;">
 becomes:
 <input type="text" name="test" value="a">

This seems to happen with all entities with a code above 255.
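
The "above 255" observation can be checked against the standard HTML entity table. This is a small sketch using Python's html.entities module, which is independent of libxml2; the helper fits_in_one_byte is a name invented for this example.

```python
from html.entities import name2codepoint

ONE_BYTE_MAX = 0xFF  # largest code point that fits in a single 8-bit value


def fits_in_one_byte(entity: str) -> bool:
    """True if the named HTML entity has a code point of 255 or less."""
    return name2codepoint[entity] <= ONE_BYTE_MAX


# Entity names taken from the test document in the description.
for name in ["scaron", "nbsp", "auml", "oelig", "Yuml", "yuml", "rarr"]:
    cp = name2codepoint[name]
    status = "<= 255" if fits_in_one_byte(name) else "> 255"
    print(f"&{name}; = U+{cp:04X} ({cp}) {status}")
```

In the output reported in comment 3, nbsp, auml and yuml (all at or below 255) survive in the attribute, while scaron, oelig and Yuml (all above 255) do not.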
Comment 6 Olaf Walkowiak 2006-10-17 06:31:10 UTC
Created attachment 74853
Output of  /usr/bin/xmllint --html --output result.html ./simple.html
Comment 7 Daniel Veillard 2006-10-17 15:56:41 UTC
Okay with the simple test case it was relatively easy to find
and fix the problem:

paphio:~/XML -> xmllint --html  ../74852.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<form>
  <input type="text" name="test" value="&scaron;">
</form>
</body></html>

It was just a 'historical' cast to xmlChar truncating the attribute value :-\

  thanks for the report, this should be fixed in CVS now ! I also 
added the test to the regression suite,

Daniel
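
The effect of such a cast can be sketched as follows. xmlChar is an unsigned 8-bit type in libxml2, so converting a larger code point to it keeps only the low 8 bits. The code below simulates that in Python; it is an assumption about the mechanism based on the comment above, not the actual libxml2 code, which this report does not show. The function name truncate_to_byte is hypothetical.

```python
def truncate_to_byte(code_point: int) -> str:
    """Simulate a C cast to an unsigned 8-bit type: keep the low 8 bits."""
    return chr(code_point & 0xFF)


# Code points of the entities reported as broken in comment 3.
broken = {"scaron": 0x161, "oelig": 0x153, "Yuml": 0x178}

for name, cp in broken.items():
    print(f"&{name}; U+{cp:04X} -> {truncate_to_byte(cp)!r}")

# Entities at or below 255 pass through unchanged.
assert truncate_to_byte(0xE4) == "\u00e4"  # auml
```

Under this assumption the three entities become 'a', 'S' and 'x', which are the characters seen in the corrupted value attribute in comment 3.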