GNOME Bugzilla – Bug 377544
UTF8ToHtml returns -2 error on valid UTF-8
Last modified: 2006-11-23 16:18:13 UTC
Please describe the problem:
UTF8ToHtml() fails with error -2 if the input is valid UTF-8 and contains at least one character for which no named character entity exists in the libxml2 entity table. I will provide a patch that fixes the problem by substituting a numeric character reference when a named entity is not available.

Steps to reproduce:
1. Call UTF8ToHtml with a UTF-8 string containing a Han ideograph (Chinese character).

Actual results:
UTF8ToHtml returns a -2 error.

Expected results:
UTF8ToHtml should return ASCII-encoded output equivalent to the UTF-8 input.

Does this happen every time?
Yes.

Other information:
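To illustrate the fallback described above, here is a minimal, self-contained sketch. It is not the attached patch and does not use libxml2: the two-entry entity table, the helper names (`lookup_name`, `decode_utf8`, `utf8_to_html_sketch`), and the return codes are all assumptions made for the example. The UTF-8 decoder is simplified and does not reject overlong encodings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the libxml2 HTML entity table (two entries only). */
struct entity { unsigned int code; const char *name; };
static const struct entity demo_entities[] = {
    { 0xA0, "nbsp" },
    { 0xE9, "eacute" },
};

/* Return the entity name for code point c, or NULL if the table has none. */
static const char *lookup_name(unsigned int c)
{
    size_t i;
    for (i = 0; i < sizeof demo_entities / sizeof demo_entities[0]; i++)
        if (demo_entities[i].code == c)
            return demo_entities[i].name;
    return NULL;
}

/* Decode one UTF-8 sequence into *c. Returns the number of bytes consumed,
 * or 0 if the sequence is malformed or truncated. */
static int decode_utf8(const unsigned char *in, size_t len, unsigned int *c)
{
    int trail, i;
    if (len == 0) return 0;
    if (in[0] < 0x80) { *c = in[0]; return 1; }
    else if ((in[0] & 0xE0) == 0xC0) { *c = in[0] & 0x1F; trail = 1; }
    else if ((in[0] & 0xF0) == 0xE0) { *c = in[0] & 0x0F; trail = 2; }
    else if ((in[0] & 0xF8) == 0xF0) { *c = in[0] & 0x07; trail = 3; }
    else return 0;
    if (len < (size_t)trail + 1) return 0;
    for (i = 1; i <= trail; i++) {
        if ((in[i] & 0xC0) != 0x80) return 0;
        *c = (*c << 6) | (in[i] & 0x3F);
    }
    return trail + 1;
}

/* Convert a NUL-terminated UTF-8 string to ASCII HTML.
 * Sketch-specific return codes: 0 on success, -1 if out is too small,
 * -2 if the input is not well-formed UTF-8. */
static int utf8_to_html_sketch(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t left, used = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0) return -1;
    left = strlen(in);

    while (left > 0) {
        unsigned int c;
        char ref[16];
        size_t rlen;
        int n = decode_utf8(p, left, &c);
        if (n == 0) return -2;

        if (c < 0x80) {
            ref[0] = (char)c;                 /* ASCII passes through */
            ref[1] = '\0';
        } else {
            const char *name = lookup_name(c);
            if (name != NULL)
                snprintf(ref, sizeof ref, "&%s;", name);  /* named entity */
            else
                snprintf(ref, sizeof ref, "&#%u;", c);    /* numeric fallback */
        }

        rlen = strlen(ref);
        if (used + rlen + 1 > outsize) return -1;
        memcpy(out + used, ref, rlen);
        used += rlen;
        p += n;
        left -= (size_t)n;
    }
    out[used] = '\0';
    return 0;
}
```

With this sketch, an input containing both "é" (which has a named entity in the demo table) and the Han ideograph U+4E2D (which does not) is converted without error: the first becomes `&eacute;` and the second becomes the decimal reference `&#20013;`.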
Created attachment 76943
Patch to allow UTF8ToHtml to return numerical character references

As per my post to the libxml2 mailing list, this patch fixed the problem for me. I have attempted to make the code and variable names consistent with other HTMLparser.c functions. This patch was made against HTMLparser.c as it existed in CVS on 20 Nov 2006.
Makes sense, applied and committed, thanks a lot!

Daniel