GNOME Bugzilla – Bug 110314
Reading config files with non-UTF8 8-bit characters fails
Last modified: 2004-12-22 21:47:04 UTC
When I started Pan 0.13.3, which comes with Red Hat 9, it gave errors like this:

/home/skarkkai/.pan/data/config.xml:18: error: Input is not proper UTF-8, indicate encoding !
<value key="Full_Name" type="s">Samuli Kärkkäinen</value>
** (pan:13446): CRITICAL **: file pan-config.c: line 580 (pan_config_load): assertion Doc!=NULL' failed
/home/skarkkai/.pan/data/profiles.xml:3: error: Input is not proper UTF-8, indicate encoding !
<profiles default_news="Samuli Kärkkäinen <skarkkai@woods.iki.fi>" defaul

I removed the ä's and updated to Pan 0.13.4 (http://pan.rebelbase.com/download/releases/0.13.4/REDHAT_80/pan-0.13.4-nospell1.i686.rpm). Now Pan starts okay, but when it tries to load new headers for at least one newsgroup, it prints the error "** (pan:13754): WARNING **: Invalid UTF8 string passed to pango_layout_set_text()" and won't load the headers. I'm guessing it still has non-UTF8 strings somewhere in there.

The correct solution would obviously be to detect cases where an old Pan version has left incorrectly encoded strings in the config and other state files, and fix them on the fly.
I don't think this is possible: I don't think libxml2 provides a means to parse a badly encoded xml file. Will look into the possibilities.
Chris: ping
Well, rather than reading the xml file with xmlParseFile (i.e. straight from the file), we could read it into memory and parse it with xmlParseMemory. If the file does not contain valid UTF-8, we could convert it to UTF-8 first. The config file is relatively small, so this wouldn't be too heavy on resources. The only issue is: which charset to use as the source? Going by the example, we could deduce it from the user's locale (get_charset_from_locale). However, this isn't 100% reliable, since the file may have been written under a different locale. So, I'm not convinced that we should do anything here.
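To illustrate the "does the file contain valid UTF-8" check proposed above, here is a minimal sketch in plain C. The function name is_valid_utf8 is hypothetical and not part of Pan; in a GLib-based program such as Pan, g_utf8_validate() would probably be the more natural choice. The sketch only covers the validation step on a buffer that has already been read into memory, not the file reading or the xmlParseMemory call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from Pan): returns true if s[0..len) is
 * well-formed UTF-8. Rejects stray continuation bytes, truncated
 * sequences, overlong encodings, surrogates and values > U+10FFFF. */
bool is_valid_utf8(const char *s, size_t len)
{
    const unsigned char *buf = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp, min; /* decoded code point, smallest legal value */

        if (c < 0x80) {        /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;      /* continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return false;      /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

A Latin-1 encoded "Kärkkäinen" fails this check because the byte 0xE4 looks like the start of a three-byte sequence but is followed by a plain ASCII 'r'; that would be the signal to convert the buffer before parsing it.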
Keep in mind this is a fatal problem for any user who happens to have 8-bit characters in their name or other data and upgrades pan. The best alternative for figuring out the correct charset would be to ask the user with a GUI dialog. If that's too much work, just use the charset of the current locale, and default to 8859-1. That may get some characters wrong, but they are just human-readable data. Getting the name slightly wrong is a negligible problem compared to pan just aborting with a cryptic (to most users) error message.
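As a rough sketch of the "default to 8859-1" fallback, the conversion itself is simple because every ISO 8859-1 byte maps directly to the Unicode code point with the same value. The function name latin1_to_utf8 is hypothetical and not taken from Pan; the locale lookup is left out, and a real implementation would more likely use a general converter such as iconv or GLib's g_convert().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from Pan): converts len bytes of ISO 8859-1
 * text to a newly allocated, NUL-terminated UTF-8 string. Returns NULL
 * if allocation fails. The caller must free() the result. */
char *latin1_to_utf8(const char *in, size_t len)
{
    /* Each input byte becomes at most two output bytes. */
    char *out = malloc(2 * len + 1);
    size_t j = 0;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        if (c < 0x80) {
            out[j++] = (char)c;                  /* ASCII stays as-is */
        } else {
            out[j++] = (char)(0xC0 | (c >> 6));  /* lead byte: 0xC2 or 0xC3 */
            out[j++] = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    out[j] = '\0';
    return out;
}
```

With this, the Latin-1 byte 0xE4 (ä) becomes the two-byte UTF-8 sequence 0xC3 0xA4. If the data was actually written in some other 8-bit charset, the result is still valid UTF-8 but some characters will be wrong, which is the trade-off described above.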
Which version were you using before upgrading to RH9 & Pan 0.13.3?
Unfortunately I don't remember.
Ah, one way non-UTF8 characters could make it into profiles.xml is if a user moves from a gnome1.4 release (e.g. 0.11.4) to a gtk2 release, and his posting profile contains 8-bit characters: pan_config_load_ini (in base/pan-config.c) migrates ~/.gnome/Pan to ~/.pan/config.xml and ~/.pan/profiles.xml. However, it currently just copies the fields without any conversion. The values need to be converted from the local charset to UTF-8.
Fixed in CVS: http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/base&command=DIFF_FRAMESET&file=pan-config.c&rev1=1.31&rev2=1.32&root=/cvs/gnome