Bug 110314 - Reading config files with non-UTF8 8-bit characters fails
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
Version: 0.13.4
Hardware: Other other
Importance: Normal normal
Target Milestone: 0.14.1
Assigned To: Christophe Lambin
QA Contact: Pan QA Team
Depends on:
Blocks:
 
 
Reported: 2003-04-08 20:31 UTC by Samuli Kärkkäinen
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Samuli Kärkkäinen 2003-04-08 20:31:22 UTC
When I started Pan 0.13.3, which comes with Red Hat 9, it gave errors like this:

/home/skarkkai/.pan/data/config.xml:18: error: Input is not proper UTF-8,
indicate encoding !
<value key="Full_Name" type="s">Samuli Kärkkäinen</value>

** (pan:13446): CRITICAL **: file pan-config.c: line 580 (pan_config_load):
assertion `Doc!=NULL' failed
/home/skarkkai/.pan/data/profiles.xml:3: error: Input is not proper UTF-8,
indicate encoding !
<profiles default_news="Samuli Kärkkäinen &lt;skarkkai@woods.iki.fi&gt;" defaul

I removed the ä's and updated to Pan 0.13.4
(http://pan.rebelbase.com/download/releases/0.13.4/REDHAT_80/pan-0.13.4-nospell1.i686.rpm).
Now Pan starts okay, but when trying to load new headers of at least one
newsgroup, it prints the error "** (pan:13754): WARNING **: Invalid UTF8
string passed to pango_layout_set_text()" and won't load the headers. I'm
guessing it still has non-UTF8 strings somewhere in there.

The correct solution would be to detect situations where an old Pan
version has left incorrectly encoded strings in config and other state
files, and fix them on the fly.
Comment 1 Christophe Lambin 2003-06-15 22:42:24 UTC
I don't think this is possible:  I don't think libxml2 provides a
means to parse a badly encoded xml file.  Will look into the
possibilities.
Comment 2 Charles Kerr 2003-07-26 16:53:00 UTC
Chris: ping
Comment 3 Christophe Lambin 2003-07-26 18:09:06 UTC
Well, rather than read the xml file with xmlParseFile (i.e. straight
from the file), we could read it into memory and parse it with
xmlParseMemory.  If the file does not contain valid UTF-8, we could
convert it to UTF-8 first.  The config file is relatively small, so this
wouldn't be too heavy on resources.

The only issue is: which charset to use as the source?  From the
example, we could deduce it from the user's locale
(get_charset_from_locale).  However, this isn't exactly 100% reliable.

So, not convinced that we should do anything here.
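
The "read into memory, then check" step proposed in this comment can be sketched in plain C. This is only an illustration under stated assumptions, not Pan's code: `is_valid_utf8` is a hypothetical helper name, and a GLib-based program would more likely call `g_utf8_validate()` instead. The function only answers whether a buffer could be handed to xmlParseMemory unchanged or needs converting first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects overlong forms, surrogates, and code points
 * above U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) { i++; continue; }   /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation or invalid lead byte */

        if (len - i - 1 < need) return 0;  /* sequence is truncated */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        if (cp > 0x10FFFF) return 0;                 /* out of range */
        i += need + 1;
    }
    return 1;
}
```

A Latin-1 encoded "Kärkkäinen" fails this check because the byte 0xE4 announces a three-byte sequence that never follows, which is the same condition libxml2 reports as "Input is not proper UTF-8".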
Comment 4 Samuli Kärkkäinen 2003-07-26 18:30:29 UTC
Keep in mind this is a fatal problem for any user who happens to have
8-bit characters in their name or other data and upgrades pan. The
best alternative for figuring out the correct charset would be to ask
the user with a GUI dialog. If that's too much work, just look at
the current locale and default to 8859-1. That may get some characters
wrong, but they are just human-readable data. Getting the name
slightly wrong will be a negligible problem compared to pan just
aborting on a cryptic (to most users) error message.
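
The "default to 8859-1" fallback suggested above is cheap to implement, because every ISO-8859-1 byte maps directly to the Unicode code point with the same value. A minimal sketch follows; `latin1_to_utf8` is a hypothetical name, not a function from Pan or GLib.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to a
 * newly allocated UTF-8 string. The caller frees the result. Returns
 * NULL if memory allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t len;
    char *out, *p;

    assert(in != NULL);
    len = strlen(in);
    out = malloc(2 * len + 1);  /* each input byte becomes at most 2 bytes */
    if (out == NULL)
        return NULL;

    p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = (char)c;                    /* ASCII is unchanged */
        } else {
            *p++ = (char)(0xC0 | (c >> 6));    /* lead byte: top 2 bits */
            *p++ = (char)(0x80 | (c & 0x3F));  /* continuation: low 6 bits */
        }
    }
    *p = '\0';
    return out;
}
```

This never fails on any input, which is what makes it usable as a last resort. The cost is that text written in a different 8-bit charset comes out with wrong but still valid characters.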
Comment 5 Christophe Lambin 2003-07-26 22:58:37 UTC
Which version were you using before upgrading to RH9 & Pan 0.13.3?
Comment 6 Samuli Kärkkäinen 2003-07-26 23:02:12 UTC
Unfortunately I don't remember.
Comment 7 Christophe Lambin 2003-07-28 21:02:25 UTC
Ah, one way non-UTF8 characters could make it into profiles.xml is
if a user moves from a GNOME 1.4 release (e.g. 0.11.4) to a gtk2
release, and their posting profile contains 8-bit characters:
pan_config_load_ini (in base/pan-config.c) migrates ~/.gnome/Pan to
~/.pan/config.xml and ~/.pan/profile.xml.

However, it currently just copies the fields without any conversion.
The values need to be converted from the local charset to UTF-8.
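
The per-field conversion described above could look roughly like the following sketch, which uses POSIX `iconv` so that it stays self-contained. The name `field_to_utf8` and the idea of passing the source charset as a string are assumptions made for illustration. Pan itself is built on GLib, where `g_locale_to_utf8()` or `g_convert()` would be the more natural choice.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts one config value from from_charset to a
 * newly allocated UTF-8 string. The caller frees the result. Returns
 * NULL if the charset is unknown, the input is invalid in that charset,
 * or memory allocation fails. */
char *field_to_utf8(const char *value, const char *from_charset)
{
    iconv_t cd;
    size_t in_left, out_size, out_left;
    char *in_ptr, *out, *out_ptr;

    assert(value != NULL && from_charset != NULL);
    cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return NULL;

    in_left = strlen(value);
    out_size = 4 * in_left + 1;  /* generous bound for 8-bit source charsets */
    out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    in_ptr = (char *)value;      /* iconv wants char **, input is not modified */
    out_ptr = out;
    out_left = out_size - 1;     /* reserve room for the terminator */
    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

A migration routine would call this once per copied field, with the charset taken from the user's locale. Stateful encodings would also need a final flush call to `iconv` with a NULL input pointer, which this sketch leaves out.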