Bug 396665 - QIF Import: Detect and convert non-ascii character encoding of QIF files
Status: VERIFIED FIXED
Product: GnuCash
Classification: Other
Component: Import - QIF
Version: unspecified
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Christian Stimming
Duplicates: 344841 436423 495216
Depends on:
Blocks: backport
 
 
Reported: 2007-01-14 21:38 UTC by Guido Ostkamp
Modified: 2018-06-29 21:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Sample QIF encoded in ISO 8859-1 (142 bytes, text/plain)
2008-04-02 18:22 UTC, Charles Day
Sample QIF file contain an invalid character (119 bytes, text/plain)
2008-04-02 18:31 UTC, Charles Day

Description Guido Ostkamp 2007-01-14 21:38:30 UTC
Please describe the problem:
I used Quicken 98 to export my accounts to QIF files, one account per file.
The underlying OS was Windows 98, German edition. My accounts, categories and transactions include German national language characters (so-called 'Umlauts').
The files were then transferred to a Linux system (I was running Win98 under VMware under Linux).

I could verify that the QIF files included correct iso-8859-1 encoded national language characters. On import, these characters were apparently removed and replaced by nothing, leading to strange results.

Steps to reproduce:
1. Use Quicken 98 with national language characters
2. Export to QIF
3. Import QIF to gnucash


Actual results:
National language characters are removed and not converted.

Expected results:
National language characters should be converted to what gnucash needs.

Does this happen every time?
Yes.

Other information:
I could work around this problem by applying the 'recode' tool to convert any chars in the QIF files from iso8859-15 to utf8 encoding.
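
For readers without the 'recode' tool, the same conversion can be sketched in a few lines of Python. This is only an illustration of the workaround, not part of GnuCash: the function name `recode_to_utf8` is hypothetical, and the source encoding is assumed to be ISO 8859-15 as in the command mentioned above.

```python
def recode_to_utf8(data: bytes, source_encoding: str = "iso8859-15") -> bytes:
    """Re-encode raw file content from a known 8-bit encoding to UTF-8.

    The source encoding must be supplied by the user, because a QIF file
    does not declare its own encoding.
    """
    # Decode with the assumed source encoding, then encode as UTF-8.
    return data.decode(source_encoding).encode("utf-8")


def recode_file(src_path: str, dst_path: str, source_encoding: str = "iso8859-15") -> None:
    """Read src_path as raw bytes and write a UTF-8 copy to dst_path."""
    with open(src_path, "rb") as src:
        converted = recode_to_utf8(src.read(), source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Writing to a separate output file keeps the original export intact in case the assumed encoding turns out to be wrong.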
Comment 1 Christian Stimming 2007-01-15 12:55:25 UTC
I guess this is similar to bug#394783 (there's an example file) and/or bug#389740 (also with example file).

The problem with file encoding of QIF files is that the file doesn't explicitly say which encoding the data is in. That's a big problem (well, just one more) in the QIF data file format. GnuCash could only make some guesses, and any guess would have to be confirmed by the user as well (very similar to the 1.8.x -> 1.9.x data file upgrade wizard of gnucash). That's a lot of program code that would need to be written. I'm afraid this won't happen in the near future :-( so you will probably have to live with the manual "recode" workaround for some time to come.
Comment 2 Guido Ostkamp 2007-01-15 18:00:40 UTC
I understand the problem of autodetection, but I wonder what has changed because the gnucash Version 1.8.9 that originally came with my OpenSuSE 10.0 distribution can import the original QIF files out of the box without any conversion, leaving the German national characters intact.

So something must have been changed in GnuCash 2.x SVN.
Comment 3 Christian Stimming 2007-01-15 20:50:25 UTC
gnucash 1.8.x was based on gtk1, which didn't deal with any explicit encoding but instead implicitly used the locale's encoding. So you were lucky enough that the file happened to be in your locale's encoding (latin1), and in this case everything worked. That was the exception; the rule was that even more things would be broken. Now with gtk2 everything internally is utf-8, and we (usually) make sure that all text from external sources is converted correctly, except if we don't know the external encoding to start with.
Comment 4 Christian Stimming 2007-04-14 15:14:09 UTC
*** Bug 344841 has been marked as a duplicate of this bug. ***
Comment 5 Christian Stimming 2007-05-06 20:10:56 UTC
*** Bug 436423 has been marked as a duplicate of this bug. ***
Comment 6 Christian Stimming 2007-11-09 15:07:12 UTC
*** Bug 495216 has been marked as a duplicate of this bug. ***
Comment 7 Charles Day 2008-04-02 18:18:49 UTC
I have committed a fix for this as r17063. Requesting backport for 2.2.

When any QIF file content is found that is not encoded in UTF-8,
the importer now first attempts to convert it to UTF-8 according to the locale.
If this fails, the offending bytes will be removed from the string as usual.
In addition, the user will now be informed of either of these actions via a
pop-up warning in the GUI. (And if converting by locale was used, a "before" and "after" is shown.) Each occurrence will also be logged as a warning.
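
The fallback order described above can be sketched roughly as follows. This is a Python illustration only, not the committed code from r17063: the function name `to_utf8_text`, the returned labels, and the logger name are all hypothetical, and the GUI pop-up is left out.

```python
import locale
import logging

log = logging.getLogger("qif-import-sketch")  # hypothetical logger name


def to_utf8_text(raw: bytes, locale_encoding=None):
    """Return (text, how), where how is 'utf-8', 'locale' or 'stripped'."""
    # Step 1: content that is already valid UTF-8 is used unchanged.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 2: try to convert according to the locale's encoding.
    encoding = locale_encoding or locale.getpreferredencoding(False)
    try:
        text = raw.decode(encoding)
        log.warning("converted from %s: before=%r after=%r", encoding, raw, text)
        return text, "locale"
    except (UnicodeDecodeError, LookupError):
        pass

    # Step 3: last resort, drop the bytes that are not valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    log.warning("removed invalid bytes: before=%r after=%r", raw, text)
    return text, "stripped"
```

For example, `to_utf8_text(b"\xa5100", "iso-8859-1")` takes the locale path, while the same bytes with an `"ascii"` locale encoding fall through to the stripping step. Note that a single-byte encoding such as ISO 8859-1 accepts every byte value, so with such a locale the second step cannot fail, even if the result is not what the file's author intended.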
Comment 8 Charles Day 2008-04-02 18:22:42 UTC
Created attachment 108485 [details]
Sample QIF encoded in ISO 8859-1

This file contains an ISO 8859-1 symbol for Yen which is not valid in UTF-8. In my locale, the fix causes the QIF importer to successfully translate this into a UTF-8 encoded Yen symbol.
Comment 9 Charles Day 2008-04-02 18:31:49 UTC
Created attachment 108487 [details]
Sample QIF file contain an invalid character

This QIF file contains a character (0x1b) which is offensive to GnuCash and is not helped by converting using my locale. This demonstrates the ability of the fix to continue to remove offending content byte-by-byte when necessary.
Comment 10 Charles Day 2008-04-02 21:41:36 UTC
Re comment 7, a correction: The user won't see the "before" and "after" in the GUI. But this information can be seen in the log (/tmp/gnucash.trace.* on my system).
Comment 11 Andreas Köhler 2008-04-20 19:57:17 UTC
Applied to branches/2.2 as r17104 for GnuCash 2.2.5.
Thanks a lot!
Comment 12 John Ralls 2018-06-29 21:22:39 UTC
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=396665. Please update any external references or bookmarks.