GNOME Bugzilla – Bug 396665
QIF Import: Detect and convert non-ascii character encoding of QIF files
Last modified: 2018-06-29 21:22:39 UTC
Please describe the problem: I used Quicken 98 to export my accounts to QIF files, one account per file. The underlying OS was Windows 98, German edition. My accounts, categories and transactions include German national language characters (so-called 'Umlauts'). The files were then transferred to a Linux system (I was running Win98 under VMware under Linux). I could verify that the QIF files included correct iso-8859-1 encoded national language characters. On import, these characters were apparently removed and replaced by nothing, leading to strange results.
Steps to reproduce:
1. Use Quicken 98 with national language characters
2. Export to QIF
3. Import QIF to gnucash
Actual results: National language characters are removed and not converted.
Expected results: National language characters should be converted to what gnucash needs.
Does this happen every time? Yes.
Other information: I could work around this problem by applying the 'recode' tool to convert any chars in the QIF files from iso8859-15 to utf8 encoding.
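For readers without the 'recode' tool, the workaround above can be approximated in a few lines of Python. This is only a sketch: the function name `recode_file` and its parameters are made up for illustration, and the source encoding is assumed to be known (here iso-8859-15, as in the report).

```python
from pathlib import Path


def recode_file(src, dst, source_encoding="iso-8859-15"):
    """Re-encode a QIF file from a known legacy encoding to UTF-8.

    Hypothetical helper, not part of GnuCash. The file is handled as raw
    bytes so that line endings are left exactly as Quicken wrote them.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on bad input
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling `recode_file("export.qif", "export-utf8.qif")` writes a UTF-8 copy and leaves the original untouched; the file names are placeholders.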
I guess this is similar to bug#394783 (there's an example file) and/or bug#389740 (also with example file). The problem with file encoding of QIF files is that the file doesn't explicitly say which encoding the data is in. That's a big problem (well, just one more) in the QIF data file format. GnuCash could only make some guesses, and any guess would have to be confirmed by the user as well (very similar to the 1.8.x -> 1.9.x data file upgrade wizard of gnucash). That's a lot of program code that would need to be written. I'm afraid this won't happen in the near future :-( so you will probably have to live with the manual "recode" workaround for some time to come.
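To illustrate why a guess would need user confirmation, here is a rough Python sketch that tries a handful of candidate encodings and returns every reading that decodes without error. The function name and the candidate list are assumptions for illustration only; nothing like this exists in GnuCash as far as this report shows.

```python
def candidate_decodings(raw, encodings=("utf-8", "iso-8859-1", "iso-8859-15", "cp1252")):
    """Return {encoding name: decoded text} for every candidate that decodes cleanly.

    Hypothetical helper: several candidates usually succeed, so the caller
    would still have to show the results to the user and ask which is right.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes
    return results
```

The same byte can decode successfully under several single-byte encodings yet mean different characters (0xA4 is the currency sign in iso-8859-1 but the euro sign in iso-8859-15), which is why decoding success alone cannot settle the question.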
I understand the problem of autodetection, but I wonder what has changed because the gnucash Version 1.8.9 that originally came with my OpenSuSE 10.0 distribution can import the original QIF files out of the box without any conversion, leaving the German national characters intact. So something must have been changed in GnuCash 2.x SVN.
gnucash 1.8.x was based on gtk1, which didn't deal with any explicit encoding but instead implicitly used the locale's encoding. So you were lucky enough that the file happened to be in your locale's encoding (latin1), and in this case everything worked. That was the exception - the rule was that even more things would be broken. Now with gtk2 everything internally is utf-8 and we (usually) make sure that all text from external sources is converted correctly -- except if we don't know the external encoding to start with.
*** Bug 344841 has been marked as a duplicate of this bug. ***
*** Bug 436423 has been marked as a duplicate of this bug. ***
*** Bug 495216 has been marked as a duplicate of this bug. ***
I have committed a fix for this as r17063. Requesting backport for 2.2. When any QIF file content is found that is not encoded in UTF-8, the importer now first attempts to convert it to UTF-8 according to the locale. If this fails, the offending bytes will be removed from the string as usual. In addition, the user will now be informed of either of these actions via a pop-up warning in the GUI. (And if converting by locale was used, a "before" and "after" is shown.) Each occurrence will also be logged as a warning.
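The fallback order described above can be sketched roughly as follows. This is not the code committed in r17063 (that code is not shown here); the function names, the returned action labels, and the rule for which characters count as unacceptable are all assumptions made for illustration.

```python
import locale


def is_acceptable(text):
    # Assumption: besides invalid UTF-8, control characters other than
    # tab, newline and carriage return are also treated as offending.
    return all(ch in "\t\n\r" or ord(ch) >= 0x20 for ch in text)


def to_utf8_text(raw, fallback_encoding=None):
    """Return (text, action); action is 'none', 'converted' or 'stripped'.

    Hypothetical sketch of the three steps: accept valid UTF-8, otherwise
    convert from the locale's encoding, otherwise drop offending bytes.
    """
    if fallback_encoding is None:
        fallback_encoding = locale.getpreferredencoding(False)

    # Step 1: content that is already acceptable UTF-8 is left alone.
    try:
        text = raw.decode("utf-8")
        if is_acceptable(text):
            return text, "none"
    except UnicodeDecodeError:
        pass

    # Step 2: try converting according to the locale's encoding.
    try:
        text = raw.decode(fallback_encoding)
        if is_acceptable(text):
            return text, "converted"
    except (UnicodeDecodeError, LookupError):
        pass

    # Step 3: last resort, remove whatever cannot be kept.
    text = raw.decode("utf-8", errors="ignore")
    return "".join(ch for ch in text if is_acceptable(ch)), "stripped"
```

A caller could use the returned action to decide whether to warn the user and write a log entry, as the fix does for the "converted" and "stripped" cases.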
Created attachment 108485 [details] Sample QIF encoded in ISO 8859-1 This file contains an ISO 8859-1 symbol for Yen which is not valid in UTF-8. In my locale, the fix causes the QIF importer to successfully translate this into a UTF-8 encoded Yen symbol.
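As a small worked example of the Yen case, assuming the symbol is stored as the single ISO 8859-1 byte 0xA5 (the attachment itself is not reproduced here):

```python
raw = b"\xa5"  # Yen sign in ISO 8859-1

# A lone 0xA5 is a UTF-8 continuation byte, so it cannot stand on its own.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

yen = raw.decode("iso-8859-1")   # U+00A5 YEN SIGN
utf8_bytes = yen.encode("utf-8")  # two-byte UTF-8 form of the same character
```

After conversion the character occupies two bytes (0xC2 0xA5) instead of one.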
Created attachment 108487 [details] Sample QIF file containing an invalid character This QIF file contains a character (0x1b) which is offensive to GnuCash and is not helped by converting using my locale. This demonstrates the ability of the fix to continue to remove offending content byte-by-byte when necessary.
Re comment 7, a correction: The user won't see the "before" and "after" in the GUI. But this information can be seen in the log (/tmp/gnucash.trace.* on my system).
Applied to branches/2.2 as r17104 for GnuCash 2.2.5. Thanks a lot!
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=396665. Please update any external references or bookmarks.