Bug 796586 - QIF import incorrectly converts Unicode characters from UTF-8 encoded file
Status: RESOLVED FIXED
Product: GnuCash
Classification: Other
Component: Import - QIF
Version: 3.1
Hardware: Other Windows
Importance: Normal normal
Target Milestone: future
Assigned To: gnucash-import-maint
QA Contact: gnucash-import-maint
Depends on:
Blocks:
Reported: 2018-06-14 14:32 UTC by mrzreat
Modified: 2018-06-30 00:11 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
QIF file containing Czech characters in transaction descriptions (1.21 KB, application/octet-stream)
2018-06-14 14:32 UTC, mrzreat

Description mrzreat 2018-06-14 14:32:08 UTC
Created attachment 372686
QIF file containing Czech characters in transaction descriptions

After updating from GnuCash 2.6.x to 3.1, the following problem appeared:
I have a UTF-8 encoded QIF file whose transaction descriptions contain Czech characters (e.g. 'ř'; a sample file is attached). The import proceeds without any error messages; however, the accented characters become corrupted. For example, 'Připsaný bonusový úrok' becomes 'PÅ™ipsaný bonusový úrok', which looks like an ANSI code page conversion.
Transactions imported before the update look fine, so it does not appear to be a database or display issue.

My OS is Windows 10. Please let me know if any additional details are needed.
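As a minimal Guile sketch of the suspected conversion, assuming the importer decodes the file's UTF-8 bytes with the Windows-1252 ANSI code page, the code below encodes the first word of the example with `string->bytevector` from Guile's `(ice-9 iconv)` module and decodes it both ways. The variable names are illustrative, and the sample is spelled with `\u` escapes so the source stays ASCII. The 'ř' comes out as 'Å™', exactly as in the report.

```scheme
(use-modules (ice-9 iconv))   ; string->bytevector, bytevector->string

;; "Připsaný" from the report, spelled with \u escapes.
(define sample "P\u0159ipsan\u00fd")

;; The bytes a UTF-8 encoded QIF file stores for the sample:
;; r-caron is C5 99 and y-acute is C3 BD.
(define utf8-bytes (string->bytevector sample "UTF-8"))

;; Decoding those bytes correctly versus with the ANSI code page.
(define as-utf8 (bytevector->string utf8-bytes "UTF-8"))
(define as-1252 (bytevector->string utf8-bytes "CP1252"))

(display (string=? as-utf8 sample)) (newline)   ; #t
;; Each two-byte sequence becomes two CP1252 characters, so the
;; 8-character word grows to 10 characters and starts with
;; "P" followed by U+00C5 and U+2122.
(display (string-length as-1252)) (newline)     ; 10
(display (string=? (substring as-1252 0 3) "P\u00c5\u2122")) (newline)   ; #t
```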
Comment 1 mrzreat 2018-06-14 14:57:54 UTC
The result is the same even when the GnuCash interface language is changed to Czech.
Comment 2 John Ralls 2018-06-14 17:13:13 UTC
Can you edit gnucash/import-export/qif-imp/qif-file.scm, line 132, to read
  (with-input-from-file #:guess-encoding #t path
and see if that fixes it?
Comment 3 John Ralls 2018-06-14 17:15:33 UTC
Sorry, that should be c:\Program Files (x86)\gnucash\share\gnucash\scm\qif-import\qif-file.scm. You'll need to run the editor with administrator privileges.
Comment 4 mrzreat 2018-06-15 09:00:31 UTC
Unfortunately, the fix you suggested produces an error message during import (something like "there is a bug during import").
However, after digging into some specs, I got it working by changing line 519 instead to:
  (line-loop))))) #:encoding "UTF-8")

On the other hand, (line-loop))))) #:guess-encoding #t) has no effect, for some reason.
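In Guile, `with-input-from-file` takes the file name and a thunk as its required arguments, and keyword options such as `#:encoding` and `#:guess-encoding` come after them. Placing `#:guess-encoding #t` in front of the path, as in the line 132 suggestion, would put the arguments in the wrong positions, which would explain the import error. The sketch below shows the working call shape; `collect-lines`, `read-qif-lines`, and the scratch file name are hypothetical stand-ins, not GnuCash's actual QIF reader.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical stand-in for the QIF reader's line loop: read every
;; line from the current input port and return them as a list.
(define (collect-lines)
  (let line-loop ((acc '()))
    (let ((line (read-line)))
      (if (eof-object? line)
          (reverse acc)
          (line-loop (cons line acc))))))

;; Required arguments first (file name, thunk), then the keyword option.
(define (read-qif-lines path)
  (with-input-from-file path collect-lines #:encoding "UTF-8"))

;; Demonstration with a hypothetical scratch file written as UTF-8.
(define demo-file "qif-encoding-demo.qif")
(with-output-to-file demo-file
  (lambda ()
    (display "!Type:Bank\n")
    (display "MP\u0159ipsan\u00fd bonusov\u00fd \u00farok\n")
    (display "^\n"))
  #:encoding "UTF-8")

(define lines (read-qif-lines demo-file))
(delete-file demo-file)
(display (length lines)) (newline)   ; 3
```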
Comment 5 John Ralls 2018-06-15 13:58:01 UTC
Ah, right, after the thunk. Sorry.
#:encoding "UTF-8" was my fallback, but I'm concerned that other sources may use other encodings. Does your QIF file have a BOM (byte order mark)?
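A UTF-8 BOM is the three bytes EF BB BF at the very start of a file. As a sketch of how one might check for it from Guile, the hypothetical helper `utf8-bom?` below opens the file in binary mode and compares its first three bytes. It is not part of GnuCash, and the scratch file names are made up for the demonstration.

```scheme
(use-modules ((rnrs bytevectors) #:select (bytevector? bytevector=?))
             ((rnrs io ports) #:select (get-bytevector-n put-bytevector)))

;; The UTF-8 encoding of U+FEFF, the byte order mark.
(define utf8-bom #vu8(#xEF #xBB #xBF))

;; Hypothetical helper: #t if the file at PATH begins with a UTF-8 BOM.
(define (utf8-bom? path)
  (let* ((port (open-file path "rb"))        ; "b" = binary mode
         (head (get-bytevector-n port 3)))   ; EOF object if file is empty
    (close-port port)
    (and (bytevector? head)
         (bytevector=? head utf8-bom))))

;; Write raw bytes so the test files are exactly what we intend.
(define (write-bytes path bv)
  (let ((port (open-file path "wb")))
    (put-bytevector port bv)
    (close-port port)))

(write-bytes "with-bom.qif" #vu8(#xEF #xBB #xBF #x21))         ; BOM, then "!"
(write-bytes "without-bom.qif" #vu8(#x21 #x54 #x79 #x70 #x65)) ; "!Type"
(define bom-results
  (list (utf8-bom? "with-bom.qif") (utf8-bom? "without-bom.qif")))
(delete-file "with-bom.qif")
(delete-file "without-bom.qif")
(display bom-results) (newline)   ; (#t #f)
```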
Comment 6 mrzreat 2018-06-15 14:56:07 UTC
I tried it both with and without a BOM: unless the encoding is specified explicitly (#:encoding "UTF-8"), the result is the same.

I don't know exactly how smart the encoding detection algorithm is, but maybe the problem is that only some of the transactions contain non-English characters. Still, that should not matter when a BOM is present...

I also experimented with converting the file to ANSI (Windows-1250), and that garbles some characters while others come through fine. Well, ANSI is a mess anyway, and I can hardly imagine anyone sane using it for the Czech alphabet these days.
Comment 7 John Ralls 2018-06-15 15:20:21 UTC
Guile's default encoding is CP1252, so it probably used that to decode your CP1250 file, which misinterprets some characters.

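Unfortunately, I can easily see an ignorant programmer who has only ever worked with Microsoft products using an ANSI code page instead of UTF-8. Open up a CMD shell and type chcp: it reports the console (OEM) code page, typically 852 on Czech Windows, while the ANSI code page used by non-Unicode programs defaults to 1250 on Czech systems unless you've changed the setting.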
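To illustrate why converting to Windows-1250 garbles only some characters when the file is read as CP1252, the sketch below decodes the same two bytes with both code pages using `bytevector->string` from Guile's `(ice-9 iconv)`. The byte values come from the code page tables: in Windows-1250, 0xF8 is 'ř' and 0xFD is 'ý'; in Windows-1252, 0xF8 is 'ø' while 0xFD is still 'ý'. The variable names are illustrative.

```scheme
(use-modules (ice-9 iconv))   ; bytevector->string

;; r-caron followed by y-acute, as stored in a Windows-1250 (Czech ANSI) file.
(define cp1250-bytes #vu8(#xF8 #xFD))

(define decoded-1250 (bytevector->string cp1250-bytes "CP1250"))
(define decoded-1252 (bytevector->string cp1250-bytes "CP1252"))

;; 0xFD means y-acute in both code pages; 0xF8 is r-caron in CP1250
;; but o-slash in CP1252.
(display (string=? decoded-1250 "\u0159\u00fd")) (newline)   ; #t
(display (string=? decoded-1252 "\u00f8\u00fd")) (newline)   ; #t
```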
Comment 8 John Ralls 2018-06-16 17:46:24 UTC
Further study of the thunk shows that the non-UTF-8 case is already handled in the line-handling code, so I've pushed the #:encoding "UTF-8" fix. It will be in tomorrow's nightly build and in GnuCash 3.2.

Thanks!
Comment 9 John Ralls 2018-06-30 00:11:51 UTC
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=796586. Please update any external references or bookmarks.