Bug 346535 - QIF import with control character in account description creates bad datafile that cannot be reopened
Status: VERIFIED FIXED
Product: GnuCash
Classification: Other
Component: Import - QIF
Version: 1.9.x
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: Derek Atkins
Derek Atkins
Depends on:
Blocks:
 
 
Reported: 2006-07-04 12:04 UTC by Jeff
Modified: 2018-06-29 21:08 UTC
See Also:
GNOME target: ---
GNOME version: 2.11/2.12


Attachments
QIF to reproduce bug (118 bytes, application/octet-stream)
2006-07-04 15:46 UTC, Jeff
Proposed Patch (6.98 KB, patch)
2006-07-04 21:31 UTC, Derek Atkins
committed

Description Jeff 2006-07-04 12:04:30 UTC
I exported a large (1.38 MB) QIF file from Quicken for Mac 2004, and it imported into GnuCash without errors. However, after saving and reopening the file in GnuCash, all account balances showed zero and some accounts were missing. After much digging, I found that one account in the QIF file had a stray 0x05 control character as the first character of its "D" (description) field. The GnuCash QIF importer reported no problem, but it wrote this 0x05 character unchanged into the account description in the data file:

<act:description><05>desc</act:description>

Removing this character allowed the file to open normally. Even though the problem originated in the Quicken export, which placed an invalid character in the QIF file, GnuCash neither shows an error message nor cleans up the string automatically. To the user it looks as if all the data has simply been lost, with no hint of what happened.
Comment 1 Christian Stimming 2006-07-04 15:25:30 UTC
This has been reported before as bug #344170, which was reportedly fixed in 1.9.8 but present in all earlier versions. Did you really see this problem in 1.9.8? If so, we're (still) in trouble. In that case, could you attach a (very small) example QIF file that shows the problem? Thanks.

Also related: bug #344841
Comment 2 Jeff 2006-07-04 15:46:23 UTC
Created attachment 68356
QIF to reproduce bug

Note it contains a non-printing character (0x05) in the account description field that is key to reproducing the bug.
Comment 3 Derek Atkins 2006-07-04 16:07:09 UTC
What's the actual revision number you're using? Run: gnucash --version

Looking at the QIF, it does have a non-printing control character, as you said. But it IS a valid UTF-8 character, which is why it's let through. The questions are why the XML parser barfs on it, and whether there's something we can do to strip out those types of characters.
Comment 4 Jeff 2006-07-04 16:29:19 UTC
I built it from the provided Gentoo ebuild for 1.9.x:

$ gnucash --version
GnuCash 1.9.8
Built 2006-07-01 from r14384

I'm familiar with XML but not an expert; however, I don't believe 0x05 is an allowed character in XML. See http://www.w3.org/TR/REC-xml/#NT-Char, which defines the allowed characters as:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Below 0x20, this allows only tab (#x9), line feed (#xA), and carriage return (#xD).
Comment 5 Josh Sled 2006-07-04 17:18:13 UTC
(targeting)
Comment 6 Derek Atkins 2006-07-04 21:31:21 UTC
Created attachment 68365
Proposed Patch

I'm wondering if this patch will fix the problem? It should change the way we validate UTF-8 so that the invalid control characters are ignored. The only characters considered valid by g_utf8_validate() but not by the (new) gnc_utf8_validate() are characters < 0x20 other than 0x09, 0x0A, and 0x0D, so this should be "good enough". I haven't tested it yet (beyond compiling), but I have to go.
Comment 7 Derek Atkins 2006-07-05 16:05:53 UTC
I tested this patch and it seems to solve the problem. Committed as r14466.
Comment 8 John Ralls 2018-06-29 21:08:59 UTC
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=346535. Please update any external references or bookmarks.