After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 593479 - Account file being deleted because of erroneous checking for lock file
Account file being deleted because of erroneous checking for lock file
Status: RESOLVED FIXED
Product: GnuCash
Classification: Other
Component: Backend - XML
2.2.x
Other Linux
: Normal major
: ---
Assigned To: Geert Janssens
Andreas Köhler
: 556022 646869 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2009-08-29 13:13 UTC by Sam Morris
Modified: 2018-06-29 22:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gnucash-unlink.strace (328.04 KB, application/x-gzip)
2010-01-03 13:35 UTC, Sam Morris
  Details
Skip over the main account file (837 bytes, patch)
2010-10-03 23:54 UTC, Tim Retout
committed Details | Review
Rewritten version of gnc_xml_be_remove_old_files (6.82 KB, patch)
2010-10-14 13:45 UTC, Geert Janssens
needs-work Details | Review
Rewritten version of gnc_xml_be_remove_old_files - with tighter checks (8.30 KB, patch)
2010-10-19 18:33 UTC, Geert Janssens
committed Details | Review

Description Sam Morris 2009-08-29 13:13:23 UTC
[ forwarded from http://bugs.debian.org/525549 ]

I have twice recently launched gnucash to have it tell me that my data file is
missing... and sure enough, if I go to the directory it is supposed to be in,
(~/Documents/gnucash) it is no longer there...

$ ll
total 696K
- -rw-r--r-- 1 sam sam  827 2009-04-10 13:32 gnucash.xac.20090410132757.log
- -rw-r--r-- 1 sam sam 213K 2009-04-05 19:41 gnucash.xac.20090410133213.xac
- -rw-r--r-- 1 sam sam  170 2009-04-10 14:01 gnucash.xac.20090410133214.log
- -rw-r--r-- 1 sam sam 213K 2009-04-10 13:32 gnucash.xac.20090410140139.xac
- -rw-r--r-- 1 sam sam  170 2009-04-10 14:01 gnucash.xac.20090410140140.log
- -rw-r--r-- 1 sam sam  20K 2009-04-24 23:55 gnucash.xac.20090424235315.log
- -rw-r--r-- 1 sam sam 213K 2009-04-10 14:01 gnucash.xac.20090424235542.xac
- -rw-r--r-- 1 sam sam  170 2009-04-24 23:55 gnucash.xac.20090424235544.log

Fortunately I have been able to recover:

$ cp gnucash.xac.20090424235542.xac gnucash.xac
$ gnucash gnucash.xac

Weirdly, this file only contained transactions up to 2009-04-02. However, I
then replayed the log file gnucash.xac.20090424235315.log and was back
up-to-date.

What is strange is that the log entries I needed to replay were in that file,
rather than the more-recent gnucash.xac.20090424235544.log file, which was
empty apart from the file headers.

I appreciate that this will be hard to track down... I really have no idea how
to go about it. I can only say that my system hasn't been behaving unusually
in any other regard, wasn't shut down uncleanly, etc., so I currently suspect
the data/log file rollover code in Gnucash itself rather than anything else.
Comment 1 Christian Stimming 2009-09-16 09:48:16 UTC
Similar bugs: #555173

Which file system is this being saved on: ext3 or something else?
Comment 2 Sam Morris 2009-09-16 11:22:48 UTC
ext3. I've seen it occur three times since April, so it doesn't happen very often. I've taken to always running Gnucash via strace in case that helps in the event that it crops up again.
Comment 3 Sam Morris 2009-11-08 17:50:09 UTC
I just caused this to happen again by reducing the log/backup file retention setting from 30 days to 1, saving, and then exiting Gnucash. So perhaps it is related to the log file rotation code?
Comment 4 Sam Morris 2010-01-03 13:35:29 UTC
Created attachment 150731 [details]
gnucash-unlink.strace

I reproduced this again by opening my accounts file and then saving and exiting Gnucash (in the unlikely event that it matters, specifically by pressing Ctrl+S then Ctrl+Q). It cleaned up a weeks worth of old backup files in the process, and left me without my primary data file (gnucash.xac) but with a single backup file (gnucash.xac.20100103131839.xac) with contents identical to the original gnucash.xac file.

I had strace tracing the gnucash process while I did this, and here is the output.
Comment 5 Sam Morris 2010-03-01 16:28:04 UTC
This has happened to me a few more times over the last few months, even on a third freshly-installed system, on different hardware, and using a different filesystem (ext4). In all cases the deletion seems to be related to the expiry of old backup/log files.

It definitely looks like an application bug from here.
Comment 6 Sam Morris 2010-06-30 23:15:45 UTC
Just reading bug 556022 and it is possible this is related to it. My system has 4 (virtual) CPUs and my account file is 3.5 MiB uncompressed.

Anyway, I'm going disable file compression for a while and see if it makes a difference.
Comment 7 Sam Morris 2010-08-01 15:35:11 UTC
Disabling file compression does not prevent the occasional deletion of the account file, so this does not look like bug 556022.
Comment 8 Tim Retout 2010-10-03 23:54:13 UTC
Created attachment 171653 [details] [review]
Skip over the main account file

strptime is passed (name + pathlen + 1) as the string to search.  However, when looking at the main account file, strlen(name) == pathlen, so strptime is looking at the point just past the end of name.

Sometimes this will be parseable by strptime, and this leads to the account file being unlinked.

In Sam's case, the base filename ends with ".xac", so the problem is also not caught by the checks around the unlink call.
Comment 9 Tim Retout 2010-10-04 00:02:57 UTC
*** Bug 556022 has been marked as a duplicate of this bug. ***
Comment 10 Christian Stimming 2010-10-04 09:37:55 UTC
Comment on attachment 171653 [details] [review]
Skip over the main account file

*Thanks a lot*! Now this is finally being tracked down. The code in SVN trunk contains an additional check (g_strcmp0(name, be->fullpath)) beforehand, but I think the patch should be applied there as well.
Comment 11 Christian Stimming 2010-10-04 09:39:34 UTC
Note: This will be fixed in 2.3.16 and hence 2.4.0, but there won't be any new 2.2.x release. The attached patch can nevertheless be applied by the distros when packaging gnucash binaries.
Comment 12 Geert Janssens 2010-10-04 16:15:46 UTC
(In reply to comment #10)
> (From update of attachment 171653 [details] [review])
> *Thanks a lot*! Now this is finally being tracked down. The code in SVN trunk
> contains an additional check (g_strcmp0(name, be->fullpath)) beforehand, but I
> think the patch should be applied there as well.

I'm glad as well this bug is finally tracked down. I made the additional check in svn trunk some time ago based on bug 623801 and it never occurred to me this is about the same bug.

@Christian: why do you think the patch in this bug should still be added ? Which scenario do you see that isn't caught by my original addition ?

I do agree distros can use the patch if they are still shipping 2.2.9 to deal with potential data loss there.
Comment 13 Tim Retout 2010-10-04 23:50:58 UTC
Personally I can't see a problem right now with a g_strcmp0 as described. although I haven't reviewed that code.  (I do suspect that my patch has one less strcmp, so is more "efficient".)

It would be nice to refactor this code at some point to reduce the amount of pointer arithmetic going on - that is the root cause of this bug.
Comment 14 Christian Stimming 2010-10-05 06:38:53 UTC
(In reply to comment #12)
> @Christian: why do you think the patch in this bug should still be added ?
> Which scenario do you see that isn't caught by my original addition ?

Indeed the error scenario should be caught by your addition. Nevertheless I think the pointer arithmetic ("name + len" currently) which was the source of this problem should be guarded with the assumed invariants as closely as possible. The patch here added one such invariant ("(len > pathlen)" in the patch), and I think this part should be applied to trunk as well. Because of your addition, this invariant shouldn't be violated now anymore, but as this wording already implies, we better add that check explicitly. That's why I propose to add the patch into trunk as well. (And BTW this bug again confirms pointer arithmetic are a nuisance, even though I would have thought "I'm clever enough to use pointer arithmetic without bugs".)
Comment 15 Christian Stimming 2010-10-05 18:07:58 UTC
Comment on attachment 171653 [details] [review]
Skip over the main account file

trunk r19638 (with some modifications) - thanks a lot!
Comment 16 Geert Janssens 2010-10-05 19:57:18 UTC
(In reply to comment #14)
> Indeed the error scenario should be caught by your addition. Nevertheless I
> think the pointer arithmetic ("name + len" currently) which was the source of
> this problem should be guarded with the assumed invariants as closely as
> possible. The patch here added one such invariant ("(len > pathlen)" in the
> patch), and I think this part should be applied to trunk as well. Because of
> your addition, this invariant shouldn't be violated now anymore, but as this
> wording already implies, we better add that check explicitly. That's why I
> propose to add the patch into trunk as well. (And BTW this bug again confirms
> pointer arithmetic are a nuisance, even though I would have thought "I'm clever
> enough to use pointer arithmetic without bugs".)

That was the only reason I could think of as well.

Would it make sense to replace the pointer logic altogether by string logic in the future ? I guess the required tests could all be done with the g_str* family of routines without any pointer arithmetic. Or would such change have too much negative impact on performance ?

The alternative would be of course to simply go with the pointer arithmetics only and remove my now redundant test. But seeing how tricky pointer arithmetic can be, I'm not too in favour of that...
Comment 17 Christian Stimming 2010-10-05 20:31:40 UTC
(In reply to comment #16)
> Would it make sense to replace the pointer logic altogether by string logic in
> the future ? I guess the required tests could all be done with the g_str*
> family of routines without any pointer arithmetic.

Yes, that would be even better.

> Or would such change have
> too much negative impact on performance ?

No. The test is completely non-critical compared to the actual loading of a file.

> The alternative would be of course to simply go with the pointer arithmetics
> only and remove my now redundant test. But seeing how tricky pointer arithmetic
> can be, I'm not too in favour of that...

Yes. I would not remove your test even though it appears redundant *at this point in time*. But I can clearly see how anyone might change the other parts, overlooking any dangerous condition which might then still be caught by your test. The error is just too critical to rely on "only" one test if we know several sane tests instead. So just leave it in.
Comment 18 Geert Janssens 2010-10-11 11:28:32 UTC
I am currently looking at removing all of the pointer arithmetics from this function. For the most part, this is fairly trivial to do.

While looking at the last part, (where the backup file's date is being compared to see if it's old enough to be deleted), I wondered why we are not using the backup file's actual last modification date instead of the time stamp string in the file name.

I do understand that the file modification date can change, for example as the result of a copy or when the user actually modifies the file (which does defeat part of the purpose of a backup file). But still it makes more sense to me to look at the real modified time. If a user has made changes, chances are he wants to keep these changes a little longer. At some point the file will be removed also if that's the retention preference, just a little later than when comparing the time stamp in the file name.

The benefit of this change is that another risky pointer manipulation can be avoided.

What do you think ?
Comment 19 Geert Janssens 2010-10-14 13:45:41 UTC
Created attachment 172351 [details] [review]
Rewritten version of gnc_xml_be_remove_old_files

Since this is a very critical area of GnuCash I will attach my proposed changes for peer review.

The attached patch is a rewrite of the function that deletes outdated backup and log files. The new code uses glib's string functions instead of pointer arithmetic for all tests.

Additionally it will use the file's modification date to determine if a file should be deleted or nor, instead of parsing the date string in the file name. This should be more accurate, as I noticed gnucash sometimes still modified logfiles after the file was created. The old logic would delete file too soon in such circumstances.
Comment 20 Christian Stimming 2010-10-16 19:43:07 UTC
Comment on attachment 172351 [details] [review]
Rewritten version of gnc_xml_be_remove_old_files

The patch is fine, except for one issue: IMHO the code is still too aggressive in removing files. Say, the user copied the data file to a second file, intending to keep a manual copy. Say, from "myfile.gnucash" to "myfile-yesterday.gnucash". The current code would remove that, if I understand correctly. This should not happen, and instead only those files which were created by gnucash are removed. That is, there should be an additional check before line 835 which is the comment "The file is a backup or log file" - exactly this statement must really really hold. There should be additional checks that only a file with the prefix and suffix and only gnucash-generated additional components are to be deleted. As gnucash knows how we generate the file names of backup files, we can use that knowledge here.
Comment 21 Christian Stimming 2010-10-16 19:43:46 UTC
(Removing milestone because the issue is fixed now, but we're not closing because we are still discussing further improvements.)
Comment 22 Geert Janssens 2010-10-19 18:33:53 UTC
Created attachment 172751 [details] [review]
Rewritten version of gnc_xml_be_remove_old_files - with tighter checks

If the base file is myfile.gnucash, then myfile-yesterday.gnucash will not be deleted by the code in patch 172351:
It fails the test "is this file related to the current data file" because
myfile-yesterday.gnucash
doesn't begin with (or: isn't prefixed with)
myfile.gnucash

But if the user had called his personal backup copy
myfile.gnucash-yesterday.gnucash, it would indeed have been deleted. It's an unusual name to choose, but a valid one, so you are right, we should avoid this if possible.

I have attached a new patch that will include an additional check:
the datestamp is always exactly 14 digits. I have added a regular expression check if what comes after the original filename is exactly
- 1 dot
- 14 digits (numerical only)
- any of .gnucash, .xac, .log
and nothing more.

I don't think we can narrow the check down even further.

Note that regular expressions have the potential to reduce all the string tests into two regular expressions (one for the .LNK files and one for the backup and log files), but I worry that there may be issues with characters that regex considers special in the filename/filepath. That's why I limit the regex usage to the date stamp only which follows a very well defined scheme.
Comment 23 Christian Stimming 2010-10-22 19:12:29 UTC
Comment on attachment 172751 [details] [review]
Rewritten version of gnc_xml_be_remove_old_files - with tighter checks

Looks very good. +1 from me!
Comment 24 Geert Janssens 2010-10-23 10:02:58 UTC
Comment on attachment 172751 [details] [review]
Rewritten version of gnc_xml_be_remove_old_files - with tighter checks

In r19694.
Comment 25 Christian Stimming 2011-04-06 14:04:08 UTC
*** Bug 646869 has been marked as a duplicate of this bug. ***
Comment 26 John Ralls 2018-06-29 22:27:41 UTC
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=593479. Please update any external references or bookmarks.