GNOME Bugzilla – Bug 593479
Account file being deleted because of erroneous checking for lock file
Last modified: 2018-06-29 22:27:41 UTC
[ forwarded from http://bugs.debian.org/525549 ] I have twice recently launched gnucash to have it tell me that my data file is missing... and sure enough, if I go to the directory it is supposed to be in, (~/Documents/gnucash) it is no longer there... $ ll total 696K - -rw-r--r-- 1 sam sam 827 2009-04-10 13:32 gnucash.xac.20090410132757.log - -rw-r--r-- 1 sam sam 213K 2009-04-05 19:41 gnucash.xac.20090410133213.xac - -rw-r--r-- 1 sam sam 170 2009-04-10 14:01 gnucash.xac.20090410133214.log - -rw-r--r-- 1 sam sam 213K 2009-04-10 13:32 gnucash.xac.20090410140139.xac - -rw-r--r-- 1 sam sam 170 2009-04-10 14:01 gnucash.xac.20090410140140.log - -rw-r--r-- 1 sam sam 20K 2009-04-24 23:55 gnucash.xac.20090424235315.log - -rw-r--r-- 1 sam sam 213K 2009-04-10 14:01 gnucash.xac.20090424235542.xac - -rw-r--r-- 1 sam sam 170 2009-04-24 23:55 gnucash.xac.20090424235544.log Fortunately I have been able to recover: $ cp gnucash.xac.20090424235542.xac gnucash.xac $ gnucash gnucash.xac Weirdly, this file only contained transactions up to 2009-04-02. However, I then replayed the log file gnucash.xac.20090424235315.log and was back up-to-date. What is strange is that the log entries I needed to replay were in that file, rather than the more-recent gnucash.xac.20090424235544.log file, which was empty apart from the file headers. I appreciate that this will be hard to track down... I really have no idea how to go about it. I can only say that my system hasn't been behaving unusually in any other regard, wasn't shut down uncleanly, etc., so I currently suspect the data/log file rollover code in Gnucash itself rather than anything else.
Similar bugs: #555173 Which file system is this being saved on: ext3 or something else?
ext3. I've seen it occur three times since April, so it doesn't happen very often. I've taken to always running Gnucash via strace in case that helps in the event that it crops up again.
I just caused this to happen again by reducing the log/backup file retention setting from 30 days to 1, saving, and then exiting Gnucash. So perhaps it is related to the log file rotation code?
Created attachment 150731 [details] gnucash-unlink.strace I reproduced this again by opening my accounts file and then saving and exiting Gnucash (in the unlikely event that it matters, specifically by pressing Ctrl+S then Ctrl+Q). It cleaned up a weeks worth of old backup files in the process, and left me without my primary data file (gnucash.xac) but with a single backup file (gnucash.xac.20100103131839.xac) with contents identical to the original gnucash.xac file. I had strace tracing the gnucash process while I did this, and here is the output.
This has happened to me a few more times over the last few months, even on a third freshly-installed system, on different hardware, and using a different filesystem (ext4). In all cases the deletion seems to be related to the expiry of old backup/log files. It definitely looks like an application bug from here.
Just reading bug 556022 and it is possible this is related to it. My system has 4 (virtual) CPUs and my account file is 3.5 MiB uncompressed. Anyway, I'm going disable file compression for a while and see if it makes a difference.
Disabling file compression does not prevent the occasional deletion of the account file, so this does not look like bug 556022.
Created attachment 171653 [details] [review] Skip over the main account file strptime is passed (name + pathlen + 1) as the string to search. However, when looking at the main account file, strlen(name) == pathlen, so strptime is looking at the point just past the end of name. Sometimes this will be parseable by strptime, and this leads to the account file being unlinked. In Sam's case, the base filename ends with ".xac", so the problem is also not caught by the checks around the unlink call.
*** Bug 556022 has been marked as a duplicate of this bug. ***
Comment on attachment 171653 [details] [review] Skip over the main account file *Thanks a lot*! Now this is finally being tracked down. The code in SVN trunk contains an additional check (g_strcmp0(name, be->fullpath)) beforehand, but I think the patch should be applied there as well.
Note: This will be fixed in 2.3.16 and hence 2.4.0, but there won't be any new 2.2.x release. The attached patch can nevertheless be applied by the distros when packaging gnucash binaries.
(In reply to comment #10) > (From update of attachment 171653 [details] [review]) > *Thanks a lot*! Now this is finally being tracked down. The code in SVN trunk > contains an additional check (g_strcmp0(name, be->fullpath)) beforehand, but I > think the patch should be applied there as well. I'm glad as well this bug is finally tracked down. I made the additional check in svn trunk some time ago based on bug 623801 and it never occurred to me this is about the same bug. @Christian: why do you think the patch in this bug should still be added ? Which scenario do you see that isn't caught by my original addition ? I do agree distros can use the patch if they are still shipping 2.2.9 to deal with potential data loss there.
Personally I can't see a problem right now with a g_strcmp0 as described. although I haven't reviewed that code. (I do suspect that my patch has one less strcmp, so is more "efficient".) It would be nice to refactor this code at some point to reduce the amount of pointer arithmetic going on - that is the root cause of this bug.
(In reply to comment #12) > @Christian: why do you think the patch in this bug should still be added ? > Which scenario do you see that isn't caught by my original addition ? Indeed the error scenario should be caught by your addition. Nevertheless I think the pointer arithmetic ("name + len" currently) which was the source of this problem should be guarded with the assumed invariants as closely as possible. The patch here added one such invariant ("(len > pathlen)" in the patch), and I think this part should be applied to trunk as well. Because of your addition, this invariant shouldn't be violated now anymore, but as this wording already implies, we better add that check explicitly. That's why I propose to add the patch into trunk as well. (And BTW this bug again confirms pointer arithmetic are a nuisance, even though I would have thought "I'm clever enough to use pointer arithmetic without bugs".)
Comment on attachment 171653 [details] [review] Skip over the main account file trunk r19638 (with some modifications) - thanks a lot!
(In reply to comment #14) > Indeed the error scenario should be caught by your addition. Nevertheless I > think the pointer arithmetic ("name + len" currently) which was the source of > this problem should be guarded with the assumed invariants as closely as > possible. The patch here added one such invariant ("(len > pathlen)" in the > patch), and I think this part should be applied to trunk as well. Because of > your addition, this invariant shouldn't be violated now anymore, but as this > wording already implies, we better add that check explicitly. That's why I > propose to add the patch into trunk as well. (And BTW this bug again confirms > pointer arithmetic are a nuisance, even though I would have thought "I'm clever > enough to use pointer arithmetic without bugs".) That was the only reason I could think of as well. Would it make sense to replace the pointer logic altogether by string logic in the future ? I guess the required tests could all be done with the g_str* family of routines without any pointer arithmetic. Or would such change have too much negative impact on performance ? The alternative would be of course to simply go with the pointer arithmetics only and remove my now redundant test. But seeing how tricky pointer arithmetic can be, I'm not too in favour of that...
(In reply to comment #16) > Would it make sense to replace the pointer logic altogether by string logic in > the future ? I guess the required tests could all be done with the g_str* > family of routines without any pointer arithmetic. Yes, that would be even better. > Or would such change have > too much negative impact on performance ? No. The test is completely non-critical compared to the actual loading of a file. > The alternative would be of course to simply go with the pointer arithmetics > only and remove my now redundant test. But seeing how tricky pointer arithmetic > can be, I'm not too in favour of that... Yes. I would not remove your test even though it appears redundant *at this point in time*. But I can clearly see how anyone might change the other parts, overlooking any dangerous condition which might then still be caught by your test. The error is just too critical to rely on "only" one test if we know several sane tests instead. So just leave it in.
I am currently looking at removing all of the pointer arithmetics from this function. For the most part, this is fairly trivial to do. While looking at the last part, (where the backup file's date is being compared to see if it's old enough to be deleted), I wondered why we are not using the backup file's actual last modification date instead of the time stamp string in the file name. I do understand that the file modification date can change, for example as the result of a copy or when the user actually modifies the file (which does defeat part of the purpose of a backup file). But still it makes more sense to me to look at the real modified time. If a user has made changes, chances are he wants to keep these changes a little longer. At some point the file will be removed also if that's the retention preference, just a little later than when comparing the time stamp in the file name. The benefit of this change is that another risky pointer manipulation can be avoided. What do you think ?
Created attachment 172351 [details] [review] Rewritten version of gnc_xml_be_remove_old_files Since this is a very critical area of GnuCash I will attach my proposed changes for peer review. The attached patch is a rewrite of the function that deletes outdated backup and log files. The new code uses glib's string functions instead of pointer arithmetic for all tests. Additionally it will use the file's modification date to determine if a file should be deleted or nor, instead of parsing the date string in the file name. This should be more accurate, as I noticed gnucash sometimes still modified logfiles after the file was created. The old logic would delete file too soon in such circumstances.
Comment on attachment 172351 [details] [review] Rewritten version of gnc_xml_be_remove_old_files The patch is fine, except for one issue: IMHO the code is still too aggressive in removing files. Say, the user copied the data file to a second file, intending to keep a manual copy. Say, from "myfile.gnucash" to "myfile-yesterday.gnucash". The current code would remove that, if I understand correctly. This should not happen, and instead only those files which were created by gnucash are removed. That is, there should be an additional check before line 835 which is the comment "The file is a backup or log file" - exactly this statement must really really hold. There should be additional checks that only a file with the prefix and suffix and only gnucash-generated additional components are to be deleted. As gnucash knows how we generate the file names of backup files, we can use that knowledge here.
(Removing milestone because the issue is fixed now, but we're not closing because we are still discussing further improvements.)
Created attachment 172751 [details] [review] Rewritten version of gnc_xml_be_remove_old_files - with tighter checks If the base file is myfile.gnucash, then myfile-yesterday.gnucash will not be deleted by the code in patch 172351: It fails the test "is this file related to the current data file" because myfile-yesterday.gnucash doesn't begin with (or: isn't prefixed with) myfile.gnucash But if the user had called his personal backup copy myfile.gnucash-yesterday.gnucash, it would indeed have been deleted. It's an unusual name to choose, but a valid one, so you are right, we should avoid this if possible. I have attached a new patch that will include an additional check: the datestamp is always exactly 14 digits. I have added a regular expression check if what comes after the original filename is exactly - 1 dot - 14 digits (numerical only) - any of .gnucash, .xac, .log and nothing more. I don't think we can narrow the check down even further. Note that regular expressions have the potential to reduce all the string tests into two regular expressions (one for the .LNK files and one for the backup and log files), but I worry that there may be issues with characters that regex considers special in the filename/filepath. That's why I limit the regex usage to the date stamp only which follows a very well defined scheme.
Comment on attachment 172751 [details] [review] Rewritten version of gnc_xml_be_remove_old_files - with tighter checks Looks very good. +1 from me!
Comment on attachment 172751 [details] [review] Rewritten version of gnc_xml_be_remove_old_files - with tighter checks In r19694.
*** Bug 646869 has been marked as a duplicate of this bug. ***
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=593479. Please update any external references or bookmarks.