Bug 653445 - notes are lost if system crashes
Status: RESOLVED FIXED
Product: tomboy
Classification: Applications
Component: General
Version: 1.6.x
Hardware: Other Linux
Importance: Normal blocker
Target Milestone: 1.14.x
Assigned To: Alex Tereschenko
QA Contact: Tomboy Maintainers
Duplicates: 634685 679172
Depends on:
Blocks:
Reported: 2011-06-26 19:10 UTC by josh.outerspace
Modified: 2013-07-05 15:45 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
possible solution for a fix (1.11 KB, patch), 2011-07-01 18:58 UTC, Thomas Danzl, status: rejected
Proposed solution v1 (950 bytes, patch), 2013-06-18 20:26 UTC, Alex Tereschenko, status: none
Proposed solution v2 (948 bytes, patch), 2013-07-04 17:56 UTC, Alex Tereschenko, status: committed

Description josh.outerspace 2011-06-26 19:10:39 UTC
I came back from a system crash to find a lot of notes missing, maybe the ones that were open during the crash. Is there a way to recover lost notes? If not, I am devastated.
Comment 1 Robert Nordan 2011-06-26 19:48:25 UTC
Note corruption is known to happen when systems crash or have the plug pulled with notes open. You can try looking in the note directory (see http://live.gnome.org/Tomboy/Directories to find where it is on your system) and going through the note files manually to find the missing ones. If you know your way around XML you can try to fix them, or you can just salvage the text content and create new notes from that.

Now, while losing notes is a big pain, losing notes because your system crashes in some way that is unrelated to Tomboy is not a blocker bug in Tomboy.
Comment 2 Sandy Armstrong 2011-06-26 23:16:09 UTC
This has been happening since people have started using ext4.  We have been unable to figure out the cause, or reproduce it.  Any help would be welcome.
Comment 3 lapseofreason0 2011-06-29 08:39:58 UTC
The bug seems to be triggered when applications don't use fsync() correctly and use replace-via-rename.

A possible fix is to mount the file system with auto_da_alloc, but it would be better to fix it in tomboy directly.

I will try the mount option, but it's hard to know for sure whether the problem is fixed or not until more data losses occur.

Here is the related excerpt from the mount man page which gives some indications how applications can fix this:

       auto_da_alloc|noauto_da_alloc
              Many  broken applications don't use fsync() when
              replacing existing files via patterns such as

              fd =  open("foo.new")/write(fd,..)/close(fd)/  rename("foo.new",
              "foo")

              or worse yet

              fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).

              If  auto_da_alloc  is enabled, ext4 will detect the replace-via-
              rename and replace-via-truncate  patterns  and  force  that  any
              delayed  allocation  blocks  are allocated such that at the next
              journal commit, in  the  default  data=ordered  mode,  the  data
              blocks  of  the  new file are forced to disk before the rename()
              operation is committed.  This provides roughly the same level  of
              guarantees  as  ext3,  and avoids the "zero-length" problem that
              can happen when a system crashes before the  delayed  allocation
              blocks are forced to disk.
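
For illustration only, here is a minimal sketch of the replace-via-rename pattern from the excerpt above with the missing fsync() added. It is written in Python rather than Tomboy's C#, and the function name `atomic_write` and the `.tmp` suffix are hypothetical choices, not anything taken from Tomboy.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` using write, fsync, then rename."""
    tmp_path = path + ".tmp"  # hypothetical temp-file name
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the kernel to write the data blocks to disk
    # The data is on disk before the rename, so a crash should leave either
    # the old file or the complete new file, not a zero-length one.
    os.replace(tmp_path, path)
    if os.name == "posix":
        # Also fsync the containing directory so the rename itself is durable.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The directory fsync is an extra precaution on POSIX systems; the step the man page asks for is the fsync() on the new file before rename().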


The bug also seems to be related to this:

http://www.h-online.com/open/news/item/Ext4-data-loss-explanations-and-workarounds-740671.html

Hope this helps!
Comment 4 Sandy Armstrong 2011-06-29 13:14:07 UTC
Great research, that does indeed seem to be the problem, when you look at our note file saving code:

http://git.gnome.org/browse/tomboy/tree/Tomboy/Note.cs?id=d1dc142830f58f77706264b03fef4c822421d362#n1394

Apparently we need to change the pattern we use there.
Comment 5 lapseofreason0 2011-06-29 14:24:43 UTC
Running fsync() on tmp_file after it is created should do the trick, or at least it should be run before the backup file is deleted so that if there is a crash before all the data is committed at least there is a backup of the previous data.

Moreover tomboy could check for backup files on startup and do some recovery, but this is optional and shouldn't be necessary if the fsync() is done before the original file is moved to the backup file.

C# is not my strong side though, so hopefully someone else can come up with the details (or even find a different way to accomplish this).
Comment 6 Thomas Danzl 2011-06-30 20:33:00 UTC
There is a code duplicate on the TaskArchiver AddIn where the same thing happens:
http://git.gnome.org/browse/tomboy/tree/Tomboy/Addins/Tasks/TaskArchiver.cs?id=d1dc142830f58f77706264b03fef4c822421d362#n135

I am new to tomboy code but isn't the "backup_path" file the file that should exist in this situation at startup and may be used to restore broken notes?
Comment 7 Sandy Armstrong 2011-06-30 21:13:51 UTC
Hi Thomas.

We don't actually build the Tasks add-in anymore, we should remove it from our git tree as it's been obsolete for years.

Let me explain the code a bit:

1) The new note data is written to tmp_file (a *.tmp file)
2) The original file is write_file, and is backed up to backup_path (a *~ file)
3) tmp_file is then moved to write_file, and backup_path is deleted

Now, none of these steps completely fail.  Steps 1 and 2 work beautifully.  Step 3 fails in the worst possible way.  tmp_file is moved kinda successfully, so the *.tmp file no longer exists, but the write to the new filename (write_file) does not work properly.  Even worse, backup_path is successfully deleted, so the previous version of the note is also gone.  The user is left with no *.tmp file, no *~ file, and a 0 byte note file that is impossible to restore without a backup.
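
To make the ordering concrete, the three steps can be sketched as follows, with the fsync() suggested in comment 5 placed before anything is moved or deleted. This is a Python approximation under stated assumptions, not the code in Note.cs: the names `save_note`, `tmp_file`, `write_file` and `backup_path` mirror the description above, and the exact suffixes are illustrative.

```python
import os


def save_note(write_file: str, xml_text: str) -> None:
    """Three-step save: write tmp, back up the original, move tmp into place."""
    tmp_file = write_file + ".tmp"
    backup_path = write_file + "~"

    # 1) Write the new note data to tmp_file and force it to disk.
    with open(tmp_file, "w", encoding="utf-8") as f:
        f.write(xml_text)
        f.flush()
        os.fsync(f.fileno())

    # 2) Back up the original file, if there is one.
    if os.path.exists(write_file):
        os.replace(write_file, backup_path)

    # 3) Move tmp_file to write_file. Its data was already synced in step 1,
    #    so the backup is only deleted once a complete copy exists on disk.
    os.replace(tmp_file, write_file)
    if os.path.exists(backup_path):
        os.remove(backup_path)
```

Without the fsync() in step 1, the rename and the delete can reach the disk before the data does, which matches the 0 byte note file described above.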

<rant>By the way, everybody should have automated backups set up.  CrashPlan (for example) ranges from free to very cheap depending on how you use it, and works on Linux, Windows, and Mac.  Tomboy is not the only app out there that can accidentally mangle important files.</rant>

But yeah, this is a serious bug that *we* need to fix.
Comment 8 lapseofreason0 2011-07-01 08:32:08 UTC
I just had another crash and found out that the mount flag doesn't actually solve the issue. I suspect this is because it is simply not smart enough to detect the C# replace-via-rename implementation, so this is one more reason to fix it in tomboy.

<rant>I agree about backups, but it still sucks if you lose the last changes to your notes now and only notice in 2 weeks. Moreover, I don't think you should expect the average user to understand enough about ext4 and tomboy to check ~/.local/share/tomboy/ for 0-length files after every power loss and restore the notes if necessary. Since in the case of tomboy the overhead of fsync() at every write shouldn't be too big, I think not losing data should take precedence over minor performance increases.</rant>
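
As a rough aid for the manual check described above, a short Python script can list zero-length note files. The `*.note` extension and the helper name `find_empty_notes` are assumptions made for this sketch; adjust the directory to wherever your notes live.

```python
from pathlib import Path


def find_empty_notes(note_dir):
    """Return the zero-length *.note files in note_dir, sorted by name."""
    return sorted(
        p for p in Path(note_dir).glob("*.note")
        if p.is_file() and p.stat().st_size == 0
    )
```

For example, `find_empty_notes(Path.home() / ".local/share/tomboy")` would list candidates to restore from a backup.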

So I think it's good that you are going to fix it.
Comment 9 josh.outerspace 2011-07-01 08:33:29 UTC
What is stopping your backup system from backing up the 0-length file?
Comment 10 Thomas Danzl 2011-07-01 11:52:29 UTC
Some additional Information on this topic can be found here:
http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html
Comment 11 Sandy Armstrong 2011-07-01 12:32:20 UTC
@lapse absolutely agree, and thanks for checking on the mount flag.  A code-based approach is better anyway for our users.

@josh: this is why almost all backup tools keep some amount of version history for backed up files.
Comment 12 Thomas Danzl 2011-07-01 18:58:09 UTC
Created attachment 191115
possible solution for a fix
Comment 13 Thomas Danzl 2011-07-01 19:00:52 UTC
I do not know if it really works, because I could not find the code where mono implements the FileOptions.WriteThrough property, but from the documentation it looks promising.
Or is the real solution only to recover backup files at startup?

(I am sorry to write this in a separate comment, but I forgot to add the information with the patch attachment...)
Comment 14 Aaron D Borden 2011-07-06 03:32:12 UTC
Thanks for the patch Thomas. I dug through the mono code too but I don't think FileOptions.WriteThrough will work here (I'll ask on mono-list to be sure). Hopefully I have a good understanding of what is going on with the bug. Hear me out and let me know if something doesn't sound correct.

With regard to fsync, I'm coming to the same conclusion as Sandy in this post, that a clean mono solution isn't available until .NET 4:
http://www.mail-archive.com/mono-list@lists.ximian.com/msg33621.html
(Although it kind of blows me away that this isn't built into mono, it seems like a missing feature.)

So, to be backwards compatible, we could use Mono.Unix.UnixStream and call Mono.Unix.Native.Syscall.fsync on the file descriptor. I think it's a pain to do, especially because it works fine as is on windows and mac.

Alternatively, we can try to avoid the whole fsync thing. The argument is that fsync is really a perf hit because it forces the cache flush, and the whole point of the cache is to optimize disk writes. Delayed allocation is after all a feature, not a bug, of ext4. This is mentioned in the above links from lapseofreason0@gmail.com and Thomas (thanks again guys). It's also mentioned in this blog post: http://joeshaw.org/2006/08/22/416/

You can argue that the perf hit is not significant but consider that we do many writes (as long as notes are changing, we're saving every 4 seconds) and all file writes are done synchronously on a single thread.

So, instead of calling fsync on save, we can just not delete the backup file. That way, if there is a crash, the user can recover their file. And as an enhancement, we can add code to recover the backup on note load time if there is an error loading the note. I think this solution makes a lot of sense, plus it's a really simple fix (comment one line).

The downside is that we'd be doubling note storage space in the worst case, but again, we could add code to cleanup some backups which we don't need. For example at startup, once all notes are loaded, it's safe to delete all the backups. We can also set some kind of timeout to remove old backups while tomboy is running.
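
A possible shape for the recovery and cleanup idea, sketched in Python under stated assumptions: the backup is the note path with `~` appended, a note counts as loadable if it parses as XML, and the function names are hypothetical rather than Tomboy APIs.

```python
import os
import xml.etree.ElementTree as ET


def load_note(note_path):
    """Parse a note, falling back to its '~' backup if the note is unreadable.

    Returns (root_element, path_that_was_loaded).
    """
    backup_path = note_path + "~"
    for candidate in (note_path, backup_path):
        try:
            return ET.parse(candidate).getroot(), candidate
        except (OSError, ET.ParseError):
            continue  # missing, zero-length, or malformed: try the next one
    raise ValueError("no readable copy of " + note_path)


def remove_backup_if_note_loads(note_path):
    """Startup cleanup: delete the backup only when the note itself parses."""
    root, loaded_from = load_note(note_path)
    backup_path = note_path + "~"
    if loaded_from == note_path and os.path.exists(backup_path):
        os.remove(backup_path)
    return root
```

A zero-length note fails to parse, so the loader falls through to the backup, and the backup is only removed once the note itself loads.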

What do you guys think?
Comment 15 lapseofreason0 2011-07-06 11:12:09 UTC
Hi Aaron,

it's too bad that there seems to be no easy and consistent way of doing this in mono, at least not in earlier versions.

I kind of have some doubts though whether fsync can really be avoided by simply keeping the backup file. Couldn't it happen that you change a file and before the actual file gets written out to disk you change it again, so both the current file and the backup file are only cached in memory? I guess in that case tomboy would still have to at least run fsync on the backup file.

Personally I'm in favor of running fsync at every write, or at least before deleting something, because I think it's a small price to pay in the case of text-only notes compared to losing a user's work.
Comment 16 Benjamin Podszun 2011-07-06 11:27:10 UTC
I have to agree here.

We're writing small files and we don't write _that_ often, imo. We don't write anything if there are no changes. If there are changes, we delay writing the note, until either there's a short amount (these 4 seconds sound about right) of inactivity or Tomboy's shut down.

So - if you don't work with Tomboy actively, an fsync after every write wouldn't hurt. If you work with Tomboy, I don't think that we'd do it excessively often.
Comment 17 Aaron D Borden 2011-07-06 17:00:41 UTC
@lapseofreason0 Yeah, I've been thinking about this more. If ext4 is still not allocating the file we write after 4 seconds, then the backup (which comes from that file) also might not be allocated immediately. We'd have to add some logic to make sure the backup we're writing is allocated, but I doubt there is a deterministic way to do this.

@Benjamin where you would feel it is if something triggered several notes to be saved at once: a sync with multiple note updates, an addin that removes broken links, possibly a settings change (WikiWords comes to mind, but I don't think it retroactively looks for wiki words). But of course, we could add the fsync and look at other ways to optimize note save.

I'm going to dig around for more information, it just seems messed up that ext4 says "if you're not using fsync, you're doing it wrong even though it defeats the purpose of the feature anyway". I understand optimizing the disk writes, but at what point are you losing data integrity? I would think after a few seconds, if data is not written to disk then you've got a bug. Thanks for reading my rant :)
Comment 18 Aaron D Borden 2011-12-18 23:46:39 UTC
Comment on attachment 191115
possible solution for a fix

Yeah, WriteThrough is still not supported.
Comment 19 Jared Jennings 2012-04-18 14:07:45 UTC
What if we make sure that the note was successfully written to the file system before we delete the backup file? We could even be as aggressive as running a checksum on the Note before deleting the backup.
Comment 20 Jared Jennings 2012-05-30 04:20:16 UTC
*** Bug 634685 has been marked as a duplicate of this bug. ***
Comment 21 Jared Jennings 2012-06-30 14:03:56 UTC
*** Bug 679172 has been marked as a duplicate of this bug. ***
Comment 22 Aaron D Borden 2012-10-20 04:58:55 UTC
(In reply to comment #19)
> What if we make sure that the note was successfully written to the file system
> before we delete the backup file? We could even be as aggressive as running a
> checksum on the Note before deleting the backup.

This won't work because the file is buffered in memory, so as long as we are using standard APIs to read the file, it will APPEAR as if the file was successfully written despite it not having yet been written to disk.

Cross platform complicates things, since lower level APIs might be available but would be different between *nix, osx, and windows.
Comment 23 Jared Jennings 2012-10-22 03:51:55 UTC
What if we delete the backups on shutdown? We could check whether backups exist on start-up too (just in case a crash happened that left a backup behind).

The scenario would be that the backup is made, then the move happens, but we don't delete the backup until much later, during shutdown. By then it should have been written to the filesystem and flushed.
Comment 24 Alex Tereschenko 2013-06-18 20:26:23 UTC
Created attachment 247212
Proposed solution v1

Here's my take at it - based on the fsync approach (which I like the most out of all those discussed in this thread).

This is similar to what Thomas Danzl has proposed, but it uses a different mechanism. WriteThrough still seems not to be implemented (or I couldn't find it), but they *have* implemented a real Flush operation for FileStream, which does an actual fsync() on Linux (checked in the Mono sources and tested the sample code with strace). It works only for .NET 4.0 and up, but since that's our default version anyway, I think we're good here.

I couldn't come up with any more meaningful file rotation scheme than we have right now, so went for a simple one. Your thoughts are welcome.
Comment 25 Jared Jennings 2013-07-04 01:47:13 UTC
(In reply to comment #24)
I'm trying to apply this patch, but it's reporting that it's corrupt. Do you mind reposting or such?
Comment 26 Alex Tereschenko 2013-07-04 17:56:20 UTC
Created attachment 248405
Proposed solution v2

Oops, sorry for that. This one has been checked to apply cleanly to current master.
Comment 27 Jared Jennings 2013-07-05 15:37:04 UTC
Review of attachment 248405:

committed b962005173d8a4aa3d6a6845312abd476ac32b82
Comment 28 Jared Jennings 2013-07-05 15:45:43 UTC
Marking as resolved. It can be reopened if needed.