GNOME Bugzilla – Bug 700785
gvfs metadata corrupted by concurrent access
Last modified: 2013-06-14 09:38:01 UTC
Created attachment 244960 [details]
gdb.txt

Unsure whether the flaw here is in glib or gvfs; let's start here. See attached stack:
Blowing away ~/.local/share/gvfs-metadata makes the symptom go away, and the test passes.

Ok, to get a clean environment: log out, and from a VT as root, shut down gdm, ensure gvfsd-metadata is dead, then rm -rf ~walters/.local/share/gvfs-metadata, then start gdm again.

Running glib/file.test twice in a row triggers the corruption.
Ok, it's worth noting that glib/file.test creates a *lot* of threads.

From https://bugzilla.gnome.org/show_bug.cgi?id=637095#c9:

"One thing I find disturbing is no protection from concurrent access to the same journal file from multiple clients. Clients have the journal file partially mmaped - the data area but not the header, containing the number of records etc., which is updated separately."

Really =(

This trivially reproduces for me, by the way, in jhbuild on Fedora 19, *assuming* that I build gvfs in the jhbuild root.
Created attachment 245002 [details] [review]
0001-metadata-Downgrade-journal-short-read-to-debug.patch
Oh, is the g_warning() too intrusive for you? It was really meant as a warning, since we're breaking journal iteration and effectively losing data written after the corrupted entry (we don't know how much to seek; heuristics won't help here either, since there's no magic header to look for).

(In reply to comment #2)
> Ok, it's worth noting that glib/file.test creates a *lot* of threads. From
> https://bugzilla.gnome.org/show_bug.cgi?id=637095#c9
>
> "One thing I find disturbing is no protection from concurrent access to the
> same journal file from multiple clients. Clients have the journal file
> partially mmaped - the data area but not the header, containing the number of
> records etc., which is updated separately."

I should note that this was meant in the context of multiple systems sharing one physical storage; a "client" here means a machine, an isolated operating system. In a single environment, write access is controlled by a single daemon coordinating writes and database rotation. Unless you have multiple instances of gvfsd-metadata running (e.g. a chroot or a separate D-Bus session sharing the same homedir), you shouldn't run into the problem.

(In reply to comment #1)
> Running glib/file.test twice in a row triggers the corruption.

What was your setup exactly?

> This trivially reproduces for me, by the way, in jhbuild on Fedora 19,
> *assuming* that I build gvfs in the jhbuild root.

Not sure how jhbuild works; mixing various gvfs versions shouldn't matter in this case.
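For reference, a rough sketch of the shape of the check the patch touches - this is not the actual gvfs source; MIN_ENTRY_SIZE and the variable names are illustrative assumptions. The short-entry check still aborts iteration (later entries are lost), it just logs via g_debug() instead of g_warning():

```c
#include <glib.h>

#define MIN_ENTRY_SIZE 8  /* illustrative only, not the real minimum */

/* Backward iteration reads each entry's recorded size; if the size is
 * implausible we can only stop, losing whatever was written after that
 * point. The patch merely lowers the severity of the message. */
static gboolean
journal_entry_size_ok (guint32 entry_size, gsize remaining)
{
  if (entry_size < MIN_ENTRY_SIZE || (gsize) entry_size > remaining)
    {
      g_debug ("meta_journal_iterate: found short sized entry, "
               "possible journal corruption");
      return FALSE;  /* caller stops iterating */
    }
  return TRUE;
}
```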
(In reply to comment #4)
> In a single environment, write access is controlled by a single daemon
> coordinating writes and database rotation. Unless you have multiple instances
> of gvfsd-metadata running (e.g. a chroot or a separate D-Bus session sharing
> the same homedir), you shouldn't run into the problem.

But multiple processes are attempting to read from the journal while it is being concurrently written, and it's possible to hit that g_warning().

> (In reply to comment #1)
> > Running glib/file.test twice in a row triggers the corruption.
>
> What was your setup exactly?

This was inside the GnomeOSTree continuous integration/testing system:
https://live.gnome.org/OSTree/GnomeOSTree
which runs the glib installed tests (such as glib/file from glib/gio/tests/file.c). However, it reproduced inside jhbuild for me as well.

> Not sure how jhbuild works; mixing various gvfs versions shouldn't matter in
> this case.

Right. So... for now, I guess, let's accept the possibility of "short journal iteration". But we should really figure out a better scheme for concurrent read access.

http://sqlite.org/atomiccommit.html is pretty in-depth on how SQLite does multi-process read/write. The tradeoffs gvfs metadata is going for were somewhat different (mainly the emphasis on performance), but it's still probably worth using as a reference.
Comment on attachment 245002 [details] [review]
0001-metadata-Downgrade-journal-short-read-to-debug.patch

Leaving this bug open for further discussion.
(In reply to comment #5)
> But multiple processes are attempting to read from the journal while it is
> being concurrently written, and it's possible to hit that g_warning().

I see your point now. Looking at the code, it should be safe to have concurrent read access from multiple clients while the daemon is writing new data. The whole journal file is mmap'ed, including the header and data. When reading, clients iterate from the end of the journal back to the beginning, having retrieved the journal length just before that. When writing to the journal, the data are written (appended) first and the journal header is changed as the last thing. Since clients retrieve the journal length at the beginning, they don't see the newly written entry. When rotating, a new journal file is created and the old (current) one is unlinked; no more writes, so no corruption is possible from that side.

Perhaps it's caused by unflushed blocks at various locations in the file? We mmap with PROT_READ | PROT_WRITE and MAP_SHARED on an fd that is opened using just the standard O_RDONLY/O_RDWR flags. Perhaps O_DIRECT could help to stay uncached? Or an explicit flush on the fd? Performance will suffer either way.

> So... for now, I guess, let's accept the possibility of "short journal
> iteration". But we should really figure out a better scheme for concurrent
> read access.
>
> http://sqlite.org/atomiccommit.html is pretty in-depth on how SQLite does
> multi-process read/write. The tradeoffs gvfs metadata is going for were
> somewhat different (mainly the emphasis on performance), but it's still
> probably worth using as a reference.

Thanks, this is useful. I'll leave it to somebody else taking gvfs over :-)
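A minimal sketch of the read side described above, under an assumed simplified layout - this is not the real gvfs journal format; JournalHeader, journal_len, and the trailing per-entry size field are illustrative. The reader maps the whole file, samples the journal length from the header once, and then walks entries backwards:

```c
#include <glib.h>
#include <string.h>

typedef struct {
  guint32 magic;
  guint32 journal_len;   /* bytes of valid entry data after the header */
} JournalHeader;

static void
iterate_journal (const guchar *map, gsize map_size)
{
  const JournalHeader *header = (const JournalHeader *) map;
  const guchar *data = map + sizeof (JournalHeader);
  gsize pos;

  if (map_size < sizeof (JournalHeader))
    return;

  /* Sample the length once; entries appended after this point are simply
   * not seen by this iteration. */
  pos = header->journal_len;
  if (sizeof (JournalHeader) + pos > map_size)
    return;   /* header disagrees with the file size */

  while (pos >= sizeof (guint32))
    {
      guint32 entry_size;

      /* Each entry is assumed to end with its own size, which is what
       * allows walking from the end back to the beginning. */
      memcpy (&entry_size, data + pos - sizeof (guint32), sizeof (guint32));
      if (entry_size < sizeof (guint32) || entry_size > pos)
        break;   /* short/implausible entry: stop iterating */

      /* ... process the entry occupying data[pos - entry_size .. pos) ... */
      pos -= entry_size;
    }
}
```

As long as the entry bytes really are visible before the header length is, this reader never sees a partially written entry; that ordering assumption is exactly what the next comment questions.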
I don't see how the readers being "concurrent" should in any way affect this. Even a single reader will be concurrent with the single writer, but the ordering should be such that the reader doesn't see the update until everything is fully written. It seems like it's getting the new journal length before all the data is written, which looks like a memory barrier issue or something like that...
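To illustrate the ordering concern, here is a sketch of the write side under the same assumed names (journal_len, data) - not the actual gvfs writer code. With MAP_SHARED mappings, both processes see the same pages, so the writer needs a barrier between appending the entry bytes and publishing the new length; otherwise the compiler or CPU may make the header update visible first and a reader can observe the new length while the entry is still incomplete:

```c
#include <glib.h>
#include <string.h>

/* journal_len points into the mmap'ed header, data into the mmap'ed data
 * area; both names are illustrative, not the real gvfs layout. */
static void
append_entry (guint32      *journal_len,
              guchar       *data,
              const guchar *entry,
              guint32       entry_size)
{
  guint32 old_len = *journal_len;

  /* 1. Append the entry bytes to the data area. */
  memcpy (data + old_len, entry, entry_size);

  /* 2. Full barrier: the entry bytes must be visible to other processes
   *    mapping the same pages before the new length is. */
  __sync_synchronize ();

  /* 3. Only now publish the new length in the header. */
  *journal_len = old_len + entry_size;
}
```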
In fact, I can reproduce this just doing:

$ gvfs-set-attribute file.txt metadata::foo test5
$ gvfs-set-attribute file.txt metadata::foo test6
$ gvfs-set-attribute file.txt metadata::foo test7

(gvfs-set-attribute:13983): GVFS-WARNING **: meta_journal_iterate: found short sized entry, possible journal corruption

Seems like we have some serious issue with metadata writing.
Fixed this, but accidentally attached the patch to:
https://bugzilla.gnome.org/show_bug.cgi?id=637095#c27