GNOME Bugzilla – Bug 598561
gvfsd-metadata crashes on start
Last modified: 2018-09-21 16:57:44 UTC
Crashes about 30 seconds after startup. gvfs-1.4.0-3.fc12.x86_64 Program terminated with signal 11, Segmentation fault.
+ Trace 218336
And running under valgrind: ==3488== Invalid read of size 4 ==3488== at 0x405736: copy_tree_to_builder (metatree.c:223) ==3488== by 0x40616F: meta_tree_flush_locked (metatree.c:2344) ==3488== by 0x406397: meta_tree_flush (metatree.c:2365) ==3488== by 0x403AAC: writeout_timeout (meta-daemon.c:61) ==3488== by 0x34CC43993A: g_timeout_dispatch (gmain.c:3396) ==3488== by 0x34CC43922D: g_main_context_dispatch (gmain.c:1960) ==3488== by 0x34CC43CC17: g_main_context_iterate (gmain.c:2591) ==3488== by 0x34CC43D064: g_main_loop_run (gmain.c:2799) ==3488== by 0x4049DC: main (meta-daemon.c:783) ==3488== Address 0x8 is not stack'd, malloc'd or (recently) free'd ==3488== ==3488== ==3488== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==3488== Access not within mapped region at address 0x8 ==3488== at 0x405736: copy_tree_to_builder (metatree.c:223) ==3488== by 0x40616F: meta_tree_flush_locked (metatree.c:2344) ==3488== by 0x406397: meta_tree_flush (metatree.c:2365) ==3488== by 0x403AAC: writeout_timeout (meta-daemon.c:61) ==3488== by 0x34CC43993A: g_timeout_dispatch (gmain.c:3396) ==3488== by 0x34CC43922D: g_main_context_dispatch (gmain.c:1960) ==3488== by 0x34CC43CC17: g_main_context_iterate (gmain.c:2591) ==3488== by 0x34CC43D064: g_main_loop_run (gmain.c:2799) ==3488== by 0x4049DC: main (meta-daemon.c:783)
Created attachment 145523 [details] home.tar.gz The files that caused the crash. Seems to work fine when those files are removed.
*** Bug 590060 has been marked as a duplicate of this bug. ***
To reproduce the issue with attached file: 1. create new user 'hadess' and copy the files to ~/.local/share/gvfs-metadata 2. open Nautilus, choose some file in your home directory, set an emblem 3. wait #define WRITEOUT_TIMEOUT_SECS 60 seconds, crash The `home` file is missing #define MAGIC "\xda\x1ameta" at the beginning, file is malformed. That could have happened in two cases - filesystem error (e.g. xfs zeroing opened files) or invalid write from the daemon. The latter needs more investigation and we really need good repro steps leading to malformed file. Either way, we need to fix the daemon to drop the malformed data (users won't be happy!) and start with a clean file.
As long as Alex is away, I will try to come up with a fix for at least the first part of the problem, namely that we don't detect corrupted metadata files at all.
Wth happened to that file? The "meta" magic and version info is not at the top of the file but almost at the end, and all strings in the file seem partial. There is no table listing all the keys in use in the file (which is generally at the start of the file after the header). The strings look weird too, parts of full pathnames where generally only relative pathnames are used in these files (unless as stored values by some app), nor are they stored in blocks related to each directory as they normally are. No app maps the "home" file writable and its only written by gvfsd-metadata, and then quite carefully with fsyncs and whatnot. I have no idea what mangled this file. Did you have a system crash at some time before this?
I don't know what happened to that file, and yes, there was a crash before the problem happened. If gvfsd-metadata could just detect that the file is damaged and ignore it, it would be a good start... Note that I saw something similar in another folder, without a system crash being necessary.
Created attachment 146914 [details] [review] Check if MetaTree exists before flushing it Also change meta_tree_clear () so it sets fd to -1; this will make meta_tree_exists () work as expected. Bug 598561.
The attached patch will fix the crasher but will leave the render the specific metatree unusable (since we have neither file nor journal to write to). I guess we need a meta_tree_reset () or maybe meta_tree_recover () functions that will just truncate the file in case the magic header is broken. Since the data most likely is already lost at that point it shouldn't matter.
Created attachment 146922 [details] [review] Nuke metadata file if magic blob is wrong This is more a hack/proof of concept. It nukes a metadata file if the magic blob is wrong. Works ok here.
I'm not sure nuking it is the best approach, maybe we can rename it, adding ".error" to the name or something.
Interesting, the "home" file does have a fully correct, but empty of data, metadata file starting at offset 0x1121 until the end of the file. And the journal tag it refers to is the correct one of the attached journal file, which is also a correct empty journal file.
I believe commit f45b677201abb6b8471fa5bc935afda420c28c39 fixes the reason for the bugged file. However, we should get the robust handling of buggy files in anyway.
*** Bug 601504 has been marked as a duplicate of this bug. ***
*** Bug 603645 has been marked as a duplicate of this bug. ***
*** Bug 609594 has been marked as a duplicate of this bug. ***
Stacktrace from #609594 (gdb) bt full
+ Trace 220570
I stumbled over this as well, in the report that I've got (https://launchpad.net/bugs/744836) it happens when starting the session with AT and orca. Bug 643268 seems to be the same issue (tree->root == NULL), I'll dupe this. It has a patch, but I'm afraid I know too little about gvfs-metadata to say whether it's correct.
*** Bug 643268 has been marked as a duplicate of this bug. ***
*** Bug 741216 has been marked as a duplicate of this bug. ***
Created attachment 292287 [details] [review] metadata: Handle failure if files are not readable If the gvfs-metadata directory is not readable, handle the failure by returning NULL if meta_tree_open fails. Note that failing to open a tree for reading is not an error, but failing to open for writing is an error. Fix up several callers to handle a NULL tree without crashing.
The attach patch fixes the issue for me, tested by making gvfs-metadata unreadable and trying various combinations of gvfs-ls gvfs-info and gvfs-set-attribute on local and remote files.
Thanks Ross, will look at the patch in next days (not feeling well today) and compare it with my draft how to fix it...
Bug 741216 has easy steps to reproduce the crash. Ok I made this patch before I saw you had a draft patch... anyway, get better soon :-)
Review of attachment 292287 [details] [review]: It looks good. Most of lines we fix in same way. ::: client/gdaemonfile.c @@ +2658,3 @@ + g_set_error (error, G_IO_ERROR, + G_IO_ERROR_FAILED, + _("Error setting file metadata")); Maybe would be good to specify the error (also in other places), something like: _("Error setting file metadata: %s\n"), _("can't open metadata file") ::: client/gdaemonvfs.c @@ +1346,3 @@ + g_free (tree_path); + } + } I hope there are changes only in indentation... unfortunately there isn't probably any nice way how to fix it without changing to many lines. ::: metadata/metatree.c @@ +395,3 @@ } tree->fd = -1; + return !tree->for_write; I'm not really sure if it is ok, I need to do more investigation... but makes sense. @@ +479,3 @@ + if (!meta_tree_init (tree)) + { + g_free (tree); meta_tree_unref (tree)? @@ +480,3 @@ + { + g_free (tree); + return NULL; Why not use tree = NULL; to do it same as in other parts of the code... @@ +618,3 @@ meta_tree_clear (tree); + + /* XXX what happens if this fails? */ I think we can handle it, see my patch...
Created attachment 292831 [details] [review] metadata: don't crash if meta_tree_init fails (draft) Also I have some debug outputs in meta_tree_init...
Review of attachment 292287 [details] [review]: ::: client/gdaemonfile.c @@ +2658,3 @@ + g_set_error (error, G_IO_ERROR, + G_IO_ERROR_FAILED, + _("Error setting file metadata")); OK ::: client/gdaemonvfs.c @@ +1346,3 @@ + g_free (tree_path); + } + } Yeah, it is just a change in indentation. ::: metadata/metatree.c @@ +479,3 @@ + if (!meta_tree_init (tree)) + { + g_free (tree); OK @@ +480,3 @@ + { + g_free (tree); + return NULL; OK @@ +618,3 @@ meta_tree_clear (tree); + + /* XXX what happens if this fails? */ OK...
Review of attachment 292831 [details] [review]: As you said, this is pretty similar to the patch I submitted. However, it is missing the "return !tree->for_write" line which makes setting metadata fail if no metadata tree file exists. E.g. try rm ~/.local/share/gvfs-metadata/*; gvfs-set-attribute ... The following applied on top of your patch should fix it: diff --git a/metadata/metatree.c b/metadata/metatree.c index 5c320f6..af72c1e 100644 --- a/metadata/metatree.c +++ b/metadata/metatree.c @@ -376,9 +376,6 @@ meta_tree_init (MetaTree *tree) fd = safe_open (tree, tree->filename, O_RDONLY); if (fd == -1) { - errsv = errno; - g_warning ("can't init metadata tree %s: %s", tree->filename, g_strerror (errsv)); - if (tree->for_write && !retried) { MetaBuilder *builder; @@ -397,9 +394,14 @@ meta_tree_init (MetaTree *tree) } meta_builder_free (builder); } + else if (tree->for_write) + { + errsv = errno; + g_warning ("can't init metadata tree %s: %s", tree->filename, g_strerror (errsv)); + } tree->fd = -1; - return FALSE; + return !tree->for_write; } if (fstat (fd, &statbuf) != 0) ::: metadata/metatree.c @@ +415,2 @@ { + g_warning ("can't init metadata tree %s: %s", tree->filename, "wrong size"); This would be simpler like: g_warning ("can't init metadata tree %s: wrong size", tree->filename); And same for the others below.
(In reply to comment #28) > Review of attachment 292831 [details] [review]: > > As you said, this is pretty similar to the patch I submitted. > > However, it is missing the "return !tree->for_write" line which makes setting > metadata fail if no metadata tree file exists. > E.g. try rm ~/.local/share/gvfs-metadata/*; gvfs-set-attribute ... Correct... :-) could you merge those patches and add comment, why we are returning !tree->for_write in this case? Maybe this patch could help us to find root cause, why it is crashing so frequently: https://retrace.fedoraproject.org/faf2/problems/438841/ Also maybe good time to resurrect attachment 146922 [details] [review].
Created attachment 292901 [details] [review] metadata: don't crash if meta_tree_init fails It fails e.g. when gvfs-metadata dir isn't readable.
Created attachment 292977 [details] [review] metadata: don't crash if meta_tree_init fails Thanks. I've been playing with metadata some time. It seems all meta_tree_init callers are handled properly. I've simplify those warnings also as you suggested and remove some whitespace. So I think it is ready for committing and we will see what happens, because the number of crashes wouldn't be due to wrong permissions...
Review of attachment 292977 [details] [review]: Looks good other than a minor comment ::: metadata/metatree.c @@ +399,3 @@ + else if (tree->for_write || errsv != ENOENT) + { + g_warning ("can't init metadata tree %s: %s", tree->filename, g_strerror (errsv)); Perhaps the message could give some indication of what failed (e.g. open) to differentiate from the other messages. @@ +411,3 @@ + { + errsv = errno; + g_warning ("can't init metadata tree %s: %s", tree->filename, g_strerror (errsv)); Perhaps the message could give some indication of what failed (e.g. fstat) to differentiate from the other messages. @@ +429,3 @@ { + errsv = errno; + g_warning ("can't init metadata tree %s: %s", tree->filename, g_strerror (errsv)); Perhaps the message could give some indication of what failed (e.g. mmap) to differentiate from the other messages.
Review of attachment 292977 [details] [review]: ::: client/gdaemonfile.c @@ +2657,3 @@ + { + g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED, + _("Error setting file metadata: %s\n"), Error messages like this shouldn't end with newlines... ::: client/gdaemonvfs.c @@ +1284,3 @@ + { + g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED, + _("Error setting file metadata: %s\n"), Error messages like this shouldn't end with new lines @@ +1286,3 @@ + _("Error setting file metadata: %s\n"), + _("can't open metadata tree")); + res = FALSE; For consistency with the rest of the function, add: error = NULL; /* Don't set further errors */
Created attachment 293120 [details] [review] client: Strip remote dbus error when setting metadata fails Without this, if the function fails, the error presented to the user looks like: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code1: Can't find metadata file /home/ross/.local/share/gvfs-metadata/home
Review of attachment 293120 [details] [review]: Looks good.
Created attachment 293803 [details] [review] metadata: don't crash if meta_tree_init fails Thanks for the review! Sorry for the delay, I was sick and then on vacation. There is fixed patch. Let's go in...
Did you see comment 33?
I saw it, what's wrong? I've double checked it now. Your notes should be addressed in the attachment 293803 [details] [review].
Sorry, I meant comment 32, just to differentiate between a few of the error messages (open, fstat and mmap).
Created attachment 293831 [details] [review] metadata: don't crash if meta_tree_init fails Ah, sorry, I saw it before, but I've forgotten on it now. That's good idea! There is modified version, hopefully last one :-D
Review of attachment 293831 [details] [review]: Looks good!
Comment on attachment 293831 [details] [review] metadata: don't crash if meta_tree_init fails commit a03635ee68f043b68c1bb4cb95114af3ecb65ec4
Comment on attachment 293120 [details] [review] client: Strip remote dbus error when setting metadata fails commit 754b8750abbd01ae8a429b5f75d5e9f9622ab010
Also pushed for gnome-3-14, gnome-3-12, gnome-3-10. Hopefully we will reduce the enormous number of crashes: https://retrace.fedoraproject.org/faf2/problems/537424/ Thanks Ross for cooperation!
Unfortunately this bug isn't still fixed, there are new downstream bugreports (same as before): https://bugzilla.redhat.com/show_bug.cgi?id=1187492 I made some elaboration on the backtrace. Full bactrace is in the mentioned downstream report. There you can see:
+ Trace 234637
The dirent is NULL, so tree->root is NULL. I suspect that this could happen only if meta_tree_init failed (but I would really like to know why). So if e.g. meta-daemon.c:handle_set() is executed and meta_tree_set_string failed, we still execute tree_info_schedule_writeout. If meta_tree_set_string failed due to meta_tree_init (e.g. in meta_tree_refresh), the tree->root is NULL. So there is crash if writeout_timeout is executed. I don't know about different case, when tree->root could be NULL, do you? Unfortunately I don't have idea, how to reproduce the mentioned case. Playing with permissions doesn't work as I intended, because meta_builder_write has to success and consequential meta_tree_init in meta_tree_refresh has to fail... So I think we improved the behavioral by the previous patch, because handle_set() should return error in this case, however did not fix the crashing.
Created attachment 296445 [details] [review] metadata: don't shedule flush if operation failed There is proposed solution which should remove invalid trees from tree_infos cache and don't allow tree_info_schedule_writeout to be executed.
Also different metadata crasher appeared with lot of hits: https://retrace.fedoraproject.org/faf2/reports/483385/ Unfortunately there isn't much info, but we don't break it (either fix) by the previous patch according package versions...
Review of attachment 296445 [details] [review]: I've been playing with metadata daemon a bit. The crash with mentioned backtrace can be "reproduced", when I manually inject error in meta_tree_init (to cause what I mentioned before in comment #45). The provided patch fixes the crash, however meta_tree_exists doesn't work as intended, because meta_tree_clear sets fd = 0. So, currently if meta_tree_init fails once, metadata daemon doesn't try to read it again (because there is invalid tree in tree_infos cache) and just returns error. I intended to remove it from the cache and try it again next time initially. But hard to say what is better behavioral. It would be really useful to know reason, why it is crashing. If it is just fails when opening tree file due to e.g. lack of memory, or if there are some corrupted files. If there is corrupted file, meta_tree_init never succeeds, but it can succeed in other cases... However in all cases we should to fix at least the consequences to not crash the daemon. We have recently added warnings in meta_tree_init, so if it is crashing for somebody, he can provide the output to determine reason of crashes. Isn't possible to get those warnings from some logs? Or is only way how to get them to run the daemon manually? I can't find them anywhere unfortunately...
(In reply to Ondrej Holy from comment #48) > Review of attachment 296445 [details] [review] [review]: > (snip) > We have recently added warnings in meta_tree_init, so if it is crashing for > somebody, he can provide the output to determine reason of crashes. Isn't > possible to get those warnings from some logs? Or is only way how to get > them to run the daemon manually? I can't find them anywhere unfortunately... I answer myself, e.g.: journalctl _COMM=gnome-session | grep gvfsd-metadata > gvfsd-metadata.log
Created attachment 297762 [details] [review] metadata: return if meta_tree_refresh failed There is fix for bug which was introduced by the attachment 293831 [details] [review]. G_UNLOCK is called twice if meta_tree_refresh fails and also concurrent access is possible on cached_trees. However this isn't problem currently, because meta_tree_lookup_by_name is always called with for_write=FALSE (except for meta-set testing utility), so this if statement is never issued
Created attachment 297763 [details] [review] metadata: don't shedule flush if operation failed There is simplified version of the fix, which requires attachment 146914 [details] [review] to be committed also...
Review of attachment 297762 [details] [review]: Looks good to me.
Maybe we're really seeing a use-after-free situation rather than init failing? meta_tree_unref() clears the MetaTree on last unref.
Review of attachment 297763 [details] [review]: Can't we just use meta_tree_exists() in writeout_timeout() instead? Although we'd have to change meta_tree_clear() to set fd to -1, not 0. This seems better in general, as there seems to be a risk that we close fd 0 if meta_tree_clear() is called twice. It would be nice to get some debug info back from users though, as this bug seems to indicate we're flushing a tree and creating an invalid file... ::: metadata/meta-daemon.c @@ +125,3 @@ if (info) + { + if (meta_tree_refresh (info->tree)) I'm not super fond of this. The daemon is the only thing that triggers the need for a refresh (when flushing), so this kind of "see if we need to refresh" call in the daemon is kind of weird. We should be able to have full knowledge of when the tree goes bad. @@ +214,3 @@ else { + tree_info_schedule_writeout (info); This is not enough, is it? We may succeed in a write and schedule a timeout, but then a later set() call fails and makes the tree invalid. ::: metadata/metatree.c @@ +644,3 @@ } +/* The tree is uninicialized if FALSE is returned! */ uninitialized
putting crashes on the blocker list
Thanks for looking at this... (In reply to Alexander Larsson from comment #53) > Maybe we're really seeing a use-after-free situation rather than init > failing? meta_tree_unref() clears the MetaTree on last unref. That's another option, but I don't see where it could happen. Metadata daemon has own cache of trees. meta_tree_unref is called only in tree_info_free and tree_info_free is used only as argument of g_hash_table. It seems hash table isn't never destroyed, so only case when tree_info_free can be called is when g_hash_table_insert is called with already used filename. g_hash_table_insert is used only in tree_info_lookup, but tree_info_lookup is called from handle_ callbacks, which should be executed on with one context, so it can't be called concurrently... am I right?
(In reply to Alexander Larsson from comment #54) > Review of attachment 297763 [details] [review] [review]: > > Can't we just use meta_tree_exists() in writeout_timeout() instead? > > Although we'd have to change meta_tree_clear() to set fd to -1, not 0. This > seems better in general, as there seems to be a risk that we close fd 0 if > meta_tree_clear() is called twice. As I wrote before this patch requires attachment 146914 [details] [review] [review], which does those changes and I think there isn't difference whether we use meta_tree_exists in meta_tree_flush_locked, or in writeout_timeout... But I don't think it is enough. We shouldn't return tree info with invalid tree from tree_info_lookup. > It would be nice to get some debug info back from users though, as this bug > seems to indicate we're flushing a tree and creating an invalid file... It would be nice, but there is almost anything what we know about: https://bugzilla.redhat.com/show_bug.cgi?id=699109 There is over 70 users in CC, however nothing useful, and people doesn't respond - there are unanswered questions from Tomas and me... > ::: metadata/meta-daemon.c > @@ +125,3 @@ > if (info) > + { > + if (meta_tree_refresh (info->tree)) > > I'm not super fond of this. The daemon is the only thing that triggers the > need for a refresh (when flushing), so this kind of "see if we need to > refresh" call in the daemon is kind of weird. > > We should be able to have full knowledge of when the tree goes bad. So what to do when e.g. meta_tree_flush failed due to meta_tree_init and we have uninitialized tree in tree_infos cache? > @@ +214,3 @@ > else > { > + tree_info_schedule_writeout (info); > > This is not enough, is it? > We may succeed in a write and schedule a timeout, but then a later set() > call fails and makes the tree invalid. This isn't enough, but doesn't make sense to schedule timeout if there is error...
(In reply to Matthias Clasen from comment #55) > putting crashes on the blocker list Not sure it is real blocker since this bug is there for several releases (and years!)... We don't have reproducer and people don't responding on downstream bugzilla, probably because it occurs randomly and rarely.
(In reply to Ondrej Holy from comment #58) > (In reply to Matthias Clasen from comment #55) > > putting crashes on the blocker list > > Not sure it is real blocker since this bug is there for several releases > (and years!)... We don't have reproducer and people don't responding on > downstream bugzilla, probably because it occurs randomly and rarely. Fair enough. I just saw it swim by in my bug stream and picked it up. Let me drop it again.
Comment on attachment 297762 [details] [review] metadata: return if meta_tree_refresh failed commit 56326e463a56afa0e632baad4c94faa7aa154486
>> I'm not super fond of this. The daemon is the only thing that triggers the >> need for a refresh (when flushing), so this kind of "see if we need to >> refresh" call in the daemon is kind of weird. >> >> We should be able to have full knowledge of when the tree goes bad. > > So what to do when e.g. meta_tree_flush failed due to meta_tree_init and we > have uninitialized tree in tree_infos cache? This is the million dollar question is it not. When we failed to write out an updated tree last time we lost all the data in the previous tree. Now we crash when flushing out the new journal entries because of the NULL tree->root. Can't we just skip the copy_tree_to_builder() call if root is NULL and accept the fact that we lost the data. Delaying the writeout schedule is not going to bring back the data, but delaying it forever will lose the *new* data we've gotten since then.
Patch to always clear fd to -1 was already pushed as: commit 26c3dc3986760483625262a4a3564f1696f829b3
Created attachment 298991 [details] [review] metadata: don't crash if write out failed
Created attachment 298993 [details] [review] metadata: don't crash if write out failed
Review of attachment 298993 [details] [review]: Looks good to me
Review of attachment 298993 [details] [review]: Looks good to me, but I don't know the code so well :-) ::: metadata/metatree.c @@ +2365,3 @@ + { + /* It shouldn't happen. We failed to write out an updated tree + * probably, therefor all the data are lost. Backup the file and therefore
Comment on attachment 298993 [details] [review] metadata: don't crash if write out failed I've pushed it only into master and let's see what will happen... commit f7f18bb6e7576672c2240c8ead8dfa4a2668d478
Comment on attachment 146914 [details] [review] Check if MetaTree exists before flushing it Marking as obsolete, because those changes are already in by another commits: commit 26c3dc3986760483625262a4a3564f1696f829b3 commit a03635ee68f043b68c1bb4cb95114af3ecb65ec4
Comment on attachment 146922 [details] [review] Nuke metadata file if magic blob is wrong Marking as obsolete, see Bug 750113.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/122.