GNOME Bugzilla – Bug 701322
occasional crash on login in gnome-ostree buildmaster
Last modified: 2013-07-08 09:00:36 UTC
Created attachment 245695 [details] stack trace Check out this crash, it ends in getenv(). Almost certainly some code doing g_setenv() after threads have started.
Oops, this is a g-s-d bug, not dconf (necessarily).
Yeah, set_locale_env() can't be called after gtk_init() (which probably does gsettings -> dconf -> gdbus -> thread)
See https://bugzilla.gnome.org/show_bug.cgi?id=659326 for more information about why you can only call setenv() before creating threads.
IRC discussion: walters> mclasen, btw did you see https://bugzilla.gnome.org/show_bug.cgi?id=701322 ? <Services> Bug 701322: normal, Normal, ---, gnome-settings-daemon-maint, UNCONFIRMED, occasional crash on login in gnome-ostree buildmaster <Company> mclasen: I was referencing walters' work <walters> if you've noticed the smoketest failing occasionally, that's usually why <Company> so no credit for me there <walters> i started on a patch but it's ugly as all hell <mclasen> I saw gettext and LANGUAGE being discussed somewhere recently <mclasen> somebody has seen that same crash in fedora, I think <walters> yeah i'm sure <gnome-ostree> New build 20130531.45: successful in 186.6 seconds. built: pango http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.45/output.txt <gnome-ostree> New smoketest 20130531.45: successful in 32.3 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.45/output.txt <gnome-ostree> New build 20130531.46: successful in 178.3 seconds. built: pango http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.46/output.txt <gnome-ostree> New integrationtest 20130531.42: failed in 138.6 seconds. SUMMARY: total: 242 passed: 237 skipped: 2 failed: 3 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.42/output.txt <mclasen> walters: can't you just move that call up a few lines ? <mclasen> oh dang <walters> mclasen, not afaics, the gsettings calls will init dconf, and that creates a thread <mclasen> will it, though ? <mclasen> we're just reading <mclasen> thats mmapped <gnome-ostree> New smoketest 20130531.46: successful in 31.3 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.46/output.txt <mclasen> lets ask the expert <mclasen> desrt: ^? <desrt> hi <mclasen> does g_settings_new, g_settings_get initialize gdbus and thus threads ? <gnome-ostree> New integrationtest 20130531.43: failed in 145.7 seconds. SUMMARY: total: 250 passed: 247 skipped: 2 failed: 1 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.43/output.txt <desrt> yes <mclasen> bummer <walters> mclasen, cool, 242 -> 250 <desrt> g_settings_new(), alone, is sufficient to do that <mclasen> walters: and all 8 passed first try <desrt> i've often considered delaying it until g_settings_get() and only getting on dbus to watch for changes in the event that someone has connected a change signal handler <mclasen> that would help us in this instance <mclasen> because we're pulling a value out of gsettings to use for setenv <mclasen> and setenv with threads -> eventual explosion <walters> i was kind of thinking of adding g_spawn_setenv() which would be setenv only for all uses of g_spawn() <halfline_laptop> https://bugzilla.redhat.com/show_bug.cgi?id=919855 <desrt> walters: REVIEW GSUBPROCESS K THX <Services> Bug 919855: unspecified, unspecified, ---, bnocera, NEW , [abrt] gnome-settings-daemon-3.7.91-1.fc19: getenv: Process /usr/libexec/gnome-settings-daemon was killed by signal 11 (SIGSEGV) <walters> desrt, heh <mclasen> so, g_subprocess_setenv ? <desrt> g_subprocess_launcher_setenv() in fact <mclasen> walters: not really sufficient, though. there's plenty of ui in g-s-d <mclasen> how about g_spawn ("gsettings get ...") ? :-( <desrt> walters: i'm opposed to introducing global state to gspawn if that's what you were talking about.... <walters> desrt, yeah i agree <walters> i just thought about it <mclasen> walters: what was your approach to fixing set_locale_env ? <walters> i got distracted, but here's it so far: http://pastebin.mozilla.org/2468833 <walters> if that's not too ugly to live, i can finish it <mclasen> I guess you can save a few lines by calling the gsettings tool instead ? <walters> we need to get multiple keys actually <walters> what kinda stopped me though is we'd need to guarantee g_spawn_* doesn't create a thread <walters> or i guess that the GLib worker thread never calls getenv() <mclasen> an alternative would be get the setting, and pass it on to your reexec'ed self <mclasen> the you can pick the value out of the commandline in the reexec'ed instance <walters> that's a good idea <mclasen> and all this just because gettext is not posix... <halfline_laptop> this isn't going to help our startup speed problem <walters> but one thing confuses me; why does all this stuff not also apply to gnome-shell? <mclasen> reexecing yourself right at the start is not that expensive, I hope ? <mclasen> cow, etc <mclasen> walters: does the shell call setenv ? <walters> mclasen, mmeeks did complain about gnome-session doing it, basically the biggest cost is running through the dynamic linker again <mclasen> at least we're not loading the plugins twice <walters> mclasen, you can git grep too ;) answer is yes, though it looks mostly safe <mclasen> I just did <walters> what i meant is don't we also need LC_TIME and such in processes spawned from gnome-shell? <halfline_laptop> yea feels like this code could be in gnome-session or something <walters> halfline_laptop, exactly <gnome-ostree> New build 20130531.47: successful in 273.2 seconds. built: gnome-shell http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.47/output.txt <halfline_laptop> of course gnome-session has a SetEnv api <mclasen> which gsd is using <mclasen> to inject these values into the session env <halfline_laptop> interesting <walters> but g-s-d and g-s are launched in parallel <mclasen> of course, same argument applies <mclasen> gnome-session has some ui still ? <halfline_laptop> no, i think you fixed that didn't you ? <mclasen> which will miss those values unless gnome-session calls setenv <mclasen> mostly <mclasen> indeed <mclasen> but gnome-session is launching zenity, not sure if we take care of using the proper environment for that <halfline_laptop> gnome-shell isn't started in parallel btw <mclasen> an alternative to reexec would be a small shellscript wrapper that uses the gsettings tool and injects the value into g-s-d as an argument <halfline_laptop> yea i was thinking about that <halfline_laptop> then i pictured lenny wagging his finger <walters> halfline_laptop, are you sure? what prevents that? <halfline_laptop> walters: they're started in different "phases" <halfline_laptop> so gnome-shell doesn't start until g-s-d registers <walters> ah <mclasen> those phases are why our login is so slow, right ? <walters> ok so we can keep the code in g-s-d then <halfline_laptop> but i'm not sure why we'd want to keep the code in g-s-d <halfline_laptop> i mean i don't get the advantage of having it there <walters> i have no strong opinion on where the code lives, just stating that we could leave it there <gnome-ostree> New smoketest 20130531.47: successful in 30.7 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.47/output.txt <mclasen> for 3.8.3, we clearly just want to fix the race without major code shifting <halfline_laptop> fair enough <gnome-ostree> New build 20130531.48: successful in 199.5 seconds. built: network-manager-applet http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.48/output.txt <gnome-ostree> New integrationtest 20130531.44: failed in 137.7 seconds. SUMMARY: total: 250 passed: 246 skipped: 2 failed: 2 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.44/output.txt <halfline_laptop> so i'm sort of missing the nature of the thread unsafety <desrt> you could also fork very very early <desrt> with a pipe between <desrt> do gsettings in the child <desrt> and write the result to the pipe <desrt> which is read by the parent <walters> desrt, interesting idea... <halfline_laptop> bleh <desrt> it's ugly, but in practice it would be very few lines of code... <halfline_laptop> cleaner to exec() imo <desrt> i think i agree <walters> halfline_laptop, the dynamic linking is not free <desrt> using the gsettings tool from some sort of stdout-capturing spawn would be nicer <mclasen> halfline_laptop: gettext calls getenv("LANGUAGE") <mclasen> a great gnu extension... <desrt> mclasen: gettext callls getenv() on a heck of a lot more things than that <mclasen> does it ? <mclasen> i always thought that was the only one <desrt> ah. you know maybe you're right. <gnome-ostree> New build 20130531.49: successful in 194.9 seconds. built: libpeas http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.49/output.txt <mclasen> and it only has to do that because it is not properly integrated in the locale system <desrt> it's setlocale() that does the env work <desrt> and gettext uses that? <halfline_laptop> so my understanding of how putenv works is, there's a big block of contingous memory where environment variables are set <halfline_laptop> oh no i'm not going to finish because i'm wrong <halfline_laptop> you pass malloced memory to putenv <gnome-ostree> New smoketest 20130531.48: failed in 108.6 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/failed/20130531.48/output.txt <walters> and there's the crash in question =) <halfline_laptop> so I guess putenv() just updates the environ pointer if necessary to point to the new memory location <mclasen> desrt: http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/dcigettext.c#n1572 <halfline_laptop> someone walk me through the unsafe part <walters> halfline_laptop, free() of the old location while it's being traversed <desrt> const char *logfilename = getenv ("GETTEXT_LOG_UNTRANSLATED"); <desrt> const char *value = getenv ("OUTPUT_CHARSET"); <halfline_laptop> walters: who's doing the freeing ? <halfline_laptop> libc doesn't take control of the memory you pass into putenv <gnome-ostree> New smoketest 20130531.49: successful in 30.4 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.49/output.txt <walters> halfline_laptop, setenv(), no? <mclasen> nothing frees it, iirc <halfline_laptop> also the corruption is seemingly in the key not the value, right? <halfline_laptop> we're seeing NGUAGE instead of LANGUAGE in the stack trace <walters> read the glibc function: int __add_to_environ (name, value, combined, replace) <walters> in stdlib/setenv.c <desrt> mclasen: it's funny that you paste this file.... i was reading gettext source just last night (btw: the gsettings test that did german translation was also broken) thinking about how easy it would be to reimplement <gnome-ostree> New integrationtest 20130531.45: failed in 131.5 seconds. SUMMARY: total: 250 passed: 247 skipped: 2 failed: 1 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.45/output.txt <halfline_laptop> http://www.scs.stanford.edu/histar/src/pkg/uclibc/libc/stdlib/setenv.c <halfline_laptop> there's a lock! <walters> that's uclibc <walters> (a different thing than glibc and its fork eglibc) <mclasen> pretty sure that wouldn't fly in glibc <halfline_laptop> well why do they have the same function name <halfline_laptop> that's annoying <walters> probably uclibc started by forking glibc and deleting all the locale stuff <desrt> mclasen: we could probably pretty reasonably leverage the existing tools and database and just have our own .mo file parser <desrt> and all of our environment-related gettext troubles would be gone <mclasen> is that what kde does ? <desrt> do they have something? <mclasen> they have their own format syntax though, right ? <walters> halfline_laptop, btw you should do your irc discussion copy for this one to the bug <walters> when it reaches some sort of conclusion <halfline_laptop> http://fossies.org/dox/glibc-2.17/setenv_8c_source.html#l00109 <mclasen> I vaguely recall some %1 %2 %3 syntax <halfline_laptop> same code as uclibc <walters> halfline_laptop, right, but getenv is not locked <mclasen> not sure what that lock is protecting, but there's no locking in getenv that I can see <halfline_laptop> well i'm stilling missing how thread unsafety can cause setlocale() to call getenv("NGUAGE") instead of getenv("LANGUAGE") <halfline_laptop> or gettext or whatever is doing the getenv call <walters> halfline_laptop, one thread is calling setenv(), it updates the array and free()s the old pointer while another thread is still reading from freed memory that gets then reused for something else <halfline_laptop> but if i run getenv("LANGUAGE") i'm not passing in a string from the freed memory <halfline_laptop> i'm passing in a literal string <walters> yes but getenv() has to traverse the array to find what you're looking for, and the array has been freed during the call <walters> right? * amigadave has quit (leaving) <mclasen> if you look at the getenv implementation, it actually increments the passed name pointer <halfline_laptop> ahh <halfline_laptop> so the NGUAGE in the stack trace is a red herring <walters> oh yeah i dunno why that's happening <halfline_laptop> so if i'm reading the code right it looks like it won't free environ if it didn't allocate it <halfline_laptop> i guess that doesn't really help us, because we still don't know when it's safe to free it <halfline_laptop> man the number man hours spent debugging problems related to environment variables has to border on the number of man hours spent debugging problems related to signal handling
Created attachment 246323 [details] [review] Avoid calling setenv after starting threads In particular, we can't get the locale value out of GSettings in order to set LC_PAPER etc, since calling into GSettings initializes the dconf backend which in turn uses gdbus, which starts a worker thread. As a simple workaround, set up the locale environment in a small wrapper script that then exec's the g-s-d binary.
Review of attachment 246323 [details] [review]: This looks like it will work. It's probably the cleanest solution. I just have some minor comments. ::: data/gnome-settings-daemon.desktop.in.in @@ +2,3 @@ Type=Application _Name=GNOME Settings Daemon +Exec=@libexecdir@/gnome-settings-daemon.sh I've never liked adding .sh and such extensions; the language of a binary is an implementation detail. Maybe gnome-settings-daemon-localeexec ? Dunno. Feel free to keep the .sh. ::: gnome-settings-daemon/gnome-settings-daemon.sh.in @@ +1,3 @@ +#! /bin/sh + +REGION=`gsettings get org.gnome.system.locale region | tr -d \'` With this we will run the dynamic linker three times (once for /bin/sh which is pretty small, but gsettings links to a lot).
> With this we will run the dynamic linker three times (once for /bin/sh which is > pretty small, but gsettings links to a lot). Any idea for getting rid of the single quotes without an external command ? We could give gsettings an option to print values 'naturally', instead of as a serialized variant
after some studying of info bash, this seems to work: SETTING=$(gsettings get org.gnome.system.locale region) REGION=${SETTING//\,/}
(In reply to comment #8) > after some studying of info bash, this seems to work: > > SETTING=$(gsettings get org.gnome.system.locale region) > REGION=${SETTING//\,/} I'm fine with the original patch too, btw. If you do go with the in-process tr, I'm not sure whether that's bash-specific, so might as well change it to #!/bin/bash
Attachment 246323 [details] pushed as 5cb44f6 - Avoid calling setenv after starting threads
I'll reopen this, as I'm not happy with having bash running during startup, and I don't trust bash code to be able to deal with weird quoting in any of the values read from GSettings, breaking the desktop.
*** Bug 702088 has been marked as a duplicate of this bug. ***
Bug 702088 shows the first set of problems.
(In reply to comment #13) > Bug 702088 shows the first set of problems. The only other sane approach I think is what Ryan suggested: <desrt> you could also fork very very early <desrt> with a pipe between <desrt> do gsettings in the child <desrt> and write the result to the pipe <desrt> which is read by the parent
Created attachment 246642 [details] [review] daemon: Port env helper to C Instead of using sh, which would require calling out a separate gsettings binary, and parsing its output before calling out to export. FIXME: Launching g-s-d isn't implemented in this first pass
Review of attachment 246642 [details] [review]: ::: gnome-settings-daemon/gnome-settings-daemon-localeexec.c @@ +62,3 @@ + region = g_settings_get_string (locale_settings, "region"); + if (region[0]) { + g_setenv ("LC_TIME", region, TRUE); This is just the original broken code, but in a new binary? I don't understand how that helps.
You're right, sigh. Not sure what I was thinking.
Filed http://sourceware.org/bugzilla/show_bug.cgi?id=15607
(In reply to comment #8) > after some studying of info bash, this seems to work: > > SETTING=$(gsettings get org.gnome.system.locale region) > REGION=${SETTING//\,/} That should have been: REGION=${SETTING//\'/}
Created attachment 246649 [details] [review] main: Fix incorrect quote removal We were trying to remove a comma instead of a single-quote.
Created attachment 246650 [details] [review] main: Use sh not bash for the localeexec helper The script doesn't use bash features, so should also work with sh compatible versions.
Review of attachment 246649 [details] [review]: This looks right to me; I tested the substitution pattern in an interactive shell.
Review of attachment 246650 [details] [review]: This doesn't actually work in dash, which AFAIK tries hard to strictly follow POSIX /bin/sh: $ dash $ setting="'foo'" $ echo ${setting//\'/} dash: Bad substitution $ <control-d> So I think we need to keep the bash interpreter.
(In reply to comment #23) > Review of attachment 246650 [details] [review]: > > This doesn't actually work in dash, which AFAIK tries hard to strictly follow > POSIX /bin/sh: > > $ dash > $ setting="'foo'" > $ echo ${setting//\'/} > dash: Bad substitution > $ <control-d> > > So I think we need to keep the bash interpreter. You could use: setting=${setting#\'} setting=${setting%\'} assuming that there are no single-quotes in the middle.
Created attachment 246651 [details] [review] main: Use sh not bash for the localeexec helper Replace the bash-only substitution with an sh-compatible version.
Review of attachment 246651 [details] [review]: Looks fine to me.
Attachment 246649 [details] pushed as b0ca46d - main: Fix incorrect quote removal Attachment 246651 [details] pushed as 8a88eee - main: Use sh not bash for the localeexec helper
Hi! Looks like the external shell-script seems to be the only workaround, at least for now? Will we see a general solution, implemented inside gnome-settings-daemon, with a later release? The developers of GLIBC described their point to getenv() add getenv_r() and POSIX.
(In reply to comment #28) > Hi! > Looks like the external shell-script seems to be the only workaround, at least > for now? Will we see a general solution, implemented inside > gnome-settings-daemon, with a later release? Unlikely. > The developers of GLIBC described their point to getenv() add getenv_r() and > POSIX. I didn't get that. They said they're not interested, pretty much.
Okay. Just glad to see a fix for this. I wondered for weeks, why nobody else faced this bug :-) > I didn't get that. They said they're not interested, pretty much. Sorry for confusion! I just want remind that the people from GLIBC don't want extend/blow up the API.