After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 701322 - occasional crash on login in gnome-ostree buildmaster
occasional crash on login in gnome-ostree buildmaster
Status: RESOLVED FIXED
Product: gnome-settings-daemon
Classification: Core
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gnome-settings-daemon-maint
gnome-settings-daemon-maint
: 702088 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2013-05-31 01:16 UTC by Colin Walters
Modified: 2013-07-08 09:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
stack trace (5.40 KB, text/plain)
2013-05-31 01:16 UTC, Colin Walters
  Details
Avoid calling setenv after starting threads (4.28 KB, patch)
2013-06-08 18:13 UTC, Matthias Clasen
committed Details | Review
daemon: Port env helper to C (6.28 KB, patch)
2013-06-12 13:51 UTC, Bastien Nocera
rejected Details | Review
main: Fix incorrect quote removal (923 bytes, patch)
2013-06-12 14:57 UTC, Bastien Nocera
committed Details | Review
main: Use sh not bash for the localeexec helper (896 bytes, patch)
2013-06-12 14:57 UTC, Bastien Nocera
reviewed Details | Review
main: Use sh not bash for the localeexec helper (978 bytes, patch)
2013-06-12 15:33 UTC, Bastien Nocera
committed Details | Review

Description Colin Walters 2013-05-31 01:16:32 UTC
Created attachment 245695 [details]
stack trace

Check out this crash, it ends in getenv().  Almost certainly some code doing g_setenv() after threads have started.
Comment 1 Colin Walters 2013-05-31 01:17:04 UTC
Oops, this is a g-s-d bug, not dconf (necessarily).
Comment 2 Colin Walters 2013-05-31 01:19:58 UTC
Yeah, set_locale_env() can't be called after gtk_init() (which probably does gsettings -> dconf -> gdbus -> thread)
Comment 3 Colin Walters 2013-05-31 01:22:07 UTC
See https://bugzilla.gnome.org/show_bug.cgi?id=659326 for more information about why you can only call setenv() before creating threads.
Comment 4 Colin Walters 2013-06-01 00:52:27 UTC
IRC discussion:

walters> mclasen, btw did you see https://bugzilla.gnome.org/show_bug.cgi?id=701322 ?
<Services> Bug 701322: normal, Normal, ---, gnome-settings-daemon-maint, UNCONFIRMED, occasional crash on login in gnome-ostree buildmaster
<Company> mclasen: I was referencing walters' work
<walters> if you've noticed the smoketest failing occasionally, that's usually why
<Company> so no credit for me there
<walters> i started on a patch but it's ugly as all hell
<mclasen> I saw gettext and LANGUAGE being discussed somewhere recently
<mclasen> somebody has seen that same crash in fedora, I think
<walters> yeah i'm sure
<gnome-ostree> New build 20130531.45: successful in 186.6 seconds. built: pango http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.45/output.txt
<gnome-ostree> New smoketest 20130531.45: successful in 32.3 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.45/output.txt
<gnome-ostree> New build 20130531.46: successful in 178.3 seconds. built: pango http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.46/output.txt
<gnome-ostree> New integrationtest 20130531.42: failed in 138.6 seconds. SUMMARY: total: 242 passed: 237 skipped: 2 failed: 3 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.42/output.txt
<mclasen> walters: can't you just move that call up a few lines ?
<mclasen> oh dang
<walters> mclasen, not afaics, the gsettings calls will init dconf, and that creates a thread
<mclasen> will it, though ?
<mclasen> we're just reading
<mclasen> thats mmapped
<gnome-ostree> New smoketest 20130531.46: successful in 31.3 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.46/output.txt
<mclasen> lets ask the expert
<mclasen> desrt: ^?
<desrt> hi
<mclasen> does g_settings_new, g_settings_get initialize gdbus and thus threads ?
<gnome-ostree> New integrationtest 20130531.43: failed in 145.7 seconds. SUMMARY: total: 250 passed: 247 skipped: 2 failed: 1 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.43/output.txt
<desrt> yes
<mclasen> bummer
<walters> mclasen, cool, 242 -> 250
<desrt> g_settings_new(), alone, is sufficient to do that
<mclasen> walters: and all 8 passed first try
<desrt> i've often considered delaying it until g_settings_get() and only getting on dbus to watch for changes in the event that someone has connected a change signal handler
<mclasen> that would help us in this instance
<mclasen> because we're pulling a value out of gsettings to use for setenv
<mclasen> and setenv with threads -> eventual explosion
<walters> i was kind of thinking of adding g_spawn_setenv() which would be setenv only for all uses of g_spawn()
<halfline_laptop> https://bugzilla.redhat.com/show_bug.cgi?id=919855
<desrt> walters: REVIEW GSUBPROCESS K THX
<Services> Bug 919855: unspecified, unspecified, ---, bnocera, NEW , [abrt] gnome-settings-daemon-3.7.91-1.fc19: getenv: Process /usr/libexec/gnome-settings-daemon was killed by signal 11 (SIGSEGV)
<walters> desrt, heh
<mclasen> so, g_subprocess_setenv ?
<desrt> g_subprocess_launcher_setenv() in fact
<mclasen> walters: not really sufficient, though. there's plenty of ui in g-s-d
<mclasen> how about g_spawn ("gsettings get ...") ? :-(
<desrt> walters: i'm opposed to introducing global state to gspawn if that's what you were talking about....
<walters> desrt, yeah i agree
<walters> i just thought about it
<mclasen> walters: what was your approach to fixing set_locale_env ?
<walters> i got distracted, but here's it so far: http://pastebin.mozilla.org/2468833
<walters> if that's not too ugly to live, i can finish it
<mclasen> I guess you can save a few lines by calling the gsettings tool instead ?
<walters> we need to get multiple keys actually
<walters> what kinda stopped me though is we'd need to guarantee g_spawn_* doesn't create a thread
<walters> or i guess that the GLib worker thread never calls getenv()
<mclasen> an alternative would be get the setting, and pass it on to your reexec'ed self
<mclasen> the you can pick the value out of the commandline in the reexec'ed instance
<walters> that's a good idea
<mclasen> and all this just because gettext is not posix...
<halfline_laptop> this isn't going to help our startup speed problem
<walters> but one thing confuses me; why does all this stuff not also apply to gnome-shell?
<mclasen> reexecing yourself right at the start is not that expensive, I hope ?
<mclasen> cow, etc
<mclasen> walters: does the shell call setenv ?
<walters> mclasen, mmeeks did complain about gnome-session doing it, basically the biggest cost is running through the dynamic linker again
<mclasen> at least we're not loading the plugins twice
<walters> mclasen, you can git grep too ;)  answer is yes, though it looks mostly safe
<mclasen> I just did
<walters> what i meant is don't we also need LC_TIME and such in processes spawned from gnome-shell?
<halfline_laptop> yea feels like this code could be in gnome-session or something
<walters> halfline_laptop, exactly
<gnome-ostree> New build 20130531.47: successful in 273.2 seconds. built: gnome-shell http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.47/output.txt
<halfline_laptop> of course gnome-session has a SetEnv api
<mclasen> which gsd is using
<mclasen> to inject these values into the session env
<halfline_laptop> interesting
<walters> but g-s-d and g-s are launched in parallel
<mclasen> of course, same argument applies
<mclasen> gnome-session has some ui still ?
<halfline_laptop> no, i think you fixed that didn't you ?
<mclasen> which will miss those values unless gnome-session calls setenv
<mclasen> mostly
<mclasen> indeed
<mclasen> but gnome-session is launching zenity, not sure if we take care of using the proper environment for that
<halfline_laptop> gnome-shell isn't started in parallel btw
<mclasen> an alternative to reexec would be a small shellscript wrapper that uses the gsettings tool and injects the value into g-s-d as an argument
<halfline_laptop> yea i was thinking about that
<halfline_laptop> then i pictured lenny wagging his finger
<walters> halfline_laptop, are you sure?  what prevents that?
<halfline_laptop> walters: they're started in different "phases"
<halfline_laptop> so gnome-shell doesn't start until g-s-d registers
<walters> ah
<mclasen> those phases are why our login is so slow, right ?
<walters> ok so we can keep the code in g-s-d then
<halfline_laptop> but i'm not sure why we'd want to keep the code in g-s-d
<halfline_laptop> i mean i don't get the advantage of having it there
<walters> i have no strong opinion on where the code lives, just stating that we could leave it there
<gnome-ostree> New smoketest 20130531.47: successful in 30.7 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.47/output.txt
<mclasen> for 3.8.3, we clearly just want to fix the race without major code shifting
<halfline_laptop> fair enough
<gnome-ostree> New build 20130531.48: successful in 199.5 seconds. built: network-manager-applet http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.48/output.txt
<gnome-ostree> New integrationtest 20130531.44: failed in 137.7 seconds. SUMMARY: total: 250 passed: 246 skipped: 2 failed: 2 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.44/output.txt
<halfline_laptop> so i'm sort of missing the nature of the thread unsafety
<desrt> you could also fork very very early
<desrt> with a pipe between
<desrt> do gsettings in the child
<desrt> and write the result to the pipe
<desrt> which is read by the parent
<walters> desrt, interesting idea...
<halfline_laptop> bleh
<desrt> it's ugly, but in practice it would be very few lines of code...
<halfline_laptop> cleaner to exec() imo
<desrt> i think i agree
<walters> halfline_laptop, the dynamic linking is not free
<desrt> using the gsettings tool from some sort of stdout-capturing spawn would be nicer
<mclasen> halfline_laptop: gettext calls getenv("LANGUAGE")
<mclasen> a great gnu extension...
<desrt> mclasen: gettext callls getenv() on a heck of a lot more things than that
<mclasen> does it ?
<mclasen> i always thought that was the only one
<desrt> ah.  you know maybe you're right.
<gnome-ostree> New build 20130531.49: successful in 194.9 seconds. built: libpeas http://build.gnome.org/ostree/buildmaster/tasks/build/successful/20130531.49/output.txt
<mclasen> and it only has to do that because it is not properly integrated in the locale system
<desrt> it's setlocale() that does the env work
<desrt> and gettext uses that?
<halfline_laptop> so my understanding of how putenv works is, there's a big block of contingous memory where environment variables are set
<halfline_laptop> oh no i'm not going to finish because i'm wrong
<halfline_laptop> you pass malloced memory to putenv
<gnome-ostree> New smoketest 20130531.48: failed in 108.6 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/failed/20130531.48/output.txt
<walters> and there's the crash in question =)
<halfline_laptop> so I guess putenv() just updates the environ pointer if necessary to point to the new memory location
<mclasen> desrt: http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/dcigettext.c#n1572
<halfline_laptop> someone walk me through the unsafe part
<walters> halfline_laptop, free() of the old location while it's being traversed
<desrt>       const char *logfilename = getenv ("GETTEXT_LOG_UNTRANSLATED");
<desrt>    const char *value = getenv ("OUTPUT_CHARSET");
<halfline_laptop> walters: who's doing the freeing ?
<halfline_laptop> libc doesn't take control of the memory you pass into putenv
<gnome-ostree> New smoketest 20130531.49: successful in 30.4 seconds. http://build.gnome.org/ostree/buildmaster/tasks/smoketest/successful/20130531.49/output.txt
<walters> halfline_laptop, setenv(), no?
<mclasen> nothing frees it, iirc
<halfline_laptop> also the corruption is seemingly in the key not the value, right?
<halfline_laptop> we're seeing NGUAGE instead of LANGUAGE in the stack trace
<walters> read the glibc function: int __add_to_environ (name, value, combined, replace)
<walters> in stdlib/setenv.c
<desrt> mclasen: it's funny that you paste this file.... i was reading gettext source just last night (btw: the gsettings test that did german translation was also broken) thinking about how easy it would be to reimplement
<gnome-ostree> New integrationtest 20130531.45: failed in 131.5 seconds. SUMMARY: total: 250 passed: 247 skipped: 2 failed: 1 http://build.gnome.org/ostree/buildmaster/tasks/integrationtest/failed/20130531.45/output.txt
<halfline_laptop> http://www.scs.stanford.edu/histar/src/pkg/uclibc/libc/stdlib/setenv.c
<halfline_laptop> there's a lock!
<walters> that's uclibc
<walters> (a different thing than glibc and its fork eglibc)
<mclasen> pretty sure that wouldn't fly in glibc
<halfline_laptop> well why do they have the same function name
<halfline_laptop> that's annoying
<walters> probably uclibc started by forking glibc and deleting all the locale stuff
<desrt> mclasen: we could probably pretty reasonably leverage the existing tools and database and just have our own .mo file parser
<desrt> and all of our environment-related gettext troubles would be gone
<mclasen> is that what kde does ?
<desrt> do they have something?
<mclasen> they have their own format syntax though, right ?
<walters> halfline_laptop, btw you should do your irc discussion copy for this one to the bug
<walters> when it reaches some sort of conclusion
<halfline_laptop> http://fossies.org/dox/glibc-2.17/setenv_8c_source.html#l00109
<mclasen> I vaguely recall some %1 %2 %3 syntax
<halfline_laptop> same code as uclibc
<walters> halfline_laptop, right, but getenv is not locked
<mclasen> not sure what that lock is protecting, but there's no locking in getenv that I can see
<halfline_laptop> well i'm stilling missing how thread unsafety can cause setlocale() to call getenv("NGUAGE") instead of getenv("LANGUAGE")
<halfline_laptop> or gettext or whatever is doing the getenv call
<walters> halfline_laptop, one thread is calling setenv(), it updates the array and free()s the old pointer while another thread is still reading from freed memory that gets then reused for something else
<halfline_laptop> but if i run getenv("LANGUAGE") i'm not passing in a string from the freed memory
<halfline_laptop> i'm passing in a literal string
<walters> yes but getenv() has to traverse the array to find what you're looking for, and the array has been freed during the call
<walters> right?
* amigadave has quit (leaving)
<mclasen> if you look at the getenv implementation, it actually increments the passed name pointer
<halfline_laptop> ahh
<halfline_laptop> so the NGUAGE in the stack trace is a red herring
<walters> oh yeah i dunno why that's happening
<halfline_laptop> so if i'm reading the code right it looks like it won't free environ if it didn't allocate it
<halfline_laptop> i guess that doesn't really help us, because we still don't know when it's safe to free it
<halfline_laptop> man the number man hours spent debugging problems related to environment variables has to border on the number of man hours spent debugging problems related to signal handling
Comment 5 Matthias Clasen 2013-06-08 18:13:15 UTC
Created attachment 246323 [details] [review]
Avoid calling setenv after starting threads

In particular, we can't get the locale value out of GSettings
in order to set LC_PAPER etc, since calling into GSettings
initializes the dconf backend which in turn uses gdbus, which
starts a worker thread.

As a simple workaround, set up the locale environment in
a small wrapper script that then exec's the g-s-d binary.
Comment 6 Colin Walters 2013-06-08 18:32:56 UTC
Review of attachment 246323 [details] [review]:

This looks like it will work.  It's probably the cleanest solution.  I just have some minor comments.

::: data/gnome-settings-daemon.desktop.in.in
@@ +2,3 @@
 Type=Application
 _Name=GNOME Settings Daemon
+Exec=@libexecdir@/gnome-settings-daemon.sh

I've never liked adding .sh and such extensions; the language of a binary is an implementation detail.  Maybe gnome-settings-daemon-localeexec ?  Dunno.  Feel free to keep the .sh.

::: gnome-settings-daemon/gnome-settings-daemon.sh.in
@@ +1,3 @@
+#! /bin/sh
+
+REGION=`gsettings get org.gnome.system.locale region | tr -d \'`

With this we will run the dynamic linker three times (once for /bin/sh which is pretty small, but gsettings links to a lot).
Comment 7 Matthias Clasen 2013-06-09 03:44:31 UTC
> With this we will run the dynamic linker three times (once for /bin/sh which is
> pretty small, but gsettings links to a lot).

Any idea for getting rid of the single quotes without an external command ?
We could give gsettings an option to print values 'naturally', instead of as a serialized variant
Comment 8 Matthias Clasen 2013-06-09 03:52:23 UTC
after some studying of info bash, this seems to work:

SETTING=$(gsettings get org.gnome.system.locale region)
REGION=${SETTING//\,/}
Comment 9 Colin Walters 2013-06-09 16:53:54 UTC
(In reply to comment #8)
> after some studying of info bash, this seems to work:
> 
> SETTING=$(gsettings get org.gnome.system.locale region)
> REGION=${SETTING//\,/}

I'm fine with the original patch too, btw.  If you do go with the in-process tr, I'm not sure whether that's bash-specific, so might as well change it to
#!/bin/bash
Comment 10 Matthias Clasen 2013-06-10 19:59:40 UTC
Attachment 246323 [details] pushed as 5cb44f6 - Avoid calling setenv after starting threads
Comment 11 Bastien Nocera 2013-06-11 10:25:43 UTC
I'll reopen this, as I'm not happy with having bash running during startup, and I don't trust bash code to be able to deal with weird quoting in any of the values read from GSettings, breaking the desktop.
Comment 12 Bastien Nocera 2013-06-12 12:35:50 UTC
*** Bug 702088 has been marked as a duplicate of this bug. ***
Comment 13 Bastien Nocera 2013-06-12 12:36:23 UTC
Bug 702088 shows the first set of problems.
Comment 14 Colin Walters 2013-06-12 13:45:01 UTC
(In reply to comment #13)
> Bug 702088 shows the first set of problems.

The only other sane approach I think is what Ryan suggested:

<desrt> you could also fork very very early
<desrt> with a pipe between
<desrt> do gsettings in the child
<desrt> and write the result to the pipe
<desrt> which is read by the parent
Comment 15 Bastien Nocera 2013-06-12 13:51:02 UTC
Created attachment 246642 [details] [review]
daemon: Port env helper to C

Instead of using sh, which would require calling out a separate
gsettings binary, and parsing its output before calling out to export.

FIXME: Launching g-s-d isn't implemented in this first pass
Comment 16 Colin Walters 2013-06-12 13:53:33 UTC
Review of attachment 246642 [details] [review]:

::: gnome-settings-daemon/gnome-settings-daemon-localeexec.c
@@ +62,3 @@
+        region = g_settings_get_string (locale_settings, "region");
+        if (region[0]) {
+                g_setenv ("LC_TIME", region, TRUE);

This is just the original broken code, but in a new binary?  I don't understand how that helps.
Comment 17 Bastien Nocera 2013-06-12 13:54:53 UTC
You're right, sigh. Not sure what I was thinking.
Comment 18 Bastien Nocera 2013-06-12 14:45:49 UTC
Filed http://sourceware.org/bugzilla/show_bug.cgi?id=15607
Comment 19 Bastien Nocera 2013-06-12 14:54:15 UTC
(In reply to comment #8)
> after some studying of info bash, this seems to work:
> 
> SETTING=$(gsettings get org.gnome.system.locale region)
> REGION=${SETTING//\,/}

That should have been:
REGION=${SETTING//\'/}
Comment 20 Bastien Nocera 2013-06-12 14:57:43 UTC
Created attachment 246649 [details] [review]
main: Fix incorrect quote removal

We were trying to remove a comma instead of a single-quote.
Comment 21 Bastien Nocera 2013-06-12 14:57:58 UTC
Created attachment 246650 [details] [review]
main: Use sh not bash for the localeexec helper

The script doesn't use bash features, so should also work with
sh compatible versions.
Comment 22 Colin Walters 2013-06-12 15:01:42 UTC
Review of attachment 246649 [details] [review]:

This looks right to me; I tested the substitution pattern in an interactive shell.
Comment 23 Colin Walters 2013-06-12 15:11:58 UTC
Review of attachment 246650 [details] [review]:

This doesn't actually work in dash, which AFAIK tries hard to strictly follow POSIX /bin/sh:

$ dash
$ setting="'foo'"
$ echo ${setting//\'/}
dash: Bad substitution
$  <control-d>

So I think we need to keep the bash interpreter.
Comment 24 Mantas Mikulėnas (grawity) 2013-06-12 15:26:49 UTC
(In reply to comment #23)
> Review of attachment 246650 [details] [review]:
> 
> This doesn't actually work in dash, which AFAIK tries hard to strictly follow
> POSIX /bin/sh:
> 
> $ dash
> $ setting="'foo'"
> $ echo ${setting//\'/}
> dash: Bad substitution
> $  <control-d>
> 
> So I think we need to keep the bash interpreter.

You could use:

setting=${setting#\'}
setting=${setting%\'}

assuming that there are no single-quotes in the middle.
Comment 25 Bastien Nocera 2013-06-12 15:33:51 UTC
Created attachment 246651 [details] [review]
main: Use sh not bash for the localeexec helper

Replace the bash-only substitution with an sh-compatible version.
Comment 26 Colin Walters 2013-06-12 21:28:59 UTC
Review of attachment 246651 [details] [review]:

Looks fine to me.
Comment 27 Bastien Nocera 2013-06-13 08:45:03 UTC
Attachment 246649 [details] pushed as b0ca46d - main: Fix incorrect quote removal
Attachment 246651 [details] pushed as 8a88eee - main: Use sh not bash for the localeexec helper
Comment 28 Peter 2013-07-03 11:19:55 UTC
Hi!
Looks like the external shell-script seems to be the only workaround, at least for now? Will we see a general solution, implemented inside gnome-settings-daemon, with a later release?

The developers of GLIBC described their point to getenv() add getenv_r() and POSIX.
Comment 29 Bastien Nocera 2013-07-05 08:34:28 UTC
(In reply to comment #28)
> Hi!
> Looks like the external shell-script seems to be the only workaround, at least
> for now? Will we see a general solution, implemented inside
> gnome-settings-daemon, with a later release?

Unlikely.

> The developers of GLIBC described their point to getenv() add getenv_r() and
> POSIX.

I didn't get that. They said they're not interested, pretty much.
Comment 30 Peter 2013-07-08 09:00:36 UTC
Okay. Just glad to see a fix for this. I wondered for weeks, why nobody else faced this bug :-)

> I didn't get that. They said they're not interested, pretty much.
Sorry for confusion! I just want remind that the people from GLIBC don't want extend/blow up the API.