GNOME Bugzilla – Bug 162999
Thread unsafe image loaders can cause a deadlock
Last modified: 2010-07-10 04:07:59 UTC
Yesterday, I noticed a crash when executing 'vino-preferences'. It triggered an assertion in NetBSD's libpthread, and then crashed with a SIGABRT: assertion "next != 0" failed: file "/home/jmmv/NetBSD/src/lib/libpthread/pthread_run.c", line 130, function "pthread__next" Although cryptic, this means (or at least I think so) that the program has reached a deadlock condition (i.e., there are no more available threads to run, hence "next is 0"). Running gdb showed: [...]
+ Trace 54159
Which made me think about some problem in the current theme (Nuvola). Effectively, I switched to another one, Wasp, and everything started to work again. Tried to copy a .svg image inside Wasp's directory, using the gnome-lockscreen.svg name, and the problem reappeared. I tracked down the problem and found that there is some code in gdk-pixbuf that issues a "nested lock", i.e., it tries to lock the same mutex from two different functions from the same calling thread. This only happens when the underlying image loader (in this case, the svg one) is not thread safe on its own, so gdk-pixbuf has to ensure mutual exclusive access to it. More specifically, the gdk_pixbuf_animation_new_from_file (which happens when the original program uses gtk_image_set_from_file) calls _gdk_pixbuf_lock. Then, in the critical section, it calls _gdk_pixbuf_generic_image_load, which in turn calls _gdk_pixbuf_lock again. There it is, the nested lock. I've fixed this issue by narrowing the critical code block in gdk_pixbuf_animation_new_from_file. As I see it, the lock is only needed to ensure mutual exclusive access to the image_module->load_animation function. All other accesses to the image_module structure in the same function are either to check if load_animation is not NULL or to get the module's name (module_name field). These accesses do not need to be in the critical section, because they are read only (filled at module load time). Unfortunately, I haven't been able to reproduce this problem with a little testcase (that initializes threads and calls the offending problem). However, it's 100% reproducible using vino-preferences and Nuvola. (Dunno if it will happen in Linux, though, because I'm using NetBSD.)
Created attachment 35464 [details] [review] Sample patch This patch is against GTK+ 2.6.0 code. I would have preferred to provide it against CVS head, but anoncvs.gnome.org seems to be down, so I can't fetch nor diff the code.
Ok... now that anoncvs seems to be up again, I have updated my sources and found: 2004-12-27 Matthias Clasen <mclasen@redhat.com> * gdk-pixbuf-animation.c (gdk_pixbuf_animation_new_from_file) Avoid deadlock. Pointed out by Callum McKenzie. So this is already fixed. However, what do you think about adapting this patch to the existing code and applying it? This one removes a goto that becomes useless after the fix, while the current code still keeps it.
Created attachment 35469 [details] [review] Patch against CVS This new patch only cleans up the existing code to remove an useless goto construction (used before the problem was fixed, but not after). Should not change functionality.
I'd rather leave the code as is