GNOME Bugzilla – Bug 337332
Fix podcast manager locking insanity
Last modified: 2007-04-26 16:45:17 UTC
Steps to reproduce: 1. Start Rhythmbox with "GNOME_DISABLE_CRASH_DIALOG=1 rhythmbox" 2. Close Rhythmbox Stack trace: Core was generated by `rhythmbox'. Program terminated with signal 6, Aborted.
+ Trace 67473
Thread 4 (Thread 0x8130000 (LWP 100169))
Other information: It appears pthread_mutex_destroy() is returning EBUSY. This would indicate the mutex is locked by another thread. The output on the console is: GThread-ERROR **: file gthread-posix.c: line 161 (): error 'Device busy' during 'pthread_mutex_destroy ((pthread_mutex_t *) mutex)' aborting... Abort trap: 6 (core dumped)
This is probably the same problem as described in bug #332302. Can you run strace -f -p <pid-of-rhythmbox>?
This is assuming that the "crash" is actually a hang. If not, you should be able to strace rhythmbox.
Created attachment 62786 [details] Output from strace. strace was started after just before I closed rhythmbox.
It appears to be a threading problem with podcasts. I can maje Rhythmbox crash with the same backtrace as thread 1 by downloading a podcast.
Created attachment 62789 [details] Backtrace from crash when downloading a podcast This backtrace is when I attempt to download a podcast. The terminal window has the same error message as described in the original bug. Thread 1 has the same backtrace going through abort().
The locking in the podcast manager code is moderately insane, and it needs to be reworked. I'm (slowly) working on it. I don't think there's much point investigating this particular issue further.
Retitling the bug to be clearer.
I noticed a problem on FreeBSD that I think is related. The mutex_working mutex is locked before the call to gnome_vfs_async_xfer(). However, this mutex is unlocked in the GnomeVFSXferProgressCallback function for this transfer. That function is called in another thread from the one that locked the mutex, and thus it cannot be unlocked. Then, when g_mutex_free() is called on mutex_working, rb aborts. I will attach a patch that corrects this, but I think it's a big kludge. Basically, mutex_working is still locked start_job(), but then it's immediately unlocked after the call to gnome_vfs_async_xfer(). However, I lock the mutex again in download_progress_cb if the phase is GNOME_VFS_XFER_PHASE_INITIAL. Not ideal by any means, but it does the job of making sure the callback thread owns the mutex lock.
Created attachment 75475 [details] [review] Avoid mutex free abort
Created attachment 75481 [details] [review] "Fix" all mutex issues The previous patch fixed the crash, but still resulted in the hang from a locked mutex_job. This patch corrects that as well, but it's quite hackish, and prone to races. It should work for people in the meantime, until locking can be fixed for real.
Created attachment 76202 [details] [review] unthreadify patch The only bits that run in other threads are download_progress_cb() and the podcast parser. The latter of which doesn't touch any data used by other threads, This patch does any thread-unsafe work from the former in idle callbacks, and so lets us remove all the locks. I'm fairly sure this fixes it in a less hacky way.
The patch seems to help with the initial crash problem, but there is some strange behavior. If you try to download two podcasts at once, the progress bar from the first downloading podcast, blinks back and forth between the status for the first download, and the status for the second. Meanwhile, the second podcast remains in a Waiting state. Once the first one completes, rhythmbox crashes on a SIGSEGV with the following message: (rhythmbox:33074): RhythmDB-CRITICAL **: rhythmdb_entry_get_string: assertion `entry != NULL' failed
*** Bug 377581 has been marked as a duplicate of this bug. ***
Created attachment 77090 [details] [review] updated patch more cleanup, fixes issues mentioned in comment 12, fixes a few minor memory leaks, fixes user-visible spelling mistakes.
Created attachment 77093 [details] [review] further updated patch also makes download cancellation work.
Looks okay to me.
Committed (with some further work on download cancellation). I haven't actually tested this on FreeBSD; if it turns out this doesn't fix it, please reopen the bug.
*** Bug 363124 has been marked as a duplicate of this bug. ***
*** Bug 362883 has been marked as a duplicate of this bug. ***
*** Bug 433542 has been marked as a duplicate of this bug. ***