GNOME Bugzilla – Bug 628142
abort: error 'Device or resource busy' during 'pthread_mutex_destroy'
Last modified: 2011-04-27 23:47:29 UTC
GThread-ERROR **: file gthread-posix.c: line 171 (g_mutex_free_posix_impl): error 'Device or resource busy' during 'pthread_mutex_destroy ((pthread_mutex_t *) mutex)' aborting...

==6588==
==6588== Process terminating with default action of signal 5 (SIGTRAP)
==6588==    at 0x88049C9: g_logv (gmessages.c:554)
==6588==    by 0x8804D52: g_log (gmessages.c:568)
==6588==    by 0x85BA555: g_mutex_free_posix_impl (gthread-posix.c:171)
==6588==    by 0x7C3108E: e_flag_free (e-flag.c:183)
==6588==    by 0x87EB70F: g_hash_table_remove_internal (ghash.c:448)
==6588==    by 0x1AEC7FB2: imapx_server_get_message (camel-imapx-server.c:4970)
==6588==    by 0x1AEC818A: camel_imapx_server_get_message (camel-imapx-server.c:4981)
==6588==    by 0x1AEC579F: imapx_get_message (camel-imapx-folder.c:268)
==6588==    by 0x66A268D: camel_folder_get_message (camel-folder.c:1752)
==6588==    by 0xFDE2A36: get_message_exec (mail-ops.c:1900)
==6588==    by 0xFDDF377: mail_msg_proxy (mail-mt.c:469)
==6588==    by 0x8825EC3: g_thread_pool_thread_proxy (gthreadpool.c:314)
Updated for 2.91.1:

==10929== Process terminating with default action of signal 5 (SIGTRAP)
==10929==    at 0x35B3A4B279: g_logv (gmessages.c:563)
==10929==    by 0x35B3A4B602: g_log (gmessages.c:577)
==10929==    by 0x35B3E02555: g_mutex_free_posix_impl (gthread-posix.c:171)
==10929==    by 0x5C5C500: e_flag_free (e-flag.c:183)
==10929==    by 0x35B3A31FBF: g_hash_table_remove_internal (ghash.c:452)
==10929==    by 0x180A3391: imapx_server_get_message (camel-imapx-server.c:5185)
==10929==    by 0x180A9D3F: camel_imapx_server_sync_message (camel-imapx-server.c:5224)
==10929==    by 0x1809EDAF: imapx_synchronize_message_sync (camel-imapx-folder.c:513)
==10929==    by 0x50A0506: camel_folder_synchronize_message_sync (camel-folder.c:3523)
==10929==    by 0x50A7377: offline_downsync_sync (camel-offline-folder.c:91)
==10929==    by 0x50B5617: session_thread_proxy (camel-session.c:321)
==10929==    by 0x35B3A6C773: g_thread_pool_thread_proxy (gthreadpool.c:319)
==10929==    by 0x35B3A69FF5: g_thread_create_proxy (gthread.c:1897)
==10929==    by 0x35B1E06D5A: start_thread (pthread_create.c:301)
==10929==    by 0x35B1AE427C: clone (clone.S:115)

Note the comment above the g_hash_table_remove call:

> /* HACK FIXME just sleep for sometime so that the other waiting locks gets
>    released by that time. Think of a better way..*/
> g_usleep (1000);
> g_hash_table_remove (is->uid_eflags, uid);
Created attachment 173176 [details] [review] proposed eds patch for evolution-data-server; I guess this should fix it.
Ping, David: did you find time to test the above patch, please? I just found a downstream patch from 2.32.0 about the same issue: https://bugzilla.redhat.com/show_bug.cgi?id=697923
Hm, I had completely forgotten about this (and haven't seen it since). I had noticed a similar issue in EWS though, and fixed it thus: http://git.infradead.org/evolution-ews.git/commitdiff/92711b9c I *tried* to test that one, but didn't know how to provoke the UI into actually trying to fetch the same message twice simultaneously. Any ideas on that?
I think I prefer the approach I took in EWS. Rather than adding a refcounting wrapper to the EFlag, let's just use a simple entry in the hash table to block the fetch, and a GCond to wake up anyone who's waiting on that. This kind of collision should be so infrequent that the 'herd wakeup' won't be an issue.
Hmm, a busy-loop which can never end doesn't seem any better, from my point of view. See how my approach avoids such a situation. Anyway, I would let Chen decide, as he is the imapx author.
(In reply to comment #6)
> Hmm, busy-loop which can never-end doesn't seem any better, from my point of
> view

Do you mean this bit?

do {
    g_cond_wait (priv->fetch_cond, priv->state_lock);
} while (g_hash_table_lookup (priv->uid_eflags, uid));

That's just standard lock behaviour; it's not really a busy-loop. The g_cond_wait() call means that it waits to be *woken* when something has changed. This is basically how most locks work anyway; they *try* to obtain the lock, and if they can't then they sleep until they're prodded, then try again. The code in comment 2 is doing exactly the same thing; it's just that it happens in e_flag_wait():

while (!flag->is_set)
    g_cond_wait (flag->cond, flag->mutex);
Any thoughts on how to actually test this? Do I need to hook up a special test case to do two simultaneous fetches using the camel API, or is there some way to reliably make it happen from Evolution? I tried adding a big sleep() into the locked section, but still can't get it to end up in this code twice simultaneously for the same message.
Hm, it's more broken than that. We have a hash table per-server, not per-folder, and it's keyed on just the UID. So if you are simultaneously fetching a message with the same UID in more than one folder, it'll go quite horribly wrong. (The EWS one is OK, since the ItemID we use as a UID *is* unique across the whole server)
Created attachment 186507 [details] [review] patch This fixes it, and I've been able to test it. It drops the hash table entirely; we don't need it because we can just use imapx_is_job_in_queue() to check for an outstanding fetch of the same message — which in fact we were already doing anyway. Rather than allocating a new EFlag for each message, we just wait for *any* fetch to complete, and then use imapx_is_job_in_queue() again to check whether *our* message has completed yet.
Beautiful solution :) I really enjoyed looking at all the patches :) Please commit the patch at comment #10; I hope I have not made the same mistake anywhere else :)
To ssh://dwmw2@git.gnome.org/git/evolution-data-server
   5be5b05..e37ca87  gnome-2-32 -> gnome-2-32
   0f3e0af..62a1c1d  gnome-3-0 -> gnome-3-0
   4349d8a..bdd9661  master -> master