GNOME Bugzilla – Bug 628142
abort: error 'Device or resource busy' during 'pthread_mutex_destroy'
Last modified: 2011-04-27 23:47:29 UTC
GThread-ERROR **: file gthread-posix.c: line 171 (g_mutex_free_posix_impl): error 'Device or resource busy' during 'pthread_mutex_destroy ((pthread_mutex_t *) mutex)' aborting...

==6588==
==6588== Process terminating with default action of signal 5 (SIGTRAP)
==6588==    at 0x88049C9: g_logv (gmessages.c:554)
==6588==    by 0x8804D52: g_log (gmessages.c:568)
==6588==    by 0x85BA555: g_mutex_free_posix_impl (gthread-posix.c:171)
==6588==    by 0x7C3108E: e_flag_free (e-flag.c:183)
==6588==    by 0x87EB70F: g_hash_table_remove_internal (ghash.c:448)
==6588==    by 0x1AEC7FB2: imapx_server_get_message (camel-imapx-server.c:4970)
==6588==    by 0x1AEC818A: camel_imapx_server_get_message (camel-imapx-server.c:4981)
==6588==    by 0x1AEC579F: imapx_get_message (camel-imapx-folder.c:268)
==6588==    by 0x66A268D: camel_folder_get_message (camel-folder.c:1752)
==6588==    by 0xFDE2A36: get_message_exec (mail-ops.c:1900)
==6588==    by 0xFDDF377: mail_msg_proxy (mail-mt.c:469)
==6588==    by 0x8825EC3: g_thread_pool_thread_proxy (gthreadpool.c:314)
Updated for 2.91.1:

==10929== Process terminating with default action of signal 5 (SIGTRAP)
==10929==    at 0x35B3A4B279: g_logv (gmessages.c:563)
==10929==    by 0x35B3A4B602: g_log (gmessages.c:577)
==10929==    by 0x35B3E02555: g_mutex_free_posix_impl (gthread-posix.c:171)
==10929==    by 0x5C5C500: e_flag_free (e-flag.c:183)
==10929==    by 0x35B3A31FBF: g_hash_table_remove_internal (ghash.c:452)
==10929==    by 0x180A3391: imapx_server_get_message (camel-imapx-server.c:5185)
==10929==    by 0x180A9D3F: camel_imapx_server_sync_message (camel-imapx-server.c:5224)
==10929==    by 0x1809EDAF: imapx_synchronize_message_sync (camel-imapx-folder.c:513)
==10929==    by 0x50A0506: camel_folder_synchronize_message_sync (camel-folder.c:3523)
==10929==    by 0x50A7377: offline_downsync_sync (camel-offline-folder.c:91)
==10929==    by 0x50B5617: session_thread_proxy (camel-session.c:321)
==10929==    by 0x35B3A6C773: g_thread_pool_thread_proxy (gthreadpool.c:319)
==10929==    by 0x35B3A69FF5: g_thread_create_proxy (gthread.c:1897)
==10929==    by 0x35B1E06D5A: start_thread (pthread_create.c:301)
==10929==    by 0x35B1AE427C: clone (clone.S:115)

Note the comment above the g_hash_table_remove call:

> /* HACK FIXME just sleep for sometime so that the other waiting locks gets
>    released by that time. Think of a better way..*/
> g_usleep (1000);
> g_hash_table_remove (is->uid_eflags, uid);
Created attachment 173176 [details] [review] proposed eds patch for evolution-data-server; I guess this should fix it.
Ping, David: did you find time to test the above patch, please? I just found a downstream patch from 2.32.0 about the same issue: https://bugzilla.redhat.com/show_bug.cgi?id=697923
Hm, I had completely forgotten about this (and haven't seen it since). I had noticed a similar issue in EWS though, and fixed it thus: http://git.infradead.org/evolution-ews.git/commitdiff/92711b9c I *tried* to test that one, but didn't know how to provoke the UI into actually trying to fetch the same message twice simultaneously. Any ideas on that?
I think I prefer the approach I took in EWS. Rather than adding a refcounting wrapper to the EFlag, let's just use a simple entry in the hash table to block the fetch, and a GCond to wake up anyone who's waiting on that. This kind of collision should be so infrequent that the 'herd wakeup' won't be an issue.
Hmm, a busy-loop which can never end doesn't seem any better, from my point of view. See how my approach avoids such a situation. Anyway, I would let Chen decide, as he is the imapx author.
(In reply to comment #6)
> Hmm, busy-loop which can never-end doesn't seem any better, from my point of
> view

Do you mean this bit?

do {
    g_cond_wait (priv->fetch_cond, priv->state_lock);
} while (g_hash_table_lookup (priv->uid_eflags, uid));

That's just standard lock behaviour; it's not really a busy-loop. The g_cond_wait() call means that it waits to be *woken* when something has changed. This is basically how most locks work anyway; they *try* to obtain the lock, and if they can't then they sleep until they're prodded, then try again. The code in comment 2 is doing exactly the same thing; it's just that it happens in e_flag_wait():

while (!flag->is_set)
    g_cond_wait (flag->cond, flag->mutex);
Any thoughts on how to actually test this? Do I need to hook up a special test case to do two simultaneous fetches using the camel API, or is there some way to reliably make it happen from Evolution? I tried adding a big sleep() into the locked section, but still can't get it to end up in this code twice simultaneously for the same message.
Hm, it's more broken than that. We have a hash table per-server, not per-folder, and it's keyed on just the UID. So if you are simultaneously fetching a message with the same UID in more than one folder, it'll go quite horribly wrong. (The EWS one is OK, since the ItemID we use as a UID *is* unique across the whole server)
Created attachment 186507 [details] [review] patch This fixes it, and I've been able to test it. It drops the hash table entirely; we don't need it because we can just use imapx_is_job_in_queue() to check for an outstanding fetch of the same message — which in fact we were already doing anyway. Rather than allocating a new EFlag for each message, we just wait for *any* fetch to complete, and then use imapx_is_job_in_queue() again to check whether *our* message has completed yet.
Beautiful solution :) I really enjoyed looking at all the patches :) Please commit the patch at comment #10; I hope I have not made the same mistake anywhere else :)
To ssh://dwmw2@git.gnome.org/git/evolution-data-server
   5be5b05..e37ca87  gnome-2-32 -> gnome-2-32
   0f3e0af..62a1c1d  gnome-3-0 -> gnome-3-0
   4349d8a..bdd9661  master -> master