After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 666551 - Crash in g_thread_xp_SleepConditionVariableSRW
Crash in g_thread_xp_SleepConditionVariableSRW
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gthread
2.31.x
Other Windows
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2011-12-19 18:02 UTC by rodrigorivascosta
Modified: 2018-05-24 13:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to fix crash in gthreads-win32.c (1006 bytes, patch)
2011-12-19 18:02 UTC, rodrigorivascosta
none Details | Review
winxp threads: fix some condition variable races (3.65 KB, patch)
2011-12-19 22:40 UTC, Allison Karlitskaya (desrt)
none Details | Review
Correction of Ryan's patch (2.90 KB, patch)
2011-12-20 11:16 UTC, rodrigorivascosta
none Details | Review
Fix a few dangling pointers (805 bytes, patch)
2011-12-21 13:43 UTC, rodrigorivascosta
committed Details | Review

Description rodrigorivascosta 2011-12-19 18:02:47 UTC
Created attachment 203905 [details] [review]
Patch to fix crash in gthreads-win32.c

I've been getting some random crashes with the latest version of glib/gtk+ in Windows, particularly in the GtkFileChooserDialog and similar.

After some hard debugging days, I think I've found it: the problem is in function g_thread_xp_SleepConditionVariableSRW() (gthread-win32.c).

This function first inserts the current thread 'waiter' in the cv list of waiters; then it calls WaitForSingleObject(), but it never removes the 'waiter' from the list.
This looks correct, because both g_thread_xp_WakeConditionVariable() and g_thread_xp_WakeAllConditionVariable() remove the 'waiter' of the awaken thread.

But what happens if nobody awakes the thread, and WaitForSingleObject() returns with a timeout? The 'waiter' is kept in the cv list!

After that, the thread finishes, the waiter is destroyed, and the next time the cv is used... crash!

My proposed patch removes (I think it does, that list is tricky!) the 'waiter' before returning if present in the list. That should only be necessary if 'status == WAIT_TIMEOUT', but who knows... Please, feel free to change it if you see fit.
Comment 1 Allison Karlitskaya (desrt) 2011-12-19 22:21:31 UTC
I agree with your analysis.

I'm not totally crazy about the solution because it runs even in the non-timeout case and because it's so awkward.  That's not your fault -- the data structure was optimised for exactly three operations: append, in-order traverse and popping.  It's not really suitable for dealing with remove-from-middle.

This could probably be fixed by switching to a full linked list, or by giving each node a 'my_owner' field, such that this invariant is held:

  waiter->my_owner == NULL || *waiter->my_owner == waiter

such that removing any waiter from a list is just a matter of doing this:

  *waiter->my_owner = waiter->next;

We could then do that on the waiter (that we already have hold of) after the WaitForSingleObject() call returns in the timeout case.

We just need to be careful in the other places the linked list is updated (mostly in the wake-single case) that we update the my_owner field of the 2nd element in the list when we pop the first to point back at the CV rather than the next pointer on the 1st.

I'll give this a try.
Comment 2 Allison Karlitskaya (desrt) 2011-12-19 22:33:46 UTC
On reading this code I also notice that the fact that I drop the critical section early and assume that the waiter is still alive is possibly problematic in the timeout case.  If I call signal() just as the thread is timing out, and the thread exits shortly after the timeout, the signalling thread may try to SetEvent() on a no-longer-existent event variable.
Comment 3 Allison Karlitskaya (desrt) 2011-12-19 22:40:09 UTC
Created attachment 203921 [details] [review]
winxp threads: fix some condition variable races

There are some races in the condition variable emulation code for
Windows XP with respect to timeouts while waiting.

First, in the event of a timeout, we never remove the waiter from the
condition variable.  This can cause crashes later.  That problem was
found by Rodrigo Rivas Costa.

Second, if the waiting thread times out and exits just as we were about
to call SetEvent() on its waiter event, we could end up trying to access
the waiter after it was closed/freed.  We need to hold on to the lock a
little bit longer to ensure that that's not possible.
Comment 4 Allison Karlitskaya (desrt) 2011-12-19 22:40:40 UTC
Please review this patch and let me know if you think it will fix the issue.
Comment 5 rodrigorivascosta 2011-12-20 00:22:51 UTC
I've noticed some races, but frankly, after about a week hunting this bug I just wanted to avoid the crash.

Your patch looks fine, I'll test it as soon as I can.

But anyway, implementing condition variables with windows primitives is very tricky. And all those 'volatile' variables make me cringe...

Have you considered checking/copying code from pthreads-win32 (http://sourceware.org/pthreads-win32/), LGPL. They did just that, and they took quite some versions until they got it right. And as far as I've heard, the latest version, from 2006 works flawlessly.
Comment 6 rodrigorivascosta 2011-12-20 11:16:41 UTC
Created attachment 203941 [details] [review]
Correction of Ryan's patch
Comment 7 rodrigorivascosta 2011-12-20 11:16:57 UTC
Ryan, I'm afraid that it crashes even before than the original version!

I added a few 'g_assert' with the invariant you comment and it fails soon enough.

The problem seems to be that each time you remove an item from the list, you have to check check the next one's my_owner, and fix it accordingly. And take into account the case when you delete the last of the list.

Also, you forgot to assign NULL to 'my_owner' in g_thread_xp_WakeConditionVariable.

Maybe it would be good to add a g_assert or two, just to be sure. 

g_assert(waiter->my_owner == NULL || *waiter->my_owner == waiter);

Please, see my updated patch.
Comment 8 rodrigorivascosta 2011-12-21 13:43:10 UTC
Created attachment 204030 [details] [review]
Fix a few dangling pointers

I've seen that Ryan's patch has been commited on Dec. 19th. Nice, but it still crashes because of the reasons I explained in my previous message.

The attached patch corrects these problems against the already updated version.
Comment 9 Allison Karlitskaya (desrt) 2011-12-21 17:07:35 UTC
Review of attachment 204030 [details] [review]:

Of course.

Thanks for the catch.
Comment 10 Fan, Chun-wei 2011-12-22 03:03:36 UTC
Review of attachment 204030 [details] [review]:

Hi rodrigorivascosta,

I have committed your patch as 1b033774 into master on your behalf after changing your diff into git patch format, retaining the items you have mentioned in comment 7.

I have not closed this bug as we may want to check further until all things are well on Windows XP-I checked this patch on my side in Windows 7 XPMode with binaries compiled with Visual C++ 2008 and ran the tests related to GLib threading for 5 times each, which turned out to be fine on my side (previous versions of the XP threading code had intermittent problems on my side when running the rwlock tests, to let you know).

Is there any opportunity that you could post your test case here that triggered the problem on your side, out of curiosity?

Thanks, and God bless.
Comment 11 rodrigorivascosta 2011-12-22 09:10:18 UTC
Hi, Chun-wei.

I have to admit that I don't have a formal test case. I started having crashes with the GTK+-2 demo (gtk-demo.exe) compiled with gcc-4.6.3 (cross compiled under linux), and run in a WinXP. It crashed only with the "Pickers" applet (open, close, open, close, open, close, crash!).

Then, I reduced the problem to the following program:

#include <gtk/gtk.h>
int main(int argc, char **argv)
{
    gtk_init(&argc, &argv);
    GtkWidget *w = gtk_file_chooser_dialog_new("Choose", NULL, 
            GTK_FILE_CHOOSER_ACTION_SELECT_FOLDER, "OK", GTK_RESPONSE_OK, NULL);
    gtk_widget_show(w);
    gtk_main();
    return 0;
}

It failed when changing quickly the folder and clicking the mouse like crazy ;)
Comment 12 GNOME Infrastructure Team 2018-05-24 13:38:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/491.