After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 147392 - giowin32 leaks threads for socket channels [patch]
giowin32 leaks threads for socket channels [patch]
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: win32
2.4.x
Other Windows
: High normal
: ---
Assigned To: gtk-win32 maintainers
gtk-win32 maintainers
Depends on:
Blocks:
 
 
Reported: 2004-07-12 10:03 UTC by Peter Zelezny
Modified: 2011-02-18 16:13 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fixes GIOwin32 not stopping it's thread on socket g_source_remove (524 bytes, patch)
2004-07-12 10:05 UTC, Peter Zelezny
committed Details | Review
Different locking strategy (7.43 KB, patch)
2005-01-07 00:22 UTC, John Ehresman
none Details | Review
New giowin32.c (43.33 KB, text/plain)
2005-04-18 23:02 UTC, Tor Lillqvist
  Details
New giowin32.c (43.47 KB, text/plain)
2005-05-12 12:41 UTC, Tor Lillqvist
  Details
giowin32.c patch (based on glib v2.4.7) (1.14 KB, patch)
2005-06-09 08:05 UTC, David Ergo
none Details | Review
Umm, yes, here's my diff to gnet. actually from June 14 already (31.76 KB, patch)
2005-07-09 18:23 UTC, Tor Lillqvist
none Details | Review

Description Peter Zelezny 2004-07-12 10:03:19 UTC
1. Create a GIOChannel for a socket.
 1a. g_io_channel_win32_new_socket
 1b. tag = g_io_add_watch
 1c. g_io_channel_unref
2. g_source_remove(tag).
3. Watch as the thread doesn't exit. Repeat until out of memory.

However, the above steps work fine with g_io_channel_win32_new_fd.

This patch sets the need_close to 1 in g_io_win32_finalize, which forces
select_thread() out of its loop (it doesn't actually close the socket). I've
tested this with XChat, which previously leaked threads, now it does not.
Comment 1 Peter Zelezny 2004-07-12 10:05:21 UTC
Created attachment 29457 [details] [review]
Fixes GIOwin32 not stopping it's thread on socket g_source_remove
Comment 2 Danielle Madeley 2004-07-13 02:36:33 UTC
Adding the PATCH keyword.
Comment 3 Tor Lillqvist 2004-08-21 21:29:45 UTC
Thanks! Patch applied to glib-2-4 and HEAD.
Comment 4 John Ehresman 2004-09-27 20:48:52 UTC
Won't this exit the select thread prematurely if there's more than one watch on
the channel?  I think adding an explicit call to channel_close would solve the
problem in the motivating example.

glib probably should end the select thread if there are no watches on the
channel because otherwise channels that have close_on_unref set to true will
leak channels.  The tricky part is handling the case where all watches are
removed and then one or more is re-added.
Comment 5 Tor Lillqvist 2004-09-27 20:57:03 UTC
Hmm. (How does it work on Unix?) Do you think the patch should be reversed?
Comment 6 John Ehresman 2004-09-27 21:04:33 UTC
On Unix, the fd's are put in the main select/poll set so there is no separate
thread.  I think the patch will break programs that remove & then add watches,
but the previous behavior leaked threads and channels when close_on_unref is
true.  I think it needs to be reopened and (eventually) fixed.  I may be able to
come with a patch for it.
Comment 7 Tor Lillqvist 2004-11-19 04:38:42 UTC
John, do you still think the patch should be reversed? Do you have a better 
patch? Or should the documentation be improved, i.e. should GLib expect the 
application to call g_io_channel_close() and not just g_io_channel_unref()? (Or 
whatever.)
Comment 8 Peter Zelezny 2004-12-29 02:02:57 UTC
How about:

if (channel->watches == NULL && channel->type == G_IO_WIN32_SOCKET)
{
    /* tell select_thread() to exit */
    channel->needs_close = 1;
    /* wake up select_thread() from its blocking select() */
    send (channel->reset_send, send_buffer, sizeof (send_buffer), 0);
}

Multiple watches on the one socket use the same thread right? So we wait until
there's no watches to force the thread to exit.
Comment 9 Tor Lillqvist 2004-12-29 16:02:49 UTC
Yes, looks reasonable. I guess giowin32.c still needs to be fixed to handle the 
case John mentions, where all watches are removed and then later one or more 
are re-added. Would it be enough to set channel->thread_id to zero at the end 
of select_thread()? Then g_io_win32_create_watch() and 
g_io_channel_win32_make_pollfd() would call create_thread().
Comment 10 John Ehresman 2005-01-07 00:22:20 UTC
Created attachment 35588 [details] [review]
Different locking strategy

Here's what I've come up with, which is a bit more of a change than I first
intended.  The select_thread now locks the channel whenever it isn't calling
select so it doesn't need to constantly lock & unlock it.  It exits if there
are no watches.  I also drop the lock before creating all threads, which may
cause problems for other channel types.

Something I just realized, though, is that g_io_channel_ref / _unref are not
thread safe.  I wonder it the ref / unref could be eliminated from
select_thread and the code would rely on the channel finalize to destroy the
thread.
Comment 11 Tor Lillqvist 2005-04-18 23:00:10 UTC
libsoup uses socket GIOChannels and watches in such a way that didn't work with
current HEAD GLib. It adds and removed watches from the same channel. This
breaks the current code completely.

As it happens, I have already in two other places (D-BUS and ORBit2) used the
WSAEventSelect()/WSAEnumNetworkEvents() API, with good results (once I
understood all the quirks). I decided to try to rewrite giowin32.c to use it,
too, for sockets. This means we can say goodbye and good riddance to the
separate thread for each socket. The code gets much simpler, at least for
sockets. (For C library file descriptors a separate reader thread is still
needed, sigh.) 

I'll attach my current giowin32.c, please test it thoroughly. If it works OK for
you, I'll commit it to HEAD, and maybe even to 2.6.
Comment 12 Tor Lillqvist 2005-04-18 23:02:04 UTC
Created attachment 45420 [details]
New giowin32.c
Comment 13 John Ehresman 2005-04-18 23:11:14 UTC
Does the new approach leave the sockets in blocking mode?  As far as I recall,
this was the reason for needing to use the select() call.
Comment 14 Tor Lillqvist 2005-04-18 23:23:00 UTC
Nope, WSAEventSelect() automatically puts the socket into non-blocking mode.
Does your applications require sockets to be blocking? But I don't think using
select() was because of blocking/non-blocking concerns. I didn't find out about
the WSAEventSelect() call until late last year, if I recall correctly... Or
maybe I had noticed it earlier, but thought that it is too complex or not relevant.

Hmm, if one uses a GIOChannel but not any watch, the socket should stay in
blocking mode (if it initially is so). Only when the prepare method is first
called does WSAEeventSelect() get called. I guess one could add code to put the
socket back to blocking mode after the poll? But then for apps/libs that want to
use non-blocking mode that would then be unnecessary overhead.
Comment 15 John Ehresman 2005-04-18 23:41:27 UTC
Yes, we are using blocking sockets, which could and maybe should be changed. 
But the fact that it's permitted means that some number of application out there
are doing it and they may fail if the mode is changed.

I think I looked into toggling between blocking & non-blocking and don't think
you can do it on win32 (though I may be wrong) and what happens if the app code
changes the mode while it's being watched?
Comment 16 Tor Lillqvist 2005-04-19 05:53:30 UTC
OK, so maybe we need to keep the old implementation as an alternative? Perhaps
selectable by setting an environment variable? For non-blocking
applications/libs, I think the WSAEventSelect()-based approach is clearly superior.
Comment 17 John Ehresman 2005-04-19 15:36:35 UTC
I'd say use the new implementation if the socket is non-blocking when added --
this might break some programs that fiddle with the blocking mode, but I suspect
they have other portability problems.  Also, warn on all platforms that watching
blocking sockets is deprecated and plan to remove the next time we can break
backwards compatibility.  I see this as an api with behavior on Unix that's
nearly impossible to implement on win32.
Comment 18 Tor Lillqvist 2005-05-12 12:40:05 UTC
But how does one find out whether a socket is in blocking or non-blocking
mode...? ioctlsocket(FIONBIO) doesn't tell.

Anyway, I found a bug in the g_io_channel_win32_make_pollfd() implementation in
the new giowin32.c that manifested itself in GIMP when one tried to run
script-fu scripts. Will attach a fixed version.
Comment 19 Tor Lillqvist 2005-05-12 12:41:38 UTC
Created attachment 46365 [details]
New giowin32.c
Comment 20 David Ergo 2005-06-09 08:05:42 UTC
Created attachment 47495 [details] [review]
giowin32.c patch (based on glib v2.4.7)

proposed patch for resolving issues in the following cases :
- several watches exist and only one watch is removed
- new watches are created after the last watch has been removed
This as been tested with the following gnet 2.0.5 examples which
didn't work with the standard glib v2.4.7 : 
echoclient-gconn, echoserver-gserver and echoserver-async
Comment 21 Tor Lillqvist 2005-06-09 08:31:18 UTC
Hmm, David, so with this patch you mean that we should keep the separate thread
implementation for sockets, and we still be able to do complex watch
manipulation? Are you strongly opposed to the WSAEventSelect()-based approach in
comment #19 that does away with the threads (for sockets)?

John, do you have any fresh follow-up to comment #18?
Comment 22 David Ergo 2005-06-09 11:33:02 UTC
No, I'm not sure the separate thread implementation is good.
I even just discovered some more problems with it. There are race conditions 
where pollfd.revents is resetted to 0 (in g_main_context_query()) before it is 
passed to the callback function (in g_io_win32_dispatch()) (even without the 
threads leak patch).
I'm not opposed to the WSAEventSelect()-based approach.
I would like to test your new version of giowin32.c but it doesn't compile in 
glib v2.4.7 :
giowin32.c:45:20: gstdio.h: No such file or directory
giowin32.c:48:20: galias.h: No such file or directory
giowin32.c: In function `g_io_channel_new_file':
giowin32.c:1278: warning: implicit declaration of function `g_open'
giowin32.c: At top level:
giowin32.c:1335: redefinition of `g_io_channel_new_file'
giowin32.c:1198: `g_io_channel_new_file' previously defined here
giowin32.c: In function `g_io_channel_new_file':
giowin32.c:1342: warning: implicit declaration of function 
`g_io_channel_new_file_utf8'
giowin32.c:1342: warning: assignment makes pointer from integer without a cast
giowin32.c:1666:23: galiasdef.c: No such file or directory
make[3]: *** [giowin32.lo] Error 1

Would it be difficult to make a glib 2.4 version ?
Comment 23 Tor Lillqvist 2005-06-09 11:45:09 UTC
Hmm, for 2.4, you can just remove the inclusions of gstdio.h, galias.h and
galiasdef.c, replace g_open() with open(), and remove the "backward
compatibility" version of g_io_channel_new_file(). That should hopefully make it
build.

It might also be a good idea to look at the g_poll() Win32 implementation in
gmain.c in 2.6.4 and make sure your g_poll() looks the same. There was at least
one fix to it that was essential to make the CORBA and bonobo stuff work (or
libsoup, I don't recall), see the ChangeLog entry from 2005-03-29.
Comment 24 David Ergo 2005-06-09 13:09:03 UTC
I tested the WSAEventSelect()-based approach
(your file, modified for v2.4 and update of g_poll in gmain.c, from 2.6.4)
It doesn't seem to work with gnet examples echoclient-gconn, echoserver-gserver 
and echoserver-async, which do some watches manipulations.

And to resolve the race condition mentionned in comment #22, I had to remove
the following line in g_main_context_query(), in gmain.c :
    fds[n_poll].revents = 0;
But actually I don't understand exactly how this field is used.
I don't know if I may do that and I didn't check if it still work under Linux.
Comment 25 David Ergo 2005-06-09 14:19:43 UTC
Actually removing the revents reset line in g_main_context_query() as stated in 
comment #24 doesn't resolve the race condition.
It had disappeared while I added some g_print traces but reappeared when I 
removed them...
Comment 26 David Ergo 2005-06-09 14:59:12 UTC
For the patch specified in comment #24 to work, I had to remove the revents 
reset in g_poll too. But as I said, I don't understand what I do.

Another way to resolve the problem without modifying gmain.c was to modify 
g_io_win32_dispatch() in giowin32.c to use channel->revents instead of 
watch->pollfd.revents, where channel is defined as :
GIOWin32Channel *channel = (GIOWin32Channel *)watch->channel; 

But still I don't understand exactly what I do.
Could someone explain me the revents handling stuff ?

This patch could break other things...
Comment 27 Andrew Lanoix 2005-06-10 20:40:16 UTC
The socket thread should not go away when there are no watches on the channel, 
since programs like GNet do a drop then add of watches. Any patches should be 
reversed if they change that behavior.

If memory serves, the glib main loop should call g_io_win32_sock_close() after 
the ref count is 0, which will trigger the thread to go away by casing the 
select() call to break out, by closesocket (win32_channel->fd);

Andrew Lanoix
Comment 28 Tor Lillqvist 2005-06-10 21:02:34 UTC
Andrew, so you think the patch in comment #1 should be reversed? That has been
in GLib since 2.5.2... What GLib version do GNet-using applications typically
come bundled with? What do you think of the threadless approach in comment #19?
I would very much like to use that in GLib 2.8.0.

I am building HEAD gnet now against HEAD glib (using the threadless socket
giochannel watch), to see myself how the test programs work.
Comment 29 Tor Lillqvist 2005-06-10 21:49:58 UTC
David, you should realize that the 2.4 branch of GLib is not maintained any
longer. So suggesting patches for 2.4 is pointless. The current stable branch is
2.6, and it just had a release today, 2.6.5. I don't know whether any more 2.6
releases are likely. (2.8.0 will be released September-ish, I assume.) But
anyway, if it would make it more likely that GNet apps would switch to at least
2.6.6 (if and when it appears), I certainly could reverse the patch from comment
#1 in the 2.6 branch. (Hmm, assuming there hasn't been any other interfering
changes inbetween that have mixed up things...)
Comment 30 Tor Lillqvist 2005-06-11 08:27:55 UTC
Hmm, yes, the gnet Win32 code seems a bit screwed up. I recall pointing out
(much) earlier in a mail message that it isn't a good idea to keep Unix and
Win32 socket code as separate ifdeffed copies, when they are actually (at least
originally) mostly identical. This will only lead to them diverging by accident
without deliberate design, and people then thinking the differences are there on
purpose. And that indeed seems to have happened at least in tcp.c, if you
compare the Unix and Win32 versions of gnet_tcp_socket_new_async_direct() and
gnet_tcp_socket_new_async_cb(). OK, maybe it works as it currently is with glib
2.4.7, but 2.4.7 is history. I'll try to fix it, make sure it works with the
threadless giowin32.c, and submit a patch to the gnet maintainer. Then hopefully
gnet can catch up on Win32 and start using GLib 2.8.0 once it is released.
Comment 31 John Ehresman 2005-06-13 04:13:28 UTC
I can't find any way to determine a socket's blocking mode on win32.  Sorry, I
simply assumed there was a way to do this when I wrote comment 17.  I don't know
what the best way to proceed here -- I don't like breaking working apps but the
current thread based approach is buggy, though it probably could be fixed.
Comment 32 David Ergo 2005-06-13 06:34:32 UTC
Tor, I tought glib v2.4 was still recommended on win32 because v2.6 could still 
contain bugs on win32 as stated on this page :
http://www.gimp.org/~tml/gimp/win32/downloads.html
If it's stable enough then I'll use glib 2.6.
It should be great if you could fix gnet to work with your code without 
threads,
because I assume the race condition bug should not occur if there are 
not several threads.
(This race condition occurs also if we reverse the original patch!)
Comment 33 Tor Lillqvist 2005-06-13 08:04:18 UTC
> Tor, I tought glib v2.4 was still recommended on win32 because v2.6 
> could still contain bugs on win32 as stated on this page

Hmm, yeah, I guess it's time to change that page ;-) The warning against 2.6 is
not so much because of possible bugs (at least not any longer), but because of
the file name encoding change. But yes, GLib 2.4.7 works fine for most stuff,
but patches for it now are mostly pointless, which was what I meant.

I spent some time the gone weekend trying to get my current threadless
giowin32.c working with the test programs in HEAD gnet, but without success yet.
I guess I was a bit too optimistic when writing comment #30. Oh well, I will
hack more on it some time later...
Comment 34 John Ehresman 2005-06-13 20:26:03 UTC
We could depend on nonblocking being set via g_io_win32_set_flags().  I was just
looking at this because I need a nonblocking mode for pipes.
Comment 35 Tor Lillqvist 2005-06-14 00:17:22 UTC
Oh wow. This was a real horror to debug and to follow what happens. But now
finally gnet's conn_http_test.exe works with a threadless giowin32.c, all tests
say "OK". I had to modify giowin32.c still a bit more. The semantics of the
network event API is really unfriendly, especially the fact that the FD_WRITE
state is handled "slightly differently" as the docs put it. Maybe, for software
written from scratch against the WinSock API, it makes sense, but for GLib,
which is written with Unix select() semantics in mind, getting similar behaviour
out of WinSock means lots of silly bookkeeping, otherwise you will lose events
and/or not do the right callbacks at the right times.

Anyway, I was able to clean up lots of the Win32 ifdefs and special-case code
from gnet: No gnet_Hwnd window and odd PostMessage()s any longer. The same
gthread-using code for async DNS lookups can be used as on Unix. I implement
inet_pton() and inet_ntop() on Win32 using WSAStringToAddress() and
WSAAddressToString(). gethostbyname() and gethostbyaddr() are thread-safe in
WinSock. I will attach patches tomorrow once I've slept on this and read through
the patches once more. I'll have to check also that other stuff that uses
giochannels for sockets still works.
Comment 36 Matthias Clasen 2005-06-18 14:19:36 UTC
not NEEDINFO, rhight ?
Comment 37 Kenneth Haley 2005-06-22 08:30:40 UTC
Just a little info from MSDN on setting a socket back to blocking mode:

The WSAAsyncSelect and WSAEventSelect functions automatically set a socket to
nonblocking mode. To set the socket back to blocking mode, an application must
first disable WSAAsyncSelect by calling WSAAsyncSelect with the lEvent parameter
equal to zero, or disable WSAEventSelect by calling WSAEventSelect with the
lNetworkEvents parameter equal to zero.

Comment 38 Tor Lillqvist 2005-06-22 08:47:20 UTC
Yes, the new giowin32.c does call WSAEventSelect() with a zero lNetworkEvents
parameter in g_io_win32_free(), so after the giochannel has gone the caller is
free to set the socket back to non-blocking mode if necessary. GLib can't do it
as it doesn't know whether it was blocking or non-blocking initially.
Comment 39 Tor Lillqvist 2005-07-08 23:12:37 UTC
GLib 2.8 is getting closer. I will replace the giowin32.c with the new version
in HEAD now. About the patch to gnet I promised to attach in comment #35, will
do that in a moment.
Comment 40 Tor Lillqvist 2005-07-09 18:23:34 UTC
Created attachment 48877 [details] [review]
Umm, yes, here's my diff to gnet. actually from June 14 already
Comment 41 Tor Lillqvist 2006-02-13 15:13:57 UTC
Hmm, is there really any need to keep this bug open? At least the Summary is no longer relevant as there aren't any threads to leak in giowin32.c for sockets any longer.
Comment 42 John Ehresman 2006-02-13 15:29:10 UTC
I think this can be closed.
Comment 43 Benjamin Otte (Company) 2006-05-24 09:36:15 UTC
I have applied the patch to gnet and it works fine, so I'm pretty sure this can be closed.
Comment 44 Tor Lillqvist 2006-05-24 09:44:13 UTC
Great! Resolving this...