After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 120299 - Win32 iochannel/g_poll/etc code needs rewrite
Win32 iochannel/g_poll/etc code needs rewrite
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: win32
2.2.x
Other other
: Normal normal
: ---
Assigned To: gtk-win32 maintainers
gtk-win32 maintainers
Depends on:
Blocks:
 
 
Reported: 2003-08-20 08:25 UTC by Tor Lillqvist
Modified: 2011-02-18 16:13 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Tor Lillqvist 2003-08-20 08:25:22 UTC
Sorry if the terminology below is a bit incorrect, I easily get lost in 
the twisty maze of io channels, gsources, poll fds, main loops etc.

The Win32 implementaion of GIOChannels, g_poll() etc needs a rewrite. It 
has worked well for GIMP and other similar applications, but doesn't work 
for the way the linc2 library (used by ORBit2) uses main loops and 
GSources, for instance.

There are fundamental problems in how the Win32 implementation of 
GIChannels, GSources watching them, and GSources with GPollFDs that should 
poll sockets (or fds) interact.

It's hard to describe the problem consicely, but I'll try.  

The Win32 GIOChannel code creates a thread for each channel when a watch 
for the channel is created using g_io_add_watch() and friends. For file 
descriptors, the thread blocks trying to read from the fd, while for 
sockets it blocks in select(). 

(This is because there is no API in Win32 that would correspond to select()
ing on a (disk or pipe) file descriptor in Unix. Or at least not if the 
file descriptor is created in a program out of your control. If you would 
write an app from scratch for Win32, you could of course handle files in 
special ways to enable select()-like functionality, and much more. But as 
GLib is to be used from apps using more or less normal ANSI C and Unixish 
functionality, that doesn't help.)

The Win32 g_poll() implementation waits for events signalled from those 
per-channel threads. Until now, this has worked quite well in GIMP and 
other apps.

But linc2 reveals several problems with this approach. For instance, there 
is undesirable dependency between Win32 GIOChannels and the GSources 
(watches) created and added to them with g_io_add_watch(). The thread for 
a socket selects for in/out/exception only for those watches added to it 
with g_io_add_watch(). And even worse, only the channel's own read method 
(for fds) or the channel's own check method ever resets the event that 
woke up g_poll(). It isn't possible to create a watch for the socket using 
your own kind of GSource, add GPollFDs to it (even if the GPollFD has been 
created with g_io_channel_win32_make_pollfd()), and attach that GSource to 
a main loop context and run that loop, and expect things to work...

Another problem is that the current code doesn't scale well. The 
WaitForMultipleObjects() used in g_poll() has a relatively low limit of 64 
handles that it can waited on simultaneously, which is much too low for 
many types of applications.

The rewritten code should presumably work somehow like this:

- It probably is a good idea to drop the concept that GPollFD::fd on Win32 
must be a HANDLE. Let it be a file descriptor or SOCKET, similarily as on 
Unix, and handle it transparently. There hopefully shouldn't be many 
applications or libraries out there that know that know to put HANDLEs in 
GPollFD::fd. It's more important to make porting software that puts file 
descriptors or sockets in them easy.

- For GIOChannels to files/pipes, or GPollFDs, there still needs to be one 
thread per file that is being polled. However, the creation of the thread 
should probably be postponed until g_poll actually is called. But the 
thread probably still, once created, has to be blocking trying to read 
from the fd all the time. When polling several fds, there is no way to 
unblock the other threads when one of the fds has fired. 

- For sockets, one should save on the number of threads used, and use just 
one thread for each set of up to FD_SETSIZE sockets.

- The thread for sockets shouldn't be blocking in select() all the time, 
only when in g_poll(). It should be possible to have several GSources with 
GPollFDs for the same socket (linc2 does this), but attached to different 
main loop contexts, as long as they aren't running at the same time. Or 
something like that...

- The g_poll() implementation shouldn't use one event for each (fd or 
sockets) thread, as the limit of HANDLEs in a WaitForMultipleObject() call 
is limited to 64. Use just on event, and then additional logic to find out 
which GPollFD it actually was.
Comment 1 Tor Lillqvist 2003-12-08 21:51:54 UTC
I haven't yet had the inspiration and/or time to finish the rewrite, 
but here is a suggested API addition before the API freeze.

In gmain.h, the long comment could be replaced by something like:

/* Any definitions using GPollFD or GPollFunc are primarily
 * for Unix and not guaranteed to be the compatible on all
 * operating systems on which GLib runs.
 *
 * On Windows, the fd in a GPollFD can either be file descriptor as
 * provided by the C runtime library MSVCRT.DLL, a SOCKET from
 * WinSock, a window handle (HWND), or a HANDLE for an object. (A
 * HANDLE should be such that can be used by the WaitForSingleObject,
 * WaitForMultipleObjects or MsgWaitForMultipleObjects functions. This
 * does not include Win32 API file or pipe handles, for instance.)
 *
 * On Windows, fd can also be the special value G_WIN32_MSG_HANDLE to
 * indicate polling for messages for any window in the current thread.
 *
 * It is possible, even though perhaps not likely in most
 * applications, that the same numerical value will be both a valid
 * file descriptor and a SOCKET, HWND or HANDLE. To guard against this
 * kind of overlap and guarantee that GLib does what you mean, don't
 * set the fd field of GPollFD in your code, but use the function
 * g_pollfd_set_fd(). This sets the fd to an internal magic value
 * recognized by GLib, with associated type information and
 * redirection to the actual file descriptor or socket to be used.
 *
 * Note that G_WIN32_MSG_HANDLE GPollFDs should not be used by GDK
 * (GTK) programs, as GDK itself wants to read messages and convert
 * them to GDK events.
 */

This enum could be added:

typedef enum
{
  G_POLLFD_FILE_DESCRIPTOR,
  G_POLLFD_SOCKET,
  G_POLLFD_WIN32_HWND,
  G_POLLFD_WIN32_HANDLE
} GPollFDType;

And these functions:

void          g_pollfd_set_fd          (GPollFDType type,
					gint        fd,
					GPollFD    *pollfd);

void          g_pollfd_unset_fd        (GPollFD  *pollfd);

On Unix, those two functions would have trivial implementation, just 
setting and zeroing the GPollFD::fd field.
Comment 2 Andrew Lanoix 2003-12-28 21:40:23 UTC
Tor,

I am not sure that we should assume a HWND is a guint or a gint.

I am getting this warning with VC 2003:
c:\Documents and Settings\Administrator\My Documents\Visual Studio
Projects\gnetIPv6\gnet-private.c(300) : warning C4311: 'type cast' :
pointer truncation from 'HWND' to 'unsigned int'

On this line of code:
gnet_iochannel = g_io_channel_win32_new_messages((unsigned int)gnet_hWnd);

VC docs say of C4311:
"A 64-bit pointer was truncated to a 32-bit int or 32-bit long."

I would prefer if we did something like this which I did for GNet
internally for Unix:

#ifndef SOCKET
#define SOCKET gint
#endif

This way things work on Windows and Unix.

Andrew Lanoix
Comment 3 Owen Taylor 2004-01-21 23:00:16 UTC
I don't think usage and contents of GPollFD needs to be or
should be portable across Win32/Unix, though changing how
it works is probably an API change on Win32.

My original conception was that GMainLoop would be significantly
different on different operating systems, though GPollFD *
crept into more of the API (g_main_context_query()) for
GTK+-2.x.

Comment 4 Hans Petter Jansson 2004-05-11 01:29:42 UTC
Tor mentioned having the per-file-fd threads block on read(). Wouldn't it be
better to have them block on a cond when they're idle, and then have a mode
saying if it's supposed to read, write or idle? When the user does something
that changes the mode, the main thread could signal the cond, waking up the thread.

I.e. the channel would be in read mode as long as there is a read watch on it,
and dispatch G_IO_IN if it has data in its buffer. It would be in write mode as
long as it has data in its write buffer, and dispatch G_IO_OUT whenever there is
free space in the buffer, provided there is a write watch, of course. If there
is no read watch and the write buffer is empty, it would be idle.

I don't see how async random access would be possible without such a scheme.
However, I'm unsure what the semantics are if you have both read and write
watches on a file FD...

Also, for select(), I believe you can #define FD_SETSIZE to a higher number
before including winsock2.h. At least the MSDN docs say you can.
Comment 5 Tor Lillqvist 2004-05-11 06:30:03 UTC
> the channel would be in read mode as long as there is a read watch on it
> [...] If there is no read watch [...], it would be idle.

But if a read watch is removed, how do you suggest breaking the thread out of 
the read()?

> Also, for select(), I believe you can #define FD_SETSIZE to a higher number

Yes. If I recall correctly, you can even (re)size fd_sets dynamically using 
approproate code. That of course then depends on knowledge of the fd_set's 
internal structure, and you cannot blindly use the FD_* macros.

I don't know if the underlying socket service providers typically are prepared 
to handle huge numbers of sockets in a fd_set, though, they might well have 
undocumented limits. (I don't remember the correct terminology here, but isn't 
the idea with winsock that you can have various stack implementations from 
different vendors for different protocol families, or something like that.) 
Comment 6 Hans Petter Jansson 2004-05-11 07:10:46 UTC
Yeah, I agree that breaking out of the read() without closing the fd from
another thread (the current approach when finalizing the channel?) would be a
problem. I wonder if moving to a new file descriptor using dup() and close() the
old one would work. Another inelegant solution would be to use tell() and stat()
before entering read(), to determine if we've reached EOF. Then, if we're at
EOF, we could wait a bit and repeat, for as long as the read watch persists -
polling, in effect, like tail.

I doubt tailing files in this way is a common usage pattern, anyway, so maybe
the latency of an infrequent (say, 1s interval) poll is acceptable.
Comment 7 Tor Lillqvist 2004-05-11 07:26:15 UTC
Tailing files *is* common usage pattern, if you in "files" also include pipes. 
GIMP's plug-in communication uses pipes.
Comment 8 Hans Petter Jansson 2004-05-11 07:51:55 UTC
Ah. I guess the implementation you outlined before is primarily meant for pipes,
as it won't work so well for regular files (especially not if you want to write
to them). Of course, handling files and pipes differently is a possibility...
Comment 9 Peter Zelezny 2005-02-21 11:56:21 UTC
Another fun thing to consider. If you pass a MSVCR71.DLL pipe or fd to a glib
DLL compiled with MSVCRT.DLL, all hell breaks loose. And the latest MSVC
compilers link your C programs with 71 by default. It might be an idea to warn
people to only pass msvcrt.dll pipe/fds to glib.
Comment 10 Tor Lillqvist 2005-02-21 12:29:32 UTC
That *has* been mentioned, several times, on the mailing lists, IIRC.

BTW, do you know if it is possible to (in some perhaps not officially sanctioned
way) convince the latest Microsoft compilers to link with MSVCRT.DLL? Presumably
the latest C library headers also work MSVCRT.DLL (at least as long as we are
talking about C library functions only, no C++ stuff)?

How do commercial third-party libraries handle this issue? Do they ship
different DLLs for MSVCRT, MSVCR70 and MSVCR71 -using code? Does the API of
commercial libraries typically not pass file descriptors or FILE pointers?
Comment 11 John Ehresman 2005-02-21 15:26:20 UTC
I wonder if it would be possible to provide an alternate api so that file
descriptors aren't passed between dll's.  This way, one dll could use MSVCRT and
another could use MSVCR71.  I personally think trying to get everyone to use
MSVCRT is not going to work in the long run.
Comment 12 Peter Zelezny 2005-03-01 07:38:33 UTC
> BTW, do you know if it is possible to (in some perhaps not officially sanctioned
> way) convince the latest Microsoft compilers to link with MSVCRT.DLL? Presumably
> the latest C library headers also work MSVCRT.DLL

Yes, I've done that myself. You grab the MSVCRT.LIB from VC++ 6.x and put it in
the $LIB dir. Then try a compile, you'll get undefined symbols. You can find
these missing functions (IIRC it's only one or two tiny math functions) in
VC71's xxx/crt/src/intel/. Just compile them (cl -c file.c) and link them into
MSVCRT.LIB.

I hope that makes sense, let me know if you want more details. Very crude hack,
but I havn't noticed any ill-effects.
Comment 13 Tor Lillqvist 2005-03-02 13:55:08 UTC
BTW, the initial reason for me to open this bug was my first attempt to port
ORBit2, which quickly went nowhere because of the way the linc2 library used
sockets with GLib main loops.

When I now actually did port ORBit2 successfully earlier this year, I used a
different approach in linc2. Instead of passing a socket to GLib directly as it
does on Unix, I create a Windows event for the socket with WSACreateEvent() and
store the event handle in the GPollFD::fd field. The prepare method then selects
event notification with WSAEventSelect(), and the check method calls
WSAEnumNetworkEvents() to see which network event(s) actually caused the Windows
event to be set.

The above is a high-level summary. There were some irritating details in the
behaviour of WSAEventSelect() and friends that caused buggy behaviour that was
noticed only later when debugging ORBit2-using code like libbonobo, and took
quite some hacking to fix. See ORBit2's ChangeLog and the linc2 source code for
details.
Comment 14 Tor Lillqvist 2005-07-16 19:41:38 UTC
Resolving this now, as the socket part of giowin32.c has been rewritten in HEAD.
See discusion in bug #147392.