GNOME Bugzilla – Bug 120299
Win32 iochannel/g_poll/etc code needs rewrite
Last modified: 2011-02-18 16:13:49 UTC
Sorry if the terminology below is a bit incorrect, I easily get lost in the twisty maze of io channels, gsources, poll fds, main loops etc. The Win32 implementaion of GIOChannels, g_poll() etc needs a rewrite. It has worked well for GIMP and other similar applications, but doesn't work for the way the linc2 library (used by ORBit2) uses main loops and GSources, for instance. There are fundamental problems in how the Win32 implementation of GIChannels, GSources watching them, and GSources with GPollFDs that should poll sockets (or fds) interact. It's hard to describe the problem consicely, but I'll try. The Win32 GIOChannel code creates a thread for each channel when a watch for the channel is created using g_io_add_watch() and friends. For file descriptors, the thread blocks trying to read from the fd, while for sockets it blocks in select(). (This is because there is no API in Win32 that would correspond to select() ing on a (disk or pipe) file descriptor in Unix. Or at least not if the file descriptor is created in a program out of your control. If you would write an app from scratch for Win32, you could of course handle files in special ways to enable select()-like functionality, and much more. But as GLib is to be used from apps using more or less normal ANSI C and Unixish functionality, that doesn't help.) The Win32 g_poll() implementation waits for events signalled from those per-channel threads. Until now, this has worked quite well in GIMP and other apps. But linc2 reveals several problems with this approach. For instance, there is undesirable dependency between Win32 GIOChannels and the GSources (watches) created and added to them with g_io_add_watch(). The thread for a socket selects for in/out/exception only for those watches added to it with g_io_add_watch(). And even worse, only the channel's own read method (for fds) or the channel's own check method ever resets the event that woke up g_poll(). It isn't possible to create a watch for the socket using your own kind of GSource, add GPollFDs to it (even if the GPollFD has been created with g_io_channel_win32_make_pollfd()), and attach that GSource to a main loop context and run that loop, and expect things to work... Another problem is that the current code doesn't scale well. The WaitForMultipleObjects() used in g_poll() has a relatively low limit of 64 handles that it can waited on simultaneously, which is much too low for many types of applications. The rewritten code should presumably work somehow like this: - It probably is a good idea to drop the concept that GPollFD::fd on Win32 must be a HANDLE. Let it be a file descriptor or SOCKET, similarily as on Unix, and handle it transparently. There hopefully shouldn't be many applications or libraries out there that know that know to put HANDLEs in GPollFD::fd. It's more important to make porting software that puts file descriptors or sockets in them easy. - For GIOChannels to files/pipes, or GPollFDs, there still needs to be one thread per file that is being polled. However, the creation of the thread should probably be postponed until g_poll actually is called. But the thread probably still, once created, has to be blocking trying to read from the fd all the time. When polling several fds, there is no way to unblock the other threads when one of the fds has fired. - For sockets, one should save on the number of threads used, and use just one thread for each set of up to FD_SETSIZE sockets. - The thread for sockets shouldn't be blocking in select() all the time, only when in g_poll(). It should be possible to have several GSources with GPollFDs for the same socket (linc2 does this), but attached to different main loop contexts, as long as they aren't running at the same time. Or something like that... - The g_poll() implementation shouldn't use one event for each (fd or sockets) thread, as the limit of HANDLEs in a WaitForMultipleObject() call is limited to 64. Use just on event, and then additional logic to find out which GPollFD it actually was.
I haven't yet had the inspiration and/or time to finish the rewrite, but here is a suggested API addition before the API freeze. In gmain.h, the long comment could be replaced by something like: /* Any definitions using GPollFD or GPollFunc are primarily * for Unix and not guaranteed to be the compatible on all * operating systems on which GLib runs. * * On Windows, the fd in a GPollFD can either be file descriptor as * provided by the C runtime library MSVCRT.DLL, a SOCKET from * WinSock, a window handle (HWND), or a HANDLE for an object. (A * HANDLE should be such that can be used by the WaitForSingleObject, * WaitForMultipleObjects or MsgWaitForMultipleObjects functions. This * does not include Win32 API file or pipe handles, for instance.) * * On Windows, fd can also be the special value G_WIN32_MSG_HANDLE to * indicate polling for messages for any window in the current thread. * * It is possible, even though perhaps not likely in most * applications, that the same numerical value will be both a valid * file descriptor and a SOCKET, HWND or HANDLE. To guard against this * kind of overlap and guarantee that GLib does what you mean, don't * set the fd field of GPollFD in your code, but use the function * g_pollfd_set_fd(). This sets the fd to an internal magic value * recognized by GLib, with associated type information and * redirection to the actual file descriptor or socket to be used. * * Note that G_WIN32_MSG_HANDLE GPollFDs should not be used by GDK * (GTK) programs, as GDK itself wants to read messages and convert * them to GDK events. */ This enum could be added: typedef enum { G_POLLFD_FILE_DESCRIPTOR, G_POLLFD_SOCKET, G_POLLFD_WIN32_HWND, G_POLLFD_WIN32_HANDLE } GPollFDType; And these functions: void g_pollfd_set_fd (GPollFDType type, gint fd, GPollFD *pollfd); void g_pollfd_unset_fd (GPollFD *pollfd); On Unix, those two functions would have trivial implementation, just setting and zeroing the GPollFD::fd field.
Tor, I am not sure that we should assume a HWND is a guint or a gint. I am getting this warning with VC 2003: c:\Documents and Settings\Administrator\My Documents\Visual Studio Projects\gnetIPv6\gnet-private.c(300) : warning C4311: 'type cast' : pointer truncation from 'HWND' to 'unsigned int' On this line of code: gnet_iochannel = g_io_channel_win32_new_messages((unsigned int)gnet_hWnd); VC docs say of C4311: "A 64-bit pointer was truncated to a 32-bit int or 32-bit long." I would prefer if we did something like this which I did for GNet internally for Unix: #ifndef SOCKET #define SOCKET gint #endif This way things work on Windows and Unix. Andrew Lanoix
I don't think usage and contents of GPollFD needs to be or should be portable across Win32/Unix, though changing how it works is probably an API change on Win32. My original conception was that GMainLoop would be significantly different on different operating systems, though GPollFD * crept into more of the API (g_main_context_query()) for GTK+-2.x.
Tor mentioned having the per-file-fd threads block on read(). Wouldn't it be better to have them block on a cond when they're idle, and then have a mode saying if it's supposed to read, write or idle? When the user does something that changes the mode, the main thread could signal the cond, waking up the thread. I.e. the channel would be in read mode as long as there is a read watch on it, and dispatch G_IO_IN if it has data in its buffer. It would be in write mode as long as it has data in its write buffer, and dispatch G_IO_OUT whenever there is free space in the buffer, provided there is a write watch, of course. If there is no read watch and the write buffer is empty, it would be idle. I don't see how async random access would be possible without such a scheme. However, I'm unsure what the semantics are if you have both read and write watches on a file FD... Also, for select(), I believe you can #define FD_SETSIZE to a higher number before including winsock2.h. At least the MSDN docs say you can.
> the channel would be in read mode as long as there is a read watch on it > [...] If there is no read watch [...], it would be idle. But if a read watch is removed, how do you suggest breaking the thread out of the read()? > Also, for select(), I believe you can #define FD_SETSIZE to a higher number Yes. If I recall correctly, you can even (re)size fd_sets dynamically using approproate code. That of course then depends on knowledge of the fd_set's internal structure, and you cannot blindly use the FD_* macros. I don't know if the underlying socket service providers typically are prepared to handle huge numbers of sockets in a fd_set, though, they might well have undocumented limits. (I don't remember the correct terminology here, but isn't the idea with winsock that you can have various stack implementations from different vendors for different protocol families, or something like that.)
Yeah, I agree that breaking out of the read() without closing the fd from another thread (the current approach when finalizing the channel?) would be a problem. I wonder if moving to a new file descriptor using dup() and close() the old one would work. Another inelegant solution would be to use tell() and stat() before entering read(), to determine if we've reached EOF. Then, if we're at EOF, we could wait a bit and repeat, for as long as the read watch persists - polling, in effect, like tail. I doubt tailing files in this way is a common usage pattern, anyway, so maybe the latency of an infrequent (say, 1s interval) poll is acceptable.
Tailing files *is* common usage pattern, if you in "files" also include pipes. GIMP's plug-in communication uses pipes.
Ah. I guess the implementation you outlined before is primarily meant for pipes, as it won't work so well for regular files (especially not if you want to write to them). Of course, handling files and pipes differently is a possibility...
Another fun thing to consider. If you pass a MSVCR71.DLL pipe or fd to a glib DLL compiled with MSVCRT.DLL, all hell breaks loose. And the latest MSVC compilers link your C programs with 71 by default. It might be an idea to warn people to only pass msvcrt.dll pipe/fds to glib.
That *has* been mentioned, several times, on the mailing lists, IIRC. BTW, do you know if it is possible to (in some perhaps not officially sanctioned way) convince the latest Microsoft compilers to link with MSVCRT.DLL? Presumably the latest C library headers also work MSVCRT.DLL (at least as long as we are talking about C library functions only, no C++ stuff)? How do commercial third-party libraries handle this issue? Do they ship different DLLs for MSVCRT, MSVCR70 and MSVCR71 -using code? Does the API of commercial libraries typically not pass file descriptors or FILE pointers?
I wonder if it would be possible to provide an alternate api so that file descriptors aren't passed between dll's. This way, one dll could use MSVCRT and another could use MSVCR71. I personally think trying to get everyone to use MSVCRT is not going to work in the long run.
> BTW, do you know if it is possible to (in some perhaps not officially sanctioned > way) convince the latest Microsoft compilers to link with MSVCRT.DLL? Presumably > the latest C library headers also work MSVCRT.DLL Yes, I've done that myself. You grab the MSVCRT.LIB from VC++ 6.x and put it in the $LIB dir. Then try a compile, you'll get undefined symbols. You can find these missing functions (IIRC it's only one or two tiny math functions) in VC71's xxx/crt/src/intel/. Just compile them (cl -c file.c) and link them into MSVCRT.LIB. I hope that makes sense, let me know if you want more details. Very crude hack, but I havn't noticed any ill-effects.
BTW, the initial reason for me to open this bug was my first attempt to port ORBit2, which quickly went nowhere because of the way the linc2 library used sockets with GLib main loops. When I now actually did port ORBit2 successfully earlier this year, I used a different approach in linc2. Instead of passing a socket to GLib directly as it does on Unix, I create a Windows event for the socket with WSACreateEvent() and store the event handle in the GPollFD::fd field. The prepare method then selects event notification with WSAEventSelect(), and the check method calls WSAEnumNetworkEvents() to see which network event(s) actually caused the Windows event to be set. The above is a high-level summary. There were some irritating details in the behaviour of WSAEventSelect() and friends that caused buggy behaviour that was noticed only later when debugging ORBit2-using code like libbonobo, and took quite some hacking to fix. See ORBit2's ChangeLog and the linc2 source code for details.
Resolving this now, as the socket part of giowin32.c has been rewritten in HEAD. See discusion in bug #147392.