GNOME Bugzilla – Bug 548466
async/cancellable DNS resolver
Last modified: 2009-04-22 12:40:42 UTC
We need a good DNS resolver. This is a spinoff of bug 515973 ("Need a network I/O support") and of 3 libsoup DNS bugs (noted below). The resolver API needs to: - Have both sync and async versions, in the standard gio style, including GCancellable support. - Use the system resolver library, not an external resolver library like c-ares that might not obey all of the same config files - Work regardless of whether or not g_thread_init() has been called. (qv libsoup bug 532778) - Handle IPv4 and IPv6, and return all of a host's addresses, so that the caller can try all of them. (qv libsoup bug 526321) - Handle IDNA (unicode domain names, RFC 3490, qv libsoup bug 548287) On Linux, we can do async lookups via getaddrinfo_a. (This uses threads internally, but it doesn't require glib-level threading.) On other posix platforms, the simplest solution that doesn't require threads is probably to use the old GNet/libsoup solution of forking and doing DNS from a separate process. I don't know what possibilities are available on Windows. In most of these cases, there is no way to actually cancel a lookup, so "cancellation" would be implemented by letting the lookup continue, but setting a flag to indicate that the result should just be discarded. (Meaning that even the synchronous versions of our APIs can't use the ordinary getaddrinfo() etc calls.)
> On Linux, we can do async lookups via getaddrinfo_a. This turns out to not actually be as fabulous a solution as was hoped: - There's no corresponding "getnameinfo_a" to do reverse lookups (IP -> name) so we'd need to include most of the non-getaddrinfo_a code path even under Linux anyway. - The notification from getaddrinfo_a arrives in another thread, but since g_thread_init() (potentially) hasn't been called, we can't call any glib methods from that thread. There are various possibilities, but the best solutions all involve using a mutex, and since we know that getaddrinfo_a always goes with pthreads anyway, it's silly to use one of the worse solutions just to avoid calling pthread_mutex_lock(). But then, if we're going to be using pthreads methods directly, AND we need a separate solution for reverse lookups anyway, it starts to seem silly to use getaddrinfo_a at all, rather than using pthread_create and regular getaddrinfo, which would work on Solaris and BSD as well as Linux. Related, Lennart Poettering has written a library called libasyncdns (http://0pointer.de/lennart/projects/libasyncns/) that does both pthreads-based and fork-based async DNS, and which is LPGL. The API is all wrong for glib, but we might be able to steal a bunch of code. Also, it turns out WinSock has async DNS, but only for IPv4, so we do need to use either threads or forking there as well.
(In reply to comment #0) > the simplest solution that doesn't require threads is probably to > use the old GNet/libsoup solution of forking and doing DNS from a separate > process. I don't know what possibilities are available on Windows. Windows doesn't have fork() of course, so we'd end up with: - a simple GThreadPool-based solution, for use on all platforms when gthreads are enabled - a fork-based solution, for POSIX platforms when gthread isn't being used (or even if gthread was being used, on POSIX platforms with only a non-thread-safe resolver, if there are still any such platforms that we care about?) - a windows native-thread-based solution, for Windows when gthread isn't being used There's already precedent for using windows native threads even when gthreads aren't in use (giowin32.c). Alternatively, the fork-based and winthread-based solutions could be merged into a single: - g_spawn-based solution using a separate glib-dns-helper app
Created attachment 117849 [details] sample internationalized domain name encoder/decoder This code does just the IDN encoding/decoding stuff, and seems to work and could be cut+pasted into apps that want that functionality now. The nameprep() step does not comply with either the current RFCs (which require you to do certain things using only the Unicode 3.2 tables), or the current revised drafts (which among other things suggest normalizing with NFC rather than NFKC, and forbidding most of the characters for which NFC vs NFKC would make a difference). It probably does the right thing for all currently-deployed IDNs.
A few things that come to my mind having written libasyncns: - getaddrinfo_a is a trainwreck, don't even think about it. - Automatic transparent IDN support is a security issue. Make sure to support IDN only "manually", otherwise people might expose IDNs in "too simple" UIs - I'd make sure to support both host/address lookup and generic RR lookup. This is very useful for clients needing SRV lookup, such as Jabber libraries. i.e. don't just wrap getaddrinfo/getnameinfo, wrap res_query(), too. (also, complteley ignore the old obsolete apis like gethostbyname please!) - It might be an idea to simply embed libasyncns in glib like we do it already with the xdg mime stuff. libasyncns is just one .c and one .h file. Just copy those into the glib sources and add a nice glib wrapper around it. Writing a completely new worker-thread based async resolver is a lot of work, and difficult to get right. (i.e. you need to make sure not to keep fds open from the parent process, you need to make sure the child gets killed when the father dies and so on. libasyncns nowadays does most of these things right and I wouldn't underestimate how difficult that actually is. Unix as an API makes it very difficult to make this work). I'd simply use libasyncns.[ch] as resolver on Unix, and on Windows write something simple win32-thread based. But yes, I am biased, I wrote asyncns and would of course like it to be reused in glib. ;-) - On Win32 you probably don't want to use Winsock but DnsQuery() for resolving DNS. It has better support for LLMNR and stuff. It's not asycnhronous though. - Execing a resolver binary each time will be *slow*, especially on Win32. Also, this will cause disk access. Not good. - Any of the other async DNS libraries for Linux are a bad idea, since they implement DNS directly, i.e. will not do NSS, not LDAP, no mDNS, no nothing. We *need* to go through glibc NSS (with the exception of res_query).
(In reply to comment #4) > - getaddrinfo_a is a trainwreck, don't even think about it. Too late, already thought about it! But yes, I came to the same conclusion. > - Automatic transparent IDN support is a security issue. Make sure to support > IDN only "manually", otherwise people might expose IDNs in "too simple" UIs How so? My working API is that g_addr_info_new() takes either a UTF-8 domain name or an ACE-encoded name, and decodes/encodes it internally if needed, and it has separate accessors g_addr_info_get_hostname() (for the "display" name) and g_addr_info_get_wire_hostname() (for the ASCII/wire protocol name), so the app always knows what kind of name it's getting back. (Of course for non-IDNs the two methods would return the same name.) > - I'd make sure to support both host/address lookup and generic RR lookup. This > is very useful for clients needing SRV lookup, such as Jabber libraries. i.e. > don't just wrap getaddrinfo/getnameinfo, wrap res_query(), too. OK... I was only planning to implement "resolve name to addresses" and "resolve address to name", and at a slightly higher level than getaddrinfo and getnameinfo themselves (eg, no "hints" parameter). In part because that's all that most apps need, and in part because I haven't ever written anything that needed to do a SRV lookup or anything like that, so I don't have a good sense of what kind of API we'd want for it... > - On Win32 you probably don't want to use Winsock but DnsQuery() for resolving > DNS. It has better support for LLMNR and stuff. My plan had been to just copy the code from libsoup. (Which uses getaddrinfo from WinSock, so it's not hugely different from the Unix code.) I'm willing to believe that's not the best plan. > - Execing a resolver binary each time will be *slow*, especially on Win32. > Also, this will cause disk access. Not good. Well, I was assuming the resolver process would linger in the background for a little while waiting for more requests. But yeah, I don't like it, I was just suggesting it as an alternative to needing to have separate fork-based and windows-threads-based implementations, in case people hated that idea. (Or in case Tor chimed in and said "oh, no, don't try to use native threads behind glib's back, it turns out that that doesn't actually work very well in giowin32".) > - It might be an idea to simply embed libasyncns in glib like we do it already > with the xdg mime stuff. libasyncns is just one .c and one .h file. Just copy > those into the glib sources and add a nice glib wrapper around it. Yeah... Pros: - less code for us to write - already tested, particularly in terms of things like all the nasty cleanup you have to do after forking to make sure you don't mess up your parent or anyone else, and stuff like that Cons: - We'd have to patch all the pthread_ calls to g_thread_ since glib/gio don't link against libpthread. (Pretty minor; can probably automate this as part of the importing-a-new-libasync process.) - We'd need to be able to pick between threads and forking at runtime, not just at compile time. (I assume we could convince you to accept this change upstream.) - GThreadPool behaves more dynamically than libasyncns does; it would be nice if libasyncns used "up to N" threads/processes, rather than "always exactly N", and just started up new ones as they were needed, and shut down idle ones when they weren't. - libasyncns has no Windows support, meaning we'd have to reimplement thread pools and some of the other glue ourselves anyway for the Windows-native-thread-based resolver. Or else add Windows support to libasyncns. (Which probably wouldn't work very well; Winsock doesn't have socketpair(), and your code wouldn't work with pipes.) OTOH, if the Windows code is going to be a lot different from the Unix code anyway (DnsQuery, etc), then it might be simpler to just keep it entirely separate. And there's probably a win32 API for thread pools we could use anyway... - libasyncns doesn't directly support the idea of cancellable *synchronous* lookups. (As noted earlier, since getaddrinfo() itself isn't cancellable, in order to do a cancellable synchronous lookup we have to do the lookup in another thread, and then poll for either that thread to complete or the GCancellable to be triggered.) Since an asyncns_t has only one response fd, we couldn't easily mix sync and async calls, or even do multiple sync calls. We'd have to either create a new single-thread asyncns_t for each synchronous lookup, or else have a separate thread just to poll the response fd and then dispatch each response back to the thread that's waiting for it. Having a separate socket for each worker would fix this, I think, though it would use up more fds and complicate the code, and make it more annoying to poll in the async case since you'd have to deal with multiple fds. I guess one way of solving several of the above problems would be for gio to not use libasyncns's own thread-pooling, but instead create a pool of single-thread asyncns_t's. Then it could dynamically control the number of threads by creating and destroying asyncns_t's, and it could use any given asyncns_t for either sync or async operation as needed.
(In reply to comment #5) > > - Automatic transparent IDN support is a security issue. Make sure to support > > IDN only "manually", otherwise people might expose IDNs in "too simple" UIs > > How so? > > My working API is that g_addr_info_new() takes either a UTF-8 domain name or an > ACE-encoded name, and decodes/encodes it internally if needed, and it has > separate accessors g_addr_info_get_hostname() (for the "display" name) and > g_addr_info_get_wire_hostname() (for the ASCII/wire protocol name), so the app > always knows what kind of name it's getting back. (Of course for non-IDNs the > two methods would return the same name.) I mean, you followed the discussion about characters that look like other unicode characters and that you hence can register a domain that for most people looks like "microsoft.com" but actually is something different? I think the generally accepted solution for this problem is that while browsers do accept international domain names from keyboard input they show only the xn-- name afterwards, except for domains that use very few whitelisted unicode chars only. So, what I'd like to see is that the API forces the programmer to think about this. Completely hiding IDN resolution is thus problematic. (But ototh i don't care too much about this. Users of this will fuck things up anyway...) > > - I'd make sure to support both host/address lookup and generic RR lookup. This > > is very useful for clients needing SRV lookup, such as Jabber libraries. i.e. > > don't just wrap getaddrinfo/getnameinfo, wrap res_query(), too. > > OK... I was only planning to implement "resolve name to addresses" and "resolve > address to name", and at a slightly higher level than getaddrinfo and > getnameinfo themselves (eg, no "hints" parameter). In part because that's all > that most apps need, and in part because I haven't ever written anything that > needed to do a SRV lookup or anything like that, so I don't have a good sense > of what kind of API we'd want for it... Most newer protocols (like XSMPP, SIP, ...) use SRV these days. LDAP and Kerberos use it too. You really should cover res_query(). I mean, there are more Gnome IM and telephony applications than stars in the sky. If Glib doesn't do async res_query() they'd still have to use an additional library just for that. (Also, if you do choose libasyncns on Unix and DnsQuery() on windows, both support res_query() style lookups, so this is actually easy). > - We'd have to patch all the pthread_ calls to g_thread_ since > glib/gio don't link against libpthread. (Pretty minor; can probably > automate this as part of the importing-a-new-libasync process.) Hmm, is this really a problem? I mean, libasyncns is only useful on Unix anyway. And all Unixes do pthreads now. In fact, glib exclusively supports POSIX and Win32 threading, hence I fail to see what we lose by linking Unix glib against POSIX threading. Having said this, I think for compatibility reasons we probably shouldn't be using the pthread-based code in libasyncns at all. The thing is that on most Unixes DNS-NSS lookups are implemented on top of res_query(). And only on the fewest of them res_query is entirely thread safe. In fact, the user is free to fiddle around with "struct state _res" which is globally accessible (and there is no locking for that struct). Which means: doing those queries in-process is a good choice only if nobody else in the process wants to play games with us. I generally favour doing those async queries out-of-process. Besides solveing this _res issue it also has the advantage that we can simply kill the forked-off process when we want to cancel lookups. > - We'd need to be able to pick between threads and forking at runtime, > not just at compile time. (I assume we could convince you to accept > this change upstream.) I'd strongly suggest supporting only fork-based solutions on Unix. Unix is so broken (see above) that doing this out-of-process appears to be the best solution to me. In libasyncns I offer both ways for people who like to burn themselves or for people who have full control over their program and every single library they link too. > - GThreadPool behaves more dynamically than libasyncns does; it would > be nice if libasyncns used "up to N" threads/processes, rather than > "always exactly N", and just started up new ones as they were needed, > and shut down idle ones when they weren't. This is actually something I have on my todo list to fix in asyncns too. > - libasyncns has no Windows support, meaning we'd have to reimplement > thread pools and some of the other glue ourselves anyway for the > Windows-native-thread-based resolver. Or else add Windows support > to libasyncns. (Which probably wouldn't work very well; Winsock > doesn't have socketpair(), and your code wouldn't work with pipes.) > > OTOH, if the Windows code is going to be a lot different from the > Unix code anyway (DnsQuery, etc), then it might be simpler to just > keep it entirely separate. And there's probably a win32 API for > thread pools we could use anyway... The amount of code you can share between Linux and Windows here is minimal anyway. I'd just use asyncns on Unix and something homegrown on windows. > - libasyncns doesn't directly support the idea of cancellable > *synchronous* lookups. (As noted earlier, since getaddrinfo() itself > isn't cancellable, in order to do a cancellable synchronous lookup > we have to do the lookup in another thread, and then poll for either > that thread to complete or the GCancellable to be triggered.) We can cancel getaddrinfo() if it is run out-of-process, simply by killing the child. > Since an asyncns_t has only one response fd, we couldn't easily mix > sync and async calls, or even do multiple sync calls. We'd have to > either create a new single-thread asyncns_t for each synchronous > lookup, or else have a separate thread just to poll the response fd > and then dispatch each response back to the thread that's waiting for > it. > > Having a separate socket for each worker would fix this, I think, > though it would use up more fds and complicate the code, and make it > more annoying to poll in the async case since you'd have to deal with > multiple fds. I do acknowledge these problems. This way libasyncns distributes requests between its worker threads is very efficient (no select() involved, the kernel basically does the load balancing for us) but unfortunately not that compatible with what you need to do. Having a seperate asyncns_t for each lookup would defeat the point of the worker pool in asyncns. Which is a major drawback. Hmm, maybe you shouldn't be using asyncns.c after all. Hmm, if you want to have a seperate fd for each query, then one option would be to pass the fd for the response over the fd with the request: i.e. the worker process reads all requests from a single fd and then sends the responses back to seperate fds for each query. This however would open an entirely new can of worms: fd passing over sockets is far from portable. Also, I am not sure if i'd like to see that change in asyncns. > I guess one way of solving several of the above problems would be for gio to > not use libasyncns's own thread-pooling, but instead create a pool of > single-thread asyncns_t's. Then it could dynamically control the number of > threads by creating and destroying asyncns_t's, and it could use any given > asyncns_t for either sync or async operation as needed. Sounds like an attempt to use asyncns for something it should better not be used for.
(In reply to comment #6) > Most newer protocols (like XSMPP, SIP, ...) use SRV these days. Yeah, I'm happy to have support for SRV lookup, but it should be in terms of some actual "SRV lookup" API, not a "here's a raw DNS response, parse it yourself" API. But we don't have to figure this out right away, because once we have the basic asynchronizing framework, it should be easy to add res_query support to it later and then appropriate higher-level APIs on top of that. > > - We'd have to patch all the pthread_ calls to g_thread_ since > > glib/gio don't link against libpthread. (Pretty minor; can probably > > automate this as part of the importing-a-new-libasync process.) > > Hmm, is this really a problem? I mean, libasyncns is only useful on Unix > anyway. And all Unixes do pthreads now. In fact, glib exclusively supports > POSIX and Win32 threading, hence I fail to see what we lose by linking Unix > glib against POSIX threading. As I understand it, linking to libpthread causes various standard library functions to do locking they would not do otherwise, incurring a slight performance penalty that some people don't want. (The same reason glib-level threads aren't enabled by default.) > I think for compatibility reasons we probably shouldn't be > using the pthread-based code in libasyncns at all. The thing is that on most > Unixes DNS-NSS lookups are implemented on top of res_query(). And only on the > fewest of them res_query is entirely thread safe. Have you seen evidence of getaddrinfo-in-threads failing anywhere or is it just theoretical? Evolution and libsoup both use that approach and I'm not aware of any problems turning up (as opposed to with gethostbyname-in-threads, which definitely did have problems, eg, bug 60642). I think Linux and Solaris getaddrinfo() have been thread-safe for a long time, and current versions of OS X and the BSDs are ok too. (Also, I'm assuming that this is how mozilla and qt do their DNS...?) > In fact, the user is free to > fiddle around with "struct state _res" Hrmph. I am perfectly happy saying that anyone who touches _res deserves to lose. > I generally favour doing those async queries out-of-process. fork-without-exec has its own problems: If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. ie, it's possible that at the time that you forked, another thread was holding a mutex inside malloc(), which will never be released in the child process because that thread doesn't exist there, meaning the child will block forever the first time it calls malloc(). ARE WE HAVING FUN YET? :) (Fortunately this is not a problem in the single-threaded-app case, so we can still use forking there.) > Hmm, if you want to have a separate fd for each query It doesn't have to be a separate fd for each query, just a separate one for each worker, as long as we can tell which worker has which query. Oh, but we can't. Huh. Yeah, trying to pass per-query fds over the sockets is probably not good. > Hmm, maybe you shouldn't be using asyncns.c after all. Maybe. We probably at least want to reuse some of your code.
(In reply to comment #5) > > - I'd make sure to support both host/address lookup and generic RR lookup. > > This is very useful for clients needing SRV lookup, such as Jabber > > libraries. i.e. don't just wrap getaddrinfo/getnameinfo, wrap res_query(), > > too. > > OK... I was only planning to implement "resolve name to addresses" and "resolve > address to name", and at a slightly higher level than getaddrinfo and > getnameinfo themselves (eg, no "hints" parameter). In part because that's all > that most apps need, and in part because I haven't ever written anything that > needed to do a SRV lookup or anything like that, so I don't have a good sense > of what kind of API we'd want for it... So I read the SRV RFCs, and it's pretty straightforward (and it turns out I was mistaken when I claimed I'd never written code that needed to do a SRV lookup; evolution-exchange does it :-). So I've added support for SRV lookups. As for supporting res_query() directly, I'm not sure if Windows's DnsQuery() supports that; it returns a giant union type containing the *parsed* response, but there doesn't seem to be any way to get the raw *unparsed* data like res_query() returns. (There are some undocumented flags that might make it do that, but... they're undocumented.) > > - On Win32 you probably don't want to use Winsock but DnsQuery() for > > resolving DNS. It has better support for LLMNR and stuff. [citation needed]. The only place I've been able to find suggesting using DnsQuery instead of getaddrinfo is http://support.microsoft.com/kb/831226, which says you can use DnsQuery if you need IDNA support on XP or earlier (and we'll already be handling IDNA at a higher level, so we don't need OS support for it). It also looks like DnsQuery() only supports retrieving 1 kind of record at a time (eg, either just IPv4 or just IPv6 addresses) so we'd have to do all the address merging/sorting by hand. So I think we want to stick with getaddrinfo/getnameinfo on Windows (and only use DnsQuery for SRV records). The gtk-docs of my current code are at http://www.gnome.org/~danw/gio/networking.html Although it currently only supports DNS stuff, it makes some assumptions about the eventual gio networking layer that would be built on top of it; - GNetworkAddress is specifically a *network* address, and neither it nor GSockaddr make any attempt to wrap non-IP types of sockaddr (unix domain sockets, bluetooth sockets, etc). If we want gio-level support for those things, they wouldn't be implemented via sockets on Windows anyway, so we shouldn't expect the gio API for them to look sockety. (Eg, an API that used unix domain sockets on unix would use named pipes on Windows.) Likewise, neither type exposes the difference between IPv4 and IPv6, because apps shouldn't need to care. - The "socket"/"network connection" object would take a GNetworkAddress to tell it what to connect to (so most apps would never need to work with GSockaddr). For GResolver, currently the API has "g_resolver_new()", but it should be something more like "g_resolver_get_default()", where you'd have a single app-wide resolver, which maybe could adjust its resource usage (number of workers) depending on its refcount. And maybe "g_resolver_set_default()", which would allow the top-level application to create a GResolver subclass that did some sort of caching, and then ensure that all libraries in the application would see the same cached results. (Eg, for DNS pinning in a web browser.)
Created attachment 119596 [details] [review] the code The code, including an imported copy of libasyncns for fork-based operation in non-threaded apps. No Windows support (mostly).
Removing bogus dependency.
Created attachment 123649 [details] [review] updated GResolver patch Updated version of previous patch: - Reorganization/internal API changes - Includes Win32 implementation now; I tested it via mingw and wine, and the reverse-lookup and service-lookup code works, but the forward lookup (g_resolver_lookup_name, etc) always fails. I can't tell if this is my bug, or a mingw/wine problem. Anyway, it's *mostly* right. - Imported latest libasyncs - Added a test program for IDNA encoding/decoding
Quite a patch... 30 files changed, 6994 insertions(+), 6 deletions(-) Currently it seems to stand somewhat unrelated next to the rest of GIO. Where will this be used ? gvfs, libsoup, ...? Some questions from looking at the patch for a bit: - The hostname utils seem to be just string manipulation. We've put similar uri functions into glib itself. How about these hostname functions, should they go in glib, next to the uri functions, or in gio, next to the resolver ? - Is there ever a reason to use a non-default resolver ? If there is, maybe there should be a g_resolver_new() ? Ah, ignore that, I see you explained it further up. Maybe that explanation should make it into the docs... + * SRV (service) records are used by some network protocols to provide + * service-specific aliasing and load-balancing. For example, XMPP + * (Jabber) uses SRV records to locate the XMPP server for a domain; + * rather than connecting directly to "example.com" or assuming a + * specific server hostname like "xmpp.example.com", an XMPP client + * would look up the "xmpp-client" SRV record for "example.com", and + * then connect to whatever host was pointed to by that record. Would be great to turn this into an actual example. + case PROP_SERVICE: + g_return_if_fail (srv->priv->service == NULL); + srv->priv->service = g_value_dup_string (value); + break; Is there a reason to treat these properties as write-at-most-once ? Should they be construct-only ? At least, this peculiar restriction needs to be documented. Finally, there are some coding style adjustments to be made, like lining up arguments, etc.
(In reply to comment #12) > Currently it seems to stand somewhat unrelated next to the rest of GIO. Yeah... as mentioned in the original description, this is a spinoff of bug 515973 (networking methods in gio), so the idea is that once we have this, we can then make socket functions on top of it, which would make use of GInputStream and other gio stuff. Another possibility would be to put both the resolver methods and the socket methods into another separate library within glib (gnio/gnet/whatever) > Where will this be used ? gvfs, libsoup, ...? Yes and yes. (The gresolver branch of http://gnome.org/~danw/libsoup.git has a quick port of libsoup to use GResolver rather than the old soup-dns code. It's a little kinky because I still need to keep SoupAddress around for API stability, and SoupAddress supports using non-default GMainContexts, but it works, and it gets rid of the "you must call g_threads_init() to use libsoup" requirement.) Also, presumably most of the people cc:ed on this bug and 515973 are hoping to use it. The goal is that anything that links to glib that currently uses the libc resolver directly would use this instead, to have consistent async support and cancellability. (And IDNA and IPv6.) > - The hostname utils seem to be just string manipulation. We've put similar uri > functions into glib itself. How about these hostname functions, should they go > in glib, next to the uri functions, or in gio, next to the resolver ? g_hostname_is_ip_address() currently uses g_sockaddr_new_from_string(), which on Windows uses methods from -lws2_32 (winsock) to parse IP addresses. But it could be rewritten to just parse the string by hand, or maybe using GRegex. Or maybe we could just get rid of that method and say people who need it can implement it by hand using GSockaddr the same way the code currently does. Other than that, yeah, there's not really any particular reason why ghostutils would need to be in gio rather than glib. > Is there a reason to treat these properties as write-at-most-once ? Should they > be construct-only ? At least, this peculiar restriction needs to be documented. It made more sense in the older version of the patch. I can make them construct-only. > Finally, there are some coding style adjustments to be made, like lining up > arguments, etc. You mean lining up arguments in the actual function declaration, not just the prototype? OK. Any other specifics you noticed? I was trying to emulate glib style.
> Another possibility would be to put both the resolver methods and the socket > methods into another separate library within glib (gnio/gnet/whatever) Nah, I'd rather keep it here. > Other than that, yeah, there's not really any particular reason why ghostutils > would need to be in gio rather than glib. No strong opinion on this, really. Alex, any preference ? > You mean lining up arguments in the actual function declaration, not just the > prototype? OK. Any other specifics you noticed? I was trying to emulate glib > style. What I mean is to use int frob_me_harder (type1 arg1, type2 *arg2, longtype **arg3) { ... } Other things: - Try to use g-types in the api (like gchar, gint,...) even if it hurts... - Since tags - Integrate docs (ie additions to gio-sections.txt, gio-docs.xml and gio.types) + g_set_error (error, G_RESOLVER_ERROR, errnum, + errnum == G_RESOLVER_ERROR_NOT_FOUND ? + _("No '%s' service for '%s'") : + _("Error resolving '%s' service for '%s'"), + g_network_service_get_service (srv), ++ g_set_error (error, G_RESOLVER_ERROR, errnum, + errnum == G_RESOLVER_ERROR_NOT_FOUND ? + _("No '%s' service for '%s'") : + _("Error resolving '%s' service for '%s'"), + g_network_service_get_service (srv), + g_network_service_get_domain (srv)); g_network_service_get_domain (srv)); Here, I'd probably put the format in a variable and set that in the if sequence before, rather than use a ternary, but thats just me.
all the above comments are now addressed in the gresolver branch of http://gnome.org/~danw/glib.git
Hmmm... I took a quick look at this from an API point of view. I see you implemented your own adress objects. We had quite a lot of discussions how represent that in the gio network work (now gnio) and we ended up with a different model. Check it out here: http://sciyoshi.com/gitweb/?p=gnio.git;a=tree;f=gnio;hb=HEAD It has a splitup between GInetAddress (with ipv4 and ipv6 subclasses) which represent an internet address (ip). I.E it doesn't have a port. And a GSocketAddress which is something you can use to connect to a service, i.e. it has a port argument too. It also has support for other types of socket address, so the hierarchy looks like: GInetAddress +-- GInet4Address +-- GInet6Address GSocketAddress +--GInetSocketAddress (GInetAddress + port) +--GLocalSocketAddress (has path, i.e. a unix domain socket) GSocketAddress can be converted to sockaddr_t:s with g_socket_address_to/from_native() Your GNetworkAddress has both hostname and port in addition to the ip address. This is kinda weird imho and not a good separation of concepts. Also, the separation lets us support unix domain sockets, which even if its not really supportable on win32 is useful to have on unix. Also, i don't quite buy the argument in GNetworkAddress to not expose ipv4 vs ipv6. Its true in general that you shouldn't have to know about it, but I've written enought ipv6 specific stuff to know that its not quite true in reality. In the gnio approach most of the time you just handle GInetAddresses, but if you really need to you can check the type and cast down to the specific version. Anyway, I was eventually gonna import the gnio stuff into gio, so we need to sync this up so that we don't get conflicts wrt the address objects.
(In reply to comment #16) > We had quite a lot of discussions how > represent that in the gio network work (now gnio) and we ended up with a > different model. Ah, I wasn't aware that you had been involved in gnio at all. > Your GNetworkAddress has both hostname and port in addition to the ip address. > This is kinda weird imho and not a good separation of concepts. I would say "GNetworkAddress is the perfectly logical way of doing it, whereas gnio is just providing an extremely thin wrapper around the old ugly unixy BSD socket APIs, which makes it less appropriate for glib". :-) At any rate, GNetworkAddress is basically just struct addrinfo/struct hostent, so it's not even completely alien to the BSD sockets API. The difference is just that I'm (mostly) not providing the lower-level types, so you just keep using the GNetworkAddress all the way through, keeping all of the information bundled together, rather than exploding it into smaller types. And it's not "both hostname and port in addition to the ip address", it's "both hostname and port in addition to the ip address*ES*": one of the goals was to prevent bugs like bug 526321, where an app only looks at the first returned IP address (which is especially a problem for the case where a server has both IPv4 and IPv6 addresses, but the client doesn't have both IPv4 and IPv6 connectivity. Indeed, problems like this are actually stopping companies from deploying IPv6 access to their servers. Eg, google is available via IPv6, but only if you explicitly ask for ipv6.google.com, because they didn't want to break www.google.com for some people by adding an IPv6 address to it.) By having GNetworkAddress directly keep track of the multiple IP addresses, rather than forcing the caller to work with each address individually, or else pass around an array-of-something, it makes it easier to ensure that all available IP addrs will eventually get tried by the code that's actually calling connect(). (Yes, the current example code shows the caller having to try each GSockaddr individually, but that's just because we don't have any APIs that consume GNetworkAddresses yet.) Also, another (smaller) win of keeping everything bundled together is that it gives the app easy access to both the UTF-8 and ASCII-encoded versions of a GNetworkAddress's hostname. There are situations where you want to resolve a hostname and aren't thinking about any particular port (although this is much rarer than the situation when you want to resolve a hostname and *are* thinking about a port), but you can just pass "0" for the port in that case. And there are some situations where you want to deal just with an IP address and no hostname (and you can and should use GSockaddr directly for some of those cases). But for the most part, most applications are going to look up a hostname, and then want to connect to a specific port on that hostname. (Or else, they're going to have received a connection, and want to know who it came from.) Bundling all the information together in one object makes the common cases more convenient, and doesn't really make the less-common cases that much less convenient. > Also, the separation lets us support unix domain sockets, which even if > its not really supportable on win32 is useful to have on unix. But there's no need to expose struct sockaddr_un in a gio-level unix domain sockets API; you just need sock = g_unix_domain_socket_new_client ("/tmp/mysocket") It seems like it would make more sense to merge unix domain sockets with Windows named pipes into a single "local IPC via named endpoints" API than it would to merge unix domain sockets with internet sockets. > Also, i don't quite buy the argument in GNetworkAddress to not expose ipv4 vs > ipv6. Its true in general that you shouldn't have to know about it, but I've > written enough ipv6 specific stuff to know that its not quite true in reality. Yeah. At the time, I was thinking, "the application can just cast the GSockaddr to a struct sockaddr if they need to do that", but then that requires including the right socket headers for unix vs windows, etc. And using sucky sockets APIs. :) As mentioned above, a GNetworkAddress itself can't be considered to be either IPv4 or IPv6, since it could be storing both an IPv4 and an IPv6 address. A GSockaddr is always either 4 or 6 though, and we could expose that more explicitly. > In the gnio approach most of the time you just handle GInetAddresses, but if > you really need to you can check the type and cast down to the specific > version. GInet4Address and GInet6Address don't provide much API beyond what GInetAddress does though. They provide g_inet4_address_from_string() and g_inet6_address_from_string(), but again, that seems like the wrong API to me, because in most contexts where you want to parse an IP address string, you want it to work with either IPv4 or IPv6 addresses, but gnio (and BSD sockets) requires you to either already know which kind you have, or else try parsing it twice. Likewise, if someone creates a listening server, the default should be for it to listen on both 0.0.0.0 and :: (or 127.0.0.1 and ::1), without the caller needing to call both g_inet6_address_new_any() and g_inet4_address_new_any(), etc. So having IPv4 and IPv6 subclasses doesn't help a ton, since the higher-level code needs to know explicitly about both of them anyway. So I think it just needs an enum (or actually, probably a flags) type to distinguish IPv4 and IPv6, and a few more methods to create "any" and "localhost" addresses, etc.
(In reply to comment #17) > ...snip... > having GNetworkAddress directly keep track of the multiple IP addresses, rather > than forcing the caller to work with each address individually, or else pass > around an array-of-something, it makes it easier to ensure that all available > IP addrs will eventually get tried by the code that's actually calling > connect(). This doesn't mean that GNetworkAddress couldn't be implemented as a subclass of GInetAddress that instead stores e.g. a GList of addresses along with the hostname. The connect() APIs could check for this, and you would have the benefit of a more abstract API while keeping the logical separation in the background. > There are situations where you want to resolve a hostname and aren't thinking > about any particular port (although this is much rarer than the situation when > you want to resolve a hostname and *are* thinking about a port), but you can > just pass "0" for the port in that case. And there are some situations where > you want to deal just with an IP address and no hostname (and you can and > should use GSockaddr directly for some of those cases). This would be solved, again, by making GNetworkAddress store only the hostnames and IPs, and the GInetSocketAddress wrapping it store the port. > > Also, the separation lets us support unix domain sockets, which even if > > its not really supportable on win32 is useful to have on unix. > > But there's no need to expose struct sockaddr_un in a gio-level unix domain > sockets API; you just need > > sock = g_unix_domain_socket_new_client ("/tmp/mysocket") This is inconvenient if you want to treat socket addresses homogeneously; i.e. if you're getting an address from a config file you don't want to worry about which domain it's going to be. > GInet4Address and GInet6Address don't provide much API beyond what GInetAddress > does though. They provide g_inet4_address_from_string() and > g_inet6_address_from_string(), but again, that seems like the wrong API to me, > because in most contexts where you want to parse an IP address string, you want > it to work with either IPv4 or IPv6 addresses, but gnio (and BSD sockets) > requires you to either already know which kind you have, or else try parsing it > twice. This is an oversight; g_inet_address_from_string is supposed to be in the API, but looking over the code again it seems that I forgot to add it in. The separation of IPv4 and v6 doesn't prevent parsing an IP address string whose format you don't know. > Likewise, if someone creates a listening server, the default should be for it > to listen on both 0.0.0.0 and :: (or 127.0.0.1 and ::1), without the caller > needing to call both g_inet6_address_new_any() and g_inet4_address_new_any(), > etc. Again, if we had GNetworkAddress as a subclass of GInetAddress, g_network_address_new_any() could implement the functionality you're talking about.
(In reply to comment #18) > This doesn't mean that GNetworkAddress couldn't be implemented as a subclass of > GInetAddress that instead stores e.g. a GList of addresses along with the > hostname. But what happens when you create a GSocketAddress out of it? What would g_socket_address_to_native() output in that case? g_socket_connect() (and anything else that uses GSocketAddress) would end up needing to have two completely separate codepaths, one to deal with GNetworkAddress-based GSocketAddresses, and one to deal with other types. It would create more problems than it solved. > This would be solved, again, by making GNetworkAddress store only the hostnames > and IPs, and the GInetSocketAddress wrapping it store the port. "Solved" makes it sounds like a bigger problem than it is. You could just as easily say that GNetworkAddress solves the problem that gnio's resolver makes you create an intermediary address object: // gnio version GInetAddress *addr = g_resolver_lookup_name (resolver, "www.google.com", cancellable, &error); GInetSocketAddress *sockaddr = g_inet_socket_address_new (addr, 80); GSocketConnection *conn = g_socket_connection_new (G_SOCKET_ADDRESS (sockaddr)); // my version GNetworkAddress *addr = g_resolver_lookup_name (resolver, "www.google.com", 80, cancellable, &error); GSocketConnection *conn = g_socket_connection_new (addr); (ignoring cleanup in both cases) > > But there's no need to expose struct sockaddr_un in a gio-level unix domain > > sockets API; you just need > > > > sock = g_unix_domain_socket_new_client ("/tmp/mysocket") > > This is inconvenient if you want to treat socket addresses homogeneously; i.e. > if you're getting an address from a config file you don't want to worry about > which domain it's going to be. To actually be able to have the config-file-reading code not worry about the socket domain, you would have to serialize the raw bytes of the sockaddr into the config file so you could use g_socket_address_from_native(). Which would be crazy. If you want the config file to be able to specify things like "127.0.0.1:999" or "/tmp/mysocketpath", then you can't treat the socket addresses homogeneously.
(In reply to comment #19) > But what happens when you create a GSocketAddress out of it? What would > g_socket_address_to_native() output in that case? g_socket_connect() (and > anything else that uses GSocketAddress) would end up needing to have two > completely separate codepaths, one to deal with GNetworkAddress-based > GSocketAddresses, and one to deal with other types. It would create more > problems than it solved. I can think of a couple of ways around this, but all of them involve more convoluted APIs... The reason I still think that the GInet{,4,6}Address abstraction is nice, however, is because a) it's logical, and b) it's how other frameworks (read .NET and Java) do it, and people using gio from higher-level languages like Vala are likely to find it more familiar. > "Solved" makes it sounds like a bigger problem than it is. You could just as > easily say that GNetworkAddress solves the problem that gnio's resolver makes > you create an intermediary address object: > > // gnio version > GInetAddress *addr = g_resolver_lookup_name (resolver, > "www.google.com", > cancellable, &error); > GInetSocketAddress *sockaddr = g_inet_socket_address_new (addr, 80); > GSocketConnection *conn = g_socket_connection_new (G_SOCKET_ADDRESS > (sockaddr)); > > // my version > GNetworkAddress *addr = g_resolver_lookup_name (resolver, > "www.google.com", 80, > cancellable, &error); > GSocketConnection *conn = g_socket_connection_new (addr); > > (ignoring cleanup in both cases) Maybe "solved" was too strong a word :-). But in the example you give, GInetSocket is created with a floating reference, so there's no cleanup needed; I don't really see a problem here either.
IMHO storing the network address as one entity have several benefits. - The porting into the new IPv8 will be really easy (I know that the IPv6 is build with future in mind but that's just an example). You just wait for the new release of library. - You just use one code to handle both IPv6, IPv4 and any other protocol The ServerSocket[1] operations would look like: GNetworkAddress *address = g_network_address_new_any (8080); GServerSocket *socket = g_network_server_socket (address); g_server_socket_bind (socket, cancellable, error); In g_{unix,windows,...}_server_socket_bind: GInet4Address *addr4; GInet6Address *addr6; addr4 = g_network_address_get_inet4_address (self->priv->addr); _bind_to_ipv4 (self, addr4); addr6 = g_network_address_get_inet6_address (self->priv->addr); _bind_to_ipv4 (self, addr6); And for Socket[1]: GNetworkAddress *address = g_resolver_lookup_name(resolver, "www.google.com", 80, cancellable, &error); GSocket *socket = g_network_socket (address); g_server_socket_connect (socket, cancellable, error); In g_{unix,windows,...}_socket_connect: GInet4Address *addr4; GInet6Address *addr6; addr4 = g_network_address_get_inet4_address (self->priv->addr); if (addr4) { if (_connect_by_ipv4 (self, addr4)) return TRUE; } addr6 = g_network_address_get_inet6_address (self->priv->addr); if (addr6) { if (_connect_by_ipv6 (self, addr6)) return TRUE; } return FALSE: Preference of IPv6 over IPv4 or IPv4 over IPv6 could be event set in runtime. [1] I look from my GNIO/Java perspective where ServerSocket is the separate hierarchy from Socket
Dan, I don't disagree with what GNetworkAddress does. In fact, I think its a good idea to have this in one place, as that makes it easier to avoid the kind of issues you're talking about (multiple ips for a host). However, I don't like the lack of separation of concepts. From the highlevel point of view there are several concepts involved here: * An IP address (of some protocol type) * A socket endpoint specifier (ip + port, or some other identifier) * A network host (hostname, set of ips) I think it would be good to expose the network host concept as an API type, as it would help with the problems above, but since its essentially a container of other things, including ip addresses, I'd like it to be based on the IP address object and not just have that as an internal implementation detail. That way you can also use the ip address object to do ip-specific things, (like checking if an address is ipv4 or ipv6, or getting the actual ip you connected to) without that being contorted by facts like GNetworkAddresses having multiple addresses and other data. On the issue of ipv4 vs ipv6. I agree that the subclasses here aren't that important and could perhaps equally well be implemented as an get_type + enum. I don't particularly care which one we use. So, what about a setup like: GInetAddress - one ipv4 or ipv6 address GNetworkHost - hostname + list of GInetAddresses GSocketAddress - abstract baseclass for socket endpoint GInetSocketAddress GSocketAddressSource - iface to get a list of GSocketAddresses GSocketAddressSource is implemented by GNetworkHost and GSocketAddress (returning just itself), and g_socket_connection_new() takes a GSocketAddressSource. Connecting to a hostname would look like: GNetworkHost *host = g_resolver_lookup_name (resolver, "www.google.com", cancellable, &error); GSocketAddressSource *addresses = g_network_host_get_addresses (host, port); GSocketConnection *connection = g_socket_connection_new (addresses); g_socket_connection_connect (connection, cancallable, &error); We have one extra temporary here ("addresses") which makes the use in C require a bit more cleanup. We could either live with this or make GSocketAddress and GSocketAddressSource created with a floating ref. Of course, we could also make GNetworkHost contain the port and make it implement GSocketAddressSource, but I'm not sure that is a great idea. What if you want to connect to the same host but different ports, etc. I like to cleanly separate the concepts, even if it leads to an extra line of code in a few places.
(In reply to comment #21) > IMHO storing the network address as one entity have several benefits. > - The porting into the new IPv8 will be really easy (I know that the IPv6 is > build with future in mind but that's just an example). You just wait for the > new release of library. I am assuming that (a) there will not be anything beyond IPv6 in gio's lifetime, and (b) if there is, it will screw us over in ways we can't even imagine now :-) (since if it wasn't *radically* different, there'd be no reason to go beyond IPv6). (In terms of the networking layer as a whole, I think the most important thing to be paying attention to for near-term-future-proofing is SCTP (http://tools.ietf.org/html/rfc4960), since it has some nice properties and there has been some discussion of porting TCP-based protocols to use it (http://tools.ietf.org/html/draft-natarajan-http-over-sctp-00). But that's bug 515973, not here, and I think we're probably OK anyway.)
SCTP is supported via the protocol param in: GSocket * g_socket_new (GSocketDomain domain, GSocketType type, const gchar *protocol); So, yeah, we're future free there.
I just come up with an idea. It seems that the servers need several layers of security: - Localhost address - Safe-LAN - Unsafe-LAN - Global Shouldn't the API allow to choose one of them. Of course - only if NetworkManager support is enabled (yes - I know it would require circular dependences - but may be we could find out how to solve it). @Alexander Larsson: 1. May I ask what is the benefit of GNetworkHost? Why don't use GNetworkAddressSource? 2. Shouldn't the GSocketAddressSource be a subclass of GSocketAddress? If we need to connect to a single GSocketAddress (like "/tmp/my_program_iface") we don't really need to create a separate class. 3. I guess that GSocketAddressSource is about client of stream. However the server have to use multiply addresses at once. Having several GServerSocket is not nice - since logically it is one socket (althought in implementation that may be many). (In reply to comment #23) > (In reply to comment #21) > > IMHO storing the network address as one entity have several benefits. > > - The porting into the new IPv8 will be really easy (I know that the IPv6 is > > build with future in mind but that's just an example). You just wait for the > > new release of library. > > I am assuming that (a) there will not be anything beyond IPv6 in gio's > lifetime, and (b) if there is, it will screw us over in ways we can't even > imagine now :-) (since if it wasn't *radically* different, there'd be no reason > to go beyond IPv6). > IPv8 was just an example. In real would it may be AppleTalk or something like that.
(In reply to comment #22) > That way you can also use the ip address object to do ip-specific things, (like > checking if an address is ipv4 or ipv6, or getting the actual ip you connected > to) without that being contorted by facts like GNetworkAddresses having > multiple addresses and other data. (Note that you can do both of these things in the current code. GSockaddr is basically exactly equivalent to GInetSocketAddress in gnio. g_network_address_get_sockaddrs() returns the array of 1 or more GSockaddrs contained by a given GNetworkAddress. The gio equivalents of getsockname() and getpeername() would also return GSockaddr, not GNetworkAddress.) > On the issue of ipv4 vs ipv6. I agree that the subclasses here aren't that > important and could perhaps equally well be implemented as an get_type + enum. > I don't particularly care which one we use. Eh, if we're going to have abstract base classes and interfaces and stuff, we might as well go whole hog and have separate IPv4 and IPv6 subclasses too. > So, what about a setup like: > > GInetAddress - one ipv4 or ipv6 address > GNetworkHost - hostname + list of GInetAddresses > GSocketAddress - abstract baseclass for socket endpoint > GInetSocketAddress So you still want to explicitly support non-IP socket types in the public API? (GSocketAddress vs GInetSocketAddress). > GSocketAddressSource - iface to get a list of GSocketAddresses > > GSocketAddressSource is implemented by GNetworkHost and GSocketAddress > (returning just itself), and g_socket_connection_new() takes a > GSocketAddressSource. OK, so that actually solves a problem I had in the back of my head, which is that it seems like you ought to be able to pass a GNetworkService directly to g_socket_connection_new() as well (GNetworkService is the result of a SRV lookup and contains 1 or more hostname+port combos to try for a given service.) So GNetworkService can just implement GSocketAddressSource, and it will work. (Well... assuming a sufficiently complex GSocketAddressSource interface; you'd want to be able to resolve the hostnames in the SRV record one at a time rather than resolving them all ahead of time, since 95% of the time you'll only end up needing the first one.) > Of course, we could also make GNetworkHost contain the port and make it > implement GSocketAddressSource, but I'm not sure that is a great idea. What if > you want to connect to the same host but different ports, etc. I like to > cleanly separate the concepts, even if it leads to an extra line of code in a > few places. An extra line of code in *every* place, pretty much. FTR, getaddrinfo() itself takes a port number and returns struct sockaddrs with that port number filled in for you. (In reply to comment #24) > SCTP is supported via the protocol param in: > GSocket * g_socket_new (GSocketDomain domain, > GSocketType type, > const gchar *protocol); Right, but to support the advanced features (like multiple independent I/O streams on a single socket), it will probably need its own subclass of GSocket, which may then have implications for how other APIs are designed, etc. Nothing major, just something we want to be keeping in mind.
Maciej: I'm not sure what exactly you mean by the various network types, how you expect to be able to calculate what address is on what, or what you want to do with them Replies to your question. 1. What is GNetworkAddressSource? Do you mean GSocketAddressSource? Its an interface that lets you get a list of socket addresses to connect to, where you try them one at a time when connected. And I don't understand what you're asking about the benefits of GNetworkHost. It encapsulates information about a host, i.e. hostname and a set of addresses and is the result of a resolve and is used for reverse lookups, etc. Its a different entity than GSocketAddressSource. 2. Eh? GSocketAddressSource is an interface. Its not a class. And, GSocketAddress implements the interface, so there is no "separate class". 3. That part is a bit tricky. Linux for instance supports a single socket letting you accept both ipv4 and ipv6 clients, while other unixes doesn't. We need to abstract that out. However, I don't think that is quite related to the things discussed here.
dan: > (Note that you can do both of these things in the current code...) Its *possible* but its not really nice. You need to include system headers (which may have different names on unix vs win32 even?) and use system specific calls to extract any information from it. > So you still want to explicitly support non-IP socket types in the public API? > (GSocketAddress vs GInetSocketAddress). Yes, I'd like to have that. Its not a really big cost and it allows us to use the same code for e.g. unix domain sockets. I know that is kinda unix specific, but sometimes you have to do that, and Its nice that you don't have to do everything yourself then but can just GSocketConnection, etc. > GNetworkService stuff Sounds good, although it would put some higher requirements on the GSocketAddressSource interface, as it would be doing i/o. I.E. it needs to have both async and sync versions of everything. > An extra line of code in *every* place, pretty much. In every place that connects to something, but I don't think that is a very common operation, so its not that many places in total. However, I agree with you that it doesn't look as nice as if GNetworkHost would implement GSocketAddressSource directly. At least in the simple case. It would look a bit weirder in the more complex case where you e.g. resolve a hostname and then connect to it on multiple ports. Then you would need a way to change the port on a GNetworkHost, which is kinda weird. But as you say, getaddrinfo() itself does this, so maybe its a good pragmatic choice to make... Actually, one interesting aspect about having the GSocketAddressSource interface more complex as mentioned above is that it could handle the actual resolve itself. This means you can do: GNetworkHost *host = g_network_host_new ("www.gnome.org", 80); GSocketConnection *conn = g_socket_connection_new (G_SOCKET_ADDRESS_SOURCE (host)); g_socket_connection_connect (connection, cancellable, &error); And have the resolve postponed until the connect. I like it. Although the names are perhaps not ideal, GSocketAddressSource is a bit long.
(In reply to comment #27) > Maciej: > Replies to your question. > > 1. > > What is GNetworkAddressSource? Do you mean GSocketAddressSource? Its an > interface that lets you get a list of socket addresses to connect to, where you > try them one at a time when connected. > Ok. Thanks. I guess it could be better named. >
What about GSocketConnectable? Its a bit shorter, but still obviously socket related, and its got a very interface:ish name (sort of like GSeekable).
This may be off-topic but we recently landed GVfsDnsSdResolver in GVfs [1] http://svn.gnome.org/viewvc/gvfs/trunk/common/gvfsdnssdresolver.h?revision=2112&view=markup Do you think it's desirable and/or feasible to make a generic GIO resolver replace this? I think (but I might be wrong) it might be desirable to have this functionality in gio (apps should easily be able to do DNS-SD); perhaps as a helper class or subclass. I don't know; just an idle thought... (Of course, implementation-wise we can't let libgio depend on the Avahi libs (that bring in libdbus too) but it could be done through extension points or something.) [1] : right now (like all GVfs API) it's private API but I expect it to be public once we make the GVfs API public (so people can do out-of-tree backends)
I don't really think dns-sd is anywhere near as widely used as normal socket i/o, and I don't really think it has a place on the glib level. However, many of the datatypes involved in a gobject based dns-sd API are part of this code, and if said code implemented GSocketConnectable it would plug in very nicely, without needing to be in libgio or needing any extension points.
I've updated the git repo (http://gnome.org/~danw/glib.git): * Imported the address-related classes from gnio, cleaned up a bit, glib-ified (eg, single includes, gioaliases, etc). Made it use "gresolverprivate.h" to get the socket/winsock includes, fixed some of the Windows code, implemented some missing GInet6Address stuff. * Renamed GLocalSocketAddress to GUnixSocketAddress and made it only be built on unix. * Added g_inet_address_from_string(). Moved g_inet[46]_address_to_bytes to g_inet_address_to_bytes. * removed GSockaddr, made GResolver and GNetworkAddress use GInetSocketAddress instead. (We probably still want to change g_resolver_lookup_address() to take a GInetAddress instead of a GInetSocketAddress.) * Initial GSocketConnectable work, with a complete implementation for GSocketAddress, and a partial implementation for GNetworkAddress (it doesn't resolve the address if it's not resolved yet). I've started on GNetworkService too, but that's not committed yet. It definitely seems odd to me that GInetAddress has subclasses, but GInetSocketAddress does not (and has switch statements instead). I haven't decided for sure which direction it would make more sense to fix this in.
I agree that having one use subclasses and not the other is weird...
I have taken a look at the API as implemented in Dan's git branch. The concept of GInetAddress and GSocketAddress makes sense to me. I don't care that much whether IPv4 and IPv6 are implemented in subclasses or not, but I agree that we should be consistent for GInetAddress and GInetSocketAddress. We should really add GNetworkHost, though. In my opinion, it would be quite bad if the API didn't allow resolving a host name without a port. I'm not so sure whether we actually need GSocketConnectable and GNetworkAddress. Alternatively, we could, for example, also just provide two connect functions, one expecting a GSocketAddress and one a GNetworkHost and a port. This would keep the client code simple and still support connection to a list of addresses. We should not add this to the low-level GSocket API but only to the higher level TCP/UDP API. The low-level GSocket API should not know anything about ports.
Jürg: It certainly allows resolving without a specific port, just like the underlying getaddrinfo call does. Just pass in zero for it. However, it does mean that its there if you don't use it, and that you have to change the port or dup the GNetworkAddress if you want to use multiple ports. So, its an inconvenience in the uncommon case rather than "doesn't allow". The advantage of GSocketConnectable is, apart from just having one GSocketConnection constructor, that we can later plug in more sources, even in third party code. For instance an avahi based GSocketConnectable object can allow easy connecting to browsed to dns-sd objects. I don't understand what you mean by "The low-level GSocket API should not know anything about ports.". For sure the inet socket addresses needs to know about ports, but the core socket and connection code needs just to know about socket address objects, not ports, and GSocketConnectable doesn't know about ports either.
(In reply to comment #36) > It certainly allows resolving without a specific port, just like the underlying > getaddrinfo call does. Just pass in zero for it. However, it does mean that its > there if you don't use it, and that you have to change the port or dup the > GNetworkAddress if you want to use multiple ports. So, its an inconvenience in > the uncommon case rather than "doesn't allow". Right, if you can pass an invalid port number, it works, of course, although it still feels a bit strange to me. What about having both, GNetworkHost and GNetworkAddress? The resolver would return a GNetworkHost, however, we could add the function GNetworkAddress *g_network_address_new (const gchar *hostname, guint16 port) This would defer address resolving until calling the connect method similar to comment #28, which makes the client code a lot simpler for the async case. > The advantage of GSocketConnectable is, apart from just having one > GSocketConnection constructor, that we can later plug in more sources, even in > third party code. For instance an avahi based GSocketConnectable object can > allow easy connecting to browsed to dns-sd objects. Yes, certainly makes sense for that. An additional comment regarding GSocketConnectable from a language binding perspective, automatic memory management of the GSocketConnectableIter will be very difficult for language bindings due to the virtual method free_iter. Please use the same mechanism as g_file_enumerate_children and we should probably use use the same naming, for example, g_socket_connectable_enumerate_addresses. > I don't understand what you mean by "The low-level GSocket API should not know > anything about ports.". For sure the inet socket addresses needs to know about > ports, but the core socket and connection code needs just to know about socket > address objects, not ports, and GSocketConnectable doesn't know about ports > either. Yes, GInetSocketAddress knows about ports, of course, I really just meant GSocket, the core socket.
Having both GNetworkHost and GNetworkAddress sounds quite confusing to me. I mean, the names are basically identical and the only difference is the addition of the port info... At the end of the day I don't think having the optional port in GNetworkAddress is that bad. There is no port use in GSocket: $ grep -i port gsocket.c g_set_error (&error, G_IO_ERROR, G_IO_ERROR_NOT_SUPPORTED, "unsupported socket domain"); g_set_error (&error, G_IO_ERROR, G_IO_ERROR_NOT_SUPPORTED, "unsupported socket type"); g_set_error (&error, G_IO_ERROR, G_IO_ERROR_NOT_SUPPORTED, "unsupported socket protocol");
(In reply to comment #37) > An additional comment regarding > GSocketConnectable from a language binding perspective, automatic memory > management of the GSocketConnectableIter will be very difficult for language > bindings due to the virtual method free_iter. Please use the same mechanism as > g_file_enumerate_children and we should probably use use the same naming, for > example, g_socket_connectable_enumerate_addresses. So, first off, GSocketConnectable is still a work in progress. The model I was using was GHashTable/GHashTableIter, where you're asking the object to iterate itself, and it needs to keep some temporary state in order to do that. We could split it up a la GFileEnumerator (so you'd have the GSocketConnectable interface, which only had g_socket_connectable_get_enumerator(), and then GSocketConnectableEnumerator would have all of the other methods), but that seems like the wrong way to do it to me. The knowledge of how to iterate a GNetworkService belongs in GNetworkService, not in some other object. As for the memory-management issues, gobject-introspection has an planned annotation for "use method X to free the data returned from this method", though it's not implemented yet. Another possibility would be to register GSocketConnectableIter as a boxed type, which would simplify things a bit for bindings, at the expense of making things more complicated for connectable implementations.
(In reply to comment #39) > So, first off, GSocketConnectable is still a work in progress. Sure, I just thought it's easier if I comment as early as possible. > The model I was using was GHashTable/GHashTableIter, where you're asking the > object to iterate itself, and it needs to keep some temporary state in order to > do that. We could split it up a la GFileEnumerator (so you'd have the > GSocketConnectable interface, which only had > g_socket_connectable_get_enumerator(), and then GSocketConnectableEnumerator > would have all of the other methods), but that seems like the wrong way to do > it to me. The knowledge of how to iterate a GNetworkService belongs in > GNetworkService, not in some other object. I don't think that this would be wrong at all, this is how enumerator/iterator classes work, for example, in Java and .NET. The class of the enumerator object that GNetworkService returns should be private, of course, so it's not like you have to access private GNetworkService fields from a different place. The only issue I see is that you need a bit more GObject boilerplate but I don't think that this should matter for the API design. > As for the memory-management issues, gobject-introspection has an planned > annotation for "use method X to free the data returned from this method", > though it's not implemented yet. Another possibility would be to register > GSocketConnectableIter as a boxed type, which would simplify things a bit for > bindings, at the expense of making things more complicated for connectable > implementations. Yes, but it's not as simple as that, free_iter has two parameters, one for the GSocketConnectable and one for the GSocketConnectableIter, which is highly unusual for free functions and wouldn't work with GBoxed, for example. With the GFileEnumerator design we could just use g_object_unref, of course.
(In reply to comment #40) > I don't think that this would be wrong at all, this is how enumerator/iterator > classes work, for example, in Java and .NET. Hm... yes, but it seems right there, because they have inner classes. :-) I suppose this is more an argument about code organization than API design. Anyway, yes, I can redo it that way.
(In reply to comment #41) > Anyway, yes, I can redo it that way. So I guess it would it be called GSocketAddressEnumerator? (That would become the new longest filename... assuming that we did not make separate files for GSocketAddressSocketAddressEnumerator (ie, the GSocketAddressEnumerator implementation for GSocketAddress), etc. Right now GResolver has to be called with a hostname or sockaddr, and so to make GNetworkAddress be able to enumerate itself, it would have to create a *new* GNetworkAddress, and then copy the resulting array of sockaddrs. That would be dumb. Possible fixes: 1. Add new methods to GResolver that take pre-existing GNetworkAddresses and GNetworkServices. 2. As with #1, and remove the old methods, so you always have to create a GNetworkAddress or GNetworkService by hand before passing it to GResolver 3. As with #2, and add resolving methods to GNetworkAddress and GNetworkService themselves, so that GResolver becomes a background detail that most code would never deal with. 4. Alternate version of the above, where GResolver would just work on raw char *, GInetAddress *, and GSrvTarget * rather than working with GNetworkAddress and GNetworkService at all, since as shown by comment #28, most code wouldn't need to work directly with GResolver anyway. #4 then lets us ditch ports from the GResolver interface; GNetworkAddress would take a hostname and port, pass the hostname to GResolver, get back an array of GInetAddress, and construct an array of GInetSocketAddress. Code that wants to do DNS but not connect to a server can then just use GResolver directly and not have to fret about what port number to pass.
#4 sounds good to me.
Started working on this on the gresolver-new branch of http://gnome.org/~danw/glib.git. GResolver is rewritten as discussed, but GSocketConnectable isn't there yet, so GNetworkAddress and GNetworkService are a little bit useless at the moment. I started renaming GNetworkAddress to GNetworkHost, but that no longer seems to make sense, since it is now specifically for connecting to a specific host/port combo, so it's more of an "address" than a "host".
I started porting (well - along with rewrite) my branch to Dan's gresolver-new branch. This issue was present before but shouldn't functions such as g_socket_address_from_native throw GError **.
Created attachment 126185 [details] [review] Add hostname utilities (glib/ghostutils)
Created attachment 126186 [details] [review] import gnio's sockaddr types Major differences from gnio: * integrate with gio, glibify source * merge GInet4Address and GInet6Address into GInetAddress * implemented g_inet_address_new_from_string * implemented missing IPv6 GInetAddress properties * renamed GLocalSocketAddress to GUnixSocketAddress * made GInetAddress and GSocketAddress not GInitiallyUnowned * changed prototype of g_socket_address_to_native * made GSocketAddress subclass ctors return a GSocketAddress* * renamed a few ctors from X_from_Y to X_new_from_Y
Created attachment 126187 [details] [review] GResolver
Created attachment 126188 [details] [review] GNetworkAddress, GNetworkService, GSocketConnectable, GSocketAddressEnumerator
Created attachment 126189 [details] [review] merge GSocketConnectable and GSocketAddressEnumerator
OK, that's the code, rebased into multiple patches. The first is just the hostname utils in glib/. The second patch imports the gnio types, with some fixes and additions. (Notably, none of them are GInitiallyUnowned any more. That doesn't really work well in situations where an object doesn't have one obvious owner. Eg, if GInetAddress was initially floating, then passing it to g_resolver_lookup_by_address() would cause it to be ref_sinked and then unreffed, meaning you'd have to ref it before passing it to the resolver to prevent it from being freed. That doesn't make much sense. The third patch implements GResolver as discussed in part 4 of comment #42. The fourth adds GSocketConnectable, and GNetworkAddress and GNetworkService implementing it. But after I wrote it, I realized that GNetworkAddress and GNetworkService don't really seem to have any point any more besides being GSocketConnectables. So the last patch removes a bunch of the no-longer-relevant GNetworkAddress/GNetworkService code. I haven't tested under Windows lately, so it's possible I broke something in the rebasing.
Patch 1: ======== looks good Patch 2: ======= Boilerplate refer to GNIO: +/* GNIO - GLib Network Layer of GIO What if os doesn't support ipv6? Can we make it optional? g_socket_address_native_size => g_socket_address_get_native_size ? Lacks g_unix_socket_address_get_path()? Also, we want to expose abstract unix socket names (a linux extension). Maybe something like: GSocketAddress *g_unix_socket_address_new_abstract (const gchar *name) gboolean g_unix_socket_address_is_abstract (GUnixSocketAddress * address); const char *g_unix_socket_address_get_abstract_name (GUnixSocketAddress * address); Patch 3: ======== g_resolver_lookup_by_name_async(): If string is an address we just put it as a list in the simple result gpointer. However, in g_resolver_lookup_by_name_finish() we don't just return that pointer. Instead we call class->lookup_by_name_finish(), which e.g. in the case of gunixresolver thinks the gpointer is a GUnixResolverRequest *. => Boom Also, the list is not freed. In general you should be free to *not* call the finish() function and still not cause a leak. So we want a destroy notify on the gpointer data. Furthermore, the generic code in g_resolver_lookup_by_name_finish() only handles errors raised by g_resolver_lookup_by_name_async() itself. The general pattern we use is to have it handle all simple results that has an error set, in order to simplify the code in the multiple implementations of the _finish() call. This is true also for all the other _finish() calls, even if the general code doesn't set any errors. g_srv_target_new() - Is it ok to reference time_t in a public header like this in glib? Other code seems to do it to, but it seems kinda different than what we generally do with system-header defined types. I didn't actually review the resolver code, because I don't really know that stuff all that well... I'm sure its ok, and if it has bugs we can fix them later, as long as the API is ok. Will look at the other patches later today.
> What if os doesn't support ipv6? Can we make it optional? Do we know of any platform that glib 2.18 compiles on that does not support IPv6? Googling turns up http://www.ipv6.org/impl/unix.html; the "last modified" dates at the bottoms of the pages linked from there show that IRIX was the only UNIX whose current release didn't support IPv6 as of March 2003 (and googling further shows that IRIX had support as well a few months later). Making support for IPv6 *addresses* conditional is easy. The real problem would be that platforms that don't support IPv6 will most likely not have getaddrinfo() and getnameinfo() either, so we'd need to add codepaths using gethostbyname() and gethostbyaddr() (which are not threadsafe, so we'd have to mutex them and add warnings about not using the non-mutexed system methods if you're using the gio methods, etc). But we can do that if we need to. (libsoup has code for this.) > Lacks g_unix_socket_address_get_path()? > > Also, we want to expose abstract unix socket names (a linux extension). I didn't really think much about GUnixSocketAddress (beyond renaming it from "local" to "unix" for consistency). It's sort of separate from the rest of what I was working on; I only left it in to make sure that GSocketAddress stayed properly generic enough to be able to support unix sockets later. But at this point we could remove it from this patch and commit it separately afterward, once we know more exactly how it should work. > g_resolver_lookup_by_name_async(): > > If string is an address we just put it as a list in the simple result gpointer. > However, in g_resolver_lookup_by_name_finish() we don't just return that > pointer. Instead we call class->lookup_by_name_finish() Huh? if (g_simple_async_result_get_source_tag (simple) == g_resolver_lookup_by_name_async) { g_simple_async_result_propagate_error (simple, error); return g_simple_async_result_get_op_res_gpointer (simple); } > Also, the list is not freed. In general you should be free to *not* call the > finish() function and still not cause a leak. So we want a destroy notify on > the gpointer data. Oh, really? I thought the convention was exactly the opposite. (The caller is *required* to call the _finish function.) Although now I see this is documented under GAsyncResult... (We should move the general how-async-in-gio-works documentation to the start of the "Asynchronous I/O" chapter; the stuff I added to the GCancellable docs about how cancellation is async, not immediate, would go there too.) > g_srv_target_new() - Is it ok to reference time_t in a public header like this > in glib? > Other code seems to do it to, but it seems kinda different than what we > generally do with system-header defined types. There's definitely precedent, but I'm willing to believe that's a bug. I hadn't actually thought about it. We could return a g-type instead. Or we could just remove that method; we don't have access to TTLs for hostname lookups anyway, so providing them for SRV lookups is a little odd.
I don't really mind just flat out requiring ipv6 and getaddrinfo/getnameinfo, this is the year of the Rat after all. Whats the opinion of the maintainers? > Huh? Eh, I guess I just missed that since i'm so used to the pattern like: if (G_IS_SIMPLE_ASYNC_RESULT (result)) { GSimpleAsyncResult *simple = G_SIMPLE_ASYNC_RESULT (result); if (g_simple_async_result_propagate_error (simple, error)) return NULL; /* Or whatever is the "fail" return val */ } I think you should do something like: if (G_IS_SIMPLE_ASYNC_RESULT (result)) { GSimpleAsyncResult *simple = G_SIMPLE_ASYNC_RESULT (result); if (g_simple_async_result_propagate_error (simple, error)) return NULL; if (g_simple_async_result_get_source_tag (simple) == resolver_lookup_by_name_async) return g_simple_async_result_get_op_res_gpointer (simple); } > oh, really? I thought the convention was exactly the opposite. > (The caller is *required* to call the _finish function.) Its quite convenient in some cases to not have to call it: * Sometimes you don't care about the result really, just that something happens, so you pass in a NULL callback * Sometimes you know out of band that the operation was cancelled or its not interesting anymore, then you don't have to call _finish. However, its not supported to call _finish multiple times, because that would mean you'd always have to duplicate the result you return (so the info remains for the next caller), which would cause unnecessary overhead. I agree that the docs for things like this should be easier to find though. > time_t Actually, the specific case of time_t is kinda bad as we know this will change size eventually on 32bit arches, which would push an ABI change on glib. G_FILE_ATTRIBUTE_TIME_MODIFIED is explicitly guint64. Either remove it or use guint64 I say.
Patch 4+5: ========== The gnetworkaddress docs are kinda lowlevel and hopefully we'll update that when we land the GSocket code that does that stuff automatically. Otherwise looks good. Btw, can one ip address have multiple hostnames, and if so, should we handle it in the resolver somehow?
(In reply to comment #55) > Btw, can one ip address have multiple hostnames, and if so, should we handle it > in the resolver somehow? Nope, an IP address can only resolve to one hostname, even if there's more than one hostname that resolves to that IP address.
How does virtual hosting work then?
(In reply to comment #57) > How does virtual hosting work then? > AFAIK many hosts can resolve to one ip but one ip can resolve to one host only by reverse lookup. Please correct me if I'm wrong. (Then the HOST http header is supplied in the stream to indicate vhost - but I guess that wasn't point of the question).
(In reply to comment #56) > Nope, an IP address can only resolve to one hostname, even if there's more than > one hostname that resolves to that IP address. > http://en.wikipedia.org/wiki/Reverse_DNS_lookup#Multiple_PTR_records
desrt: So basically, it can happen, but we don't really care about that.
i sort of take the "triggering bugs in programs that only expect there to ever be a single PTR record" part of that article as a warning :) we should definitely ensure that nothing explodes as the result of getting more than one PTR record back.
desrt: Sure, don't explode. But no need to expose it as part of the API and push this case on app developers.
(In reply to comment #61) > i sort of take the "triggering bugs in programs that only expect there to ever > be a single PTR record" part of that article as a warning :) getnameinfo() is only able to return 0 or 1 names, so it's a warning for the glibc maintainers, not us; if an address does have more than 1 PTR, we'll never know.
Created attachment 126355 [details] [review] fixes for Alex's comments s/GNIO/GIO/ Rename g_socket_address_native_size() to g_socket_address_get_native_size() Fix up async error handling/memory management After some discussion with Alex and Matthias, this is now officially targetted for glib 2.22, to give us time to shake out API problems before it lands in a stable release. It should be ready to commit to trunk immediately after the 2.20 branch. I may copy+paste GResolver and its deps (but not GSocketConnectable) into libsoup for 2.26 to address the IDNA, threading, and IPv6 bugs mentioned in comment 0, and to shake out any bugs in it.
Created attachment 131258 [details] [review] Add hostname utilities (glib/ghostutils)
Created attachment 131259 [details] [review] Add network address and socket types updates and modifications)
Created attachment 131260 [details] [review] GResolver
Created attachment 131261 [details] [review] GNetworkAddress, GNetworkService, GSocketConnectable
OK, here are the 4 patches again, rebased to master and with all various fixes (including ones from desrt's tree) merged in. Relative to the last patch: * GSrvTarget's "expires" field is now gone (for consistency with the hostname resolution stuff, and solving the time_t problem) * GInetAddressFamily is now GSocketFamily (and includes G_SOCKET_FAMILY_UNIX) * misc minor fixes Ryan, I didn't take the "change g_inet_address_is_*() to _get_is()" patch. "type_is_x()" is much more common as a name in glib than "type_get_is_x()", and is a nicer name anyway, and there's no reason vala needs to be exposing both the property and the C convenience method for accessing it, and I don't think it makes sense to make the C API uglier to avoid adding a special case to the vala/gir scanners. (And we could just explicitly annotate those methods as being hidden anyway, once we start adding annotations to the glib sources.) Anyway, I'm willing to be convinced to go the other way on this, but that's how it is now. Also, it would be good to get Tor or someone else Windows-y to review the windows parts. Basically, I think this is ready to go though
Created attachment 131837 [details] [review] Fixes segv if there is no SRV record for the domain and reports the correct error
(In reply to comment #70) > Created an attachment (id=131837) [edit] merged that. also: i think i managed to convince dan again that _get_is_*() is better than _is_*() :)
After discussing with Alex and Matthias, I've now pushed this to the "gresolver" branch of glib on git.gnome.org. (Including, as Ryan noted, the change back from _is_ to _get_is_, and including a slightly modified version of Sjoerd's bugfix.)
Created attachment 132996 [details] [review] Split GSocketAddressEnumerator out of GSocketConnectable as discussed on IRC today, this splits out GSocketAddressEnumerator from GSocketConnectable, which Ryan feels makes more sense at least from gnio's POV. If we decide to go this way I'll squash this commit with the rest of the connectable commit before merging to master.
I think that looks good. However, i'd like some renaming: g_socket_connectable_get_enumerator -> g_socket_connectable_enumerate g_socket_address_enumerator_get_next -> g_socket_address_enumerator_next (same with the the other _get_next variants)
There are those that would argue that it should be next() and get() as separate calls with the semantics that the first time you call next() it prepares the first result (which you can then access with get() as many times as you like) and you keep calling next() until you have false. iter = connectable.iterate() while iter.next (): do_stuff (iter.get ()) do_more_stuff (iter.get ()) still_the_same_item (iter.get ()) I actually don't care too much about that myself, but I think that's how the 'collections API' iterator interface is supposed to work... Might be nice to be consistent with that.
ps: thanks very much for the patch, Dan :)
It's a desrt! We were looking for you on IRC. So this patch (plus Alex's suggestions) seems right for gnio? We were thinking about merging to master, but would prefer to have the API right the first time...
Created attachment 133046 [details] [review] Split GSocketAddressEnumerator out of GSocketConnectable take 2, with Alex's changes
This looks good to me. IMHO with this we should be able to merge. Any further issues can be handled in master.
Committed!