GNOME Bugzilla – Bug 580341
proxy support
Last modified: 2012-01-22 05:34:11 UTC
gnio should support proxies. The issues for applications are:
1. How do I know what proxy server/protocol (if any) to use?
2. How do I communicate with the proxy?
3. If the proxy requires authentication, where do I get the password from?
libproxy exists to answer question 1. For question 2, there are in general 3 answers:
* SOCKS. This comes in two versions (4 and 5) and is pretty simple. Instead of connecting to the host and port you actually care about, you instead connect to the SOCKS proxy, optionally authenticate to it (v5 only), and then send a command indicating the host to connect to. It returns a status code, and if the connection succeeded, then you now have (effectively) a direct connection to the remote server; the SOCKS proxy plays no further role.
* HTTP CONNECT. Some sites/services abuse the HTTP proxy CONNECT method (which establishes a tunnel through an HTTP proxy, and which was originally intended for passing https traffic through proxies). In effect though, this is just like SOCKS5, except harder to parse because it wasn't intended to be used for this. (Though we can still implement this in gnio.)
* Something protocol-specific. Eg, using an HTTP proxy to do HTTP.
For the first two (the proxy type is SOCKS, or else the proxy type is http and the application protocol *isn't* http) then gnio could do the proxy negotiation itself as part of the "connect" operation, completely transparently to the application. (In particular, for SOCKS at least, most people who need to use SOCKS need to use it for *all* non-local TCP connections, so it really does make sense to do it completely at the gnio level.) For protocol-specific proxy methods, gnio wouldn't be able to talk to the proxy itself, but it could at least redirect the socket to the proxy server and alert the application that it was talking to a proxy rather than having a direct connection.
For authentication, currently we store proxy auth information in gconf (and although libproxy doesn't currently return it, that's due to be fixed). But talking to either gconf or gnome-keyring directly from gnio might not be possible, so this might require another extension-point sort of thing.
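To make the SOCKS description above concrete, here is a minimal sketch in C of the "command indicating the host to connect to" for SOCKS5, following the request layout in RFC 1928. The helper name socks5_build_connect() is hypothetical and is not part of gnio or GLib; the sketch also assumes the method-negotiation greeting (and any authentication) has already been exchanged on the socket.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a SOCKS5 CONNECT request (RFC 1928) that names the destination
 * by host name, so the proxy does the DNS lookup.
 * Returns the number of bytes written to buf, or 0 if the host name is
 * empty, longer than 255 bytes, or buf is too small.
 * Hypothetical helper, not a gnio/GLib function. */
size_t
socks5_build_connect (uint8_t *buf, size_t buflen,
                      const char *host, uint16_t port)
{
  size_t hostlen = strlen (host);
  size_t needed = 4 + 1 + hostlen + 2;

  if (hostlen == 0 || hostlen > 255 || buflen < needed)
    return 0;

  buf[0] = 0x05;                 /* VER: SOCKS version 5 */
  buf[1] = 0x01;                 /* CMD: CONNECT */
  buf[2] = 0x00;                 /* RSV: reserved, must be zero */
  buf[3] = 0x03;                 /* ATYP: domain name */
  buf[4] = (uint8_t) hostlen;    /* one-byte length prefix of the name */
  memcpy (buf + 5, host, hostlen);
  buf[5 + hostlen] = (uint8_t) (port >> 8);    /* port, network byte order */
  buf[6 + hostlen] = (uint8_t) (port & 0xff);

  return needed;
}
```

The proxy answers with a reply whose second byte is the status code; zero means the tunnel is up and the socket can be handed to the application as if it were a direct connection.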
I guess that should be a system setting. On Windows the values should be fetched from Internet Options; on GNOME, from GConf or NetworkManager+GnomeKeyring, if the latter comes to support setting a proxy per network, etc. It can be an extension point (for example, it would remove the need to implement in GLib the methods of getting proxy settings in KDE; that could live in a separate package).
As far as type is concerned, I guess that there are only a few types of proxy actually in use: standard HTTP and sometimes SOCKS. Those could have implementations in GLib. Others can be added by the application (possibly as extension points instead of being written directly, to provide a cleaner interface). I guess it could be a second extension point.
I guess that those settings should be overridable by the application, but by default it should use the system ones. For example, libsoup or similar should talk directly to the HTTP proxy.
PS. I know too little about proxies: what happens if one does connect() via an HTTP proxy to an HTTP service?
(In reply to comment #1)
> I guess that should be a system settings. On Windows the values should be fetched from Internet Options. On Gnome from GConf or NetworkManager+GnomeKeyring, if the second will support setting proxy per network etc. It can be an extension point (for example it would make no need of implementing in-glib methods of getting proxy setting in KDE - it could be in separate package).
This is exactly what libproxy already does though. (Well, it doesn't support Windows yet, but they're working on that.)
> Others can be added by application (possibly as extension points instead of direct writing to provide cleaner interface). I guess it could be an second extension point.
I don't think there's much reason to use extension points; if it's a generic proxy type, you want glib to support it directly, and if it's not, then you would do it entirely at the application level. (I suppose that instead of having gio do fake-minimal HTTP to tunnel through HTTP proxies, you could instead have an extension point and have it use libsoup. But that doesn't really seem worth it.)
> PS. I know too little about proxies - what happens if one connect() via HTTP Proxy to HTTP service?
It works exactly like normal (non-proxied) http, except instead of connecting to example.com and sending
GET /index.html HTTP/1.1
...
you connect to the proxy and send
GET http://example.com/index.html HTTP/1.1
...
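The difference between the two request lines above is small enough to sketch in a few lines of C. This is only an illustration; http_format_get_line() is a hypothetical name, and a real client (libsoup, for instance) also has to send headers and handle non-default ports.

```c
#include <stdio.h>

/* Format the request line of an HTTP/1.1 GET.
 * via_proxy == 0: sent straight to the server, only the path is named.
 * via_proxy != 0: sent to an HTTP proxy, so the full URI is named.
 * Returns the line length, or -1 if buf is too small.
 * Hypothetical helper for illustration only. */
int
http_format_get_line (char *buf, size_t buflen, int via_proxy,
                      const char *host, const char *path)
{
  int n;

  if (via_proxy)
    n = snprintf (buf, buflen, "GET http://%s%s HTTP/1.1\r\n", host, path);
  else
    n = snprintf (buf, buflen, "GET %s HTTP/1.1\r\n", path);

  return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

Everything else in the exchange stays the same, which is why this kind of proxying has to live in the HTTP client rather than at the socket level.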
(In reply to comment #2)
> > PS. I know too little about proxies - what happens if one connect() via HTTP Proxy to HTTP service?
>
> It works exactly like normal (non-proxied) http, except instead of connecting to example.com and sending
>
> GET /index.html HTTP/1.1
> ...
>
> you connect to the proxy and send
>
> GET http://example.com/index.html HTTP/1.1
> ...
You misunderstood me. What happens if we try to send an HTTP stream over an HTTP proxy as if it were non-HTTP (i.e. done the wrong way)?
Regards
Oh. So there are two cases:
- User has an HTTP proxy that is being abused as a general proxy. In this case it would probably work, though it's possible the proxy would be configured to allow CONNECTs to any port *except* port 80, since you shouldn't ever need to CONNECT to another http server.
- User has an HTTP proxy that is only being used for HTTP (and either has a different proxy for other protocols, or doesn't use a proxy for other protocols). In this case, the proxy would almost certainly reject the CONNECT attempt.
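For the CONNECT case, the tunnel request and the check of the proxy's answer might look roughly like the sketch below. Both function names are hypothetical; the sketch only looks at the status line and ignores the response headers and any proxy authentication challenge.

```c
#include <stdio.h>

/* Format an HTTP CONNECT request asking the proxy for a tunnel to
 * host:port. Returns the request length, or -1 if buf is too small.
 * Hypothetical helper for illustration only. */
int
http_connect_format_request (char *buf, size_t buflen,
                             const char *host, unsigned port)
{
  int n = snprintf (buf, buflen,
                    "CONNECT %s:%u HTTP/1.1\r\n"
                    "Host: %s:%u\r\n"
                    "\r\n",
                    host, port, host, port);

  return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}

/* Return 1 if the proxy's status line carries a 2xx code (tunnel
 * established), 0 for anything else, including unparsable input. */
int
http_connect_succeeded (const char *status_line)
{
  int major, minor, code;

  if (sscanf (status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
    return 0;

  return code >= 200 && code < 300;
}
```

A proxy of the second kind described above would typically answer with a 4xx status, which http_connect_succeeded() reports as a failure.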
Well. Maybe the most KISS solution would be to just add a proxy type enumeration and a property specifying it. If an application has particular requirements, it can use it; especially since most sane users either write an HTTP implementation from scratch (so it is relatively low cost for them) or use libsoup. This property could also, for example, force the use of a proxy other than the default, allowing the user to set it per application.
enum GProxyType {
  G_PROXY_TYPE_NONE,
  G_PROXY_TYPE_HTTP,
  G_PROXY_TYPE_HTTPS,
  G_PROXY_TYPE_SOCKS4,
  G_PROXY_TYPE_SOCKS5,
  /* ... */
  G_PROXY_TYPE_ANY = 255 /* or -1 or something like that */
};
*** Bug 588495 has been marked as a duplicate of this bug. ***
I have not written any code here yet, but have a few updates anyway...
First off, some of what I said before is wrong. The UI of the Network Proxies capplet confused me; "Use the same proxy for all protocols" basically only makes sense in combination with SOCKS. While there *are* some HTTP applications that sometimes tunnel themselves over HTTP proxies, this probably ought to be considered a protocol-specific proxy, and there doesn't seem to be much evidence of people expecting "Use the same proxy for all protocols" + HTTP to work. So there are really three cases: (1) no proxy, (2) SOCKS for everything not in ignore_hosts, (3) protocol-specific proxies for some protocols.
I think GSocketClient is the right level to put proxy support in. So for SOCKS, you call g_socket_client_connect*, and it connects to the SOCKS server, negotiates with it, and then returns the proxied connection to the destination server. However, this means if you call g_socket_connection_get_remote_address() on the returned GSocketConnection, you will get the address of the proxy server, not the address of the destination server. That is potentially breaking, and probably means that we have to make proxy support be opt-in... (We'll also want to add another method to GSocketConnection to get either a GSocketAddress or a GSocketConnectable identifying the destination server.)
For protocol-specific proxy information, you need to know the protocol being used; port number is not sufficient (http://example.com:8000/ should use the HTTP proxy info). So we will want to add g_socket_client_connect_to_uri(), which will parse a URI, look up proxy information for the given protocol type, and then do a direct or proxied connection to the indicated host and port. (path, query, etc, in the URI would just be ignored). (_connect_to_uri() will also help a bit with TLS support; when using "https" it would know to create a GTlsSocket, etc.)
*** Bug 601009 has been marked as a duplicate of this bug. ***
Indeed, this is all very interesting for socket level proxying and it will be great once it's implemented. But as has at least been alluded to here, dealing with real HTTP proxying doesn't belong in gio. I will go digging, but does anyone here know if there is a generic application level proxying API in glib? Just as gio is a good place to put socket level proxying, it would be nice if glib also had generic HTTP proxying handlers so that dealing with proxies could be done all in one place.
Ahhh. libsoup. Hrm. libsoup looks a bit heavy (with all the XML-RPC in it) for just handling HTTP. I wonder if that's why not so many applications (just from a poll of my local system here with lsof) are using it. Maybe some applications consider it too heavy?
Most GNOME apps that use HTTP use libsoup. I've never made any effort to evangelize it to non-GNOME developers. You are basically saying "I want a library that handles all of my use cases and none of anyone else's". The only way you're going to get that is to write it yourself.
(In reply to comment #11)
> Most GNOME apps that use HTTP use libsoup.
OK. Fair enough. I was, of course, just commenting on my very unscientific survey here of running applications. I do see the likes of evolution, gnome-panel, pidgin, and seahorse using it. One application in particular that I do see missing though is banshee, but that's a Mono app, so maybe that's got something to do with it.
> You are basically saying "I want a library that handles all of my use cases and none of anyone else's".
I don't think I am. Iff my assumption was correct about the heaviness of libsoup, my only thought was perhaps to split the pure http handling and the xmlrpc handling into two different libraries. But if my assumption is wrong, of course, nothing needs doing.
Basically, my itch here is to use Negotiate authentication for all of my applications, to have that all in one place (libsoup, for example), and to have all applications that want to use HTTP (via a proxy or not) use that same single implementation.
Dan said:
1. How do I know what proxy server/protocol (if any) to use?
2. How do I communicate with the proxy?
3. If the proxy requires authentication, where do I get the password from?
libproxy actually exists to solve #1 and #3. #1 is mostly complete. #3 has plumbing in place, but needs improved backend support. We currently support backends for envvar and gconf (read-only), but would like to add read/write support for something like gnome-keyring. We purposely do not support #2, since it requires protocol-specific knowledge, something better left to (say) an HTTP client.
We are also working on a Windows port. I have a near-rewrite almost ready to commit, which will become 0.4.0. It re-factors things a bit to make cross-platform support easier. One of the main cross-platform features of the new code is cmake support. I hope to use this to create a MSVC++ port. We're also happy to take patches.
One final plea: while KISS is attractive to developers (indeed, it makes for fairly clean code), forcing the policy onto applications means that applications will implement it wrong (since the policy code is the hard part, and pretty much every application has a track record of getting it wrong). libproxy can make this easy and we try to be easy to work with.
*** Bug 593204 has been marked as a duplicate of this bug. ***
I'm currently working on hooking proxy support directly into GNIO. As proposed in a previous comment, GSocketClient seems to be the right place. I thought the workflow should go like this:
call to _client_connect:
  Get proxies from libproxy
  Iterate proxies calling proxy handlers
with the current code moved into a handler for the direct:// proxy.
But that solution has one problem. GNetworkAddress, the GSocketConnectable used for connect_to_host and connect_to_service, tries to resolve the host name into an IP address. We cannot be sure that the DNS server (if one exists) inside the network will resolve addresses outside this network. We might have to check the instance type, and if it's GNetworkAddress, grab the hostname and port from its properties instead.
We also need to make sure that the hostname (if it's a GNetworkAddress) or the IP address (for GInetSocketAddress) is not part of the ignore list. I'm not sure yet whether DNS should be resolved in order to check the resulting IP (if we get an answer) against the ignore list. That could mean multiple enumerations, which are not cached inside GNetworkAddress.
While all this should work, I'm not sure it's the cleanest solution. We might also be missing a way to prevent the proxy from being used. Ideas are strongly appreciated.
Created attachment 155817
Add g_socket_connectable_get_name()
Right. The TLS/SSL support runs into the same problem; you need to know the hostname associated with a connectable, not just the IP address (in the TLS case, it's so that you can verify that the certificate returned by the server has the hostname you expected). This patch is from the tls branch, but was intended to solve the proxy code's problem too.
(In reply to comment #15)
> Get proxies from libproxy
Note that libproxy is problematic; libproxy 0.2's gnome plugin causes crashes if called from a thread other than the main thread, and libproxy 0.3 works around this by being horrifically slow. libsoup/libsoup/soup-proxy-resolver-gnome.c works around the problem by looking in gconf directly by itself, and only using libproxy to handle ignore lists, WPAD, and PAC. But the libsoup code has other problems (it leaks bad environment variables into child processes) and as currently written, it ignores $http_proxy if it's set.
(Although that then gets into another problem; gnome-terminal copies the proxy values from gconf into $http_proxy, etc, in the environment, but this means that if you have a proxy configured, start gnome-terminal, launch an application from that terminal, and then disable the proxy in gconf, then that program will see gconf saying to use no proxy (which is correct), but the environment variables saying it should use a proxy (which is wrong). This could be fixed if gnome-terminal set additional variables that would let the app recognize that the environment variables had been set from gconf and so should not be used to override gconf.)
> We might have to check the instance type, and if it's GNetworkAddress, we should grab the hostname and port from properties instead.
The port is in the GInetSocketAddress. So just combine that with the hostname from g_socket_connectable_get_name(), above.
> We also need to make sure that the hostname (if it's a GNetworkAddress) or the IP address (for GInetSocketAddress) is not part of the ignore list. I'm not sure yet if DNS should be resolved to be able to check the resulting IP (if we get an answer) against the ignore list.
The standard behavior is that if the user tries to connect to a hostname, you only check it against hostnames in the ignore list, and if he tries to connect to an IP address, you only check it against the IP addresses.
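The "standard behavior" for ignore lists described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, entries are matched exactly (real ignore lists usually also allow wildcards, domain suffixes and network masks), IP addresses are compared as text, and inet_pton() is assumed to be available (POSIX).

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <strings.h>

/* Return 1 if s parses as an IPv4 or IPv6 literal, 0 otherwise. */
static int
is_ip_literal (const char *s)
{
  unsigned char addr[16];   /* large enough for an IPv6 address */

  return inet_pton (AF_INET, s, addr) == 1
      || inet_pton (AF_INET6, s, addr) == 1;
}

/* Host names are compared only with host-name entries, and IP literals
 * only with IP entries. No DNS lookup is done, so nothing leaks to the
 * local resolver. Hypothetical helper for illustration only. */
int
proxy_should_ignore (const char *target,
                     const char *const *ignore, size_t n_ignore)
{
  int target_is_ip = is_ip_literal (target);
  size_t i;

  for (i = 0; i < n_ignore; i++)
    {
      if (is_ip_literal (ignore[i]) != target_is_ip)
        continue;   /* entry of the other kind: never a match */
      if (strcasecmp (target, ignore[i]) == 0)
        return 1;
    }

  return 0;
}
```

Because the target is never resolved, a host name whose address happens to be listed in the ignore list still goes through the proxy, which is the behavior described in the comment above.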
Dan, et al.,
The "horrifically slow" workaround is now gone in libproxy 0.4. We now read the full gconf config at startup time and updates are pushed into libproxy asynchronously (to be consumed quickly at the next call to get_proxies()). I have always said that if anyone finds any performance issue it will be considered a critical bug and will receive my utmost attention. If you experience any such issue, please let me know.
It will probably also interest gnio devs to know that libproxy also now runs on win32 and mac with full feature support. Our next major release will support credentials as well.
Nathaniel
I should say that 0.4 actually supports reading credentials from gconf. 0.5 will focus on writing credentials as well.
Ok, I bumped my Gentoo libproxy to version 0.4.0, but things are not going so well. The first thing I tried was:
proxy http://www.google.com
And it crashed:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bcd3f8 in libproxy::proxy_factory::get_proxies (this=0x603010, url_=<value optimized out>) at /usr/src/debug/net-libs/libproxy-0.4.0/libproxy-0.4.0/libproxy/proxy.cpp:192
192 if ((*i)->changed()) {
I'm running a release build with debug info, so most of the code is optimized out. I will dig a bit more later on a non-optimized version.
We welcome a bug report. Also feel free to drop into #libproxy and we can help you debug.
Off topic: for people who may not know, the crash has been solved in libproxy. I also have a patch for the gnome backend so that it returns socks:// instead of http:// when a SOCKS server is to be used, but I think it still needs more work.
After more analysis of the GNIO socket code, I found that the proxy support cannot be implemented in a single place, GSocketClient. The proxy configuration (libproxy) must be integrated into the address enumeration mechanism. The way GSocketClient works is that it receives a GSocketConnectable and returns a GSocketConnection. The GSocketConnectable interface lets the user iterate over the different possible GSocketAddresses until a connection is established. The goal of the iteration mechanism is to solve problems like a DNS server sending multiple IPs.
Sending a DNS request to the local network's DNS server while using a proxy is a very common bug that is often called a "DNS leak". To avoid this, you need to request proxy information before trying a DNS lookup. At first, this problem looked solvable with the get_name patch introduced in GSocketConnectable, but the fact is that you need more precise information in order to query the proxy configuration. Here's an example to clarify this:
Let's say you have a GNetworkService with service=http, protocol=tcp and domain=stormer.com:1234. The expected URL for the proxy query would be http://stormer.com:1234. The precision must be as high as possible to prevent wrong matches in the proxy ignore list. Now, let's say you resolve the GNetworkService into a GNetworkAddress first. You will get a GNetworkAddress with hostname=stormer.com and port=1234. In this case, you lose the fact that it's for HTTP, and the query URL would be unknown://stormer.com:1234.
My goal here is to handle the proxy configuration where the full information for querying it is available. The iteration would return correctly formed GSocketAddresses that point to the proxy on the right port.
The GSocketAddress will require additional properties for proxy type (I guess an enum), destination hostname (or string version of the IP) and destination port. Connecting to the different proxies (SOCKS5, SOCKS4 or HTTP CONNECT) would be done within the GSocketClient class, as first expected. I haven't made any decision yet about the GSocketConnection remote address, but I think we should be able to override it with the real remote address when available, or maybe find a way to provide both the proxy server address and the destination.
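The stormer.com example can be written down as a tiny helper that builds the URL handed to the proxy resolver. The function name is hypothetical; the point of the sketch is only that the scheme has to be captured before the service is resolved, otherwise all that is left is "unknown".

```c
#include <stdio.h>

/* Build the URL used to query the proxy configuration.
 * scheme may be NULL when the protocol is no longer known (for example
 * after a service has already been resolved to host and port); in that
 * case "unknown" is used, as in the example above.
 * Returns the URL length, or -1 if buf is too small.
 * Hypothetical helper for illustration only. */
int
proxy_query_url (char *buf, size_t buflen,
                 const char *scheme, const char *host, unsigned port)
{
  int n = snprintf (buf, buflen, "%s://%s:%u",
                    scheme != NULL ? scheme : "unknown", host, port);

  return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

With the service name still at hand the resolver can match protocol-specific settings and ignore-list entries; with "unknown" it can at best fall back to a generic proxy such as SOCKS.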
GSocketAddress is intended to be a pretty thin wrapper around struct sockaddr, so adding an additional field to it for "protocol" would be weird. I was imagining that there'd be a g_socket_client_connect_to_uri() method (see comment #7). It would take a URI, pass it to libproxy, and then after it has the proxy info, mostly behave like g_socket_client_connect_to_host(). A related possibility would be to make a third GSocketConnectable implementation, called GNetworkURI or something, and g_socket_client_connect_to_uri() would work by creating one of those and then calling g_socket_client_connect(), the same way connect_to_host and connect_to_service work now. People using the old APIs would not end up being able to use protocol-specific proxies, though we could still handle SOCKS proxies for them.
But then you need to change all the projects doing networking with GIO to use that connect_to_uri() method. The gain from writing it in GIO is reduced a lot.
I need to clarify that in my plan I want GNIO to handle the socket-level proxying, while specialized libraries should handle specialized proxy protocols (e.g. HTTP proxying remains part of libsoup). Those libraries need to retrieve the proxy configuration using libproxy. One of the problems Dan reported to me on IRC is that libproxy has a blocking, non-GLib-friendly API. So I also propose to implement a GLib API (with blocking and _async variants) for it.
punting to 2.26
Created attachment 157530
Implemented GProxyResolver, a libproxy wrapper
After more discussion with Dan, it turns out that, until we find a better idea, having a new method proxy_enumerate() in GSocketConnectable would be the most ABI-friendly way of integrating such a feature. Work has started and I already have a first patch to propose. During the discussion it was mentioned that libproxy is not GLib friendly and that having a GLib wrapper may help. The new class is called GProxyResolver, by analogy with GResolver. My work can be found at:
URL: http://git.collabora.co.uk/?p=user/nicolas/glib.git
Git: git://git.collabora.co.uk/git/user/nicolas/glib.git
Branch: proxy
Commit: 101b8c10c92603f9e0af4ccd8686ead02345c6d9
Hi there, proxy support for GLib is now ready for review.
URL: http://git.collabora.co.uk/?p=user/nicolas/glib.git
Git: git://git.collabora.co.uk/git/user/nicolas/glib.git
Branch: proxy
Here's a quick architecture overview:
New classes:
===
GProxyResolver: An extension point interface for proxy resolving; this is a GLib version of the libproxy API.
GProxy: An extension point interface for proxy connection and payload.
GProxyAddress: A GSocketAddress wrapper for proxy information such as the destination host and port, username and password.
GProxyAddressEnumerator: An enumerator for GProxyAddress.
How it works:
===
As usual, everything starts from a GSocketConnectable. Instead of using the enumerate() method, one would use proxy_enumerate(), which returns a GProxyAddressEnumerator. The enumerator uses a URI passed at construction time to resolve the list of valid proxies with a GProxyResolver. For each of those proxies, we create a GProxyAddress and a socket. At each try, we attempt to connect the socket to the wrapped GSocketAddress (the proxy server). The socket is then wrapped (along with the GProxyAddress) inside a GSocketConnection. This connection is given to a proxy protocol handler (a GProxy implementer), obtained from g_proxy_get_default_for_protocol(), through the GProxy::connect() method (except if the protocol is direct). The proxy handler will return a new GSocketConnection (or the same one if there is no payload) that can be returned by the GSocketClient API.
How do I avoid proxies?
===
If you are using GSocketClient, you can disable proxies with the proxy-enable property. If you are using the GSocketConnectable interface, keep using enumerate().
How to connect to a proxy server?
===
Disable proxies in GSocketClient and connect to the proxy server. Then get the proxy handler using g_proxy_get_default_for_protocol() and call its connect() method. The GProxyAddress inside the GSocketConnection is not used directly, to allow manual connection and proxy connection stacking.
Comment for reviewers:
===
Currently only SOCKS5 has been implemented. I will provide SOCKS4a, SOCKS4 and HTTP CONNECT later. Libproxy returns "socks" when the SOCKS protocol version is unknown. To make it simpler, I thought it would be nice to split it in three at the source (socks5, socks4a, socks4) so we don't have to write special code to try each of them.
I'm concerned about one of my GLib additions, g_uri_parse_authority(). I found last week that there was a more complete implementation in gdummyfile.c. The method would return a structure called GDecodedURI iirc, but as I don't know why it's private, I decided to wait for others' opinions on that before making it public. In the end I'm sure gurifunc.h can only benefit from having more parsing capabilities.
It was not clear exactly how the dependency on libproxy should be handled, so I implemented the extension as an optional module (--enable-libproxy) (just like fam support). The GProxyResolver feature is guaranteed by a dummy implementation (always returns direct://).
Cheers, Nicolas
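The idea of splitting a bare "socks" answer into three concrete candidates "at the source" could look like the sketch below. It is written without GLib so it stands alone; proxy_expand_scheme() is a hypothetical name and is not taken from the proxy branch.

```c
#include <string.h>

/* Expand a proxy scheme into the concrete protocols to try, in order.
 * A bare "socks" (version unknown) becomes socks5, socks4a, socks4;
 * any other scheme is passed through unchanged.
 * Writes at most max entries to out and returns the number written.
 * Hypothetical helper for illustration only. */
size_t
proxy_expand_scheme (const char *scheme, const char **out, size_t max)
{
  static const char *const socks_variants[] = { "socks5", "socks4a", "socks4" };
  size_t i, n = 0;

  if (strcmp (scheme, "socks") != 0)
    {
      if (max > 0)
        out[n++] = scheme;
      return n;
    }

  for (i = 0; i < 3 && n < max; i++)
    out[n++] = socks_variants[i];

  return n;
}
```

The code that connects then only needs one handler per concrete protocol and simply moves on to the next candidate when a negotiation fails.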
ok, having just told you I couldn't review this until this weekend, here's a very quick "pre-review":
Use G_DEFINE_INTERFACE to define interfaces. (The rest of gio was patched after you started hacking on this.)
Use PKG_CHECK_MODULES for libproxy. You also need to check for >= 0.3.
Should GProxyAddress wrap GInetSocketAddress rather than GSocketAddress? Better yet, could it be a subclass of GInetSocketAddress rather than a wrapper around it? And then you could get rid of GProxyAddressEnumerator, because a proxy_enumerator would just be a special-case GSocketAddressEnumerator that always returned GSocketAddresses that were GProxyAddresses.
We probably want a parse-the-whole-URI function rather than just parse-out-the-authority-only. There's a bug for this floating around in bugzilla.
g_network_address_parse_uri: URI schemes don't necessarily correspond to /etc/services service names. Eg, XMPP uses "xmpp-client" as a service name, but "xmpp" for a URI scheme. Moreover, you don't want to depend on getservbyname() anyway, because that means when a new service is added, the URIs would work for people on new distros with updated /etc/services files, but not for people on old distros. You probably want to just pass a "default_port" like g_network_address_parse() does.
(And we might want to have this functionality in a new connectable subclass, GURIAddress or something, instead. For completely correct TLS certificate verification [http://www.ietf.org/id/draft-saintandre-tls-server-id-check-04.txt], we'll need to distinguish connections-to-hostnames, connections-to-SRV-records, and connections-to-URIs, and so having each as its own class might be useful.)
The commit "Added proxy_enumerate method to GSocketConnectable" has some stray gsocketclient.c changes in it.
The GNetworkService proxy enumerator has the same problem with assuming service name == URI scheme as g_network_address_parse_uri() did.
"g_proxy_get_default_for_protocol" makes it sound like proxies are singletons, but it creates a new one each time
(In reply to comment #30)
> Use G_DEFINE_INTERFACE to define interfaces. (The rest of gio was patched after you started hacking on this.)
Ok, I'll rebase and have a look at that.
> Use PKG_CHECK_MODULES for libproxy. You also need to check for >= 0.3.
Right; in fact I would probably push for >= 0.5, since 0.3 and 0.4 depend on X11 for the GNOME part.
> Should GProxyAddress wrap GInetSocketAddress rather than GSocketAddress? Better yet, could it be a subclass of GInetSocketAddress rather than a wrapper around it? And then you could get rid of GProxyAddressEnumerator, because a proxy_enumerator would just be a special-case GSocketAddressEnumerator that always returned GSocketAddresses that were GProxyAddresses.
Hmm, will have to think about that. The result cannot get cleaner, since two code paths would get merged together (and spread over GNetworkService, GNetworkAddress and GSocketAddress). In GSocketClient it would get cleaner, since you call enumerate() vs proxy_enumerate() at the beginning and check whether your GSocketAddress type is GProxyAddress (following current practice it would be named GProxyInetSocketAddress, but that is so long...) at the end. It's really hard to tell if it's better or worse. It would be nice if we could elaborate more on that, since currently I'm puzzled, but nothing clearly indicates that I must change this.
> We probably want a parse-the-whole-URI function rather than just parse-out-the-authority-only. There's a bug for this floating around in bugzilla.
I think you mean this bug: https://bugzilla.gnome.org/show_bug.cgi?id=489862
So I should probably move GDecodedURI into gurifunc.h and use it. I think the userinfo should be returned as a string array (and decoded). I'll try to revive that bug, since I depend on it.
> g_network_address_parse_uri: URI schemes don't necessarily correspond to /etc/services service names. Eg, XMPP uses "xmpp-client" as a service name, but "xmpp" for a URI scheme.
That one is wrong. "xmpp" cannot be used in a network URI, since it does not represent a service (which is also the reason it's not in /etc/services). The xmpp scheme is used to identify a person, just like the sip scheme (e.g. xmpp:jo@xmpp.ca). Note that this URI would be rejected because of the missing authority (prefixed with //), which signals a non-network URI. Essentially, the rule here is: if you cannot find a default port in the IANA port database (/etc/services) and no port is provided, the URI is most likely invalid.
> Moreover, you don't want to depend on getservbyname() anyway, because that means when a new service is added, the URIs would work for people on new distros with updated /etc/services files, but not for people on old distros. You probably want to just pass a "default_port" like g_network_address_parse() does.
I don't agree with the old-distros argument. /etc/services MUST be up to date, just like your time zone data. If it's not, you may have bugs; that's life. Also, it defeats the purpose, which is to spare the client from having to do that query (with really ugly ifdefs, since endservent() is not really cross-platform).
> (And we might want to have this functionality in a new connectable subclass, GURIAddress or something, instead. For completely correct TLS certificate verification [http://www.ietf.org/id/draft-saintandre-tls-server-id-check-04.txt], we'll need to distinguish connections-to-hostnames, connections-to-SRV-records, and connections-to-URIs, and so having each as its own class might be useful.)
That was my first try, but the diff proved me wrong. Both classes were a complete copy-paste. My solution is to store the URI internally, so you can very easily tell a URI from a hostname; more than 99% of the code is the same.
> The commit "Added proxy_enumerate method to GSocketConnectable" has some stray gsocketclient.c changes in it.
Oops, will fix ... I probably squashed the wrong things once.
> The GNetworkService proxy enumerator has the same problem with assuming service name == URI scheme as g_network_address_parse_uri() did.
This is neither a problem nor a bug. The service name is a perfect choice for the scheme. It gives the proxy resolver maximum information and has zero impact on the GNetworkAddress, since a port is always set for an SRV-resolved address.
> "g_proxy_get_default_for_protocol" makes it sound like proxies are singletons, but it creates a new one each time
Maybe it should be a singleton; we would just have to put them in a GHashTable and keep them alive with a refcount?
(In reply to comment #31)
> > Should GProxyAddress wrap GInetSocketAddress rather than GSocketAddress? Better yet, could it be a subclass of GInetSocketAddress rather than a wrapper around it? And then you could get rid of GProxyAddressEnumerator, because a proxy_enumerator would just be a special-case GSocketAddressEnumerator that always returned GSocketAddresses that were GProxyAddresses.
>
> Hmm, will have to think about that. The result cannot get cleaner since two code paths would get merged together (and spread over GNetworkService, GNetworkAddress and GSocketAddress). In GSocketClient it would get cleaner since call enumerate() vs proxy_enumerate() at the beginning and check your GSocketAddress type is GProxyAddress (current practice would be to use GProxyInetSocketAddress, but it so long ..) at the end.
Ok, I just wrote it and it uses less code, so I guess it's better. Still need to fix some stuff I broke, but it should be ready for the end of day.
Done for this week. I have put the rework in a new branch, 'proxy2'. What is still missing are the things that must be discussed or confirmed: the URI parsing and using a singleton for proxies. I also forgot to add the >= 0.5 version requirement for the libproxy dependency.
URL: http://git.collabora.co.uk/?p=user/nicolas/glib.git
Git: git://git.collabora.co.uk/git/user/nicolas/glib.git
Branch: proxy2
(In reply to comment #31)
> > g_network_address_parse_uri: URI schemes don't necessarily correspond to /etc/services service names. Eg, XMPP uses "xmpp-client" as a service name, but "xmpp" for a URI scheme.
> That one is wrong. The "xmpp" cannot be used in network URI since it does not represent a service
My bad. In that case, POP3 uses "pop3" as a service name and "pop" as a URI scheme.
> I don't agree with the old distros argument. /etc/services MUST be up-to-date just like your time zone data.
Distros don't appear to work under that theory. On both Fedora 12 and Fedora 13, tzdata was updated to 2010i a few weeks ago. But F13's /etc/services is from January and F12's /etc/services is from last June.
(In reply to comment #34)
> Distros don't appear to work under that theory. On both Fedora 12 and Fedora 13, tzdata was updated to 2010i a few weeks ago. But F13's /etc/services is from January and F12's /etc/services is from last June.
Considering that most of the latest additions are simply not implemented on Linux, I don't see that as a big problem. If you find that the standard protocol you want to implement is not yet in your distro's /etc/services, I think that is a good time to file a bug and request an update. Then you leave a note in your release notes that this is a required dependency of your software, exactly as if you used a new feature of GLib. If the protocol you want to implement is not standard, then you simply have to make sure the port number is always part of your URI.
With the exception of browsers and some calendar software, I rarely see software that lets users input network URIs. Instead, it asks the user for partial information (such as an email address, and sometimes a server hostname) and completes the rest itself. In this case, an application that wants to be proxy-configuration friendly will build a network URI, make sure it sets the port correctly, and use _parse_uri(). This is a very limited use case, since it only applies to implementors of protocols that are known to have specific proxy settings, like HTTP, FTP, RTSP, etc.
Also, I've read the POP RFC and it doesn't define a network URI for the protocol. So if you know of software that uses pop://host:port, that software is wrong. Because pop2 and pop3 do not use the same port (unlike socks4 and socks5), it's even more wrong. Normally we should never use a URI unless it's defined in an RFC. I made an exception in the case of GNetworkService, but it's internal and for proxy resolving only. Also, the libproxy API is itself an exception. It's a browser API now used for a wider range of use cases. It's non-standard, but it's so simple that I don't want to change it.
(In reply to comment #35)
> Also, I've read POP RFC and they don't define network URI for there protocol.
pop URIs are defined in RFC 2384
Oh well, I tried, but the RFCs are so buggy... This kind of thing relegates /etc/services to pretty-printing... thanks, Qualcomm. So I'll add a default_port parameter to g_network_address_parse_uri(). As for GNetworkService using the service name instead of 'none://' as the internal proxy resolution string, I still think it's the right thing to do, since this is only a hint. Any objections? Let me know if you find anything else that needs changing.
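To illustrate the default_port idea discussed above, here is a minimal sketch in Python rather than GLib C. It is not the g_network_address_parse_uri() implementation; the helper name parse_network_uri and its return shape are hypothetical. It shows the two rules from the discussion: a URI without an authority (such as xmpp:jo@xmpp.ca) is rejected, and the caller-supplied default port is used only when the URI carries no port, so nothing depends on /etc/services.

```python
from urllib.parse import urlsplit


def parse_network_uri(uri, default_port):
    """Hypothetical helper: return (scheme, host, port) for a network URI.

    The caller supplies default_port, so nothing depends on /etc/services.
    """
    parts = urlsplit(uri)
    # A network URI needs an authority ("//host[:port]"); "xmpp:jo@xmpp.ca"
    # has none, so it is rejected here.
    if not parts.scheme or parts.hostname is None:
        raise ValueError("not a network URI: %r" % uri)
    # parts.port is None when the URI carries no explicit port.
    port = parts.port if parts.port is not None else default_port
    return parts.scheme, parts.hostname, port
```

With this shape, a POP client would pass 110 itself for pop://mail.example.org, regardless of whether the "pop" scheme matches a service name on the local system.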
Ok, the rework is pushed:
- Added a default_port parameter for URI network addresses
- Check for libproxy >= 3.0 for API compatibility
For other readers: I haven't touched the URI parsing yet, since Dan has been working on this last weekend. I guess we should try to get this upstream independently. Maybe we should push g_network_address_parse_uri() at the same time, since it's a dependency for TlsSocket too?
I've started to review your code, which I've ended up reviewing in the form of patches which I'm pushing to http://gnome.org/~danw/glib.git, proxy2 branch. (I'm reviewing sort of in order of the patches you committed, and have gotten as far as the introduction of GProxyAddress.) Anyway, while looking at the next batch of patches, I'm frowning again at the fact that GProxy only works at the GSocketConnection level. Because the problem is, especially for SOCKS users, all apps need to have proxy support all the time, and so if glib only provides proxy support at the GSocketConnection level, then this basically means that apps can never use GSocket. (We don't even provide an "annoying" proxy API for direct GSocket users; it's GSocketConnection or nothing. Well, I guess you could make a GSocketConnection, proxify it, and then grab the GSocket back out of the returned GSocketConnection, which will work as long as the proxy isn't actually doing encryption. But...) Especially given that 99% (or more?) of real-world proxy use is going to be unencrypted (in part because so many existing proxy-enabled apps assume they only have to worry about SOCKS at connect time), it seems unfortunate to optimize for ease-of-use-of-encrypted-proxies over ease-of-use-of-low-level-socket-API. So I think we should make GProxy work at the GSocket level, and adjust GSocketClient accordingly (to do the proxy stuff before wrapping the connection with a GSocketConnection rather than after), and have the SOCKS5 proxy code not offer to do GSSAPI encryption. (Another possibility is to have a GSocket subclass that wraps a GSocket + GIOStream combination. This would also solve some edge use cases for TLS...)
Comment on attachment 155817: Add g_socket_connectable_get_name()
GProxyEnumerator makes this obsolete, though I need to figure out an alternative for TLS...
Comment on attachment 157530: Implemented GProxyResolver, a libproxy wrapper
latest version is in Nicolas's git repo
(In reply to comment #39)
Just about everything in this comment is above my head, but one thing did stand out for me...
> So I think we should make GProxy work at the GSocket level, and adjust GSocketClient accordingly (to do the proxy stuff before wrapping the connection with a GSocketConnection rather than after), and have the SOCKS5 proxy code not offer to do GSSAPI encryption.
I'm not sure whether this means what it reads like to me, but for a proxy to not be able to use GSSAPI would be a great pity, imho. GSSAPI is the holy grail of SSO. To have a user's session be so transparent (i.e. nothing else asking for passwords) once logon credentials were acquired, only to be stopped to enter the same credentials again and again for proxy access would pretty much nullify the entire concept of SSO. If the above statement is saying what I think it's saying, I hope you will please reconsider.
We could still do GSSAPI *authentication* to the proxy, we just wouldn't support encrypting the communication between the client and the SOCKS proxy afterward. (At least in the first version. We could later add the socket-wrapping-a-stream thing and then transparently be able to support encryption too.)
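For readers unfamiliar with the wire format, the distinction above shows up in the first bytes of a SOCKS5 session: the client greeting only lists the authentication methods it offers, and the CONNECT request follows once a method has been agreed. The following is a minimal Python sketch of those two messages as laid out in RFC 1928. The function names are hypothetical, this is not code from the patches, and the GSSAPI exchange itself is not shown.

```python
import struct

SOCKS5_VERSION = 0x05
METHOD_NO_AUTH = 0x00   # "no authentication required"
METHOD_GSSAPI = 0x01    # GSSAPI offered as an authentication method
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03      # destination given as a hostname


def socks5_greeting(methods):
    """Client greeting: VER, NMETHODS, then one byte per offered method."""
    return bytes([SOCKS5_VERSION, len(methods)]) + bytes(methods)


def socks5_connect_request(host, port):
    """CONNECT request that names the destination by hostname."""
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("hostname too long for SOCKS5")
    header = bytes([SOCKS5_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    # The port is sent as two bytes in network (big-endian) order.
    return header + name + struct.pack("!H", port)
```

A client that offers both "no authentication" and GSSAPI would send socks5_greeting([METHOD_NO_AUTH, METHOD_GSSAPI]) and let the proxy pick one.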
(In reply to comment #43)
> We could still do GSSAPI *authentication* to the proxy, we just wouldn't support encrypting the communication between the client and the SOCKS proxy afterward.
Ahhh. That would be initially sufficient then.
> (At least in the first version. We could later add the socket-wrapping-a-stream thing and then transparently be able to support encryption too.)
Which would be sweet, but even just doing authentication answers the SSO IT holy-grail. Thanx!
(In reply to comment #43)
> (At least in the first version. We could later add the socket-wrapping-a-stream thing and then transparently be able to support encryption too.)
I'm curious how you will achieve that cleanly. You really should elaborate before throwing away a design that already covers that.
(In reply to comment #39)
> I've started to review your code, which I've ended up reviewing in the form of patches which I'm pushing to http://gnome.org/~danw/glib.git, proxy2 branch. (I'm reviewing sort of in order of the patches you committed, and have gotten as far as the introduction of GProxyAddress.)
I'm back from a three-day weekend; I'll have a look at those patches today.
> it seems unfortunate to optimize for ease-of-use-of-encrypted-proxies over ease-of-use-of-low-level-socket-API.
I don't understand the benefit of using GSocket for stream-based connections. GSocket is just a thin porting layer over BSD-like sockets. To use it properly, programmers need to deal with a bunch of enumerations, introspection, and the proxy handshake. It's clear to me that it's best to port those apps to GSocketClient/GSocketConnection, reducing code size and potential bugs. If the current proxy implementation lacks flexibility, I'll fix it. I already have two enhancements in the queue. From my point of view, the only case where we want a socket-based proxy is UDP sockets, but aside from libnice and GStreamer, there are very few use cases where you can't use GSocketClient/GSocketConnection. Also, adding a function to GProxy in a later enhancement is not an issue.
> So I think we should make GProxy work at the GSocket level, and adjust GSocketClient accordingly (to do the proxy stuff before wrapping the connection with a GSocketConnection rather than after), and have the SOCKS5 proxy code not offer to do GSSAPI encryption.
It takes a horrible amount of code to not use GIOStream, so the result would be a GProxy implementation that creates a GSocketConnection and then deletes it, followed by GSocketClient recreating one. Why not just try to keep it alive (when possible)? That way we also cover the GSSAPI privacy mode.
> (Another possibility is to have a GSocket subclass that wraps a GSocket + GIOStream combination. This would also solve some edge use cases for TLS...)
That is unclear to me; you should elaborate. Why is TLS affected by this anyway? Can't we just take any connection and ask the TLS implementation to do its work, wrapping it in a new GSocketConnection so that I no longer need to care that it's a TLS stream? I often refer to Wocky, the XMPP library in Telepathy Gabble. The use case is simple: we use GSocketClient, which results in a GSocketConnection (a GIOStream). Essentially, we start XML negotiation in the clear and discover that we need to encrypt the connection. At that point the simplest thing would be to have something that does the handshake and wraps our already existing GSocketConnection. This comes way after proxy support.
(In reply to comment #45)
> (In reply to comment #43)
> > (At least in the first version. We could later add the socket-wrapping-a-stream thing and then transparently be able to support encryption too.)
> I'm curious how you will achieve that cleanly. You really should elaborate before throwing away a design that already cover that.
I'm not claiming that this is "cleanly", I'm just saying that wanting to use SOCKS with GSocket is probably a much more common use case than wanting to use SOCKS with GSSAPI encryption, and so if we can't have *both* APIs be clean, I'd rather have a clean "do SOCKS negotiation on this GSocket" API than a clean "give me an encrypted GSocketConnection to a SOCKS server" API. (Another alternative would be to have GProxy have both GSocket-level and GSocketConnection-level APIs, but the socket-level API would fail if the user had configured GSSAPI encryption.)
Anyway, we'd virtualize g_socket_send, g_socket_receive, g_socket_condition_check, g_socket_condition_wait, and g_socket_create_source. In the case of GSSAPI SOCKS encryption, you'd let the condition/source methods just proxy to the underlying GSocket. g_socket_send and g_socket_receive would just translate to g_output_stream_write and g_input_stream_read, with async ones calling g_socket_condition_check() first and returning WOULD_BLOCK if it's not ready. (If that turned out to not work well, we could get trickier; starting a g_input_stream_read_async/g_output_stream_write_async in another thread, running that thread's main loop for one iteration, then if the async op hasn't already completed, cancel it and return WOULD_BLOCK.)
GSocketConnection would be smart enough to recognize when it was given a GSocket-proxying-a-GIOStream, and would be able to bypass g_socket_send/receive and just proxy directly to the underlying stream in that case.
(In reply to comment #46)
> I don't understand the benefit of using GSocket in stream base connection. GSocket is just a thin porting layer over BSD like sockets. To use it properly, programmers need to go into bunch of enumerations, introspection, proxy handshake. It's clear to me that it's best to port those apps to GSocketClient/GSocketConnection, reducing code size and potential bugs.
Well... maybe eventually it will turn out that no one wants to use plain GSocket. Don't know yet I guess. (I had at one point thought that libsoup was going to want to use plain GSocket, since that's much closer to SoupSocket and would involve less porting. But using iostreams makes it easier to implement caching (where you just substitute in a GFileInputStream instead of the GSocketInputStream, etc). So it won't be using raw GSocket (much) either.)
> > (Another possibility is to have a GSocket subclass that wraps a GSocket + GIOStream combination. This would also solve some edge use cases for TLS...)
> That is unclear to me, you should elaborate. Why is TLS is affected by this anyway. Can't we just take any connection and ask the TLS implementation to do it's work
gnutls, nss, and openssl all expect the layer underneath them to be BSD-socket-like, and in particular they expect it to have an EAGAIN-like async API rather than a GAsyncReadyCallback-like async API. So it's impossible to wrap an arbitrary GIOStream with TLS without having some sort of GIOStream-like-API-to-GSocket-like-API adapter.
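The shape of such an adapter can be shown with a small sketch. This is a toy model in Python, not GIO code and not a proposed design: the class name StreamAsSocket and its methods are hypothetical. Completed asynchronous reads are pushed into a buffer with feed(), and recv() either returns buffered bytes immediately or raises BlockingIOError, which plays the role of EAGAIN / WOULD_BLOCK for a caller that expects BSD-socket-like behaviour.

```python
class StreamAsSocket:
    """Hypothetical adapter: socket-like, non-blocking reads over buffered data."""

    def __init__(self):
        self._buffer = bytearray()
        self._eof = False

    def feed(self, data):
        """Called when an asynchronous stream read completes."""
        if data:
            self._buffer.extend(data)
        else:
            self._eof = True  # an empty read marks end of stream

    def recv(self, max_bytes):
        """Return buffered bytes at once, or raise instead of blocking."""
        if not self._buffer:
            if self._eof:
                return b""
            # Same role as EAGAIN / WOULD_BLOCK: "nothing yet, retry later".
            raise BlockingIOError("no data available yet")
        chunk = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return chunk
```

Note that every byte passes through the intermediate buffer, so an adapter of this kind costs one extra copy of the data.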
I have merged and fixed the review branch. The commit comments are not quite right, but to ease review I decided to keep them this way, at least until we are done with the review. I think the TLS stuff could use GBufferedInput to simulate the BSD-socket-like API. But as discussed with you earlier, I recognize this would add one copy of all the data being passed. For the GProxy stuff, I don't think it's really an issue to provide a utility to also connect using a GSocket, but this means GSocketConnection needs something to prevent the socket from being closed when the GSocketConnection is disposed.
Anything else for the review? Do we agree that a GSocket-based GProxy could be added to the interface later if there is really a need, or do you think it MUST be done now? How is your work on the URI parser going? I'll need it to replace g_uri_parse_authority.
(In reply to comment #50)
> Anything else for the review ?
eventually, yes. probably not this week.
> Do we agree that a GSocket base GProxy could be added later to the interface if there is really a need or you think it MUST done now ?
it can be done later
> How is going your work about URI parser? I'll need it to replace with g_uri_parse_authority.
you can just make g_uri_parse_authority be internal to libgio until we have a public replacement for it
It's been a long time without news about this review. Without proper feedback I can only guess what should be changed. Here is the list of my current guesses:
- Make g_uri_parse_authority() private
- Enable the GProxy interface to handshake directly on a socket
Anything else? I really want to get this ready for upstream as soon as possible, but I need help with this. I can work full time on it as long as I know that I'm working in the right direction.
There are at least two other subjects that I have not had much feedback on. The first is implementing GProxy for HTTP Connect (filtering out http, ftp and gopher, since they don't need a handshake). The second is to make this completely stackable (like doing double proxy tunneling). Whether you think those are good or bad, just let me know; I need your 2 cents.
To avoid a long discussion here, you can also reach me on IRC as stormer (I'm always on #nautilus, #gtk+, #empathy, and freenode #telepathy).
(In reply to comment #52)
> It's been a long time without news about this review.
yeah, sorry. I don't have enough time to devote to this stuff (and Alex has even less currently).
> Here is the list of my current guesses:
> - Make g_uri_parse_authority() private
Yes. I have started on the uri functions bug, but I want to make sure we get it right the first time, which requires figuring out what other apps/libraries need, etc...
> - Enable GProxy interface to handshake directly to socket
IMHO, yes
> Anything else ?
Splitting the libproxy module out of the glib source tree; since libproxy depends on glib for its gnome plugin, we don't want glib to depend on libproxy as well, because circular dependencies are obnoxious for packagers.
The best solution I've come up with here is to re-colonize the "gnio" module, dumping the remainder of what's currently there, and make it contain the plugins for proxying and (eventually) TLS. Maybe other stuff later on. (Eg, extending GResolver to work with avahi for link-local name resolution or something.)
> There is at least two other subjects that I have not got much feedback on. The first one is implementing GProxy for HTTP Connect
which, IMHO GProxy should not implement, although related to that, there needs to be a way for libsoup to know after calling g_socket_client_connect() that it's been given a socket that's connected to an HTTP proxy rather than one that's connected directly to the server. (Maybe there already is?)
> The second one is to make this completely stackable (like doing double proxy tunneling).
which again, IMHO, is unnecessary. (However, if GProxy inputs and outputs the same type (either GSocket or GSocketConnection) then it seems like this should just work?)
(In reply to comment #53)
> (In reply to comment #52)
> > There is at least two other subjects that I have not got much feedback on. The first one is implementing GProxy for HTTP Connect
> which, IMHO GProxy should not implement, although related to that, there needs to be a way for libsoup to know after calling g_socket_client_connect() that it's been given a socket that's connected to an HTTP proxy rather than one that's connected directly to the server. (Maybe there already is?)
You will know, but not through the GSocket. If you use GSocket directly (which I don't recommend), you can either 1) use GSocketConnectable::proxy_enumerate() as a replacement for GSocketConnectable::enumerate() and check whether the returned GSocketAddress is a GProxyAddress, which will give you all the information, or 2) use the GProxyResolver directly, in which case you parse the returned URI (until we have a proper URI parser) and build a GProxyAddress. Note that with both methods you have full control over when you connect to and handshake with the proxy server. You do so by getting the right implementation with g_proxy_get_default_for_protocol() (the proxy protocol is found in the GProxyAddress, or is the proxy URI scheme when using GProxyResolver) and then calling GProxy::connect() on the returned instance. With both methods you can easily prevent the HTTP Connect implementation of GProxy from being used in favor of a normal HTTP proxy.
With GSocketClient it's much simpler: you just call g_socket_connection_get_proxy_address() on the returned connection to get the info. You can prevent a proxy from being used by calling g_socket_client_set_enable_proxy() with FALSE.
Thinking about it, GSocketClient lacks one feature that would make protocol-specific proxies like HTTP possible without conflicting with libsoup (or any other library that needs an HTTP proxy). I will add a function, something like g_socket_client_add_raw_proxy(const gchar *protocol). For protocols added with this function, GSocketClient will simply make a TCP connection to the server and leave the rest to the caller, without looking for a GProxy implementation. For example, assuming libsoup uses GSocketClient, it would call g_socket_client_add_raw_proxy ("http") and check the GSocketConnection returned by g_socket_client_connect() to see whether a proxy has been used and whether it's HTTP. This way, libsoup could coexist with an HTTP Connect GProxy implementation, and libsoup can use a proxy without having to implement a GProxy (and, by doing so, conflicting with, let's say, an FTP library).
> > The second one is to make this completely stackable (like doing double proxy tunneling).
> which again, IMHO, is unnecessary. (However, if GProxy inputs and outputs the same type (either GSocket or GSocketConnection) then it seems like this should just work?)
In my implementation I made a mistake in the SOCKS support that prevents that, but I think I'll fix it anyway; it should be fixed by the end of this week. This is just an implementation change, no API change. The thing is that libproxy returns SOCKS (which can mean SOCKSv5, SOCKSv4a or SOCKSv4). As we don't know which, we try all three, but I did it the wrong way, letting a GProxy rebuild a connection based on the proxy address. What I will do is split the SOCKS configuration in three at the source (in the libproxy GProxyResolver implementation) and strictly document that a configuration should not require multiple attempts. With that fixed, it should just work. IMHO we should not assume that no filter is added to the stream (e.g. SOCKSv5 with GSSAPI Privacy). Adding a filter to a GSocket is a big pain, and we should not push this into the GProxy API. Instead we would just document that connecting to a proxy using GSocket will not work if you need to transform the data.
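The raw-proxy idea described above comes down to a small dispatch decision. The sketch below models only that decision in Python; it is not the GSocketClient API, and SketchSocketClient, add_raw_proxy and plan_connection are hypothetical names chosen for illustration. For a proxy protocol registered as raw, the client only connects and the caller speaks the proxy protocol; for anything else it looks up a handshake implementation.

```python
class SketchSocketClient:
    """Hypothetical model of the raw-proxy idea; not the GSocketClient API."""

    def __init__(self, handshakers):
        # Maps a proxy protocol name to a callable that performs its handshake.
        self._handshakers = dict(handshakers)
        self._raw_protocols = set()

    def add_raw_proxy(self, protocol):
        """Leave the handshake for this proxy protocol to the caller."""
        self._raw_protocols.add(protocol)

    def plan_connection(self, proxy_protocol):
        """Return (who_handshakes, handshaker) for the resolved proxy protocol."""
        if proxy_protocol is None:
            return "direct", None
        if proxy_protocol in self._raw_protocols:
            # Only TCP-connect to the proxy; the caller speaks the protocol.
            return "caller", None
        handshaker = self._handshakers.get(proxy_protocol)
        if handshaker is None:
            raise LookupError("no proxy implementation for %r" % proxy_protocol)
        return "client", handshaker
```

In this model, an HTTP library would call add_raw_proxy("http") once and then handle the "caller" case itself, while SOCKS connections keep going through the registered handshake.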
(In reply to comment #53)
> Splitting the libproxy module out of the glib source tree; since libproxy depends on glib for its gnome plugin, we don't want glib to depend on libproxy as well, because circular dependencies are obnoxious for packagers.
About this one: I was going to put the GIO libproxy extension into libproxy, but I don't mind having it somewhere else, if it does not take too much time.
I did several more passes with Dan Winship and I think we finally got it right. The libproxy extension is now part of a new module called glib-networking: http://git.gnome.org/browse/glib-networking/
If anybody has anything to say, please speak up (including a go for merge).
As a reminder: Git: http://git.collabora.co.uk/?p=user/nicolas/glib.git;a=shortlog;h=refs/heads/proxy2
I pushed some minor fixes to my repo. Also, the "Add property to override network service scheme" commit should just be squashed into the original implementation of proxy_enumerate on GNetworkService. With those changes I think it's good to go!
I've pulled, squashed and tested your changes, along with the squash you mentioned. Everything looks fine now; pushed!
FIXED? This was pushed to glib-networking, not glib, right?
The bulk of it is in glib. The part that depends on libproxy is in glib-networking, because libproxy depends on glib, and so putting GLibProxyResolver into glib itself would have created circular dependencies.