GNOME Bugzilla – Bug 662623
connection failures on heavily loaded systems
Last modified: 2013-05-14 11:54:16 UTC
From a downstream bug...

Apparently connect() on an AF_UNIX socket can return EAGAIN if the server side of the socket has too many un-accept()ed connections. This means that if you try to log in to a heavily loaded machine, some clients may fail to connect to gconfd: they are all trying to connect at once, gconfd isn't getting enough cycles to accept all of them, and ORBit's connection code doesn't handle EAGAIN.

The proposed patch downstream (in linc2/src/linc-connection.c:link_connection_do_initiate) was:

+retry:
 	LINK_TEMP_FAILURE_RETRY_SOCKET (connect (fd, saddr, saddr_len), rv);
+	if (rv == -1L && errno == EAGAIN) {
+		g_usleep (10000);
+		goto retry;
+	}

Given ORBit's deprecation, maybe it's not worth trying to come up with anything better? (This patch was verified to fix the original reporter's problems.)
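For readers outside the ORBit2 tree, the retry idea in the patch above can be sketched as a standalone helper. This is only an illustration under stated assumptions, not the code that was committed: `connect_unix_with_retry` and its parameters are hypothetical names, plain POSIX calls stand in for `LINK_TEMP_FAILURE_RETRY_SOCKET` and `g_usleep`, and the number of retries is capped, whereas the patch as quoted retries without a limit.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ORBit2): connect to the AF_UNIX stream
 * socket at `path`, sleeping `delay_us` microseconds and retrying when
 * connect() fails with EAGAIN, at most `max_retries` times.
 * Returns a connected fd, or -1 with errno set.
 */
int connect_unix_with_retry(const char *path, int max_retries, long delay_us)
{
    struct sockaddr_un addr;
    int fd, saved_errno, attempts = 0;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    for (;;) {
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;                     /* connected */
        if (errno == EINTR)
            continue;                      /* interrupted by a signal: retry now */
        if (errno == EAGAIN && attempts < max_retries) {
            /* Server backlog is full: back off briefly, then try again. */
            struct timespec ts;
            ts.tv_sec = delay_us / 1000000L;
            ts.tv_nsec = (delay_us % 1000000L) * 1000L;
            nanosleep(&ts, NULL);
            attempts++;
            continue;
        }
        break;                             /* any other error, or retries exhausted */
    }

    /* Preserve the connect() error across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return -1;
}
```

A call such as `connect_unix_with_retry(path, 100, 10000)` would mirror the 10 ms sleep from the patch while giving up after roughly one second. Whether a cap is appropriate depends on the caller; the quoted patch prefers to keep trying until the server catches up.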
Hi Dan; sounds like you have a nicely loaded terminal server / SPICE farm: sexy :-) Of course I trust your work implicitly, so commit what you like to ORBit2. If you want a release, though, that is more difficult; I've rather lost track of that. Oh, and the patch looks good too. Thanks!
Oh, it's not my patch. Some support guy wrote it. :) I was mostly submitting it to see if you thought it was completely insane, and/or if you thought there was some better fix. I don't think it matters much if it gets committed to master... the problem only manifests if you have lots of clients connecting to the same server at once, and now that GConf uses D-Bus, I don't think that's really going to happen anywhere... Feel free to (a) apply the patch, (b) close the bug as OBSOLETE, (c) leave the bug open in case anyone else runs into the problem.
I committed the patch. Closing here.