GNOME Bugzilla – Bug 631379
GDBus nonce-tcp test failing
Last modified: 2011-04-14 17:29:11 UTC
ERROR:gdbus-peer.c:1141:nonce_tcp_service_thread_func: assertion failed (error == NULL): Error binding to address: Address already in use (g-io-error-quark, 33) Program received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff638c710 (LWP 9968)] 0x00007ffff701eba5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. in ../nptl/sysdeps/unix/sysv/linux/raise.c (gdb) thread apply all bt
+ Trace 224007
Thread 1 (Thread 0x7ffff7fd9700 (LWP 9960))
From strace, it appears to be creating a socket, binding it to a random port (port 0), using getsockname() on that socket and then trying to bind to the same port number for a second time. The port number is different each time (randomly chosen by the kernel, I guess).
the problem is with this loop in gdbusserver.c: resolved_addresses = g_resolver_lookup_by_name (resolver, host, NULL, error); if (resolved_addresses == NULL) goto out; /* TODO: handle family */ for (l = resolved_addresses; l != NULL; l = l->next) { GInetAddress *address = G_INET_ADDRESS (l->data); GSocketAddress *socket_address; GSocketAddress *effective_address; socket_address = g_inet_socket_address_new (address, port_num); if (!g_socket_listener_add_address (server->listener, socket_address, G_SOCKET_TYPE_STREAM, G_SOCKET_PROTOCOL_TCP, NULL, /* GObject *source_object */ &effective_address, error)) { g_object_unref (socket_address); goto out; } if (port_num == 0) /* make sure we allocate the same port number for other listeners */ port_num = g_inet_socket_address_get_port (G_INET_SOCKET_ADDRESS (effective_address)); g_object_unref (effective_address); g_object_unref (socket_address); } for some reason, on my Ubuntu system (bug is not present on Fedora), the resolver returns "127.0.0.1" twice for "localhost". The loop is therefore trying to bind to the same port, twice, on 127.0.0.1.
*** Bug 623306 has been marked as a duplicate of this bug. ***
I've disabled the test as a workaround for now.
Hmm, maybe g_resolver_lookup_by_name() shouldn't return duplicates.. Dan?
GResolver not returning duplicates seems right; in the "usual" GSocketClient-like case, having duplicates would result in us trying to connect to the same host a second time after it failed the first time.
Created attachment 185963 [details] [review] Proposed patch Here's a patch that fixes this. I don't think the O(n^2) algorithm used here is a problem in practice. Thanks for any review.
Comment on attachment 185963 [details] [review] Proposed patch >+/* could make this a public method on GInetAddress */ >+static gboolean >+_g_inet_address_equal (GInetAddress *a, yeah, please do >+ ret = FALSE; >+ if (g_inet_address_get_family (a) != g_inet_address_get_family (b)) >+ goto out; the whole ret/out thing seems unnecessary; there's no cleanup, just return FALSE right away if a check fails >+remove_duplicates (GList **addrs) You'll never remove the first element, so you don't actually need the **. >+ addrs= G_RESOLVER_GET_CLASS (resolver)-> need a space before the =
(In reply to comment #8) > (From update of attachment 185963 [details] [review]) > >+/* could make this a public method on GInetAddress */ > >+static gboolean > >+_g_inet_address_equal (GInetAddress *a, > > yeah, please do OK. > >+ ret = FALSE; > >+ if (g_inet_address_get_family (a) != g_inet_address_get_family (b)) > >+ goto out; > > the whole ret/out thing seems unnecessary; there's no cleanup, just return > FALSE right away if a check fails Meh, mostly a style thing, personally I never use return in the middle of a function ("single entry, single exit"). Either way, it's not my code so I changed it to what to what you want. > >+remove_duplicates (GList **addrs) > > You'll never remove the first element, so you don't actually need the **. I know, I know, but I thought this knowledge was a bit too magical thus making the code more difficult to read than it needs to be (the reader might not initially get it and thus wonder what happens if the first element is removed). I've changed this however, but added a g_warn_if_fail() explaining/verifying things. > >+ addrs= G_RESOLVER_GET_CLASS (resolver)-> > > need a space before the = Fixed. Pushed with these changes as two commits http://git.gnome.org/browse/glib/commit/?id=33515d4eb4175fac70ab42151020336c34bc2083 http://git.gnome.org/browse/glib/commit/?id=4da18247592f1159ee03e414990d2a744aa0a8c7 Thanks, David
More info: apparently this is caused by having 127.0.0.1 localhost ::1 localhost in /etc/hosts, which glibc intentionally does the wrong thing with (http://sources.redhat.com/bugzilla/show_bug.cgi?id=4980), and Fedora worked around the problem by changing /etc/hosts to have: 127.0.0.1 localhost ::1 localhost6 instead (https://bugzilla.redhat.com/show_bug.cgi?id=496300), since arguing with Ulrich is futile. (Related Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/582585. The glibc bug has a patch that could be applied to eglibc to fix it.)
JTFR, I was able to reproduce it simply by having two lines 127.0.0.1 localhost 127.0.0.1 localhost in /etc/hosts