After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 631379 - GDBus nonce-tcp test failing
GDBus nonce-tcp test failing
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gdbus
unspecified
Other All
: Normal normal
: ---
Assigned To: David Zeuthen (not reading bugmail)
gtkdev
: 623306 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2010-10-05 05:05 UTC by Allison Karlitskaya (desrt)
Modified: 2011-04-14 17:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed patch (5.58 KB, patch)
2011-04-14 15:31 UTC, David Zeuthen (not reading bugmail)
reviewed Details | Review

Description Allison Karlitskaya (desrt) 2010-10-05 05:05:24 UTC
ERROR:gdbus-peer.c:1141:nonce_tcp_service_thread_func: assertion failed (error == NULL): Error binding to address: Address already in use (g-io-error-quark, 33)

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff638c710 (LWP 9968)]
0x00007ffff701eba5 in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) thread apply all bt

Thread 1 (Thread 0x7ffff7fd9700 (LWP 9960))

  • #0 sched_yield
    at ../sysdeps/unix/syscall-template.S line 82
  • #1 test_nonce_tcp
    at gdbus-peer.c line 1192
  • #2 test_case_run
    at gtestutils.c line 1174
  • #3 g_test_run_suite_internal
    at gtestutils.c line 1223
  • #4 g_test_run_suite_internal
    at gtestutils.c line 1233
  • #5 g_test_run_suite
    at gtestutils.c line 1282
  • #6 main
    at gdbus-peer.c line 1465

Comment 1 Allison Karlitskaya (desrt) 2010-10-05 05:10:03 UTC
From strace, it appears to be creating a socket, binding it to a random port (port 0), using getsockname() on that socket and then trying to bind to the same port number for a second time.

The port number is different each time (randomly chosen by the kernel, I guess).
Comment 2 Allison Karlitskaya (desrt) 2010-10-05 05:49:06 UTC
the problem is with this loop in gdbusserver.c:

  resolved_addresses = g_resolver_lookup_by_name (resolver,
                                                  host,
                                                  NULL,
                                                  error);
  if (resolved_addresses == NULL)
    goto out;

  /* TODO: handle family */
  for (l = resolved_addresses; l != NULL; l = l->next)
    {
      GInetAddress *address = G_INET_ADDRESS (l->data);
      GSocketAddress *socket_address;
      GSocketAddress *effective_address;

      socket_address = g_inet_socket_address_new (address, port_num);
      if (!g_socket_listener_add_address (server->listener,
                                          socket_address,
                                          G_SOCKET_TYPE_STREAM,
                                          G_SOCKET_PROTOCOL_TCP,
                                          NULL, /* GObject *source_object */
                                          &effective_address,
                                          error))
        {
          g_object_unref (socket_address);
          goto out;
        }
      if (port_num == 0)
        /* make sure we allocate the same port number for other listeners */
        port_num = g_inet_socket_address_get_port (G_INET_SOCKET_ADDRESS (effective_address));

      g_object_unref (effective_address);
      g_object_unref (socket_address);
    }

for some reason, on my Ubuntu system (bug is not present on Fedora), the resolver returns "127.0.0.1" twice for "localhost".  The loop is therefore trying to bind to the same port, twice, on 127.0.0.1.
Comment 3 Allison Karlitskaya (desrt) 2010-10-19 11:40:45 UTC
*** Bug 623306 has been marked as a duplicate of this bug. ***
Comment 4 Allison Karlitskaya (desrt) 2010-10-19 11:46:25 UTC
I've disabled the test as a workaround for now.
Comment 5 David Zeuthen (not reading bugmail) 2010-10-19 14:51:26 UTC
Hmm, maybe g_resolver_lookup_by_name() shouldn't return duplicates.. Dan?
Comment 6 Dan Winship 2010-10-19 15:31:35 UTC
GResolver not returning duplicates seems right; in the "usual" GSocketClient-like case, having duplicates would result in us trying to connect to the same host a second time after it failed the first time.
Comment 7 David Zeuthen (not reading bugmail) 2011-04-14 15:31:02 UTC
Created attachment 185963 [details] [review]
Proposed patch

Here's a patch that fixes this. I don't think the O(n^2) algorithm used here is a problem in practice. Thanks for any review.
Comment 8 Dan Winship 2011-04-14 16:20:17 UTC
Comment on attachment 185963 [details] [review]
Proposed patch

>+/* could make this a public method on GInetAddress */
>+static gboolean
>+_g_inet_address_equal (GInetAddress *a,

yeah, please do

>+  ret = FALSE;
>+  if (g_inet_address_get_family (a) != g_inet_address_get_family (b))
>+    goto out;

the whole ret/out thing seems unnecessary; there's no cleanup, just return FALSE right away if a check fails

>+remove_duplicates (GList **addrs)

You'll never remove the first element, so you don't actually need the **.

>+  addrs= G_RESOLVER_GET_CLASS (resolver)->

need a space before the =
Comment 9 David Zeuthen (not reading bugmail) 2011-04-14 16:52:20 UTC
(In reply to comment #8)
> (From update of attachment 185963 [details] [review])
> >+/* could make this a public method on GInetAddress */
> >+static gboolean
> >+_g_inet_address_equal (GInetAddress *a,
> 
> yeah, please do

OK.

> >+  ret = FALSE;
> >+  if (g_inet_address_get_family (a) != g_inet_address_get_family (b))
> >+    goto out;
> 
> the whole ret/out thing seems unnecessary; there's no cleanup, just return
> FALSE right away if a check fails

Meh, mostly a style thing, personally I never use return in the middle of a function ("single entry, single exit"). Either way, it's not my code so I changed it to what to what you want.

> >+remove_duplicates (GList **addrs)
> 
> You'll never remove the first element, so you don't actually need the **.

I know, I know, but I thought this knowledge was a bit too magical thus making the code more difficult to read than it needs to be (the reader might not initially get it and thus wonder what happens if the first element is removed). I've changed this however, but added a g_warn_if_fail() explaining/verifying things.

> >+  addrs= G_RESOLVER_GET_CLASS (resolver)->
> 
> need a space before the =

Fixed.

Pushed with these changes as two commits

http://git.gnome.org/browse/glib/commit/?id=33515d4eb4175fac70ab42151020336c34bc2083
http://git.gnome.org/browse/glib/commit/?id=4da18247592f1159ee03e414990d2a744aa0a8c7

Thanks,
David
Comment 10 Dan Winship 2011-04-14 17:05:24 UTC
More info: apparently this is caused by having

  127.0.0.1 localhost
  ::1 localhost

in /etc/hosts, which glibc intentionally does the wrong thing with (http://sources.redhat.com/bugzilla/show_bug.cgi?id=4980), and Fedora worked around the problem by changing /etc/hosts to have:

  127.0.0.1 localhost
  ::1 localhost6

instead (https://bugzilla.redhat.com/show_bug.cgi?id=496300), since arguing with Ulrich is futile.

(Related Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/582585. The glibc bug has a patch that could be applied to eglibc to fix it.)
Comment 11 David Zeuthen (not reading bugmail) 2011-04-14 17:29:11 UTC
JTFR, I was able to reproduce it simply by having two lines

 127.0.0.1 localhost
 127.0.0.1 localhost

in /etc/hosts