GNOME Bugzilla – Bug 701105
glib/gdbus-peer.test is racy
Last modified: 2018-05-24 15:20:58 UTC
I actually caught this in gdb, but accidentally lost the stack trace to a git clean -dfx. Basically, the exit-on-close flag is somehow being triggered for the peer-to-peer connections. This really should not happen, because exit-on-close is off by default for them. I think it's some sort of thread race in connection initialization. Will attach another stack trace here when I get one.
David, can you explain this test?

Background info: the gnome-ostree CI system is now running all of the GLib tests *constantly*, and this is turning up as a very racy test. The symptom we're seeing is that the exit-on-close is triggering and killing the test.

<mclasen> looking at gdbus-peer.c, test_overflow is actually setting exit-on-close
<walters> i hadn't caught that, but it's a different test...
<mclasen> and its using timeouts all over the place...
<walters> unless
<mclasen> after taking out test_overflow, I haven't reproduced the error yet, locally
<walters> yes, you're right
<mclasen> could just move that to its own file and take it out of the installed tests
* mclasen goes to do so
<walters> let's look at this a little more
<walters> one sec
<mclasen> ok
<walters> first question: why are we setting exit on close?
<mclasen> no idea
a little further down:

<mclasen> walters: the code in gdbus-private.c that produces the disconnection says:
<mclasen> /* TODO: hmm, hmm... */
<mclasen> if (bytes_read == 0) { ...}
[...]
<walters> mclasen, i dunno what that TODO is about, but i think it's expected we get 0 bytes when the other side exits
<mclasen> well, whats happening is that the reader gets 0 bytes
<mclasen> which causes it to disconnect
<mclasen> which in turn ends the connection on the producer side
<mclasen> triggering the exit-on-disconnect
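To illustrate the point in the log above that a 0-byte read is the expected signal when the other side goes away, here is a minimal sketch using plain POSIX calls rather than GDBus. The helper name read_after_peer_close is hypothetical and is not part of GLib or of the test. It only shows the underlying socket behaviour, not what gdbus-private.c does with it.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of GLib: create a connected socket pair,
 * close one end, and return what read() reports on the other end. */
static ssize_t
read_after_peer_close (void)
{
  int fds[2];
  char buf[16];
  ssize_t n;

  /* fds[0] plays the reader (consumer), fds[1] the peer that goes away. */
  int rc = socketpair (AF_UNIX, SOCK_STREAM, 0, fds);
  assert (rc == 0);

  close (fds[1]);                      /* the other side exits */
  n = read (fds[0], buf, sizeof buf);  /* 0 here means orderly end-of-stream */
  close (fds[0]);

  return n;
}
```

On a stream socket, a return value of 0 from read() means the peer closed its end, so a reader treating it as a disconnect is reasonable in itself. The open question in this bug is why that disconnect then reaches a connection that has exit-on-close set.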
I've split the overflow test off into its own binary and taken it out of the installed tests run in ostree for now.
Actually, it's not clear to me exactly what the problem is... I can't decipher what 'glib/gdbus-peer.test' means. Is this referring to a C file or a test?

Stack trace, debug output (with G_DBUS_MESSAGE etc) would be helpful...

As a drive-by reaction, I don't see how exit-on-disconnect can be leaked from the overflow test case as it is only using private connections, never shared objects.

Matthias: FWIW, I don't think it's a good idea to move tests around like this - you're basically destroying git history, and in this case it makes it harder to figure out who to blame. In this case I guess it works as people from the future will see your name in the blame history and not mine. Works for me :-)
(In reply to comment #4)
> Actually, it's not clear to me exactly what the problem is... I can't decipher
> what 'glib/gdbus-peer.test' means. Is this referring to a C file or a test?

It's this code:
https://git.gnome.org/browse/glib/tree/gio/tests/gdbus-overflow.c

That formerly lived in gdbus-peer.c

> Stack trace, debug output (with G_DBUS_MESSAGE etc) would be helpful...

I'll try to get one.

> As a drive-by reaction, I don't see how exit-on-disconnect can be leaked from
> the overflow test case as it is only using private connections, never shared
> objects.

Possibly a timing issue with:

  g_object_unref (consumer);
  g_object_unref (producer);

Where the producer's connection is actually closed first? I'll see if I can get a stack trace.
(In reply to comment #5)
> It's this code:
> https://git.gnome.org/browse/glib/tree/gio/tests/gdbus-overflow.c
>
> That formerly lived in gdbus-peer.c

I actually think Simon McVittie (added to Cc) wrote that code - that's what my foggy memory tells me (but it could be wrong), also the naming of variables (n_foo vs num_foo) tells me I probably didn't write it. I don't have a shell handy right now but if I had, I'd run git blame on it. Of course that may not tell you anything conclusive...
I don't remember writing that, and git blame agrees.
(In reply to comment #7)
> I don't remember writing that, and git blame agrees.

OK. Thanks for checking.
Ok, this test still occasionally fails even without the overflow test in the mix. Looks like memory corruption:

(/usr/libexec/installed-tests/glib/gdbus-peer:6143): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'GDBusServer'
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/705.