After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 683755 - DEBUG(+): Lost GDM name on bus
DEBUG(+): Lost GDM name on bus
Status: RESOLVED NOTGNOME
Product: gdm
Classification: Core
Component: general
3.5.x
Other Linux
: Normal critical
: ---
Assigned To: GDM maintainers
GDM maintainers
Depends on:
Blocks:
 
 
Reported: 2012-09-10 23:10 UTC by Tom Wijsman
Modified: 2012-09-18 15:05 UTC
See Also:
GNOME target: ---
GNOME version: 3.5/3.6


Attachments
Output of stracing gdm. (45.71 KB, text/plain)
2012-09-12 19:51 UTC, Tom Wijsman
Details
Output from the messages log that's related to gdm's run. (2.24 KB, text/plain)
2012-09-12 19:54 UTC, Tom Wijsman
Details
Contents of /etc/dbus-1/system.d/ (1.89 KB, text/plain)
2012-09-12 21:04 UTC, Tom Wijsman
Details
Gzipped tar archive of the files in /etc/dbus-1/system.d (4.03 KB, application/x-compressed-tar)
2012-09-12 21:09 UTC, Tom Wijsman
Details
Output of stracing gdm with additional -s 999 parameter. (86.61 KB, text/plain)
2012-09-12 21:23 UTC, Tom Wijsman
Details

Description Tom Wijsman 2012-09-10 23:10:10 UTC
It appears GDM can't start on my installation because of the above error.
This error seems part of the DBus related changes that have been done to GDM.

As far as I can interpret it, the DBus side seems to go fine:

--- Output of `dbus-monitor --system --monitor` ---

signal sender=org.freedesktop.DBus -> dest=:1.0 serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
   string ":1.0"
signal sender=org.freedesktop.DBus -> dest=(null destination) serial=7 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.1"
   string ""
   string ":1.1"
signal sender=org.freedesktop.DBus -> dest=(null destination) serial=8 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string "org.gnome.DisplayManager"
   string ""
   string ":1.1"

---------------------------------------------------

However, GDM disconnects from DBus after showing DEBUG(+): Lost GDM name on bus.

So, I feel there's something wrong on the GDM side, probably due to one of code changes that happened since 3.4.

The above bug happens on 3.5.91 as well as the latest upstream dev commit.

Currently need to run my gnome-session with `startx`, which is not pleasant...

Hope to see this resolved, thanks in advance!
Comment 1 Ray Strode [halfline] 2012-09-12 18:55:54 UTC
is this with fedora?

yum resintall polkit systemd

may fix it if so

(see http://fedoraproject.org/wiki/Upgrading_Fedora_using_yum#Fedora_17_-.3E_Fedora_18 )
Comment 2 Tom Wijsman 2012-09-12 19:18:52 UTC
Platform shouldn't really matter, but this is with Gentoo.

`ps axjf` tells me:

    1 4665 1738 1738 ? -1 Sl 105 0:00 /usr/lib/polkit-1/polkitd --no-debug

Polkit is succesfully running.

    polkit-gnome 0.105
    polkit 0.107

I'm using udev 189 and openrc 0.10.5 and wish to avoid systemd, which is not a dependency as far as I can see (and worked without that on 3.2 and 3.4). Furthermore, I don't see what above error (Lost GDM name on bus) has to do with polkit and systemd at all.

Also, note that I'm not seeing the spinner which is the opposite of what is described in that article, it just stays in TTY1 and outputs me that debugging is enabled and the user has been changed after which it lets me continue using the console (because gdm closed itself), then I can find the above error (Lost GDM name on bus) in /var/log/messages.

This bug got introduced by a change in gdm; if you believe that one of those two services that you mention is the problem, then why doesn't gdm manage to tell me so and then why is the code where this error comes from not related to one of these two services? Could you share some light on that?
Comment 3 Ray Strode [halfline] 2012-09-12 19:24:01 UTC
sounds like comment 1 was off base.

Can you edit /etc/gdm/custom.conf and put "Enable=true" in the [debug] section, reproduce the problem then attach /var/log/gdm/\:0-slave.log and /var/log/gdm/\:0-greeter.log ?
Comment 4 Tom Wijsman 2012-09-12 19:49:56 UTC
It doesn't get to the point where it writes to /var/log/gdm or /var/gdm, both directories are empty apart from the first directory containing .keep_gnome-base_gdm-0. These two directories were made properly after removing them and then reinstalling gdm.

In the upcoming two attachments you'll find the outputs of:

    strace -f -y gdm

    grep gdm /var/log/messages
Comment 5 Tom Wijsman 2012-09-12 19:51:22 UTC
Created attachment 224140 [details]
Output of stracing gdm.
Comment 6 Tom Wijsman 2012-09-12 19:54:44 UTC
Created attachment 224142 [details]
Output from the messages log that's related to gdm's run.
Comment 7 Tom Wijsman 2012-09-12 19:57:47 UTC
Part of the output of `ps axjf` related to gdm:

   1  4564 4564 4564 tty1 4669 Ss   0 0:00 /bin/login --     
4564  4669 4669 4564 tty1 4669 S+   0 0:00  \_ -bash
4669  9797 9797 4564 tty1 4669 T    0 0:00      \_ strace -f -y -oGDMDEBUG gdm
   1  4585 1733 1733 ?      -1 Sl   0 0:00 /usr/sbin/console-kit-daemon --no-daemon
   1  4652 1733 1733 ?      -1 Sl 105 0:00 /usr/lib/polkit-1/polkitd --no-debug
   1  9805 9805 9805 ?      -1 Ss   0 0:00 gdm
Comment 8 Tom Wijsman 2012-09-12 20:02:41 UTC
I should correct what I earlier said, GDM keeps running. Just doesn't do anything but spitting out debug messages...
Comment 9 Ray Strode [halfline] 2012-09-12 20:56:25 UTC
I'm guessing the "Lost name" message is actually "Couldn't get name".  Are the revelant gdm dbus policy files properly installed in /etc/dbus-1/system.d ?

can you add a -s 999 to your strace so we can see more of the buffers?
Comment 10 Tom Wijsman 2012-09-12 21:03:25 UTC
Attached etc.dbus-1.system.d file is the output of 

    ls -la /etc/dbus-1/system.d/

I will also attach individual files I think might be related.

Will do the more detailed strace later on.
Comment 11 Tom Wijsman 2012-09-12 21:04:11 UTC
Created attachment 224149 [details]
Contents of /etc/dbus-1/system.d/
Comment 12 Tom Wijsman 2012-09-12 21:09:11 UTC
Created attachment 224150 [details]
Gzipped tar archive of the files in /etc/dbus-1/system.d

Decided to just share them all, to speed up the process as I'm not experienced enough with that folder yet to know which files matter and which don't.
Comment 13 Tom Wijsman 2012-09-12 21:23:51 UTC
Created attachment 224152 [details]
Output of stracing gdm with additional -s 999 parameter.

From what I see, it tells that it doesn't get the name from dbus due to a timeout. Although that you can see earlier in the trace that some succesful talking has happened, it acquires a name early on (succesful with response) but at a later request for a name it seems to fail. I think this strace has enough information to get an idea what's really going on...
Comment 14 Tom Wijsman 2012-09-12 21:39:13 UTC
If it helps, you can easily see the talk between both if you do a CTRL+F on "msg(4" (without the quotes) or grep that.

Doing so reveals the following:

1. New socket connection gets made from gdm to dbus.
2. gdm negotiates with dbus and says hello.
3. dbus shakes hands and tells gdm that he acquired ":1.4".
4. gdm does a request for the name "org.gnome.DisplayManager".

    5958  sendmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"l\1\0\1$\0\0\0\2\0\0\0}\0\0\0\10\1g\0\2su\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\3\1s\0\v\0\0\0RequestName\0\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\30\0\0\0org.gnome.DisplayManager\0\0\0\0\0\0\0\0", 180}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 180

5. dbus never answers to that, I think it appears to timeout.
Comment 15 Tom Wijsman 2012-09-12 22:10:02 UTC
Okay, got someone from #dbus to read it, let me rewrite:

5. It appers that dbus does answer, as evidenced by my first post and the poll telling us that file descriptor 4 has a message.

5958  poll([{fd=5<anon_inode:[eventfd]>, events=POLLIN}, {fd=4<socket:[10352]>, events=POLLIN}, {fd=7<anon_inode:[eventfd]>, events=POLLIN}], 3, -1 <unfinished ...>
5961  setsid( <unfinished ...>7
5958  <... poll resumed> )              = 2 ([{fd=5, revents=POLLIN}, {fd=4, revents=POLLIN}])

6. gdm never responds to that poll and never reads out what's on the socket.

Bottom line of the story: gdm requests a name but doesn't check out the response.
Comment 16 Tom Wijsman 2012-09-12 22:51:18 UTC
Hmm... Might this be a bug in the gio lib of glib? It's in fact the one that calls the name_lost callback, here is the relevant patch where the call to this got introduced:

https://mail.gnome.org/archives/commits-list/2012-July/msg04420.html
Comment 17 Tom Wijsman 2012-09-17 23:59:20 UTC
As discussed on IRC, comments 14 to 16 can be ignored as they are not the core issue.

I'll describe here how far I understand things are happening:

1. There are two threads: 5975 (main thread), 5958 (dbus thread)

2. At some point the main thread does a clone() to 5961, which seems to continue main.

3. After the cloning finished, it calls exit_group(0).

4. exit_group(0) closes both 5975 and 5958.

5. Because 5958 closes, the response (indicated by the poll) isn't read anymore.

6. 5961 continues for some time, tries to start gdm but fails because it doesn't know the name was acquired.

I've gone line by line through the last lines before the exit in the strace here, documenting what each does: https://gist.github.com/3740495 

The things I don't understand are:

1. Why does the main thread clone itself and continue on 5961?

2. Where is the exit_group(0) call actually happening in code?

I think that if we figure out why the thread clones and where the exit_group call originates from we'd be able to get to the offending code.
Comment 18 Ray Strode [halfline] 2012-09-18 14:15:32 UTC
(In reply to comment #17)

> 1. Why does the main thread clone itself and continue on 5961?
Me either.  Are you using any distro specific patches to make it daemonize? 
Maybe something to make it interoperate with openrc better, but isn't completely correct?

> 2. Where is the exit_group(0) call actually happening in code?
return from main() implicitly causes exit_group, so that's probably where it's coming from.
Comment 19 Ray Strode [halfline] 2012-09-18 14:18:38 UTC
I also see in the strace this:

5961  setsid( <unfinished ...>
5958  <... poll resumed> )              = 2 ([{fd=5, revents=POLLIN}, {fd=4, revents=POLLIN}])
5961  <... setsid resumed> )            = 5961
We don't call setsid() from the main proces, so I'm suspecting a vendor patch at this point.
Comment 20 Ray Strode [halfline] 2012-09-18 15:01:10 UTC
<halfline> tomwij: hey just posted a comment to your bug
<halfline> do you have any vendor patches?
<halfline> looking at the strace, it really looks like there is code to try to daemonize the main process, which doesn't exist in the upstream repository
<tomwij> halfline: Here now.
<tomwij> I'll take a look at the ebuild.
<halfline> well 40 sec latency doesn't require a "now" modifier :-)
<tomwij>         # daemonize so that the boot process can continue, bug #236701
<tomwij>         epatch "${FILESDIR}/${PN}-3.5.91-fix-daemonize-regression.patch"
<tomwij> Sounds like it.
<tomwij> halfline: http://bpaste.net/show/46198/
<tomwij> So, disable that patch?
* halfline looks
<halfline> well it sounds like the patch is needed
<halfline> from the summary
<halfline> so probably should fix it instead of disable it
<tomwij> https://bugs.gentoo.org/show_bug.cgi?id=236701 --> https://bugzilla.gnome.org/show_bug.cgi?id=550170 --> http://bugzilla-attachments.gnome.org/attachment.cgi?id=120573&action=view
<halfline> tomwij: unless you can fix it from the boot script
<tomwij> Seems like an adaption of that patch.
<tomwij> halfline: What about not terminating the parent?
<halfline> no that won't help
<halfline> the "easy fix" is to move the daemonify call higher up
<tomwij> Okay, perhaps putting the deamonize before g_type_init?
<halfline> yea
<halfline> though libc already provides a call to do daemonify()
<halfline> called just daemon()
<halfline> so wouldn't need that function
<tomwij> Could do, I suppose that would make the dbus thread on the daemon and not on the original process.
<halfline> right, you can't really do anything after fork()
<halfline> but execve or exit
<halfline> or small i/o
<halfline> calling into a library functions will cause lost of problems
<halfline> unless you call fork() before you've done anything else
<halfline> basically most things don't know they've been fork()'d and lots of confusion can result from that
<halfline> now the daemon() call (and the provided daemonify() call) also have problems
<halfline> see man 7 daemon
<halfline> what we really need is a two part process
<halfline> where we fork immediately before doing anything else
<halfline> and then have the parent wait
<halfline> it can't do anything but wait
<halfline> and then when the child has got on the bus then the parent can exit
<tomwij> Tricky
<tomwij> Do I need to process the nodaemon argument?
<halfline> we're at a bit of a en passe there
<halfline> upstream doesn't daemonize by default
<halfline> we aren't going to change that (see the linked bug)
<halfline> downstream expects us to be compatible with XDM but we aren't
<halfline> so this is probably going to have to be a downstream patch
<halfline> since being compatible with XDM is a non-goal upstream
<tomwij> So, I should file a bug againt Gentoo about this, I suppose?
<halfline> yea, i think so
<halfline> I can help you write a patch that works better if you like
<halfline> or you could just go for the easy approach and move the daemonify() up
<tomwij> Well, I'm going to try moving the daemonify up.
<halfline> okay
<halfline> moving it up above the g_type_init() is the best place
<halfline> tomwij: you could also patch /usr/sbin/gdm (which is a shell script) to do exec setsid /usr/sbin/gdm-binary "$@"
<halfline> instead of just exec /usr/sbin/gdm-binary "$@"
<tomwij> What will that do?
<halfline> the equivalent of the C patch
<halfline> but using a unix utility called setsid
<tomwij> setsid - run a program in a new session
<tomwij> Hmm, I see, that reaches the same effect but without actually messing around with threads (thus, could work better).
<tomwij> I've done the C patch now
<tomwij> Going to reboot and see if it works
<-- tomwij has quit (Lost terminal)
<halfline> the appeal of using setsid is that the distro patch is confined to messing with peripheral shell script code
<halfline> instead of modifying source
--> tomwij (~tomwij@94-226-55-127.access.telenet.be) has joined #gdm
<tomwij> halfline: That brings me to a black screen with a rotating cursor.
<tomwij> Which I suppose is the typical bug others are experiencing.
<tomwij> halfline: Here's the :0-slave.log file: http://bpaste.net/show/46200/
<tomwij> gdm-launch-environment][4691]: DEBUG(+): GdmSessionWorker: authenticating user gdm
<tomwij> gdm-launch-environment][4691]: DEBUG(+): GdmSessionWorker: authentication returned 7: Authentication failure
<tomwij> I think this is where it goes wrong.
<tomwij> Or could it be: gdm-launch-environment][4691]: AccountsService-DEBUG(+): Failed to identify the current session: GDBus.Error:org.freedesktop.ConsoleKit.Manager.GeneralError: Unable to lookup session information for process '4691'
<halfline> you're right
<halfline> so you probably have a file called /etc/pam.d/gdm-welcome
<halfline> that needs to be renamed to /etc/pam.d/gdm-launch-environment
<halfline> then things should wokr
<halfline> *work
<tomwij> Exactly, gdm-welcome
<tomwij> How did you figure that out? :D
<halfline> we used to call the file gdm-welcome
<halfline> and dowstreams need to change it fit their pam config
<halfline> actually we encourage pam configs to go upstream now
<halfline> to prevent this sort of thing from happening, but gentoo isn't upstream yet
<tomwij> Yeah 2.20.11 2.20.11-r1 [M](~)2.32.1 [M](~)2.32.1-r1 (~)3.2.1.1-r1 (~)3.2.1.1-r2 (~)3.4.1 (~)3.4.1-r1[2] **3.5.91[2] **9999[2]
<tomwij> Nothing in front is stable. [M] is hard masked, means to not use the version. (~) is test versions and ** is experimental.
<tomwij> They still gotta do the 2 --> 3 merge on stable.
<tomwij> Okay, moved the file, gonna try again.
<-- tomwij has quit (Lost terminal)
--> tomwij (~tomwij@94-226-55-127.access.telenet.be) has joined #gdm
<tomwij> halfline: Yeah, had to get rid of a patch for gnome-shell which replaced their lock screen. Resolved, succesful start and login... :)
<halfline> yay \o/
<halfline> okay i'm going to close your bug NOTGNOME
Comment 21 Tom Wijsman 2012-09-18 15:03:07 UTC
Thanks for pointing this out, I haven't taken patches into thought and will do so in the future.

The solution was:

1. Move daemonization of gdm in the vendor patch before anything else.

2. `mv /etc/pam.d/gdm-welcome /etc/pam.d/gdm-launch-environment`

3. Get rid of a patch in gnome-shell (that I introduced because of no gdm).

Thank you very much!