GNOME Bugzilla – Bug 118063
splash screen doesn't disappear if launched app doesn't respond
Last modified: 2004-12-22 21:47:04 UTC
Using gnome-2.4 nightly build from 18 July 2003. .logged in as root or normal user -after logging out of a gnome-session with the 'save current settings' box checked, on relogin the splash screen take a very long time to disappear even thought the session has loaded fully.
I have noticed this on a recent build also. After some investigation the commit to version 1.8 of gnome-session/splash-widget.c seems to trigger the problem. If I back out that change only, then everything returns to normal. However, Havoc is meant to know what he is doing with window hints, etc ... so I am not sure that just reverting the b0rked bit is the correct fix.
Mr. Havoc, sir ? :-)
Malcolm told me about this at OLS, frankly if the patch was just "set _NET_WM_WINDOW_TYPE_SPLASH" I'm pretty skeptical that it causes this bug... very skeptical. Sure there wasn't anything else in that patch? In another file perhaps?
What Havoc kindly didn't mention was that I also volunteered to look at this further. The change I mentioned in my previous comment is all that is required to make the problem *appear* to go away .. you just cvs update to an earlier verision of that one file and the splash screen stops hanging around. However, I suspect that this just makes some other problem invisible. Nevertheless, the HEAD version of splash-widget.c is pretty much unusable unless you enjoy having one desktop rendered useless for 30 seconds to a minute after logging in by a completely unmovable, unkillable window. So I shall poke around a bit more here and try to work out what is going wonky. For reference, the patch, in all its glory was... --- gnome-session/splash-widget.c 2 Apr 2003 07:58:33 -0000 1.7 +++ gnome-session/splash-widget.c 24 Jun 2003 18:47:06 -0000 1.8 @@ -542,7 +542,10 @@ return; global_splash = g_object_new (SPLASH_TYPE_WIDGET, - "type", GTK_WINDOW_POPUP, NULL); + "type_hint", + GDK_WINDOW_TYPE_HINT_SPLASHSCREEN, + NULL); + gtk_window_set_decorated (GTK_WINDOW (global_splash), FALSE); gtk_widget_show_now (GTK_WIDGET (global_splash)); }
Hmm, borderline accessibility bug, would certainly cause a problem for those users but probably no more than for any other. Think I'll remove the keyword for now to get it off our accessibility bug list.
The issue is that splash_stop() is never called. Something in the execution path has changed. It is called from update_save_state in manager.c in a less than solid manner. Adding a splash_stop call further up will make it go away, but the patch below is not quite what you want to do, I'll try to work out something better. /* This is a helper function which makes sure that the save_state variable is correctly set after a change to the save_yourself lists. */ static void update_save_state (void) You just have to love this function and it's leading comment. It has enough levels to make your head spin, half a dozen exit points, depends on as well as manipulates global state and has uncounted side effects one of which is to kill the splash screen, *if* things are lined up a certain way. You might as well flip a coin. Index: manager.c =================================================================== RCS file: /cvs/gnome/gnome-session/gnome-session/manager.c,v retrieving revision 1.78 diff -u -r1.78 manager.c --- manager.c 25 Feb 2003 08:50:09 -0000 1.78 +++ manager.c 3 Aug 2003 15:21:07 -0000 @@ -1165,6 +1165,7 @@ if (pending_list) runlevel = client->priority; } + splash_stop (); if (pending_list) return; }
The thing that messes up gnome-session is that it saves itself in the session file and tries to restart itself, the second gnome-session complains and exits, but causes the state in gnome-session to be such that splash_stop() is never called. Examples of working and broken session files are attached below. Seems like two bugs. 1. gnome-session shouldn't register itself. 2. gnome-session shouldn't screw up if something fails to start. It should be easy, splash_start(), start all applications and call splash_update() for each, splash_stop(), but I can't currently see where in gnome-session we are when we're done starting apps.
Created attachment 18880 [details] Broken session file
Created attachment 18881 [details] Working session file
Woah, deja vu! I spent a while last night looking at this and came to exactly the same conclusion. It was a bit spooky to come into work this morning and find somebody reading my mind. So, in short, I agree with Fredrik's evaluation. I am building an instrumented version of gnome-session to try and confirm where the startup stuff is failing. That solves the overriding problem here. What is not so clear is why gnome-session is being saved in the session file in the first place. I cannot reliably reproduce that. Fredrik: can you? I'll work on fixing the failing to call splash_stop() problem and leave the other part until later.
I suspect that if a client executes correctly, but exits with an error-status without re-registering in X, the client will never be removed from the pending_list until after the 120 seconds when purge() is called. There is no check for that. So pending_list will never be empty and hence we'll never get to splash_stop(). It's some work to fix this. Gnome-session seems to save itself to the session file whenever i check the box in the logout dialog.
Fredrik, After staring at this the gnome-session/gnome-sessionc code until I go cross-eyed, my thinking is that your original simple patch is the correct fix for the persistent splash-screen problem (postponing the issue of why we are trying to start processes things until later). The control flow when we are starting a session is - "stuff happens" in main.c (initialisation) - eventually start_session() is called, which calls start_clients() (in manager.c) - at this point save_state == MANAGER_IDLE, so process_load_request() is called with the clients to start. - process_load_request() does "some stuff" (not relevant here) and then sets save_state to STARTING_SESSION and calls update_save_state() The purpose of this (possibly obvious) explanation is to show that we almost immediately arrive at the fragment where your proposed patch goes and that is the first time the session manager is trying to start up the clients. So after a quick swing through the list trying to start everybody once, it makes sense to call splash_stop() and let the session manager make repeated attempts in the background if it wishes. After this, save_state never returns to STARTING_SESSION, so this code path is not executed again. Have I missed something here? You seem to have some deep knowledge of this stuff; what I've laid out here seems convincing to me, but I'd appreciate a quick whack over the head if it is wrong. We need something here to work around errors, since it turns out to be pretty easy to screw up the .gnome2/session file (based on the poking around I just did over lunch), so it is in no way guaranteed that things will start smoothly. Pretending there are no problems and just quietly continuing to try in the background may be the best solution.
Good morning, seems like I'm pretty much in the opposite time zone :) I'm flattered, but the only thing I feel have a deep understanding of right now is Havoc's call for a brand new gnome-session. I'm starting to get tempted. Moving the flash_stop() call further up increases its likelyhood of ever getting called, but it's not positive since there is (*sigh*) yet another exit point within the while loop itself. It also seems like it's a bit too early for some applications to show up at all on the splash. But then, any other fix is likely to get a lot more complicated. Another way would be to make gnome-session not register itself, keep a note of this, cross ones fingers and hope that other applications won't behave like it for while. It's about as satisfactory. I'm impressed if you've managed to roll up the call chain. Going forward rather than backward makes it a bit clearer though. But update_save_state() will exit more or less randomly depending on the pending_list and I pretty much loose control of what's happening then. The call stack gets wounded up pretty much and I'm not sure yet where and how we get back. Note also that process_load_request() is also called at the bottom of update_save_state() itself depending on request_list which I haven't quite been able to figure out yet, casuing the state to switch back to SESSION_STARTING and the "helper function" update_save_state() called again (awww god, this is cracking me up). Good luck, I'll be looking at this again this evening, that is about six hours from now.
Hmm, I don't really have time to look into this closely but ask yourself this: * What changed ? Havoc made the splash screen be window managed. * What's happening now ? The session manager is being saved as part of the session. * I can understand why the window manager would now be saving the splash screen state as part of its session, but why does this affect the list of clients the session manager saves ? * If you pass in GNOME_CLIENT_PARAM_SM_CONNECT, FALSE to gnome_program_init() in gnome-session/main.c does that help ? If backing out the change to the splash screen entirely fixes this problem, then I'm all for doing that unless it gets fixed soon ... Thanks for looking into this guys ...
> I can understand why the window manager would now be saving the > splash screen state as part of its session, but why does this affect > the list of clients the session manager saves ? There's no way I can think of that it would. Maybe it simply causes some race condition to take hold? > If backing out the change to the splash screen entirely fixes this > problem, then I'm all for doing that unless it gets fixed soon Nah, we shouldn't do that until we're in 2-weeks-from-release kind of mode, if then. The root cause of the problem needs finding. These guys are working on it...
It is now indeed verified that as suspected the non-starting bogus gnome-session gets stuffed into pending_list which prevents splash_stop() from being called. There is a timeout that deals with this list and slowly starting clients, but that messes with clients that actually are (or might be) running and the timeout is set to two minutes. Attached patch is perhaps not the prettiest thing you ever saw but it works. It keeps track of the pids of started clients and a timeout that checks if the client is in the pending_list. If so, and the client is not around any longer as indicated by kill, zap it from the pending_list and call update_save_state(). Note that there are issues. First of all gnome-session also attempts to start remote clients by way of rstart (is anyone really using this?). If one is dependent on this and the client is slow to start up it will be killed from the pending list. First of all I doubt this is used much. Secondly, it doesn't really hurt anything if one is not depending of the attempts to restart the client the other timeout mentioned above is trying. A client that actually is running but is very slow to start up will still make the splash stick on the screen. I'm not sure if this is a problem. To prevent that one might simply kill the splash after some amount of time, but that seems pretty ugly.
Created attachment 18928 [details] [review] Patch to handle clients that exists during startup properly.
> If you pass in GNOME_CLIENT_PARAM_SM_CONNECT, FALSE to > gnome_program_init() in gnome-session/main.c does that help ? It's not that simple since it this way gnome-session detects an already running session manager. If you have a bad session file with gnome-session in it gnome-session will start "looping".
I can't add the patch keyword, someone else can perhaps do that.
Forgot to free the client struct. Adding new patch.
Created attachment 18930 [details] [review] New patch, not leeking memory
I've grabbed the patch and will look over it this evening (morning your time, Fredrik).
Ok, I've been looking a bit on why gnome-session registers itself and it's a bit tricky to find out. It ends up in the live_list somehow. It is pretty important to fix this too since the session_properties dialog is populated from it and selecting gnome-session, remove and apply is a bad idea (it sure removes the session though).
Upping priority/severity on this bug. It's totally high visibility and we can't afford to forget it before release.
Hmm, so this bug actually turns out to be quite straightforward :-) The window isn't override_redirect anymore and WM_STATE is set on the window by metacity, but there's no SM_CLIENT_ID on it, so it appears smproxy deems decides the window should be session managed. I don't really know why, I've forgotten the XSMP and WM_SAVE_YOURSELF details, but I've just comitted a patch to do: gdk_set_sm_client_id ("gnome-session-dummy-id"); before gtk_init() and things seem to work fine. Probably not a great idea to do something like this without researching it properly but ... if anyone knows of a reason why not to do it this way, let me know ... 2003-08-08 Mark McLoughlin <mark@skynet.ie> * main.c: (main): make sure SM_CLIENT_ID gets set to a dummy id so that smproxy doesn't think we need to be session managed. Should fix bug 118063.
Oh, btw this problem won't disappear until you remove gnome-session from you .gnome2/session file.
I was suspecting smproxy, but I had the impression that: 1. we had gotten rid of smproxy 2. it only did anything with windows that had WM_SAVE_YOURSELF both of those probably _should_ be true... ;-)
Reopening, changing title and lowering priority..
Is bug 120921 a duplicate?
Yes and no. It causes the splash screen stall for the reasons discussed in this bug (in the back-and-forth between Fredrik and myself), but it needs a different sort of fix. There was a small thread about this on gnome-list recently when somebody else noticed the same thing. The alternative is "just trust fcrozat", since he filed the other bug and has been monitoring this bug forever. :)
Just a heads up. I am seeing this also with latest 2.4rc built with Garnome. It is interesting coz the very first login from a brand new user account behaves normally. But after dropping out of X and restarting, it displays the problematic behaviour at the 2nd login. From the 3rd login onwards it appears to be fine. Must be some sort of race. Also interestingly, if I wait long enough, the splash screen will disappear (1 or 2 minutes).
Have you done what Mark say in one of his comments: « Oh, btw this problem won't disappear until you remove gnome-session from you .gnome2/session file. » (if you already installed gnome 2.3 before)
Federic: why was this re-opened ?
Because you can still reproduce (gnome 2.4) it when launching an app which doesn't connect to SM manager, when added to session-manual
Hmm, but this bug was specifically about Havoc making the splash screen a managed window and that breaking things ... The details around what exactly the bug you are talking about is completely lost in this bug report ... I'd suggest opening a new bug report specifically for that ...
Closing, the remaining problem is already filled in a separate bug.