GNOME Bugzilla – Bug 575548
Window Manager: I already sent my report...
Last modified: 2009-04-28 10:48:48 UTC
Version: 2.24.1

I already sent my report to seb128@ubuntu.com, ubuntu-desktop@lists.ubuntu.com and joss@debian.org on Fri Feb 27 13:01:53 2009, but got no response.

Package:
ii  gnome-session  2.24.1-0ubuntu1  The GNOME 2 Session Manager
on 64-bit Ubuntu intrepid.

I have obviously worked a lot in the last 9 days of uptime ;)
That is how I stumbled over the problem: the communication file descriptor (unix socket; other transports not verified) of x-session-manager (gnome-session) is not close()'d after the X11 client terminates.

After gnome-session has reached 1024 "open files", accept() returns:
  accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)
and it retries as fast as it can, which produces > 100% CPU load.

From then on, no new programs can be started. The user has to log out (or kill gnome-session, which leads to the same result).

See below how I tracked the problem down.

Regards,
- Thomas Osterried

Tests:
------
uptime: 9 days
Suddenly, I could not open any new X11 program. And I had 100% CPU load.

$ ps ax|grep x-sess
 6921 ?        Tsl   14:10 x-session-manager
$ ls -l /etc/alternatives/x-session-manager
/etc/alternatives/x-session-manager -> /usr/bin/gnome-session
$ strace -t -p 6921 -e accept
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)
$ lsof |grep ^x-sessi |egrep "unix 0x"|wc
   1009    8072  102899

Q&D hack: "noise reduction" for the CPU fan (since x-session-manager does not work anymore, we can suspend it): kill -SIGSTOP 6921 ;-)
Problem: I won't kill it, because then the X11 login session terminates.
Fix: find the bug.

I've closed my 1001 firefox windows and other programs, but the number of open files remained the same.
Test 1: start many xterms fast
=======
$ i=0; while [ $i -lt 1025 ]; do echo $i; ( xterm & ) ; sleep 0.1; i=$(($i+1)) ; done
Some error messages occurred.
-> X11 hangs. No mouse anymore. Had to reboot.

Test 2: start many xterms slower
=======
$ i=0; while [ $i -lt 1025 ]; do echo $i; ( xterm & ) ; sleep 1; i=$(($i+1)) ; done
-> After about 220 xterms I see the message "maximum number of clients reached". A few seconds later, the screensaver (?) came on. The mouse pointer was still visible on top of a black screen, flickering a bit. Had to reboot.

Those two problems are most probably separate issues, which may be discussed later.

Test 3: trace of the gnome-session bug.
=======
3 xterms:
1. strace -e accept -p 6520 (pid of x-session-manager)
2. while true;do lsof |grep x-sess|grep unix|wc;sleep 2;done
3. i=0; while [ $i -lt 1025 ]; do echo $i; xterm -e "echo $i ; sleep 1" ; sleep 1.5; i=$(($i+1)) ; done
   # this opens an xterm which displays the sequence number and terminates 1 second afterwards

Now eat a pizza and come back.

The results:
1.: after accept() = ..., accept() = 1023, accept() = 1024:
    accept(9, 0x7fffa17e17a0, [4294967406]) = -1 EMFILE (Too many open files)
2.:
   1004    8032   99405
   1005    8040   99504
   1006    8048   99603
   1008    8064   99801
   1009    8072   99900
   1009    8072   99900
   1009    8072   99900
   1009    8072   99900
   [..]
3.:
997
998
999
[waits forever]

Distribution: Ubuntu 8.10 (intrepid)
Gnome Release: 2.24.1 2008-10-24 (Ubuntu)
BugBuddy Version: 2.24.1
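For anyone repeating the measurement: the lsof pipeline above can be slow on a busy machine. The following is a minimal sketch of the same count read directly from /proc, assuming a Linux system. count_sockets and soft_nofile_limit are hypothetical helper names, not part of gnome-session or lsof. It counts all sockets of the process, not only unix sockets, so the numbers can differ slightly from the lsof figures.

```shell
#!/bin/sh
# Hypothetical helpers; assume a Linux /proc filesystem.

# Print the number of socket descriptors held by process $1.
count_sockets() {
    pid=$1
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink; sockets read as "socket:[inode]".
        case "$(readlink "$fd" 2>/dev/null)" in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Print the soft "Max open files" limit of process $1.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' /proc/"$1"/limits
}
```

With the PID from the first trace, `count_sockets 6921` and `soft_nofile_limit 6921` could be compared; in the strace output above, accept() starts failing once the process reaches its 1024 open files.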
This sounds a lot like bug 563354. Can you reproduce this with gnome-session 2.24.3?
(fwiw, your test case of starting lots of clients is more or less https://bugs.freedesktop.org/show_bug.cgi?id=2920 -- it's definitely not a gnome-session bug)
Hello Vincent,

thank you for reviewing my report.
Unfortunately, I could not test the problem with gnome-session 2.24.3, because I assume that the dependencies are quite heavy, and I do not want to disturb my production machine.

Today I had the chance to test the gnome-session problem in another environment. Perhaps this helps more.

But first, please let me emphasize that the problem I reported did not come from a brute-force system stress test: it occurred during normal (but heavy) computer usage. I had to debug some X11 programs, and started and closed them very often. Suddenly I was not able to display any new program anymore, and then looked for where the problem was.

Even if it's not a gnome-session bug, there is no doubt that the number of open unix sockets (which gnome-session or one of its libraries handles) only increases and never decreases.

I'd be very interested to know whether this is also true for your environment:
$ lsof |grep "^gnome-ses "|grep "unix 0x" |wc
     16     128    1575
Now I start an xterm (an application which is not currently running).
$ xterm &
[1] 8339
$ lsof |grep "^gnome-ses "|grep "unix 0x" |wc
     17     136    1673
$
Now I close the xterm.
[1]+  Done                    xterm
$ lsof |grep "^gnome-ses "|grep "unix 0x" |wc
     17     136    1673
$

To do the start and termination automatically, I documented "Test 3" with
i=0; while [ $i -lt 1025 ]; do echo $i; xterm -e "echo $i ; sleep 1" ; sleep 1.5; i=$(($i+1)) ; done
This does not display 1024 xterms in parallel. It starts an xterm, displays it, then the xterm terminates. Then it starts another one, until the max. number of open files is reached and the problem occurs.

Now to my new test scenario from today:
I started
  X :1 &
from a console and did
  export DISPLAY=:1
I started an xterm, and I started x-window-manager by hand, which is actually /etc/alternatives/x-window-manager -> /usr/bin/metacity.
I searched for my gnome-session process, did an lsof, and counted the open unix sockets.
Then I started an xclock.
The number of open unix sockets remained the same(!).
Then I closed the xclock. The number of open unix sockets also remained the same.
-> no bug triggered here

I _assume_ that gnome-session behaves differently if there is already a window manager running _before_ gnome-session starts.

If it's not a gnome-session bug, then it may belong to libICE (just guessing..) or to the interaction between gnome-session and e.g. libICE.

Wherever we find the cause of the bug, I still consider it buggy that, when a user really has nearly 1024 programs open and the max. number of open file descriptors is reached (due to the number of unix-socket connections), the accept() call of gnome-session (or its library?) is called as fast as possible, which eats 100% CPU time.
It should be rate-limited in that case. I know that select() on the unix socket indicates that an event has to be handled. But accept() immediately returns -1 because the file descriptors are exhausted. The next select() call shows that the incoming connection is still waiting to be handled (it does not get cleared). This goes on and on.. Rate-limiting this to e.g. 100 ms is the only solution I could think of.

Kind regards,
- Thomas
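The manual start/close check with xterm described in this comment could be automated roughly as follows. This is only a sketch under assumptions: it relies on Linux /proc, it counts all descriptors of the watched process rather than only unix sockets, and fd_count and leak_check are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; assume a Linux /proc filesystem.

# Print the number of open descriptors of process $1.
fd_count() {
    ls /proc/"$1"/fd 2>/dev/null | wc -l
}

# Run a client command to completion and report the descriptor count
# of the watched process ($1) before and after. A count that stays
# higher after the client has exited suggests a leaked descriptor.
leak_check() {
    pid=$1; shift
    before=$(fd_count "$pid")
    "$@"                      # the client runs and exits here
    sleep 1                   # give the watched process time to clean up
    after=$(fd_count "$pid")
    echo "before=$before after=$after delta=$((after - before))"
}
```

For example, `leak_check 6921 xterm -e "sleep 1"` would watch the session manager PID from the original report while one xterm starts and exits. Going by the lsof counts above (16, then 17, then still 17), an affected system would be expected to show a positive delta, and a fixed one a delta of 0.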
Just some feedback: I've upgraded to Ubuntu jaunty, which comes with 2.26.0svn20090408-0ubuntu2. The problem does not occur there (I checked: the number of open files increases and decreases with the number of started / terminated X11 applications).
Thomas: thanks. I guess I was probably right in comment #1, then ;-)