Bug 575548 - Window Manager: I already sent my report...
Status: RESOLVED FIXED
Product: gnome-session
Classification: Core
Component: general
Version: 2.24.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Session Maintainers
Session Maintainers
Depends on:
Blocks:
 
 
Reported: 2009-03-16 12:50 UTC by thomas
Modified: 2009-04-28 10:48 UTC
See Also:
GNOME target: ---
GNOME version: 2.23/2.24



Description thomas 2009-03-16 12:50:00 UTC
Version: 2.24.1

I already sent my report to seb128@ubuntu.com, ubuntu-desktop@lists.ubuntu.com
and joss@debian.org on Fri Feb 27 13:01:53 2009, but got no response.

Package: ii  gnome-session  2.24.1-0ubuntu1  The GNOME 2 Session Manager
on 64-bit Ubuntu intrepid.
                                                                                
I have obviously been working a lot in the last 9 days of uptime ;) and so
I stumbled over a problem: the communication file descriptor
(Unix socket; other transports not verified) of x-session-manager
(gnome-session) is not close()'d after the X11 client terminates.
                                                                                
After gnome-session has reached 1024 "open files", accept() returns:            
  accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)     
and it does this as fast as it can, which produces > 100% CPU load.             
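A minimal C sketch of the mechanism described above, assuming Linux/POSIX Unix-domain sockets; it is an illustration, not gnome-session's or libICE's actual code. The hypothetical function leak_until_emfile() plays a server that accept()s short-lived clients but never close()s the accepted descriptors. To hit the limit quickly it temporarily lowers the soft RLIMIT_NOFILE to fd_limit, and the socket path under /tmp is an arbitrary choice for the demo.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_TRACKED_FDS 1024

/* Hypothetical demo of the reported leak: accept() each client connection
 * on a Unix socket, but never close() it. The soft RLIMIT_NOFILE is lowered
 * to fd_limit for the duration of the demo and restored afterwards.
 * Returns how many connections were accepted (and leaked) before accept()
 * failed with EMFILE, or -1 on any other failure. */
int leak_until_emfile(int fd_limit)
{
    int leaked[MAX_TRACKED_FDS];
    int n_leaked = 0, result = -1;
    struct rlimit old_lim, low_lim;
    struct sockaddr_un addr;

    if (fd_limit <= 0 || fd_limit > MAX_TRACKED_FDS)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &old_lim) != 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof addr.sun_path,
             "/tmp/leak-demo-%ld.sock", (long)getpid());
    unlink(addr.sun_path);              /* remove a stale socket file */

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 16) == 0) {
        low_lim = old_lim;
        low_lim.rlim_cur = (rlim_t)fd_limit;
        if (setrlimit(RLIMIT_NOFILE, &low_lim) == 0) {
            for (;;) {
                /* A short-lived "X11 client" connects and exits at once. */
                int c = socket(AF_UNIX, SOCK_STREAM, 0);
                if (c < 0)
                    break;
                if (connect(c, (struct sockaddr *)&addr, sizeof addr) != 0) {
                    close(c);
                    break;
                }
                int a = accept(lfd, NULL, NULL);
                int accept_errno = errno;   /* close() may change errno */
                close(c);
                if (a < 0) {
                    if (accept_errno == EMFILE)
                        result = n_leaked;
                    break;
                }
                /* The bug being illustrated: `a` is never close()d, even
                 * though the client on the other end is already gone. */
                leaked[n_leaked++] = a;
            }
            setrlimit(RLIMIT_NOFILE, &old_lim);
        }
    }

    /* Clean up so that the demo itself does not leak. */
    for (int i = 0; i < n_leaked; i++)
        close(leaked[i]);
    close(lfd);
    unlink(addr.sun_path);
    return result;
}
```

Calling leak_until_emfile(64) should return a count somewhat below 64 (the limit minus the descriptors the process already had open). After that point the pending connection stays queued, so the listening socket keeps reporting readable while every accept() fails, which matches the strace output in this report.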
                                                                                
From then on, new programs cannot be started anymore. The user has to
log out (or kill gnome-session, which leads to the same result).
See below for how I traced the problem down.
                                                                                
Regards,                                                                        
                                                                                
        - Thomas Osterried                                                      
                                                                                
Tests:                                                                          
------                                                                          
                                                                                
uptime: 9 days                                                                  
                                                                                
Suddenly, I could not open any new X11 program. And I had 100% CPU load.        
                                                                                
$ ps ax|grep x-sess                                                             
 6921 ?        Tsl   14:10 x-session-manager                                    
                                                                                
$ ls -l /etc/alternatives/x-session-manager                                     
/etc/alternatives/x-session-manager -> /usr/bin/gnome-session                   
                                                                                
                                                                                
$ strace -t -p 6921 -e accept                         
                                                                                
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)                                                                         
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)                                                                         
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)                                                                         
10:44:16 accept(9, 0x7fff7dae01b0, [4294967406]) = -1 EMFILE (Too many open files)                                                                         
                                                                                
$ lsof |grep ^x-sessi |egrep "unix 0x"|wc                                       
   1009    8072  102899                                                         
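The lsof pipelines used in this report can be approximated without lsof. Below is a minimal, Linux-specific C sketch (the function name count_socket_fds() is hypothetical) that reads the /proc/<pid>/fd symlinks and counts those that point at sockets. Unlike the "unix 0x" filter above, it counts sockets of every family (Unix, TCP, ...), so it is only an upper bound for the Unix-socket count.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the open file descriptors of process `pid` that refer to sockets
 * (Linux-specific: each entry in /proc/<pid>/fd is a symlink whose target
 * looks like "socket:[12345]" for sockets). Returns -1 if the directory
 * cannot be read, e.g. no such process or insufficient permissions. */
int count_socket_fds(pid_t pid)
{
    char dir_path[64];
    snprintf(dir_path, sizeof dir_path, "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                   /* skip "." and ".." */

        char link_path[512];
        char target[64];
        snprintf(link_path, sizeof link_path, "%s/%s",
                 dir_path, entry->d_name);
        ssize_t len = readlink(link_path, target, sizeof target - 1);
        if (len < 0)
            continue;                   /* fd was closed in the meantime */
        target[len] = '\0';             /* readlink() does not terminate */
        if (strncmp(target, "socket:[", 8) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

For example, printing count_socket_fds(6921) every two seconds would serve roughly the same purpose as the lsof loop in Test 3, provided the caller runs as the same user as the session manager.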
                                                                                
                                                                                
Quick & dirty hack: "noise reduction" for the CPU fan (since x-session-manager
does not work anymore, we can suspend it: kill -SIGSTOP 6921 ;-)

Problem: I won't kill it, because then the X11 login session terminates.

Fix: find the bug. I closed my 1001 Firefox windows and other
programs, but the number of open files remained the same.
                                                                                
                                                                                
Test 1: start many xterms fast                                                  
=======                                                                         
                                                                                
$ i=0; while [ $i -lt 1025 ]; do echo $i; ( xterm & ) ; sleep 0.1; i=$(($i+1)) ; done
Some error messages occurred.
-> X11 hangs. No mouse anymore. Had to reboot.
                                                                                
Test 2: start many xterms slower                                                
=======                                                                         
                                                                                
$ i=0; while [ $i -lt 1025 ]; do echo $i; ( xterm & ) ; sleep 1; i=$(($i+1)) ;  done                                                                           
                                                                                
-> after about 220 xterms, I see the message "maximum number of clients reached".

A few seconds later, the screensaver (?) came on. The mouse pointer was still visible
on top of a black screen, flickering a bit. Had to reboot.

These two problems are most probably separate issues, which may be discussed
later.
                                                                                
                                                                                                                                         
                                                                                
Test 3: trace of the gnome-session bug.                                         
=======                                                                         
                                                                                
3 xterms:                                                                       
  1. strace -e accept -p 6520  (pid of x-session-manager)                       
  2. while true;do lsof |grep x-sess|grep unix|wc;sleep 2;done                  
  3. i=0; while [ $i -lt 1025 ]; do echo $i; xterm -e "echo $i ; sleep 1" ;  sleep 1.5; i=$(($i+1)) ; done                                                  
     # this opens an xterm which displays the sequence number and terminates 1 second afterwards
                                                                                
Now eat a pizza and come back. The results:                                     
                                                                                
1.:                                                                             
  after accept() = ..., accept() = 1023, accept() = 1024:                       
  accept(9, 0x7fffa17e17a0, [4294967406]) = -1 EMFILE (Too many open files)     
                                                                                                                                         
2.:                                                                             
   1004    8032   99405                                                         
   1005    8040   99504                                                         
   1006    8048   99603                                                         
   1008    8064   99801                                                         
   1009    8072   99900                                                         
   1009    8072   99900                                                         
   1009    8072   99900                                                         
   1009    8072   99900                                                         
   [..]                                                                         
                                                                                
3.:                                                                             
997                                                                             
998                                                                             
999                                                                             
[waits forever]                                                                                      



Distribution: Ubuntu 8.10 (intrepid)
Gnome Release: 2.24.1 2008-10-24 (Ubuntu)
BugBuddy Version: 2.24.1
Comment 1 Vincent Untz 2009-03-25 01:14:52 UTC
This sounds a lot like bug 563354. Can you reproduce this with gnome-session 2.24.3?
Comment 2 Vincent Untz 2009-03-25 01:27:34 UTC
(fwiw, your test case of starting lots of clients is more or less https://bugs.freedesktop.org/show_bug.cgi?id=2920 -- it's definitely not a gnome-session bug)
Comment 3 thomas 2009-04-01 10:58:27 UTC
Hello Vincent,                                                                  
                                                                                
thank you for reviewing my report.                                              
                                                                                
Unfortunately, I could not test the problem with gnome-session 2.24.3,
because I assume that its dependencies are quite heavy, and I do not
want to disturb my production machine.

Today I had the chance to test the gnome-session
problem in another environment. Perhaps this helps.

But first, please let me emphasize that the problem I reported was not
some kind of brute-force system stress test: it occurred during normal (but
heavy) computer usage. I had to debug some X11 programs, and started and
closed them very often. Suddenly, I was not able to display any new
program anymore, and then I looked for the cause.
Even if it's not a gnome-session bug, it's beyond doubt that the number
of open Unix sockets (handled by gnome-session or one of its libraries)
only increases, but never decreases.
I'd be very interested to know whether this is also true in your environment:
$ lsof |grep "^gnome-ses "|grep "unix 0x" |wc                    
     16     128    1575 
Now I start an xterm (an application which is not currently running):
$ xterm & 
[1] 8339
$ lsof |grep "^gnome-ses "|grep "unix 0x" |wc                    
     17     136    1673    
$                                                                
Now I close the xterm:
[1]+  Done                    xterm  
$ lsof |grep "^gnome-ses "|grep "unix 0x" |wc
     17     136    1673         
$ 

To automate starting and terminating, I documented "Test 3" with
  i=0; while [ $i -lt 1025 ]; do echo $i; xterm -e "echo $i ; sleep 1" ; sleep 1.5; i=$(($i+1)) ; done
This does not display 1024 xterms in parallel. It starts an xterm, displays it,
and then the xterm terminates. Then it starts another one, and so on until the
maximum number of open files is reached and the problem occurs.
                                                                                
                                                                                
Now to my new test scenario from today:                                         
                                                                                
I started
  X :1 &
from a console and did
  export DISPLAY=:1
I started an xterm. And I started x-window-manager by hand, which is
actually /etc/alternatives/x-window-manager -> /usr/bin/metacity.
I searched for my gnome-session process, ran lsof, and counted the
open Unix sockets. Then I started an xclock. The number of open
Unix sockets remained the same(!). Then I closed the xclock. The number
of open Unix sockets also remained the same.
-> no bug triggered here
                                                                                
I _assume_ that gnome-session behaves differently if there's already
a window manager running _before_ gnome-session starts.


If it's not a gnome-session bug, then it may belong to libICE (just
guessing...) or to the interaction between gnome-session and e.g. libICE.
                                                                                
Regardless of where we find the cause of the bug, I still consider
it buggy that, if a user really has nearly 1024 programs open and the
maximum number of open file descriptors (due to the number of Unix socket
connections) is reached, the accept() call of gnome-session
(or its library?) is then called as fast as possible, which eats 100% CPU
time.
                                                                                
It should be rate-limited in that case. I know that select() on the
Unix socket indicates that an event has to be handled. But accept()
immediately returns -1 because the file descriptors are exhausted.

The next select() call shows that the incoming connection still has to be
handled (it does not get cleared). This goes on and on...
                                                                                
Rate-limiting this to e.g. 100 ms is the only option I could think of.
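A minimal C sketch of the rate limit proposed above, assuming POSIX poll() and accept() on a Unix stream socket. The helper names make_unix_listener() and accept_rate_limited() are hypothetical, and this is not how gnome-session or libICE actually handle the problem. When accept() fails with EMFILE or ENFILE, the function sleeps for backoff_ms before returning, so with backoff_ms = 100 the caller's loop wakes up about ten times per second instead of spinning.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a listening Unix stream socket at `path`.
 * Returns the listening fd, or -1 with errno set. */
int make_unix_listener(const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(addr.sun_path, path);
    unlink(path);                       /* remove a stale socket file */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 16) != 0) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: wait until the listening socket is readable, then
 * accept(). If no descriptor is available (EMFILE: per-process limit,
 * ENFILE: system-wide limit), back off for backoff_ms before returning.
 * Returns the new connection's fd, or -1 with errno set. */
int accept_rate_limited(int listen_fd, int backoff_ms)
{
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

    /* poll()/select() are level-triggered: the socket stays readable as
     * long as a connection is queued, even when accept() cannot take it. */
    if (poll(&pfd, 1, -1) < 0)
        return -1;

    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0 && (errno == EMFILE || errno == ENFILE)) {
        int saved_errno = errno;
        poll(NULL, 0, backoff_ms);      /* no descriptors: just sleep */
        errno = saved_errno;
    }
    return fd;
}
```

A caller would loop on accept_rate_limited(listen_fd, 100) and skip the iteration whenever it returns -1. The queued connection is still not served while descriptors are exhausted; the sleep only keeps the loop from burning CPU time. A different approach used by some event libraries is to keep one spare descriptor open and, on EMFILE, close it, accept and immediately close the pending connection, then reopen the spare.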
                                                                                
Kind regards,
                                                                                
        - Thomas                                                
Comment 4 thomas 2009-04-28 10:33:59 UTC
Just some feedback:
I've upgraded to Ubuntu jaunty. Jaunty comes with 2.26.0svn20090408-0ubuntu2.
The problem does not occur there (I checked it: the number of open files increases and decreases with the number of started / terminated X11 applications).
Comment 5 Vincent Untz 2009-04-28 10:48:48 UTC
Thomas: thanks. I guess I was probably right in comment #1, then ;-)