After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 554771 - totem spins CPU 100% while allocating (infinitely) because pulseaudio process is not available
totem spins CPU 100% while allocating (infinitely) because pulseaudio process...
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
0.10.x
Other Linux
: Normal normal
: 0.10.11
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2008-10-02 21:41 UTC by Martin Olsson
Modified: 2008-10-03 16:42 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Martin Olsson 2008-10-02 21:41:01 UTC
(I wrote this bug report with a lot of "newbie" info regarding all the stuff I did to analyze it, I'm including this info in here so that if other people find this bug maybe someone can learn something from it)

---

Today my entire X session hung when I was playing some songs in totem. Since the keyboard wasn't responding and the screen wasn't being updated I logged in through ssh from another machine. Once logged in I launched "top" and there I saw "totem" and "pulseaudio" both taking abount 1 percent CPU each constantly. I then killed pulse audio and this for some reason caused totem to start spinning with 100% CPU consumption while slowly also growing in VIRT size (it was both being stuck in a loop and it was also allocating more and more memory). At this point I launched gdb on totem to see why it was taking 100% CPU. 

Anyway, what I did was that I ran "info threads" inside gdb to see which threads where standing still in blocking functions like "select", "poll" and "pthread_cond_wait" etc. In this case it turned out that only one thread was really alive and I used "thread 3" to move to this thread in gdb. The stack I saw in there looked like this:

  • #0 free
    from /lib/libc.so.6
  • #1 gst_object_get_path_string
    at gstobject.c line 1108
  • #2 gst_element_message_full
    at gstelement.c line 1665
  • #3 gst_pulsesink_write
    at pulsesink.c line 556
  • #4 ??
    from /usr/lib/libgstaudio-0.10.so.0
  • #5 ??
    from /usr/lib/libglib-2.0.so.0
  • #6 start_thread
    from /lib/libpthread.so.0
  • #7 clone
    from /lib/libc.so.6
  • #8 ??
  • #0 _IO_str_overflow_internal
    from /lib/libc.so.6
  • #1 _IO_default_xsputn_internal
    from /lib/libc.so.6
  • #2 vfprintf
    from /lib/libc.so.6
  • #3 __vasprintf_chk
    from /lib/libc.so.6
  • #4 IA__g_vasprintf
    at /usr/include/bits/stdio2.h line 199
  • #5 IA__g_strdup_printf
    at /build/buildd/glib2.0-2.18.1/glib/gstrfuncs.c line 218
  • #6 gst_element_message_full
    at gstelement.c line 1670
  • #7 gst_pulsesink_write
    at pulsesink.c line 556
  • #8 audioringbuffer_thread_func
    at gstaudiosink.c line 226
  • #9 g_thread_create_proxy
    at /build/buildd/glib2.0-2.18.1/glib/gthread.c line 635
  • #10 start_thread
    from /lib/libpthread.so.0
  • #11 clone
    from /lib/libc.so.6
  • #12 ??
  • #6 gst_element_message_full
    at gstelement.c line 1670
  • #7 gst_pulsesink_write
    at pulsesink.c line 556
  • #8 audioringbuffer_thread_func
    at gstaudiosink.c line 226

It looks like the program was calling gst_pulsesink_write to write some data into pulse audio, however there was some error condition which caused a call to gst_element_message_full. Since I just abruptly _killed_ pulseaudio while totem was still running (and using it) so that might have something to do with it right? :P Anyway, this is theory is confirmed by the local variables in the stackframe for gst_pulsesink_write. Specifically it says:

sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated" 

How the question becomes: if there is a reasonable error message coming back saying that the connection to pulseaudio has been terminated... then why would totem just keep trying to write data to pulseaudio? Seems like pretty insane behavior, instead totem should stop trying to send audio data to pulse and instead it should show some error message.

Anyway, to continue debugging I put breakpoints on some of these functions to figure out how deep in the stack the loop was. For example I used "break function_name" put breakpoints on:

* gst_element_message_full
* gst_pulsesink_write
* audioringbuffer_thread_func

It just seemed unlikely that there should be an infinite loop bug inside libc.so so I didn't bother with those functions. Anyway, then I resumed gdb by typing "c" and I immediately hit a breakpoint. I continued it again and again and basically the pattern was that both the functions gst_element_message_full and gst_pulsesink_write were being continuously hit. However, I never hit the breakpoint on "audioringbuffer_thread_func" so evidently this function didn't exit.

At this point it was clear that a loop inside audioringbuffer_thread_func was responsible for continously calling gst_pulsesink_write which in turn continously called gst_element_message_full. Time to look at some code.

The first thing in the code I wanted to investigate was the function called audioringbuffer_thread_func which according to my theory should have an infinite loop. I googled for gstreamer and found it's homepage. Went to the developer section and used cvs to fetch the code for "gstreamer". I went into the directory created by cvs and I did "grep -r audioringbuffer_thread_func *" which actually found nothing :-( My guess was that this code probably was located in the plugin and from the two first stacktraces I remembered that this function was actually in /usr/lib/libgstaudio-0.10.so.0 which meant that it was in the libgstreamer-plugins-base0.10-0 package. Luckily gstreamer lists several modules inside it's CVS and one is the for the base plugins. So I ended up re-running CVS pointing it to the gst-plugins-base module and then I got a new directory full of source code. When I re-ran the command "grep -r audioringbuffer_thread_func *" on this code it found a match inside gstaudiosink.c (which is of course were gdb said we'd find it even though gdb was not smart enough to say in what folder and not in what source repository I should look for it). Anyway, I fired up my trusty old gedit and boom there was the code for audioringbuffer_thread_func().

After look at the code for audioringbuffer_thread_func() for a while I noted that there wasn't any call to gst_pulsesink_write() in there. Still, gdb clearly said that the function audioringbuffer_thread_func() had called into the function gst_pulsesink_write(). At this point the problem could have been that I had the wrong version of the source file because the source file might have been changed between now and whenever ubuntu took their copy of the source and created the package for it. If I had used the "apt-get source blah" command I would have gotten the exact right source for the relevant package but why make life too easy.

However, there is actually a call to writefunc() inside the function audioringbuffer_thread_func() and a few lines above that it's seen that writefunc is just a function pointer so it's pretty likely. That the function pointer writefunc pointed to the function gst_pulsesink_write() when my code was running. From looking at audioringbuffer_thread_func() it's also clear that there is some kind of error handling in here because if the number of written bytes is less than zero then execution will at least break out of the inner while loop.

Okay, fair enough. What about the code inside gst_pulsesink_write()? I ran "grep -r gst_pulsesink_write *" which returned nothing both in the gst-plugins-base and also in gstreamer itself. Probably the pulse audio plugin for gstreamer was not included in the main source repository nor in the base plugins. So I went off to google for the source repository that held the gst_pulsesink_write code and eventually managed to figure out that it's in the gst-plugins-good package. There is probably some smarter way to figure that out though. For this code I found that "grep -r gst_pulsesink_write *" returns a match inside the file pulsesink.c which is exactly where gdb said I would find it.

So, I looked at gst_pulsesink_write(). It has a while loop nested inside a for loop. There is also some macros used, called CHECK_DEAD_GOTO and GST_ELEMENT_ERROR.

At this point I needed to know what the definition of CHECK_DEAD_GOTO was but the editor that I was using (gedit) does not have a right-click function for "Go to definition" (maybe there is some gedit plugin for it, I dunno). So instead, what I did was that I dropped into a terminal, navigated to the cvs directory for gst-plugins-good and then I ran "cscope -R". This is a very old but useful text mode tool for doing "Go to definition". By pressing TAB it's possible to toggle between the upper and the lower panels. I just pressed down arrow once so that I ended up in the "Find global definition" filed and then I entered CHECK_DEAD_GOTO.  This shows a list of all the files where CHECK_DEAD_GOTO is defined and I wanted the definition from pulsesink.c of course. Now it seems that basically CHECK_DEAD_GOTO just checks if there was an error and if there was then ut jumps to the designated label which in this case was "unlock_and_fail". Just from the name it actually sounds like "check dead" could be the error that you end up with when pulseaudio it dead.

So what happens if "check dead" is raised and the code jumps to "unlock_and_fail" ?   Well, the code just unlocks the pulseaudio mainloop and then it returns zero.

ehm, wait what?! It returns zero? Hmm, but won't that make the parent function _continue_ calling gst_pulsesink_write again and again like there is no tomorrow? .... smells like some kind of misunderstanding?

Okay so now we sort of understand what causes the bug. Now we need to talk to some gstreamer dev and ask the the return code policy here actually is. What should the gstreamer plugin return to make gstreamer understand that there is no point in sending more data to the plugin? Looking at what the other plugins do would also be a good starting point.

Another thing is that we didn't actually understand why X.org sort of hung in the first place but at least we found out why totem was frozen taking 100% CPU.
Comment 1 Wim Taymans 2008-10-03 09:37:39 UTC
The pulse plugin should return -1 when a fatal error occurs.
Comment 2 Wim Taymans 2008-10-03 16:13:38 UTC
        * ext/pulse/pulsesink.c: (gst_pulsesink_write):
        * ext/pulse/pulsesrc.c: (gst_pulsesrc_read):
        Return -1 instead of 0 in error cases. Fixes #554771.
Comment 3 Martin Olsson 2008-10-03 16:42:40 UTC
Thank you for the quick fix! :-)