Bug 646624 - GstBin: regression: creating too many bins fails, exhausting allowed file descriptor limit
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
Version: git master
Hardware: Other Linux
Importance: Normal blocker
Target Milestone: 0.10.33
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Reported: 2011-04-03 15:16 UTC by Tim-Philipp Müller
Modified: 2011-04-11 11:07 UTC
Attachments
checks: add GstBin unit test that creates a lot of bins (3.13 KB, patch)
2011-04-03 15:17 UTC, Tim-Philipp Müller
committed
bus: Only create the signalling socket pair when required (4.96 KB, patch)
2011-04-05 14:31 UTC, Sebastian Dröge (slomo)
committed

Description Tim-Philipp Müller 2011-04-03 15:16:01 UTC
Each bin has a bus, and in git we now create a GstPoll/socketpair for every bus, even if it's not a top-level bin. This can easily exhaust the number of available file descriptors.
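For illustration only, the failure mode can be reproduced outside GStreamer with a small POSIX sketch. The helper name `count_socketpairs_until_failure` and its parameters are hypothetical and not GStreamer API; it assumes a POSIX system and lowers the soft file descriptor limit so the effect shows up quickly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of GStreamer. Temporarily lowers the soft
 * RLIMIT_NOFILE to at most `soft_limit`, opens up to `max_pairs` socket
 * pairs, and returns how many were created before socketpair() failed.
 * `*err_out` receives the errno of the failing call (0 if none failed).
 * Returns -1 if the limit could not be changed or memory ran out. */
static int
count_socketpairs_until_failure (rlim_t soft_limit, int max_pairs,
    int *err_out)
{
  struct rlimit old_lim, new_lim;
  int *fds;
  int n = 0, i;

  assert (max_pairs > 0 && err_out != NULL);
  *err_out = 0;

  if (getrlimit (RLIMIT_NOFILE, &old_lim) != 0)
    return -1;
  new_lim = old_lim;
  if (soft_limit < new_lim.rlim_cur)
    new_lim.rlim_cur = soft_limit;
  if (setrlimit (RLIMIT_NOFILE, &new_lim) != 0)
    return -1;

  fds = malloc (sizeof (int) * 2 * (size_t) max_pairs);
  if (fds == NULL) {
    setrlimit (RLIMIT_NOFILE, &old_lim);
    return -1;
  }

  /* One pair per "bus": each iteration costs two descriptors. */
  while (n < max_pairs) {
    if (socketpair (AF_UNIX, SOCK_STREAM, 0, &fds[2 * n]) != 0) {
      *err_out = errno;
      break;
    }
    n++;
  }

  /* Clean up and restore the original soft limit. */
  for (i = 0; i < 2 * n; i++)
    close (fds[i]);
  free (fds);
  setrlimit (RLIMIT_NOFILE, &old_lim);
  return n;
}
```

Calling `count_socketpairs_until_failure (64, 2000, &err)` should stop well short of 2000 pairs, because every pair consumes two descriptors and the process already holds a few for stdin, stdout and stderr.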
Comment 1 Tim-Philipp Müller 2011-04-03 15:17:24 UTC
Created attachment 185033
checks: add GstBin unit test that creates a lot of bins
Comment 2 Sebastian Dröge (slomo) 2011-04-05 13:02:00 UTC
That's caused by this change but you probably know that already ;)

commit 14d7db1b527b05f029819057aef5c123ac7e013d
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Thu Oct 28 13:27:43 2010 +0100

    bus: make the bus almost lockfree
    
    Use new GstPoll functionality to wakeup the mainloop.
    Use an atomic queue on the writer side to post the messages.
    The reader side it protected with the lock still because we don't want multi
    concurrent readers.
Comment 3 Sebastian Dröge (slomo) 2011-04-05 13:12:58 UTC
Before that commit the bus used condition variables for signalling/waiting; now it uses a socket pair. I guess fixing this requires Wim powers :) Condition variables can't be used without a mutex, and not using a mutex was the whole point of the commit.
Comment 4 Sebastian Dröge (slomo) 2011-04-05 14:31:21 UTC
Created attachment 185202
bus: Only create the signalling socket pair when required

Otherwise a new one would be created for every single bus and
the process could easily run out of file descriptors.

Fixes bug #646624.
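
As a rough sketch of the idea (not the attached patch itself), lazy creation could look like the following. `SketchBus` and the `sketch_bus_*` functions are hypothetical stand-ins, not the GstBus layout or API, and the sketch ignores thread safety, which a real bus would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a bus; not the real GstBus structure. */
typedef struct
{
  int wake_fds[2];              /* [0] read end, [1] write end; -1 = none */
} SketchBus;

static void
sketch_bus_init (SketchBus * bus)
{
  assert (bus != NULL);
  /* No descriptors are allocated at construction time. */
  bus->wake_fds[0] = bus->wake_fds[1] = -1;
}

/* Returns the descriptor a main loop could poll, creating the socket
 * pair on first use. Returns -1 if socketpair() fails. */
static int
sketch_bus_get_poll_fd (SketchBus * bus)
{
  if (bus->wake_fds[0] == -1) {
    if (socketpair (AF_UNIX, SOCK_STREAM, 0, bus->wake_fds) != 0) {
      bus->wake_fds[0] = bus->wake_fds[1] = -1;
      return -1;
    }
  }
  return bus->wake_fds[0];
}

static void
sketch_bus_clear (SketchBus * bus)
{
  if (bus->wake_fds[0] != -1) {
    close (bus->wake_fds[0]);
    close (bus->wake_fds[1]);
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
  }
}
```

With this shape, a bus that nobody ever polls costs no file descriptors at all; only buses whose poll fd is actually requested pay for a socket pair.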
Comment 5 Sebastian Dröge (slomo) 2011-04-06 10:06:33 UTC
commit bd1c40011434c1efaa696dc98ef855ef9cce9b28
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Wed Apr 6 12:03:18 2011 +0200

    bus: Check if pending messages are in the queue
    
    We can't rely completely on the poll fd because the fd might be
    created after messages were posted to the bus.

commit d7ff4ee6cb2493c6a669a5780ec6159cd351520d
Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
Date:   Sun Apr 3 16:11:50 2011 +0100

    checks: add GstBin unit test that creates a lot of bins
    
    Currently fails (in normal circumstances) because we create a
    socket pair for each bin's bus and exhaust the number of available
    file descriptors.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=646624

commit 4bf8f1524f6e3374b3f3bc57322337723d06b928
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Tue Apr 5 16:22:48 2011 +0200

    bus: Only create the signalling socket pair when required
    
    Otherwise a new one would be created for every single bus and
    the process could easily run out of file descriptors.

    
    Fixes bug #646624.
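
To illustrate why the pending-messages check in the first commit above is needed once the socket pair is created lazily, here is a minimal sketch. All names (`SketchBus`, `sketch_bus_post`, `sketch_bus_timed_pop`) are hypothetical, the queue is reduced to a counter, and it is single-threaded for brevity; it is not the committed GStreamer code.

```c
#include <assert.h>
#include <stddef.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical bus: `pending` stands in for a real message queue. */
typedef struct
{
  int wake_fds[2];              /* -1 until created on demand */
  int pending;
} SketchBus;

static void
sketch_bus_init (SketchBus * bus)
{
  assert (bus != NULL);
  bus->wake_fds[0] = bus->wake_fds[1] = -1;
  bus->pending = 0;
}

static void
sketch_bus_post (SketchBus * bus)
{
  bus->pending++;
  /* A wakeup byte can only be written if the pair already exists. */
  if (bus->wake_fds[1] != -1) {
    char c = 'W';
    send (bus->wake_fds[1], &c, 1, 0);
  }
}

/* Returns 1 if a message was popped, 0 on timeout or error. */
static int
sketch_bus_timed_pop (SketchBus * bus, int timeout_ms)
{
  char c;

  /* Check the queue first: messages posted before the fd existed
   * never produced a wakeup byte, so poll() alone would miss them. */
  if (bus->pending == 0) {
    struct pollfd pfd;

    if (bus->wake_fds[0] == -1 &&
        socketpair (AF_UNIX, SOCK_STREAM, 0, bus->wake_fds) != 0) {
      bus->wake_fds[0] = bus->wake_fds[1] = -1;
      return 0;
    }
    pfd.fd = bus->wake_fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll (&pfd, 1, timeout_ms) <= 0 || bus->pending == 0)
      return 0;
  }

  bus->pending--;
  /* Drain at most one wakeup byte, without blocking if there is none. */
  if (bus->wake_fds[0] != -1)
    recv (bus->wake_fds[0], &c, 1, MSG_DONTWAIT);
  return 1;
}

static void
sketch_bus_clear (SketchBus * bus)
{
  if (bus->wake_fds[0] != -1) {
    close (bus->wake_fds[0]);
    close (bus->wake_fds[1]);
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
  }
}
```

A message posted before any poll fd exists is still returned by `sketch_bus_timed_pop`, and no socket pair is created for it; the pair only appears once a reader actually has to wait.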
Comment 6 Tim-Philipp Müller 2011-04-06 10:46:13 UTC
Re-opening this, as it seems to cause multiple unit test failures.
Comment 7 Sebastian Dröge (slomo) 2011-04-06 12:20:37 UTC
commit 874d60e5899dd5b89854679d1a4ad016a58ba4e0
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Wed Apr 6 14:06:49 2011 +0200

    bus: Add private API to set a GstBus in child mode
    
    This is used by GstBin to create a child bus without
    a socketpair because child buses will always work
    synchronous. Otherwise too many sockets could be
    created and the limit of file descriptors for the
    process could be reached.
    
    Fixes bug #646624.
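
A simplified sketch of what "child mode" could mean in practice, assuming the child bus hands every message to a synchronous handler that forwards it to the parent. The type and function names are hypothetical and do not reflect the private API added by the commit; only the general idea (child buses never need a socket pair) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

typedef struct SketchBus SketchBus;
typedef void (*SketchSyncHandler) (SketchBus * bus, int message,
    void *user_data);

/* Hypothetical bus; messages are plain ints and the queue is a counter. */
struct SketchBus
{
  int child_mode;               /* 1 = never allocate a socket pair */
  int wake_fds[2];
  int pending;
  int last_message;
  SketchSyncHandler sync_handler;
  void *user_data;
};

static void
sketch_bus_init (SketchBus * bus, int child_mode)
{
  assert (bus != NULL);
  bus->child_mode = child_mode;
  bus->wake_fds[0] = bus->wake_fds[1] = -1;
  bus->pending = 0;
  bus->last_message = 0;
  bus->sync_handler = NULL;
  bus->user_data = NULL;
}

/* Child buses are never polled, so they never get descriptors. */
static int
sketch_bus_get_poll_fd (SketchBus * bus)
{
  if (bus->child_mode)
    return -1;
  if (bus->wake_fds[0] == -1 &&
      socketpair (AF_UNIX, SOCK_STREAM, 0, bus->wake_fds) != 0) {
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
    return -1;
  }
  return bus->wake_fds[0];
}

static void
sketch_bus_post (SketchBus * bus, int message)
{
  /* Synchronous path: handled in the posting thread, nothing queued. */
  if (bus->sync_handler != NULL) {
    bus->sync_handler (bus, message, bus->user_data);
    return;
  }
  bus->pending++;
  bus->last_message = message;
  if (bus->wake_fds[1] != -1) {
    char c = 'W';
    send (bus->wake_fds[1], &c, 1, 0);
  }
}

/* Example sync handler: pass the child's message on to the parent bus. */
static void
forward_to_parent (SketchBus * child, int message, void *user_data)
{
  (void) child;
  sketch_bus_post ((SketchBus *) user_data, message);
}

static void
sketch_bus_clear (SketchBus * bus)
{
  if (bus->wake_fds[0] != -1) {
    close (bus->wake_fds[0]);
    close (bus->wake_fds[1]);
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
  }
}
```

In this model only the top-level bus ever needs a poll fd, so the number of descriptors no longer grows with the number of nested bins.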
Comment 8 Edward Hervey 2011-04-08 16:53:17 UTC
On macosx:
Running suite(s): GstBin
93%: Checks: 15, Failures: 0, Errors: 1
gst/gstbin.c:1140:E:bin tests:test_many_bins:0: (after this point) Received signal 10 (Bus error)

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb0aaaffc
[Switching to process 41291 thread 0x5703]
0x00106b59 in gst_pad_push_event (pad=0x13da438, event=0x585328) at gstpad.c:5178
5178	{
(gdb) bt 20
  • #0 gst_pad_push_event
    at gstpad.c line 5178
  • #1 gst_pad_send_event
    at gstpad.c line 5396
  • #2 gst_pad_push_event
    at gstpad.c line 5248
  • #3 gst_pad_send_event
    at gstpad.c line 5396
  • #4 gst_pad_push_event
    at gstpad.c line 5248
  • #5 gst_base_transform_src_eventfunc
    at gstbasetransform.c line 2131
  • #6 gst_base_transform_src_event
    at gstbasetransform.c line 2096
  • #7 gst_pad_send_event
    at gstpad.c line 5396
  • #8 gst_pad_push_event
    at gstpad.c line 5248
  • #9 gst_pad_send_event
    at gstpad.c line 5396
  • #10 gst_pad_push_event
    at gstpad.c line 5248
  • #11 gst_pad_send_event
    at gstpad.c line 5396
  • #12 gst_pad_push_event
    at gstpad.c line 5248
  • #13 gst_base_transform_src_eventfunc
    at gstbasetransform.c line 2131
  • #14 gst_base_transform_src_event
    at gstbasetransform.c line 2096
  • #15 gst_pad_send_event
    at gstpad.c line 5396
  • #16 gst_pad_push_event
    at gstpad.c line 5248
  • #17 gst_pad_send_event
    at gstpad.c line 5396
  • #18 gst_pad_push_event
    at gstpad.c line 5248
  • #19 gst_pad_send_event
    at gstpad.c line 5396
  • #6197 gst_pad_push_event
    at gstpad.c line 5248
  • #6198 gst_pad_send_event
    at gstpad.c line 5396
  • #6199 gst_pad_push_event
    at gstpad.c line 5248
  • #6200 gst_pad_send_event
    at gstpad.c line 5396
  • #6201 gst_pad_push_event
    at gstpad.c line 5248
  • #6202 gst_base_transform_src_eventfunc
    at gstbasetransform.c line 2131
  • #6203 gst_base_transform_src_event
    at gstbasetransform.c line 2096
  • #6204 gst_pad_send_event
    at gstpad.c line 5396
  • #6205 gst_pad_push_event
    at gstpad.c line 5248
  • #6206 gst_pad_send_event
    at gstpad.c line 5396
  • #6207 gst_pad_push_event
    at gstpad.c line 5248
  • #6208 gst_base_sink_send_event
    at gstbasesink.c line 4508
  • #6209 gst_element_send_event
    at gstelement.c line 1633
  • #6210 gst_bin_send_event
    at gstbin.c line 2673
  • #6211 gst_element_send_event
    at gstelement.c line 1633
  • #6212 gst_bin_do_latency_func
    at gstbin.c line 2373
  • #6213 gst_marshal_BOOLEAN__VOID
    at gstmarshal.c line 548
  • #6214 g_closure_invoke
  • #6215 signal_emit_unlocked_R
  • #6216 g_signal_emit_valist
  • #6217 g_signal_emit
  • #6218 gst_bin_recalculate_latency
    at gstbin.c line 2332
  • #6219 gst_bin_change_state_func
    at gstbin.c line 2423
  • #6220 gst_pipeline_change_state
    at gstpipeline.c line 482
  • #6221 gst_element_change_state
    at gstelement.c line 2717
  • #6222 gst_bin_continue_func
    at gstbin.c line 2734
  • #6223 g_thread_pool_thread_proxy
  • #6224 g_thread_create_proxy
  • #6225 _pthread_start
  • #6226 thread_start

Comment 9 Sebastian Dröge (slomo) 2011-04-08 17:04:38 UTC
Does this test succeed on OS X with the old core (0.10.32)? Does running it under valgrind give any more useful information?
Comment 10 Edward Hervey 2011-04-08 17:10:54 UTC
The GST_DEBUG=5 log file is here: http://people.collabora.co.uk/~edward/gstbincheckfail-20110408.bz2
Comment 11 Edward Hervey 2011-04-08 17:17:23 UTC
valgrind doesn't show anything helpful. And yes it worked in 0.10.32 (core only started failing lately on macosx).
Comment 12 Sebastian Dröge (slomo) 2011-04-08 17:33:07 UTC
(In reply to comment #11)
> valgrind doesn't show anything helpful. And yes it worked in 0.10.32 (core only
> started failing lately on macosx).

I'm asking because this test was added 3 days ago.
Comment 13 Tim-Philipp Müller 2011-04-09 22:34:14 UTC
__tim   : bilboed, did you try 0.10.32 + bin test_many_bins unit test on osx?
bilboed : fails with 0.10.32
__tim   : ok, so it's probably just the silly unit test; could you
          try reducing the number of bins it creates?
          #define NUM_BINS 2000 => 1000/500 or so?
bilboed : 1000 fails
bilboed : 500 passes
__tim   : hrm, wonder why
__tim   : ok, thanks for testing, doesn't sound like a regression
Comment 14 Tim-Philipp Müller 2011-04-11 11:07:42 UTC
According to the log it just times out waiting for the preroll, so let's hope this fixes it:

 commit 6ca7284a5498259bebac10b5163859eb86aa4b08
 Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
 Date:   Mon Apr 11 12:04:34 2011 +0100

    tests: allow more time for the test_many_bins pipeline to preroll
    
    Hopefully makes this test work on the OSX build bot and other
    not-so-powerful machines.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=646624