GNOME Bugzilla – Bug 646624
GstBin: regression: creating too many bins fails, exhausting allowed file descriptor limit
Last modified: 2011-04-11 11:07:42 UTC
Each bin has a bus, and in git we now create a GstPoll/socketpair for every bus, even if it's not a top-level bin. This can easily exhaust the number of available file descriptors.
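To make the failure mode concrete, here is a minimal sketch in plain POSIX C, not GStreamer code. It creates socket pairs until the per-process descriptor limit is hit. The helper names `set_soft_fd_limit` and `count_socket_pairs` are made up for illustration, and the sketch assumes that each bus costs exactly one socket pair (two descriptors).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: lower the soft RLIMIT_NOFILE so the limit is
 * reached quickly. Returns 0 on success, -1 on failure. */
static int
set_soft_fd_limit (rlim_t limit)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_NOFILE, &rl) != 0)
    return -1;
  if (rl.rlim_max != RLIM_INFINITY && limit > rl.rlim_max)
    limit = rl.rlim_max;
  rl.rlim_cur = limit;
  return setrlimit (RLIMIT_NOFILE, &rl);
}

/* Hypothetical helper: create up to max_pairs socket pairs, standing in
 * for "one pair per bus". Stops at the first failure and stores that
 * errno in *err_out (0 if none). Returns the number of pairs created;
 * every descriptor is closed again before returning. */
static int
count_socket_pairs (int max_pairs, int *err_out)
{
  int *fds, n = 0, i;

  assert (max_pairs > 0 && err_out != NULL);
  fds = malloc (sizeof (int) * 2 * (size_t) max_pairs);
  assert (fds != NULL);

  *err_out = 0;
  while (n < max_pairs) {
    if (socketpair (AF_UNIX, SOCK_STREAM, 0, &fds[2 * n]) != 0) {
      *err_out = errno;
      break;
    }
    n++;
  }

  for (i = 0; i < 2 * n; i++)
    close (fds[i]);
  free (fds);
  return n;
}
```

With the soft limit lowered to 64, `count_socket_pairs (2000, &err)` should stop after roughly 30 pairs with `err` set to `EMFILE`, since stdin, stdout and stderr already occupy descriptors.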
Created attachment 185033
checks: add GstBin unit test that creates a lot of bins
That's caused by this change but you probably know that already ;)

commit 14d7db1b527b05f029819057aef5c123ac7e013d
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Thu Oct 28 13:27:43 2010 +0100

    bus: make the bus almost lockfree

    Use new GstPoll functionality to wakeup the mainloop. Use an atomic queue on the writer side to post the messages. The reader side it protected with the lock still because we don't want multi concurrent readers.
Before that commit the bus used condition variables for signalling/waiting; now it uses a socket pair. I guess fixing this requires Wim powers :) Going back to condition variables is not possible without a mutex, and avoiding the mutex was the whole point of the commit.
Created attachment 185202
bus: Only create the signalling socket pair when required

Otherwise a new one would be created for every single bus and the process could easily run out of file descriptors. Fixes bug #646624.
commit bd1c40011434c1efaa696dc98ef855ef9cce9b28
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Wed Apr 6 12:03:18 2011 +0200

    bus: Check if pending messages are in the queue

    We can't rely completely on the poll fd because the fd might be created after messages were posted to the bus.

commit d7ff4ee6cb2493c6a669a5780ec6159cd351520d
Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
Date:   Sun Apr 3 16:11:50 2011 +0100

    checks: add GstBin unit test that creates a lot of bins

    Currently fails (in normal circumstances) because we create a socket pair for each bin's bus and exhaust the number of available file descriptors.

    https://bugzilla.gnome.org/show_bug.cgi?id=646624

commit 4bf8f1524f6e3374b3f3bc57322337723d06b928
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Tue Apr 5 16:22:48 2011 +0200

    bus: Only create the signalling socket pair when required

    Otherwise a new one would be created for every single bus and the process could easily run out of file descriptors. Fixes bug #646624.
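The idea behind the two bus commits above can be sketched roughly as follows. This is a single-threaded illustration in plain POSIX C, not the actual GstBus code. `LazyBus` and its functions are hypothetical names, and a simple counter stands in for the atomic message queue.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a bus; not GStreamer's GstBus layout. */
typedef struct {
  int wake_fds[2];   /* {-1, -1} until somebody asks for a pollable fd */
  int pending;       /* messages posted but not yet popped */
} LazyBus;

static void
lazy_bus_init (LazyBus *bus)
{
  assert (bus != NULL);
  bus->wake_fds[0] = bus->wake_fds[1] = -1;
  bus->pending = 0;
}

/* Posting always records the message; the socket is only signalled
 * if it already exists. */
static void
lazy_bus_post (LazyBus *bus)
{
  bus->pending++;
  if (bus->wake_fds[1] >= 0) {
    char c = 'x';
    ssize_t written = write (bus->wake_fds[1], &c, 1);
    (void) written;   /* best-effort wakeup in this sketch */
  }
}

/* Create the socket pair on first use. Returns the read end, or -1. */
static int
lazy_bus_get_fd (LazyBus *bus)
{
  if (bus->wake_fds[0] < 0 &&
      socketpair (AF_UNIX, SOCK_STREAM, 0, bus->wake_fds) != 0) {
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
    return -1;
  }
  return bus->wake_fds[0];
}

/* Readers check the queue state directly: messages posted before the
 * socket pair existed never produced a wakeup byte. */
static int
lazy_bus_have_pending (const LazyBus *bus)
{
  return bus->pending > 0;
}

static void
lazy_bus_release (LazyBus *bus)
{
  if (bus->wake_fds[0] >= 0) {
    close (bus->wake_fds[0]);
    close (bus->wake_fds[1]);
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
  }
}
```

A message posted before `lazy_bus_get_fd ()` is called leaves nothing to read on the socket, which is why the sketch's `lazy_bus_have_pending ()` looks at the counter instead of relying on the fd alone.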
Re-opening this, as it seems to cause multiple unit test failures.
commit 874d60e5899dd5b89854679d1a4ad016a58ba4e0
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Wed Apr 6 14:06:49 2011 +0200

    bus: Add private API to set a GstBus in child mode

    This is used by GstBin to create a child bus without a socketpair because child buses will always work synchronous. Otherwise too many sockets could be created and the limit of file descriptors for the process could be reached.

    Fixes bug #646624.
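As a rough illustration of what child mode buys, the sketch below builds one top-level bus plus many child buses and counts the descriptors held. It is plain POSIX C with hypothetical names (`SketchBus`, `sketch_bus_init`, `count_fds_for_bins`). The private GstBus API added by the commit is not shown in this report, so nothing here should be read as its real signature.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a bus; not the real GstBus structure. */
typedef struct {
  int child_mode;    /* non-zero: messages are handled synchronously */
  int wake_fds[2];   /* {-1, -1} when no socket pair is held */
} SketchBus;

/* Returns the number of descriptors the bus holds (0 or 2), or -1 if
 * the socket pair could not be created. */
static int
sketch_bus_init (SketchBus *bus, int child_mode)
{
  assert (bus != NULL);
  bus->child_mode = child_mode;
  bus->wake_fds[0] = bus->wake_fds[1] = -1;

  if (child_mode)
    return 0;           /* child buses never need a wakeup socket */

  if (socketpair (AF_UNIX, SOCK_STREAM, 0, bus->wake_fds) != 0) {
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
    return -1;
  }
  return 2;
}

static void
sketch_bus_clear (SketchBus *bus)
{
  if (bus->wake_fds[0] >= 0) {
    close (bus->wake_fds[0]);
    close (bus->wake_fds[1]);
    bus->wake_fds[0] = bus->wake_fds[1] = -1;
  }
}

/* One top-level bus plus num_children child buses: returns the total
 * number of descriptors they hold, or -1 on error. */
static int
count_fds_for_bins (int num_children)
{
  SketchBus top, child;
  int total, i;

  total = sketch_bus_init (&top, 0);
  if (total < 0)
    return -1;

  for (i = 0; i < num_children; i++) {
    total += sketch_bus_init (&child, 1);
    sketch_bus_clear (&child);
  }

  sketch_bus_clear (&top);
  return total;
}
```

Under these assumptions `count_fds_for_bins (2000)` holds two descriptors in total, no matter how many child bins are created.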
On macosx:

Running suite(s): GstBin
93%: Checks: 15, Failures: 0, Errors: 1
gst/gstbin.c:1140:E:bin tests:test_many_bins:0: (after this point) Received signal 10 (Bus error)

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb0aaaffc
[Switching to process 41291 thread 0x5703]
0x00106b59 in gst_pad_push_event (pad=0x13da438, event=0x585328) at gstpad.c:5178
5178 {
(gdb) bt 20
+ Trace 226655
Does this test succeed on OS X with old core (0.10.32)? Does running it in valgrind lead to any more useful information?
The GST_DEBUG=5 log file is located here: http://people.collabora.co.uk/~edward/gstbincheckfail-20110408.bz2
valgrind doesn't show anything helpful. And yes it worked in 0.10.32 (core only started failing lately on macosx).
(In reply to comment #11)
> valgrind doesn't show anything helpful. And yes it worked in 0.10.32 (core only
> started failing lately on macosx).

I'm asking because this test was added 3 days ago.
__tim : bilboed, did you try 0.10.32 + bin test_many_bins unit test on osx?
bilboed : fails with 0.10.32
__tim : ok, so it's probably just the silly unit test; could you try reducing the number of bins it creates? #define NUM_BINS 2000 => 1000/500 or so?
bilboed : 1000 fails
bilboed : 500 passes
__tim : hrm, wonder why
__tim : ok, thanks for testing, doesn't sound like a regression
According to the log it just times out waiting for the preroll, so let's hope this fixes it:

commit 6ca7284a5498259bebac10b5163859eb86aa4b08
Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
Date:   Mon Apr 11 12:04:34 2011 +0100

    tests: allow more time for the test_many_bins pipeline to preroll

    Hopefully makes this test work on the OSX build bot and other not-so-powerful machines.

    https://bugzilla.gnome.org/show_bug.cgi?id=646624