GNOME Bugzilla – Bug 338827
[patch] gnomevfssrc should use async api to not block on network problems
Last modified: 2009-07-27 21:18:30 UTC
Currently gnomevfssrc blocks when network problems occur (like getting out of WLAN area). The element should use the async api of gnomevfs together with a timeout mechanism.
Created attachment 63743 [details] [review] first (no yet working) attempt for async api still works for local files, but when using uris (http:// or file://) the reads fail
Created attachment 63905 [details] [review] 2nd try to make the change (works now) can we get rid of the usleep(100)? any idea?
Comment on attachment 63905 [details] [review] 2nd try to make the change (works now) wrong patch
Created attachment 64258 [details] [review] 3rd try (works find from gst-launch) When this patch is applied gnomevfssrc becomes very fragile in respect to a running mainloop. Whenever the mainloop is not running operation can quickly timeout. This especially is a problem when setting the pipeline to NULL. Right now one needs to kick then mainloop manually (g_main_context_iteration(NULL,FALSE);). This sounds very wrong.
This bug seems to cause problems in rhythmbox (see backtrace in #313049 comment #76). It should also cause problems for any other app using gnomevfssrc. As the bug had no activity for almost six months and the provided solution seems to be unusable, I'll restate the problem a bit. The culprit is the start () GstBaseSrc vfunc implementation of gnomevfssrc. According to the documentation "subclasses should open resources and prepare to produce data". This effectively gets called by gst_element_set_state (), which must not block. For a network source, "preparing to produce data" usually means to establish a network connection, which is obviously not guaranteed to be an instantaneous operation. Therefore, this breaks the semantics of gst_element_set_state (), causing terrible problems for applications. A very simple method to reproduce the problem is using the netcat program like this: "nc -l -p 10000 localhost". This simulates a TCP server that eats all data sent to it but never sends anything back. "gst-launch-0.10 gnomevfssrc location=http://localhost:10000 ! fakesink" will then hang while setting the state to PAUSED (literally, that is to say in gst_element_set_state ()).
(In reply to comment #5) > (see backtrace in #313049 comment #76). Generate correct autolink: bug #313049 comment #76
Changing from "enhancement" to "normal" severity because it is a bug (it breaks the state_change() semantics).
Created attachment 74506 [details] [review] Open non-local URIs in streaming thread This implements basically what Wim stated in bug #319873 comment #4. The approach is inherently a little imperfect as the time between start() and the first create() isn't used for establishing the connection in the background, but it's better than the current behaviour. It is assumed that local URIs are opened instantly, so I made start() to act as before in this case, which is needed so that get_size() and is_seekable() work (which also seem to be called from the main thread).
Of course my quickfix just moves the problem elsewhere, as the streaming thread is blocked instead. It would make sense if the operation were cancellable. neonhttpsrc (currently in gst-plugins-bad) uses the neon http library directly and seems to suffer from the same problem. The DAAP source element inside Rhythmbox also blocks in start() (bug #362226). This indicates that the documentation for GstBaseSrc needs to be extended: At least start, get_size and is_seekable vfuncs are called by gst_element_set_state and must never block. As far as I can see, this isn't indicated in the current documentation at all.
Normally state changes should never block. I have no real idea how to handle blocking state changes yet...
Confirming that this is indeed a bug.
Created attachment 75180 [details] [review] do state transiton in alternate thread I recently had a idea for a completely different way of handling this. Rather than using gnomevfs's asynchronous API, which doesn't help when a vfunc must have a result before returning, we could push the entire state transition into another thread, which can block. This patch makes gnomevfssrc intercept READY->PAUSED (and the reverse) by creating a new thread which calls the parent state transition function, and retuning ASYNC. I'm not sure whether this method would have other issues, and it definitely need testing, but it might work.
returning ASYNC from a source plugin should theoretically work. The idea would indeed be to start the open() in a different thread (which must also be cancelable or you might end up with a dozen threads hanging around). The thing is that basesrc has no idea that starting the task to perform the read() should be delayed until the state change completes. An alternative way would something like this: start { /* check URI, do all the stuff that does not block. BaseSrc will start a */ /* task. */ ... /* assume all this to be TRUE or based on the uri protocol. */ if (file://,...) { seekable = TRUE; random_access = TRUE; } else { seekable = FALSE; random_access = FALSE; } return TRUE; } stop { /* try to cancel all actions that can block the streaming thread */ cancel_open(); cancel_read(); return TRUE; } change_state { .. res = parent_change_state() /* (de)activates the pad and starts the task */ READY_TO_PAUSED: res = ASYNC; /* overwrite result to be ASYNC */ break; PAUSED_TO_READY: opened = FALSE; break; return res; } create { if (!opened) { open(); /* blocks but is cancelable */ opened = true; commit_state() /* commit state from READY to PAUSED */ } read(); } In fact, you might just want to skip the ASYNC stuff entirely, as long as you don't produce data, the sinks will return ASYNC anyway. It's still awkward because after opening the device, basesrc will need some info (seekable, random access, ...) which you don't know yet. Assuming values based on the uri protocol might work just fine. Also, when operating in random access mode, the sequence is a little different because the start/stop methods will be called when downstream activates itself (so before the change_state). Above code will handle that.
Hmm. I'm repeating myself... If patch in Comment #8 is made cancelable and seekable/random_access only enabled for local uris (opened in start()), this can go in. As further enhancements a simple method could be added to basesrc, called after open(), that (re)checks seekability. If we really care to make it ASYNC, it could do the commit stuff as well.
*** Bug 319873 has been marked as a duplicate of this bug. ***
*** Bug 370769 has been marked as a duplicate of this bug. ***
rene, for your patch from #8, I fail to see when the code to make get_size() and is_seekable() work gets executed for non_local streams. We currently have the problem, that for upnpav servers the get_info stuff can block for 30 seconds :( so that needs to get moved out of _start too.
Looks to me like gnome-vfs (in its current form) is unsuitable for use by gstreamer at all. To recap the whole picture as it occurs to me: gnomevfssrc as-is uses gnome_vfs_open_uri in ->start, which blocks and therefore possibly makes gst_element_set_state hang (very bad). I provided a small patch to move this blocking call into the streaming thread, so it will hang there instead (slightly less bad). James provided a better patch that does the blocking calls in an extra thread, which should free applications from hanging because they call innocent gstreamer functions (even less bad but still bad as the threads are leaked until the operation times out). These are workarounds to ensure people don't lose data because their application locks up. This would be cleanly solvable if the blocking calls of gnome-vfs were cancellable. It looks like that they are not. There is something about cancellation in the API, but only for use by methods (backends). So no matter what we come up with, the best possible solution involves leaking threads that hang for a timeout of several minutes. Therefore, we are left with the async API of gnome-vfs as the only opportunity to get a clean solution. However, the patch Stefan made to convert gnomevfssrc over to the async calls does not work correctly. The problem manifests itself in these lines of code: /* wait for callback to be called */ while (src->async_op_pending) { if (g_main_context_pending (NULL)) { g_main_context_iteration (NULL, FALSE); Iterating a main context is only allowed from a single thread! That's why programs with a main loop become unstable with the patch. gst-launch works because it just polls the bus. So no matter what, these lines have to go. If the thread just sleeps instead, everything should be fine as long as the application iterates its default GMainContext. Doing so is not a requirement for using gstreamer at all - and that's the fundamental difference between gstreamer and gnome-vfs. Applying the patch (fixed in the way as described) would therefore mean that any application that uses gnomevfssrc _must_ run a GMainLoop for the default GMainContext (or iterate the context in regular intervals in another way). I doubt making this a requirement is acceptable. Looking for a solution in gnome-vfs might be the best idea. Its assumption that the application iterates the default main context seems silly from the gstreamer point of view. Then again, it's "gnome-vfs" and not "gvfs".
Regarding the seeking in Comment #14 please also think about Bug #54590. Seekabillity not just depends on wheter the file is local or not.
Starting from the 3rd patches of stefan, with a bit of rewritting, I changed the async_wait() function. Does it solve the bug? - no more main GMainContext - no more main loop. static gboolean gst_gnome_vfs_async_wait(GstGnomeVFSSrc * src) { gboolean res; GTimeVal timeval; g_get_current_time (&timeval); g_time_val_add (&timeval, src->timeout * 1000000); /* FIXME: the mutex is not useful (btw, it should NOT be async_op_mutex, I think) */ if (g_cond_timed_wait (src->async_op_cond, src->async_op_mutex, &timeval) == FALSE) { gnome_vfs_async_cancel (src->handle); res = FALSE; GST_INFO_OBJECT (src, "Timeout occured during gnomevfs async call with %f", src->timeout); } else { src->async_op_pending = FALSE; res = TRUE; } return res; }
Created attachment 77305 [details] [review] 4th attempt (using GCond instead of GMainLoop and sleep...) For COMMENTS, not completely tested.
Marc-Andre: No, it cannot. The problem is in gnome-vfs, which for example uses the convenience of g_idle_add to marshal some callbacks to the so-called main thread. If the application does not iterate the default main context at all, these callbacks are never called.
Is it possible that with the GCond wait, the main thread can be stuck? And so, the 4th attempt does not help at all?
Hi, any updates on this bug? Nothing for six months now. (I became interested after viewing bug 362226.)
I belive this will be reattempt with gvfs - the next generation gnome vfs that currently get developed.
This is indeed a pretty bug, blocking UI for any GStreamer using application that uses gnomevfssrc. Which is basically all of them (Totem, Elisa, ..) Upping priority.
*** Bug 437652 has been marked as a duplicate of this bug. ***
Would be nice if people could test the gio plugin from gst-plugins-bad to make sure this issue doesn't happen again with gio/gvfs. Unfortunately the only existing network protocol support in gvfs is currently samba...
For people like myself who need a quick fix, I've patched libneon to expose the socket fd, and implemented interruption of the neonhttpsrc's create() method by implementing unlock/stop_unlock. Patches for libneon and gst-plugins-bad on ftp://ftp.mind.be/GStreamer/
Created attachment 99136 [details] [review] Fully async HTTP source based on libsoup-2.2 This adds a fully async HTTP source based on libsoup. start() returns immediately. URL parsing, DNS lookup, connection, HTTP negotiation and reception of data is completely asynchronous (with use of a new GMainLoop) and handled in create(). This is a bare-bones implementation and lacks many of the features on neonhttpsrc. HTTPS should work too, but was not tested.
Created attachment 99137 [details] [review] Fully async HTTP source based on libsoup-2.2 Forgot Makefile.am.
Bug 497020 to handle souphttpsrc.
Just a note: both GnomeVFS and libsoup-2.2 are going to be deprecated. GnomeVFS will be replaced by GIO - bug #510393 libsoup-2.2 --> libsoup-2.4 with a totally new API see http://mail.gnome.org/archives/desktop-devel-list/2008-January/msg00098.html
Already working on libsoup-2.4 support...
libsoup-2.4 support implemented, see bug 510708.
Wouter, Stefan : *IF* this bug is still valid, there's a better alternative to fixing this issue, by not doing the connection in NULL=>READY (GstBaseSrc.start()) but in READY=>PAUSED (.render()). This should be done for all sources for which we can't guarantee an immediate response (anything 'potentially' involving network).
> Wouter, Stefan : *IF* this bug is still valid, there's a better alternative to > fixing this issue, by not doing the connection in NULL=>READY > (GstBaseSrc.start()) but in READY=>PAUSED (.render()). > > This should be done for all sources for which we can't guarantee an immediate > response (anything 'potentially' involving network). This only hides the problem, it doesn't fix it at all. You might still get blocking when shutting down if you do this (e.g. when user starts playing something, then hits the next button). Having said that, I think we should just declare gnomevfs* stuff as WONTFIX and deprecate the gnomevfs elements and remove the experimental status from the gio elements.
Agree with tim, but imho gio still has one (minor) regression - handling the authentication. rene should know more.
(In reply to comment #38) > Agree with tim, but imho gio still has one (minor) regression - handling the > authentication. rene should know more. Also GIO needs to support non-default main contexts for it's async operations before we can use them. Until then it's only as good as gnomevfs in this regard (you can cancel sync operations from other threads). The authentication problem is, that we a) can't do authentication inside the plugin because it also runs on the default main context and b) we need to tell the application about the authentication issue, the application then has to show a dialog, mount the filesystem and then tell the plugin that everything is good now.
(In reply to comment #39) > Also GIO needs to support non-default main contexts for it's async operations > before we can use them. Until then it's only as good as gnomevfs in this regard > (you can cancel sync operations from other threads). > > The authentication problem is, that we a) can't do authentication inside the > plugin because it also runs on the default main context and b) we need to tell > the application about the authentication issue, the application then has to > show a dialog, mount the filesystem and then tell the plugin that everything is > good now. > And that's the problem -- you can't interact with the user to get the authentication during a state change. That is why it's OK that Alex closed my feature request for mainloop-less async mount operation as WONTFIX. It's either sync or you error out anyways: The app needs to gio-mount any location itself before setting the state. This is only a regression if you consider the old gnomevfs elements' solution as working, which I don't. It was just a giant hack IMO... It should still be possible to make the gio elements attempt to mount theirselves, but it's impossible to provide credentials so this would only fix password-less locations (good enough for gst-launch and lazy apps though).
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for. Thanks!