Bug 577433 - proper monitoring for long-running metadata operations
Status: RESOLVED OBSOLETE
Product: rhythmbox
Classification: Other
Component: general
Version: 3.0.x
OS: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: RhythmBox Maintainers
QA Contact: RhythmBox Maintainers
Depends on:
Blocks:
Reported: 2009-03-31 10:14 UTC by Per Ångström
Modified: 2018-05-24 14:16 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
A simple patch that will make dbus timeouts ten times longer. (1.19 KB, patch)
2009-04-18 18:20 UTC, Per Ångström
rejected Details | Review
0001-Make-sure-idle-timeout-doesn-t-happen-immediately.patch (2.83 KB, patch)
2009-04-22 08:42 UTC, Per Ångström
needs-work Details | Review
0002-Double-the-general-metadata-operation-timeout-and-ma.patch (2.10 KB, patch)
2009-04-22 08:47 UTC, Per Ångström
reviewed Details | Review
0004-Implement-a-watchdog-thread-for-monitoring-metadata.patch (7.37 KB, patch)
2009-04-23 16:53 UTC, Per Ångström
needs-work Details | Review
Refinements-to-watchdog-logic.patch (3.54 KB, patch)
2009-04-28 19:07 UTC, Per Ångström
none Details | Review
0001-No-timeout-for-metadata-save.patch (1.11 KB, patch)
2009-05-02 17:46 UTC, Per Ångström
none Details | Review

Description Per Ångström 2009-03-31 10:14:08 UTC
Referring to https://bugs.launchpad.net/ubuntu/+source/rhythmbox/+bug/123460 "Rhythmbox:Error while saving song information:Internal GStreamer problem"

rb-metadata-dbus-service.c imposes an arbitrary timeout of 15 seconds on metadata operations. This causes lengthy metadata operations to fail, which can happen if the file is large or the file is on a slow medium.

Additionally, Rhythmbox will not clean up after a failed metadata operation, leaving behind large hidden files in the user's file system.
Comment 1 Per Ångström 2009-04-18 17:45:41 UTC
It should be noted that these problems will probably only occur on very big audio files; the files I'm having most problems with are FLAC files, 25 to 40 minutes in length.

Anyway, I have found another arbitrary timeout of 15 seconds which also needs to be changed to a much higher value or even removed:

./metadata/rb-metadata-dbus.h:#define RB_METADATA_DBUS_TIMEOUT	(15000)

This is in addition to the one I previously found:
./metadata/rb-metadata-dbus-service.c:#define ATTENTION_SPAN		30

With "#define RB_METADATA_DBUS_TIMEOUT (15000 * 10)" and "#define ATTENTION_SPAN		(30 * 10)" I can verify that I don't get these problems anymore. On the other hand, maybe these timeouts are too crude anyhow and should be replaced with another mechanism. But for the time being, I would be happy with longer timeouts.
Comment 2 Per Ångström 2009-04-18 18:20:26 UTC
Created attachment 132874 [details] [review]
A simple patch that will make dbus timeouts ten times longer.

The patch fixes bug 577433, "metadata-dbus operations arbitrarily cut off after 15 seconds", by arbitrarily making the arbitrary timeouts ten times longer. I would rather have removed the timeouts completely, but I'm not sufficiently familiar with the code to do that.
Comment 3 Jonathan Matthew 2009-04-19 14:10:45 UTC
The current timeouts are a pretty weak approximation of how the metadata helper process could usefully be monitored.  The solution is to fix the monitoring mechanism, not to increase the timeouts until they're long enough to re-tag the largest imaginable file on the slowest possible storage medium.

Anyway, it seems like the attention span timeout could be fixed by adding a periodic check in the tag writing process that updates the last active time as long as some progress has been made.

The client side timeout would probably work better if the dbus call returned immediately, then the service emitted periodic dbus signals indicating that the process was still running and making some progress.  The final status could be returned using another dbus signal.  This would only need to happen for tag write operations, because if you can't read the tags for a file within 15s, you can't play it either.
Comment 4 Per Ångström 2009-04-19 16:18:41 UTC
I agree that the timeouts are not ideal, but what are you going to do about it in the meantime, until you find the optimal solution? This bug is causing trouble for users.

Also, I think it is unfair of Rhythmbox to falsely blame GStreamer in the error message, when the error lies with Rhythmbox.
Comment 5 Jonathan Matthew 2009-04-20 03:41:51 UTC
I'm not going to do anything about it.  This code has been in place for about 3 years now and it only causes problems in what is best described as a corner case.  I don't even have the capacity to test that a fix for this problem actually works.

If you're willing to work on fixing this properly, I'd be happy to help, though.
Comment 6 Per Ångström 2009-04-20 10:24:11 UTC
I beg you to please reconsider changing the timeouts until a longer-term solution is found. It looks like such a solution is quite a lot of work for a small gain, so it'll probably never be done. Instead you can apply the patch, possibly with some other timeout settings than the ones I chose, with very little work on your side and be content that you have removed a very annoying problem for those that happen to stumble over those "corner cases".

If you are going to keep the short timeouts, at least you ought to make an effort to clean up the large and hidden temporary files that are left behind when a metadata-update operation is aborted prematurely. You should also know that the current error message is very unhelpful when several operations are run in parallel, since the error message doesn't say which operation "failed".

For those who have run into this bug, you can issue something like the following commands to find out what files you have to remove:
$ find my_music_directory -name '.*flac?*'

You can then issue the following command to remove them:
$ find my_music_directory -name '.*flac?*' -print0 | xargs -0 rm
Comment 7 Jonathan Matthew 2009-04-21 02:15:59 UTC
Fixing this properly is not that much work, and by all evidence you think that it's quite important.

Here is an outline of an acceptable correct solution:
- add a 'progress' signal to the RBMetadata class
- in rb_metadata_event_loop, run a position query every few seconds if block == TRUE, emitting the progress signal if the position has changed
- attach a handler to the progress signal in rb-metadata-dbus-service that updates the last_active time
- use a longer dbus client timeout for metadata save operations only
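
A rough sketch of what the position check in the second step could look like (the function name and wiring are illustrative, not actual Rhythmbox code; GStreamer 0.10 signatures):

#include <gst/gst.h>

/* Run this every few seconds while the event loop is blocking; returns TRUE
 * when the pipeline position has advanced, in which case the caller would
 * emit the hypothetical 'progress' signal. */
static gboolean
metadata_check_progress (GstElement *pipeline, gint64 *last_pos)
{
	GstFormat format = GST_FORMAT_TIME;
	gint64 pos = 0;

	/* a failed query is treated as "no progress" */
	if (!gst_element_query_position (pipeline, &format, &pos))
		return FALSE;

	if (pos == *last_pos)
		return FALSE;

	*last_pos = pos;
	return TRUE;
}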

Patches to fix the error messages that you find very unhelpful or that you consider to assign blame unfairly are welcome too.

Cleaning up temporary files is actually a completely separate issue, so I opened bug 579679 for it.
Comment 8 Per Ångström 2009-04-21 10:59:29 UTC
This is getting somewhere! Although I might be able to address some of the steps on your list, I'm still a bit reluctant to go all the way, until you have convinced me of the need to monitor the progress so closely. 

Are you having problems with metadata operations just getting stuck without making progress? If that is the case, I'd suggest going after the real problem instead.

For now, I think "use a longer dbus client timeout for metadata save operations only" is what should be targeted first, since it makes good sense to differentiate between different types of operations, especially with respect to the data-loss aspect.

As for the error messages, I haven't investigated the current mechanism, but I can imagine that passing context information from the point of detection to the point of reporting can involve some added complexity.
Comment 9 Jonathan Matthew 2009-04-21 11:18:22 UTC
It's not so much that it needs to be monitored but that it *can* be monitored.  Bugs anywhere from the OS up through file systems and protocol implementations up through codec and tagging libraries up through GStreamer up to the metadata code can cause the metadata reading or writing process to deadlock or otherwise get stuck.  It's an isolated process with simple inputs and outputs, which makes it practical to monitor it.

The "real problem" is "software has bugs".  This is one specific case where we can isolate and recover from unknown bugs.
Comment 10 Per Ångström 2009-04-22 08:42:49 UTC
Created attachment 133096 [details] [review]
0001-Make-sure-idle-timeout-doesn-t-happen-immediately.patch

I found that ATTENTION_SPAN actually isn't involved in aborting ongoing operations, it just kills the process when it has been idle for a while. However, due to the way the time accounting was done, it would needlessly kill the process after a long operation even though a new message was waiting to be processed. This patch fixes that, and it increases the verbosity in debug mode.
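
For reference, the idle check being discussed is roughly of this shape (names and fields are illustrative, not the exact rb-metadata-dbus-service internals):

#include <glib.h>
#include <time.h>

#define ATTENTION_SPAN 30	/* seconds, as in rb-metadata-dbus-service.c */

typedef struct {
	time_t     last_active;		/* updated whenever a message is handled */
	gboolean   handling_message;
	GMainLoop *loop;
} ServiceState;

/* Periodic idle check, e.g. installed with g_timeout_add_seconds(): only shut
 * the process down when it has genuinely been idle for ATTENTION_SPAN seconds
 * and no message is currently being processed. */
static gboolean
idle_check_cb (gpointer user_data)
{
	ServiceState *svc = user_data;

	if (!svc->handling_message &&
	    time (NULL) - svc->last_active > ATTENTION_SPAN) {
		g_main_loop_quit (svc->loop);
		return FALSE;	/* remove the timeout source */
	}
	return TRUE;		/* keep checking */
}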
Comment 11 Per Ångström 2009-04-22 08:47:40 UTC
Created attachment 133097 [details] [review]
0002-Double-the-general-metadata-operation-timeout-and-ma.patch

This patch will make all timeouts a bit longer and especially lengthen the metadata-save timeout.
Comment 12 Jonathan Matthew 2009-04-22 08:54:47 UTC
I don't see any reason to change the timeouts for metadata load and can-save operations.
Comment 13 Per Ångström 2009-04-22 09:29:30 UTC
You're free to change that back, but if you do I suggest you increase the lengthening factor (currently 10). I made the general change because I got the impression from the Ubuntu bug that some people were experiencing timeouts not only on metadata updates.
Comment 14 Jonathan Matthew 2009-04-22 09:34:16 UTC
Well, as I said earlier, if you can't read the metadata for a file in 15 seconds, you can't play it either.

I'd like to see some actual measurements to support changing the save timeout to 300s (or whatever).  A few statements of the form 'it takes n seconds to rewrite metadata for a file of size m on a filesystem of type f across a network of type q'.
Comment 15 Per Ångström 2009-04-22 09:49:12 UTC
How about making the error handling less annoying for metadata-read timeouts? Then I wouldn't mind keeping the old timeout for reads.

I have started some measurements, but they are taking a while to get finished ...
Comment 16 Jonathan Matthew 2009-04-22 10:34:02 UTC
I can't see any indication in the launchpad bug report that people are seeing this problem during metadata reading.  A number of the commenters seem to be talking about various other unrelated issues.  In particular, there were bugs in the past that caused the metadata reader to deadlock on various non-music files, which causes the calling process to wait for the entire timeout.  Increasing the timeout increases the impact of this sort of problem.

What's annoying about the handling of metadata read timeouts?  What can usefully be fixed there?  Why bother importing songs that you can't actually play?

Regarding your first patch: it turns out that the timeout is only invoked during the metadata save operation because of the particular way we run the event loop.  The GStreamer documentation about the function we're using says "You should never use this function, since it is pure evil." .. doing this properly (using gst_bus_timed_pop instead of gst_bus_poll) would actually prevent the timeout from being called.

The above steps (adding the 'progress' signal etc.) would still be necessary to properly monitor the metadata save process from the client side.
Comment 17 Per Ångström 2009-04-22 10:41:17 UTC
The following measurements are of a few of my FLAC files (changing the artist attribute):

Size            CF card (USB 1.0)   Removable drive (USB 2.0)
-------         ---------           ---------
170 MB (40:20)  > 150 s             5 s
51  MB (11:32)  140 s               1 s
25  MB ( 6:02)  38 s                < 1 s

I admit I'm confused by the good performance on the removable drive, since that is the drive where my collection resides, the collection that Rhythmbox has had so much trouble with. There must be some other factors in play that sometimes slow everything down. 

On the other hand, the dismal performance on the CF card is even worse than I had expected.
Comment 18 Per Ångström 2009-04-22 11:06:16 UTC
OK, so let's not bother with changing anything for metadata reads.

As for saves, I admit that I don't like having a hard-coded timeout value at all, and the best thing would be to continue as long as there is progress. On the other hand, I'm not sure that the payoff of monitoring progress is great enough to offset the added complexity. I also get the feeling that we are trying to detect a problem (metadata-save operations getting stuck) that we're not sure is actually happening.

I think that patch 0002, with some possible adjustments to the timeout values, should be sufficient for now. I am not eager to delve into this more than I've done so far.
Comment 19 Bastien Nocera 2009-04-22 11:22:24 UTC
Monitoring progress means that the metadata saves would never get interrupted, and is obviously better than tweaking timeouts. I don't think the code would be that hard to implement. Instead of sending a result, send a progress report, and back off the timeout (so you get at least one progress report within the timeout).
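
Backing off the timeout could be as simple as re-arming a client-side timer on every progress report, for example (illustrative names, not actual Rhythmbox code):

#include <glib.h>

static guint save_timeout_id = 0;

/* called when no progress report arrived within the allowed window */
static gboolean
on_save_timed_out (gpointer data)
{
	save_timeout_id = 0;
	/* abort the save and report the error here */
	return FALSE;
}

/* called for every progress report from the metadata helper: cancel the
 * previous timer and start a fresh one */
static void
on_progress_report (gpointer data)
{
	if (save_timeout_id != 0)
		g_source_remove (save_timeout_id);
	save_timeout_id = g_timeout_add_seconds (120, on_save_timed_out, data);	/* window is arbitrary for this sketch */
}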
Comment 20 Per Ångström 2009-04-22 15:59:04 UTC
I haven't programmed against the D-Bus API, so correct me if I'm wrong. Anyway, I think implementing monitoring means you can't use dbus_connection_send_with_reply_and_block, since that's a blocking call. Instead you have to start the operation with a non-blocking call and then go into a message-receiving mode until the operation is finished. On the worker side, code has to be written to send progress messages at sensible intervals.
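
For the record, the non-blocking variant in libdbus looks roughly like this (the service name, object path, interface and method are placeholders, not the real metadata service interface):

#include <dbus/dbus.h>

static DBusPendingCall *
start_save_async (DBusConnection *conn)
{
	DBusMessage *msg;
	DBusPendingCall *pending = NULL;

	msg = dbus_message_new_method_call ("org.example.Metadata",	/* placeholder names */
					    "/org/example/Metadata",
					    "org.example.Metadata",
					    "save");
	if (msg == NULL)
		return NULL;

	/* returns immediately; the reply (and any progress signals) are
	 * handled later from the message loop via the pending call */
	if (!dbus_connection_send_with_reply (conn, msg, &pending, 300 * 1000))	/* 300 s; arbitrary for this sketch */
		pending = NULL;
	dbus_message_unref (msg);
	return pending;
}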

I'm not saying that this is hard, but it will probably take a few iterations to get right, and I can think of a number of new opportunities for failure that don't exist in the current implementation (besides giving up too soon). The increase in message traffic will also have a performance aspect (maybe marginal).

So why not just live with the blocking call and a long timeout for the time being, knowing that you can implement monitoring easily if you start experiencing stuck metadata operations? You aren't gonna need it.
Comment 21 Jonathan Matthew 2009-04-23 02:11:49 UTC
It's pretty clear now that you don't see any point in solving this problem properly and that you're even arguing against it.  We really don't need any further comments to that effect.
Comment 22 Per Ångström 2009-04-23 05:19:07 UTC
Sorry if you're getting that impression. I really am trying to give constructive advice.

I have been thinking of another solution: spawn off a thread on the worker side that will monitor progress and kill the process if it gets stuck. Then you won't have to change the protocol.
Comment 23 Jonathan Matthew 2009-04-23 05:37:40 UTC
That might work, as long as you were careful to ensure that a deadlock in the pipeline wouldn't lock the monitoring thread up too.
Comment 24 Per Ångström 2009-04-23 07:06:50 UTC
I think that more or less the only way the monitoring thread can get stuck is if the worker thread deadlocks while it has a lock on a mutex; if you just keep the lock scope short and simple that risk is very small.

As for the implementation, is GNOME GDK Threads the way to go? I program mainly in C++, and there I use Boost::thread, which I don't think is an option.
Comment 25 Per Ångström 2009-04-23 07:22:01 UTC
GNOME GDK Threads looks a bit too lean (missing condition variables). I think http://library.gnome.org/devel/glib/stable/glib-Threads.html is more feature-complete.
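
A minimal sketch of the watchdog idea, using current GLib threading primitives (the 2009-era gthread API differed slightly; all names are illustrative):

#include <glib.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
	GMutex   lock;		/* initialise with g_mutex_init() before starting */
	time_t   deadline;	/* last fed time + timeout */
	gboolean running;
} Watchdog;

/* Watchdog thread: polls once a second and exits the whole helper process if
 * the worker has not fed it before the deadline. Started with
 * g_thread_new ("metadata-watchdog", watchdog_thread_fn, wd). */
static gpointer
watchdog_thread_fn (gpointer data)
{
	Watchdog *wd = data;

	while (TRUE) {
		gboolean expired;

		g_usleep (G_USEC_PER_SEC);
		g_mutex_lock (&wd->lock);
		if (!wd->running) {		/* save finished normally */
			g_mutex_unlock (&wd->lock);
			return NULL;
		}
		expired = (time (NULL) > wd->deadline);
		g_mutex_unlock (&wd->lock);
		if (expired)
			exit (1);		/* presumed stuck: kill the helper process */
	}
}

/* Worker side: call whenever some progress has been observed. */
static void
watchdog_feed (Watchdog *wd, int timeout_seconds)
{
	g_mutex_lock (&wd->lock);
	wd->deadline = time (NULL) + timeout_seconds;
	g_mutex_unlock (&wd->lock);
}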
Comment 26 Per Ångström 2009-04-23 08:28:42 UTC
I added some debugging output to rb-metadata-gst.c:
diff --git a/metadata/rb-metadata-gst.c b/metadata/rb-metadata-gst.c
index 7a66b27..9d0d323 100644
--- a/metadata/rb-metadata-gst.c
+++ b/metadata/rb-metadata-gst.c
@@ -964,11 +964,14 @@ rb_metadata_event_loop (RBMetaData *md, GstElement *element, gboolean block)
 	while (!done && !md->priv->eos) {
 		GstMessage *message;
 
-		if (block)
+		if (block) {
+			rb_debug("pang got blocking message");
 			message = gst_bus_poll (bus, GST_MESSAGE_ANY, -1);
-		else
+		}
+		else {
+			rb_debug("pang got non-blocking message");
 			message = gst_bus_pop (bus);
-
+		}
 		if (message == NULL) {
 			gst_object_unref (bus);
 			return;
@@ -1357,7 +1360,9 @@ rb_metadata_save (RBMetaData *md, GError **error)
 	gst_element_set_state (pipeline, GST_STATE_PLAYING);
 
 	rb_metadata_event_loop (md, GST_ELEMENT (pipeline), TRUE);
+rb_debug("pang after event loop");
 	if (gst_element_set_state (pipeline, GST_STATE_NULL) == GST_STATE_CHANGE_ASYNC) {
+rb_debug("pang after gst_element_set_state (1)");
 		if (gst_element_get_state (pipeline, NULL, NULL, 3 * GST_SECOND) != GST_STATE_CHANGE_SUCCESS) {
 			g_set_error (error,
 				     RB_METADATA_ERROR,
@@ -1375,9 +1380,11 @@ rb_metadata_save (RBMetaData *md, GError **error)
 		GFile *src;
 		GFile *dest;
 
+rb_debug("pang before close");
 		if (g_output_stream_close (stream, NULL, &io_error) == FALSE) {
 			goto gio_error;
 		}
+rb_debug("pang after close");
 		g_object_unref (stream);
 		stream = NULL;

And here is the filtered debug output (operating on a 51 MB FLAC file on a slow medium):

(10:16:27) [0xa7a710] [_handle_message] rb-metadata-dbus-service.c:381: handling metadata service message: save
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:16:27) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:968: pang got blocking message
(10:17:44) [0xa7a710] [rb_metadata_save] rb-metadata-gst.c:1363: pang after event loop
(10:18:03) [0xa7a710] [rb_metadata_save] rb-metadata-gst.c:1383: pang before close
(10:18:03) [0xa7a710] [rb_metadata_save] rb-metadata-gst.c:1387: pang after close
(10:18:03) [0xa7a710] [rb_metadata_event_loop] rb-metadata-gst.c:972: pang got non-blocking message
(10:18:05) [0xa7a710] [_handle_message] rb-metadata-dbus-service.c:391: done handling metadata service message

My conclusion is that most of the time is spent waiting in gst_element_set_state, after the event loop has run. Are you afraid that function will get stuck, too?
Comment 27 Jonathan Matthew 2009-04-23 09:41:41 UTC
How are you arriving at that conclusion?  10:16:27 to 10:17:44 (77 seconds) is spent running the event loop, 10:17:44 to 10:18:03 (19 seconds) is spent shutting down the pipeline and closing the output stream.  That said, I don't think we can assume that any operation at all is safe.
Comment 28 Per Ångström 2009-04-23 10:06:46 UTC
Sorry, I didn't see that final time gap before the end of the event loop. But still, there is a lot of time that, as far as I can see, is not covered by any hook function, during which we don't really know what is happening.
Comment 29 Per Ångström 2009-04-23 16:53:24 UTC
Created attachment 133194 [details] [review]
0004-Implement-a-watchdog-thread-for-monitoring-metadata.patch

[PATCH] Implement a watchdog thread for monitoring metadata saves.

The watchdog thread is started at the beginning of a metadata-save operation
and is stopped when the operation is finished. While it is running, the
watchdog needs periodic updates to its timeout period before the current
period expires, otherwise it will exit the entire process.

Having a number of small timeouts inside the algorithm removes the
need for a coarse external timeout. However, the new timeouts are
not as fine-grained as could be desired, since there are some steps
in the process that can take many seconds when dealing with a collection
that resides on a slow medium.

I have tested out the chosen timeouts on my slow CF Card reader and a USB 2.0 card reader. But I cannot guarantee that they are long enough for even worse cases. And some people will doubtless think they are too long.
Comment 30 Per Ångström 2009-04-24 08:41:38 UTC
(In reply to comment #7)
> Patches to fix the error messages that you find very unhelpful or that you
> consider to assign blame unfairly are welcome too.

I have opened bug 580081 for this purpose.
Comment 31 Per Ångström 2009-04-28 19:07:20 UTC
Created attachment 133519 [details] [review]
Refinements-to-watchdog-logic.patch

This patch adds some refinement to "0004-Implement-a-watchdog-thread-for-monitoring-metadata.patch".
Comment 32 Jonathan Matthew 2009-04-29 07:51:35 UTC
Bus messages on their own aren't a reliable way of detecting progress in the pipeline.  In general you'll get a set of messages at the start (state changes, tags, and a few others), then nothing until the end.

To monitor the pipeline usefully, you should insert a progressreport element into the pipeline (setting silent=TRUE and probably update-freq=1), which posts bus messages at a given interval when data is flowing.
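
Wiring that in would look roughly like this (the src/sink element names are placeholders for whatever the save pipeline actually links):

#include <gst/gst.h>

static void
add_progress_report (GstElement *pipeline, GstElement *src, GstElement *sink)
{
	GstElement *progress;

	progress = gst_element_factory_make ("progressreport", "metadata-progress");
	g_object_set (progress,
		      "silent", TRUE,	/* don't print to stdout; progress still arrives as bus messages */
		      "update-freq", 1,	/* report every second while data flows */
		      NULL);
	gst_bin_add (GST_BIN (pipeline), progress);
	gst_element_unlink (src, sink);
	gst_element_link_many (src, progress, sink, NULL);
}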

I don't know why setting the pipeline state to NULL is taking so long.  It shouldn't be doing much work in there.  You should investigate this, maybe there's something we can improve there.

Patch 0001: Instead of doing this, replace gst_bus_poll with gst_bus_timed_pop.
Patch 0002: fine, whatever, but it's not going in without the rest
Patch 0004: as above; also, you need to match your coding style to the existing code.

What's the point of your latest patch?  It seems to make things more complicated and I can't see any reason to do so.
Comment 33 Per Ångström 2009-04-29 10:01:39 UTC
I'll look at your suggestions when I get the time. As for the coding style, I have tried to conform; please give concrete advice.

The main reason for the follow-up patch is safety: since the worker thread doesn't wait for the watchdog thread to finish (that wait could be time-consuming), there is a slight risk that the old thread lingers after a new thread has been started and begins monitoring again, with bad effects. The counter prevents that from happening. I know that it is an unlikely situation, since the time between stopping one watchdog and starting another is probably long enough, but with multiple threads one has to be very careful not to rely on timing alone to take care of synchronization. I'm sorry I didn't take care of that the first time around.

The other reason is efficiency: I noticed that even when the watchdog is not running, timeouts are set, locking a mutex in the process. The patch breaks out of that before locking the mutex.
Comment 34 Per Ångström 2009-04-29 12:19:46 UTC
Regarding patch 0001-Make-sure-idle-timeout-doesn-t-happen-immediately.patch ; c->last_active is not updated often enough, which causes the process to think it is idle when it is not. The patch improves the situation somewhat but not enough. Replacing gst_bus_poll with gst_bus_timed_pop doesn't seem to make much of a difference here, but I probably haven't grasped the full extent of your thinking. However, since the current behavior is only inefficient (creating a new process too often) but not fatal, I'll leave it as it is.
Comment 35 Per Ångström 2009-04-29 14:18:34 UTC
I made a quick and dirty test with a progressreport object, but I'm still getting a lot of time (it looks like it's dependent on the file size) that isn't covered by any message, while setting the pipeline state to NULL.

I'm not sufficiently familiar with GStreamer to be able to investigate what is happening at that time, but I would guess that the delay has something to do with writing the updated file to disk, since the file operations that follow seem to be very quick. I think this is something we have to accept: I/O on a slow medium will cause a considerable delay that isn't easily monitored and which should not be aborted due to timeouts.
Comment 36 Per Ångström 2009-05-01 08:40:43 UTC
Below is the output from a test run on a big FLAC file on a slow medium. The results are disappointing: even with progressreport enabled and update-freq=1, I'm still getting very long "blackouts" while running the message loop. I have to make the timeouts ridiculously long to avoid running into them on big files. On the other hand, I have never seen GStreamer getting completely stuck; it's always the timeouts that cause data loss.

I'm sure that by digging even further into the message flow you might be able to build a state machine that can help you in finding sensible timeouts for some of the steps in the process, but in the end there will always be the case where the execution time depends on the I/O speed, which means that any timeout you choose can be too short.

My suggestion is you scrap the whole monitoring idea and just disable timeouts for metadata-save operations. If you want to, I can make a patch for that. If you still want to pursue the monitoring approach, you can use my patches as the groundwork.

Output:
(09:50:42) [0x1324590] [rb_watchdog_thread_fn] rb-metadata-gst.c:1340: watchdog_thread starting: 1
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from giostreamsink
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from flactag
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from urisrc
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from progressreport
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from pipeline
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from flactag
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from giostreamsink
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from urisrc
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from progressreport
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from pipeline
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type async-done from pipeline
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type new-clock from pipeline
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from giostreamsink
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from flactag
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from urisrc
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from progressreport
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_metadata_bus_handler] rb-metadata-gst.c:960: message of type state-changed from pipeline
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:50:42) [0x1125710] [rb_set_watchdog_timeout_internal] rb-metadata-gst.c:1314: rb_set_watchdog_timeout: 120
(09:52:42) [0x1324590] [rb_watchdog_thread_fn] rb-metadata-gst.c:1359: watchdog_thread timed out
Comment 37 Per Ångström 2009-05-02 17:46:14 UTC
Created attachment 133815 [details] [review]
0001-No-timeout-for-metadata-save.patch

Going back to fixing the bug "metadata-dbus operations arbitrarily cut off after 15 seconds", this patch removes the timeout entirely for metadata-save operations and nothing more. Other timeouts are left unchanged. This is all that needs to be done to fix this bug.

All other patches are either obsoleted by this patch or don't belong to this bug.
Comment 38 Jonathan Matthew 2009-05-08 08:53:57 UTC
I think you were actually pretty close.  Judging by the output above, the progressreport element wasn't correctly linked into the pipeline, so it wasn't posting progress messages.

I've replaced gst_bus_poll with gst_bus_timed_pop, which has exactly the effect I expected it to.  gst_bus_poll runs the mainloop in order to process messages, while gst_bus_timed_pop doesn't.  No mainloop, no timeout.
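
In other words, something along these lines (a rough sketch, not the actual commit):

#include <gst/gst.h>

static void
wait_for_completion (GstElement *pipeline)
{
	GstBus *bus = gst_element_get_bus (pipeline);
	GstMessage *message;

	/* blocks on the bus itself, without spinning the default mainloop,
	 * so the service's idle timeout can't fire mid-operation */
	while ((message = gst_bus_timed_pop (bus, GST_CLOCK_TIME_NONE)) != NULL) {
		gboolean done;

		done = (GST_MESSAGE_TYPE (message) == GST_MESSAGE_EOS ||
			GST_MESSAGE_TYPE (message) == GST_MESSAGE_ERROR);
		gst_message_unref (message);
		if (done)
			break;
	}
	gst_object_unref (bus);
}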

I don't think it's reasonable to disable the timeout entirely.  Based on the numbers above I've increased the timeout to 120 seconds.
Comment 39 thpublic+nospam 2011-01-25 13:14:21 UTC
I get this error too, so I logged the problem with:

GST_DEBUG=*:5 rhythmbox 2>dbg.log

Per the instructions @ https://bugzilla.gnome.org/show_bug.cgi?id=627678

After executing/closing multiple song properties, RB will stall - or at least
the GUI will become unresponsive, occasionally resulting in this error.  I
noticed it is more likely to trigger if RB is playing audio and the status
graphic is animating.

RB v0.13.1

gst-inspect-0.10 fakesink | grep Version
  Version:        0.10.30
gst-inspect-0.10 playbin2 | grep Version
  Version:        0.10.30
gst-inspect-0.10 autoaudiosink | grep Version
  Version:        0.10.25

The compressed log is 201 MB. Can this bug tracking system accept an attachment of that size?
Comment 40 Jonathan Matthew 2011-01-25 13:24:13 UTC
No, but it wouldn't be useful anyway.
Comment 41 ville.ranki 2012-05-03 13:31:48 UTC
I am still getting 

"Error while saving song information
Timeout was reached"

On Rhythmbox 2.96 when tagging multiple files on a network share. 
I suppose this is related to this bug.
Comment 42 ville.ranki 2013-11-05 11:40:13 UTC
"Timeout was reached" is still happening with rb 3.0.1 when tagging files over sftp network share. I haven't seen "Internal GStreamer problem" in long time.

Can this be marked "confirmed"?
Comment 43 GNOME Infrastructure Team 2018-05-24 14:16:04 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/720.