GNOME Bugzilla – Bug 794487
gvfs-test: Increase timeout to 10s
Last modified: 2018-09-21 18:19:33 UTC
We run the gvfs testsuite as regression tests automatically in Ubuntu. Unfortunately some of the tests are a bit flaky. In particular the FTP tests often timeout when unmounting. We just decided to experiment with increasing the timeout in the testsuite to 10 seconds to see if that resolves the flakiness. It seems that it does help. For Andreas' analysis and the results after increasing the timeout, see the LP bug <https://bugs.launchpad.net/ubuntu/+source/gvfs/+bug/1713098>. tl;dr it seems to help. Would you please consider this patch?
Created attachment 369870 [details] [review] gvfs-test: Increase timeout to 10s In normal operation some operations - particularly unmounting - can take quite a while. Let's give things a bit longer before giving up. Patch originally by Andreas Hasenack <andreas.hasenack@canonical.com>
I also pushed to https://git.gnome.org/browse/gvfs/log/?h=wip/iainl/tests-increase-timeout
10s wasn't enough. This needs some more digging. All I see in the logs is that unmounting fails because the "filesystem is busy": (...) ftp: <- 0 -- 226 Transfer Complete. ftp: send_reply(0x8ef44072210), failed=0 () ftp: backend_dbus_handler org.gtk.vfs.Mount:Unmount (pid=8092) ftp: g_vfs_job_unmount_new request: 0x74d15c02adf0 ftp: Queued new job 0x8ef4407f740 (GVfsJobUnmount) ftp: send_reply(0x8ef4407f740), failed=1 (File system is busy) ftp: send_reply(0x8ef4407f740), failed=1 (File system is busy) That transfer is from when the test fetches the txt file and checks its contents. All that works. It's the unmount that fails.
Ah, guess we got lucky in the test runs. OK, then this bug should be about investigating why that test sometimes fails. There's a *separate* test failure we see sometimes too <https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/i386/g/gvfs/20180319_192159_6d5d9@/log.gz>: test_file_in_system (__main__.Trash) trash:// deletion for system location ... ok ====================================================================== FAIL: test_file_in_home_api (__main__.Trash) trash:// deletion, attributes, restoring for a file in $HOME (API) ---------------------------------------------------------------------- Traceback (most recent call last):
+ Trace 238487
self.assertEqual(self.files_in_trash(), set(['hello_gvfs_tests.txt']))
'hello_gvfs_tests.txt' ====================================================================== FAIL: test_file_in_home_cli (__main__.Trash) trash:// deletion, attributes, restoring for a file in $HOME (CLI) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/gvfs-testbed-script", line 1627, in test_file_in_home_cli self.assertEqual(self.files_in_trash(), set(['hello_gvfs_tests.txt'])) AssertionError: Items in the first set but not the second: 'hello_gvfs_tests.2.txt'
I'm about to get access to a ppc64el system where the unmount failure happens most of the time. Then I will be able to debug interactively I hope.
Review of attachment 369870 [details] [review]: No problem with this since the timeouts used to be always problematic, but as per the following comments it seems it doesn't help anyway... or does it help in some cases?
(In reply to Andreas Hasenack from comment #3) > 10s wasn't enough. This needs some more digging. > > All I see in the logs is that unmounting fails because the "filesystem is > busy": > (...) > ftp: <- 0 -- 226 Transfer Complete. > ftp: send_reply(0x8ef44072210), failed=0 () > ftp: backend_dbus_handler org.gtk.vfs.Mount:Unmount (pid=8092) > ftp: g_vfs_job_unmount_new request: 0x74d15c02adf0 > ftp: Queued new job 0x8ef4407f740 (GVfsJobUnmount) > ftp: send_reply(0x8ef4407f740), failed=1 (File system is busy) > ftp: send_reply(0x8ef4407f740), failed=1 (File system is busy) > > That transfer is from when the test fetches the txt file and checks its > contents. All that works. It's the unmount that fails. Never see this error with the test suite. It means that there are still some ongoing operations... but hard to say way. It might help to obtain the full debug log. Just a note that the two send_reply debug messages are normal when unmounting, respectively, I've just removed the redundant one recently: https://git.gnome.org/browse/gvfs/commit/?id=22e16dcd0cfd07e36439a946b4520a5825fb744e
(In reply to Iain Lane from comment #4) > Ah, guess we got lucky in the test runs. OK, then this bug should be about > investigating why that test sometimes fails. > > There's a *separate* test failure we see sometimes too > <https://objectstorage.prodstack4-5.canonical.com/v1/ > AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/i386/g/gvfs/ > 20180319_192159_6d5d9@/log.gz>: > > > > test_file_in_system (__main__.Trash) > trash:// deletion for system location ... ok > > ====================================================================== > FAIL: test_file_in_home_api (__main__.Trash) > trash:// deletion, attributes, restoring for a file in $HOME (API) > ---------------------------------------------------------------------- > Traceback (most recent call last): Hmm, it seems that first test failed because 'hello_gvfs_tests.txt' was not yet detected and consequently was not removed. The following test consequently failed, because the file was added second time 'hello_gvfs_tests.2.txt'... I recall some bug that g_file_enumerate_children doesn't list file which was trashed immediately before this call, but can't find it out right now. However, it should be already fixed somehow...
Created attachment 369907 [details] Test run showing the failure I haven't seen evidence that the greater timeout helped so far, so let's hold off on that for the moment please. Attached is the log of a failed run. The full log, including the testbed setup steps (where one can see the versions involved) is at https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/ppc64el/g/gvfs/20180318_055744_2a286@/log.gz I got access to a ppc64el vm which is the architecture where this test has been failing most of the time. Let's see if I can reproduce it there. If not, then it might be something in our build farm that is somehow triggering it and we will need more instrumentation.
Created attachment 369914 [details] [review] job: Add missing send_reply debug messages GVfsJobQueryInfoRead and GVfsJobQueryInfoWrite doesn't print debug info in send_reply as other jobs. Let's add the missing debug messages.
Comment on attachment 369914 [details] [review] job: Add missing send_reply debug messages Attachment 369914 [details] pushed as e9070ff - job: Add missing send_reply debug messages
I don't see anything unusual in the output. Maybe the channel was not closed yet for some reason... Can you please try to add small timeout before unmount, or use G_MOUNT_UNMOUNT_FORCE to see if it helps?
I can't reproduce this manually not even on a ppc hardware, where our build farm hits it most frequently. Laney, should we try G_MOUNT_UNMOUNT_FORCE? The diff would be: diff --git a/test/gvfs-test b/test/gvfs-test index 4537d8a5..c82e13a8 100755 --- a/test/gvfs-test +++ b/test/gvfs-test @@ -278,7 +278,7 @@ class GvfsTestCase(unittest.TestCase): ml = GLib.MainLoop() - mount.unmount_with_operation(Gio.MountUnmountFlags.NONE, None, None, + mount.unmount_with_operation(Gio.MountUnmountFlags.FORCE, None, None, unmount_done, ml) # ensure we are timing out GLib.timeout_add_seconds(5, lambda data: ml.quit(), None) I can upload that to https://launchpad.net/~ahasenack/+archive/ubuntu/gvfs-test-fixes-1713098 to make sure it's not worse (we never saw the failure with ppa builds).
We ran the tests with Gio.MountUnmountFlags.FORCE 10 times and got no failure: https://paste.ubuntu.com/p/kNcRc4Dd3c/ test_anonymous_api_user() passed all times. It's not conclusive, but it's a datapoint.
Created attachment 370047 [details] [review] daemon: Add debug print with blocking job Thanks for testing, so FORCE flag can be used to workaround this, but still would be good to know what is happening there. From the logs, it is not obvious that there are any blocking jobs. Can you please revert the workaround and try this patch whether it prints something useful?
Comment on attachment 370047 [details] [review] daemon: Add debug print with blocking job Attachment 370047 [details] pushed as b8216b7 - daemon: Add debug print with blocking job
I'll apply the debugging/logging patches and push a new build to that ppa
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/331.