GNOME Bugzilla – Bug 754824
Handle seek errors
Last modified: 2018-09-21 17:52:42 UTC
Some WebDAV servers don't implement range requests, which are necessary for seek support. GVfsHttpInputStream doesn't handle this case. Consequently, g_seekable_tell returns the requested offset, but g_input_stream_read returns bytes from the beginning of the file... For more info, see the downstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1259746
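The mismatch can be sketched with a tiny stdlib-only C model (hypothetical names, not gvfs code): a server that ignores the Range header replies 200 with the full body, while a client that only tracks the offset it asked for believes the seek succeeded.

```c
#include <assert.h>

/* Hypothetical model (not gvfs code) of the failure: a server that
 * ignores the Range header replies 200 with the full body, so a client
 * that trusts the offset it asked for ends up reading from byte 0. */
typedef struct {
    const char *body;   /* the full resource */
    int honors_range;   /* does the server support range requests? */
} FakeServer;

/* Simulate a GET with "Range: bytes=range_start-"; returns the HTTP
 * status and points *data at the body the server actually sent. */
static int
fake_get (const FakeServer *srv, long range_start, const char **data)
{
    if (srv->honors_range) {
        *data = srv->body + range_start;
        return 206;             /* Partial Content: seek honored */
    }
    *data = srv->body;          /* Range ignored: full body from byte 0 */
    return 200;
}
```

With `honors_range` unset, a "seek" to offset 7 still yields data starting at byte 0 while the stream reports status 200, which is exactly the silent corruption described above.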
Created attachment 311053 [details] [review] http: Return error if seek was not successful Return an error if the Accept-Ranges header field is not received in the reply to a request with the Range header field. These changes are needed to avoid data corruption when reading...
It would be best to return FALSE from g_seekable_can_seek, but I am afraid this isn't possible without another blocking request...
Dan, can you take a look, please?
Comment on attachment 311053 [details] [review] http: Return error if seek was not successful

>+ g_task_return_new_error (task,
>+                          SOUP_HTTP_ERROR,
>+                          priv->msg->status_code,
>+                          "Error seeking in stream");

priv->msg->status_code is going to be 200 in the example you gave, so that's not really a very good status. Something in G_IO_ERROR might be better.

Does this actually make LibreOffice *work*, or does it just change the error from "file is corrupt" to "could not open file"?

The alternative would be to just notice that you got a 200 rather than a 206, and then just skip bytes until you get to the part that the caller had requested. Though that would be pretty slow with a large file and lots of seeking back and forth. You'd probably want to cache the file to disk on the first failed seek(), and then you could make conditional requests on future reads after that to make sure it hasn't changed remotely, and return chunks from the locally-cached file if it hasn't. (In your copious free time... :)
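The skip-ahead idea can be sketched with plain stdio (a stdlib-only stand-in; real code would read and discard from the GInputStream instead of a FILE*):

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of the skip-ahead fallback: when the server answered 200
 * instead of 206, read and discard bytes until the caller's requested
 * offset is reached. Returns the number of bytes actually skipped. */
static long
skip_bytes (FILE *fp, long count)
{
    char buf[4096];
    long skipped = 0;

    while (skipped < count) {
        size_t want = (size_t) (count - skipped);
        if (want > sizeof buf)
            want = sizeof buf;
        size_t got = fread (buf, 1, want, fp);
        if (got == 0)
            break;              /* EOF or error: offset unreachable */
        skipped += (long) got;
    }
    return skipped;
}
```

Every backward seek restarts the transfer from byte 0, which is why this gets so slow with lots of seeking back and forth.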
(In reply to Dan Winship from comment #4)
> Comment on attachment 311053 [details] [review] [review]
> http: Return error if seek was not successful
>
> >+ g_task_return_new_error (task,
> >+                          SOUP_HTTP_ERROR,
> >+                          priv->msg->status_code,
> >+                          "Error seeking in stream");
>
> priv->msg->status_code is going to be 200 in the example you gave, so that's
> not really a very good status. Something in G_IO_ERROR might be better.

OK, probably G_IO_ERROR_FAILED is the best...

> Does this actually make LibreOffice *work*, or does it just change the error
> from "file is corrupt" to "could not open file"?

It's even worse: LibreOffice doesn't expect such behavior and crashes with an exception after several tries. However, LibreOffice also crashes when you return FALSE from g_seekable_can_seek, so it needs some love to avoid this... In any case, we can't just pretend that everything is all right...

> The alternative would be to just notice that you got a 200 rather than a
> 206, and then just skip bytes until you get to the part that the caller had
> requested. Though that would be pretty slow with a large file and lots of
> seeking back and forth. You'd probably want to cache the file to disk on the
> first failed seek(), and then you could make conditional requests on future
> reads after that to make sure it hasn't changed remotely, and return chunks
> from the locally-cached file if it hasn't. (In your copious free time... :)

Actually, I proposed something like this in my "free time": https://bugzilla.gnome.org/show_bug.cgi?id=728375#c2 It is pretty simple, and it was intended for backends based on libraries that don't provide streaming at all. It downloads the whole file on open using the pull method and then provides read and seek methods on the temporary file. However, we can't do this in general; maybe it could be limited to small files only, returning an error for big ones, but that is dumb...
I've already recommended to one LibreOffice guy that they should download the whole file into a temp file and then use that, because opening bigger files from WebDAV using LibreOffice takes a really long time, e.g. opening a 1.5MB file:

libreoffice dav://file --- 2min30sec
gvfs-copy dav://file temp && libreoffice temp --- 5sec

It takes such a long time because they are seeking somewhere almost all the time. I have to take a look at the LibreOffice code...
Created attachment 311120 [details] [review] http: Return error if seek was not successful Error code changed to G_IO_ERROR_FAILED. Marked error message for translation.
Comment on attachment 311120 [details] [review] http: Return error if seek was not successful OK. Maybe instead call soup_message_headers_get_content_range(), and make sure you actually got the range (or at least, the starting byte) you expected. That's not much more difficult, and more reliable.
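For illustration, a stdlib-only stand-in for that check (the real patch would call soup_message_headers_get_content_range(); the parser below is a simplified assumption that only handles the common "bytes START-END/TOTAL" form):

```c
#include <assert.h>
#include <stdio.h>

/* Simplified parse of a Content-Range value such as "bytes 100-199/1000".
 * Returns 1 and fills in start/end/total on success, 0 otherwise. */
static int
parse_content_range (const char *value, long *start, long *end, long *total)
{
    return sscanf (value, "bytes %ld-%ld/%ld", start, end, total) == 3;
}

/* The seek is only trustworthy if the returned range starts at the
 * offset that was actually requested. */
static int
seek_honored (const char *content_range, long requested_offset)
{
    long start, end, total;

    if (content_range == NULL ||
        !parse_content_range (content_range, &start, &end, &total))
        return 0;
    return start == requested_offset;
}
```

Checking the starting byte rather than just the Accept-Ranges advertisement catches servers that claim range support but ignore a particular Range request.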
Returning failure from g_seekable_seek: As long as there is no dedicated error code to indicate that the failure is due to a server's broken support of Accept-Ranges, it is hard for clients to determine, upon such a failed g_seekable_seek, whether to retry reading/buffering the whole file, or whether to propagate the failure out to their respective clients. For LibreOffice, that would probably mean that we no longer rely on calls to g_seekable_seek at all, but instead always read and buffer the whole file. This might have a negative impact on performance (in case there are large in-between chunks whose reading could be avoided, either immediately upon opening or at all), but may indeed just as well have a positive impact instead, given LO's notorious seeking and re-reading during its content type detection phase.
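The read-everything fallback described above can be sketched with plain stdio (a hypothetical helper, not actual LO or gvfs code): copy the whole remote stream into a temporary file once, then serve all subsequent seeks and reads from the local copy, which always supports them.

```c
#include <assert.h>
#include <stdio.h>

/* Buffer an entire (possibly non-seekable) source stream into a
 * temporary file; all further seeking and reading then happens on the
 * local copy. Returns NULL on failure. */
static FILE *
buffer_to_tmpfile (FILE *src)
{
    FILE *tmp = tmpfile ();
    char buf[4096];
    size_t n;

    if (tmp == NULL)
        return NULL;
    while ((n = fread (buf, 1, sizeof buf, src)) > 0) {
        if (fwrite (buf, 1, n, tmp) != n) {
            fclose (tmp);
            return NULL;
        }
    }
    rewind (tmp);
    return tmp;
}
```

This trades one up-front full download for unlimited cheap local seeking, which matches the 5-second gvfs-copy timing reported earlier in the thread.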
(In reply to Stephan Bergmann from comment #8)
> Returning failure from g_seekable_seek: As long as there is no dedicated
> error code to indicate the failure is due to a server's broken support of
> Accept-Ranges, it is hard for clients to determine upon such a failed
> g_seekable_seek whether to retry reading/buffering the whole file, or
> whether to propagate the failure out to their respective clients.

And we don't want to force every application to implement buffering on their own anyway.
Created attachment 311451 [details] [review] http: Return error if seek was not successful Modified to check starting byte of the range using soup_message_headers_get_content_range.
Comment on attachment 311451 [details] [review] http: Return error if seek was not successful looks good but now I've made you miss the code freeze...
So I have made a test with skipping bytes by opening the 1.5MB file in LO. Here is the comparison:

libreoffice dav://file (skipping) --- 6min
libreoffice dav://file (range requests) --- 2min30sec
gvfs-copy dav://file temp && libreoffice temp --- 5sec

I think that opening such a "big" file in LO from dav is almost unusable already with range requests, and the user would probably close LO before the file loads, so the version with skipping is absolutely unusable... I don't really plan to implement the caching for it; I suppose there aren't many servers without range request support. Stephan has already pushed a fix to cache the file in LO instead of seeking anyway: http://cgit.freedesktop.org/libreoffice/core/commit/?id=93a0696e74e96e5a5ca821f33cb791b87e876f49

However, I am thinking about pushing the skipping code, but it would have to be limited to small files only. I don't know, it depends on the use case, maybe something like 5MB...

Dan, do you think it is reasonable?
Created attachment 311613 [details] [review] http: Remove outdated sync methods The async and sync methods differ from each other; the async methods have more error checking. We don't need the sync methods, because the http and dav backends don't use them. Therefore, remove the sync methods.
Created attachment 311614 [details] [review] http: Skip data if range requests are not supported
(In reply to Ondrej Holy from comment #12)
> However I am thinking about pushing the skipping code, but it has to be
> limited only for small files, I don't know it depends on the use case, maybe
> something like 5MB...
>
> Dan, do you think it is reasonable?

Not using it for large files definitely sounds right. I'm not sure where the cutoff for "small" should be though. (In a perfect implementation you'd measure the latency and bandwidth to the server and etc etc etc, but I'm not sure there's any simple way to approximate that.) The problem with having it work for small files but not large ones is that rather than "app works with files on some servers but not with files on other servers", you now get "app works with all files on some servers and small files on other servers, but once the file gets too large, the app suddenly stops working"...
Comment on attachment 311613 [details] [review] http: Remove outdated sync methods this part makes sense anyway
Comment on attachment 311614 [details] [review] http: Skip data if range requests are not supported seems like it should work, though it's not going to get tested all that much so it could easily break in the future...
Comment on attachment 311451 [details] [review] http: Return error if seek was not successful commit cb950723da8b6817dbe894fed6a78fa77c932d16
Comment on attachment 311613 [details] [review] http: Remove outdated sync methods commit 4302a595da3bece9ef89fb7342766f5a3c2ce256
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/260.