GNOME Bugzilla – Bug 754824
Handle seek errors
Last modified: 2018-09-21 17:52:42 UTC
Some WebDAV servers don't implement range requests, which are necessary for seek support. GVfsHttpInputStream doesn't handle this case. Consequently, g_seekable_tell returns the requested offset, but g_input_stream_read returns bytes from the beginning of the file... For more info, see the downstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1259746
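The mismatch can be sketched with a tiny stdlib-only C model (hypothetical names, not gvfs code): a server that ignores the Range header replies 200 with the full body, while a client that only tracks the offset it asked for believes the seek succeeded.

```c
#include <assert.h>

/* Hypothetical model (not gvfs code) of the failure: a server that
 * ignores the Range header replies 200 with the full body, so a client
 * that trusts the offset it asked for ends up reading from byte 0. */
typedef struct {
    const char *body;   /* the full resource */
    int honors_range;   /* does the server support range requests? */
} FakeServer;

/* Simulate a GET with "Range: bytes=range_start-"; returns the HTTP
 * status and points *data at the body the server actually sent. */
static int
fake_get (const FakeServer *srv, long range_start, const char **data)
{
    if (srv->honors_range) {
        *data = srv->body + range_start;
        return 206;             /* Partial Content: seek honored */
    }
    *data = srv->body;          /* Range ignored: full body from byte 0 */
    return 200;
}
```

With `honors_range` unset, a "seek" to offset 7 still yields data starting at byte 0 while the stream reports status 200, which is exactly the silent corruption described above.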
Created attachment 311053 [details] [review] http: Return error if seek was not successful Return an error if the Accept-Ranges header field is not received in the reply to a request with the Range header field. These changes are needed to avoid data corruption when reading...
It would be best to return FALSE from g_seekable_can_seek, but I am afraid this isn't possible without another blocking request...
Dan, can you take a look, please?
Comment on attachment 311053 [details] [review] http: Return error if seek was not successful

>+ g_task_return_new_error (task,
>+                          SOUP_HTTP_ERROR,
>+                          priv->msg->status_code,
>+                          "Error seeking in stream");

priv->msg->status_code is going to be 200 in the example you gave, so that's not really a very good status. Something in G_IO_ERROR might be better.

Does this actually make LibreOffice *work*, or does it just change the error from "file is corrupt" to "could not open file"?

The alternative would be to just notice that you got a 200 rather than a 206, and then just skip bytes until you get to the part that the caller had requested. Though that would be pretty slow with a large file and lots of seeking back and forth. You'd probably want to cache the file to disk on the first failed seek(), and then you could make conditional requests on future reads after that to make sure it hasn't changed remotely, and return chunks from the locally-cached file if it hasn't. (In your copious free time... :)
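The skip-ahead idea can be sketched with plain stdio (a stdlib-only stand-in; real code would read and discard from the GInputStream instead of a FILE*):

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of the skip-ahead fallback: when the server answered 200
 * instead of 206, read and discard bytes until the caller's requested
 * offset is reached. Returns the number of bytes actually skipped. */
static long
skip_bytes (FILE *fp, long count)
{
    char buf[4096];
    long skipped = 0;

    while (skipped < count) {
        size_t want = (size_t) (count - skipped);
        if (want > sizeof buf)
            want = sizeof buf;
        size_t got = fread (buf, 1, want, fp);
        if (got == 0)
            break;              /* EOF or error: offset unreachable */
        skipped += (long) got;
    }
    return skipped;
}
```

Every backward seek restarts the transfer from byte 0, which is why this gets so slow with lots of seeking back and forth.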
(In reply to Dan Winship from comment #4)
> Comment on attachment 311053 [details] [review] [review]
> http: Return error if seek was not successful
>
> >+ g_task_return_new_error (task,
> >+                          SOUP_HTTP_ERROR,
> >+                          priv->msg->status_code,
> >+                          "Error seeking in stream");
>
> priv->msg->status_code is going to be 200 in the example you gave, so that's
> not really a very good status. Something in G_IO_ERROR might be better.

OK, probably G_IO_ERROR_FAILED is the best...

> Does this actually make LibreOffice *work*, or does it just change the error
> from "file is corrupt" to "could not open file"?

It's even worse: LibreOffice doesn't expect such behavior and crashes with an exception after several tries. However, LibreOffice also crashes when you return FALSE from g_seekable_can_seek, so it needs some love to avoid this... In any case, we can't just pretend that everything is all right...

> The alternative would be to just notice that you got a 200 rather than a
> 206, and then just skip bytes until you get to the part that the caller had
> requested. Though that would be pretty slow with a large file and lots of
> seeking back and forth. You'd probably want to cache the file to disk on the
> first failed seek(), and then you could make conditional requests on future
> reads after that to make sure it hasn't changed remotely, and return chunks
> from the locally-cached file if it hasn't. (In your copious free time... :)

Actually, I proposed something like this in my "free time": https://bugzilla.gnome.org/show_bug.cgi?id=728375#c2 It is pretty simple, and it was intended for backends based on libraries that don't provide streaming at all. It downloads the whole file on open using the pull method and then provides read and seek methods on the temporary file. However, we can't do this in general; maybe it could be limited to small files only, returning an error for big ones, but that is dumb...
I've already recommended to one LibreOffice guy that they should download the whole file into a temp file and then use that, because opening bigger files from WebDAV using LibreOffice takes a really long time, e.g. opening a 1.5MB file:

libreoffice dav://file --- 2min30sec
gvfs-copy dav://file temp && libreoffice temp --- 5sec

It takes such a long time because they are seeking somewhere almost all the time. I have to take a look at the LibreOffice code...
Created attachment 311120 [details] [review] http: Return error if seek was not successful Error code changed to G_IO_ERROR_FAILED. Marked error message for translation.
Comment on attachment 311120 [details] [review] http: Return error if seek was not successful OK. Maybe instead call soup_message_headers_get_content_range(), and make sure you actually got the range (or at least, the starting byte) you expected. That's not much more difficult, and more reliable.
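For illustration, a stdlib-only stand-in for that check (the real patch would call soup_message_headers_get_content_range(); the parser below is a simplified assumption that only handles the common "bytes START-END/TOTAL" form):

```c
#include <assert.h>
#include <stdio.h>

/* Simplified parse of a Content-Range value such as "bytes 100-199/1000".
 * Returns 1 and fills in start/end/total on success, 0 otherwise. */
static int
parse_content_range (const char *value, long *start, long *end, long *total)
{
    return sscanf (value, "bytes %ld-%ld/%ld", start, end, total) == 3;
}

/* The seek is only trustworthy if the returned range starts at the
 * offset that was actually requested. */
static int
seek_honored (const char *content_range, long requested_offset)
{
    long start, end, total;

    if (content_range == NULL ||
        !parse_content_range (content_range, &start, &end, &total))
        return 0;
    return start == requested_offset;
}
```

Checking the starting byte rather than just the Accept-Ranges advertisement catches servers that claim range support but ignore a particular Range request.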
Returning failure from g_seekable_seek: As long as there is no dedicated error code to indicate that the failure is due to a server's broken support of Accept-Ranges, it is hard for clients to determine, upon such a failed g_seekable_seek, whether to retry reading/buffering the whole file, or whether to propagate the failure out to their respective clients. For LibreOffice, that would probably mean that we no longer rely on calls to g_seekable_seek at all, but instead always read and buffer the whole file. This might have a negative impact on performance (in case there are large in-between chunks whose reading could be avoided, either immediately upon opening or at all), but may indeed just as well have a positive impact instead, given LO's notorious seeking and re-reading during its content type detection phase.
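The read-everything fallback described above can be sketched with plain stdio (a hypothetical helper, not actual LO or gvfs code): copy the whole remote stream into a temporary file once, then serve all subsequent seeks and reads from the local copy, which always supports them.

```c
#include <assert.h>
#include <stdio.h>

/* Buffer an entire (possibly non-seekable) source stream into a
 * temporary file; all further seeking and reading then happens on the
 * local copy. Returns NULL on failure. */
static FILE *
buffer_to_tmpfile (FILE *src)
{
    FILE *tmp = tmpfile ();
    char buf[4096];
    size_t n;

    if (tmp == NULL)
        return NULL;
    while ((n = fread (buf, 1, sizeof buf, src)) > 0) {
        if (fwrite (buf, 1, n, tmp) != n) {
            fclose (tmp);
            return NULL;
        }
    }
    rewind (tmp);
    return tmp;
}
```

This trades one up-front full download for unlimited cheap local seeking, which matches the 5-second gvfs-copy timing reported earlier in the thread.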
(In reply to Stephan Bergmann from comment #8)
> Returning failure from g_seekable_seek: As long as there is no dedicated
> error code to indicate the failure is due to a server's broken support of
> Accept-Ranges, it is hard for clients to determine upon such a failed
> g_seekable_seek whether to retry reading/buffering the whole file, or
> whether to propagate the failure out to their respective clients.

And we don't want to force every application to implement buffering on their own anyway.
Created attachment 311451 [details] [review] http: Return error if seek was not successful Modified to check starting byte of the range using soup_message_headers_get_content_range.
Comment on attachment 311451 [details] [review] http: Return error if seek was not successful looks good but now I've made you miss the code freeze...
So I have made a test with skipping bytes by opening the 1.5MB file in LO. Here is the comparison:

libreoffice dav://file (skipping) --- 6min
libreoffice dav://file (range requests) --- 2min30sec
gvfs-copy dav://file temp && libreoffice temp --- 5sec

I think that opening such a "big" file in LO from dav is almost unusable already with range requests, and the user would probably close LO before the file loads, so the version with skipping is absolutely unusable... I don't really plan to implement the caching for it; I suppose there aren't many servers without range request support. Stephan has already pushed a fix to cache the file in LO instead of seeking anyway: http://cgit.freedesktop.org/libreoffice/core/commit/?id=93a0696e74e96e5a5ca821f33cb791b87e876f49

However, I am thinking about pushing the skipping code, but it would have to be limited to small files only. I don't know, it depends on the use case, maybe something like 5MB...

Dan, do you think it is reasonable?
Created attachment 311613 [details] [review] http: Remove outdated sync methods The async and sync methods differ from each other; the async methods have more error checking. We don't need the sync methods, because the http and dav backends don't use them. Therefore, remove the sync methods.
Created attachment 311614 [details] [review] http: Skip data if range requests are not supported
(In reply to Ondrej Holy from comment #12)
> However I am thinking about pushing the skipping code, but it has to be
> limited only for small files, I don't know it depends on the use case, maybe
> something like 5MB...
>
> Dan, do you think it is reasonable?

Not using it for large files definitely sounds right. I'm not sure where the cutoff for "small" should be though. (In a perfect implementation you'd measure the latency and bandwidth to the server and etc etc etc, but I'm not sure there's any simple way to approximate that.) The problem with having it work for small files but not large ones is that rather than "app works with files on some servers but not with files on other servers", you now get "app works with all files on some servers and small files on other servers, but once the file gets too large, the app suddenly stops working"...
Comment on attachment 311613 [details] [review] http: Remove outdated sync methods this part makes sense anyway
Comment on attachment 311614 [details] [review] http: Skip data if range requests are not supported seems like it should work, though it's not going to get tested all that much so it could easily break in the future...
Comment on attachment 311451 [details] [review] http: Return error if seek was not successful commit cb950723da8b6817dbe894fed6a78fa77c932d16
Comment on attachment 311613 [details] [review] http: Remove outdated sync methods commit 4302a595da3bece9ef89fb7342766f5a3c2ce256
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/260.