GNOME Bugzilla – Bug 547494
gio.InputStream.read() looks broken
Last modified: 2008-08-27 21:22:55 UTC
As I said on IRC, the function fails to return the whole content, patch in a second.
Created attachment 116453 [details] [review] patch and test
Comment on attachment 116453 [details] [review] patch and test Do you have tests for this?
The problem is... I don't know how I can male a reliable test through http, using a file on gnome server maybe?
(In reply to comment #3) > The problem is... I don't know how I can male a reliable test through http, > using a file on gnome server maybe? > Why can't you use file:// ?
(In reply to comment #4) > Why can't you use file:// ? I see the problem only when using http:// dunno why, just tried again with file:/// everything is fine, using http:// shows the problem (with the patch applied everything is fine, both http and file)
Comment on attachment 116453 [details] [review] patch and test > I see the problem only when using http:// dunno why, just tried again with > file:/// everything is fine, using http:// shows the problem (with the patch > applied everything is fine, both http and file) I guess this is because g_input_stream_read() can stop reading before 'count' is reached over slow connections, e.g. HTTP. Not sure, though, documentation is not clear here. >+ if (bytesread > buffersize) This is absolutely meaningless, because this can never happen. To avoid buffer overflow, g_input_stream_read() never returns more than 'count' (its third argument), which is in our case 'buffersize - bytesread'. I.e. bytesread_new = bytesread_old + chunksize <= <= bytesread_old + (buffersize - bytesread_old) = buffersize So, bytesread <= buffersize unless g_input_stream_read() is broken (badly) itself. It seems that the second change in the patch is meaningless. First (about zero return value) is correct. Gian, can you test what exactly g_input_stream_read() returns when the connection is HTTP? I.e. printf() chunksize, buffersize and bytesread before and after every read. Disable all early stops except for that for zero return value (which means EOF).
Actually, it seems I was right. From documentation of g_input_stream_read_all(): > This function is similar to g_input_stream_read(), except it tries to read as > many bytes as requested, only stopping on an error or end of stream. I'm not sure what the best way to fix would be then. Just replacing the call with g_input_stream_read_all() will solve the HTTP problem. However, then we have a strange difference from GIO itself. How about this: gio.InputStream.read ([count=-1, [cancellable]]) Read up to 'count' (unlimited if 'count' is negative) bytes from the stream. This method can stop at any time after reading at least one byte. See read_all if you need a more reliable way to read stream. Exact behavior strongly depends on actual stream. I.e. for local files it will likely behave like read_all(), but for slow connections (e.g. HTTP) it will often return before 'count' bytes are read. gio.InputStream.read_all ([count=-1, [cancellable]]) Read up to 'count' (unlimited if 'count' is negative) bytes from the stream. Unlike read(), it will only stop if 'count' bytes are read, EOF is reached or if there is an error.
Created attachment 116523 [details] [review] different fix patch - Fix read() method; - Add read_all() method; - Change all tests to use read_all() instead of read(): the latter can in principle return not all data (though for local files will practically return all). Rationale to have both read() and read_all(). The first method is allowed to return early, without reading all requested data. This is important for slow connections, e.g. over HTTP, because allows to have smaller blocked timespans. However, read_all() should be preferred for normal (local) files as it allows to ditch cumbersome loops. We _could_ rename the methods as read_part() and read(), but I think that'd be bad as hardly justified deviation from GIO function names. To summarize, difference between two methods (assuming no reading errors): * When count = -1: - read() reads anything between one byte and everything left until end of stream; - read_all() reads up till the end of the stream, regardless of how much time that takes; * When count > 0: - read() reads anything between one byte and min(count, bytes_left); - read_all() reads min(count, bytes_left).
I really want stream.read() to have an identical behaviour to the builtin python file.read(). If I call stream.read() all data should be returned no matter what, same applies for stream.read(-1). In other words, read_all() becomes unnecessary. To me it's more important to follow existing python behaviour than the C GIO behaviour. read_part() would be okay if we want to add an api similar to inputstream.read() in python.
OK, how about we rename the two functions read_part and read in that order? With explanation in docs that PyGObject prefers to follow Python naming in this case, therefore our read() is read_all() in C GIO, our read_part() is read() in C GIO? BTW, the same probably applies to write().
In _principle_ we could drop the partial read function, but I think it might be useful. I understand Python is not about efficiency in general, but in this case this is about something external, not the language itself.
Created attachment 116805 [details] [review] fix + wrapping (read|write)_all + rename *_all -> * and * -> *_part + docs * read() fixed; * read_all() and write_all() wrapped; * renamed: read_all -> read, write_all -> write, read -> read_part, write -> write_part, for consistency with standard Python file objects methods; * add documentation; * add the tests from Gian's patch. Note that renaming fix is important, as shown by Gian's problems with read() over HTTP connection. It was actually expected, but somewhat confusing. Now, the unsuffixed method always read requested number of bytes / write the whole buffer (save for an error).
Comment on attachment 116805 [details] [review] fix + wrapping (read|write)_all + rename *_all -> * and * -> *_part + docs I'd rather see the renaming done in the .override, eg _wrap..._read() should just call read_all() and read_parts should be %% define:d. read_parts/write_parts should be tested.
I disagree. _wrap_* names explicitly include names of the wrapped functions, so it would be weird to have wrapper for function X call function Y instead. Unless you actually mean to change wrapper names too, but then I don't know how to do that. About needed tests I agree, but I'm not sure how to test the methods best. By their semantics the method might be quite unpredictable and this wouldn't constitute any error. (Well, for local files they actually behave like their predictable counterparts, at least currently, but it seems not proper to assert more in the test than is required in documentation.) Maybe assert that reading/writing in a loop eventually reads/write the whole required contents?
Created attachment 117484 [details] [review] as previous patch, but with more docs and yet more fixes - Add tests for read_part() and write_part(); yet more tests for read(). - Fix read() and read_part() ignoring positive 'count' (new tests test that too).
Sending ChangeLog Sending gio/ginputstream.override Sending gio/gio.defs Sending gio/gio.override Sending gio/goutputstream.override Sending tests/test_gio.py Transmitting file data ...... Committed revision 950. 2008-08-28 Paul Pogonyshev <pogonyshev@gmx.net> * gio/gio.defs (gio.InputStream.read_part): Rename from read(), document. (gio.InputStream.read): Rename from read_all(), document. (gio.OutputStream.write_part): Rename from write(), document. (gio.OutputStream.write): Rename from write_all(), document. * gio/ginputstream.override (_wrap_g_input_stream_read): Fix several bugs. (_wrap_g_input_stream_read_all): New function. * gio/goutputstream.override (_wrap_g_output_stream_write_all): New function. * tests/test_gio.py (TestInputStream.testRead): Add more tests. (TestInputStream.test_read_part): New test. (TestInputStream._read_in_loop): New helper method. (TestOutputStream.test_write_part): New test.