Bug 547494 – gio.InputStream.read() looks broken

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 547494 - gio.InputStream.read() looks broken


Summary:	gio.InputStream.read() looks broken


Status:	RESOLVED FIXED

Product:	pygobject
Classification:	Bindings
Component:	gio
Version:	Git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody's working on this now (help wanted and appreciated)
QA Contact:	Python bindings maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-08-12 20:40 UTC by Gian Mario Tagliaretti
Modified:	2008-08-27 21:22 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
patch and test (1.36 KB, patch) 2008-08-12 20:40 UTC, Gian Mario Tagliaretti	rejected	Details \| Review
different fix patch (6.51 KB, patch) 2008-08-13 20:22 UTC, Paul Pogonyshev	none	Details \| Review
*fix + wrapping (read\|write)_all + rename _all -> * and * -> _part + docs* (9.86 KB, patch) 2008-08-17 16:06 UTC, Paul Pogonyshev	reviewed	Details \| Review
as previous patch, but with more docs and yet more fixes (12.25 KB, patch) 2008-08-27 20:02 UTC, Paul Pogonyshev	committed	Details \| Review

Description Gian Mario Tagliaretti 2008-08-12 20:40:18 UTC

As I said on IRC, the function fails to return the whole content, patch in a second.

Comment 1 Gian Mario Tagliaretti 2008-08-12 20:40:51 UTC

Created attachment 116453 [details] [review]
patch and test

Comment 2 Johan (not receiving bugmail) Dahlin 2008-08-12 21:59:34 UTC

Comment on attachment 116453 [details] [review]
patch and test

Do you have tests for this?

Comment 3 Gian Mario Tagliaretti 2008-08-12 22:33:48 UTC

The problem is... I don't know how I can male a reliable test through http, using a file on gnome server maybe?

Comment 4 Johan (not receiving bugmail) Dahlin 2008-08-13 07:39:22 UTC

(In reply to comment #3)
> The problem is... I don't know how I can male a reliable test through http,
> using a file on gnome server maybe?
> 

Why can't you use file:// ?

Comment 5 Gian Mario Tagliaretti 2008-08-13 09:24:43 UTC

(In reply to comment #4)

> Why can't you use file:// ?

I see the problem only when using http:// dunno why, just tried again with file:/// everything is fine, using http:// shows the problem (with the patch applied everything is fine, both http and file)

Comment 6 Paul Pogonyshev 2008-08-13 19:11:22 UTC

Comment on attachment 116453 [details] [review]
patch and test

> I see the problem only when using http:// dunno why, just tried again with
> file:/// everything is fine, using http:// shows the problem (with the patch
> applied everything is fine, both http and file)

I guess this is because g_input_stream_read() can stop reading before 'count' is reached over slow connections, e.g. HTTP.  Not sure, though, documentation is not clear here.

>+            if (bytesread > buffersize)

This is absolutely meaningless, because this can never happen.  To avoid buffer overflow, g_input_stream_read() never returns more than 'count' (its third argument), which is in our case 'buffersize - bytesread'.  I.e.

    bytesread_new = bytesread_old + chunksize <=
        <= bytesread_old + (buffersize - bytesread_old) = buffersize

So,

    bytesread <= buffersize

unless g_input_stream_read() is broken (badly) itself.

It seems that the second change in the patch is meaningless.  First (about zero return value) is correct.  Gian, can you test what exactly g_input_stream_read() returns when the connection is HTTP?  I.e. printf() chunksize, buffersize and bytesread before and after every read.  Disable all early stops except for that for zero return value (which means EOF).

Comment 7 Paul Pogonyshev 2008-08-13 19:24:55 UTC

Actually, it seems I was right.  From documentation of g_input_stream_read_all():

> This function is similar to g_input_stream_read(), except it tries to read as
> many bytes as requested, only stopping on an error or end of stream.

I'm not sure what the best way to fix would be then.  Just replacing the call with g_input_stream_read_all() will solve the HTTP problem.  However, then we have a strange difference from GIO itself.

How about this:

  gio.InputStream.read ([count=-1, [cancellable]])

    Read up to 'count' (unlimited if 'count' is negative) bytes from the
    stream. This method can stop at any time after reading at least one byte.
    See read_all if you need a more reliable way to read stream.

    Exact behavior strongly depends on actual stream.  I.e. for local files
    it will likely behave like read_all(), but for slow connections (e.g.
    HTTP) it will often return before 'count' bytes are read.

  gio.InputStream.read_all ([count=-1, [cancellable]])

    Read up to 'count' (unlimited if 'count' is negative) bytes from the
    stream. Unlike read(), it will only stop if 'count' bytes are read, EOF
    is reached or if there is an error.

Comment 8 Paul Pogonyshev 2008-08-13 20:22:26 UTC

Created attachment 116523 [details] [review]
different fix patch

- Fix read() method;
- Add read_all() method;
- Change all tests to use read_all() instead of read(): the latter can in principle return not all data (though for local files will practically return all).

Rationale to have both read() and read_all().
The first method is allowed to return early, without reading all requested data.  This is important for slow connections, e.g. over HTTP, because allows to have smaller blocked timespans.  However, read_all() should be preferred for normal (local) files as it allows to ditch cumbersome loops.  We _could_ rename the methods as read_part() and read(), but I think that'd be bad as hardly justified deviation from GIO function names.

To summarize, difference between two methods (assuming no reading errors):
* When count = -1:
  - read() reads anything between one byte and everything left until end of stream;
  - read_all() reads up till the end of the stream, regardless of how much time that takes;
* When count > 0:
  - read() reads anything between one byte and min(count, bytes_left);
  - read_all() reads min(count, bytes_left).

Comment 9 Johan (not receiving bugmail) Dahlin 2008-08-14 07:42:04 UTC

I really want stream.read() to have an identical behaviour to the builtin python file.read(). If I call stream.read() all data should be returned no matter what, same applies for stream.read(-1). In other words, read_all() becomes unnecessary.

To me it's more important to follow existing python behaviour than the C GIO behaviour.

read_part() would be okay if we want to add an api similar to inputstream.read() in python.

Comment 10 Paul Pogonyshev 2008-08-14 19:56:24 UTC

OK, how about we rename the two functions read_part and read in that order?  With explanation in docs that PyGObject prefers to follow Python naming in this case, therefore our read() is read_all() in C GIO, our read_part() is read() in C GIO?

BTW, the same probably applies to write().

Comment 11 Paul Pogonyshev 2008-08-14 19:58:45 UTC

In _principle_ we could drop the partial read function, but I think it might be useful.  I understand Python is not about efficiency in general, but in this case this is about something external, not the language itself.

Comment 12 Paul Pogonyshev 2008-08-17 16:06:43 UTC

Created attachment 116805 [details] [review]
fix + wrapping (read|write)_all + rename *_all -> * and * -> *_part + docs

* read() fixed;
* read_all() and write_all() wrapped;
* renamed: read_all -> read, write_all -> write, read -> read_part, write -> write_part, for consistency with standard Python file objects methods;
* add documentation;
* add the tests from Gian's patch.

Note that renaming fix is important, as shown by Gian's problems with read() over HTTP connection.  It was actually expected, but somewhat confusing.  Now, the unsuffixed method always read requested number of bytes / write the whole buffer (save for an error).

Comment 13 Johan (not receiving bugmail) Dahlin 2008-08-19 07:07:10 UTC

Comment on attachment 116805 [details] [review]
fix + wrapping (read|write)_all + rename *_all -> * and * -> *_part + docs

I'd rather see the renaming done in the .override, eg _wrap..._read() should just call read_all() and read_parts should be %% define:d.
read_parts/write_parts should be tested.

Comment 14 Paul Pogonyshev 2008-08-19 21:07:18 UTC

I disagree.  _wrap_* names explicitly include names of the wrapped functions, so it would be weird to have wrapper for function X call function Y instead.  Unless you actually mean to change wrapper names too, but then I don't know how to do that.

About needed tests I agree, but I'm not sure how to test the methods best.  By their semantics the method might be quite unpredictable and this wouldn't constitute any error.  (Well, for local files they actually behave like their predictable counterparts, at least currently, but it seems not proper to assert more in the test than is required in documentation.)  Maybe assert that reading/writing in a loop eventually reads/write the whole required contents?

Comment 15 Paul Pogonyshev 2008-08-27 20:02:44 UTC

Created attachment 117484 [details] [review]
as previous patch, but with more docs and yet more fixes

- Add tests for read_part() and write_part(); yet more tests for read().
- Fix read() and read_part() ignoring positive 'count' (new tests test that too).

Comment 16 Paul Pogonyshev 2008-08-27 21:22:55 UTC

Sending        ChangeLog
Sending        gio/ginputstream.override
Sending        gio/gio.defs
Sending        gio/gio.override
Sending        gio/goutputstream.override
Sending        tests/test_gio.py
Transmitting file data ......
Committed revision 950.

2008-08-28  Paul Pogonyshev  <pogonyshev@gmx.net>

	* gio/gio.defs (gio.InputStream.read_part): Rename from read(),
	document.
	(gio.InputStream.read): Rename from read_all(), document.
	(gio.OutputStream.write_part): Rename from write(), document.
	(gio.OutputStream.write): Rename from write_all(), document.

	* gio/ginputstream.override (_wrap_g_input_stream_read): Fix
	several bugs.
	(_wrap_g_input_stream_read_all): New function.

	* gio/goutputstream.override (_wrap_g_output_stream_write_all):
	New function.

	* tests/test_gio.py (TestInputStream.testRead): Add more tests.
	(TestInputStream.test_read_part): New test.
	(TestInputStream._read_in_loop): New helper method.
	(TestOutputStream.test_write_part): New test.