Bug 744392 – Optimise async I/O on GResourceFile streams

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 744392 - Optimise async I/O on GResourceFile streams


Summary:	Optimise async I/O on GResourceFile streams


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	gio
Version:	2.43.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-02-12 13:09 UTC by Philip Withnall
Modified:	2018-05-24 17:25 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Philip Withnall 2015-02-12 13:09:09 UTC

It was pointed out on the mailing list that async I/O calls on GResourceFile streams create worker threads pointlessly, since the backing data for a GResourceFile is already in memory.

https://mail.gnome.org/archives/desktop-devel-list/2015-February/msg00074.html

I haven’t investigated fully (and don’t have time to; this bug is just so the problem doesn’t get forgotten), but here are two suggestions for fixes:

 1. Implement the async vfuncs for GResourceFileInputStream and GResourceFileOutputStream to return results directly.

 2. Create a new GMemoryBased interface and implement it on GResourceFileInputStream and GResourceFileOutputStream. The base stream classes could then implement a fast path for objects implementing that interface, just as with GFileDescriptorBased. Better suggestions welcome for the name. Suggested API:
    GBytes *g_memory_based_get_bytes (GMemoryBased *self);

Comment 1 Colin Walters 2015-02-12 14:29:49 UTC

This is a general topic for GIO async, but see also:

http://lwn.net/Articles/619326/

If GIO first tried preadv2 with RWF_NONBLOCK and only if that failed fell back to a threadpool approach, it would likely improve performance quite a bit.

Comment 2 Allison Karlitskaya (desrt) 2015-02-12 21:16:03 UTC

I have a pretty high desire to abuse g_pollable_input_stream_read_nonblocking() here...

There are a few things that I'm not crazy about, though:

 1) I don't like how detecting the 'would block' case requires creation of
    GError* objects

 2) we can't really implement the is_readable() part of the interface for
    files


I wonder if we could do a second go at a subset of the GPollableInputStream interface that implements the one call we want: "read if possible without blocking" and avoids GError in the common "would block" case.

Comment 3 Allison Karlitskaya (desrt) 2015-02-12 21:31:38 UTC

In fact, let's just add a new vfunc to the GInputStream interface:

gboolean
g_input_stream_read_nonblocking (GInputStream  *stream,
                                 gpointer       buffer,
                                 gsize         *count,   /* inout */
                                 GError       **error);


On bytes read: updates *count with bytes read, returns TRUE

On "would block": updates *count with -1, returns TRUE

On other error: returns %FALSE, error is set



The default implementation of that function would be to return %TRUE with count equal to -1.


We could add a similar interface for OutputStream as well, but I consider it to be substantially less useful there.


The only thing that we miss here is a way to detect that a particular input stream doesn't support non-blocking reads so that callers can avoid repeatedly banging their head up against read_nonblocking().  I don't think this is a large problem, however, since getting -1 returned to you from the default implementation will happen pretty quickly...

Comment 4 Philip Withnall 2015-02-13 09:09:05 UTC

(In reply to Ryan Lortie (desrt) from comment #2)
> I have a pretty high desire to abuse
> g_pollable_input_stream_read_nonblocking() here...

I’m not sure that’s the right approach. For example, if I have an async function my_library_foobaz_async(GInputStream *some_stream, GAsyncReadyCallback blah, …), which reads from the input stream:
 • I want it to work for all implementations of input stream, yielding on the input stream read where appropriate; but
 • I still don’t want to spawn a new thread (etc.) if the input stream happens to be a GResourceFileInputStream.

With the read_nonblocking() approach, I essentially end up with the code:

    g_input_stream_read_nonblocking(…)
    if (would_block)
        g_input_stream_read_async(…)

either in my_library_foobaz_async(), or in g_input_stream_read_async(). If the latter, this optimisation never gets used for classes which have already reimplemented the read_async vfunc (though that’s a pretty weak argument).

Also, in this specific case, it would be nicer to return a GBytes for GResourceFile, since that makes it easier for the calling code to use without copying.

(In reply to Ryan Lortie (desrt) from comment #3)
> In fact, let's just add a new vfunc to the GInputStream interface:

…that all said, here are some specific comments on this proposal:

> gboolean
> g_input_stream_read_nonblocking (GInputStream  *stream,
>                                  gpointer       buffer,
>                                  gsize         *count,   /* inout */
>                                  GError       **error);
> 
> 
> On bytes read: updates *count with bytes read, returns TRUE

=> count should be a gssize*, but that then doesn’t match with g_input_stream_read().

> On "would block": updates *count with -1, returns TRUE

How about: returns TRUE, error is *not* set?

> We could add a similar interface for OutputStream as well, but I consider it
> to be substantially less useful there.

When we did this with libnice we ended up with asymmetric input and output stream APIs and it continues to pain me. The difference between blocking and non-blocking writes is a little more important there because they involve round trips and potentially connection establishment, rather than being dumped into a send buffer and forgotten about,

> The only thing that we miss here is a way to detect that a particular input
> stream doesn't support non-blocking reads so that callers can avoid
> repeatedly banging their head up against read_nonblocking().  I don't think
> this is a large problem, however, since getting -1 returned to you from the
> default implementation will happen pretty quickly...

I think it’s a bigger problem than you think. If a caller has multiple chunks of data to read, they’re going to end up doing a read_nonblocking() call to establish whether it works, then looping over that or jumping to a read_async() call. I can foresee people getting this wrong.

Comment 5 Philip Withnall 2015-02-13 09:17:44 UTC

(In reply to Philip Withnall from comment #4)
> (In reply to Ryan Lortie (desrt) from comment #2)
> > I have a pretty high desire to abuse
> > g_pollable_input_stream_read_nonblocking() here...
> 
> I’m not sure that’s the right approach. 

…but then it does have the advantage of working for GFileInputStream reads on /proc, for example. If we could work that into some general solution, that would be cool, though I’m getting a bit off-topic for GResourceFile.

Comment 6 Allison Karlitskaya (desrt) 2015-02-13 11:30:27 UTC

(In reply to Philip Withnall from comment #4)
> => count should be a gssize*, but that then doesn’t match with
> g_input_stream_read().

My mistake, but note that read() returns gssize -- it's not possible to read more than that anyway, so I have no problem changing it to gssize here as well.

> > On "would block": updates *count with -1, returns TRUE
> 
> How about: returns TRUE, error is *not* set?

Never!  ;)

We have a very strong policy of the return value matching error being set or not and we deviate from it only in a few old places...  I don't want to introduce a new one.

> When we did this with libnice we ended up with asymmetric input and output
> stream APIs and it continues to pain me. The difference between blocking and
> non-blocking writes is a little more important there because they involve
> round trips and potentially connection establishment, rather than being
> dumped into a send buffer and forgotten about,

Ya.. I could see 'real users' of the write side popping up quickly now that you mention it...

> I think it’s a bigger problem than you think. If a caller has multiple
> chunks of data to read, they’re going to end up doing a read_nonblocking()
> call to establish whether it works, then looping over that or jumping to a
> read_async() call. I can foresee people getting this wrong.

I consider this to be purely a performance enhancement.  I think adding another function to check the return value of and behave differently only _increases_ people's chances of 'getting it wrong'.

We go from:

read_nonblock()
if (would_block)
  read_async()

to:


earlier, to save the value:
  can_nonblock = can_nonblock()

then:

  if (can_nonblock)
    {
      read_nonblock();
      if (would_block)
        read_async()
    }
  else
    read_async()

or some other variant of the above.

Comment 7 Dan Winship 2015-02-13 12:20:17 UTC

(In reply to Ryan Lortie (desrt) from comment #2)
>  1) I don't like how detecting the 'would block' case requires creation of
>     GError* objects

Yes, that API seemed more consistent with everything else, but it was definitely a bad idea.

(In reply to Ryan Lortie (desrt) from comment #3)
> The only thing that we miss here is a way to detect that a particular input
> stream doesn't support non-blocking reads

  if (!G_IS_POLLABLE_INPUT_STREAM (stream) ||
      !g_pollable_input_stream_can_poll (G_POLLABLE_INPUT_STREAM (stream)))

?

(In reply to Philip Withnall from comment #4)
> With the read_nonblocking() approach, I essentially end up with the code:
> 
>     g_input_stream_read_nonblocking(…)
>     if (would_block)
>         g_input_stream_read_async(…)

Is this really the only use case for the new function? If so, maybe *that* should be the new method. (ie, g_input_stream_read_nonblocking_or_fall_back_to_async(), except hopefully with a better name)

Comment 8 Allison Karlitskaya (desrt) 2015-02-13 13:03:33 UTC

(In reply to Dan Winship from comment #7)
>   if (!G_IS_POLLABLE_INPUT_STREAM (stream) ||
>       !g_pollable_input_stream_can_poll (G_POLLABLE_INPUT_STREAM (stream)))
> 
> ?

The idea is that we could also have this for non-pollable streams.  We do not normally consider file-backed stream to be pollable, for example, and GResourceFile certainly isn't pollable.

> Is this really the only use case for the new function? If so, maybe *that*
> should be the new method. (ie,
> g_input_stream_read_nonblocking_or_fall_back_to_async(), except hopefully
> with a better name)

We can't do this because this function would either need to have a async-ish signature or not.  We don't have a precedent for a call that may call an async callback or may return immediately, and I don't want one.  I'd also like to avoid the unnecessary trip around the mainloop, which is sort of the entire point here -- otherwise we could just implement read_async() on GResourceFile and be done with it.

(In reply to Philip Withnall from comment #4)
> Also, in this specific case, it would be nicer to return a GBytes for
> GResourceFile, since that makes it easier for the calling code to use
> without copying.

I missed it at the time, but this is actually a very good suggestion.  Unfortunately, there are many cases (such as the file IO case) where a GBytes would be the worse thing to do.... if we want to optimally handle both cases, then it's going to get complicated.  It also depends on what the caller would prefer.  Maybe the caller needs to copy the data anyway?  Maybe the caller would create a GBytes with the result anyway?  Maybe either way would be fine...

So where does that put us?  three APIs?

 - preferred nonblocking read style (not supported, bytes, buffer/size)
 - get it via bytes
 - get it via buffer/size

getting a bit out of hand now...

Comment 9 Philip Withnall 2015-02-17 12:39:10 UTC

(In reply to Ryan Lortie (desrt) from comment #8)
> So where does that put us?  three APIs?
> 
>  - preferred nonblocking read style (not supported, bytes, buffer/size)
>  - get it via bytes
>  - get it via buffer/size
> 
> getting a bit out of hand now...

As you say, due to the (sensible) need to keep sync-style and async-style functions separate, it does look like the best we can do is add:

gboolean
g_input_stream_read_nonblocking (GInputStream  *stream,
                                 gpointer       buffer,
                                 gssize        *count,   /* inout */
                                 GError       **error);
and

GBytes *
g_input_stream_read_nonblocking_bytes (GInputStream  *stream,
                                       gsize          count,
                                       GError       **error);

These would take up 2 of the remaining 5 reserved vfunc slots in GInputStream. We would need equivalent g_output_stream_write_nonblocking() and g_output_stream_write_nonblocking_bytes() functions too.

I suppose callers would have the choice of:
 • Knowing they’re using a non-blocking stream, and hence always calling read_nonblocking().
 • Knowing they’re using a blocking stream, and hence always calling read_async().
 • Not being sure, and trying both.

I guess that’s OK.

If that sounds like a good approach I can come up with a patch.

Comment 10 Philip Withnall 2018-02-16 11:10:42 UTC

Something else I just thought about (but which I haven’t tried to integrate into the API proposals above — that’s for next time someone takes a look at this): it should be possible to use vmsplice() in the case we’re splicing from a GMemoryInputStream or GResourceInputStream to a GFileDescriptorBased output stream.

Comment 11 GNOME Infrastructure Team 2018-05-24 17:25:44 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/990.