GNOME Bugzilla – Bug 690525
g_file_replace_contents_async doesn't copy its @contents argument
Last modified: 2014-08-24 09:28:21 UTC
Created attachment 231934 [details] Testcase Calling g_file_replace_contents_async seems to write incorrect data to the file. For data included in file it seems to write at least three different values, alternating every several runs. It writes the correct ammount of data, but data is malformed. This behaviour is reproducible both on Python 2.7 and 3.3.
Reproducible on master(53bc12a87da8), and PyGI 3.6.x. Quoting git bisect: “c7aa0e79dfb4c1092c51ae1464b8414083b4f3fc is the first bad commit”.
Looks more like a problem of bad interaction between introspection and and library rules. As replace_contents_async has transfer-ownership="none" and Gio doesn't ref content array it gets. Now I'm a bit confused about this transfer-ownership business (I guess none means that you're free it any time) so I will not try to decide which project is to blame.
Forgot to mention, Comment 2 should be ignored as it is utter nonsense.
Created attachment 233020 [details] [review] Tests for pygobject
(In reply to comment #2) > Looks more like a problem of bad interaction between introspection and and > library rules. As replace_contents_async has transfer-ownership="none" and Gio > doesn't ref content array it gets. > > Now I'm a bit confused about this transfer-ownership business (I guess none > means that you're free it any time) so I will not try to decide which project > is to blame. In this case I think g_file_replace_contents_async should be making a copy of the input data. Especially in an async situation. That or a binding friendly version needs to be created (something like g_file_replace_contents_async_with_copy) and use a "Rename to" annotation.
Comment on attachment 233020 [details] [review] Tests for pygobject Thanks for the tests! I committed them with a few PEP-8 cleanups, but I disabled the test as it currently breaks the whole testsuite: test_replace_contents_async (test_gio.TestGFile) ... (runtests.py:6679): GLib-GIO-CRITICAL **: g_task_propagate_boolean: assertion `task->result_set == TRUE' failed
(In reply to comment #5) > In this case I think g_file_replace_contents_async should be making a copy of > the input data. Especially in an async situation. That or a binding friendly > version needs to be created (something like > g_file_replace_contents_async_with_copy) and use a "Rename to" annotation. I ran into this in telepathy-glib C code: in <https://bugs.freedesktop.org/show_bug.cgi?id=63402> I was surprised that Chandni had to copy the data, write the copy to the file and keep it around for the duration of the async operation. I think it should copy the data, or perhaps have a version that takes a GBytes. Workaround: use g_file_replace_async(), write to the returned output stream, and explicitly close the output stream.
I think it is important to avoid data copy when possible, we don't know how big is the data going to be written. So I think we should either have a _take() and/or a _bytes() version, but not something that g_memdup() the whole thing. I think we have 2 possibilities for g_file_replace_contents_async(): 1) Don't change it but document clearly that it does not copy the data. 2) Make it g_membup() the data and free it when done. And make it clear in the doc. We could also deprecate it in favor of _take() and/or _bytes() version. In our use case (telepathy-glib) we want to write data coming from DBus. The data is given as GArray so we could do zero-copy... bytes = g_bytes_new_with_free_func(array->data, array->len, g_array_unref, g_array_ref (array)); ... if only dbus-glib wasn't using g_array_free()... With GDBus we could just do g_variant_get_data_as_bytes() to do zero-copy.
Compare with g_output_stream_write_async(). This function (even in its default implementation) requires that the passed-in buffer remains valid until the end of the async call. Since we have prior art here, I think this isn't a bug. We definitely could use better documentation on both of these functions, though, and a GBytes version wouldn't hurt. For binding, we have been talking about adding an 'asynccall' scope for callbacks. Maybe a similar concept could be useful for input arguments as well...
Created attachment 255684 [details] [review] GMainContext: add qdata
Comment on attachment 255684 [details] [review] GMainContext: add qdata Uh. Misfire, obviously.
Created attachment 263319 [details] [review] GFile: add GBytes version of _replace_contents_async()
I fell in the trap again, so I decided to make that patch. This allows doing: bytes = g_bytes_new_with_free_func (data, size, g_free, data); g_file_replace_contents_bytes_async(file, bytes, ...); g_bytes_unref (bytes); Like that we don't have to take care of freeing the data in the callback, and we can even use it without a callback if we don't care about the result.
g_bytes_new_take() even.
Review of attachment 263319 [details] [review]: This bit me too in hotssh. Also, can you add a test? ::: gio/gfile.c @@ +7315,3 @@ + * Note that no copy of @content will be made, so it must stay valid until + * @callback is called. See g_file_replace_contents_bytes_async() for a #GBytes + * version that allows forgetting about the contents immediately. ...that will automatically hold a reference to the contents (without copying) for the duration of the call. ? This might be an evil enough trap that it could go under <warning> @@ +7347,3 @@ + * @user_data: the data to pass to callback function + * + * Same as g_file_replace_contents_async() but taking a #GBytes input instead. "but takes" ::: gio/gfile.h @@ +1220,3 @@ GCancellable *cancellable, GError **error); +GLIB_AVAILABLE_IN_2_40 Er, no, this needs to stay _IN_ALL. @@ +1230,3 @@ GAsyncReadyCallback callback, gpointer user_data); GLIB_AVAILABLE_IN_ALL And *these* should be _IN_2_40
Created attachment 263331 [details] [review] GFile: add GBytes version of _replace_contents_async()
Created attachment 263332 [details] [review] Document clearly async functions not copying its args Usually async methods copy/ref its arguments so caller can forget about them. g_file_replace_contents_async() and g_output_stream_write_async() are exceptions.
(In reply to comment #15) > Review of attachment 263319 [details] [review]: > > Also, can you add a test? g_file_replace_contents_async() is implemented using g_file_replace_contents_bytes_async() so it's already tested for free. > ::: gio/gfile.c > @@ +7315,3 @@ > + * Note that no copy of @content will be made, so it must stay valid until > + * @callback is called. See g_file_replace_contents_bytes_async() for a > #GBytes > + * version that allows forgetting about the contents immediately. > > ...that will automatically hold a reference to the contents (without copying) > for the duration of the call. Sounds better, changed. > This might be an evil enough trap that it could go under <warning> Did that. > @@ +7347,3 @@ > + * @user_data: the data to pass to callback function > + * > + * Same as g_file_replace_contents_async() but taking a #GBytes input instead. > > "but takes" fixed > ::: gio/gfile.h > @@ +1220,3 @@ > GCancellable *cancellable, > GError **error); > +GLIB_AVAILABLE_IN_2_40 > > Er, no, this needs to stay _IN_ALL. > > @@ +1230,3 @@ > GAsyncReadyCallback callback, > gpointer user_data); > GLIB_AVAILABLE_IN_ALL > > And *these* should be _IN_2_40 right, fixed. Note that my patch does not add g_file_replace_contents_bytes_finish() because g_file_replace_contents_finish() can be used for it. Is that a problem? Not sure what's the policy here.
Review of attachment 263332 [details] [review]: Excellent!
Review of attachment 263331 [details] [review]: Still no test =( But otherwise looks fine.
(In reply to comment #18) > > > g_file_replace_contents_async() is implemented using > g_file_replace_contents_bytes_async() so it's already tested for free. Yeah...but that's not the same thing exactly. Code coverage tools want each one to be called. > Note that my patch does not add g_file_replace_contents_bytes_finish() because > g_file_replace_contents_finish() can be used for it. Is that a problem? Not > sure what's the policy here. We have other cases like that (g_data_input_stream_read_line_utf8() for example), I think it's a bit unfortunate in that vala has to guess at the name of the _finish function, but that's OK by me.
(In reply to comment #21) > (In reply to comment #18) > > > > > > g_file_replace_contents_async() is implemented using > > g_file_replace_contents_bytes_async() so it's already tested for free. > > Yeah...but that's not the same thing exactly. Code coverage tools want each > one to be called. Sorry, I was thinking about this backwards...you're right. (But there's still something to be said for a test for each public API entry point)
Thanks, pushed both commits. All entry points needs to be tested explicitly, really? I think coverage tools can detect when we test A and A calls into B, no?
Coverage tools don't care, but maintainers do. Having each public api tested directly is certainly a goal that we have. For instance, you mentioned a particular point about being able to unref the GBytes before the async is finished - that is exactly the kind of api guarantee that I want to see a test for.
*** Bug 730067 has been marked as a duplicate of this bug. ***