GNOME Bugzilla – Bug 622123
Handle functions that take string+len args
Last modified: 2014-02-06 04:02:50 UTC
I am trying to have gtk_text_buffer_insert_text() working. The function takes a string and its len (or optionally -1 if the string is null terminated). jdahlin on irc suggested to annotate them with * @text: (array length=len) (element-type uint8): text in UTF-8 format However it turns out that pygi does not handle this correctly. The attached patch gets this working, but I am pretty sure it needs to be reworked by someone who actually knows how the code works. The patch is inspired by gjs where they also handle this as a special case of arrays, though gjs halso handles arrays of int32 etc
Created attachment 164092 [details] [review] patch
Review of attachment 164092 [details] [review]: This isn't the right approach. We can only allocate a buffer for an out caller-allocates parameter. It should be allocated as a void pointer of correct size and the rest of the argument API will take care of converting it to a pyobject. There is no need to create a g_array. This means it is much more generic and can handle any type. I'll take a stab at it right now. Can you write a test case I can test against?
Scratch that review. I was thinking of another bug.
Comment on attachment 164092 [details] [review] patch >From 0766c1749bd612e854fe418d6365b5281afd5cf3 Mon Sep 17 00:00:00 2001 >From: Paolo Borelli <pborelli@gnome.org> >Date: Sat, 19 Jun 2010 21:46:15 +0200 >Subject: [PATCH] First cut at getting "char arrays" working > >--- > gi/pygi-argument.c | 23 +++++++++++++++++++++++ > 1 files changed, 23 insertions(+), 0 deletions(-) > >diff --git a/gi/pygi-argument.c b/gi/pygi-argument.c >index 76fa58d..2527b29 100644 >--- a/gi/pygi-argument.c >+++ b/gi/pygi-argument.c >@@ -318,6 +318,10 @@ check_number_release: > break; > case GI_TYPE_TAG_ARRAY: > { You should probably also check the element type. If it is a pointer we don't want to allow characters to be dereference as this could be used as a security exploit. Add a test for this by looking in the Everything or GIMarshallingTests modules and finding an interface that takes a funky array. Make sure we can't pass a string. >+ if (PyString_Check (object)) { >+ break; >+ } >+ > gssize fixed_size; > Py_ssize_t length; > GITypeInfo *item_type_info; >@@ -833,6 +837,25 @@ _pygi_argument_from_object (PyObject *object, > GITransfer item_transfer; > Py_ssize_t i; > >+ if (PyString_Check (object)) { >+ /* Allow strings as int8/uint8/int16/uint16 arrays */ >+ gchar *string; >+ Py_ssize_t len; >+ GArray *array; so this looks like it will work for int32 since the array gets resized based on the element type bellow >+ if (PyString_AsStringAndSize (object, &string, &len) < 0) { >+ break; >+ } >+ >+ array = g_array_new (TRUE, FALSE, 1); >+ >+ array->data = g_strndup (string, len);; >+ array->len = len; >+ >+ arg.v_pointer = array; >+ break; >+ } >+ > if (object == Py_None) { > arg.v_pointer = NULL; > break; >-- >1.7.0.1 > Looks pretty good besides the check. Please look at the gjs tests to see what they are doing to check this. We can also add tests to GObject Introspection if needed.
Created attachment 166882 [details] [review] patch Finally took the time to update this patch and address J5 comments. The patch includes unit tests, though one of them uses a specific function added to Everything for which I filed a patch in bug #625717 The removal of the TextView override also requires that the corresponding annotations are added to gtk (bug #625718)
IMHO, this is awful. I'd say gobject-introspection should have a type char, so we could handle it properly. Anyway, I'm going to have a look at the patch soon. Meanwhile, it doesn't apply to pygobject master/HEAD anymore, and I'd appreciate if you could rebase it. Thanks!
Review of attachment 166882 [details] [review]: This patch doesn't take into account fixed-size non-null-terminated strings. But how to check length equality? Should we interpret the length as the size in bytes, which may differ from the length for (u)int16 arrays? Will the passed string always have 8-bit characters? Will it work for Python 3.x, where only Unicode objects remain? As I already said, I don't like this approach at all. To keep consistency, we should also accept characters for regular numeric arguments (and, BTW, why only (u)int(8|16), why not also (u)int(32|64)?) I think the right approach is to introduce a type for char in gobject-introspection. ::: tests/test_everything.py @@ +125,3 @@ + self.assertEquals(10, Everything.test_array_gint8_in([1,2,3,4])) + self.assertEquals(10, Everything.test_array_gint16_in([1,2,3,4])) + self.assertEquals(10, Everything.test_array_gint32_in([1,2,3,4])) Similar tests exist in test_gi.py; no need to add them here.
(In reply to comment #7) > Review of attachment 166882 [details] [review]: > > This patch doesn't take into account fixed-size non-null-terminated strings. > By "fixed-size" do you mean an api that takes a char pointer which is a non-null-terminated string of lenght 3? To be honest I have never seen such an api, I think we should be pragmatic here and do not consider it a blocking issue. IIRC gjs does not handle that either. > But how to check length equality? Should we interpret the length as the size in > bytes, which may differ from the length for (u)int16 arrays? Will the passed > string always have 8-bit characters? Will it work for Python 3.x, where only > Unicode objects remain? > > As I already said, I don't like this approach at all. To keep consistency, we > should also accept characters for regular numeric arguments (and, BTW, why only > (u)int(8|16), why not also (u)int(32|64)?) I think the right approach is to > introduce a type for char in gobject-introspection. > As said on irc I am not so fond of this apprach either but it is the one Johan suggested me and it is also the one that gjs takes. As for (u)int(8|16|32|64) once again I just followed what gjs does. To be honest the only one that makes sense to me is uint8: this really means take a chunk of N bytes. > ::: tests/test_everything.py > @@ +125,3 @@ > + self.assertEquals(10, Everything.test_array_gint8_in([1,2,3,4])) > + self.assertEquals(10, Everything.test_array_gint16_in([1,2,3,4])) > + self.assertEquals(10, Everything.test_array_gint32_in([1,2,3,4])) > > Similar tests exist in test_gi.py; no need to add them here.
(In reply to comment #8) > (In reply to comment #7) > > Review of attachment 166882 [details] [review] [details]: > > > > This patch doesn't take into account fixed-size non-null-terminated strings. > > > > By "fixed-size" do you mean an api that takes a char pointer which is a > non-null-terminated string of lenght 3? To be honest I have never seen such an > api, I think we should be pragmatic here and do not consider it a blocking > issue. IIRC gjs does not handle that either. Right. Annotations allow it, so we should support it. And the code for fixed-size arrays is already there, so you only need to integrate with it. > > But how to check length equality? Should we interpret the length as the size in > > bytes, which may differ from the length for (u)int16 arrays? Will the passed > > string always have 8-bit characters? Will it work for Python 3.x, where only > > Unicode objects remain? > > > > As I already said, I don't like this approach at all. To keep consistency, we > > should also accept characters for regular numeric arguments (and, BTW, why only > > (u)int(8|16), why not also (u)int(32|64)?) I think the right approach is to > > introduce a type for char in gobject-introspection. > > > > As said on irc I am not so fond of this apprach either but it is the one Johan > suggested me and it is also the one that gjs takes. > > As for (u)int(8|16|32|64) once again I just followed what gjs does. To be > honest the only one that makes sense to me is uint8: this really means take a > chunk of N bytes. (u)int8 does not suffer from the length interpretation issue, so if we have to implement this, I'd only support (u)int8, as a replacement of char.
Also, what if you pass a utf8 string with characters that need to be encoded on more than 1 byte?
Created attachment 167316 [details] [review] patch - handle fixed-size - only deal with uint8 arrays as discussed on irc - handle unicode strings
Created attachment 168321 [details] [review] patch updated for the rename of test functions in the corresponding g-i patch
Review of attachment 168321 [details] [review]: ::: gi/pygi-argument.c @@ +851,3 @@ + + array = g_array_new (TRUE, FALSE, 1); + /* Allow strings and unicode strings as byte (uint8) arrays */ The array is leaked here, it should be freed at some point
(In reply to comment #13) > Review of attachment 168321 [details] [review]: > > ::: gi/pygi-argument.c > @@ +851,3 @@ > + > + array = g_array_new (TRUE, FALSE, 1); > + /* Allow strings and unicode strings as byte (uint8) arrays */ > > The array is leaked here, it should be freed at some point note thta also other branches of that function do the same: array = g_array_sized_new (is_zero_terminated, FALSE, item_size, length); so either callers already free the array or we are already leaking even without the patch
Comment on attachment 168321 [details] [review] patch > >diff --git a/gi/pygi-argument.c b/gi/pygi-argument.c >index 8c7c321..f36e702 100644 >--- a/gi/pygi-argument.c >+++ b/gi/pygi-argument.c >@@ -268,11 +268,43 @@ check_number_release: > break; > case GI_TYPE_TAG_ARRAY: > { >+ PyObject *str = NULL; > gssize fixed_size; > Py_ssize_t length; > GITypeInfo *item_type_info; > Py_ssize_t i; > >+ /* Allow strings and unicode strings as byte (uint8) arrays */ >+ if (PyUnicode_Check (object)) >+ str = PyUnicode_AsUTF8String (object); >- else if (PyString_Check (object)) + else if (PYGLIB_PyBytes_Check (object)) >+ str = object; >+ >+ if (str) { >+ GITypeTag element_type; >+ >+ item_type_info = g_type_info_get_param_type (type_info, 0); >+ g_assert (item_type_info != NULL); >+ >+ element_type = g_type_info_get_tag (item_type_info); >+ g_base_info_unref ((GIBaseInfo*) item_type_info); >+ >+ if (element_type == GI_TYPE_TAG_UINT8) { >+ fixed_size = g_type_info_get_array_fixed_size (type_info); >+ if (fixed_size >= 0) { >+ Py_ssize_t l; >+ >- l = PyString_Size (str); + l = PYGLIB_PyBytes_Size (str); >+ if (l != fixed_size) { >+ PyErr_Format (PyExc_ValueError, "String must be %zd bytes, not %zd", >+ fixed_size, l); >+ retval = 0; >@@ -789,6 +822,44 @@ _pygi_argument_from_object (PyObject *object, > GITransfer item_transfer; > Py_ssize_t i; > >+ /* Allow strings and unicode strings as byte (uint8) arrays */ >+ if (PyUnicode_Check (object)) >+ str = PyUnicode_AsUTF8String (object); >- else if (PyString_Check (object)) + else if (PYGLIB_PyBytes_Check (object)) >+ str = object; >+ >+ if (str) { >+ /* Allow strings as byte (uint8) arrays */ >+ GITypeInfo *item_type_info; >+ GITypeTag element_type; >+ gchar *string; >+ Py_ssize_t len; >+ GArray *array; >+ >+ item_type_info = g_type_info_get_param_type(type_info, 0); >+ g_assert(item_type_info != NULL); >+ >+ element_type = g_type_info_get_tag(item_type_info); >+ g_base_info_unref((GIBaseInfo*) item_type_info); >+ >+ if (element_type != GI_TYPE_TAG_UINT8) { >+ break; >+ } >+ >- if (PyString_AsStringAndSize (str, &string, &len) < 0) { + if (PYGLIB_PyBytes_AsStringAndSize (str, &string, &len) < 0) { >+ break; >+ } >+ Make sure to include <pyglib-python-compat.h> if it isn't already
(In reply to comment #14) > (In reply to comment #13) > > Review of attachment 168321 [details] [review] [details]: > > > > ::: gi/pygi-argument.c > > @@ +851,3 @@ > > + > > + array = g_array_new (TRUE, FALSE, 1); > > + /* Allow strings and unicode strings as byte (uint8) arrays */ > > > > The array is leaked here, it should be freed at some point > > note thta also other branches of that function do the same: > > array = g_array_sized_new (is_zero_terminated, FALSE, item_size, > length); > > so either callers already free the array or we are already leaking even without > the patch _pygi_argument_release should be called on this after the call ends, as part of cleanup.
Ping, can we get a revised patch? I would like to add this ASAP.
well, as far as I understood this approach was shot down by walters... or did I miss something?
*** Bug 638470 has been marked as a duplicate of this bug. ***
Created attachment 179706 [details] [review] updated
> ::: gi/pygi-argument.c > @@ +851,3 @@ > + > + array = g_array_new (TRUE, FALSE, 1); > + /* Allow strings and unicode strings as byte (uint8) arrays */ > > The array is leaked here, it should be freed at some point The array itself is freed in pygi-invoke.c line 561. The data is freed when the argument is released, as far as I can tell. The previous version of the patch leaked the initial allocation for the array data, since it just overwrote array->data, but the latest version doesn't. I haven't looked at dealing with unicode problems yet.
Not a blocker, according to J5
Comment on attachment 179706 [details] [review] updated Paolo, thanks for this. I am assuming this patch depends on fixing the annotations in Gtk, i. e. to mark gtk_text_buffer_insert()'s "len" argument as length of "text"? This does not currently seem to be the case. Can you please update this patch to current master?
This ticket should be considered when unifying marshaling code paths outlined in bug 693405. The uint8 special casing is already handled in _pygi_marshal_from_py_array. While a complete unification of array marshaling for these two code paths isn't really feasible at this point, parts of the marshalers can be shared. For instance, we can have a standalone function which marshals a specific sub-set of arrays optimally, like uint8 arrays. Eventually this function should be generalized to take advantage of objects which simply support the python buffer protocol and can easily be translated into a GArray of the annotated item type.
Additional notes. I assume this bug and patch were created prior to the work John did with caching. The marshaling of array arguments for function calls is now handled by _pygi_marshal_from_py_array which already supports the desired behavior. The marshalers living in pygi-argument are used for vfuncs, callbacks, struct fields, and properties, but not function calls. Furthermore changing the annotation at this point suffers the same dilemma as bug 686263 in that it will break API and would no longer support encoding. However the spirit of this ticket is still valid and as previously mentioned, should be handled in a marshaling unification pass.
We support calling functions which take strings/arrays and implicitly fill out the length argument given the annotations are correct.