After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 622123 - Handle functions that take string+len args
Handle functions that take string+len args
Status: RESOLVED FIXED
Product: pygobject
Classification: Bindings
Component: introspection
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
Python bindings maintainers
: 638470 (view as bug list)
Depends on: 625717
Blocks: 625942
 
 
Reported: 2010-06-19 19:56 UTC by Paolo Borelli
Modified: 2014-02-06 04:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (1.53 KB, patch)
2010-06-19 19:56 UTC, Paolo Borelli
needs-work Details | Review
patch (4.89 KB, patch)
2010-07-31 13:15 UTC, Paolo Borelli
needs-work Details | Review
patch (5.90 KB, patch)
2010-08-07 12:06 UTC, Paolo Borelli
none Details | Review
patch (5.46 KB, patch)
2010-08-19 20:06 UTC, Paolo Borelli
needs-work Details | Review
updated (5.86 KB, patch)
2011-01-31 11:49 UTC, Jonathan Matthew
needs-work Details | Review

Description Paolo Borelli 2010-06-19 19:56:02 UTC
I am trying to have gtk_text_buffer_insert_text() working.

The function takes a string and its len (or optionally -1 if the string is null terminated).

jdahlin on irc suggested to annotate them with 

 * @text: (array length=len) (element-type uint8): text in UTF-8 format

However it turns out that pygi does not handle this correctly.

The attached patch gets this working, but I am pretty sure it needs to be reworked by someone who actually knows how the code works.

The patch is inspired by gjs where they also handle this as a special case of arrays, though gjs halso handles arrays of int32 etc
Comment 1 Paolo Borelli 2010-06-19 19:56:38 UTC
Created attachment 164092 [details] [review]
patch
Comment 2 johnp 2010-06-22 20:27:43 UTC
Review of attachment 164092 [details] [review]:

This isn't the right approach.  We can only allocate a buffer for an out caller-allocates parameter.  It should be allocated as a void pointer of correct size and the rest of the argument API will take care of converting it to a pyobject.  There is no need to create a g_array.  This means it is much more generic and can handle any type.  I'll take a stab at it right now.  Can you write a test case I can test against?
Comment 3 johnp 2010-06-22 21:06:31 UTC
Scratch that review.  I was thinking of another bug.
Comment 4 johnp 2010-06-22 21:24:19 UTC
Comment on attachment 164092 [details] [review]
patch

>From 0766c1749bd612e854fe418d6365b5281afd5cf3 Mon Sep 17 00:00:00 2001
>From: Paolo Borelli <pborelli@gnome.org>
>Date: Sat, 19 Jun 2010 21:46:15 +0200
>Subject: [PATCH] First cut at getting "char arrays" working
>
>---
> gi/pygi-argument.c |   23 +++++++++++++++++++++++
> 1 files changed, 23 insertions(+), 0 deletions(-)
>
>diff --git a/gi/pygi-argument.c b/gi/pygi-argument.c
>index 76fa58d..2527b29 100644
>--- a/gi/pygi-argument.c
>+++ b/gi/pygi-argument.c
>@@ -318,6 +318,10 @@ check_number_release:
>             break;
>         case GI_TYPE_TAG_ARRAY:
>         {

You should probably also check the element type.  If it is a pointer we don't want to allow characters to be dereference as this could be used as a security exploit.  Add a test for this by looking in the Everything or GIMarshallingTests modules and finding an interface that takes a funky array.  Make sure we can't pass a string.  

>+            if (PyString_Check (object)) {
>+                break;
>+            }
>+
>             gssize fixed_size;
>             Py_ssize_t length;
>             GITypeInfo *item_type_info;
>@@ -833,6 +837,25 @@ _pygi_argument_from_object (PyObject   *object,
>             GITransfer item_transfer;
>             Py_ssize_t i;
> 
>+            if (PyString_Check (object)) {
>+                /* Allow strings as int8/uint8/int16/uint16 arrays */
>+                gchar *string;
>+                Py_ssize_t len;
>+                GArray *array;

so this looks like it will work for int32 since the array gets resized based on the element type bellow

>+                if (PyString_AsStringAndSize (object, &string, &len) < 0) {
>+                    break;
>+                }
>+
>+                array = g_array_new (TRUE, FALSE, 1);
>+
>+                array->data = g_strndup (string, len);;
>+                array->len = len;
>+
>+                arg.v_pointer = array;
>+                break;
>+            }
>+
>             if (object == Py_None) {
>                 arg.v_pointer = NULL;
>                 break;
>-- 
>1.7.0.1
>


Looks pretty good besides the check.  Please look at the gjs tests to see what they are doing to check this.  We can also add tests to GObject Introspection if needed.
Comment 5 Paolo Borelli 2010-07-31 13:15:17 UTC
Created attachment 166882 [details] [review]
patch

Finally took the time to update this patch and address J5 comments.

The patch includes unit tests, though one of them uses a specific function added to Everything for which I filed a patch in bug #625717

The removal of the TextView override also requires that the corresponding annotations are added to gtk (bug #625718)
Comment 6 Simon van der Linden 2010-08-02 12:01:36 UTC
IMHO, this is awful. I'd say gobject-introspection should have a type char, so we could handle it properly.

Anyway, I'm going to have a look at the patch soon. Meanwhile, it doesn't apply to pygobject master/HEAD anymore, and I'd appreciate if you could rebase it. Thanks!
Comment 7 Simon van der Linden 2010-08-06 13:55:35 UTC
Review of attachment 166882 [details] [review]:

This patch doesn't take into account fixed-size non-null-terminated strings.

But how to check length equality? Should we interpret the length as the size in bytes, which may differ from the length for (u)int16 arrays? Will the passed string always have 8-bit characters? Will it work for Python 3.x, where only Unicode objects remain?

As I already said, I don't like this approach at all. To keep consistency, we should also accept characters for regular numeric arguments (and, BTW, why only (u)int(8|16), why not also (u)int(32|64)?) I think the right approach is to introduce a type for char in gobject-introspection.

::: tests/test_everything.py
@@ +125,3 @@
+        self.assertEquals(10, Everything.test_array_gint8_in([1,2,3,4]))
+        self.assertEquals(10, Everything.test_array_gint16_in([1,2,3,4]))
+        self.assertEquals(10, Everything.test_array_gint32_in([1,2,3,4]))

Similar tests exist in test_gi.py; no need to add them here.
Comment 8 Paolo Borelli 2010-08-06 21:26:57 UTC
(In reply to comment #7)
> Review of attachment 166882 [details] [review]:
> 
> This patch doesn't take into account fixed-size non-null-terminated strings.
> 

By "fixed-size" do you mean an api that takes a char pointer which is a non-null-terminated string of lenght 3? To be honest I have never seen such an api, I think we should be pragmatic here and do not consider it a blocking issue. IIRC gjs does not handle that either.


> But how to check length equality? Should we interpret the length as the size in
> bytes, which may differ from the length for (u)int16 arrays? Will the passed
> string always have 8-bit characters? Will it work for Python 3.x, where only
> Unicode objects remain?
> 
> As I already said, I don't like this approach at all. To keep consistency, we
> should also accept characters for regular numeric arguments (and, BTW, why only
> (u)int(8|16), why not also (u)int(32|64)?) I think the right approach is to
> introduce a type for char in gobject-introspection.
> 


As said on irc I am not so fond of this apprach either but it is the one Johan suggested me and it is also the one that gjs takes.

As for (u)int(8|16|32|64) once again I just followed what gjs does. To be honest the only one that makes sense to me is uint8: this really means take a chunk of N bytes.

> ::: tests/test_everything.py
> @@ +125,3 @@
> +        self.assertEquals(10, Everything.test_array_gint8_in([1,2,3,4]))
> +        self.assertEquals(10, Everything.test_array_gint16_in([1,2,3,4]))
> +        self.assertEquals(10, Everything.test_array_gint32_in([1,2,3,4]))
> 
> Similar tests exist in test_gi.py; no need to add them here.
Comment 9 Simon van der Linden 2010-08-07 08:43:39 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Review of attachment 166882 [details] [review] [details]:
> > 
> > This patch doesn't take into account fixed-size non-null-terminated strings.
> > 
> 
> By "fixed-size" do you mean an api that takes a char pointer which is a
> non-null-terminated string of lenght 3? To be honest I have never seen such an
> api, I think we should be pragmatic here and do not consider it a blocking
> issue. IIRC gjs does not handle that either.

Right. Annotations allow it, so we should support it. And the code for fixed-size arrays is already there, so you only need to integrate with it.

> > But how to check length equality? Should we interpret the length as the size in
> > bytes, which may differ from the length for (u)int16 arrays? Will the passed
> > string always have 8-bit characters? Will it work for Python 3.x, where only
> > Unicode objects remain?
> > 
> > As I already said, I don't like this approach at all. To keep consistency, we
> > should also accept characters for regular numeric arguments (and, BTW, why only
> > (u)int(8|16), why not also (u)int(32|64)?) I think the right approach is to
> > introduce a type for char in gobject-introspection.
> > 
>
> As said on irc I am not so fond of this apprach either but it is the one Johan
> suggested me and it is also the one that gjs takes.
>
> As for (u)int(8|16|32|64) once again I just followed what gjs does. To be
> honest the only one that makes sense to me is uint8: this really means take a
> chunk of N bytes.

(u)int8 does not suffer from the length interpretation issue, so if we have to implement this, I'd only support (u)int8, as a replacement of char.
Comment 10 Simon van der Linden 2010-08-07 08:46:06 UTC
Also, what if you pass a utf8 string with characters that need to be encoded on more than 1 byte?
Comment 11 Paolo Borelli 2010-08-07 12:06:11 UTC
Created attachment 167316 [details] [review]
patch

 - handle fixed-size
 - only deal with uint8 arrays as discussed on irc
 - handle unicode strings
Comment 12 Paolo Borelli 2010-08-19 20:06:08 UTC
Created attachment 168321 [details] [review]
patch

updated for the rename of test functions in the corresponding g-i patch
Comment 13 Johan (not receiving bugmail) Dahlin 2010-08-19 20:30:39 UTC
Review of attachment 168321 [details] [review]:

::: gi/pygi-argument.c
@@ +851,3 @@
+
+                array = g_array_new (TRUE, FALSE, 1);
+            /* Allow strings and unicode strings as byte (uint8) arrays */

The array is leaked here, it should be freed at some point
Comment 14 Paolo Borelli 2010-08-19 20:51:57 UTC
(In reply to comment #13)
> Review of attachment 168321 [details] [review]:
> 
> ::: gi/pygi-argument.c
> @@ +851,3 @@
> +
> +                array = g_array_new (TRUE, FALSE, 1);
> +            /* Allow strings and unicode strings as byte (uint8) arrays */
> 
> The array is leaked here, it should be freed at some point

note thta also other branches of that function do the same:

            array = g_array_sized_new (is_zero_terminated, FALSE, item_size, length);

so either callers already free the array or we are already leaking even without the patch
Comment 15 johnp 2010-08-19 21:03:29 UTC
Comment on attachment 168321 [details] [review]
patch


> 
>diff --git a/gi/pygi-argument.c b/gi/pygi-argument.c
>index 8c7c321..f36e702 100644
>--- a/gi/pygi-argument.c
>+++ b/gi/pygi-argument.c
>@@ -268,11 +268,43 @@ check_number_release:
>             break;
>         case GI_TYPE_TAG_ARRAY:
>         {
>+            PyObject *str = NULL;
>             gssize fixed_size;
>             Py_ssize_t length;
>             GITypeInfo *item_type_info;
>             Py_ssize_t i;
> 
>+            /* Allow strings and unicode strings as byte (uint8) arrays */
>+            if (PyUnicode_Check (object))
>+                str = PyUnicode_AsUTF8String (object);   
>-            else if (PyString_Check (object))
+             else if (PYGLIB_PyBytes_Check (object))
>+                str = object;
>+
>+            if (str) {
>+                GITypeTag element_type;
>+
>+                item_type_info = g_type_info_get_param_type (type_info, 0);
>+                g_assert (item_type_info != NULL);
>+
>+                element_type = g_type_info_get_tag (item_type_info);
>+                g_base_info_unref ((GIBaseInfo*) item_type_info);
>+
>+                if (element_type == GI_TYPE_TAG_UINT8) {
>+                    fixed_size = g_type_info_get_array_fixed_size (type_info);
>+                    if (fixed_size >= 0) {
>+                        Py_ssize_t l;
>+
>-                        l = PyString_Size (str);
+                         l = PYGLIB_PyBytes_Size (str);
>+                        if (l != fixed_size) {
>+                            PyErr_Format (PyExc_ValueError, "String must be %zd bytes, not %zd",
>+                                          fixed_size, l);
>+                            retval = 0;


>@@ -789,6 +822,44 @@ _pygi_argument_from_object (PyObject   *object,
>             GITransfer item_transfer;
>             Py_ssize_t i;
> 
>+            /* Allow strings and unicode strings as byte (uint8) arrays */
>+            if (PyUnicode_Check (object))
>+                str = PyUnicode_AsUTF8String (object);
>-            else if (PyString_Check (object))
+             else if (PYGLIB_PyBytes_Check (object))
>+                str = object;
>+
>+            if (str) {
>+                /* Allow strings as byte (uint8) arrays */
>+                GITypeInfo *item_type_info;
>+                GITypeTag element_type;
>+                gchar *string;
>+                Py_ssize_t len;
>+                GArray *array;
>+
>+                item_type_info = g_type_info_get_param_type(type_info, 0);
>+                g_assert(item_type_info != NULL);
>+
>+                element_type = g_type_info_get_tag(item_type_info);
>+                g_base_info_unref((GIBaseInfo*) item_type_info);
>+
>+                if (element_type != GI_TYPE_TAG_UINT8) {
>+                    break;
>+                }
>+
>-                if (PyString_AsStringAndSize (str, &string, &len) < 0) {
+                 if (PYGLIB_PyBytes_AsStringAndSize (str, &string, &len) < 0) {
>+                    break;
>+                }
>+


Make sure to include <pyglib-python-compat.h> if it isn't already
Comment 16 Tomeu Vizoso 2010-08-20 07:36:04 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Review of attachment 168321 [details] [review] [details]:
> > 
> > ::: gi/pygi-argument.c
> > @@ +851,3 @@
> > +
> > +                array = g_array_new (TRUE, FALSE, 1);
> > +            /* Allow strings and unicode strings as byte (uint8) arrays */
> > 
> > The array is leaked here, it should be freed at some point
> 
> note thta also other branches of that function do the same:
> 
>             array = g_array_sized_new (is_zero_terminated, FALSE, item_size,
> length);
> 
> so either callers already free the array or we are already leaking even without
> the patch

_pygi_argument_release should be called on this after the call ends, as part of cleanup.
Comment 17 johnp 2010-11-12 16:22:25 UTC
Ping, can we get a revised patch?  I would like to add this ASAP.
Comment 18 Paolo Borelli 2010-11-12 16:24:47 UTC
well, as far as I understood this approach was shot down by walters... or did I miss something?
Comment 19 Holger Berndt 2011-01-02 00:15:20 UTC
*** Bug 638470 has been marked as a duplicate of this bug. ***
Comment 20 Jonathan Matthew 2011-01-31 11:49:16 UTC
Created attachment 179706 [details] [review]
updated
Comment 21 Jonathan Matthew 2011-01-31 11:55:09 UTC
> ::: gi/pygi-argument.c
> @@ +851,3 @@
> +
> +                array = g_array_new (TRUE, FALSE, 1);
> +            /* Allow strings and unicode strings as byte (uint8) arrays */
> 
> The array is leaked here, it should be freed at some point

The array itself is freed in pygi-invoke.c line 561.  The data is freed when the argument is released, as far as I can tell.

The previous version of the patch leaked the initial allocation for the array data, since it just overwrote array->data, but the latest version doesn't.

I haven't looked at dealing with unicode problems yet.
Comment 22 Matthias Clasen 2011-03-11 01:50:25 UTC
Not a blocker, according to J5
Comment 23 Martin Pitt 2012-04-22 08:10:57 UTC
Comment on attachment 179706 [details] [review]
updated

Paolo, thanks for this. I am assuming this patch depends on fixing the annotations in Gtk, i. e. to mark gtk_text_buffer_insert()'s "len" argument as length of "text"? This does not currently seem to be the case.

Can you please update this patch to current master?
Comment 24 Simon Feltman 2013-02-27 10:49:29 UTC
This ticket should be considered when unifying marshaling code paths outlined in bug 693405. The uint8 special casing is already handled in _pygi_marshal_from_py_array. While a complete unification of array marshaling for these two code paths isn't really feasible at this point, parts of the marshalers can be shared. For instance, we can have a standalone function which marshals a specific sub-set of arrays optimally, like uint8 arrays. Eventually this function should be generalized to take advantage of objects which simply support the python buffer protocol and can easily be translated into a GArray of the annotated item type.
Comment 25 Simon Feltman 2013-02-27 11:16:13 UTC
Additional notes. I assume this bug and patch were created prior to the work John did with caching. The marshaling of array arguments for function calls is now handled by _pygi_marshal_from_py_array which already supports the desired behavior. The marshalers living in pygi-argument are used for vfuncs, callbacks, struct fields, and properties, but not function calls.

Furthermore changing the annotation at this point suffers the same dilemma as bug 686263 in that it will break API and would no longer support encoding. However the spirit of this ticket is still valid and as previously mentioned, should be handled in a marshaling unification pass.
Comment 26 Simon Feltman 2014-02-06 04:02:50 UTC
We support calling functions which take strings/arrays and implicitly fill out the length argument given the annotations are correct.