Bug 623615 – It should convert gunichars to strings

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 623615 - It should convert gunichars to strings


Summary:	It should convert gunichars to strings


Status:	RESOLVED FIXED

Product:	pygobject
Classification:	Bindings
Component:	introspection
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody's working on this now (help wanted and appreciated)
QA Contact:	Python bindings maintainers

URL:
Whiteboard:

Depends on:
Blocks:	623618 623938

Reported:	2010-07-05 16:21 UTC by Ignacio Casal Quinteiro (nacho)
Modified:	2010-11-29 20:03 UTC

See Also:	621234
GNOME target:	---
GNOME version:	---

Attachments
patch (3.96 KB, patch) 2010-11-21 11:23 UTC, Paolo Borelli	none	Details \| Review
patch (5.11 KB, patch) 2010-11-21 15:28 UTC, Paolo Borelli	committed	Details \| Review
[gi] add check for UNICHAR (1.55 KB, patch) 2010-11-23 00:18 UTC, johnp	none	Details \| Review
Fixes check and tests (3.07 KB, patch) 2010-11-23 18:17 UTC, johnp	committed	Details \| Review

Description Ignacio Casal Quinteiro (nacho) 2010-07-05 16:21:46 UTC

I think gunichars should be converted to strings of length 1. This was also done by pygtk

Comment 1 johnp 2010-09-07 02:29:43 UTC

I would +1 the patch if it is submitted

Comment 2 Paolo Borelli 2010-11-21 11:23:58 UTC

Created attachment 174943 [details] [review]
patch

Support for unichar was added to g-i in commit 

http://git.gnome.org/browse/gobject-introspection/commit/?id=c8940a11562fd7b888595c6298e39836192fa3d7

This patch adds support for it in pygobject.


Some comments about the patch:

1 - I added the the a /*TODO*/ about checking the string len in _pygi_g_type_info_check_object... but to do it we would need tp decode to utf8 etc, so basically completely duplicating the logic is later done to perform the actual conversion. I still do not understand why pygi does this two-steps logic.

2 - I would have preferred to put the test in the marshalling tests along all the the other basic types, but walters put the g-i test function in regress.c

Comment 3 Paolo Borelli 2010-11-21 15:28:26 UTC

Created attachment 174964 [details] [review]
patch

I forgot to patch also pygi-info. The provious comments are still valid.

Comment 4 johnp 2010-11-23 00:18:53 UTC

Created attachment 175078 [details] [review]
[gi] add check for UNICHAR

Comment 5 johnp 2010-11-23 00:26:39 UTC

I added a patch for checking a GUNICHAR which may or may not be wrong depending on how we read treat GUNICHARs.  Right now you are treating it like a UTF-8 string.  Is this correct or should it be treated more like how I treat UINT8 where we expect interfaces to return a single UINT8 or an array of UINT8's?

If gunichar foo() is equivalent to gunichar *foo() then your patch looks correct.  However if gunichar *foo() follows the array code path your patch will fail.

Comment 6 Paolo Borelli 2010-11-23 12:17:16 UTC

for what is worth I started off from this gjs patch which treats unichars as utf8

http://git.gnome.org/browse/gjs/commit/?id=34fdc293117109f003967aec397b0e87a33c9f53

Comment 7 Paolo Borelli 2010-11-23 12:22:36 UTC

Review of attachment 175078 [details] [review]:

::: gi/pygi-argument.c
@@ +395,3 @@
+            }
+
+            if (size != 1) {

I think this is wrong.

Unless I misunderstood py api docs PyString_GET_SIZE will give you the size is bytes.

A unichar can be more than a byte and yet be a single utf8 character.
What we need to check is that we are getting a single valid utf8 character.

Comment 8 johnp 2010-11-23 18:17:31 UTC

Created attachment 175120 [details] [review]
Fixes check and tests

git-bz barfs on unicode characters because of the http python module so I posted this manually.

The check now transforms strings in to unicode before calling GET_SIZE in order to get the string length, not the byte array length.

I also fixed the tests to work in py3 and py2 as well as reflect the new checks which does not allow empty gunichars to be passed.  Conceptually this is correct but in reality do we want to allow "" to be a valid unichar?  What does that map to?

As far as the original patch is concerned, I believe it to be correct though we don't support arrays of UNICHAR.  I think that just maps to UTF-8 anyway so we should be good.

Comment 9 Paolo Borelli 2010-11-23 21:51:05 UTC

Review of attachment 175120 [details] [review]:

Thanks for updating the patch. It looks good to me now.