Bug 524740 – Strings should be demarshalled to unicode




Summary:	Strings should be demarshalled to unicode


Status:	RESOLVED WONTFIX

Product:	pyorbit
Classification:	Deprecated
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Eitan Isaacson
QA Contact:	Python bindings maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-03-28 03:11 UTC by Eitan Isaacson
Modified:	2008-05-20 14:57 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Proposed patch (449 bytes, patch) 2008-03-28 03:16 UTC, Eitan Isaacson	rejected

Description Eitan Isaacson 2008-03-28 03:11:54 UTC

If I read this correctly:
http://mail.gnome.org/archives/orbit-list/2002-December/msg00008.html

It is safe to assume that CORBA strings will always be UTF-8.

If we could demarshal CORBA strings to Python unicode objects, we would avoid a lot of confusion. I am encountering this a lot in Orca.

Comment 1 Eitan Isaacson 2008-03-28 03:16:22 UTC

Created attachment 108156
Proposed patch

This one-liner creates a Python unicode object instead of a string object.
I think the main concern is that this will affect applications that use this library, and it could break them.
For example, I need to alter pyatspi a bit to accommodate this.
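The API change the one-liner implies can be illustrated in plain Python. This is a hypothetical sketch, not PyORBit's C code: `demarshal_string` and `legacy_caller` are illustrative names, and Python 3's `bytes`/`str` stand in for the Python 2 `str`/`unicode` types PyORBit dealt with at the time. It shows why a caller written against the old return type breaks once decoded text comes back.

```python
# Hypothetical illustration only; not PyORBit's actual demarshalling code.
# Python 3 bytes/str stand in for the Python 2 str/unicode types of the era.

def demarshal_string(raw: bytes, as_unicode: bool = False):
    """Turn the octets of a CORBA string (a C char*) into a Python object.

    as_unicode=False returns the octets unchanged (existing behaviour);
    as_unicode=True decodes them as UTF-8 (the proposed behaviour).
    """
    if as_unicode:
        return raw.decode("utf-8")
    return raw


def legacy_caller(value) -> int:
    """An application written against the old API, expecting raw octets."""
    if not isinstance(value, bytes):
        raise TypeError("expected a byte string, got %s" % type(value).__name__)
    return len(value)  # counts octets, not characters


raw = "Orca caf\u00e9".encode("utf-8")  # the string as it arrives on the wire
print(legacy_caller(demarshal_string(raw)))         # 10 octets
print(len(demarshal_string(raw, as_unicode=True)))  # 9 characters
```

Calling `legacy_caller(demarshal_string(raw, as_unicode=True))` raises `TypeError`, which is the kind of breakage that forced the pyatspi changes mentioned above. Note also that length and indexing semantics change, not just the type.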

Comment 2 Eitan Isaacson 2008-03-28 03:23:40 UTC

I think we basically need to make sure to call encode('utf-8') wherever code still expects a byte string (one byte per element) rather than a unicode object.
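A minimal sketch of that adaptation, assuming callers only need UTF-8 octets back. The helper name `to_octets` is hypothetical and not taken from pyatspi; Python 3 `str`/`bytes` again stand in for Python 2 `unicode`/`str`.

```python
# Hypothetical adapter; the name to_octets is illustrative, not from pyatspi.

def to_octets(value) -> bytes:
    """Return UTF-8 octets whether the binding handed back text or bytes."""
    if isinstance(value, str):
        return value.encode("utf-8")  # decoded text: re-encode for byte consumers
    return value  # already raw octets: pass through unchanged


print(to_octets("caf\u00e9"))    # b'caf\xc3\xa9'
print(to_octets(b"caf\xc3\xa9")) # b'caf\xc3\xa9'
```

Accepting both types lets call sites work with either the old or the patched binding, which matters when a library such as pyatspi must run against more than one PyORBit version.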

I think this patch is also safe for pre-unicode pyorbit, so that should be a relief.

Comment 3 Gustavo Carneiro 2008-04-12 14:22:20 UTC

(In reply to comment #0)
> If I read this correctly:
> http://mail.gnome.org/archives/orbit-list/2002-December/msg00008.html
> 
> It is safe to assume that CORBA strings will always be UTF-8.

You are not reading it correctly IMHO.  Michael Meeks is only saying we should be using UTF-8 encoded strings everywhere.  The CORBA standard only says CORBA_string is mapped to a C char* string, with no assumption made about the encoding.  You can't decode a string assuming its encoding is UTF-8; what about non-GNOME apps?
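The objection can be made concrete. Because CORBA attaches no encoding to a `char*`, a peer could send Latin-1 octets, and an unconditional UTF-8 decode would fail on them. The Latin-1 peer below is a hypothetical scenario, sketched in Python 3.

```python
# Hypothetical non-GNOME peer: CORBA attaches no encoding to a string, so
# nothing stops it from sending Latin-1 octets instead of UTF-8.
latin1_octets = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

try:
    # What a binding that always decodes as UTF-8 would do:
    latin1_octets.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decoding failed:", exc.reason)

# Only the application knows which encoding its peer actually used.
print(latin1_octets.decode("latin-1"))
```

A binding that decodes unconditionally would either raise here or, with a lenient error handler, silently corrupt the text. That is why the choice of encoding is pushed to a layer that knows its peers.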

> If we could demarshal CORBA strings to python unicode objects we would avoid a
> lot of confusion. I am encountering this a lot in Orca.
 
Besides not being standards-compliant [1], this patch introduces an API change.  Incidentally, PyGtk, which could more easily switch to unicode strings by default because no standard forbids it, also has this problem and still uses Python non-unicode strings for API compatibility.  The same change in PyORBit would both violate the CORBA standard and break backward API compatibility, and since PyORBit is part of the GNOME Language Bindings platform, it cannot change its API without creating a new, parallel-installable version of itself.

So, sorry, thanks for the patch, but no thanks.

[1] "Both the bounded and the unbounded string type of IDL are mapped to the Python string type.", in "Python Language Mapping, v1.2 November 2002".

Comment 4 Tomas Cerha 2008-05-20 06:19:06 UTC

(In reply to comment #3)
> You are not reading it correctly IMHO.  Michael Meeks is only saying we should
> be using utf-8 encoded strings everywhere.  The CORBA standard only says
> CORBA_string is mapped to a C char* string, with no assumption being made on
> the encoding.

Ok, so at which level can this assumption be made?  If I understand correctly (I'm not a GNOME hacker), CORBA doesn't specify it, but in GNOME it is always used with UTF-8.  So is there a common point where decoding could be done safely?  Leaving it up to the application seems quite unfortunate to me.  And even if we need to do it at the application level, is it safe to always assume UTF-8?

Comment 5 Gustavo Carneiro 2008-05-20 09:46:39 UTC

Sure, GNOME uses UTF-8 everywhere, but PyORBit and ORBit provide a generic CORBA implementation and are not in any way GNOME specific except for the use of GLib as runtime.

Unicode handling has to be left to whatever sits on top of CORBA, application or whatever.
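One way an application could provide the "common point" asked about in Comment 4 is to decode once at its own boundary with CORBA. This is a hypothetical sketch: `decode_corba_string` is not part of ORBit or PyORBit, and it assumes the application knows its peers send UTF-8, which holds within GNOME but is not guaranteed by CORBA.

```python
# Hypothetical application-side boundary; not part of ORBit or PyORBit.
# Assumption: this application's peers send UTF-8 (true within GNOME).

def decode_corba_string(octets: bytes, encoding: str = "utf-8",
                        errors: str = "replace") -> str:
    """Decode a raw CORBA string once, where it enters the application.

    errors="replace" substitutes U+FFFD for undecodable octets instead of
    raising, so one misbehaving peer cannot crash the caller.
    """
    return octets.decode(encoding, errors)


print(decode_corba_string(b"caf\xc3\xa9"))    # well-formed UTF-8
print(repr(decode_corba_string(b"caf\xe9")))  # Latin-1 octets from a peer
```

Everything past this boundary deals only with text, and the encoding policy lives in one place. Passing `errors="strict"` instead would surface non-UTF-8 peers as exceptions rather than replacement characters.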

Comment 6 Johan (not receiving bugmail) Dahlin 2008-05-20 13:03:22 UTC

Gustavo: Is anyone actually using ORBit/PyORBit outside of GNOME-related applications in practice?

This is similar to setting the default python encoding in pango/gtk. It's theoretically far from correct, but in practice it'll make it easier to get the common (99%) use cases right.

Or am I missing something?

Comment 7 Gustavo Carneiro 2008-05-20 14:57:46 UTC

I have no idea who's using it.

In any case, returning unicode instead of str objects would be an incompatible API change.  That alone would block this change, so...