Bug 769532 - Python3, ListStores and 64bit integers
Status: RESOLVED FIXED
Product: pygobject
Classification: Bindings
Component: introspection
Version: 3.20.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
QA Contact: Python bindings maintainers
Depends on:
Blocks:
Reported: 2016-08-04 22:36 UTC by andreastacchiotti
Modified: 2016-08-14 15:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Shows the difference between append() and insert_with_valuesv() (2.30 KB, text/plain)
2016-08-04 22:36 UTC, andreastacchiotti

Description andreastacchiotti 2016-08-04 22:36:54 UTC
Created attachment 332753
Shows the difference between append() and insert_with_valuesv()

I've found several strange issues with Python 3 and ListStores holding 64-bit integers; see the attached example.

Some outstanding problems:

* Gtk.ListStore(int).append([10**10]) fails because 10**10 does not fit in a signed 32-bit integer (it exceeds 2^31 - 1), even though `int` is the only integer type in Python 3

* Gtk.ListStore(GObject.TYPE_UINT64).append([10**10]) works correctly

* Gtk.ListStore(GObject.TYPE_UINT64).insert_with_valuesv(-1, [0], [10**10]) succeeds, but truncates the inserted value to the int32 range

The last example is not an academic test: on Python 2 it works perfectly (with longs) and it's 8x faster than append().

To get the correct value in Python 3 I could wrap it in an explicit GObject.Value:
addr = GObject.Value(GObject.TYPE_UINT64, addr)
and insert that instead, but this conversion loses all the speed that insert_with_valuesv() gains.

Some armchair debugging: python int is mapped to C int32 regardless of py2/3.
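
For illustration, here is a minimal pure-Python sketch of what squeezing 10**10 into a C int32 would yield. It assumes the truncation is the usual two's-complement wrap of a C cast; the exact value PyGObject stores is not stated in this report, and `wrap_to_int32` is a hypothetical helper, not part of PyGObject.

```python
def wrap_to_int32(value):
    """Return the value a signed 32-bit C integer would hold after a wrapping cast.

    Hypothetical helper for illustration only; not a PyGObject function.
    """
    low = value & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the top bit as the sign bit.
    return low - 0x100000000 if low >= 0x80000000 else low


big = 10**10
print(big > 2**31 - 1)     # True: too large for a signed 32-bit integer
print(wrap_to_int32(big))  # 1410065408
```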

I'd be very happy even to have a workaround, if a fix isn't feasible.

Thanks.

OS: Debian sid
Version: python-gi 3.20.1-1
Comment 1 Christoph Reiter (lazka) 2016-08-10 15:39:52 UTC
Some details on what's going on here:

> Gtk.ListStore(int).append([10**10])

PyGObject has some default mappings of Python types to GTypes. As you
suspected Python int gets mapped to GObject.TYPE_INT (long to TYPE_LONG). The
append() function in this case is a Python override which takes the column
type and creates the right GValue and then passes it along.

> Gtk.ListStore(GObject.TYPE_UINT64).append([10**10])

Same as above, but the column type is TYPE_UINT64 instead of TYPE_INT.

> Gtk.ListStore(GObject.TYPE_UINT64).insert_with_valuesv(-1, [0], [10**10])

If you call insert_with_valuesv() directly, 10**10 is first converted to a
GValue and, because it is a Python int, that GValue defaults to TYPE_INT. Only
after that is it passed to insert_with_valuesv(). Arguably this should raise
rather than silently truncate.
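
Until the bindings raise on their own, a caller can guard against out-of-range values before handing them to a GValue. The following is only a sketch in plain Python; `checked_uint64` and `UINT64_MAX` are hypothetical names, not PyGObject API.

```python
UINT64_MAX = 2**64 - 1  # largest value a guint64 column can hold


def checked_uint64(value):
    """Return value unchanged if it fits in an unsigned 64-bit integer, else raise.

    Hypothetical guard for illustration only; not a PyGObject function.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an int, got %s" % type(value).__name__)
    if not 0 <= value <= UINT64_MAX:
        raise OverflowError("%d does not fit in a guint64" % value)
    return value


print(checked_uint64(10**10))  # passes through unchanged
```

The checked value can then be stored with the GObject.Value method for that type, for example value.set_uint64(checked_uint64(n)).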

If you want to use insert_with_valuesv() here you have to set up the GValue
manually. This will still be faster than append() as you remove some branches,
function calls and can use one GObject.Value for inserting multiple rows:


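    from gi.repository import Gtk, GObject

    l = Gtk.ListStore(GObject.TYPE_UINT64)
    # Initialise one GValue with the column type and reuse it for every row.
    value = GObject.Value()
    value.init(GObject.TYPE_UINT64)

    for i in range(100):
        value.set_uint64(i)
        l.insert_with_valuesv(-1, [0], [value])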

----

If you want to work with arbitrary Python ints you can also use TYPE_PYOBJECT
which just stores the Python object without converting it to a C integer.
When using a TreeView you'd then have to use a custom cell renderer func which
converts the Python object to text and applies it to the cell renderer.

    l = Gtk.ListStore(GObject.TYPE_PYOBJECT)
    value = GObject.Value()
    value.init(GObject.TYPE_PYOBJECT)

    for i in range(100):
        value.set_boxed(i)
        l.insert_with_valuesv(-1, [0], [value])
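
The custom cell renderer func mentioned above could look roughly like the sketch below. In GTK this is a cell data function attached with Gtk.TreeViewColumn.set_cell_data_func(); the function name `render_python_int` and the choice of column 0 are assumptions made for illustration.

```python
def render_python_int(column, cell, model, tree_iter, data=None):
    """Cell data function: show the Python object stored in column 0 as text.

    Hypothetical example; the name and the column index are assumptions.
    """
    obj = model.get_value(tree_iter, 0)  # the stored Python object
    cell.set_property("text", str(obj))  # render it through a text cell


# Wiring it up in a GTK program (not executed here):
#     renderer = Gtk.CellRendererText()
#     column = Gtk.TreeViewColumn("Address", renderer)
#     column.set_cell_data_func(renderer, render_python_int)
#     treeview.append_column(column)
```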
Comment 2 andreastacchiotti 2016-08-10 16:06:54 UTC
A huge thank you to Christoph Reiter for explaining to me what was going on.

Honestly, the fact that the native py2 long behaviour (and its speed) is unobtainable in py3 felt like a huge letdown.

Have you thought about providing a dummy `fakelong` for py3 which is an `int` on the python side (subclassed from int) but an `int64` on the C side?
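
To make the idea concrete, the Python side of such a type might look like the sketch below. This is purely hypothetical: PyGObject has no such class, and the mapping to a 64-bit GType would have to be implemented inside the bindings. The name `FakeLong` and the range check are assumptions.

```python
class FakeLong(int):
    """An int subclass restricted to the signed 64-bit range.

    Hypothetical marker type; PyGObject does not recognise it.
    """

    MIN = -2**63
    MAX = 2**63 - 1

    def __new__(cls, value=0, *args):
        # Extra args allow the int("ff", 16) style of construction.
        self = super().__new__(cls, value, *args)
        if not cls.MIN <= self <= cls.MAX:
            raise OverflowError("%d does not fit in an int64" % self)
        return self


addr = FakeLong("2540be400", 16)
print(isinstance(addr, int), addr == 10**10)  # True True
```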

I used a workaround similar to the one CR proposed, but I re-created the GValue every time, with:

addr = GObject.Value(GObject.TYPE_UINT64, int(addr, 16))

If I create it only once, I get a large speed boost (thanks a lot!), but it's still slightly slower than py2.

append2: 370 ms
append3: 330 ms

valuesv2: 55 ms
valuesv3: 70 ms
old valuesv3: 150 ms

Any idea what overhead is still left in py3 (apart from the extra GValue.set_uint64() call)?


Relevant code portion (test is done with ~4k lines, string manipulation is irrelevant to timing):


            start_time = timeit.default_timer()
            if misc.PY3K:
                addr = GObject.Value(GObject.TYPE_UINT64)
                off = GObject.Value(GObject.TYPE_UINT64)
            for line in lines:
                line = str(u(line))
                (mid, line) = line.split(']', 1)
                mid = int(mid.strip(' []'))
                (addr_str, off_str, rt, val, t) = list(map(str.strip, line.split(',')[:5]))
                t = t.strip(' []')
                if t == 'unknown':
                    continue
                # `insert_with_valuesv` does the same job as `append`, but it's 7x faster
                # PY3 maps plain ints to 32-bit GValues, so we need an explicit guint64 conversion
                # Still 3x faster even with the extra baggage
                if misc.PY3K:
                    addr.set_uint64(int(addr_str, 16))
                    off.set_uint64(int(off_str.split('+')[1], 16))
                else:
                    addr = long(addr_str, 16)
                    off = long(off_str.split('+')[1], 16)
                self.scanresult_liststore.insert_with_valuesv(-1, [0, 1, 2, 3, 4, 5, 6], [addr, val, t, True, off, rt, mid])
                #~ self.scanresult_liststore.append([addr, val, t, True, off, rt, mid])
            print((timeit.default_timer() - start_time)*1000, 'ms')
Comment 3 Christoph Reiter (lazka) 2016-08-10 17:32:27 UTC
I guess the difference is due to Python code vs C code.

I use the following pattern everywhere (classes derived from object are automatically converted to TYPE_PYOBJECT):

    from gi.repository import Gtk, GObject

    class MyEntry(object):
        def __init__(self, addr, off):
            self.addr = addr
            self.off = off

    l = Gtk.ListStore(GObject.TYPE_PYOBJECT)

    for i in range(100):
        l.insert_with_valuesv(-1, [0], [MyEntry(i, -i)])

    for row in l:
        print(row[0].addr, row[0].off)
Comment 4 andreastacchiotti 2016-08-10 18:52:01 UTC
I may try this last suggestion, but for the sake of the poor guy who'll inherit this code I'll likely stick with my latest approach.

Thank you, C.R.

Is there somewhere I should report the silent truncation in
Gtk.ListStore(GObject.TYPE_UINT64).insert_with_valuesv(-1, [0], [10**10])?
Comment 5 Christoph Reiter (lazka) 2016-08-12 10:10:59 UTC
(In reply to andreastacchiotti from comment #4)
> Is there somewhere I should report the silent truncation in
> Gtk.ListStore(GObject.TYPE_UINT64).insert_with_valuesv(-1, [0], [10**10])?

I've opened bug 769789 for this.

Is there anything else you think should be addressed or can we close this issue?
Comment 6 andreastacchiotti 2016-08-14 14:52:40 UTC
I wonder if this suggestion:

> Have you thought about providing a dummy `fakelong` for py3 which is an `int` on the python side (subclassed from int) but an `int64` on the C side?

is feasible, and where I should post it in that case.

Other than that, this can be closed, thanks for your help.
Comment 7 Christoph Reiter (lazka) 2016-08-14 15:26:06 UTC
(In reply to andreastacchiotti from comment #6)
> I wonder if this suggestion:
> 
> > Have you thought about providing a dummy `fakelong` for py3 which is an `int` on the python side (subclassed from int) but an `int64` on the C side?
> 
> is feasible, and where I should post it in that case.

I remember some discussion on this, but can't seem to find it right now :/ Feel free to open a new bug for this.

Imo we should prioritize documenting the current behavior first before trying to make it easier. I've started something here: https://pygobject.readthedocs.io/en/latest/gobject.html but my motivation is currently lacking :)

> Other than that, this can be closed, thanks for your help.

OK.