GNOME Bugzilla – Bug 679438
Python 3
Last modified: 2015-09-30 13:51:06 UTC
Hi all, I just (mostly) finished a gobject-introspection port to windows [1]. This includes: - Building for Python 3.2 (especially g-ir-scanner has been ported to Python 3) - Building with cmake (including a port of introspection.m4) I basically needed this to create the GObject-mainloop from python to write a dbus service (what a long tail in the end ...). Nevertheless, I did a certain amount of work and it would be a shame if it was all lost. I understand that most of you will probably not want switch from autotools to cmake, but just in case, feel free to do so. Anyway, I guess the Python patches should be usable in an autotools world, too ;). There are plenty of small fixes (and probably not all discovered yet). I also rewrote parts of the c-extensions to work with Python 3.2. In general I tried to stay backwards compatible, but might have failed. I know that those patches over there are not in the best pull-request shape, that is mostly because I needed to get the work done. I don't have that much time, but feel free to ask for certain changes to the repo if that helps upstreaming the work. best regards Christoph [1] https://github.com/choeger/gobject-introspection/tree/windows_python3_cmake
Yeah, it requires somebody to break this into smaller, consumable patches. I'm afraid that nobody will have time to find and extract the actual improvements.
(In reply to comment #1) > Yeah, it requires somebody to break this into smaller, consumable patches. I'm > afraid that nobody will have time to find and extract the actual improvements. Yeah, or even just split up python 3 from cmake.
not going to complete this for 3.8, unless somebody shows up and does the work.
*** Bug 691241 has been marked as a duplicate of this bug. ***
*** Bug 669866 has been marked as a duplicate of this bug. ***
dropping of the blocker list.
One thing that could help here is the usage of "six": http://pythonhosted.org/six/ This provides a compatibility shim that can be shipped with GI if we need to support multiple version of Python.
*** Bug 728079 has been marked as a duplicate of this bug. ***
Created attachment 275597 [details] [review] giscanner: Use Python 3 compatible octal literal syntax
Created attachment 275598 [details] [review] giscanner: Use modern exception handling for Python 3 compatibility
Created attachment 275599 [details] [review] giscanner: Use range instead of xrange for Python 3 compatibility
Created attachment 275600 [details] [review] giscanner: Use binary files for comparison utility Explicitly open files for comparison in utils.files_are_identical() in binary mode for reading (rb).
Created attachment 275601 [details] [review] giscanner: Convert map() results to list Convert the results map() calls to a list for Python 3 compatibility. In Python 3, map() returns an iterable "map object" which does not allow indexing or iteration more than once.
Created attachment 275602 [details] [review] giscanner: Use items() instead of iteritems() Replace usage of iteritems() and itervalues() with items() and values() respectively.
Created attachment 275603 [details] [review] giscanner: Use absolute_import for all Python files Use absolute_import to ensure Python 3 compatibility of the code base.
Created attachment 275604 [details] [review] giscanner: Enable "true division" for all Python files Import Python 3 compatible "true division" from the future (PEP 238). This changes the Python 2 classic division which uses floor division on integers to true division. Verfied we don't actually use the division operator anywhere in the code base so this a safety for supporting both Python 2 and 3.
Created attachment 275605 [details] [review] giscanner: Use print as a function for Python 3 compatibility Use future import "print_function" and update relevant uses of print as a function call. See: PEP 3105
Created attachment 275606 [details] [review] giscanner: Replace repr format usage with string formatter Replace occurances of "%r" (repr) in format strings where the intended behaviour is to output a quoted string "'foo'" with explicit usage of "'%s'". This is needed to move the codebase to unicode literals in order to upgrade to Python 3. Python 2 unicode strings are expanded with repr formatting prefixed with a "u" as in "u'foo'" which causes failures for various text formatting scenarios.
Created attachment 275607 [details] [review] giscanner: Use unicode literals in all Python files Add unicode_literals future import which turns any string literal into a unicode string. Return unicode strings from the Python C extension module. Force writing of annotations (g-ir-annotation-tool) to output utf8 encoded data to stdout. This is an initial pass at following the "unicode sandwich" model of programming (http://nedbatchelder.com/text/unipain.html) needed for supporting Python 3.
Created attachment 275608 [details] [review] giscanner: Port scanner extension module to work with Python 3 Define portable macros for use between Python 2 and 3. Replace usage of PyString related functions with PyBytes. Update pygi_source_scanner_parse_macros to support both PyBytes and PyUnicode.
Created attachment 275609 [details] [review] giscanner: Use pickle when cPickle is not available This adds compatibility with Python 3 which removed the cPickle module. Explicitly use binary files for reading and writing the cache.
Created attachment 275610 [details] [review] giscanner: Use builtins module in Python 3 Add conditional import for Python 3's renamed builtins module.
Created attachment 275611 [details] [review] giscanner: Use StringIO instead of cStringIO in Python 2 Replace usage of the Python 2 cStringIO module with StringIO and conditionally use io.StringIO for Python 3. This is needed to build up a unicode version of the XML since cStringIO does not support unicode. Add XMLWriter.get_encoded_xml() which returns a utf-8 encoded bytes object of the XML data. Open files for reading/writing in binary mode since we explicitly encode and decode as utf-8.
Created attachment 275612 [details] [review] giscanner: Decode command output for Python 3 compatibility Decode the output of various subprocess calls assuming ascii encoding.
Created attachment 275613 [details] [review] giscanner: Encode data passed to subprocess.stdin.write ASCII encode bytes sent to subprocess.stdin.write to ensure Python 2 and 3 compatibility.
Created attachment 275614 [details] [review] giscanner: Encode sha1 input for Python 3 compatibility
Created attachment 275615 [details] [review] giscanner: Update namespace sort for Python 3 compatibility Use key function instead of cmp for list.sort which is compatible with both Python 2 and 3. Make sure a list is returned from split function. Don't use identity comparison "is" on strings.
Created attachment 275616 [details] [review] docwriter: Update for Python 3 compatibility Convert the results of various filter() calls to lists. This is needed because filter() returns a generator in Python 3 and len() checks are used on the results (which doesn't work on a generator). Explicitly open resulting files for output in binary mode.
Created attachment 275617 [details] [review] giscanner: Use rich comparison methods for Python 3 compatibility Add lt, le, gt, ge, eq, ne, and hash dunder methods to all classes that implement custom comparisons with __cmp__. This is needed to support Python 3 compatible sorting of instances of these classes. Avoid using @functools.total_ordering which does not work for some of these classes and also is not available in Python 2.6.
Created attachment 275618 [details] [review] giscanner: Sort unknown parameters in error message Sort the parameters displayed for the "unknown parameters" error message. The parameter names are stored in a set which returns a different ordering between Python 2 and 3 (set/dict ordering should not be relied upon anyhow). This fixes test failures in warning tests.
Created attachment 275619 [details] [review] configure.ac: Add --with-python configure flag Add --with-python flag which overrides the $PYTHON environment variable when used.
Created attachment 275620 [details] [review] Change update-glib-annotations to use Python 3
Pushing trivial fixes. Attachment 275597 [details] pushed as 7af0443 - giscanner: Use Python 3 compatible octal literal syntax Attachment 275598 [details] pushed as c83efc8 - giscanner: Use modern exception handling for Python 3 compatibility Attachment 275599 [details] pushed as 1c1039b - giscanner: Use range instead of xrange for Python 3 compatibility
Review of attachment 275612 [details] [review]: I was uncertain about the codec to use when reading from the subprocesses stdin with this patch. We could use the systems default, but I think it should probably be based on what the given commands being run output which needs some research. ASCII has the benefit of giving us an early error during testing, as opposed to making a guess or using the system encoding which may fail later on... We interact with pkgconfig and gcc via. stdin and stdout which I think need to be double checked for what they can encode/decode. Or can we rely on some default?
Review of attachment 275613 [details] [review]: Similar to previous comment, I am unsure about the encoding to use. This is sending data to gcc via. stdin. gcc supposedly defaults to decoding utf8 for input files but I am not sure this is the case when using stdin. I just found [1] which states textual output from the preprocessor using -E does in fact result in utf-8 output, so that at least answers part of my question in the previous patch. [1] http://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]
What is holding up getting this patches into master? I'd really like to see Python 3 support and it looks like all the bits are there, no?
(In reply to Marvin Schmidt from comment #37) > What is holding up getting this patches into master?] A proper review.
Review of attachment 275600 [details] [review]: This is correct, if binary vs text mode matters, this will do the right thing, if it does not it is a no-op See https://docs.python.org/3/library/functions.html#open
Review of attachment 275601 [details] [review]: This is correct.
Review of attachment 275602 [details] [review]: This looks correct and should work on both 2 and 3, but it may be better to pick up a dependency on six (or just vendor the relevant bits) to use the `iteritems` function, ex `dd.iteritems()` -> `six.iteritems(dd)` which will give a generator is all cases.
Review of attachment 275603 [details] [review]: Looks fine. Only comment would be to merge all of the `__future__` imports into a single line. I assume there will be a `from __future__ import print_function` coming someplace else down the patch list.
Review of attachment 275604 [details] [review]: +1
Review of attachment 275605 [details] [review]: +1. Same comment as 2 patches above, might be worth merging all of the `__future__` imports into one line.
Review of attachment 275606 [details] [review]: L538 in annotationparser.py seems to have a bug, but it was pre-existing. This seems reasonable.
Review of attachment 275607 [details] [review]: in matplotlib we do use `from __future__ import unicode_literals` and it has caused a good deal of headache, but I don't know if not using it provides any more/less.
Review of attachment 275607 [details] [review]: Other wise, this looks good.
Review of attachment 275609 [details] [review]: +1, looks correct.
Review of attachment 275610 [details] [review]: +1, looks correct.
Review of attachment 275611 [details] [review]: Looks correct.
Review of attachment 275614 [details] [review]: +1 looks fine
Review of attachment 275615 [details] [review]: I don't see where "Make sure a list is returned from split function. " is done. Otherwise looks good.
Review of attachment 275615 [details] [review]: This also fixes an inconsistency in the existing implementation as `self._sort_order(a, b) == self._sort_order(b, a)` if they are both in the namespace. The current code should fall back to checking if both are in the namespace and sorting on `a[2]`. This patch fixes this behavior.
Review of attachment 275616 [details] [review]: L375 adds a sorted call which I think is a change in behavior. I assume this is needed for deterministic output (as in py3.4+ the iteration order of dictionaries is random process-to-process (as the hash seed is randomized per-process)). Other than a minor change in behavior, which is unavoidable due to upstream python changes, +1
Thomas, are you doing any testing of this against git master? If so, I would try to apply the easy ones early, rather than reviewing the whole batch at once then waiting for someone else to apply.
Review of attachment 275617 [details] [review]: A year and a half later, it is probably safe to drop 2.6 support, but that is above my pay-grade on this project. Bunch of duplicated code, but probably not worth making a mix-in or some such for this.
Review of attachment 275618 [details] [review]: in 3.3+ (by default, and user configurable earlier) the ordering depends on a hash seed. +1
Review of attachment 275616 [details] [review]: ^off by one version, 3.3 is when dict iteration order is randomized by default.
Review of attachment 275612 [details] [review]: Maybe consult `locale` here? Isn't this sort of thing very system dependent?
This stack of patches re-based onto current master is at https://github.com/GNOME/gobject-introspection/pull/4 Please advise on g-o's process for this sort of thing?
(In reply to Thomas Caswell from comment #60) > This stack of patches re-based onto current master is at > https://github.com/GNOME/gobject-introspection/pull/4 > > Please advise on g-o's process for this sort of thing? So...I *personally* think bugzilla/splinter is for single patches but it breaks down once you go past 3. If you reference github (or external git repositories) in general that's OK by me.
I played with this a bit and hit: ab9cb31951ea199b326eb543fac8516eb3ccc08e is the first bad commit GISCAN GLib-2.0.gir Caught exception: <type 'exceptions.TypeError'> TypeError("'output_dir' must be a string or None",) > /usr/lib64/python2.7/distutils/ccompiler.py(329)_setup_compile() -> raise TypeError, "'output_dir' must be a string or None" (Pdb) q Before I try to debug this I'm going to go ahead and merge the patches up before that commit, to make some progress here.
(In reply to Thomas Caswell from comment #41) > Review of attachment 275602 [details] [review] [review]: > > This looks correct and should work on both 2 and 3, but it may be better to > pick up a dependency on six (or just vendor the relevant bits) to use the > `iteritems` function, ex `dd.iteritems()` -> `six.iteritems(dd)` which will > give a generator is all cases. Yeah...it's annoying. But g-i isn't really in a performance fast path now, and I'm not sure I want to bring six into the picture yet...so we can just take the overhead of duplicating lists?
Fixed on Python 2 with: diff --git a/giscanner/ccompiler.py b/giscanner/ccompiler.py index a401cfd..481b1e4 100644 --- a/giscanner/ccompiler.py +++ b/giscanner/ccompiler.py @@ -236,7 +236,7 @@ class CCompiler(object): include_dirs=includes, extra_postargs=extra_postargs, output_dir=source_str[tmpdir_idx + 1: - source_str.rfind(os.sep)]) + source_str.rfind(os.sep)].encode('UTF-8')) def link(self, output, objects, lib_args): # Note: This is used for non-libtool builds only! Looking at more issues... SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 17-18: truncated \uXXXX
Okay, fixed that. Pushed all of this to git master. Thomas (and anyone else) - let's do followup bugs/PRs etc. for issues? OK to close this?
Also thanks Simon, you are awesome as usual! Sorry about the long delay...I'm going to try to spend a bit more time on g-i again FWIW.
Awesome! Thanks for the quick response to my trouble making :)