GNOME Bugzilla – Bug 791131
gtk-doc and python3: fixxref failures
Last modified: 2017-12-04 16:06:37 UTC
I'm using gtk-doc together with python3, and for this reason I have added a number of patches from git to it. In most cases this works fine, but I have observed a couple of build failures with errors like:

[ 380s] Traceback (most recent call last):
[ 380s]   File "/usr/bin/gtkdoc-fixxref", line 57, in <module>
[ 380s]     fixxref.Run(options)
[ 380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 114, in Run
[ 380s]     FixCrossReferences(options)
[ 380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 217, in FixCrossReferences
[ 380s]     FixHTMLFile(options, full_entry)
[ 380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 261, in FixHTMLFile
[ 380s]     lines[i] = re.sub(r'<GTKDOCLINK\s+HREF="([^"]*)"\s*>(.*?)</GTKDOCLINK\s*>', repl_func_with_ix(i), lines[i])
[ 380s]   File "/usr/lib64/python3.6/re.py", line 191, in sub
[ 380s]     return _compile(pattern, flags).sub(repl, string, count)
[ 380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 257, in repl_func
[ 380s]     return MakeXRef(options, file, i + 1, m.group(1), m.group(2))
[ 380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 321, in MakeXRef
[ 380s]     common.LogWarning(file, line, 'no link for: "%s" -> (%s).' % (id, text))
[ 380s]   File "/usr/share/gtk-doc/python/gtkdoc/common.py", line 144, in LogWarning
[ 380s]     print ("%s:%d: warning: %s" % (filename, line, message))
[ 380s] UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 126: ordinal not in range(128)

So it looks like there are still a couple of things we have missed on the way to full py3 compatibility.
Instead of the plain 'print' we could use logging.warning in common.LogWarning, with the side effect that the messages are prefixed with WARNING:root. The advantage is that the logging infrastructure is already set up to cope with the encoding requirements of stdout (which, in rpmbuild, has LANG=C and is not UTF-8 capable by default).
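A minimal Python 3 sketch of that idea, not the attached patch: `log_warning` is a hypothetical stand-in for common.LogWarning, and `demo` attaches a handler to an ASCII-only stream that escapes unencodable characters, which is assumed here to mimic how stderr behaves under LANG=C.

```python
import io
import logging


def log_warning(filename, line, message):
    # Hypothetical stand-in for common.LogWarning, routed through logging.
    logging.warning("%s:%d: warning: %s", filename, line, message)


def demo():
    # ASCII-only stream that escapes unencodable characters instead of
    # raising (assumption: this mimics stderr under LANG=C on Python 3).
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")
    handler = logging.StreamHandler(stream)
    # Same layout as the default root format, hence the WARNING:root prefix.
    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    root = logging.getLogger()
    root.addHandler(handler)
    try:
        log_warning("foo.html", 12, u"no link for: \u201cbar\u201d")
    finally:
        root.removeHandler(handler)
    stream.flush()
    return raw.getvalue().decode("ascii")
```

Calling `demo()` returns the logged line with the curly quotes escaped rather than raising UnicodeEncodeError.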
Created attachment 364816 [details] [review]
Use logging infrastructure for LogWarning

Using plain 'print' statements often fails if we have a message containing UTF-8 but the output goes to a terminal/pipe that does not support UTF-8. Instead of trying to en/decode (and likely failing all the time), use logging.warning.

This causes a slight change of the format, as the log is prefixed with WARNING:root, but that seems better than crashing.
Review of attachment 364816 [details] [review]:

Can't we set up a special log format for this so that the output won't change?

::: gtkdoc/common.py
@@ +150,2 @@
 # TODO: write to stderr
+ logging.warning ("%s:%d: warning: %s" % (filename, line, message))

The logger will write to stderr, right? Please remove the TODO then as well.
(In reply to Stefan Sauer (gstreamer, gtkdoc dev) from comment #3)
> Review of attachment 364816 [details] [review] [review]:
>
> Can't we setup a special logformat for this so that the output won't change?

Honestly, no idea, but I guess yes. It would probably need a second logger instance, though.

> ::: gtkdoc/common.py
> @@ +150,2 @@
> # TODO: write to stderr
> + logging.warning ("%s:%d: warning: %s" % (filename, line, message))
>
> The logger will write to stderr, right? Please remove the TODO then as well.

Good point.
Okay, let me take over the patch and try a separate logger with a custom logformat.
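A rough sketch of what a separate logger with a custom format could look like. The logger name `gtkdoc.warnings` and the helper `make_warning_logger` are made up for illustration and are not taken from gtk-doc.

```python
import io
import logging
import sys


def make_warning_logger(stream=None):
    # Hypothetical logger name; kept apart from the root logger.
    logger = logging.getLogger("gtkdoc.warnings")
    logger.propagate = False  # do not let the root logger print it again
    for old in list(logger.handlers):
        logger.removeHandler(old)
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    # Bare message only, so no "WARNING:root:" prefix is added.
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    return logger


# Usage: write into an in-memory stream to inspect the exact output.
buf = io.StringIO()
log = make_warning_logger(buf)
log.warning("%s:%d: warning: %s", "foo.html", 12, "no link for: bar")
```

With a format of just `%(message)s` the emitted line keeps the `file:line: warning: text` layout that the plain print produced.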
Created attachment 364831 [details]
test script

For python2 we do this:
https://git.gnome.org/browse/gtk-doc/tree/gtkdoc/common.py#n73

    # When redirecting the output on python2 we get UnicodeEncodeError:
    if not sys.stdout.encoding:
        import codecs
        sys.stdout = codecs.getwriter('utf8')(sys.stdout)

Maybe we just need to expand this for python3. Please see the attached script. I can run this as:

    python3 encoding.py | cat
    python2 encoding.py | cat

Before changing LogWarning() I'd like to understand why it fails.
Running encoding.py inside a limited build environment (where all package builds run):

> python3 encoding.py
ANSI_X3.4-1968
True
ANSI_X3.4-1968
ascii
None
Traceback (most recent call last):
print(u'\u263a \u263b')
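The failure can be reproduced without touching the locale. The sketch below is not the attached encoding.py; `can_print` is a hypothetical helper that prints the same two characters into an in-memory text stream with a chosen encoding and error handler, and it is Python 3 only.

```python
import io


def can_print(encoding, errors="strict"):
    # In-memory stand-in for stdout with an explicit encoding.
    out = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors=errors)
    try:
        print(u"\u263a \u263b", file=out)
        out.flush()
    except UnicodeEncodeError:
        # An ASCII stream with strict error handling cannot represent
        # the two smiley characters.
        return False
    return True
```

Under these assumptions `can_print("ascii")` is False while `can_print("utf-8")` and `can_print("ascii", "backslashreplace")` are True, which matches the pattern in the traceback: the stream encoding is ASCII and the error handler is strict.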
Created attachment 364834 [details]
test script for the logger

Could you try this one too?
Created attachment 364837 [details]
test script for the logger

This one seems to pass all scenarios.
ensonic@square:~/projects/test:> python3 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
ensonic@square:~/projects/test:> LC_ALL=C python3 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
ensonic@square:~/projects/test:> python2 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
ensonic@square:~/projects/test:> LC_ALL=C python2 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
Created attachment 364921 [details]
test script for print

And now a version that seems to fix print. Tested by running:

> python2 encoding.py
UTF-8
True
UTF-8
UTF-8
None
☺ ☻

> python2 encoding.py | cat
None
False
UTF-8
UTF-8
None
☺ ☻

> LC_ALL=C python2 encoding.py
ANSI_X3.4-1968
True
ANSI_X3.4-1968
ANSI_X3.4-1968
None
☺ ☻

and the same for python3.
commit 0cc67bd997d472b9f0a95763fef30aed11b8e6dd (HEAD -> master, origin/master, origin/HEAD)
Author: Stefan Sauer <ensonic@users.sf.net>
Date:   Mon Dec 4 17:04:18 2017 +0100

    common: more hacks to avoid UnicodeErrors in print()

    Handle cases similar to LC_ALL=C.

    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=791131
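For readers who want to see the general shape of such a print() workaround, here is a Python 3 only sketch. It is not the code from the commit; `ensure_utf8` is a hypothetical helper, and it assumes the stream exposes a binary `.buffer`, as sys.stdout normally does.

```python
import codecs
import io


def ensure_utf8(stream):
    """Return a text stream that writes UTF-8 to the same underlying buffer."""
    encoding = getattr(stream, "encoding", None)
    if encoding and codecs.lookup(encoding).name == "utf-8":
        return stream  # already UTF-8 capable, nothing to do
    stream.flush()
    # Assumption: `stream` has a binary `.buffer`. The old wrapper must stay
    # referenced, because closing it would close the shared buffer too.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8", line_buffering=True)


# Intended use (left as a comment so importing this sketch has no side effects):
#     sys.stdout = ensure_utf8(sys.stdout)
```

Whether forcing UTF-8 is the right choice for a terminal that really is ASCII-only is a separate question; setting PYTHONIOENCODING in the build environment would be an alternative that needs no code change.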