Bug 791131 - gtk-doc and python3: fixxref failures
Status: RESOLVED FIXED
Product: gtk-doc
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 1.27
Assigned To: gtk-doc maintainers
QA Contact: gtk-doc maintainers
Depends on:
Blocks:
 
 
Reported: 2017-12-02 13:34 UTC by Dominique Leuenberger
Modified: 2017-12-04 16:06 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Use logging infrastructure for LogWarning (1.09 KB, patch)
2017-12-02 15:16 UTC, Dominique Leuenberger
needs-work
test script (371 bytes, text/x-python)
2017-12-02 20:19 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
test script for the logger (437 bytes, text/x-python)
2017-12-02 21:33 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
test script for the logger (647 bytes, text/x-python)
2017-12-02 22:17 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
test script for print (593 bytes, text/x-python)
2017-12-04 15:18 UTC, Stefan Sauer (gstreamer, gtkdoc dev)

Description Dominique Leuenberger 2017-12-02 13:34:13 UTC
I'm using gtk-doc together with python3; for this reason I have a number of patches from git added to it.

In most cases this works fine, but I have observed a couple of build failures with errors like:

[  380s] Traceback (most recent call last):
[  380s]   File "/usr/bin/gtkdoc-fixxref", line 57, in <module>
[  380s]     fixxref.Run(options)
[  380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 114, in Run
[  380s]     FixCrossReferences(options)
[  380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 217, in FixCrossReferences
[  380s]     FixHTMLFile(options, full_entry)
[  380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 261, in FixHTMLFile
[  380s]     lines[i] = re.sub(r'<GTKDOCLINK\s+HREF="([^"]*)"\s*>(.*?)</GTKDOCLINK\s*>', repl_func_with_ix(i), lines[i])
[  380s]   File "/usr/lib64/python3.6/re.py", line 191, in sub
[  380s]     return _compile(pattern, flags).sub(repl, string, count)
[  380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 257, in repl_func
[  380s]     return MakeXRef(options, file, i + 1, m.group(1), m.group(2))
[  380s]   File "/usr/share/gtk-doc/python/gtkdoc/fixxref.py", line 321, in MakeXRef
[  380s]     common.LogWarning(file, line, 'no link for: "%s" -> (%s).' % (id, text))
[  380s]   File "/usr/share/gtk-doc/python/gtkdoc/common.py", line 144, in LogWarning
[  380s]     print ("%s:%d: warning: %s" % (filename, line, message))
[  380s] UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 126: ordinal not in range(128)

So it looks like there are still a couple of things we have missed on the way to full py3 compatibility.
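
The failure mode in the traceback can be reproduced without a build environment. The following is a minimal sketch for Python 3, not code from gtk-doc: it prints a message containing U+201C to an in-memory text stream, once encoded as UTF-8 and once as ASCII. The helper name `print_to_stream` and the sample message are made up for illustration.

```python
import io


def print_to_stream(message, encoding):
    """Print message to an in-memory text stream that uses the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    print(message, file=stream)
    stream.flush()
    return raw.getvalue()


# Hypothetical warning text containing U+201C, the character named in the traceback.
message = 'foo.html:12: warning: no link for: "\u201cbar\u201d"'

# A UTF-8 stream accepts the message.
utf8_bytes = print_to_stream(message, 'utf-8')

# An ASCII stream, which is what stdout becomes under LANG=C, rejects it.
try:
    print_to_stream(message, 'ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Running a script under `LC_ALL=C` should put the real `sys.stdout` in the same state as the ASCII stream here.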
Comment 1 Dominique Leuenberger 2017-12-02 15:15:54 UTC
Instead of the plain 'print' we could use logging.warning in common.LogWarning, with the side effect that the messages are prefixed with WARNING:root.

But the logging infrastructure is already set up to cope with the encoding requirements of stdout (which, in rpmbuild, has LANG=C and is not UTF-8 capable by default).
Comment 2 Dominique Leuenberger 2017-12-02 15:16:04 UTC
Created attachment 364816 [details] [review]
Use logging infrastructure for LogWarning

Pure 'print' statements often fail if a message contains UTF-8 but the
output goes to a terminal/pipe that does not support UTF-8.

Instead of trying to en/decode (and likely failing all the time), use logging.warning.

This causes a slight change of the format, as the log is prefixed with WARNING:root,
but that seems better than crashing.
Comment 3 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-02 17:44:19 UTC
Review of attachment 364816 [details] [review]:

Can't we set up a special log format for this so that the output won't change?

::: gtkdoc/common.py
@@ +150,2 @@
     # TODO: write to stderr
+    logging.warning ("%s:%d: warning: %s" % (filename, line, message))

The logger will write to stderr, right? Please remove the TODO then as well.
Comment 4 Dominique Leuenberger 2017-12-02 18:11:51 UTC
(In reply to Stefan Sauer (gstreamer, gtkdoc dev) from comment #3)
> Review of attachment 364816 [details] [review]:
> 
> Can't we setup a special logformat for this so that the output won't change?

Honestly, no idea, but I guess yes. Would probably need a 2nd instance of logger though.

> ::: gtkdoc/common.py
> @@ +150,2 @@
>      # TODO: write to stderr
> +    logging.warning ("%s:%d: warning: %s" % (filename, line, message))
> 
> The logger will write to stderr, right? Please remove the TODO then as well.

Good point
Comment 5 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-02 18:21:50 UTC
Okay, let me take over the patch and try a separate logger with a custom logformat.
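
A separate logger with a custom format could look roughly like the sketch below. This is an illustration only, not the patch that was attached or committed: the logger name `gtkdoc.warnings` and the helpers `make_warning_logger` and `log_warning` are hypothetical. It relies on two standard `logging` behaviours: `logging.StreamHandler()` writes to `sys.stderr` when no stream is given, and a formatter of `'%(message)s'` drops the `WARNING:root:` prefix.

```python
import io
import logging


def make_warning_logger(stream=None):
    """Return a logger that emits bare messages without a 'WARNING:root:' prefix."""
    logger = logging.getLogger('gtkdoc.warnings')  # hypothetical logger name
    # Drop handlers left over from an earlier call so messages are not duplicated.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
    logger.propagate = False  # keep the root logger from printing the message again
    logger.setLevel(logging.WARNING)
    return logger


def log_warning(logger, filename, line, message):
    """Emit a warning in the same 'file:line: warning: text' layout as the print()."""
    logger.warning('%s:%d: warning: %s', filename, line, message)


# Capture the output in memory to show the resulting format.
buffer = io.StringIO()
log_warning(make_warning_logger(buffer), 'foo.html', 12,
            'no link for: "\u201cbar\u201d"')
output = buffer.getvalue()
```

With the default stream, the message goes to `sys.stderr`, which on Python 3 uses the `backslashreplace` error handler, so non-ASCII characters are escaped rather than raising `UnicodeEncodeError`.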
Comment 6 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-02 20:19:09 UTC
Created attachment 364831 [details]
test script

For python2 we do this:
https://git.gnome.org/browse/gtk-doc/tree/gtkdoc/common.py#n73
    # When redirecting the output on python2 we get UnicodeEncodeError:
    if not sys.stdout.encoding:
        import codecs
        sys.stdout = codecs.getwriter('utf8')(sys.stdout)


Maybe we just need to expand this for python3. Please see the attached script. I can run this as:
python3 encoding.py | cat
python2 encoding.py | cat

Before changing the LogWarning() I'd like to understand why it fails.
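
For reference, one way the python2 snippet above might be extended to python3 is sketched here. It is an assumption about the general approach, not the attached script or the eventual commit: on Python 3 the writer has to wrap the underlying byte stream `sys.stdout.buffer` instead of `sys.stdout` itself. The helper names `utf8_writer` and `force_utf8_stdout` are hypothetical.

```python
import codecs
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf8')(binary_stream)


def force_utf8_stdout():
    """Replace sys.stdout with a UTF-8 writer unless it is already UTF-8 (Python 3)."""
    encoding = sys.stdout.encoding
    if not encoding or encoding.lower().replace('-', '') != 'utf8':
        sys.stdout.flush()
        sys.stdout = utf8_writer(sys.stdout.buffer)


# Demonstrate the writer on an in-memory byte stream instead of the real stdout.
raw = io.BytesIO()
print(u'\u263a \u263b', file=utf8_writer(raw))
encoded = raw.getvalue()
```

A script would call `force_utf8_stdout()` once at startup, before the first `print()`. The trade-off is that UTF-8 bytes are written even when the locale claims ASCII, which is what the LC_ALL=C cases in this report need.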
Comment 7 Dominique Leuenberger 2017-12-02 20:26:11 UTC
Running the encoding.py inside a limited build environment (where all package builds run):

> python3 encoding.py 
ANSI_X3.4-1968
True
ANSI_X3.4-1968
ascii
None
Traceback (most recent call last):
  File "encoding.py", line 11, in <module>
    print(u'\u263a \u263b')
UnicodeEncodeError: 'ascii' codec can't encode character '\u263a' in position 0: ordinal not in range(128)

Comment 8 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-02 21:33:27 UTC
Created attachment 364834 [details]
test script for the logger

Could you try this one too?
Comment 9 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-02 22:17:56 UTC
Created attachment 364837 [details]
test script for the logger

This one seems to pass all scenarios.
Comment 10 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-02 22:18:25 UTC
ensonic@square:~/projects/test:> python3 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
ensonic@square:~/projects/test:> LC_ALL=C python3 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
ensonic@square:~/projects/test:> python2 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
ensonic@square:~/projects/test:> LC_ALL=C python2 logtest.py
foo.py:32:hello
foo.py:50:☺ ☻
Comment 11 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-04 15:18:19 UTC
Created attachment 364921 [details]
test script for print

And now a version that seems to fix print. Tested by running:
> python2 encoding.py
UTF-8
True
UTF-8
UTF-8
None
☺ ☻
> python2 encoding.py | cat
None
False
UTF-8
UTF-8
None
☺ ☻
> LC_ALL=C python2 encoding.py
ANSI_X3.4-1968
True
ANSI_X3.4-1968
ANSI_X3.4-1968
None
☺ ☻

and the same for python3.
Comment 12 Stefan Sauer (gstreamer, gtkdoc dev) 2017-12-04 16:06:37 UTC
commit 0cc67bd997d472b9f0a95763fef30aed11b8e6dd (HEAD -> master, origin/master, origin/HEAD)
Author: Stefan Sauer <ensonic@users.sf.net>
Date:   Mon Dec 4 17:04:18 2017 +0100

    common: more hacks to avoid UnicodeErrors in print()
    
    Handle cases similar to LC_ALL=C.
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=791131