Bug 785113 – glib-mkenums Python port fixes

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 785113 - glib-mkenums Python port fixes


Summary:	glib-mkenums Python port fixes


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	build
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	2.56
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Duplicates:	789230 (view as bug list)
Depends on:
Blocks:

Reported:	2017-07-19 09:52 UTC by Christoph Reiter (lazka)
Modified:	2017-12-13 13:45 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
glib-mkenums: Use utf-8 for reading files (1.72 KB, patch) 2017-07-19 09:52 UTC, Christoph Reiter (lazka)	committed	Details \| Review
glib-mkenums: Don't use FileNotFoundError, it's Python 3 only. (1.12 KB, patch) 2017-07-19 09:53 UTC, Christoph Reiter (lazka)	committed	Details \| Review
glib-mkenums: fix encoding error when writing files (1.07 KB, patch) 2017-07-22 18:50 UTC, Christoph Reiter (lazka)	committed	Details \| Review
glib-mkenums: Python2: use locale encoding when redirecting stdout (1.58 KB, patch) 2017-07-26 09:44 UTC, Christoph Reiter (lazka)	committed	Details \| Review
glib-mkenums: Handle source files which are not utf-8, somewhat (2.20 KB, patch) 2017-10-20 15:18 UTC, Christoph Reiter (lazka)	reviewed	Details \| Review
replace and warn (2.58 KB, patch) 2017-10-23 12:22 UTC, Patrick Welche	reviewed	Details \| Review
replace and warn v2 (3.55 KB, patch) 2017-10-23 13:03 UTC, Patrick Welche	needs-work	Details \| Review
v3 (3.52 KB, patch) 2017-12-06 11:35 UTC, Ross Burton	committed	Details \| Review

Description Christoph Reiter (lazka) 2017-07-19 09:52:24 UTC

Building gtk with glib master fails on Windows. Here are some fixes

Comment 1 Christoph Reiter (lazka) 2017-07-19 09:52:46 UTC

Created attachment 355924 [details] [review]
glib-mkenums: Use utf-8 for reading files

On Windows open() defaults to ANSI and on Python 2 it doesn't take
an encoding. Use io.open() instead which provides the same interface
on both Python versions.

Comment 2 Christoph Reiter (lazka) 2017-07-19 09:53:07 UTC

Created attachment 355925 [details] [review]
glib-mkenums: Don't use FileNotFoundError, it's Python 3 only.

Comment 3 Emmanuele Bassi (:ebassi) 2017-07-19 10:46:04 UTC

Review of attachment 355924 [details] [review]:

ACK

Comment 4 Emmanuele Bassi (:ebassi) 2017-07-19 10:46:27 UTC

Review of attachment 355925 [details] [review]:

ACK

Comment 5 Marc-Andre Lureau 2017-07-22 18:13:17 UTC

another utf8 bug?:

gcab (master %)$ glib-mkenums --identifier-prefix GCab --symbol-prefix gcab --template libgcab/gcab-enums.h.etemplate libgcab/gcab-folder.h libgcab/gcab-file.h > libgcab/gcab-enums.h
Traceback (most recent call last):

+ Trace 237688

File "/usr/local/bin/glib-mkenums", line 395 in <module>
```
write_output(prod)
```
File "/usr/local/bin/glib-mkenums", line 74 in write_output
```
print(output, file=output_stream)
```

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 34: ordinal not in range(128)

Comment 6 Emmanuele Bassi (:ebassi) 2017-07-22 18:21:01 UTC

Possibly, but all files are marked as UTF-8.

Comment 7 Emmanuele Bassi (:ebassi) 2017-07-22 18:22:39 UTC

This is also Python 2.x only; Python 3.x parses the files successfully.

Comment 8 Christoph Reiter (lazka) 2017-07-22 18:50:36 UTC

Created attachment 356187 [details] [review]
glib-mkenums: fix encoding error when writing files

Instead of using NamedTemporaryFile, which doesn't take an encoding in Python 2
use mkstemp() to create a file and open it with io.open(), with a proper
encoding set.

Comment 9 Christoph Reiter (lazka) 2017-07-22 18:55:03 UTC

This is just a guess.. Python 3 should have the same problem with LANG=C

Comment 10 Marc-Andre Lureau 2017-07-24 09:23:40 UTC

Review of attachment 356187 [details] [review]:

Thanks, that fixes the issue.

Comment 11 Christoph Reiter (lazka) 2017-07-24 09:27:48 UTC

Thanks for testing

Comment 12 Michael Catanzaro 2017-07-26 01:15:31 UTC

Still happening with glib master, same error as in comment #5, 100% reproducible for me today.

Comment 13 Christoph Reiter (lazka) 2017-07-26 08:46:23 UTC

Thanks. Python 2 gives up when redirecting stdout and sets stdout.encoding to None which makes it default to ascii. Python 3 sets it to the locale encoding.

Comment 14 Christoph Reiter (lazka) 2017-07-26 09:44:09 UTC

Created attachment 356410 [details] [review]
glib-mkenums: Python2: use locale encoding when redirecting stdout

In case of Python 2 and stdout being redirected to a file, sys.stdout.encoding
is None and it defaults ASCII for encoding text.

To match the behaviour of Python 3, which uses the locale encoding also when
redirecting to a file, wrap sys.stdout with a StreamWriter using the
locale encoding.

Comment 15 Christoph Reiter (lazka) 2017-07-26 10:05:03 UTC

Ideally gcab should use "--output" to not depend on the locale encoding.

Comment 16 Michael Catanzaro 2017-07-26 15:56:23 UTC

Why are you attempting to support Python 2?

Comment 17 Emmanuele Bassi (:ebassi) 2017-07-26 15:57:58 UTC

(In reply to Michael Catanzaro from comment #16)
> Why are you attempting to support Python 2?

Because glib-mkenums is used to build projects on systems without Python 3.

Comment 18 Christoph Reiter (lazka) 2017-07-26 15:58:50 UTC

It also defaults to Python 2 here.

Comment 19 Emmanuele Bassi (:ebassi) 2017-07-26 15:59:16 UTC

Review of attachment 356410 [details] [review]:

This looks okay to me.

Comment 20 Patrick Welche 2017-10-20 11:06:38 UTC

2.54.1 contains the above patches, yet I hit something similar when trying
to compile xfce4-gtk2-engine 3.2.0:

Traceback (most recent call last):

+ Trace 238083

File "/usr/pkg/bin/glib-mkenums", line 687 in <module>
```
process_file(fname)
```
File "/usr/pkg/bin/glib-mkenums", line 419 in process_file
```
line = curfile.readline()
```

File "/usr/pkg/lib/python3.6/codecs.py", line 321 in decode

(result, consumed) = self._buffer_decode(data, self.errors, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 980: invalid continuation byte

Traceback (most recent call last):

+ Trace 238083

File "/usr/pkg/bin/glib-mkenums", line 687 in <module>
```
process_file(fname)
```
File "/usr/pkg/bin/glib-mkenums", line 419 in process_file
```
line = curfile.readline()
```

File "/usr/pkg/lib/python3.6/codecs.py", line 321 in decode

(result, consumed) = self._buffer_decode(data, self.errors, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 980: invalid continuation byte

Makefile:737: recipe for target 'xfce_typebuiltin.h' failed

0xd6 in iso-8859-1 is &Oumlaut;, and the author's name in the commented out
copyright notice is Tomas &Oumlaut;gren.


(glib 2.54.1, python 3.6)

Comment 21 Emmanuele Bassi (:ebassi) 2017-10-20 11:24:00 UTC

(In reply to Patrick Welche from comment #20)
> 2.54.1 contains the above patches, yet I hit something similar when trying
> to compile xfce4-gtk2-engine 3.2.0:

This issue was not for files with mixed encoding: it was for the output of files in Unicode when the local output encoding was not UTF-8.

The parsed files must be UTF-8 because C sources have *no* notion of encoding.

Comment 22 Patrick Welche 2017-10-20 11:36:39 UTC

... and from the C point of view, you don't care about what is in the comments.

Suddenly the content of every comment in every old bit of source becomes vitally important to the success of a build.

Comment 23 Emmanuele Bassi (:ebassi) 2017-10-20 11:58:04 UTC

(In reply to Patrick Welche from comment #22)
> ... and from the C point of view, you don't care about what is in the
> comments.

Which is true for a C compiler, *but* glib-mkenums parses comments for directives, and needs to be able to run regular expressions on various bits of C, including comments.

> Suddenly the content of every comment in every old bit of source becomes
> vitally important to the success of a build.

Yes, stuff ends up breaking when behaviour is undefined.

The old, Perl-based script was happy to run regular expressions on random byte sequences, which is a recipe for other, exciting undefined behaviour.

You have two options: either use ASCII, or use UTF-8. Using other encodings in file formats that have no possibility to define the encoding they use is a recipe for disaster.

Comment 24 Patrick Welche 2017-10-20 12:01:22 UTC

OK, you convinced me - in some sense, it isn't C code, but mkenum code.

Comment 25 Christoph Reiter (lazka) 2017-10-20 12:06:56 UTC

A possible workaround would be to use the "replace" error handler when decoding, which would give us "?" for every byte failing to decode.

Ideally xfce4-gtk2-engine would get converted to utf-8, but I can see how that seems inconvenient, given its more or less abandoned/"finished" state.

Comment 26 Patrick Welche 2017-10-20 12:16:09 UTC

Sounds like a good idea, or we'll keep finding strange new build failures in abandonware.

(I just spotted the equivalent debian bug with the magic iso-8559-1 -> utf-8
O umlaut patch:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870738
)

Comment 27 Christoph Reiter (lazka) 2017-10-20 15:18:11 UTC

Created attachment 361959 [details] [review]
glib-mkenums: Handle source files which are not utf-8, somewhat

Hm, so that only affects one case in one file which needs to be patched. If this remains the only case where mkenums breaks I'm not sure working that around in glib is worth the trouble. Old unmaintained software is bound to break over time one way or another when used with newer tools/deps.

Anyway, potential patch attached, I'll let others decide if if should be applied.

Comment 28 Michael Catanzaro 2017-10-20 15:51:14 UTC

Probably reporting an error and expecting it to be fixed is the better thing to do. Compilers get more strict about input all the time, and this is comparatively easy to fix.

Comment 29 Patrick Welche 2017-10-21 08:17:52 UTC

(In reply to Christoph Reiter (lazka) from comment #27)
> Created attachment 361959 [details] [review] [review]
> glib-mkenums: Handle source files which are not utf-8, somewhat
> 
> Hm, so that only affects one case in one file which needs to be patched. If
> this remains the only case where mkenums breaks I'm not sure working that
> around in glib is worth the trouble. Old unmaintained software is bound to
> break over time one way or another when used with newer tools/deps.
> 
> Anyway, potential patch attached, I'll let others decide if if should be
> applied.

Thank you! I will certainly be applying it to pkgsrc.org.

Macports discovered the same issue building libgnome
debian discovered the same issue building libgnomeui
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870729 

Given that gnome 2 is far more non-linux friendly than gnome 3, users of it will persist, but would you want to patch ancient libgnome? Presumably not, so comment 28 position doesn't really hold.

Comment 30 Michael Catanzaro 2017-10-21 14:25:20 UTC

(In reply to Patrick Welche from comment #29)
> Given that gnome 2 is far more non-linux friendly than gnome 3, users of it
> will persist, but would you want to patch ancient libgnome? Presumably not,
> so comment 28 position doesn't really hold.

See:

https://git.gnome.org/browse/libgnome/commit/?id=33313713c4f5c1de500859ff128d6fd7e3af5722

https://git.gnome.org/browse/libgnomeui/commit/?id=30334c28794ef85d8973f4ed0779b5ceed6594f2

Distros that are still shipping these libraries should apply the patches.

Comment 31 Patrick Welche 2017-10-21 16:00:59 UTC

Even better: distros simply apply the patch in comment 27.

Comment 32 Emmanuele Bassi (:ebassi) 2017-10-21 16:11:22 UTC

Let's reopen this, instead of constantly commenting on a closed bug.

Comment 33 Emmanuele Bassi (:ebassi) 2017-10-21 16:14:01 UTC

(In reply to Patrick Welche from comment #31)
> Even better: distros simply apply the patch in comment 27.

Personally, I'd rather people ported their code to 2017.

This should be a (non-fatal) warning, but it should still cause people to port their code to UTF-8. Legacy modules that have long since deprecated notwithstanding, modern code should not be mixing encodings.

Comment 34 Patrick Welche 2017-10-21 16:20:43 UTC

I am certainly not advocating that shiny new Gnome 3 code should mix encodings, but it is a shame to force distros to maintain N patches for something as trivial as an author's name containing an iso8859-1 accented character.

Comment 35 Emmanuele Bassi (:ebassi) 2017-10-21 16:22:33 UTC

Review of attachment 361959 [details] [review]:

::: gobject/glib-mkenums.in
@@ +38,3 @@
+# Replace invalid data with a replacement character to keep things working.
+# https://bugzilla.gnome.org/show_bug.cgi?id=785113#c20
+decoding_errors = "replace"

I'd like to get a warning for this, for people porting code.

AFAICS, you can use something like:

```
import codecs

def replace_and_warn(err):
    print_warning('Invalid Unicode at {}'.format(err.start))
    return ('?', err.end)

codecs.register_error('replace_and_warn', replace_and_warn)
```

and then use `replace_and_warn` as the errors policy.

Comment 36 Patrick Welche 2017-10-22 07:09:34 UTC

Sounds good. Better being the enemy of good, I tried to do better with:

def replace_and_warn(err):
    warnings.warn('{} at {}'.format(err.reason, err.start), UnicodeWarning)
    return ('?', err.end)

but see

/usr/pkg/bin/glib-mkenums:44: UnicodeWarning: invalid continuation byte at 980
  warnings.warn('{} at {}'.format(err.reason, err.start), UnicodeWarning)
/usr/pkg/bin/glib-mkenums:44: UnicodeWarning: invalid continuation byte at 980
  warnings.warn('{} at {}'.format(err.reason, err.start), UnicodeWarning)
Creating xfce_typebuiltin.c...

(Why print the warnings.warn source code line?)

The principle works though! (Successful build + warning)

Comment 37 Philip Withnall 2017-10-23 11:26:15 UTC

*** Bug 789230 has been marked as a duplicate of this bug. ***

Comment 38 Patrick Welche 2017-10-23 12:22:16 UTC

Created attachment 362097 [details] [review]
replace and warn

Given the legwork required to remove the "warnings.warn" line from the build log, I bailed and went for the trivial patch attached which just prints to stderr. I checked +/-7000 for the range for err.object[], and didn't have out of bounds problems, so +/-7 should be safe to get a clue where the problematic character is.

Example output:

Making all in gtk-3.0
gmake[2]: Entering directory '/tmp/pkgsrc/x11/xfce4-gtk2-engine/work.x86_64/gtk-
xfce-engine-3.2.0/gtk-3.0'
UnicodeWarning: invalid continuation byte at 980 (b' Tomas \xd6gren <s')
UnicodeWarning: invalid continuation byte at 980 (b' Tomas \xd6gren <s')
Creating xfce_typebuiltin.c...

Comments?

Comment 39 Emmanuele Bassi (:ebassi) 2017-10-23 12:40:25 UTC

Review of attachment 362097 [details] [review]:

::: gobject/glib-mkenums.in
@@ +40,3 @@
+decoding_errors = "replace_and_warn"
+
+def replace_and_warn(err):

Can you move this down, and use `print_warning()` instead?

We get the colorised output, and we get the automatic `--g-fatal-warnings` handling for stopping the processing (and the build).

@@ +43,3 @@
+    sys.stderr.write('UnicodeWarning: {} at {} ({})\n'.format(
+        err.reason, err.start,
+decoding_errors = "replace_and_warn"

Hardcoded numbers are a bit meh, especially without a comment on what they mean — or, better yet, a symbolic constant.

Additionally, you probably want to add spaces around the `-`, for readability.

Comment 40 Patrick Welche 2017-10-23 13:03:04 UTC

Created attachment 362098 [details] [review]
replace and warn v2

Output now looks like:

gmake[2]: Entering directory '/tmp/pkgsrc/x11/xfce4-gtk2-engine/work.x86_64/gtk-xfce-engine-3.2.
0/gtk-3.0'
WARNING: UnicodeWarning: invalid continuation byte at 980 (b' Tomas \xd6gren <s')
WARNING: UnicodeWarning: invalid continuation byte at 980 (b' Tomas \xd6gren <s')
Creating xfce_typebuiltin.c...

Comment 41 Patrick Welche 2017-11-08 13:57:17 UTC

OK to commit?

Comment 42 Emmanuele Bassi (:ebassi) 2017-11-08 14:08:21 UTC

Review of attachment 362098 [details] [review]:

Looks good to me, but somebody with more Python-fu should probably comment on the style.

Comment 43 Philip Withnall 2017-11-08 14:47:58 UTC

Review of attachment 362098 [details] [review]:

Looks alright to me. If a Pythonista comes along later with suggested improvements, they can always provide a patch. I’d like this one fix in first though, since the decoding_errors variable seems unnecessary. (If you change decoding_errors to another name, everything will break, since it’s not passed as the first parameter of register_error(). I’d just drop the variable though.)

::: gobject/glib-mkenums.in
@@ +87,3 @@
+# Replace invalid data with a replacement character to keep things working.
+# https://bugzilla.gnome.org/show_bug.cgi?id=785113#c20
+decoding_errors = "replace_and_warn"

Is there any reason for this to be defined as a separate variable? I’d be inclined to just pass the string constant 'replace_and_warn' to the errors= arguments.

Comment 44 Philip Withnall 2017-12-04 11:48:21 UTC

(In reply to Philip Withnall from comment #43)
> Is there any reason for this to be defined as a separate variable? I’d be
> inclined to just pass the string constant 'replace_and_warn' to the errors=
> arguments.

Patrick?

Comment 45 Ross Burton 2017-12-06 11:35:12 UTC

Created attachment 365102 [details] [review]
v3

I'll admit I've not actually tested this...

Removed the variable and replaced it with string literals.

Comment 46 Emmanuele Bassi (:ebassi) 2017-12-06 12:22:37 UTC

Review of attachment 365102 [details] [review]:

ACK-by: me

Comment 47 Philip Withnall 2017-12-13 13:45:19 UTC

I don’t know why git bz didn’t leave a comment here, but I’ve pushed this to master:

b6b74402d (HEAD -> master, origin/master, origin/HEAD) glib-mkenums: best effort attempt on non-utf8 encoded files.

and (trivially) backported it and some other outstanding glib-mkenums patches to glib-2-54:

2fc3948ba (HEAD -> glib-2-54) glib-mkenums: best effort attempt on non-utf8 encoded files.
b6d4edbae mkenums: Don't raise when unlinking a file that does not exist
f5d8956ec glib-genmarshal/glib-mkenums: Add comment clarifying licensing
5fe01f182 Sort the list of files being processed by glib-mkenums
0c7d52ba2 glib-mkenums: Add default comment template if none is provided
1d4afde2f glib-mkenums: Fix typo in version string