GNOME Bugzilla – Bug 787862
Always open files in text mode and always use utf-8
Last modified: 2017-11-01 20:00:50 UTC
Created attachment 360014: Always open files in text mode and always use utf-8

(I'm currently trying to get gtk-doc working on Windows: https://github.com/Alexpux/MINGW-packages/pull/2918)

----

Introduces a common.open_text() helper with saner defaults for opening text files across Python versions.

open() defaults to the locale encoding, which on a properly configured Unix system is utf-8 but on Windows is cp1252, which can't represent all of Unicode. Instead of relying on the default, always use utf-8 for text files.

To reduce the difference between the types processed by Python 2 and 3, use codecs.open() to open text files in text mode on Python 2. The resulting file object returns unicode, as on Python 3, but still accepts ASCII-only str.

Also fixes a few missing file.close() calls. These matter on Windows, because files that are still open can't be renamed or deleted there.
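A minimal sketch of what such a helper could look like. The signature open_text(filename, mode='r', encoding='utf-8') is an assumption; the actual common.open_text() in gtk-doc may differ. It uses codecs.open() on Python 2 and the built-in open() with a fixed encoding on Python 3. It is meant to be used in a with block, so the file is closed even when an error occurs, which avoids the Windows rename/delete problem described above.

```python
import codecs
import os
import sys
import tempfile


def open_text(filename, mode='r', encoding='utf-8'):
    """Open a text file with an explicit encoding (hypothetical sketch).

    Python 2: codecs.open() returns a stream that yields unicode and still
    accepts ASCII-only str when writing.
    Python 3: built-in open() in text mode with a fixed encoding, so the
    result no longer depends on the locale (e.g. cp1252 on Windows).
    """
    if sys.version_info[0] < 3:
        return codecs.open(filename, mode, encoding=encoding)
    return open(filename, mode, encoding=encoding)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'example.txt')
    # 'with' guarantees close(), so the file can be renamed or deleted
    # afterwards, which matters on Windows.
    with open_text(path, 'w') as f:
        f.write(u'caf\u00e9 \u2603')
    with open_text(path) as f:
        assert f.read() == u'caf\u00e9 \u2603'
    os.remove(path)
```

One caveat of this design: codecs.open() always opens the underlying file in binary mode, so on Python 2 there is no automatic '\n' translation, unlike the Python 3 branch.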
Do the tests work for you? When using python2:

make[4]: Entering directory `/home/ensonic/projects/gnome/gtk-doc/tests/gobject/docs'
DOC 00:00:00.004161985: Scanning header files
DOC 00:00:00.101422715: Introspecting gobjects
DOC 00:00:00.297362182: Building XML
Traceback (most recent call last):
mkdb.Run(options)
changed, book_top, book_bottom = OutputDB(os.path.join(ROOT_DIR, MODULE + "-sections.txt"), options)
sig_synop, sig_desc = GetSignals(symbol)
sid, name)
DOC 00:00:00.378855244: Building HTML
Oh, oops, I'll have a look
Created attachment 362289: mkdb: Mark multiple Unicode strings as such

These are utf-8 encoded byte strings under Python 2, and when they are concatenated with unicode objects they get auto-decoded using the default ascii encoding, which fails because they are not ascii.
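The failure can be reproduced on Python 3 with a small simulation. implicit_py2_concat() below is a hypothetical helper that mimics what Python 2 does implicitly when a str (byte string) meets a unicode object: it decodes the bytes with the ascii codec first. The ellipsis character and the symbol name are arbitrary examples, not the actual strings from mkdb.

```python
def implicit_py2_concat(left, right):
    """Mimic Python 2's str + unicode coercion (hypothetical helper).

    Python 2 silently decodes a byte string with the default 'ascii' codec
    before concatenating it with a unicode object.
    """
    if isinstance(left, bytes):
        left = left.decode('ascii')
    if isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right


# In a utf-8 source file, an unprefixed non-ASCII literal is a utf-8
# *byte string* under Python 2; model that explicitly as bytes here.
unmarked = u'\u2026'.encode('utf-8')  # b'\xe2\x80\xa6'
# The fix: mark the literal as unicode (u'...'), so no decoding happens.
marked = u'\u2026'
symbol = u'gtk_widget_show'

try:
    implicit_py2_concat(unmarked, symbol)  # raises UnicodeDecodeError
except UnicodeDecodeError as exc:
    print('unmarked literal fails:', exc.reason)

print('marked literal works:', ascii(implicit_py2_concat(marked, symbol)))
```

ASCII-only byte strings still pass through unchanged, which is why the problem only shows up once a string contains non-ASCII characters.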
The following fixes have been pushed:
2135887 mkdb: Mark multiple Unicode strings as such
1eeec38 Always open files in text mode and always use utf-8
Created attachment 362365: mkdb: Mark multiple Unicode strings as such

These are utf-8 encoded byte strings under Python 2, and when they are concatenated with unicode objects they get auto-decoded using the default ascii encoding, which fails because they are not ascii.
Created attachment 362366: Always open files in text mode and always use utf-8

Introduces a common.open_text() helper with saner defaults for opening text files across Python versions.

open() defaults to the locale encoding, which on a properly configured Unix system is utf-8 but on Windows is cp1252, which can't represent all of Unicode. Instead of relying on the default, always use utf-8 for text files.

To reduce the difference between the types processed by Python 2 and 3, use codecs.open() to open text files in text mode on Python 2. The resulting file object returns unicode, as on Python 3, but still accepts ASCII-only str.

Also fixes a few missing file.close() calls. These matter on Windows, because files that are still open can't be renamed or deleted there.
Thanks!
While building gmime-2.6, I still see a gtk-doc failure like this:

[ 99s] Traceback (most recent call last):
[ 99s]   File "/usr/bin/gtkdoc-mkdb", line 61, in <module>
[ 99s]     mkdb.Run(options)
[ 99s]   File "/usr/share/gtk-doc/python/gtkdoc/mkdb.py", line 281, in Run
[ 99s]     ReadSourceDocumentation(sdir, suffix_list, source_dirs, ignore_files)
[ 99s]   File "/usr/share/gtk-doc/python/gtkdoc/mkdb.py", line 3638, in ReadSourceDocumentation
[ 99s]     ScanSourceFile(fname, ignore_files)
[ 99s]   File "/usr/share/gtk-doc/python/gtkdoc/mkdb.py", line 3679, in ScanSourceFile
[ 99s]     for line in SRCFILE:
[ 99s]   File "/usr/lib/python3.6/codecs.py", line 321, in decode
[ 99s]     (result, consumed) = self._buffer_decode(data, self.errors, final)
[ 99s] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 6694: invalid start byte

Related, or a different/new bug?
Different issue. The files "gmime-iconv-utils.c" and "gmime-filter-charset.c" are latin1 encoded instead of utf-8.
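Why byte 0xad in particular trips the utf-8 decoder: its bit pattern 10101101 marks a UTF-8 continuation byte, which is only valid after a lead byte, so on its own it is an "invalid start byte". In Latin-1 every byte maps to a character, and 0xad is U+00AD SOFT HYPHEN. A short check, using a hypothetical helper try_decode():

```python
def try_decode(data, encoding):
    """Return (text, None) on success or (None, reason) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc.reason


raw = b'\xad'  # 0b10101101

# UTF-8 continuation bytes always have the form 10xxxxxx.
assert raw[0] & 0b11000000 == 0b10000000

print(try_decode(raw, 'utf-8'))           # (None, 'invalid start byte')
print(ascii(try_decode(raw, 'latin-1')))  # decodes to U+00AD SOFT HYPHEN
```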
Still sucks though. I wonder if we could peek at mode-lines in the file and reopen in the right encoding if it is specified. The files in gmime use mode-lines, but don't specify the encoding :/
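One way the mode-line idea could look, as a sketch only. encoding_from_mode_line(), detect_declared_encoding() and open_source() are hypothetical names, and the regular expression is an assumption modelled on the "coding[:=]" pattern Python uses for PEP 263 source declarations. It reads the first two lines as bytes, looks for an Emacs-style "coding: latin-1" or Vim-style "fileencoding=latin1" declaration, and falls back to utf-8 when none is found. For the gmime files mentioned above, which have mode-lines without an encoding, it would therefore still pick utf-8.

```python
import codecs
import re

# Matches "-*- Mode: C; coding: latin-1 -*-" (Emacs) and
# "vim: set fileencoding=latin1 :" (Vim). Assumed pattern, see lead-in.
CODING_RE = re.compile(rb'coding[:=]\s*([-\w.]+)')
# Emacs may append an end-of-line suffix that Python's codecs don't know.
EOL_SUFFIX_RE = re.compile(r'-(unix|dos|mac)$')


def encoding_from_mode_line(lines, default='utf-8'):
    """Return the encoding declared in the given byte lines, else default."""
    for line in lines:
        match = CODING_RE.search(line)
        if match:
            name = EOL_SUFFIX_RE.sub('', match.group(1).decode('ascii'))
            try:
                # Normalize the name, e.g. 'latin-1' -> 'iso8859-1'.
                return codecs.lookup(name).name
            except LookupError:
                # Unknown encoding name: ignore the mode-line.
                return default
    return default


def detect_declared_encoding(path, max_lines=2, default='utf-8'):
    """Peek at the first lines of a file for a declared encoding."""
    with open(path, 'rb') as f:
        head = [f.readline() for _ in range(max_lines)]
    return encoding_from_mode_line(head, default)


def open_source(path):
    """Open a source file in its declared encoding (utf-8 by default)."""
    return open(path, encoding=detect_declared_encoding(path))
```

Keeping the parsing in a function that takes byte lines makes it easy to test without touching the file system; the file-based wrappers only add the I/O.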