Bug 789714 - crash: xmlParserPrintFileContextInternal mangles utf8
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Duplicates: 791691
Depends on:
Reported: 2017-10-31 15:49 UTC by Shaun McCance
Modified: 2021-07-05 13:22 UTC
See Also:
GNOME target: 3.28
GNOME version: ---

Program that reliably crashes under py3 (548 bytes, text/x-python)
2017-10-31 15:51 UTC, Shaun McCance

Description Shaun McCance 2017-10-31 15:49:46 UTC
I've hit a crasher in itstool that I believe is libxml2's fault. The crash happens during error reporting with a custom error handler. I'm attaching a Python program that reliably crashes. For some reason, it only crashes under Python 3, not Python 2.

I've tracked this down to xmlParserPrintFileContextInternal, specifically these lines:

    while ((n++ < (sizeof(content)-1)) && (cur > base) &&
           (*(cur) != '\n') && (*(cur) != '\r'))

If the size of content is reached while in the middle of a multi-byte UTF-8 character, this will result in broken UTF-8 being passed around, and I think somewhere in Python's callback mechanisms that broken UTF-8 causes a segfault.

The solution, I think, is to add something like this:

    while (!is_a_valid_first_byte_for_a_utf8_character(*cur))

I've made up that function name. I'm hopeful such a function exists already in libxml2 or its deps. Maybe iconv?
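
As a rough illustration of the check proposed above, here is a minimal sketch in plain C. It is not libxml2 code: both function names are made up for this example, and it only relies on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so any other byte is a valid first byte of a character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a libxml2 function: UTF-8 continuation bytes
 * have the bit pattern 10xxxxxx; every other byte starts a character. */
static int is_utf8_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Hypothetical helper: if `cur` points into the middle of a multi-byte
 * character, advance it to the first byte of the next character (or to
 * `end`). A pointer that already sits on a character start is returned
 * unchanged. */
static const unsigned char *
utf8_skip_to_char_start(const unsigned char *cur, const unsigned char *end)
{
    assert(cur <= end);
    while (cur < end && is_utf8_continuation(*cur))
        cur++;
    return cur;
}
```

Under these assumptions, the adjustment would run once the backward scan has stopped at the size limit, so that the copied context begins on a whole character rather than on a stray continuation byte.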
Comment 1 Shaun McCance 2017-10-31 15:51:11 UTC
Created attachment 362639 [details]
Program that reliably crashes under py3
Comment 2 Dominique Leuenberger 2017-10-31 16:03:17 UTC
I think this could be related to this downstream bug:

It carries a patch, which I have applied to my python3/libxml2 integration. With it, the attachment from comment 1 produces this:

> python3 
Entity: line 1: 
error : 
Opening and ending tag mismatch: p line 1 and key

вая клавишу key href="help:gnome-help/keyboard-key-super">Super</key>


Entity: line 1: 
error : 
Extra content at the end of the document

вая клавишу key href="help:gnome-help/keyboard-key-super">Super</key>


=> probably not perfect, but no crash
Comment 3 Shaun McCance 2017-10-31 21:06:08 UTC
That's definitely the same bug. The output you pasted is what I would expect. Maybe not the best error output, but certainly the expected error output.

My proposal was to fix the UTF-8 mangling where it happens. That patch adjusts for malformed UTF-8 in another place. It has the advantage of catching other garbage input before it is handed off to crash inside Python. But it does seem cleaner to me to just never create broken UTF-8.

Any chance we could get a patch landed and a release?
Comment 4 Nick Wellnhofer 2017-11-08 11:41:53 UTC
Yes, this should be fixed in xmlParserPrintFileContextInternal. This function has a couple of other issues regarding UTF-8:

- The end of the error message could be a truncated UTF-8 sequence as well.
- The contents beyond the current position in the stream could contain invalid UTF-8.
- The function should return up to 80 Unicode characters instead of bytes.
- The position of the caret indicator should be based on Unicode characters, not bytes.
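
To make the character-versus-byte distinction in the list above concrete, here is a small sketch in plain C. It is not taken from libxml2, and all three function names are hypothetical. It assumes the buffer is otherwise well-formed UTF-8; it stops at an invalid lead byte or a truncated trailing sequence but does not validate continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence introduced
 * by `lead`, or 0 if `lead` cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0; /* continuation byte or invalid lead byte */
}

/* Hypothetical helper: number of bytes in the longest prefix of
 * buf[0..len) that holds at most `max_chars` complete characters.
 * A sequence cut off by `len` is excluded from the prefix. */
static size_t utf8_prefix_bytes(const unsigned char *buf, size_t len,
                                size_t max_chars)
{
    size_t i = 0, chars = 0;
    while (i < len && chars < max_chars) {
        size_t n = utf8_seq_len(buf[i]);
        if (n == 0 || n > len - i)
            break; /* invalid lead byte or truncated sequence */
        i += n;
        chars++;
    }
    return i;
}

/* Hypothetical helper: number of characters in buf[0..len), counted as
 * the bytes that are not continuation bytes. A caret column could be
 * derived from this instead of from the byte offset. */
static size_t utf8_count_chars(const unsigned char *buf, size_t len)
{
    size_t i, chars = 0;
    for (i = 0; i < len; i++)
        if ((buf[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```

With helpers along these lines, the 80-character limit and the caret offset would both be expressed in characters, and a sequence cut off at the end of the buffer would be left out of the message instead of being copied half-way.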
Comment 5 Nick Wellnhofer 2019-11-05 21:02:36 UTC
*** Bug 791691 has been marked as a duplicate of this bug. ***
Comment 6 Nick Wellnhofer 2019-11-07 12:17:51 UTC
This is also tracked as Gitlab issue #64:
Comment 7 GNOME Infrastructure Team 2021-07-05 13:22:59 UTC
GNOME is going to shut down Bugzilla in favor of GitLab.
As part of that, we are mass-closing older open tickets in Bugzilla
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow the bug reporting guidelines
and create a new ticket in GNOME GitLab.

Thank you for your understanding and your help.