Bug 720977 - Incomplete UTF-8 preceding newline gets dropped
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
Version: 0.35.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: VTE Maintainers
QA Contact: VTE Maintainers
Whiteboard: [fixed-0-36][needed-next][commit:c674...
Depends on:
Blocks:
Reported: 2013-12-23 11:15 UTC by Egmont Koblinger
Modified: 2014-04-06 18:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
UTF-8 decoding cleanup (1.30 KB, patch)
2013-12-25 18:25 UTC, Egmont Koblinger
Status: none
UTF-8 decoding cleanup v2 (1.51 KB, patch)
2013-12-25 18:36 UTC, Egmont Koblinger
Status: committed
Fix (2.07 KB, patch)
2014-01-05 19:58 UTC, Egmont Koblinger
Status: committed

Description Egmont Koblinger 2013-12-23 11:15:37 UTC
Actual (incomplete UTF-8 immediately preceding a newline gets dropped):

$ echo -e '\0303foobar'
�foobar
$ echo -e '\0303'

$ 


Expected (incomplete UTF-8 immediately preceding a newline should be replaced by the replacement symbol):

$ echo -e '\0303foobar'
�foobar
$ echo -e '\0303'
�
$
Comment 1 Egmont Koblinger 2013-12-25 17:19:10 UTC
vteconv.c L91:
        /* Determine why the end of the string is not valid.
         * We are pur b@stards for running g_utf8_next_char() on an
         * invalid sequence. */
        skip = g_utf8_next_char(*inbuf) - *inbuf;

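Indeed you are b@stards :) The lead byte 0xC3 announces a 2-byte sequence, so skip becomes 2 instead of 1 even though the second byte is not a continuation byte.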
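To illustrate the off-by-one, here is a minimal standalone sketch in plain C. It is not the vte or GLib code and not the content of the attached patches; lead_byte_length() and safe_skip() are hypothetical helpers. The first one trusts the lead byte alone, roughly the way g_utf8_next_char() does, and the second one stops at the first byte that is not a continuation byte.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by the lead byte alone, without looking at
 * the bytes that follow. Hypothetical helper, not GLib code. */
static size_t lead_byte_length(unsigned char c)
{
    if (c < 0x80) return 1;             /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;                           /* stray continuation or invalid lead */
}

/* Number of bytes that belong to the (possibly broken) sequence at buf.
 * Counts the lead byte plus only those following bytes that really are
 * continuation bytes (10xxxxxx), so a newline is never swallowed. */
static size_t safe_skip(const unsigned char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    size_t want = lead_byte_length(buf[0]);
    size_t n = 1;
    while (n < want && n < len && (buf[n] & 0xC0) == 0x80)
        n++;
    return n;
}
```

For the two bytes produced by echo -e '\0303' (0xC3 followed by a newline), lead_byte_length() reports 2 while safe_skip() reports 1, which is the difference described above.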
Comment 2 Egmont Koblinger 2013-12-25 18:25:48 UTC
Created attachment 264880
UTF-8 decoding cleanup

This one cleans up the previously found issue.

This does not fix the actual bug, though. The bug resides in iso2022.c around "nextctl": vte splits the processing of data at control characters (\n, \r and a couple more) and forgets about any incomplete sequence left behind.
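As a rough sketch of the expected behavior (not the iso2022.c code, and not the fix that was later attached), the decoder below flushes a pending incomplete sequence as U+FFFD whenever a non-continuation byte such as \n interrupts it. The name decode_chunked() and its signature are assumptions made for this example, and it skips overlong and surrogate checks to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Decode len bytes of UTF-8 into out (capacity cap) and return the
 * number of code points written. A sequence cut short by any byte that
 * is not a continuation byte (for example a control character) is
 * emitted as U+FFFD instead of being dropped. An incomplete sequence at
 * the very end of the input is left unreported, since more bytes might
 * still arrive in a real stream. */
static size_t decode_chunked(const unsigned char *in, size_t len,
                             uint32_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    uint32_t cp = 0;   /* code point being assembled */
    int pending = 0;   /* continuation bytes still expected */

    for (size_t i = 0; i < len && n < cap; i++) {
        unsigned char c = in[i];
        if (pending > 0) {
            if ((c & 0xC0) == 0x80) {
                cp = (cp << 6) | (c & 0x3F);
                if (--pending == 0)
                    out[n++] = cp;
                continue;
            }
            /* Sequence interrupted: flush it as a replacement symbol,
             * then handle c as the start of something new. */
            out[n++] = REPLACEMENT;
            pending = 0;
            if (n == cap)
                break;
        }
        if (c < 0x80)                { out[n++] = c; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; pending = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; pending = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; pending = 3; }
        else                         { out[n++] = REPLACEMENT; }
    }
    return n;
}
```

Feeding it the bytes 0xC3 0x0A yields U+FFFD followed by U+000A, matching the "Expected" output in the description.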
Comment 3 Egmont Koblinger 2013-12-25 18:36:04 UTC
Created attachment 264881
UTF-8 decoding cleanup v2
Comment 4 Behdad Esfahbod 2013-12-30 07:19:32 UTC
Ah, nice!  ChPe, can you commit please?
Comment 5 Egmont Koblinger 2014-01-05 19:58:42 UTC
Created attachment 265388
Fix

I don't fully understand the code, but I hope this patch is a proper fix.
Comment 6 Egmont Koblinger 2014-01-07 18:40:16 UTC
Fixed in 0-36, keeping open for vte-next.