Bug 720977 – Incomplete UTF-8 preceding newline gets dropped




Summary:	Incomplete UTF-8 preceding newline gets dropped


Status:	RESOLVED FIXED

Product:	vte
Classification:	Core
Component:	general
Version:	0.35.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:	[fixed-0-36][needed-next][commit:c674...

Depends on:
Blocks:

Reported:	2013-12-23 11:15 UTC by Egmont Koblinger
Modified:	2014-04-06 18:26 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
UTF-8 decoding cleanup (1.30 KB, patch) 2013-12-25 18:25 UTC, Egmont Koblinger	none
UTF-8 decoding cleanup v2 (1.51 KB, patch) 2013-12-25 18:36 UTC, Egmont Koblinger	committed
Fix (2.07 KB, patch) 2014-01-05 19:58 UTC, Egmont Koblinger	committed

Description Egmont Koblinger 2013-12-23 11:15:37 UTC

Actual (incomplete UTF-8 immediately preceding a newline gets dropped):

$ echo -e '\0303foobar'
�foobar
$ echo -e '\0303'

$ 


Expected (incomplete UTF-8 immediately preceding a newline should be replaced by the replacement symbol):

$ echo -e '\0303foobar'
�foobar
$ echo -e '\0303'
�
$

Comment 1 Egmont Koblinger 2013-12-25 17:19:10 UTC

vteconv.c L91:
        /* Determine why the end of the string is not valid.
         * We are pur b@stards for running g_utf8_next_char() on an
         * invalid sequence. */
        skip = g_utf8_next_char(*inbuf) - *inbuf;

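Indeed you are b@stards :) skip becomes 2 instead of 1: the lead byte \0303 (0xC3) announces a two-byte sequence, so g_utf8_next_char() also steps over whatever byte follows it, even though that byte is not a continuation byte.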
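
To illustrate the difference between trusting the lead byte and checking the bytes that actually follow, here is a minimal sketch in plain C. It is not the GLib implementation and not necessarily what the attached patch does; lead_byte_skip() and invalid_skip() are hypothetical helpers, and the lookup is simplified to sequences of at most four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Length announced by the lead byte alone (hypothetical, simplified helper).
 * This is the kind of answer a lead-byte lookup gives without looking at
 * the bytes that follow. */
static size_t lead_byte_skip(unsigned char c)
{
    if (c < 0xC0) return 1;   /* ASCII or a stray continuation byte */
    if (c < 0xE0) return 2;   /* 110xxxxx announces 2 bytes */
    if (c < 0xF0) return 3;   /* 1110xxxx announces 3 bytes */
    if (c < 0xF8) return 4;   /* 11110xxx announces 4 bytes */
    return 1;                 /* anything else: treat as a single bad byte */
}

/* Number of bytes that really belong to the broken sequence at s:
 * the lead byte plus only the continuation bytes (10xxxxxx) that are
 * actually present. Hypothetical helper for illustration. */
static size_t invalid_skip(const unsigned char *s, size_t len)
{
    assert(s != NULL && len > 0);
    size_t want = lead_byte_skip(s[0]);
    size_t n = 1;
    while (n < want && n < len && (s[n] & 0xC0) == 0x80)
        n++;
    return n;
}
```

For the input from the description, the bytes 0xC3 0x0A, the lead-byte lookup says 2 (swallowing the newline), while the checked variant says 1.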

Comment 2 Egmont Koblinger 2013-12-25 18:25:48 UTC

Created attachment 264880
UTF-8 decoding cleanup

This one cleans up the previously found issue.

This does not fix the actual bug, though. The bug resides in iso2022.c around "nextctl": vte splits the processing of data at control characters (\n, \r and a couple more) and forgets about incomplete sequences left behind.
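
The expected behavior can be sketched with a small standalone decoder. This is only an illustration under stated assumptions, not vte's code: decode_utf8() and REPLACEMENT are hypothetical names, overlong and surrogate checks are omitted, and a sequence cut short by the end of the input is also replaced (a streaming decoder would more likely keep it pending). The point is that a sequence interrupted by a control character yields U+FFFD instead of vanishing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Decode in[0..len) into out[0..cap) and return the number of code points
 * written. An incomplete multi-byte sequence that is interrupted by any
 * non-continuation byte (such as '\n') produces U+FFFD, and the
 * interrupting byte is then decoded normally. */
static size_t decode_utf8(const unsigned char *in, size_t len,
                          uint32_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t n = 0, i = 0;
    while (i < len && n < cap) {
        unsigned char c = in[i];
        size_t need;      /* continuation bytes announced by the lead byte */
        uint32_t cp;
        if (c < 0x80)                   { need = 0; cp = c; }
        else if (c >= 0xC2 && c < 0xE0) { need = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c < 0xF0) { need = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c < 0xF5) { need = 3; cp = c & 0x07; }
        else { out[n++] = REPLACEMENT; i++; continue; } /* invalid lead byte */
        i++;
        size_t got = 0;
        /* Consume only bytes that really are continuation bytes. */
        while (got < need && i < len && (in[i] & 0xC0) == 0x80) {
            cp = (cp << 6) | (in[i] & 0x3F);
            i++;
            got++;
        }
        out[n++] = (got == need) ? cp : REPLACEMENT;
    }
    return n;
}
```

With the bytes 0xC3 0x0A from the description, this sketch emits U+FFFD followed by U+000A, which matches the expected output above.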

Comment 3 Egmont Koblinger 2013-12-25 18:36:04 UTC

Created attachment 264881
UTF-8 decoding cleanup v2

Comment 4 Behdad Esfahbod 2013-12-30 07:19:32 UTC

Ah, nice!  ChPe, can you commit please?

Comment 5 Egmont Koblinger 2014-01-05 19:58:42 UTC

Created attachment 265388
Fix

I don't fully understand the code, but I hope this patch is a proper fix.

Comment 6 Egmont Koblinger 2014-01-07 18:40:16 UTC

Fixed in 0-36, keeping open for vte-next.