GNOME Bugzilla – Bug 777733
Terminal output hangs on special character, until script finishes
Last modified: 2018-03-27 17:50:35 UTC
Created attachment 344202
zip of scriptreplay log file with accompanying timing file

-----Original description as posted on launchpad:

I wrote a Python 3 script which scans directory trees, and outputs various filenames (with path) to the terminal, as the files are being hashed. When the following filename was to be printed, gnome-terminal stopped printing output with the cursor where the question mark would have been printed out:

._13 Trapezoid singâ?€.mp3

When I interrupted the program using keyboard interrupt, much more output was displayed, indicating the script had continued to run, despite the output stopping after "â".

I navigated to the directory and tried using ls, which was able to print the filename without issue. I tried using:

find `pwd` -name *Trapezoid*

This also displayed the filename, with full path, correctly. I tried running the script with terminator, which had no problem printing the filename and continuing on with the rest of the output as expected.

Terminal output at freeze, and after keyboard interrupt shown in the cropped/pixelized screenshots here: http://imgur.com/a/D0XRK

Gnome Terminal version: 3.18.3
Ubuntu 64 Gnome Edition

-----Updated information after being contacted by a Mainstream gnome-terminal/vte developer/contributor:

If I scan only the directory those files are in, the output does not hang. Instead, it displays the filename with a rectangle and what appears to be 4 numbers inside of it, in two rows (0 0, 9 0). If I copy the filename and paste it into Thunderbird here, it shows the name properly, without the rectangle between the â and €. Cropped screenshot (from subset of directory tree I mention later) here: http://imgur.com/gOZ7B9j

(NOTE: while pasting this section, I see the character is included just to the left of the euro symbol in this text box. This was not visible in Mozilla Thunderbird.)

If I scan the entire parent directory, the output hangs as reported. This occurs in ROXTerm as well as gnome-terminal.
I was not running screen or tmux. gnome-terminal is set to use UTF-8 encoding. I was able to reproduce the bug with that subset of the directory tree. However, it doesn't hang nearly as long. Apparently the output hangs until the script finishes, however long that may be. This time it was a matter of seconds. I did not attempt any other interaction with the terminal window, such as resizing or using the menu. This file was downloaded from an unknown source, several years ago. It was likely saved to an NTFS partition, using Windows XP or Windows 7. It is still on an NTFS partition. I'm attaching the script replay/timing files from a scan of a subset of the directory tree. This scan produces the bug, but it doesn't stay hung up as long due to the much smaller directory tree. The script.log file was edited to replace the real username with "username".
The special character I mentioned, which was visible in the text box, does not show up on the webpage after being submitted. It is shown in the last screenshot from a log file, though. In the terminal window, that character overlaps a bit with the euro symbol. Here's a screenshot of that: http://imgur.com/4pXS5pj
The filename contains a control character, U+0090 DEVICE CONTROL STRING, which means that vte will eat up the output until it encounters the sequence terminator. So not a bug in vte; if there is a bug here, I'd say it's in coreutils which might consider escaping control characters in filenames when printing to a pty, instead of outputting them verbatim.
Actually coreutils:ls already does this. I missed that you're using your own (python) script; you may want to do the same as ls.
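A script that prints filenames itself could do something similar by escaping control characters before writing them to the terminal. The following is a minimal sketch only, not how ls is implemented: the function name escape_control_chars and the \xNN escape style are assumptions chosen for illustration. It treats every character in Unicode category Cc (C0, DEL and C1 controls, which includes U+0090) as unsafe.

```python
import unicodedata


def escape_control_chars(name: str) -> str:
    """Return name with control characters shown as visible \\xNN escapes.

    Hypothetical helper: category "Cc" covers U+0000-U+001F, U+007F and
    U+0080-U+009F, so a C1 control such as U+0090 never reaches the terminal.
    """
    out = []
    for ch in name:
        if unicodedata.category(ch) == "Cc":
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


# The filename from this report: "â", then U+0090, then "€".
risky = "._13 Trapezoid sing\u00e2\u0090\u20ac.mp3"
print(escape_control_chars(risky))
```

Printing the escaped name keeps the output readable and leaves the terminal's parser in its normal state, whatever emulator is in use.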
Ah. Thanks. I'll look into that. Since output was fine with other terminal apps like Terminator, I assumed it was a bug with gnome-terminal. I'll look into what I can do with my code to escape problematic character(s). Thanks again.
Terminator is irrelevant. It uses the same widget as gnome-terminal, either the current version (in which case it behaves the same as gnome-terminal) or an ancient one (in which case we just don't care). xterm is our reference. Could you please check xterm's behavior? If xterm is different from gnome-terminal and does not hang, we should see what it does differently, e.g. terminates the device control string at newline, or after a certain amount of data etc., which we should incorporate in gnome-terminal. If xterm hangs too then we're good.
xterm does not hang.
I guess we should further investigate then.
Based on output, it looks to me like xterm omits the character. Comparison of screenshots of the filename output: http://imgur.com/9M1Sydj

- gnome-terminal hangs on the U+0090 character, and displays it when script finishes.
- terminator doesn't hang on the U+0090 character, but displays it.
- xterm doesn't hang and doesn't display the U+0090 character.

gnome-terminal version: 3.18.3
Terminator version: 0.98
xterm version: 322
Just occurred to me: If I remember correctly, xterm doesn't recognize C1 control characters in UTF-8. So we should either try with its C0 counterpart in xterm, or with a legacy (e.g. latin-1) encoding.
The behavioural difference between terminator and g-t is probably due to terminator being the old gtk2 version.

(In reply to Egmont Koblinger from comment #9)
> Just occurred to me: If I remember correctly, xterm doesn't recognize C1
> control characters in UTF-8. So we should either try with its C0 counterpart
> in xterm, or with a legacy (e.g. latin-1) encoding.

Here xterm swallows everything between DCS (ESC P) and ST.
Oh my goodness... Random findings, questions, crazinesses, whatthehecks etc. (vte git head-ish versus xterm-324 shipped by Yakkety):

- scriptreplay does not properly replay the attached log, it finishes with "/media/username/WD2TB1/b" and the last few lines are missing. At least it's the same in xterm, vte and konsole. I guess it's a scriptreplay bug. I'm too lazy to debug (run "scriptreplay" under "script" to see what it does, haha).

- xterm indeed does not support C1 in UTF-8 at all. ctlseqs.txt explains the reason: for some reason (I don't get it) it claims that it would need to decode the stream twice, or recognize C1 prior to decoding UTF-8, that is, it would still be the 0x80 - 0x9F bytes (0x90 in this concrete issue), which (this one's obvious) can legally occur inside a valid UTF-8 character. Konsole and we happily decode UTF-8 first and then look for C1, that is, U+0080 to U+009F, in particular for U+0090 now.

- Even if xterm is started up with "-en iso8859-1", it still does not recognize C1 (0x90 and friends, single bytes). It does, however, if I also specify the "+u8" flag. This contradicts its manpage: "[-u8] and the utf8 resource are overridden by the -lc and -en options...".

- Trivia: The attached file is valid UTF-8, and contains several U+0090 characters. One of the bytes of the UTF-8 encoded U+0090 is actually a 0x90, so if the same file is interpreted as latin1 then it still contains C1 DCS characters. (It's obviously not necessarily true the other way around.)

- As opposed to OSCs, the DCS (device control string) needs to be terminated with ST, BEL is not accepted.

- What is so special about displaying the prompt that makes it unstuck from this crazy DCS mode, that is, terminates the DCS? Obviously a command like
    cat script.log; sleep 10; echo -n 'joe@foo$ '; sleep 10
  does not finish the output for 20 seconds. So it's not about the prompt itself, but perhaps changing some stty setting that makes it unstuck? Which one and why?

- LC_ALL=en_US xterm -en iso8859-1 +u8: (this operates in 8-bit and recognizes C1)
    cat script.log ; sleep 10
  prints the entire file immediately, as opposed to VTE in iso8859-1 which, as the reporter said, hangs until the prompt is shown. Why? Let's copy the first problematic line only (line 66) to a new file and play with that.

- Remove the 0x82 byte (part of the Euro symbol),
    cat line66-0x82-removed; sleep 10
  Now xterm hangs and swallows too, just as VTE.

- Add this byte back. Change C1 (0x90) to C0 (ESC P).
    cat line66-c0; sleep 10
  Does not get stuck again, while VTE does.

- LC_ALL=en_US xterm -en iso8859-1: We've removed C1 support; repeat the previous command. It gets stuck.

- Replace the 0x82 byte with ESC B (C1 -> C0 counterpart): Does not get stuck.

Anyone still following me?

Looks like xterm terminates a DCS upon encountering another C0, or C1 (if support for that is enabled) character. VTE does not. This might result in different behavior in other invalid(-ish?) sequences as well, e.g. this also behaves differently:
    echo -ne '\e]0;newti\eBtle\a'; sleep 10

Back to the question of why is the prompt so special: No, it is not, and no stty setting is in the game. If you have a terribly simple prompt then it's not displayed, the terminal emulators are still waiting for DCS to be terminated. Even VTE seems to abort the DCS mode on encountering the escape character, which is emitted by all the OSC 0 / 7 and also the various colors of the prompt.

Huhh... Looks like I've spent two hours of crazy debugging just to pretty much rediscover bug 730154 comment 7.
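The byte-level observations above (0x90 inside the UTF-8 form of U+0090, 0x82 inside the Euro symbol, and the ESC P / ESC B counterparts) can be checked with a few lines of Python. This is only an illustrative sketch; the helper names c1_bytes and c1_to_7bit are made up here. The 7-bit form of a C1 control is ESC followed by the byte minus 0x40.

```python
def c1_bytes(data: bytes) -> list:
    """Return the bytes of data that an 8-bit terminal would read as C1 controls."""
    return [b for b in data if 0x80 <= b <= 0x9F]


def c1_to_7bit(byte: int) -> bytes:
    """Return the two-byte ESC form of a C1 control byte (0x80-0x9F)."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control byte")
    return b"\x1b" + bytes([byte - 0x40])


dcs_utf8 = "\u0090".encode("utf-8")   # U+0090 DEVICE CONTROL STRING
euro_utf8 = "\u20ac".encode("utf-8")  # Euro symbol

print(dcs_utf8.hex(), [hex(b) for b in c1_bytes(dcs_utf8)])
print(euro_utf8.hex(), [hex(b) for b in c1_bytes(euro_utf8)])
print(c1_to_7bit(0x90), c1_to_7bit(0x82))
```

Read as latin-1, the problematic filename therefore contains a 0x90 (DCS) followed shortly by a 0x82 (another C1 control), which matches the xterm experiments with line 66.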
So in the new parser, vte will terminate a DCS on encountering C1 ST, any other C1 control (the DCS is then ignored), CAN or SUB (which cancel the DCS), or ESC (which may be the start of ESC \ aka C0 ST, or of any other sequence, which will then be executed instead of the DCS). Any other codes will be pushed to the DCS string. The state diagram at https://vt100.net/emu/dec_ansi_parser documents this; so I think this bug will be obsolete/fixed when the new parser lands.
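As a rough illustration of those termination rules, here is a toy model in Python. It is not vte's parser: the function visible_text and its three states are assumptions made for this sketch, it works on already-decoded code points, and it does not parse CSI or OSC sequences (after ESC it only distinguishes ESC P from everything else). It only shows which input ends a DCS and which is swallowed as payload.

```python
ESC, CAN, SUB = "\x1b", "\x18", "\x1a"
DCS, ST = "\u0090", "\u009c"


def is_c1(ch: str) -> bool:
    return "\u0080" <= ch <= "\u009f"


def visible_text(stream: str) -> str:
    """Toy model: return the characters that would reach the screen."""
    state = "ground"
    shown = []
    for ch in stream:
        if state == "dcs":
            if ch == ST or ch in (CAN, SUB):
                state = "ground"          # terminated or cancelled
            elif ch == ESC:
                state = "escape"          # may be ESC \ (ST) or a new sequence
            elif is_c1(ch):
                # any other C1 control ends the DCS; a new DCS starts a new string
                state = "dcs" if ch == DCS else "ground"
            # everything else, newlines included, is DCS payload
        elif state == "escape":
            # ESC P opens a DCS; ESC \ and all other finals just return to ground
            state = "dcs" if ch == "P" else "ground"
        else:
            if ch == DCS:
                state = "dcs"
            elif ch == ESC:
                state = "escape"
            elif is_c1(ch) or ch in (CAN, SUB):
                pass                      # controls are not displayed
            else:
                shown.append(ch)
    return "".join(shown)


# The reported filename followed by more output: everything after U+0090 is swallowed.
print(repr(visible_text("sing\u00e2\u0090\u20ac.mp3\nnext file\n")))
```

In this model a newline does not end the DCS, which is why the reporter's output appeared to hang after "â" until something containing ESC (such as a coloured prompt) arrived.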
Fixed on master.