GNOME Bugzilla – Bug 787157
wrapping breaks within an utf-8 char
Last modified: 2017-09-20 20:21:00 UTC
Needs a better description. Anyway:

1. Having (a fairly standard) PS1='\[\033[01;32m\]\u@\h\[\033[01;34m\] \W \$\[\033[00m\] ', create a dir with a very long name (for example '奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記') and go into it.

As PS1 states, the current directory will be printed as part of the prompt. The bug is that, as the name is very long, it gets wrapped, but it gets wrapped inside a UTF-8 sequence (for me, though that obviously changes with the terminal's width, 1 byte stays on the first line and the next two go to the second one).

Also, there seems to be some confusion between char width, byte length and char count: the line gets wrapped with several empty cells on the right side of the first line, then, just to make things more interesting, on the second line, after the colored '$ ', there are again a few empty cells before the cursor.

Now an odd thing happens when typing on that line: the first char gets output where the cursor was, but the cursor then jumps to its expected proper position, the second cell after '$ '. Upon Backspace, the trip is reversed... well, at least it's consistent. ...
Hi,

Thanks for the report. I cannot reproduce any faulty behavior.

> 1 byte stays in the first line, next two get into second one.

What does it mean that 1 _byte_ stays in the first line? Bytes are grouped together to form Unicode characters whose glyphs are drawn. Is the proper glyph cut in half? Or are the bytes not joined together?

VTE's behavior has been quite thoroughly tested around CJK characters. I suspect you're having a problem with your shell or your locale setup.

What shell (and which version) are you using? What distribution and version? Are you sure your locale is set up correctly, and to a UTF-8 one? Do you have a fixed-width (aka monospace) font chosen?

Could you please demonstrate with a screenshot (or videocast) what you're seeing?

If you execute a command like echo 奮闘記奮... (with lots of double-wide characters), can you reproduce this bug in the echoed output (after you execute that command)?

What's the value of your Profile Preference -> Compatibility -> Ambiguous-width characters?
Actually I do see one problem, with bash-4.4.7 on Ubuntu Zesty.

If the last CJK character of the line would get cut in half, VTE automatically overflows it to the next line. Bash, however, adds another newline after it. It seems bash has an off-by-one error in its calculation around CJK width.

With your example prompt and directory name, at a terminal width of 65 columns, the prompt looks like this:

egmont@foobar 奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮
闘
記奮闘記奮闘記奮闘記 $

whereas it should be

egmont@foobar 奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮
闘記奮闘記奮闘記奮闘記 $

I'm absolutely sure it's a bash bug.
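The expected wrapping at 65 columns can be checked with a small simulation. This is only a sketch of the overflow rule described above (a double-width character that does not fit in the remaining cell moves whole to the next row), not VTE's actual code; `cell_width` and `wrap_cells` are hypothetical helper names, and the width rule is simplified to "East Asian Wide/Fullwidth = 2 cells, everything else = 1", ignoring combining and ambiguous-width characters.

```python
import unicodedata


def cell_width(ch):
    # Simplified assumption: Wide/Fullwidth characters take 2 cells, all others 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_cells(text, columns):
    # Fill rows cell by cell; a character that does not fit entirely
    # in the current row is moved to the next row as a whole.
    rows, row, used = [], "", 0
    for ch in text:
        width = cell_width(ch)
        if used + width > columns:
            rows.append(row)
            row, used = "", 0
        row += ch
        used += width
    rows.append(row)
    return rows


# Visible part of the prompt only (color escape sequences left out).
prompt = "egmont@foobar " + "奮闘記" * 12 + " $ "
rows = wrap_cells(prompt, 65)
for r in rows:
    print(r)
```

With this rule the first row holds the 14-cell "egmont@foobar " plus 25 double-width characters (64 cells), one cell stays empty, and the remaining 11 characters start the second row, which matches the "should be" layout.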
well, the (part of) visible output is: 奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記�� �闘記奮闘記奮闘記奮闘記奮闘記 $ as '奮' is \xe5\xa5\xae... well that's why I'm saying '1 byte' (OK, so it's 2 - got the order wrong, the point stands: still within utf8 sequence)
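The byte-level claim is easy to verify. A minimal sketch, assuming Python 3; `splits_inside_sequence` is a hypothetical helper that reports whether a cut at a given byte offset lands in the middle of a UTF-8 sequence (that is, on a continuation byte of the form 10xxxxxx):

```python
def splits_inside_sequence(text, byte_offset):
    """Return True if cutting the UTF-8 encoding of text at byte_offset
    would separate the bytes of a single character."""
    data = text.encode("utf-8")
    if byte_offset <= 0 or byte_offset >= len(data):
        return False
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (data[byte_offset] & 0xC0) == 0x80


encoded = "奮".encode("utf-8")
print(encoded)                              # b'\xe5\xa5\xae'
print(splits_inside_sequence("奮闘", 2))    # True: cut after 2 of the 3 bytes
print(splits_inside_sequence("奮闘", 3))    # False: cut on a character boundary
```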
As for the shell, it's bash 4.3.48, so slightly earlier.
(seems I need more patience today) ...also no, 'echo' doesn't seem to have such problem, so it seems... Damn, a quick test with 'busybox sh' doesn't seem to show this problem... Looks like it's indeed bash...that's...not good
...though busybox shows a different problem: if you're in that long dir: user@host 奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘記奮闘 記奮闘記奮闘記 $ trew ertyuu tyuioppi tyuiop uoipoipioo OK, I know the above strings are silly. Anyway, the point is that the second line gets wrapped long before it has run out of space...
One more thing: I bet the behavior is pretty much the same in xterm, isn't it? It would be cool if you could please try it with bash-4.4/readline-7.0 and if the problem persists then report to them. (Or just report the one I discovered in comment 2, I'll have no time to take care of it.) (Not sure about busybox, I don't know if it aims to support CJK or even non-English letters at all. Maybe it's just counting the bytes, which, with the 3-byte 2-cell CJK characters would result in off by about a factor of 2/3.)
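To make the byte-versus-cell distinction concrete, here is a rough sketch that measures a string three ways. `measure` is a hypothetical helper, and the cell count uses the simplified assumption that only East Asian Wide/Fullwidth characters occupy 2 cells; a real terminal's width rules are more involved.

```python
import unicodedata


def measure(text):
    # Returns (character count, UTF-8 byte count, terminal cell count).
    chars = len(text)
    nbytes = len(text.encode("utf-8"))
    # Simplified width rule: Wide/Fullwidth = 2 cells, everything else = 1.
    cells = sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
                for c in text)
    return chars, nbytes, cells


print(measure("奮闘記" * 12))   # (36, 108, 72)
print(measure("ź" * 103))       # (103, 206, 103)
```

A shell that counts bytes instead of cells would therefore see 108 "columns" where the terminal uses 72 for the CJK name, and 206 where the terminal uses 103 for the 'ź' name.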
Can we close this as NOTGNOME?
Sorry, got distracted by stuff. Well, it sure seems so, but a funny thing: so, I got prodded to test things with xterm... that result from comment 6 happens there too, but this time I did a longer test...or to be exact put in a longer string. Now, this is kind of interesting: neither bash and busybox nor vte and xterm likely share much code, yet... That bash bug aside, in each of the four combinations, that early jump happened, but only on the second line... Kind of odd. Is it possible that both vte *and* xterm don't handle line wrapping correctly if PS1 overflows a single line ?
...though it seems to happen only if double-width chars are involved, a very long pure ascii name doesn't trigger the problem. ... Correction, char width has nothing to do with it, seems anything outside ascii will do: a dir with a sufficiently long name consisting of 'ź' will trigger it too.
(In reply to Rafał Mużyło from comment #9)
> Is it possible that both vte *and* xterm don't handle line wrapping
> correctly if PS1 overflows a single line ?

Totally unlikely. vte and xterm don't share a single bit of code. Both have been heavily tested around linewrap. In case you're in doubt, you could also try other independent terminal emulators, such as the Linux console, konsole, st, terminology, pterm, urxvt and so on.

Terminal emulators don't have any concept of PS1, they don't know what's a prompt, they don't know if the prompt or some other data is being printed.

Shells _probably_ don't just print the entire prompt in a single step and let the terminal emulator wrap, _probably_ they wrap manually (that is, insert an explicit carriage return - new line wherever they think the terminal would wrap). There's a chance that bash sometimes gets it wrong, or you have your locale somehow set up incorrectly resulting in bash's faulty behavior. I wouldn't be surprised if busybox didn't have any locale / utf-8 support whatsoever and hence it always got it wrong if any non-US character was involved.
...doesn't seem so... While busybox does indeed act a bit strange with double width chars (or is that only for cjk range ?), single width seems supported (well, actually reading in-tarball docs, it's still WIP, but a long term goal), this problem aside.

On an interesting note: as I've said, the generic layout seems to be:

<start of ${PS1}>\n
<rest of ${PS1}><input text>\n(full line)
<more input>\n(premature linebreak)
<rest of input>(full lines)

The interesting part seems to be that length(<start of ${PS1}>+<more input>) is equal to a full line...
Could you please

- Run "stty size" and note the window size.

- At the same window size change to a directory whose name contains such characters, then run "script". This will launch a new shell where your prompt appears incorrectly. Then quit this shell. You'll be told that a newly created file called "typescript" contains a record of what the shell printed. Please attach this file.

- Also take a screenshot of your terminal (or copy-paste here what it looks like; for bugzilla not to wrap early you might want to start this whole story with a 60-ish character narrow terminal).

Examining the contents will unambiguously reveal who is the culprit.
Created attachment 360092 [details] script from bash
You forgot to tell me the window size, and to attach a screenshot. Anyway I'll try my best.

At the beginning of the second line there's a "\e]0;" followed by the entire path, which contains a directory of 200 "t" letters and another one of 103 "ź" letters. This is where the window's title is set (between "\e]0;" and the trailing "\a"); these characters do not appear inside the terminal.

This is followed by the escape sequence to switch to green, the username@hostname part of your prompt, the switch to blue, a space, then solely 29 "ź"s. Then a carriage return, a newline, yet another carriage return, and the remaining 74 "ź"s.

That is, it is absolutely clear that it's not VTE wrapping there, it's bash that explicitly tells VTE to start a new line after the 29th "ź".

A similar story can be seen with the "r"s that you type from your keyboard.

I am absolutely sure that VTE doesn't do anything incorrectly here. If you experience an overall incorrect behavior, it's bash (readline) asking VTE to print silly things.

I recommend that you contact the bash/readline developers with these findings. I'd appreciate it if you could also show them my finding in comment 2, as I won't have time for that. I also suggest that you first upgrade to bash-4.4/readline-7.0; you'll have better chances than if you reported bugs against an older version.

Closing this one as NOTGNOME.
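The same kind of inspection can be roughly automated. The sketch below is an assumption-laden illustration, not a general terminal parser: it only strips title-setting sequences ("\e]0;" up to "\a") and simple CSI sequences such as color changes, then counts the visible characters between explicit newlines. `visible_runs` and the sample bytes are hypothetical; the sample merely mimics the structure described above with a placeholder "user@host".

```python
import re

# Window-title sequence: ESC ] ... BEL. Simple CSI sequence: ESC [ params letter.
OSC_TITLE = re.compile(rb"\x1b\][^\x07]*\x07")
CSI = re.compile(rb"\x1b\[[0-9;?]*[A-Za-z]")


def visible_runs(raw):
    # Drop the escape sequences, then count characters between explicit
    # newlines; carriage returns are not visible and are discarded.
    stripped = CSI.sub(b"", OSC_TITLE.sub(b"", raw))
    text = stripped.decode("utf-8", errors="replace")
    return [len(run.replace("\r", "")) for run in text.split("\n")]


# Hypothetical sample shaped like the recording described above.
sample = (
    "\x1b]0;title\x07"
    "\x1b[01;32muser@host"
    "\x1b[01;34m " + "ź" * 29 + "\r\n\r" + "ź" * 74
).encode("utf-8")

print(visible_runs(sample))   # [39, 74]
```

An explicit "\n" in the recording shows up here as a boundary between two runs, which is how a shell-inserted line break can be told apart from the terminal's own wrapping.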
Created attachment 360154 [details]
script from busybox

...at least I think I've run it correctly with 'script -c 'busybox sh'' ...

This was meant to be posted yesterday (seconds after the last), but Gnome's bugzilla decided to not cooperate and a bit later I went to get some sleep.

Either I did something wrong or busybox shows the same problem... I'm pretty sure busybox doesn't use readline.

As for the size, a pretty standard one: 24x80.
Examining typescript-2:

The prompt is 22 printable chars of user@host and a space, followed by 103 times ź, then 3 more chars (space dollar space). That's 128 printed characters. (Plus there are a few escape sequences to change the color or query the cursor position.)

Unlike bash, busybox doesn't explicitly wrap the multiline prompt, it lets the terminal do so. And vte and xterm (and presumably all others) indeed do so.

128 chars means 80 in the first row and 48 in the second, leaving room for 32 more in the second row.

Then follow only 9 "t" letters (I assume this time you pressed and held down the "t" key), then a premature CR-LF printed by busybox.

So while bash and busybox probably don't share any code, busybox also asks the terminal to do something silly.

I think it just doesn't understand UTF-8 at all. It believes that every 2-byte ź indeed occupies 2 columns. In that case the prompt would take 22 + 2*103 + 3 = 231 columns, which is 9 less than 3 entire rows. This totally explains why it inserts a line wrap after 9 "t" letters. It doesn't explain though why it queries the cursor position then :P

Busybox simply doesn't seem to support UTF-8.
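The arithmetic above can be replayed in a few lines. This is just a worked check of the numbers in this comment, assuming the 80-column width reported earlier; the variable names are made up for the illustration.

```python
COLUMNS = 80       # terminal width (24x80 as reported)
USER_HOST = 22     # printable chars of user@host plus the space
Z_COUNT = 103      # number of "ź" in the directory name
TAIL = 3           # space, dollar, space

# What the terminal actually draws: one cell per "ź".
actual_cells = USER_HOST + Z_COUNT + TAIL
full_rows, last_row_cells = divmod(actual_cells, COLUMNS)
room_left = COLUMNS - last_row_cells

# What a byte-counting shell would believe: two columns per "ź".
assumed_cells = USER_HOST + 2 * Z_COUNT + TAIL
chars_until_assumed_wrap = -assumed_cells % COLUMNS

print(actual_cells, full_rows, last_row_cells, room_left)   # 128 1 48 32
print(assumed_cells, chars_until_assumed_wrap)              # 231 9
```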