GNOME Bugzilla – Bug 584160
use harfbuzz directly (to handle Unicode complex text rendering)
Last modified: 2021-06-10 14:12:34 UTC
Please describe the problem:
When correct Unicode text is copied from a Wikipedia edit box (for example, the Devanagari for the word "Hindi" at http://en.wikipedia.org/w/index.php?title=Hindi&action=edit) and pasted into the terminal, it gets mangled, because the complex text rendering needed for Hindi and many other Asian languages is not applied. If I recall correctly, previous versions of gnome-terminal did not have this problem.
Steps to reproduce:
1. Open gnome-terminal
2. Paste in Unicode text that needs complex text rendering
3. Text will be mangled
Actual results:
Text is mangled
Expected results:
Text would read correctly according to http://en.wikipedia.org/wiki/Complex_text_rendering rules.
Does this happen every time?
Yes
Other information:
I'll try attaching a screenshot if I can; if not, I've attached one to the Ubuntu Launchpad bug report https://bugs.launchpad.net/ubuntu/+source/gnome-terminal/+bug/381429
Created attachment 135519 [details] Screenshot showing the problem Adding a screenshot that shows the error. The correct Unicode text is shown in the Wikipedia Window and the mangled text is shown in the gnome-terminal
-> vte
*** Bug 583718 has been marked as a duplicate of this bug. ***
As bug 583718 has been resolved as a duplicate of this bug, can I discuss a Thai-specific solution here? A similar approach is probably available for other scripts as well. For Thai, as stated in bug 583718, the only problematic case is SARA AM (U+0E33), which traditional typewriters handle by rendering it in composed form in a separate cell. This can be achieved by hacking the Thai shaper so that it does not decompose this character when the font is monospace. I have a patch for this, which has been applied to a customized, locally distributed version of Pango for a while. Is a similar approach possible for other complex text scripts? If so, let's discuss how traditional typewriters work for those scripts. Otherwise, and if CTL is not likely to be fully supported soon, can I reopen bug 583718 and propose the Pango patch there? Note that partial CTL support in vte is a regression for Thai compared to no support at all, where pure monospace rendering is acceptable to some degree. And the workaround is quite small.
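The SARA AM case can be illustrated with a minimal sketch, assuming only Python's standard `unicodedata` module. It shows the compatibility decomposition that Unicode records for U+0E33; the `cells_needed` helper is a hypothetical illustration of a "one cell per spacing character" grid, not anything taken from Pango or vte.

```python
import unicodedata

SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM

# Unicode records a compatibility decomposition for SARA AM:
# NIKHAHIT (a nonspacing mark drawn above the preceding consonant)
# followed by SARA AA (a spacing vowel that sits in its own cell).
decomposed = unicodedata.normalize("NFKD", SARA_AM)

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")


def cells_needed(text: str) -> int:
    """Hypothetical grid model: nonspacing marks (Mn) take no cell of their own."""
    return sum(1 for ch in text if unicodedata.category(ch) != "Mn")
```

In the decomposed form the NIKHAHIT has to be drawn in the cell of the preceding consonant, which is exactly the cross-cell interaction that the typewriter convention avoids by keeping SARA AM whole.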
*** Bug 606370 has been marked as a duplicate of this bug. ***
Created attachment 151075 [details] [review] Pango patch for Thai For discussion's sake, this is my pango workaround for Thai. It makes pango render SARA AM in its separate cell if the font is monospace. And this is how Thai traditional typewriters behave. So, it's what users expect. How about Indic typewriters?
Bug 583718 reopened. I think it's not a duplicate of this bug if we choose a simple path for Thai.
Why isn't this being fixed? This is an emergency.
Created attachment 170898 [details] Problem with Gnome Terminal This clearly shows what is the problem.
Created attachment 170899 [details] Devanagari File used in Testing Gnome Terminal's Rendering This is the file used in the previous attachment.
The status must be confirmed. I can reproduce this problem. It happens on all systems. This is critical.
Cool down...
So, what is the problem behind this? It was posted on 2009-05-28 and today it's 2010-09-23.
Even if it was posted in 2001 it wouldn't have made much difference. Complex text in a terminal is a generally hard problem, and we simply have not gotten around to fixing it yet. I have no clear idea of when it will be fixed. Feel free to submit a patch.
And if you don't have the time or skills to fix the bug that's fine, but it still should be listed as confirmed and probably high priority.
Ya! At least that.
How about starting with a survey on how traditional typewriters work for CTL scripts?
@Taxman, @ujjwollamichhane: Please stop spamming this bug with irrelevant comments. Comment 0 and comment 1 describe the problem sufficiently well, no more confirmation is needed at this time. For the developer's response, see comment 14. The fact that bug was 'unconfirmed' has no relevance; we simply don't make a difference between UNCONFIRMED and NEW bugs. The bug priority is for the _developers_ to set; not you. If this bug is 'high priority' for _you_, consider hiring someone to work on it.
Anyway, please consider providing some input, such as adding some information to comment 4. Probably, such information for Indic and other CTL scripts could help shape the idea on how to solve this. I'm not a vte developer. So, that's what I can do.
(In reply to comment #18)
> The bug priority is for the _developers_ to set; not you. If this bug is 'high
> priority' for _you_, consider hiring someone to work on it.
I really hate this guy. Oh man, please realize that almost all languages using non-Latin scripts are broken.
I don't know why complex scripts don't work; I just know that they don't. The only thing I can say is that this is a critical issue. Why don't you use the same rendering engine for gnome-terminal that gedit uses? Gedit renders it correctly. Or, if that can't be done, why can't you release a different terminal with gedit's rendering engine? That's all I can say. If I were a programmer, I would really work on it.
(In reply to comment #21)
> I don't know why complex scripts don't work. I just know that it doesn't work.
> The only think I can say is that this is an critical issue, and why don't you
> use the same rendering engine for gedit uses for the gnome-terminal. Gedit
> renders it correct. Or that can't be done, why can't you release a different
> terminal will gedit's rendering engine. That's all I can say. I had been a
> programmer, I would really work into it.
This may be an important reason why people have given so little input on my comment so far. Although I'm not a vte developer, I'll try to clarify it as far as I know.
In Gedit, the whole text is rendered as a single unit, as it's proportional by nature. So, the rendering engine is free to let character cells interact, including the reordering and conjunct composition required by Indic and other CTL scripts.
Terminals, on the other hand, are display grids, where individual characters are put into cells. And the grid cells are independent of one another. That's how it has worked from the beginning. This is fine for Latin and CJK, and probably for Thai-Lao with the typewriter convention applied. But it needs a tremendous change to support complex text like Indic and Arabic, where adjacent display cells must interact with one another. Such a deep structural change is not an easy task. It even deserves a redesign.
However, as Thai has demonstrated a simple path to solve it with the typewriter convention, I'm curious how much of this can be applied to other CTL scripts. For Thai, the only problem is the decomposition of SARA AM, which requires an interaction with its previous cell. But on traditional typewriters, it's just rendered undecomposed, and people are familiar enough with it to accept it. So, I wonder how far we can fall back similarly in the case of Indic, as I learned that Hindi typewriters, at least, once existed.
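To make the "independent cells" problem concrete, here is a small sketch, assuming only Python's standard `unicodedata` module, that lists the stored characters of the Devanagari word for "Hindi" from comment 0. The `describe` helper is a hypothetical name used only for this illustration.

```python
import unicodedata

# The word "Hindi" in Devanagari, in logical (storage) order.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"


def describe(text):
    """Return (code point, general category, name) for each stored character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


for row in describe(HINDI):
    print(*row)
```

The second character (VOWEL SIGN I, category Mc) is stored after HA but drawn to its left, and the VIRAMA (category Mn) asks NA and DA to merge into a conjunct. Neither effect can be produced by drawing each stored character in its own cell in isolation.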
Ya! Hindi had typewriters, but the problem is that the typing sequence on a typewriter is wrong according to the Devanagari writing system, so it is not used nowadays. As it requires changing the sequence so much, it is like a new layout to learn. The standard Unicode input follows the correct sequence for writing Devanagari. Another thing to consider is monospace fonts for the terminal. Devanagari is written under a common line, so there is no such thing as a monospace font for Devanagari as required by the terminal (if I'm wrong, somebody please correct me).
So, Thai seems to be lucky that it doesn't have the problem with encoding order. The "logical" order for writing is the same as that for typewriters. Thus, terminal implementation for Thai is relatively straightforward. Behdad, does this convince you to reconsider bug 583718 and treat Thai separately from CTL category? For Indic, I'm sure the required change is quite substantial, then. Let's give up the typewriter model and continue the discussion with something more radical.
Ya! That's nice. There are many things to consider before coming up with something radical. As far as I know, the issue is the grid-wise treatment of chars. Grid-wise treatment of chars is impossible for Devanagari: every glyph has a different size, several may merge into one, and positions need to shift, which makes grid-wise treatment unsuitable for Devanagari. The question is: why grid-wise treatment in the first place? Devanagari is written continuously under a connected head-line. So there is no way to account for spacing between chars and put them into a grid.
(In reply to comment #25)
> The question comes is why, grid-wise treatment in first place ?
Because it's a "terminal emulator", that is, a program that emulates a text console.
> Devanagari is written continuously under a connected head-line. So there is way
> to consider for spacing between chars and putting it into grid.
Please still consider fitting it into the grid. Imagine, for example, how ugly the many fullscreen text console programs that use line drawing characters would look if the co-existing Devanagari text were not monospace. Instead, imagine a VGA text console and try to place Devanagari into it. That's our task. It may not be proportionally spaced, but the top line can still be connected. And a grid cell can contain vertically stacked characters, or even a conjunct composed of multiple characters. All this considered, Devanagari can certainly be monospace. The problem is rather how to make it possible to compute multiple adjacent cells at a time, so that reordering can take place. This change alone is technically substantial enough, I think.
(In reply to comment #24)
> Behdad, does this convince you to reconsider bug 583718 and treat Thai
> separately from CTL category?
I'm afraid not. I don't have any interest in maintaining a separate shaping system just for the terminal.
(In reply to comment #20)
> (In reply to comment #18)
> > The bug priority is for the _developers_ to set; not you. If this bug is 'high
> > priority' for _you_, consider hiring someone to work on it.
>
> I really hate this guy. Oh man, please realize that almost all non-latin script
> using languages are broken.
Stop your hateful behavior *now* (i.e. no followup needed).
(In reply to comment #27)
> (In reply to comment #24)
> > Behdad, does this convince you to reconsider bug 583718 and treat Thai
> > separately from CTL category?
>
> I'm afraid not. I don't have any interest in maintaining a separate shaping
> system just for the terminal.
No, it's not a separate *system*. SARA AM appears to be a *special case* in many Thai implementations. This situation is well known to Thai implementors. Rendering it in undecomposed form is how it has been implemented since the MS-DOS era. Note that this is still true in some framebuffer terminals on Linux. The decomposition was only introduced in graphical environments, where proportional rendering is possible. Trying to follow the CTL path for Thai in the terminal would mean inventing a new system which never existed before. (Well, you may say such an extra feature is good. But all Thai users want now is just a terminal that works like the others.)
Created attachment 177719 [details] Rendering of Devanagari script before making any changes This figure shows the rendering of Devanagari script on the vte terminal before making any changes to it.
Created attachment 177720 [details] Rendering of Devanagari script after making changes to vte This image shows the rendering of Devanagari script after we made some changes to the code of vte.
We are 3 students from the College of Engineering, Pune working on providing support for Devanagari script on GNOME terminal. For that we have modified 3 files of vte. All the details of our present work can be found at:- http://indiclanguagecomputing.wordpress.com I am uploading the two screenshots which show the previous rendering and new rendering after we made the changes. There are still some issues which need to be resolved [explained on the blog]. My colleague will attach the patches for the same.
Created attachment 177729 [details] [review] Patch for Devanagari script rendering I am attaching the patch for rendering of Devanagari script on vte. Please review and suggest the improvements.
Christian Persch has changed this bug's severity from Normal to Enhancement. I'd say this is not true for Thai. For Thai, this bug is considered a *regression* from some ancient versions of VTE, where Pango shaping had not been applied at all. See bug 583718 for more info. But if you accept that Thai should *not* be handled along with Indic scripts in this bug, but rather fixing bug 583718 instead, that's another story.
Coding shaping logic in vte is not an option. So all complex text is simply one request, which is hooking up harfbuzz or pango shaping into vte.
Yeah, but Thai used to be acceptably rendered on VTE without Pango. Simple monospace stacking (aka charcell in XLFD terms), as applied elsewhere, is sufficient, including in xterm, emacs, or even framebuffer consoles and DOS VGA. And this has been achieved using proper monospace (charcell) fonts. No complex text handling has been required so far, just direct imitation of traditional typewriters. While hooking up Pango improves the quality a little in some cases, it introduces a noticeable regression for Thai, which could be treated as non-CTL as far as text consoles are concerned. But since Pango or Harfbuzz is to be applied, the regression should be fixed so that Thai works the way it does everywhere else, with or without the CTL handling to be implemented, as proposed in bug 583718.
For clarity: my proposal is to fix Pango, not to unhook it.
Created attachment 190746 [details] Image showing the Devanagari rendering in VTE Image shows the rendering of Devanagari script in VTE.
We are trying with the same concept. Devanagari rendering is almost perfect now, but we are trying to fix other issues caused due to our changes to the code. I can fix those issues, if I get the help from the community immediately. You can see the rendering of Devanagari script in the attached file. I think the rendering of other languages can be done in similar way.
*** Bug 673601 has been marked as a duplicate of this bug. ***
Comment on attachment 177729 [details] [review] Patch for Devanagari script rendering Rejected as per comment 35.
*** Bug 688494 has been marked as a duplicate of this bug. ***
*** Bug 555641 has been marked as a duplicate of this bug. ***
See bug 535896 for relevant discussions. Copy-pasting the most important bits:
Problems arise with Devanagari spacing-marks. These characters extend the width of the base character (from 1 to 2) and place an accent before/after/around it. Because of these possible different positions, the approach of storing the base character in the first cell and the accent in the second (or conditionally the other way around - ouch!) is not feasible.
I believe the right approach would be the CJK-like approach. The base character and the vowel are combined into a vteunistr (just like with non-spacing-marks currently), and stored in a double wide cell. Rendering probably wouldn't be that hard: since we already render non-spacing-mark accents correctly and we also render CJK, this would be the combination of these two. Chances are it'd work out of the box.
The logic to combine the base char and the vowel (especially across a line wrap) is a bit tricky, but not that hopeless. Once they are combined, they live together forever: copy-pasting or rewrapping shouldn't separate them, the cursor over them should be double wide etc., just as with CJKs.
We'd need to study whether, let's say, multiple spacing-marks over a base character are allowed (I hope not!) and have corresponding safety guards.
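The CJK-like approach described above can be sketched roughly as follows. This is a simplified model in Python using only the standard `unicodedata` module, not vte code: `Cell` and `build_cells` are hypothetical names, the `text` field merely stands in for a vteunistr, and capping the width at 2 is one possible "safety guard" for multiple spacing marks.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    text: str   # base character plus attached marks (stand-in for a vteunistr)
    width: int  # grid columns occupied: 1 or 2


def build_cells(text: str) -> List[Cell]:
    cells: List[Cell] = []
    for ch in text:
        category = unicodedata.category(ch)
        if cells and category in ("Mn", "Me"):
            cells[-1].text += ch   # non-spacing mark: joins the base, width unchanged
        elif cells and category == "Mc":
            cells[-1].text += ch   # spacing mark: joins the base...
            cells[-1].width = 2    # ...and the cell becomes double wide (capped at 2)
        else:
            # Base character; East Asian Wide/Fullwidth characters start double wide.
            wide = unicodedata.east_asian_width(ch) in ("W", "F")
            cells.append(Cell(ch, 2 if wide else 1))
    return cells
```

With this model the word "Hindi" occupies three cells of widths 2, 1 and 2. The sketch ignores cursor movement, overwrites and line wrapping, which are the hard parts mentioned in the comment.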
(In reply to comment #45)
> See bug 535896 for relevant discussions. Copy-pasting the most important bits:
>
> Problems arise with Devanagari spacing-marks. These characters extend the
> width of the base character (from 1 to 2) and place an accent
> before/after/around it. Because of these possible different positions, the
> approach of storing the base character in the first cell and the accent in the
> second (or conditionally the other way around - ouch!) is not feasible.
>
> I believe the right approach would be the CJK-like approach. The base
> character and the vowel are combined into a vteunistr (just like with
> non-spacing-marks currently), and is stored in a double wide cell. Rendering
> probably wouldn't be that hard, since we render correctly with non-spacing-mark
> accents already and we also render CJK, this would the combination of these
> two. Chances are it'd work out of the box.
>
> The logic to combine the base char and the vowel (especially across a line
> wrap) is a bit tricky, but not that hopeless. Once they are combined, they
> live together forever, copy-pasting or rewrapping shouldn't separate them, the
> cursor over them should be double wide etc., just as with CJKs.
>
> We'd need to study if let's say multiple spacing-marks over a base character
> are allowed (I hope not!) and have corresponding safety guards.
This might work, but would essentially be a partial hack. I think we should first switch vte to call directly into HarfBuzz instead of Pango. At that point, there are a variety of ways to improve things, including getting rid of the dotted-circle first.
> This might work, but would essentially be a partial hack. I think we should > first switch vte to call directly into HarfBuzz instead of Pango. Would that make my suggestion unnecessary? Would that know how to render if a cell contains a base char, and the next cell contains a vowel that should be drawn to the left of the base char? Mouse highlight, line wrapping etc. would still remain issues. > At that > point, there are a variety of ways to improve things, including getting rid of > the dotted-circle first. Seems that these two works are (mostly) orthogonal. My suggestion is mostly terminal logic with little bit of UI, your one is mostly UI.
(In reply to comment #45)
> I believe the right approach would be the CJK-like approach.
I haven't tested, but I have a feeling that libraries optimizing for screen updates (e.g. ncurses) might not care about any property of these spacing-mark chars other than that they are single wide. If this is the case, they might update the base cell only and not the vowel, or the vowel only and not the base cell, if that's the shortest way to update the terminal's state. So we'd need to double check that we properly split or combine these characters whenever necessary.
Splitting is already done (because of CJK); just the details would have to be modified a bit (restore the individual base char and spacing-mark, rather than replacing them by spaces).
Joining is something we don't quite have. On every character insertion, we'd need to try to join the new one with the preceding as well as with the following cell's character. We'd need to watch out for all other cases where existing characters can become adjacent (e.g. delete-characters, anything else?).
I've quickly prototyped up two possible patches. Apply one or the other, but not both.
Patch #1 takes the approach that spacing marks should not extend the base character's width. Pros/cons:
+ This is what vim (incorrectly?) expects, vim will work perfectly.
+ The text looks nice (to me).
+ The patch is complete.
- Any apps other than vim are likely to fall apart.
Patch #2 takes the approach that spacing marks should widen the base character to be double wide. Pros/cons:
+ This is what probably all applications (except for vim) expect.
- Rendering is not that nice, there's a large visible space where two letters should not be visually separated. I've no clue if we can modify font rendering to at least continue the horizontal line that's present in most of these glyphs.
- The patch is incomplete, there are many corner cases yet to be taken care of.
Similarly to the already existing "ambiguous width characters" option, I guess we may need to implement both behaviors, and have another config option to choose one.
At this step I'd like to request feedback from people familiar with the Devanagari script.
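The two behaviors, and the idea of a config option choosing between them, can be sketched as a width function with a policy argument. This is a hypothetical Python illustration, not the logic of either attached patch: `SpacingMarkWidth` and `cluster_width` are invented names, and the base character is assumed to be single wide.

```python
import unicodedata
from enum import Enum


class SpacingMarkWidth(Enum):
    NARROW = 1  # like patch #1: spacing marks add no width (what vim expects)
    WIDE = 2    # like patch #2: base + spacing mark form a double wide cell


def cluster_width(cluster: str, policy: SpacingMarkWidth) -> int:
    """Width in cells of a single-wide base character plus its attached marks."""
    has_spacing_mark = any(unicodedata.category(ch) == "Mc" for ch in cluster[1:])
    if has_spacing_mark and policy is SpacingMarkWidth.WIDE:
        return 2
    return 1
```

For example, HA followed by VOWEL SIGN I measures 1 under `NARROW` and 2 under `WIDE`, while NA followed by VIRAMA (a non-spacing mark) measures 1 under both.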
Created attachment 278463 [details] [review] Width 1: make spacing-marks actually behave like non-spacing-marks
Created attachment 278464 [details] [review] Width 2: Make spacing-marks combine with the previous and form a double wide cell
Review of attachment 177729 [details] [review]:

This patch might be a nice proof-of-concept, but is very far from production quality. See individual comments for details.

The patch only handles when characters are emitted one-by-one, with no other operation in between. Care has to be taken to study the desired behavior and carefully implement this if e.g. the cursor is moved between printing a base character and a combining one, etc.

Code formatting (e.g. usage of spaces) has to follow the rest of VTE.

::: iso2022.c
@@ +390,3 @@
+ It returns 1 if found in that array,else returns 0.
+*/
+short iscomplex(gunichar c){

This name should be more specific, e.g. is_devnagari. Make the function static.

@@ +392,3 @@
+short iscomplex(gunichar c){
+ short i=0;
+ while(devnagari[i]!='\0'){

Please check if there's a glib method doing what you need (g_unichar_type, g_unichar_get_script...). If not, please start by checking upper/lower boundaries so that the most common use case (i.e. the letter is not devanagari) doesn't perform a linear search over ~20 items.

@@ +412,3 @@
+ The following condition should actually return 2 as it is ambiguous width character but just to
+ differentiate between ambiguous width character and characters in the array "devnagari" we are
+ returning 4. Actually we have set it to 2 again when its job is done, not to affect any other

Don't misuse semantics. It's very bad coding practice if the value of width==4 means that the width is 2 and there's some other condition. Use values that have the exact semantics that their name suggests. Use more values or enums if desired.

::: vte.c
@@ +77,2 @@
+//this flag is for printing the characters with halant in devnagari.
+static short halant_flag=0;

Don't use static. An application (e.g. gnome-terminal) might have multiple vte widgets. Using static means that what happens in one of them has influence on the other. If you need to store a status, add a new field under terminal->pvt. Make sure to properly update this status on all kinds of operations (cursor moving, deletion, reset...) I think it's a better approach not to store a status anywhere but examine the contents of the preceding cells.

@@ +78,3 @@
+static short halant_flag=0;
+// this flag is for the characters which requires more space when appended to previous character.
+static short iscomplex_flag=0;

Ditto.

@@ +3047,3 @@
+ /*
+ In case of halant we look behind for two characters. Halant is zero width character so

Could you please explain to me in details what the exact behavior of Halant should be? It's not clear to me from your comment, I only have a feeling that it's something more complicated than the other spacing-mark vowel signs.
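Three of the review points (bounds check before any table lookup, named values instead of a magic width of 4, and per-terminal state instead of statics) can be sketched as below. This is a Python illustration only, not vte code: all names are hypothetical, and `SPACING_VOWEL_SIGNS` is an assumed stand-in for the patch's "devnagari" array, whose actual contents are not shown in this thread.

```python
from dataclasses import dataclass
from enum import Enum

DEVANAGARI_BLOCK = range(0x0900, 0x0980)  # U+0900..U+097F

# Assumed stand-in for the patch's lookup table: a few spacing vowel signs.
SPACING_VOWEL_SIGNS = frozenset({0x093E, 0x093F, 0x0940, 0x0949, 0x094A, 0x094B, 0x094C})


class CharKind(Enum):
    OTHER = "other"
    SPACING_MARK = "spacing-mark"  # a named value instead of overloading width == 4


def classify(cp: int) -> CharKind:
    # Cheap bounds check first: most characters are not Devanagari at all.
    if cp not in DEVANAGARI_BLOCK:
        return CharKind.OTHER
    if cp in SPACING_VOWEL_SIGNS:
        return CharKind.SPACING_MARK
    return CharKind.OTHER


@dataclass
class TerminalState:
    """Per-widget state, so two terminals in one process cannot affect each other."""
    pending_halant: bool = False
```

Membership tests on a `range` of integers are constant time, so the common non-Devanagari case never reaches the table. As the review notes, examining the preceding cells is probably preferable to storing any flag at all.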
(In reply to comment #46)
> I think we should first switch vte to call directly into HarfBuzz instead of
> Pango.
How will this be done, something like the cairo stuff in hb's util/ directory?
*** Bug 744121 has been marked as a duplicate of this bug. ***
Behdad: I guess you probably won't have time to work on this yourself, but if you have any thoughts, could you outline how you think using harfbuzz should work in vte, as a guideline for someone else to implement this? Thanks!
*** Bug 757944 has been marked as a duplicate of this bug. ***
*** Bug 762832 has been marked as a duplicate of this bug. ***
*** Bug 763947 has been marked as a duplicate of this bug. ***
*** Bug 761511 has been marked as a duplicate of this bug. ***
*** Bug 767715 has been marked as a duplicate of this bug. ***
*** Bug 770117 has been marked as a duplicate of this bug. ***
*** Bug 773552 has been marked as a duplicate of this bug. ***
FYI I put a $100 bounty on this bug: https://www.bountysource.com/issues/27378261-use-harfbuzz-directly-to-handle-unicode-complex-text-rendering
If anyone is still trying to work on this, checking how mlterm is doing Arabic and Indic might be a good start (https://bitbucket.org/arakiken/mlterm), it seems to even be able to use HarfBuzz these days (and it has a vte emulation layer that was usable at some point but last time I tested it myself was a while ago).
(In reply to Khaled Hosny from comment #65)
> If anyone is still trying to work on this, checking how mlterm is doing
> Arabic and Indic might be a good start
Does it do Indic?! I took a look at mlterm, looks very similar to what running bicon does, with same limitations mostly. Maybe a bit better...
> (https://bitbucket.org/arakiken/mlterm), it seems to even be able to use
> HarfBuzz these days (and it has a vte emulation layer that was usable at
> some point but last time I tested it myself was a while ago).
The HarfBuzz seems to be used in the UI toolkit only.
(In reply to Behdad Esfahbod from comment #66)
> (In reply to Khaled Hosny from comment #65)
> > If anyone is still trying to work on this, checking how mlterm is doing
> > Arabic and Indic might be a good start
>
> Does it do Indic?! I took a look at mlterm, looks very similar to what
> running bicon does, with same limitations mostly. Maybe a bit better...
It supports Indic text, at least according to this https://bitbucket.org/arakiken/mlterm/src/tip/doc/en/README.indic, which also suggests that HarfBuzz is used at least for Indic text.
*** Bug 784602 has been marked as a duplicate of this bug. ***
Complex text is rendered correctly in konsole. Does anyone know how it handles this?
Has there been any update on this? I am also looking for a terminal which supports Indic/Devanagari.
There's been no further development yet, as evident from this bug report. Apparently kde's konsole works with indic.
(In reply to Christian Persch from comment #71)
> Apparently kde's konsole works with indic.
It might be "by accident" though :)
I'm quite certain Gtk+ and Qt can both render complex text in their regular widgets.
Konsole is the only terminal emulator I've seen which doesn't render characters individually for each cell, but asks whatever font rendering engine to draw larger chunks of text. Sometimes this results in truly nasty bugs like https://bugs.kde.org/show_bug.cgi?id=379535 and a few others that I'm too lazy to look up now, whereas here I suspect this generic behavior results in Indic "accidentally" working.
Thank you for your replies.
> I'm quite certain Gtk+ and Qt can both render complex text in their regular widgets.
So, is it possible to use 'regular' rendering rather than cell based rendering, just for complex scripts/Indic? Alternatively, for Indic scripts, instead of using one Unicode character for each cell (I am assuming that is how cell based rendering works), use a syllable. Example text: अ हँ रु द्रे भि र्व सु भि श्च रा म्य ह मा दि त्यै
Please see http://www.typophile.com/comment/521602#comment-521602
The following documents may be helpful in defining an Indic syllable. http://www.unicode.org/L2/L2016/16161-indic-text-seg.pdf http://www.unicode.org/L2/L2016/16215-indic-presentation.pdf
By the way, slightly related: do we know anything about the issue reported e.g. at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=854149? Even "simple" combining accents are often displayed on top of the following letter rather than the previous one. This depends on some weird combination of the base letter, the combining accent itself, and the font(!). It's definitely not an emulation but a presentational problem. It's not specific to VTE; other software such as gedit or konsole is also affected. Whose fault is it? Will this problem remain if we switch to harfbuzz? Can we expect complex text to show up properly if this much simpler issue isn't handled correctly? Or are we okay blaming certain (quite a few) fonts and asking users to avoid them?
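The "one syllable per cell" idea can be sketched with a deliberately simplified segmenter in Python, using only the standard `unicodedata` module. It is not the algorithm from the linked Unicode documents: `split_syllables` is a hypothetical helper that attaches every combining mark to the preceding cluster and lets a letter join a cluster that ends in VIRAMA, and it ignores ZWJ/ZWNJ and other special cases.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)


def split_syllables(text):
    """Split text into simplified orthographic syllables (one per would-be cell)."""
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        # A letter right after a virama continues the conjunct.
        joins_conjunct = (
            bool(clusters)
            and clusters[-1].endswith(VIRAMA)
            and category.startswith("L")
        )
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The word "Hindi": HA + VOWEL SIGN I, then NA + VIRAMA + DA + VOWEL SIGN II.
print(split_syllables("\u0939\u093f\u0928\u094d\u0926\u0940"))
```

Under these rules the word "Hindi" splits into two units, with the conjunct kept whole. A real implementation would also need to decide how many columns each unit occupies.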
Gedit (and essentially any GTK application other than VTE) shouldn’t have any issue with accents and any such issue is definitely a bug (either font or software).
<off> Gedit can't even display the space character properly, see bug 780068. LOL. </off>
(I think that's a pango and/or font bug, given that using 'pango-view --font "DejaVu Sans Mono, 10" filename' reproduces, showing a '1-2' rendering. (The - being over the 2 is then because vte renders the 2 on top of the 1- overrunning its cell.) In any case, let's discuss that in a separate bug.)
This issue has been referenced here https://github.com/tonsky/FiraCode/issues/162 I couldn't figure out why all equal "=" signs are invisible in vte terminals.
Michael, please see also bug 762832 comment 1 for why the idea of using such ligatures is theoretically somewhat problematic in terminal emulators. We could do something, and it'd probably be good enough in some sense, but it would definitely look ugly at the margins, and those who care enough about ligatures probably wouldn't be happy about that.
IMHO (but this is a completely different story) instead of such ligatures, programming languages should accept fancy Unicode signs such as triple equals, does not equal etc., and text editors could help the user by replacing the ASCII sequences with the Unicode codepoint.
---
Behdad, how would Persian/Arabic/etc. shaping be influenced by line wrapping, as well as by words cut in half (e.g. because they overflow from the text editor's viewport, and the terminal emulator is not aware of the rest of the word)? Would it look "good enough"?
Thanks for the reference, Egmont. I've been playing with kitty over the past few hours and must say the font rendering is wonderful, especially for ligatures and icon fonts (in my case non-monospaced) used in nvim, tmux status line and vcs-info promptline. The latter of which works well in vte terminals. Regarding line wrapping, I understand now what you mean and I am able to produce some funky results by resizing the kitty terminal window. That doesn't happen in gtksourceview or this html input form. --- Off topic: I was just reading your insights on unrelated https://github.com/kovidgoyal/kitty/issues/160 in another tab when the inbox notification popped up. :)
*** Bug 793391 has been marked as a duplicate of this bug. ***
Besides the invisible equal sign, I'd like to add that '>' is shown as '=>' when using FiraCode 1.205. What amuses me a lot is that when the window of gnome-terminal loses focus, the '=>' gets extended on its left and collapses with the '$' at the end of the prompt.
(In reply to Sah from comment #83)
> Besides the invisible equal sign, I'd like to add that '>' is shown as '=>'
> when using FiraCode 1.205.
>
> What amuses me a lot is that when the window of gnome-terminal loses focus,
> the '=>' gets extended on its left and collapses with the '$' at the end of
> the prompt.
This one is tracked in bug 793391 and will be fixed soon.
@Theppitak Karoonboonyanan Many years later, is Thai SARA AM still an issue in VTE?
As far as I understand, ideally this character places a circle over the preceding character, and itself shows in its own cell looking like an "upside down J". With some fonts, this is the behavior I get. One drawback is that if SARA AM is the first character in a row then the circle doesn't show up. Another drawback is that temporary visual glitches might occur, e.g. when the screen is partially repainted; https://gitlab.gnome.org/GNOME/vte/issues/26 is going to (mostly) eliminate these visual glitches.
If I understand you correctly, for typewriters the alternate display is to place the circle inside SARA AM's cell, that is, on top of this "upside down J". This is well accepted, or even expected, by Thai users on typewriter-like devices. Did I understand this correctly? With some other fonts, this is the behavior I get. This behavior is more robust when a line happens to break just before the SARA AM.
Users can pick a font according to which of these two behaviors they prefer.
None of the fonts I could try displayed a dotted circle (bug 583718) for me, so I suspect this has been fixed in fonts or in pango over the last couple of years.
Is there anything right now that VTE could do to fix or improve things around SARA AM?
For Devanagari spacing marks we'll probably have to implement a change to the emulation layer. When the cursor is just about to wrap to the next line, a freshly received spacing mark will grab the previous base character and move it to the next line, so that they stick together. Do you think this would be the desired behavior with SARA AM too, so that it's guaranteed to be on the same line as its preceding character?
Created attachment 373741 [details] [review] Render spacing marks This patch makes VTE render spacing marks along with the previous character, thus fixing the most annoying rendering issue with Devanagari: the dotted circles. I believe Devanagari text becomes significantly more readable with this patch, although I cannot read this script so I cannot tell for sure. The patch goes on top of these two patches from https://gitlab.gnome.org/GNOME/vte/issues/26: - vte-26-draw_rows-v4.patch - vte-26-invalidate_rows-v1.patch Known bugs, limitations: - Dotted circle still might appear under or next to the cursor, if block cursor shape is used. A workaround is to use underline or i-beam cursor shape. - Probably the result is far from ideal when a spacing mark wraps to the next line while its base character is still in the previous one. I think it should be fixed in the emulation logic (a wrapping spacing mark should grab the previous base character to the beginning of the new line), and not in the displaying part. I'm planning to do it shortly. - No special support for Virama a.k.a. Halant (yet). - Each character still goes into its desired cell, or pair of cell in case of base character + spacing mark. As a result, there's more whitespace between letters than desired. (Can we blame the fonts here? Is there a monospace font for Devanagari, obeying the logical width specified in Unicode?) Truly beautiful rendering would require breaking out of the grid, I'm not sure how that'll work out. We'll see when we begin porting to Harfbuzz. - No sign of harfbuzz yet.
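The display-side idea described in this comment (draw a spacing mark together with the cell before it, leaving the cell contents untouched) can be sketched roughly as follows. This is an illustration in Python, not the patch itself: the function names `is_spacing_mark` and `draw_runs` are hypothetical, and the model of one character per cell is a simplifying assumption that ignores how VTE actually stores cell contents.

```python
import unicodedata
from typing import List, Tuple


def is_spacing_mark(ch: str) -> bool:
    """True for Unicode general category Mc (Mark, spacing combining)."""
    return unicodedata.category(ch) == "Mc"


def draw_runs(row: List[str]) -> List[Tuple[int, str]]:
    """Group one row of cells into (start_column, text) runs for drawing.

    Assumption: each cell holds exactly one character. A spacing mark is
    appended to the run that started in an earlier cell, so the text
    renderer sees base + mark together and has no reason to substitute a
    dotted-circle placeholder for a missing base.
    """
    runs: List[Tuple[int, str]] = []
    for col, ch in enumerate(row):
        if is_spacing_mark(ch) and runs:
            start, text = runs[-1]
            runs[-1] = (start, text + ch)
        else:
            # Regular character, or a mark with nothing before it in the row.
            runs.append((col, ch))
    return runs


# U+0915 KA, U+093E VOWEL SIGN AA (category Mc), U+092E MA
print(draw_runs(["\u0915", "\u093e", "\u092e"]))
```

In this sketch the grid is unchanged: the mark still occupies its own column, and only the grouping handed to the renderer differs.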
Created attachment 373742 [details] [review] Special treatment of Virama On top of the previous patch, this one adds special treatment for Virama/Halant. Readers of Devanagari scripts: Feedback welcome, for the previous patch only, as well as the previous + this patch. Thanks in advance! By the way, what's the ideal rendering when a word is wrapped at a Virama? Also, do we need to handle three or more letters joined by Viramas?
(In reply to Wesley Moore from comment #64) > FYI I put a [...] bounty on this bug Wesley, and the two other backers at this moment: This is very kind of you, highly appreciated! Could you please clarify: What are the acceptance criteria? This bug is becoming a mixture of multiple issues. E.g. the original summary of the bug didn't mention harfbuzz, it was added later (but was already there by the time you put this bounty). Does it necessarily have to be harfbuzz, or are you okay with anything else that's "reasonably good"? I'm planning to significantly improve on the current situation for the next stable release 0.56, including BiDi support, some basic Arabic shaping, and the aforementioned Devanagari dotted circle issues. I'll also take a look at harfbuzz, I might add support at a later step (maybe still in time for 0.56, or one or two cycles later), but at this early stage I cannot foresee how that can play along with the strict grid nature of terminal emulators. I don't want to do what Konsole does where columns just get totally misaligned. Nice rendering is an important goal, but I don't think we should sacrifice proper alignment for that. We'll see... :)
I'm.. not excited about adding rendering hacks in vte ;). I know you wrote to me re harfbuzz. I'll reply to your messages this week.
I think rendering spacing marks together with their base character (the first patch), rather than on their own resulting in dotted circles, should be perfectly okay and reasonable until we have something even better in place (e.g. harfbuzz). The Virama hack... I'm not excited about that one either ;)
In two recent comments I mentioned I'd probably modify the emulation behavior so that a wrapping spacing mark grabs the preceding base letter into the new row. I'm no longer sure about this. This would most likely break apps that rely on exact cursor tracking, and at the same time do soft breaks at end of lines. Examples include bash/readline, zsh, probably any other modern shell, less in folding mode, and maybe more. These apps would need to be adjusted, causing them to break in terminal emulators that don't support this new method of wrapping. We could introduce a new escape sequence to choose between the two behaviors, but probably no one would care enough to properly toggle it. Another slight inconsistency with the proposed wrapping behavior is that it's unreasonable and expensive (and prone to further breakage) to unwrap (move the base letter back to the previous line and rearrange the rest) any time the spacing mark disappears (e.g. is overridden by a normal letter). (That disappearing spacing mark might not even be the last character of that paragraph at that time, it could be in any earlier line inside a long paragraph.) However, rewrapping on a back-and-forth resize would snap it back. This is not strictly speaking a bug or a problem, but a sign of poor design. At this moment I have no idea how to handle this case of a wrapping spacing mark. Let's postpone this for a while.
(In reply to Egmont Koblinger from comment #88) > > Wesley, and the two other backers at this moment: This is very kind of you, > highly appreciated! > > Could you please clarify: What are the acceptance criteria? This bug is > becoming a mixture of multiple issues. E.g. the original summary of the bug > didn't mention harfbuzz, it was added later (but was already there by the > time you put this bounty). > > Does it necessarily have to be harfbuzz, or are you okay with anything else > that's "reasonably good"? For me the underlying motivation was better general support for ligatures in VTE so that I could potentially use a font with ligatures (PragmataPro). I'm not fussed by the particular approach. In the end, I think if this bug was closed with a result that solved the issue reported in the first comment then I'd pay out the bounty.
Created attachment 374146 [details] [review] Render spacing marks, v2 Updated to apply on top of current master. Do render a standalone spacing mark in the first column, rather than skipping it.
The unistr functions look ok. -#define VTE_ATTR_BOLD_SHIFT (VTE_ATTR_FRAGMENT_SHIFT + VTE_ATTR_FRAGMENT_BITS) +#define VTE_ATTR_SPACING_MARK_SHIFT (VTE_ATTR_FRAGMENT_SHIFT + VTE_ATTR_FRAGMENT_BITS) I'd add that at the end (after INVISIBLE, with a comment separating it from the others) since this is just a cached value derived from the cell's character, not an independent attribute. + * Skip spacing mark cell (it's combined with the preceding regular one) except in the first column. */ For the first column, could just put the mark on a U+00A0 (NBSP) instead of treating this differently here? Actually I don't see why we don't just combine the mark with the preceding character always just like for non-spacing marks, and then we don't need the ismark() except when inserting the character?
(In reply to Christian Persch from comment #94) > -#define VTE_ATTR_BOLD_SHIFT (VTE_ATTR_FRAGMENT_SHIFT + > VTE_ATTR_FRAGMENT_BITS) > +#define VTE_ATTR_SPACING_MARK_SHIFT (VTE_ATTR_FRAGMENT_SHIFT + > VTE_ATTR_FRAGMENT_BITS) > > I'd add that at the end (after INVISIBLE, with a comment separating it from > the others) since this is just a cached value derived from the cell's > character, not an independent attribute. Actually, I can't recall why I added this bit for the cached value, rather than calling g_unichar_ismark() directly from draw_rows(). One reason could have been the case if we ever introduce a way (API or esc seq) of modifying the set of spacing marks, similarly to the ambiguous width one; in that case the property as of the time the character was emitted should be relevant. But I don't see a use case for introducing such a feature (at least not within the case of such characters already occupying a separate cell). Another reason could be some tiny speedup when the emulation (e.g. rewrap on resize) cares about this, but it doesn't, see below. Caching is a tiny bit faster during regular usage of the terminal, not caching is a tiny bit faster if tons of data is produced that don't ever make it into rendered glyphs. Without doing any measurements, I have the feeling that the difference is so tiny that it's not worth it to make extra rounds. That is, I'm tempted to remove this caching, and call g_unichar_ismark() right from draw_rows(). > For the first column, could just put the mark on a U+00A0 (NBSP) instead of > treating this differently here? This is an excellent idea! (I'm wondering why it doesn't work with a regular space; nevermind.) This avoids displaying the "dotted circle" placeholder. There is one trick, though. Most of the combining marks are applied in the regular order (to the right of the base glyph), a few are applied in front of it, or perhaps have other tricks. 
With the typical case in mind, if the spacing mark is logically in (0-based) column 0, it would show up in column 1 (over whichever letter is there). So maybe the best behavior would be to display this combined glyph beginning at offscreen column -1. That way the space goes to column -1 and the accent goes to column 0. (This wouldn't be that great for "swapped" marks, though, but probably we can live with that.) I'll experiment with this. > Actually I don't see why we don't just combine the mark with the preceding > character always just like for non-spacing marks, and then we don't need the > ismark() except when inserting the character? So, there's basically two possible approaches I can think of for handling spacing marks: to do in the emulation layer, or in the display layer. I pondered about these in comments 45-48-ish. I ended up dropping the idea of doing it in the emulation layer. There are so many crazy corner cases that it's almost impossible to get it right: - Screen drawing libraries probably don't distinguish spacing marks from regular letters, and they might only overwrite the ones that actually changed. Unlike with CJK where overwriting one half might wipe out the other, here it would be important to preserve the other half (split the joined character). - It's possible that multiple consecutive cells contain spacing marks. Even updating "normal" real-life text might temporarily introduce one, e.g. currently columns 10-11 contain a letter + spacing mark, the desired update is that columns 9-10 will contain a letter + spacing mark; during emitting the update there's a moment when column 9-10-11 contain a letter + 2 spacing marks. Maliciously crafted output can generate even more, can split a longer run of spacing marks in two, or can join two existing runs of spacing marks. Handling all these cases correctly, especially with our vteunistr model, sounds like a nightmare. 
- We'd need to implement the case of an arriving spacing mark retroactively dragging the preceding character into the next line (which we'll have to do with VS16 anyway) – or even dragging the preceding character + its already present several spacing marks? So I decided to leave the emulation layer unchanged, and handle these on the display layer. This might not be that ideal when such a combo happens to cross the line boundary. For that use case, the other model would probably be better (but then again, how far would we go with this? Would we also recognize ZWJ, Virama/Halant etc.?) The current code doesn't handle such cases that well, although with your U+00A0 trick it would probably be reasonably good, and we can always iterate if the need arises. We could improve the code to look for such combos (as well as standard ligatures of Firacode and such) across linebreaks, and let's say split the glyph in two. This should probably be considered if/whenever we switch to harfbuzz. Does this make sense?
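The first-column handling discussed in this comment (put the mark on a U+00A0 and start drawing one column to the left of the screen) might look roughly like the sketch below. It follows the idea as worded in the comment and is not taken from any of the attached patches; `first_column_run` is a hypothetical name, and the one-character-per-cell row is an assumption made for illustration.

```python
import unicodedata
from typing import List, Tuple

NBSP = "\u00a0"  # NO-BREAK SPACE, used as a stand-in base character


def first_column_run(row: List[str]) -> Tuple[int, str]:
    """Return (start_column, text) for whatever sits in column 0.

    If column 0 holds a spacing mark (category Mc), prepend NBSP and start
    at column -1, so the NBSP lands off-screen and the mark itself lands in
    column 0. Any other character is drawn in place.
    """
    ch = row[0]
    if unicodedata.category(ch) == "Mc":
        return (-1, NBSP + ch)
    return (0, ch)


# U+093E DEVANAGARI VOWEL SIGN AA alone at the start of a row
print(first_column_run(["\u093e", "\u0915"]))
```

As the comment notes, this placement suits marks that render after their base; a mark that renders in front of its base would not line up as well with this approach.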
The use of the term “spacing mark” in this discussion is rather confusing. Spacing marks are marks that stand alone like any other letter and do not combine with preceding characters, like `. NON-spacing marks, on the other hand, are the ones that combine with preceding characters and usually have zero width.
We're talking about the "Mark, spacing combining" [Mc] category of Unicode (see e.g. https://www.unicode.org/versions/Unicode11.0.0/ch04.pdf chapter 4.5), that is, combining characters that have a width of their own (wcwidth() returns 1), thus combine with the base character and extend its width. The loose wording of "spacing mark" is also seen in some official Unicode docs (e.g. in http://unicode.org/glossary/), and is used by glib too (https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#GUnicodeType). I haven't seen ` and friends being referred to as "spacing marks", they are "punctuation" or "symbol" characters.
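The category distinction in this exchange can be checked against the Unicode character database. The short Python sketch below uses the standard `unicodedata` module; the choice of sample characters and the helper name `category_of` are illustrative only.

```python
import unicodedata


def category_of(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


samples = {
    "DEVANAGARI VOWEL SIGN AA": "\u093e",  # Mc: Mark, spacing combining
    "COMBINING ACUTE ACCENT": "\u0301",    # Mn: Mark, nonspacing
    "DEVANAGARI SIGN VIRAMA": "\u094d",    # Mn: Mark, nonspacing
    "GRAVE ACCENT": "`",                   # Sk: Symbol, modifier
    "THAI CHARACTER SARA AM": "\u0e33",    # Lo: Letter, other
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: {category_of(ch)}")
```

Note that SARA AM is classified as a letter rather than a mark, so a check based purely on the Mc category would not cover the Thai case raised earlier in the thread.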
Created attachment 374174 [details] [review] Render spacing marks, v3 This patch removes the caching of the spacing_mark attribute. I'm still about to implement the NBSP trick.
FYI: BiDi support (including Arabic shaping) is being worked on at https://gitlab.gnome.org/GNOME/vte/issues/53.
Created attachment 374190 [details] [review] Render spacing marks, v4 The NBSP trick implemented. Luckily, the code ended up much simpler than I had hoped for :) It's still not drawn correctly when the rectangle cursor is over one of such two cells (especially the spacing mark). It'll require heavy reworking of how we paint the cursor, so I'd defer it to some undefined point in the future. Until then, other cursor shapes are a good workaround.
"Render spacing marks, v4" committed to master (0-55).
well, this need was submitted more than 10 years ago and yet vte still is not supporting fonts with ligatures. It should also be noted that some solutions have been submitted long time ago. Sounds like a joke!
That comment added nothing to this bug; please refrain from commenting further. See https://bugzilla.mozilla.org/page.cgi?id=etiquette.html for bugzilla etiquette.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/1661.