GNOME Bugzilla – Bug 741520
Whole scrollback in one file, O(1) rewrap
Last modified: 2021-06-10 14:59:06 UTC
Inspired by - https://bugs.kde.org/show_bug.cgi?id=196998#c20 - recent changes to the stream (bug 738601) - and following the concept of my recent mcview rewrite https://www.midnight-commander.org/ticket/3250 Let's merge attr changes into text_stream, and drop row_stream entirely. The whole ring would be stored in a single file. Wrapping paragraphs to lines would happen on demand for the visible part (and a bit more as necessary) only. We could insert some additional metadata at every fixed offset (practically once per boa block), like an ever-increasing paragraph count, the current attributes, convenient offset to the previous newline, etc.) This could be encrypted independently from the rest of the boa block, so we don't have to read/decrypt/decompress a full 64k to get this small piece of info. Advantages - Immediate resize with reflow (assuming paragraphs of sane lenghts) - Only 1 fd per vte, bit smaller overall disk usage - Simpler design/implementation (or not?) - Scrollback size would be specified in paragraphs, so no longer drop data or leave partial line at the top on a resize. - Better change to get recovery after a disk full finally work flawlessly Disadvantages/gotchas - Rendering might slow down a bit when giant paragraphs are encountered and you scroll back. Or (like in the mcview rewrite) there could be a safety cap: if we need to traverse more than 1MB backwards to find a newline then we give up and just render it starting randomly - We'd need a safety upper limit on the stream size (in bytes, computed from the scrollback paragraphs) not to eat up disk on oversized paragraphs. - The scrollbar wouldn't move evenly, as its position would correspond to the paragraph rather than the line. We'd need to decouple scroll_delta from the widget's vadjustment, and insert an additional layer in between which maintains the current line offset within the paragraph. (Shift+PgUp and friends would of course still scroll by a fixed amount of lines not paragraphs.) - Dragging the scrollbar would require a binary search in the stream, to find the required paragraph counter. - Searching in the scrollback would be trickier due to inlined attributes. Not sure if it's worth it. Not even sure that I'd go this direction if I started from scratch, but probably I would.
Thanks for bringing this up! Ok lets get busy! Some context: - The reason we ended up with the current 6-fd design is that I was replacing the existing ring, and wanted to limit my work to within the ring, and not have to touch the widget. In particular, no change to delta and scrollback behavior. That's why I needed exact same behavior, which then drove the design for keeping rowstream and textstream. Separating attrstream from text was motivated by my desire for save-history operation to be very simple. Which, again, in retrospect, is optimizing for the wrong operation... So, if we were to step back and redesign: - Agree that attr and text stream should be merged. We need a UTF-8 extension to allow encoding a four-byte attr, and then just leave it in the stream. We should always encode the attrs at the start of the paragraph if it's non-default. Any fast-casing of ASCII-only text will continue to be fast-cased if there are no attrs. - Guarantee a paragraph to go to 64k only. Terminate the paragraph if it spans a full boa block. - For paragraph handling, I'm leaning towards just keeping the paragraph number at the start of each 64kb block in a simple C array/ring into memory and never trying to write it whatsoever. At least not initially. We can always change the design later such that every 4GB worth of data, we dump the paragraph indices and start a new super-block.
Advantages (cont'd): - s/change/chance - Getting rid of g-t checkbox, make rewrapping the only implemented behavior (In reply to comment #1) > - The reason we ended up with the current 6-fd design is that I was replacing > the existing ring, and wanted to limit my work to within the ring, and not have > to touch the widget. Fair enough, it's the same reason why I ended up with the current rewrap :) Now I'd feel more confident touching multiple parts at once. Yet I feel this one would be a more complex change than rewrap or snake on their own. > - Agree that attr and text stream should be merged. We need a UTF-8 > extension to allow encoding a four-byte attr, and then just leave it in the > stream. We should always encode the attrs at the start of the paragraph if > it's non-default. It could be 254 followed by the attr, terminated by 255; this would even allow us to safely walk backwards (not sure if we'll need that). You can even know the attribute while walking backwards if you store the initial actual value at the beginning of a paragraph or boa block, and a value to XOR with later in the stream. Probably we'll need to make sure that such an attr change doesn't cross boa boundaries, by making a boa shorter if required. (Nit: attr is 8 bytes since true color support.) > Any fast-casing of ASCII-only text will continue to be > fast-cased if there are no attrs. I'd drop the fast-casing of ASCII if we only render a screenful of data; saving on code complexity would become more important then than microoptimizations. > - Guarantee a paragraph to go to 64k only. Terminate the paragraph if it > spans a full boa block. I'm not happy with the idea of suddenly breaking the behavior at some random arbitrary offset. And even if we do that, 64kB is way to low. E.g. mysqldump happily produces lines of 1MB. I don't want to break the terminal's search or copy-paste randomly very rarely. I'd much rather go for a small performance degradation which is linear to paragraph size. I'm slowly beginning to see the great advantages of 64k block size. It became very simple and cheap to store some additional data. At the beginning of each block, we could store a few LRU widths and wrapping state at that width. The one corresponding to the width when the content is produced could immediately be inlined there. Upon a width change, we'd need to compute the new value only once (by walking backwards without a limit) and then we'd update the block to remember this value. This would only happen on demand (when that content is actually displayed), and if the user rotates among just a few different widths (e.g. maximize-restore-fullscreen-restore the window) we'd eventually have these values for all these widths. > - For paragraph handling, I'm leaning towards just keeping the paragraph > number at the start of each 64kb block in a simple C array/ring into memory and > never trying to write it whatsoever. At least not initially. We can always > change the design later such that every 4GB worth of data, we dump the > paragraph indices and start a new super-block. I wouldn't go for any other block size, it just makes everything more complicated, harder to test etc... I mean, we could go for a solution as complex as a file system, like maybe a real loop-mounted ext2... :D okay, just kidding. ----- One tricky lowlevel technical detail to watch out for: Currently we can have lines that are not full wide, yet soft wrapped. One use case is when a CJK doesn't fit in the last column, in this case it's one cell shorter. Another one is if you print a \e[K or \e[P. For the latter one, you just can't have this state with dynamic wrapping, so I guess we'd need to force these lines to hard-wrapped. (What I'd like to avoid is: without any resize, you just print something, it scrolls out, you scroll back to see it again, and now it's formatted differently.) For the former one, imagine line 1 being 79 columns wide, then a CJK printed so an empty cell is left at the end and it goes to line 2. Then at some point line 1 scrolls out at the top (written in the stream), and then the new top-left corner (the CJK char) is overwritten with non-CJK which, when inserted in the stream and being wrapped again, would jump to the preceding line's column 80. Conclusion: a line's soft vs. hard newline terminated state can't be stored as long as the next line is still modifiable. Maybe the simplest approach is to keep at least (height+1) lines in the ring before writing to the stream, and force hard-wrapped terminator if the next row's first character would still fit there.
Same story when the 2nd line's first cell is later modified to be a combining accent: we don't want that to be matched up with the preceding line's last character (and causing subsequent content to be shifted) when it's getting scrolled out.
Random thought: We have several bugreports about vte being slow with dingus regexp matching on extremely long lines. Maybe instead of doing a runtime matching all the time, the stream could store the boundaries of matches, along with the rule number that matched. Pretty much like attributes.
Sounds feasible.
I was thinking about it a bit more... The problem is: regexp matching is slow. We wouldn't want to do that for all the data; it's normally only done when the user moves the mouse. So we'd need to compute it on demand, and store it as a cached value. Which can't be inlined in the stream (we can't handle the middle of the stream growing arbitrarily). But we could have per-paragraph or per-boa flags denoting if we know for sure that there isn't any regexp match. Anyway, this will be the last step once everything else is solved :)
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/2154.