Bug 631685 – constant disk access by vtestream-file.h




Summary:	constant disk access by vtestream-file.h


Status:	RESOLVED OBSOLETE

Product:	vte
Classification:	Core
Component:	general
Version:	0.27.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2010-10-08 14:57 UTC by aldyh
Modified:	2021-06-10 14:27 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
0001 Remove explicit I/O from the ring, without introducing VteRowData fragmentation. (16.73 KB, patch) 2011-06-08 19:56 UTC, Austin Clements	rejected
0002 Remove now-unused vtestream (14.18 KB, patch) 2011-06-08 19:57 UTC, Austin Clements	rejected
0003 Allocate VteCell data using a compacting, garbage-collecting pool allocator. (9.97 KB, patch) 2011-06-08 19:57 UTC, Austin Clements	rejected
Keep scrollback history in RAM when /dev/shm is available (1021 bytes, patch) 2012-02-01 17:10 UTC, mhtrinh	rejected
memfd proof of concept, v0 (668 bytes, patch) 2015-03-17 17:12 UTC, Egmont Koblinger	none

Description aldyh 2010-10-08 14:57:54 UTC

I am narrowing down frequent disk accesses that inhibit Fedora from spinning down the hard disk, and have found that _vte_file_stream_ensure_fd0() will constantly create and unlink files:

        fd = g_file_open_tmp ("vteXXXXXX", &file_name, NULL);
        if (fd != -1) {
                unlink (file_name);
                g_free (file_name);
        }

When gnome-terminal is open with any kind of input, the filesystem is constantly in use.

Is there a better way of achieving the above, or is it necessary to constantly create and remove files?

Thanks.

Comment 1 Behdad Esfahbod 2010-10-08 15:18:11 UTC

We don't constantly create and remove files.  We create 3 such files per gnome-terminal tab and hang on to them.  We do however write to those files for every incoming line of terminal data, which sucks.  What needs to be done instead (and I've been waiting for reports like yours to come in before adding it) is to add a caching layer in there such that history is written to disk in batches (which also means that if your history lines are limited, it wouldn't touch disk at all).  See vtestream-file.h.  Need to write a vtestream-cache.h.
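
A minimal sketch of the batching idea described above, not the actual VTE code: bytes accumulate in a fixed in-memory buffer and reach the file descriptor only when the buffer fills or is flushed explicitly. The `BatchStream` type, its functions, and the 64 KiB batch size are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define BATCH_SIZE 65536  /* hypothetical batch size, not a VTE constant */

typedef struct {
    int fd;           /* backing file descriptor */
    size_t used;      /* bytes currently buffered in memory */
    size_t flushes;   /* number of batches written out so far */
    char buf[BATCH_SIZE];
} BatchStream;

static void batch_stream_init(BatchStream *s, int fd) {
    assert(s != NULL);
    s->fd = fd;
    s->used = 0;
    s->flushes = 0;
}

/* Write out everything buffered. Returns 0 on success, -1 on error.
 * Simplification: on error the buffer is left untouched. */
static int batch_stream_flush(BatchStream *s) {
    size_t off = 0;
    while (off < s->used) {
        ssize_t n = write(s->fd, s->buf + off, s->used - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    if (s->used > 0)
        s->flushes++;
    s->used = 0;
    return 0;
}

/* Buffer the data; the disk is touched only when a full batch is ready. */
static int batch_stream_append(BatchStream *s, const char *data, size_t len) {
    assert(s != NULL && (data != NULL || len == 0));
    while (len > 0) {
        size_t room = BATCH_SIZE - s->used;
        size_t chunk = len < room ? len : room;
        memcpy(s->buf + s->used, data, chunk);
        s->used += chunk;
        data += chunk;
        len -= chunk;
        if (s->used == BATCH_SIZE && batch_stream_flush(s) != 0)
            return -1;
    }
    return 0;
}
```

With a layer like this, a scrollback that never exceeds one batch would not cause any write until an explicit flush.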

Comment 2 aldyh 2010-10-08 15:34:48 UTC

Hi.  Thank you for your prompt response.

Yes, writing to a file for every incoming line would never allow the disk to sleep.

Are you saying that with a cache and a default scroll history, there'd be no disk access?  That would definitely be cool.

Comment 3 Behdad Esfahbod 2010-10-22 19:32:23 UTC

So, are there any flags we can set on the file when opening it to tell the kernel that writing updates to disk can be delayed more than usual?  The problem with using a buffer is that no buffer size can be right.  I was hoping that I could rely on the kernel making the right choice regarding when to write to disk.

Comment 4 aldyh 2010-10-29 15:43:19 UTC

Can't you save the first X kbytes in memory cache, and anything past that in disk?  And have the gnome-terminal default be something that can fit entirely in memory cache?

Also, aren't there system calls to use tmpfs (/dev/shm ??) storage?

Comment 5 Behdad Esfahbod 2010-10-29 15:52:24 UTC

(In reply to comment #4)
> Can't you save the first X kbytes in memory cache, and anything past that in
> disk?  And have the gnome-terminal default be something that can fit entirely
> in memory cache?

I can.  That's why I said we need to implement vtestream-cache.h.  Just has not happened yet.  As for the default, we changed the default to 10,000 lines IIRC which is a lot.  But the good thing about caching is that it can write to disk in chunks.

> Also, aren't there system calls to use tmpfs (/dev/shm ??) storage?

Not that I know of.  We store in /tmp, so if people have /tmp setup as tmpfs, that would work.

Comment 6 Austin Clements 2011-06-07 19:31:24 UTC

(In reply to comment #3)
> So, is there any flags we can set on the file when opening it to tell the
> kernel that writing updates to disk can be delayed more than usual?  The
> problem with using a buffer is that no buffer size can be right.  I was hoping
> that I could rely on the kernel making the right choice regarding when to write
> to disk.

I don't understand why VTE is trying to manage this at all.

If you want to rely on the kernel making the right choice about when to write to disk, just keep the ring in memory and allow the kernel's global page replacement to decide when to write it out.  You could either keep it in anonymous memory and rely on swap (which is, BTW, equivalent to putting the tmp files on tmpfs), or you could encourage the kernel to evict it to disk earlier by keeping the ring data in an mmap'd region.

Before you say "Swap?  Oh horrors!", consider that VTE's current approach is akin to not only manually swapping, but doing so aggressively and with no global resource policy.  Intuitively, VTE's approach seems better than swapping, but that's because it manually manages the locality of the on-disk data; if it relied on swapping but simply malloc'd all of the rows, then it would lose this locality and swapping would indeed be onerous.  However, there's nothing stopping VTE from constructing the same highly-local layout in memory, but relying on the kernel and its global resource policy to intelligently evict it to disk.
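
A rough sketch of the "highly-local layout in memory" idea, under stated assumptions: one contiguous anonymous mapping holds fixed-size row slots, so rows that are adjacent in the scrollback are adjacent in memory and the kernel can page them out together. `MemRing`, `ROW_BYTES`, and the function names are hypothetical; real VTE rows are variable-length, which this ignores.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define ROW_BYTES 256  /* hypothetical fixed slot size per row */

typedef struct {
    char *base;    /* start of the contiguous mapping */
    size_t rows;   /* capacity in rows */
    size_t next;   /* total number of rows ever pushed */
} MemRing;

/* Reserve one contiguous anonymous region; no per-row malloc() calls. */
static int mem_ring_init(MemRing *r, size_t rows) {
    assert(r != NULL && rows > 0);
    void *p = mmap(NULL, rows * ROW_BYTES, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->rows = rows;
    r->next = 0;
    return 0;
}

/* Store a row, overwriting the oldest slot once the ring is full. */
static void mem_ring_push(MemRing *r, const char *text) {
    char *slot = r->base + (r->next % r->rows) * ROW_BYTES;
    strncpy(slot, text, ROW_BYTES - 1);
    slot[ROW_BYTES - 1] = '\0';
    r->next++;
}

/* Look up a row by absolute index; NULL if not yet written or overwritten. */
static const char *mem_ring_get(const MemRing *r, size_t index) {
    if (index >= r->next || r->next - index > r->rows)
        return NULL;
    return r->base + (index % r->rows) * ROW_BYTES;
}

static void mem_ring_free(MemRing *r) {
    munmap(r->base, r->rows * ROW_BYTES);
}
```

Nothing here writes to disk explicitly; whether and when these pages leave RAM is left to the kernel's page replacement.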

Is there some deeper reason I'm missing that VTE keeps the ring on disk, or does my argument make sense?

(Amusingly, I was led to this code and bug report because of a poor interaction between Chrome and xfce4-terminal: Chrome [for reasons I haven't figured out] registers an inotify watch for writes to files in /tmp, so if something is writing a lot to a terminal, Chrome will chew up a good 30% of a CPU handling the inotify notifications.)

Comment 7 Behdad Esfahbod 2011-06-07 19:44:04 UTC

Feel free to improve it.  You will understand when you check the code :).

Short version is, we don't know how long the rings are, and for unlimited-scrollback, there is no ring.  The current design is the cleanest I could come up with.  You are welcome to write a mmap-stream for it.

Comment 8 Austin Clements 2011-06-07 20:46:51 UTC

I'm certainly willing to patch this, but first I have to understand why VTE is going to so much trouble for about a meg of memory.  Is my accounting simply off?  Are there use cases where it's significantly more memory?  Are there concerns about running tons of terminals on systems that are too small to run tons of terminals?

Comment 9 Behdad Esfahbod 2011-06-07 21:01:50 UTC

Previously we were mallocing each row's memory.  Imagine angry sysadmins with 100 tabs running for over a month and the kind of fragmentation that would bring.

You keep talking about "so much trouble".  In reality, the code there is the simplest way I could fix the fragmentation problem.  That's why it is written the way it is.  A simple buffer layer is all that is needed to fix this.  You can mmap() too, but then you have to manage the map size.

Comment 10 Austin Clements 2011-06-07 21:13:31 UTC

Thanks.  That justification makes sense.  I'll try to take a stab at this over the next few days.

Comment 11 Daniel Kahn Gillmor 2011-06-08 04:46:58 UTC

writing every terminal line to disk seems like it might raise some security concerns as well.

For example, a user who uses a console-based password-manager with an encrypted password store might be upset to find that the passwords the manager prints out got stored to disk just because they were displayed in the terminal.

For another example, OpenPGP has a way to indicate that encrypted messages should not be stored to disk in cleartext form:

 https://tools.ietf.org/html/rfc4880#page-47

If i used gpg to decrypt such a message and send it to a libvte-based terminal emulator, libvte would automatically violate the expectations of gpg.

Obviously, gpg can't actually guarantee that what gets dumped to the terminal never makes it to disk (due to swap, copy/paste, screenshots, etc), but the idea that it's automatically written to disk makes it impossible for such features to ever be meaningful.

Comment 12 Austin Clements 2011-06-08 06:09:35 UTC

I have a strawman working that relies on the kernel to evict scrollback data only when under memory pressure, without introducing memory fragmentation (much like your ring code, it never allocates or frees VteRowData except at initialization or if the user changes the scrollback length).  Currently, it keeps more VteRowDatas around than your implementation, so their internal buffers could in principle put more strain on the memory allocator, but that could be addressed with a little more effort if necessary.

However, my code never explicitly writes to disk, relying instead on the kernel to handle this.  Is there a driving reason your implementation writes to disk (hard-limiting memory usage for massive scrollbacks?) or was this more of an implementation detail?

Comment 13 Behdad Esfahbod 2011-06-08 13:31:16 UTC

Austin, got patch?   Let's move this upstream.  I don't like an in-memory-only solution.  I'm not comfortable doing unlimited scrollback with that.  But willing to evaluate.  However, a file-based mmap()ed model is I think what Konsole uses too, so there's some precedent here.  I mostly borrowed my model from Konsole in fact.

Dan, I understand your point, but IMO this is not too different from a swap partition.  When I was writing this code I considered XOR'ing with a per-terminal random key, but that's so prone to frequency analysis that I thought it would be giving a very fake sense of security.

Austin, now I remember why writing to disk is actually the right thing to do.  During my experiments I found that history logs like, say, compile logs gzip very very well.  Think somewhere between 5-to-1 to 50-to-1.  So my longterm plan was to add a caching and a gzip adaptor in the stream chain.  Since gzip chunks need to be decompressed into memory head-to-tail anyway, we can do a cipher-block-chaining at least.

Comment 14 Daniel Kahn Gillmor 2011-06-08 15:33:47 UTC

Behdad, this is *quite* different from a swap partition.  A swap partition is globally managed, and has system-wide resource allocation policies applied to it.  As a regular user of terminal emulators, i expect that the only thing to be written to disk is things i explicitly write to disk, whether that's through my shell's history logs, or through explicit commands.  I understand that swap means that anything in memory can be temporarily stored to disk, and i (or my admin) can take system-wide measures to ensure that swap is either not in use at all, or is cleanly encrypted.

Are you intending for this data to be accessible after the terminal closes?  If so, why?  If not, why is it being written to disk?

I don't understand your argument about why writing to disk is the right thing to do.  if we have to decompress the data into memory to access it anyway, what do we gain by having it on disk in the first place?  the full/uncompressed memory consumption is required in either case.

If the concern is to reduce memory pressure, you could always compress in RAM and free() the decompressed version, re-allocating and decompressing when you need it (though i'm not convinced that this would be an overall win).  I don't see why this argument justifies writing to disk.

Comment 15 Behdad Esfahbod 2011-06-08 15:42:01 UTC

We create+open a file in /tmp and immediately delete it.  So it's not left behind in the filesystem per se.  But left on disk plates.

We don't need to keep the entire uncompressed history log in memory at any time.  In fact, unless user scrolls up the history, we never need to read from the log at all.  That, combined with unlimited history, makes me uncomfortable keeping it in main memory.  YMMV, but then you're welcome to fix it.

Comment 16 Daniel Kahn Gillmor 2011-06-08 15:59:21 UTC

So your concern is about size?  what do you do if the filesystem you're writing to returns ENOSPC ?  In a real-world computer (not an idealized Turing machine), there is no such thing as "unlimited" :)

I've definitely seen systems where /tmp is more tightly constrained than RAM (e.g. my netbook has 2GiB of RAM, but /tmp is a logical volume of 512MiB), so you have to deal with limit conditions properly in either case.

So:

 * you expect the data to be volatile (no need to recover it after a crash)
 * users don't generally expect that traces of their terminal sessions will be left on their physical media
 * writes to disk are expensive
 * writes to disk are slow
 * the scrollback ring has to have some way to deal with limit conditions (either ENOMEM or ENOSPC) either way, meaning it can't be truly "unlimited"
 * a terminal that consumes the entire /tmp filesystem with an unlinked file denies access to /tmp to all other applications with no obvious indication of who is the culprit; a terminal that consumes all available RAM makes itself a nice fat target for the OOM-killer.

All of the above are either equivocal, or point to RAM, not disk as being the preferred option.

Sorry i don't have the bandwidth to make a patch myself.  If Austin's proposal above keeps the buffers in RAM, i hope i've provided you with convincing-enough arguments to accept them upstream :)

Thanks for working on libvte!

Comment 17 Behdad Esfahbod 2011-06-08 16:11:52 UTC

The current code already handles ENOSPC.  You get empty scrollback for the parts that could not be written to disk.  Doing so WITHOUT BRINGING MY MACHINE TO A GRIND.  Go do that with malloc()...
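
To illustrate the ENOSPC behaviour described above (not the actual VTE implementation), here is a minimal sketch in which a full filesystem causes the unwritable bytes to be dropped rather than treated as a fatal error. `LossyStream` and its functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

typedef struct {
    int fd;
    size_t stored;   /* bytes that reached the file */
    size_t dropped;  /* bytes discarded because the filesystem was full */
} LossyStream;

/* Open an existing file for writing and reset the counters. */
static int lossy_stream_open(LossyStream *s, const char *path) {
    assert(s != NULL && path != NULL);
    s->fd = open(path, O_WRONLY);
    s->stored = 0;
    s->dropped = 0;
    return s->fd < 0 ? -1 : 0;
}

/* Append data. On ENOSPC the remainder is dropped and 0 is returned,
 * so the caller keeps running with a gap in its history.
 * Any other error returns -1. */
static int lossy_stream_append(LossyStream *s, const char *data, size_t len) {
    assert(s != NULL && data != NULL);
    while (len > 0) {
        ssize_t n = write(s->fd, data, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENOSPC) {
                s->dropped += len;
                return 0;
            }
            return -1;
        }
        s->stored += (size_t)n;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}
```

On Linux, pointing this at `/dev/full` is a convenient way to exercise the ENOSPC path, since every write to that device fails with that error.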

Anyway, don't want to be a dick, but I don't have time to discuss the details right now, as I last worked on this in 2008 or 2009 IIRC, and don't have concrete plans to work on it in the near future.

Comment 18 Austin Clements 2011-06-08 19:56:29 UTC

Created attachment 189499 [details] [review]
0001 Remove explicit I/O from the ring, without introducing VteRowData fragmentation.

Comment 19 Austin Clements 2011-06-08 19:57:00 UTC

Created attachment 189500 [details] [review]
0002 Remove now-unused vtestream

Comment 20 Austin Clements 2011-06-08 19:57:27 UTC

Created attachment 189501 [details] [review]
0003 Allocate VteCell data using a compacting, garbage-collecting pool allocator.

Comment 21 Austin Clements 2011-06-08 20:03:09 UTC

(In reply to comment #13)
> Austin, got patch?   Lets move this upstream.  I don't like an in-memory-only
> solution.  I'm not comfortable doing unlimited scrollback with that.  But
> willing to evaluate.  However, a file-based mmap()ed model is I think what
> Konsole uses too, so there's some precedence here.  I mostly borrowed my model
> from Konsole in fact.

I've attached what I have so far.  At the moment it essentially eliminates memory fragmentation, though it's not very memory-efficient because of how it keeps VteRowData around.  I believe that's best solved by in-memory compression of historical VteRowData (much like your suggestion), which I believe should fit nicely into the VteCell pool allocator introduced by the third patch.

 b/src/Makefile.am    |    4 
 b/src/ring.c         |  558 +++++++++++++++------------------------------------
 b/src/ring.h         |   19 -
 b/src/vterowdata.c   |   65 +----
 b/src/vterowdata.h   |    1 
 src/vtestream-base.h |  106 ---------
 src/vtestream-file.h |  296 ---------------------------
 src/vtestream.c      |   33 ---
 src/vtestream.h      |   49 ----
 9 files changed, 199 insertions(+), 932 deletions(-)

Comment 22 Behdad Esfahbod 2011-06-08 20:15:19 UTC

I don't like:

  1. Removing the UTF-8 and attr runlength compressions.  That increases memory 8 fold.

  2. Removing the vtestream abstraction.

  3. A GC-based approach.

  4. Need to handle mmap() absence on the system.

Why can't you just implement a vtestream using mmap()?!

Comment 23 Austin Clements 2011-06-09 02:24:32 UTC

(In reply to comment #22)
> I don't like:
> 
>   1. Removing the UTF-8 and attr runlength compressions.  That increases memory
> 8 fold.

As I said, the patch was incomplete and its current memory usage is high for exactly this reason.  I suspect (though have not verified) that simply zlib compressing the VteCell data would be more effective than UTF-8 and runlength encoding, particularly if done in blocks, which would be easy to approach with the pool allocator.

>   2. Removing the vtestream abstraction.

I put this removal in a separate patch in case you wanted to keep the (now dead) code around.

>   3. A GC-based approach.

You gave me the impression that minimizing memory fragmentation was an important requirement for this.  A compacting GC is a trivial (14 SLOC) and effective (~10% maximum internal fragmentation, ~1ms per megabyte) way to accomplish this.

>   4. Need to handle mmap() absence on the system.

This is a fair concern if VTE has to support non-POSIX systems.  Unfortunately, glib's g_mapped_file doesn't expose mremap, which the current patch uses, and apparently doesn't support mapping by FD, which would make mremap hard to fake.

> Why can't you just implement a vtestream using mmap()?!

I considered doing that and discarded it because I prefer solutions that delete more code than they add.

But I've found an even simpler solution: I've switched to urxvt.  I hope you haven't dissuaded future contributors from addressing this bug.

Comment 24 Behdad Esfahbod 2011-06-09 04:27:23 UTC

(In reply to comment #23)
> (In reply to comment #22)
> > I don't like:
> > 
> >   1. Removing the UTF-8 and attr runlength compressions.  That increases memory
> > 8 fold.
> 
> As I said, the patch was incomplete and its current memory usage is high for
> exactly this reason.  I suspect (though have not verified) that simply zlib
> compressing the VteCell data would be more effective than UTF-8 and runlength
> encoding, particularly if done in blocks, which would be easy to approach with
> the pool allocator.

The current design is supportive of an important feature I want to add in the future: rewrapping lines when terminal width is changed.  That's why I want to keep text and attr data in their own uninterrupted stream.


> >   2. Removing the vtestream abstraction.
> 
> I put this removal in a separate patch in case you wanted to keep the (now
> dead) code around.

I like to keep the abstraction.  It decouples the ring workings from the memory bookkeeping.


One other note: we delete the file in /tmp immediately.  If the kernel is sending inotify messages about changes to these files to anyone monitoring /tmp, that's a kernel bug.

Comment 25 Christian Persch 2011-08-16 17:47:23 UTC

Comment on attachment 189499 [details] [review]
0001 Remove explicit I/O from the ring, without introducing VteRowData fragmentation.

Setting patch statuses accordingly.

Comment 26 mhtrinh 2012-02-01 01:49:03 UTC

Hi,

I'm using Xfce Terminal, which uses the vte lib. The problem is that jbd2, the ext4 journalling system, keeps logging changes to the deleted file, which prevents the disk from spinning down.
The situation is roughly:
- A process P is dumping into a file F
- Another process Q deletes F
- P is still dumping into F, and jbd2 still logs the changes even though the file is deleted

Is there a way to prevent this situation ?

Comment 27 Behdad Esfahbod 2012-02-01 01:52:44 UTC

(In reply to comment #26)
> Hi,
> 
> I'm using Xfce Terminal that use vte lib. The problem is that ext4 journalling
> system jbd2 keep logging change of deleted file, thus prevent any disk spin
> down. 
> It's kind of :
> - A process P is dumping in a file F
> - Another process Q delete F
> - A still dumping in F and the jbd2 still log change even if the file is
> deleted
> 
> Is there a way to prevent this situation ?

Some buffering would help, but wouldn't completely eliminate the issue...
Is there perhaps any ioctl() we can do to disable the logging?

Comment 28 mhtrinh 2012-02-01 10:29:53 UTC

What do you think about this workaround :
Instead of creating the temp file in /tmp by default, first try to create it in /dev/shm. If that fails, fall back to /tmp.
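
The fallback proposed here could be sketched roughly as follows. This is an illustration only, not the attached patch: `open_scratch_file` is a hypothetical helper that tries each directory in order and returns an already-unlinked file descriptor from the first one that works.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Try each directory in order; return an fd for an unlinked temp file
 * from the first one that works, or -1 if none does.
 * If `chosen` is non-NULL it receives the directory that was used. */
static int open_scratch_file(const char *const *dirs, size_t ndirs,
                             const char **chosen) {
    assert(dirs != NULL);
    for (size_t i = 0; i < ndirs; i++) {
        char path[4096];
        int len = snprintf(path, sizeof path, "%s/vteXXXXXX", dirs[i]);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;               /* path too long, try the next one */
        int fd = mkstemp(path);
        if (fd != -1) {
            unlink(path);           /* same create-then-unlink pattern */
            if (chosen != NULL)
                *chosen = dirs[i];
            return fd;
        }
    }
    return -1;
}
```

A caller wanting the behaviour from this comment would pass `{ "/dev/shm", "/tmp" }` as the directory list.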

Comment 29 Behdad Esfahbod 2012-02-01 16:27:46 UTC

Well, that's equivalent to just keeping everything in memory instead, right?  One of the design decisions resulting in the current implementations was to allow unlimited scrollback without worrying about swapping the machine to death.  I don't want unlimited memory growth.  Or is /dev/shm size-limited I guess?

Comment 30 mhtrinh 2012-02-01 17:10:59 UTC

Created attachment 206579 [details] [review]
Keep scrollback history in RAM when /dev/shm is available

(In reply to comment #29)
> Well, that's equivalent to just keeping everything in memory instead, right?

Agree 

> One of the design decisions resulting in the current implementations was to
> allow unlimited scrollback without worrying about swapping the machine to
> death.  I don't want unlimited memory growth.

I understand.
  
> Or is /dev/shm size-limited I guess?

It seems to be limited. On my Opensuse 12.1, I got :
rootfs           28G   22G  4.9G  82% /
devtmpfs        873M   36K  873M   1% /dev
tmpfs           878M  160M  719M  19% /dev/shm
tmpfs           878M  400K  878M   1% /run
/dev/sda6        28G   22G  4.9G  82% /
/dev/sda2        91G   85G  6.1G  94% /windows/D
tmpfs           878M  400K  878M   1% /var/lock
tmpfs           878M  400K  878M   1% /var/run
tmpfs           878M     0  878M   0% /media

Anyway, until a better solution is found, for those who don't want to buffer on the HDD, I made a hack that creates the temp file in /dev/shm/vteXXXXXX. It simply falls back to the original solution (/tmp/vteXXXXXX) if that fails. I'm not a terminal guru with months of history and hundreds of open terminals, so I don't think I'll get to the point of swapping to disk. Plus, now my terminal doesn't trigger jbd2 anymore each time I hit Enter :-)

Regards.

Comment 31 Behdad Esfahbod 2012-02-01 17:16:08 UTC

The patch can clearly be simplified.

We may as well take this approach.  Puts an end to a long known issue.  Creates new ones perhaps, but those would take a couple years to become well-known :))).

Comment 32 Christian Persch 2012-02-01 18:05:03 UTC

I don't think we should do this. tmpfs are for transient small-ish stuff, not these huge files. And afaik tmpfs is limited to mem size (+ swap space), and that's preciously small compared to disk space.

Shouldn't someone talk to the kernel devs first to see if there's anything the kernel can do to help, both with the inotify problem (comment 6) and this journaling thing?

Comment 33 Behdad Esfahbod 2012-02-01 18:12:18 UTC

(In reply to comment #32)

> Shouldn't someone talk to the kernel devs first to see if there's anything the
> kernel can do to help, both with the inotify problem (comment 6) and this
> journaling thing?

Right.  I'll shoot an email to a couple people, but not holding my breath.  If you know who will be a good contact, feel free to contact them.

Comment 34 Jon Dowland 2012-03-09 16:18:45 UTC

/dev/shm is for POSIX shared memory. Please don't abuse it for something else.

/tmp has been tmpfs on Solaris for a long time; it appears to be the default now on new-ish Debian installs and I suspect the same for Fedora. So this has kind-of devolved to re-implementing swap on those platforms.

Comment 35 Jon Dowland 2012-03-09 16:19:19 UTC

> and I suspect the same for Fedora.

s/Fedora/Ubuntu/ sorry, long day.

Comment 36 Behdad Esfahbod 2012-03-17 19:30:11 UTC

John McCutchan suggests that we create tmp files in /tmp/vte/ instead of in /tmp/, such that non-recursive inotify watches on /tmp don't get notified.

Robert Love says: "Also, can you use mmap instead of file I/O to the unlinked /tmp file? That ought to be faster and won't trigger inotify events."
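
A minimal sketch of the mmap suggestion, assuming a fixed-size mapping for simplicity: the temp file is created and unlinked as before, but data is then written through a shared mapping instead of write() calls. `MappedScratch` and its functions are hypothetical, and growing the mapping (the "manage the map size" problem mentioned in comment 9) is left out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    int fd;        /* unlinked backing file */
    char *data;    /* shared mapping of that file */
    size_t size;   /* fixed mapping size in bytes */
} MappedScratch;

/* `path_template` must be a writable string ending in "XXXXXX". */
static int mapped_scratch_open(MappedScratch *m, char *path_template,
                               size_t size) {
    assert(m != NULL && path_template != NULL && size > 0);
    int fd = mkstemp(path_template);
    if (fd == -1)
        return -1;
    unlink(path_template);
    /* Size the file first so the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    m->fd = fd;
    m->data = p;
    m->size = size;
    return 0;
}

static void mapped_scratch_close(MappedScratch *m) {
    munmap(m->data, m->size);
    close(m->fd);
}
```

Stores into `m->data` modify the file's pages directly; when they are written back to disk is decided by the kernel.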

Comment 37 Christian Persch 2012-03-27 13:21:58 UTC

Comment on attachment 206579 [details] [review]
Keep scrollback history in RAM when /dev/shm is available

Rejecting this based on comment 32 and comment 34.

Comment 38 Michal Schmidt 2012-03-27 15:52:12 UTC

(In reply to comments #34, #35)
>> /tmp has been tmpfs on Solaris for a long time; it appears to be the default
>> now on new-ish Debian installs and I suspect the same for Fedora.
> 
> s/Fedora/Ubuntu/ sorry, long day.

Fedora will likely switch to /tmp on tmpfs by default in F18:
http://fedoraproject.org/wiki/Features/tmp-on-tmpfs

systemd is pushing distros softly towards it:
http://cgit.freedesktop.org/systemd/systemd/commit/?id=623ac9d2fce3170125ead9be20f56bfe68ea125e

Comment 39 Behdad Esfahbod 2012-03-27 15:55:44 UTC

So, shall we at least switch to /var/tmp then?  /var/tmp/vte in fact?

Comment 40 Christian Persch 2012-03-27 17:15:04 UTC

Probably yes. Discussing this on IRC, the question was whether to hardcode /var/tmp or add some API for it (that takes into account an env var); that's now filed as bug 672939. We can either wait for that, or just use /var/tmp for now...

Comment 41 Marti 2012-04-12 14:26:01 UTC

(In reply to comment #26)
> I'm using Xfce Terminal that use vte lib. The problem is that ext4 journalling
> system jbd2 keep logging change of deleted file

I think this isn't because of journaling per se, just the fact that dirty files are periodically written out to disk.

I agree that it's silly of Linux to try and flush the file to disk, if the file was deleted (won't show up after unmount anyway), and if there's no memory pressure.

Comment 42 Jon Dowland 2012-04-30 14:16:18 UTC

I still think you're optimising for the wrong use-case, but I respect your decision to do what you want with your software and won't keep banging on the same drum.

Having said that, would you be prepared to accept a patch that made the behaviour configurable? Precisely how is not something I have resolved for myself (not sure if embedding dconf switches into a library component is OK or not, perhaps an API-level switch and let upstream consumers decide).

I'm planning to write the patch anyway, as it will be good practice for me (I'm wanting to get back into GNOME development), but it would be a great motivation to know whether it would be appreciated or not :-)

Comment 43 Christian Persch 2012-04-30 14:24:12 UTC

Bug 664611 comment 4 lays down the roadmap here.

Also, this would be purely an API that gnome-terminal would then make use of through a pref or UI, and not by adding the pref itself to vte.

Comment 44 Behdad Esfahbod 2012-04-30 16:47:38 UTC

Jon.  Maybe you can expand on what you want to implement.  In general we are conservative re adding configuration.  But I don't want to rule that out.  If we do add it (and may be good actually), then I think it should be more like "never ever hit the plates" kind of config, ie. use mlock()ed memory only.  Anything in between is stupid and pseudo-security IMO.
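
For reference, a minimal sketch of what "mlock()ed memory only" could look like at the allocation level. `locked_alloc` and `locked_free` are hypothetical helpers, and the sketch assumes the process's RLIMIT_MEMLOCK allows locking the requested size; when it does not, allocation simply fails.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Allocate zero-filled memory that the kernel will not page out to swap.
 * Returns NULL if the mapping or the lock fails. */
static void *locked_alloc(size_t len) {
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Wipe the contents, then unlock and release the region. */
static void locked_free(void *p, size_t len) {
    assert(p != NULL && len > 0);
    memset(p, 0, len);
    munlock(p, len);
    munmap(p, len);
}
```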

Comment 45 Behdad Esfahbod 2014-03-06 22:35:19 UTC

We use stdio caching now, so this has improved a bit.

Comment 46 Allison Karlitskaya (desrt) 2015-01-18 16:49:21 UTC

A couple of updates/notes/ideas:

- since 2.6.36 inotify has IN_EXCL_UNLINK which everyone should be using.  It
  prevents notifications about changes to unlinked files, so /tmp should be
  fine.

  GLib will be using this flag soon.

- even better, though: since 3.11, there is O_TMPFILE which creates a file
  within a particular filesystem that has never been linked anywhere at all.
  This is clearly absolutely appropriate to this usecase.

- since 3.16, memfd also exists, which might be a nice way to do read() and
  write() access to anonymous memory while letting the kernel decide what it
  wants to do about swapping that or not
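
As a rough illustration of the two newer interfaces listed above, assuming Linux with a glibc recent enough to declare `memfd_create` (the function names `open_unnamed_file` and `open_memory_file`, and the memfd name "scrollback", are hypothetical):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a file inside the filesystem that holds `dir` without ever
 * giving it a name. Returns -1 if the kernel or filesystem lacks
 * O_TMPFILE support, in which case a caller could fall back to
 * the create-then-unlink approach. */
static int open_unnamed_file(const char *dir) {
    assert(dir != NULL);
#ifdef O_TMPFILE
    return open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);
#else
    return -1;
#endif
}

/* Create a memory-backed file usable with ordinary read()/write(). */
static int open_memory_file(void) {
    return memfd_create("scrollback", MFD_CLOEXEC);
}
```

Both calls return a plain file descriptor, so code written against read()/write()/pread() does not need to know which backing was chosen.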

Comment 47 Egmont Koblinger 2015-01-18 17:02:32 UTC

> - even better, though: since 3.11, there is O_TMPFILE which creates a file

VTE already uses this, if available.

> - since 3.16, memfd also exists, which might be a nice way to do read() and

We've just added compression of scrollback data in bug 738601, which heavily relies on using sparse blocks.  Does memfd support that?

Comment 48 Allison Karlitskaya (desrt) 2015-01-18 17:29:54 UTC

a memfd is really just a shmem file, and shmem does lazy allocation and has support for hole punching and seeking (if that's what you mean).
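
A small sketch of hole punching on a memfd, assuming Linux and glibc with `_GNU_SOURCE`: `scratch_release` is a hypothetical wrapper around `fallocate` that hands a byte range back to the kernel while keeping the file size unchanged, after which the range reads back as zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a memory-backed scratch file; returns -1 on failure.
 * The name "vte-scratch" is only a debugging label. */
static int scratch_create(void) {
    return memfd_create("vte-scratch", MFD_CLOEXEC);
}

/* Deallocate [offset, offset + length) without changing the file size. */
static int scratch_release(int fd, off_t offset, off_t length) {
    assert(fd >= 0 && offset >= 0 && length > 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, length);
}
```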

Comment 49 Egmont Koblinger 2015-01-18 17:32:29 UTC

Sounds cool, although seems to have the same drawbacks as shm and tmpfs, as discussed above.

Comment 50 Egmont Koblinger 2015-03-02 13:09:55 UTC

Someone pointed out in https://bugzilla.xfce.org/show_bug.cgi?id=8183:

While setting TMPDIR to a shm or tmpfs based location is a nice workaround for those who definitely want their scrollback in memory, it is cumbersome: it is inherited by all child processes launched inside the terminal which is probably not what they want. Moreover, it's not feasible to set this in some global environment definition file.

For these people it would be convenient to support VTETMPDIR - if defined, it would take precedence over the standard tmp dir locations.
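
The proposed precedence could be sketched like this. VTETMPDIR is the variable name suggested above; `scrollback_tmp_dir` is a hypothetical helper, and checking TMPDIR before the fallback is an assumption about what "the standard tmp dir locations" means here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pick the directory for scrollback files: VTETMPDIR wins if set and
 * non-empty, then TMPDIR, then the caller-supplied fallback. */
static const char *scrollback_tmp_dir(const char *fallback) {
    static const char *const vars[] = { "VTETMPDIR", "TMPDIR" };
    assert(fallback != NULL);
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return fallback;
}
```

Because only the terminal library reads VTETMPDIR, child processes that inherit it keep using their usual TMPDIR.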

Comment 51 Christian Persch 2015-03-02 16:57:04 UTC

Since it's rather complicated to get an env variable into the dbus launch environment, maybe this should be API (on vte side) with pref (on g-t side) instead?

Comment 52 Egmont Koblinger 2015-03-02 17:09:02 UTC

This would be a hack for expert users, preferably setting the env var for the complete desktop environment; I don't mind if it requires a logout.

If we're making an API and a g-t checkbox then I'd probably like to go further and offer a "store in file / store in memory" option with a memory-backed stream implementation (rather than piggybacking tmpfs).

But I think I can easily be convinced here to go with your proposal :)

Comment 53 Egmont Koblinger 2015-03-12 11:18:47 UTC

An interesting issue was raised here: https://bugs.launchpad.net/ubuntu/+source/vte/+bug/1430620, namely the reporter says that writing to the scrollback degrades SSD lifetime. I don't think it's a valid issue (explained there) but it's definitely something we should keep in mind, it's worth a link from here.

Comment 54 Bryce Nesbitt 2015-03-12 20:57:34 UTC

Use the relatively new "/sbin/fatrace" tool to see the vte activity firsthand (I found this when trying to figure out why my SSD is nearing its wear limit after just 5 years).
--
The current behavior is akin to swapping all the time.  I'd rather swap only when needed.
--
Perhaps the /tmp writes could only be for 'unlimited' scrollback?  And have default in memory scrollback be 2% of main memory?

Comment 55 Egmont Koblinger 2015-03-12 22:23:10 UTC

The current way of storing scrollback (UTF-8 stream + runlength attribute encoding, compressed and encrypted) imposes some requirements towards the storage interface:

- The required storage amount isn't known in advance. We could probably compute a theoretical maximum, but that'd be way bigger than the typical usage. So we'd need a system where we can dynamically allocate memory on demand, and a structure that doesn't cause huge fragmentation in the long run.

- Compression heavily relies on sparse blocks, and I'm pretty sure that we want to have compression for memory-backed scrollback too, not just for file-backed. That is, we need a memory model that can support sparse blocks and punching holes.

This is all, of course, assuming that we don't want to reimplement everything from scratch using a completely different model than the one we have right now.

So far memfd seems to be the best approach to me; in fact, it should be quite easy to port to that. I'll give it a try once I upgrade to Vivid :)

Some questions to think about:

- Are we okay with a Linux-only solution for memory-backed scrollback?

- To what extent do we want to expose the disk vs. memory choice via API or on the UI? For the UI, the most I can imagine is to choose from "finite in memory", "finite on disk" and "infinite on disk" - I probably wouldn't offer "infinite in memory". Given Gnome's approach of simplifying the options, maybe not all these will make it to the UI. Do we want to allow all possibilities via the API?

- Currently the user can change from finite to infinite scrollback for the running terminal. Now I guess we wouldn't support runtime change from memory to disk or vice versa, so the user wouldn't be able to switch a running vte from "memory-backed finite" to "infinite". Is this something we'd be happy with?

Comment 56 Behdad Esfahbod 2015-03-13 21:58:45 UTC

(In reply to Egmont Koblinger from comment #55)
> The current way of storing scrollback (UTF-8 stream + runlength attribute
> encoding, compressed and encrypted) imposes some requirements towards the
> storage interface:
> 
> - The required storage amount isn't known in advance. We could probably
> compute a theoretical maximum, but that'd be way bigger than the typical
> usage. So we'd need a system where we can dynamically allocate memory on
> demand, and a structure that doesn't cause huge fragmentation in the long
> run.
> 
> - Compression heavily relies on sparse blocks, and I'm pretty sure that we
> want to have compression for memory-backed scrollback too, not just for
> file-backed. That is, we need a memory model that can support sparse blocks
> and punching holes.
> 
> This is all, of course, assuming that we don't want to reimplement
> everything from scratch using a completely different model than the one we
> have right now.
> 
> So far memfd seems to be the best approach to me; in fact, it should be
> quite easy to port to that. I'll give it a try once I upgrade to Vivid :)
> 
> Some questions to think about:
> 
> - Are we okay with a Linux-only solution for memory-backed scrollback?

Sounds fine to me.


> - To what extent do we want to expose the disk vs. memory choice via API or
> on the UI? For the UI, the most I can imagine is to choose from "finite in
> memory", "finite on disk" and "infinite on disk" - I probably wouldn't offer
> "infinite in memory". Given Gnome's approach of simplifying the options,
> maybe not all these will make it to the UI. Do we want to allow all
> possibilities via the API?

I'm fine with a "secure" finite mode that guarantees contents never hit disk, or the current "normal" mode.


> - Currently the user can change from finite to infinite scrollback for the
> running terminal. Now I guess we wouldn't support runtime change from memory
> to disk or vice versa, so the user wouldn't be able to switch a running vte
> from "memory-backed finite" to "infinite". Is this something we'd be happy
> with?

I think so.

Comment 57 Egmont Koblinger 2015-03-17 17:12:06 UTC

Created attachment 299617
memfd proof of concept, v0

Just a quick proof of concept for recent enough systems (yup, I've upgraded to Vivid beta): memfd works!

1% of the work, 99% of the interesting work is done.  99% of the work, 1% of the interesting work (configure autodetect, API to choose, UI for g-t) is yet to be done :)

Comment 58 Behdad Esfahbod 2015-03-18 00:41:36 UTC

haha

Comment 59 Bryce Nesbitt 2015-03-18 01:01:49 UTC

Thanks for the patch Egmont!  Anon memory is a great place for scrollback.

Comment 60 Behdad Esfahbod 2015-03-18 17:44:30 UTC

Do we want to disable encryption in that case?

Comment 61 Egmont Koblinger 2015-03-18 19:39:47 UTC

I tend towards a 'yes' but I'm uncertain.

The main concern from many people was that writing sensitive data unencrypted to disk is unexpected behavior: you have certain expectations for every app, and vte broke them. For in-memory content no-one expects encryption, and if swapping is a concern it should be addressed by encrypting at that level.

I believe distros could make the swap partition encrypted in a totally transparent way (generate a random key on each startup). For the fs partitions this definitely needs interaction with the user (password that you inconveniently have to type at boot, re-encrypt later if pw changes...) so it's less likely to happen.

On the other hand, if we already have the encryption code in place then why not...

Opinions welcome :)

Comment 62 Behdad Esfahbod 2015-05-12 19:56:40 UTC

Let's land this.

Should we automatically use memfd if max scrollback lines is set to a small number (say, max 10,000)?

Comment 63 Christian Persch 2015-05-12 20:00:03 UTC

IMHO, no. While it's small when you have *one* terminal, it adds up when you have many terminals (tabs/windows) open. So I'd prefer API for this plus pref in g-t (we can still discuss the default for that later).

Comment 64 Egmont Koblinger 2015-05-12 20:48:05 UTC

For 10,000 lines it's not that terribly bad. Given a typical use of 80 chars per line, not too many attr changes, and compression on the scrollback, it might be perhaps ~30 bytes per line, that is, ~300kB per terminal on average.  We already use that much for the read cache and write buffer of the streams (2x3x64 = 384kB).
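
The arithmetic above can be spelled out as a small sketch. The ~30 bytes per line figure is the rough guess from this comment, not a measurement, and the function names are hypothetical. The three factors of the buffer product are copied from "2x3x64" without assuming what each one stands for.

```c
#include <stddef.h>

/* Rough scrollback size: line count times an assumed average number of
 * compressed bytes per line (the comment guesses about 30). */
static size_t scrollback_estimate_bytes(size_t lines, size_t avg_bytes_per_line)
{
    return lines * avg_bytes_per_line;
}

/* The stream buffer product quoted in the comment, in kB. */
static size_t stream_buffers_kb(size_t factor_a, size_t factor_b, size_t buffer_kb)
{
    return factor_a * factor_b * buffer_kb;
}
```

With these, scrollback_estimate_bytes(10000, 30) gives 300000 bytes (about 300kB) and stream_buffers_kb(2, 3, 64) gives 384, matching the figures quoted.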

I definitely agree though that vte shouldn't do magic, it should have a clear and strict API.  I'm not against magic in g-t, but maybe we should be explicit there on the UI too.

Comment 65 GNOME Infrastructure Team 2021-06-10 14:27:16 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/1840.