Bug 764786 – [Security] Run extractor and miners in chroot jail or sandbox or with limited capabilities

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 764786 - [Security] Run extractor and miners in chroot jail or sandbox or with limited capabilities


Summary:	[Security] Run extractor and miners in chroot jail or sandbox or with limited...


Status:	RESOLVED FIXED

Product:	tracker
Classification:	Core
Component:	Extractor
Version:	1.8.x
Hardware:	Other All

Importance:	Normal major
Target Milestone:	---
Assigned To:	tracker-extractor
QA Contact:	tracker-extractor

URL:
Whiteboard:

Duplicates:	774498 (view as bug list)
Depends on:
Blocks:

Reported:	2016-04-08 14:54 UTC by Christian Stadelmann
Modified:	2017-07-22 06:06 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
libtracker-extract: Ditch ThreadAwareness module configuration toggle (12.17 KB, patch) 2016-12-06 16:20 UTC, Carlos Garnacho	committed	Details \| Review
libtracker-common: Implement sandboxing through libseccomp (8.76 KB, patch) 2016-12-06 16:20 UTC, Carlos Garnacho	committed	Details \| Review
tracker-extract: Sandbox extractor threads through seccomp (972 bytes, patch) 2016-12-06 16:20 UTC, Carlos Garnacho	committed	Details \| Review

Description Christian Stadelmann 2016-04-08 14:54:59 UTC

Tracker is operating on unknown data, some of it downloaded from the internet. This data can be modified to be malicious. With tracker scanning (reading, parsing) many files, any file just laying on the disk could be used to crack tracker.

For web browsers, the trend is to do the parsing in unprivileged processes. I think tracker should do something similiar.

Some possible tools to do that:
seccomp(2) : filter a list of allowed syscalls. Miner subprocesses don't need network acces or similiar syscalls. They just need to read files and put data to the database.
namespaces(7) : create a virtual filesystem on top of real filesystem. User data could be mounted read-only with this.
capabilities(7), …

This issue also affects miners provided by other packages, including but not limited to:
gnome-online-miners
tracker-miner-chatlog
tracker-miner-media
tracker-upnp

I know of tracker-sandbox.py, but it looks like a script for developers testing tracker's features, not like a end user tool. It isn't shipped with Fedora.

Comment 1 Carlos Garnacho 2016-05-05 13:30:16 UTC

Within the Tracker repo, only tracker-extract is potentially affected here, moving to that subcomponent.

(In reply to Christian Stadelmann from comment #0)
> Tracker is operating on unknown data, some of it downloaded from the
> internet. This data can be modified to be malicious. With tracker scanning
> (reading, parsing) many files, any file just laying on the disk could be
> used to crack tracker.
> 
> For web browsers, the trend is to do the parsing in unprivileged processes.
> I think tracker should do something similiar.

I think it is feasible to drop certain privileges there so malicious files that attempt to trigger bugs in 3rd party libraries can't do much with arbitrary code execution. I trusted at least the filesystem's would be done through xdg-app, although I guess this is tangential depending on the setup.

I'll leave this though at an "accept patches" stage though, I think devising a model for xdg-app is the priority here.

> 
> Some possible tools to do that:
> seccomp(2) : filter a list of allowed syscalls. Miner subprocesses don't
> need network acces or similiar syscalls. They just need to read files and
> put data to the database.
> namespaces(7) : create a virtual filesystem on top of real filesystem. User
> data could be mounted read-only with this.
> capabilities(7), …
> 
> This issue also affects miners provided by other packages, including but not
> limited to:

I agree some level of paranoia is good, although I don't really think all those deal with the same levels of "untrusted data" :). However, that'd be better dealt with by filing bugs in the relevant places, I don't think this could be done generically in Tracker even if we wanted and tried hard.

> gnome-online-miners

Those only talk to very specific services set up through g-o-a, I'd trust something happens if connection to a counterfeit site is attempted (eg. SSL certificate errors). You might consider those sites exploitable and thus prone to provide malicious data at some point, but that's IMO a quite slim attack vector.

> tracker-miner-chatlog

This one only interacts with telepathy through dbus. It could make it really sure it only talks with org.freedesktop.Telepathy and org.freedesktop.Tracker1 interfaces, but I really see no way to have arbitrary code execution happening there...

> tracker-miner-media

This one is dead.

> tracker-upnp

Not much idea about this one myself... it probably could/should whitelist networks where it is ok to lookup upnp servers. Better dealt with filing a bug in the relevant place, although from looking at the git log, good luck getting a reply...

> 
> I know of tracker-sandbox.py, but it looks like a script for developers
> testing tracker's features, not like a end user tool. It isn't shipped with
> Fedora.

Right, it's basically a sandbox for testing purposes, so the whole testing setup runs without affecting the user databases.

Comment 2 Christian Stadelmann 2016-11-17 21:22:55 UTC

(In reply to Elad Alfassa from comment #0 in https://bugzilla.gnome.org/show_bug.cgi?id=774498)
> Inspired by this tracker-related 0day,
> https://scarybeastsecurity.blogspot.co.il/2016/11/0day-poc-risky-design-
> decisions-in.html
> 
> Can we sandbox tracker-extract using bubblewrap[1]? tracker-extract involves
> parsing a lot of different file types, and since tracker is commonly
> configured to index quite a lot by default, it's an "obvious" candidate for
> sandboxing to reduce attack surface.
> 
> 
> [1] https://github.com/projectatomic/bubblewrap

+1. No need to design tracker in a way that it allows 0-day exploits. Parsing random untrusted data with insecure parsers (I'd count every parser written in C here) is irresponsible.

Comment 3 Elad Alfassa 2016-11-17 21:42:47 UTC

*** Bug 774498 has been marked as a duplicate of this bug. ***

Comment 4 Elad Alfassa 2016-11-17 21:55:52 UTC

bubblewrap is used by xdg-app and I think it makes the most sense to just use it instead of "re-inventing the wheel", and it's very simple to use. Since we're talking about the extractor, using the --file argument is probably a bad idea (since it preforms a copy), so I guess the correct way to implement this would be to use "--bind-ro" to give the extractor a read-only view of the directory where the file to extract is present.

According to the manpage, tracker-extract already uses subprocesses to preform the actual extraction, so it's "just" a matter of changing the way the subprocess is launched to use bubblewrap as a sandbox. deciding which flags to pass to bubblewrap seems trivial enough (I'd guess all the ones that start with --unshare, for a start), the only "complicated" bit in there is probably deciding which seccomp options to use (ie. which syscalls should be allowed to be run by the extractor?), but I guess this can be done "later" since being isolated from the filesystem and the network is already more secure than what we have now.

I would've tried to write a patch for this myself, but I'm not familiar enough with tracker's codebase (and a bit short on free time for such projects these days) so I guess this should be left for people who have more time & are more familiar with the codebase.

(unless someone can teach me exactly how the extractor works and point me at the relevant place in the code that launches the subprocess)

Comment 5 Carlos Garnacho 2016-12-06 16:20:17 UTC

Created attachment 341490 [details] [review]
libtracker-extract: Ditch ThreadAwareness module configuration toggle

It's basically unused, just use a private thread per-module so they're
easier to isolate.

Comment 6 Carlos Garnacho 2016-12-06 16:20:23 UTC

Created attachment 341491 [details] [review]
libtracker-common: Implement sandboxing through libseccomp

The threads calling the new tracker_seccomp_init() function, and all
threads/processes spawned from these, will enter a restricted mode
where only a few sensible syscalls are allowed, and more specifically,
filesystem access is restricted.

Comment 7 Carlos Garnacho 2016-12-06 16:20:28 UTC

Created attachment 341492 [details] [review]
tracker-extract: Sandbox extractor threads through seccomp

Those deal with plugins and potentially malicious content, make it
sure that any potential exploit is deprived of all tools that could
make it harmful.

Comment 8 Elad Alfassa 2016-12-06 16:28:04 UTC

I see you've allowed the sandboxed stuff to use write(). This means if someone manages to exploit the extractor and gain arbitrary code execution, they can write to ~/.bashrc or similar to break out of the sandbox by the next time the user logs in. This is why I think using bubblewrap or anything similar that will allow further isolation of filesystem access (via readonly bind mounts, user namespaces, etc) might be a better approach.

Comment 9 Carlos Garnacho 2016-12-06 20:06:22 UTC

(In reply to Elad Alfassa from comment #8)
> I see you've allowed the sandboxed stuff to use write(). This means if
> someone manages to exploit the extractor and gain arbitrary code execution,
> they can write to ~/.bashrc or similar to break out of the sandbox by the
> next time the user logs in.

No, you can't. The seccomp rules also include making calls to open() fail with EACCESS if anything else than O_READONLY is set, this leaves the malicious code pretty much unable to create or modify anything in the filesystem.

But there's nothing like actually trying! doing:

diff --git a/src/tracker-extract/tracker-extract-text.c b/src/tracker-extract/tracker-extract-text.c
index abcf403..b6d36ea 100644
--- a/src/tracker-extract/tracker-extract-text.c
+++ b/src/tracker-extract/tracker-extract-text.c
@@ -89,6 +89,16 @@ tracker_extract_get_metadata (TrackerExtractInfo *info)
 
        config = tracker_main_get_config ();
 
+       {
+               gint fd;
+               if ((fd = open ("/home/carlos/.bashrc", O_RDWR | O_APPEND, 0755)) < 0) {
+                       g_print ("booh: %m\n");
+               } else {
+                       write (fd, "imaevil", 7);
+                       close (fd);
+               }
+       }
+
        content = get_file_content (tracker_extract_info_get_file (info),
                                    tracker_config_get_max_bytes (config));

Fills my output with:
booh: Permission denied

Booh indeed...

And no, the non-sandboxed pieces don't open ~/.bashrc as RW for malicious code to exploit, the threads just inherit stdout/stderr themselves. I guess malicious code *might* reach for the dbus fd, but that will happen no matter how many processes you put in between, there will be always a socket to the outside. And anyways... that requires exploiting both the codec/library and tracker-extract, just to reach out to a socket that will shut you down if you behave funny, there's more direct ways to exploit your system.

As much as it would seem write() is the root of all evil, it is needed for pretty legitimate stuff, like printing on stderr/stdout or basic IPC stuff (libraries apparently like to communicate between their threads). I can't disable access to write() without triggering *lots* of false alarms, nor does bubblewrap.

> This is why I think using bubblewrap or anything
> similar that will allow further isolation of filesystem access (via readonly
> bind mounts, user namespaces, etc) might be a better approach.

Bubblewrap goes at an entirely different level. If anything, the extraction/miner process should be confined together with the sandboxed app that is interested in the extracted data (and thus the same permissions/restrictions apply altogether). I however consider this part of an "improve flatpack integration" RFE to Tracker interfaces and APIs (also meant to be addressed on 1.12), not an intrinsic security concern.

Comment 10 Carlos Garnacho 2016-12-08 13:55:28 UTC

Pushed with minor modifications that allowed me not to
ditch gstreamer-based extraction entirely. Most namely sockets are
allowed, but only with the AF_LOCAL/UNIX families. The chance to
exploit this is present, but rather low, the exploit would need
breaking through the tracker-extract sandbox by exploiting another
local service.

The GStreamer developers are already aware of the extra syscalls
involved in gstreamer extraction, and it seems definitely
unintended given GstDiscoverer should no decoding. So the situation
can be made better, and the seccomp rules tightened.

But as this patch is intended to be backported, and this is the status
quo, this is the compromise taken.

Attachment 341490 [details] pushed as 697daeb - libtracker-extract: Ditch ThreadAwareness module configuration toggle
Attachment 341491 [details] pushed as 9a924b1 - libtracker-common: Implement sandboxing through libseccomp
Attachment 341492 [details] pushed as 5b4132c - tracker-extract: Sandbox extractor threads through seccomp

Comment 11 Christian Stadelmann 2016-12-08 16:07:01 UTC

Thank you!

Comment 12 Michael Biebl 2016-12-09 00:28:29 UTC

Just curious: Since we already ship systemd --user service files, why not use the sandboxing features provided by systemd, like https://www.freedesktop.org/software/systemd/man/systemd.exec.html#SystemCallFilter=

Comment 13 Ting-Wei Lan 2016-12-09 09:47:46 UTC

Tracker fails to configure on FreeBSD now:

checking for LIBSECCOMP... no
configure: error: Libseccomp is mandatory for sandboxed metadata extraction

Comment 14 Carlos Garnacho 2016-12-09 10:02:24 UTC

(In reply to Michael Biebl from comment #12)
> Just curious: Since we already ship systemd --user service files, why not
> use the sandboxing features provided by systemd, like
> https://www.freedesktop.org/software/systemd/man/systemd.exec.
> html#SystemCallFilter=

I indeed considered that, but I didn't think that was the best option to eradicate the noise generated. I think saying "tracker is more secure" is bolder than "tracker is more secure if you use it under/together with..."

(In reply to Ting-Wei Lan from comment #13)
> Tracker fails to configure on FreeBSD now:
> 
> checking for LIBSECCOMP... no
> configure: error: Libseccomp is mandatory for sandboxed metadata extraction

Right, thanks for the heads up. Only in master/1.11 I made this mandatory, I will loosen the restriction so this only applies to linux platforms. I will have to wait for patches though if something like this is doable in FreeBSD though...

Comment 15 Alan Coopersmith 2016-12-22 06:21:43 UTC

(In reply to Ting-Wei Lan from comment #13)
> Tracker fails to configure on FreeBSD now:
> 
> checking for LIBSECCOMP... no
> configure: error: Libseccomp is mandatory for sandboxed metadata extraction

Same on Solaris.  Thanks for agreeing to make this restriction only apply to Linux.

Comment 16 Carlos Garnacho 2016-12-22 13:36:09 UTC

... and sorry I let that slip, I just pushed commit bdf25c78ee making seccomp
only mandatory on linux, and intend to do releases along the weekend.