Bug 797086 – matroskamux: don't store used UIDs

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 797086 - matroskamux: don't store used UIDs


Summary:	matroskamux: don't store used UIDs


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.15.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2018-09-05 23:29 UTC by Martin Kelly
Modified:	2018-09-06 18:17 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
0001-matroskamux-don-t-store-used-UIDs.patch (2.93 KB, patch) 2018-09-05 23:29 UTC, Martin Kelly	committed	Details \| Review

Description Martin Kelly 2018-09-05 23:29:26 UTC

Created attachment 373551 [details] [review]
0001-matroskamux-don-t-store-used-UIDs.patch

Currently, whenever we generate a 128-bit UID, we store it in a list and
return 0 if we ever encounter a collision. This is so mathematically
improbable that it's not worth checking for, so we can save memory and
time by not tracking the UID. Even if a collision happened, a list of
only 10 UIDs would be unlikely to detect it.

This article has a good description of how improbable a collision is:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

Comment 1 Nicolas Dufresne (ndufresne) 2018-09-06 00:08:27 UTC

Before this can be merged, the UID generator should be updated to include a timestamp combined with the random data. This is best practices for UID generator and it really makes the collision only possible after a very long amount of time.

Comment 2 Martin Kelly 2018-09-06 00:15:33 UTC

Could you clarify what you mean by this? Do you mean, for instance, that the first N bits of the UID would be a timestamp? Are you concerned about a buggy random number generator creating a collision?

Absent a random number generator bug, I'm not sure how collisions could happen at all... to reach a 50% collision probability, you would need to generate "1 billion UIDs per second for about 85 years" (https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions). The only practical way I think we should see collisions is if the random number generator is cyclical or buggy. Adding a timestamp could theoretically help in such a case, but if we are speculating, the clock could be buggy too.

Comment 3 Martin Kelly 2018-09-06 00:17:43 UTC

If we add a timestamp, couldn't we arguably be in violation of the matroska spec? The wording I see for "SegmentUID" is:

A randomly generated unique ID to identify the current Segment between many others (128 bits).

If we add a timestamp, this is no longer fully random.

https://matroska.org/technical/specs/index.html

Comment 4 Nicolas Dufresne (ndufresne) 2018-09-06 00:43:09 UTC

Timestamp + random number will give you a random number. For the rest, read:

https://tools.ietf.org/html/rfc4122
https://en.wikipedia.org/wiki/Universally_unique_identifier

It's not a official standard, just a suggested one, but the one being implemented by the Linux kernel and many other libraries.

Comment 5 Martin Kelly 2018-09-06 00:58:31 UTC

Could you clarify which UUID version you're proposing? Version 1 uses date-time + MAC address while version 4 is "truly random". I don't see a version that uses just a time and a random number. As far as I can tell, the Linux kernel just uses version 4 ("truly random"): https://github.com/torvalds/linux/blob/master/lib/uuid.c#L41

Comment 6 Nicolas Dufresne (ndufresne) 2018-09-06 01:20:31 UTC

Apparently things have change in the kernel, I bet the by truly random, they use a RNG that has been validated. Anyway, feel free to go ahead, I don't disagree, but I don't have a strong trust on glib RNG.

Comment 7 Martin Kelly 2018-09-06 16:22:03 UTC

I agree that checking the GLib implementation to see if we should trust it is a good idea. I did a bit of code reading as this is quite an interesting subject.

The short answer is that every library I can find generating random numbers on Linux relies on /dev/urandom. This includes glib, libossp-uuid (which backs the uuid -v4 command), the uuid function in the Linux kernel (which uses get_random_bytes(), exported by /dev/urandom), and even an strace of openssh (I didn't check the code in that case). In any case, it looks like all our random number generation will be as good/bad as /dev/urandom. That said, I'm not a crytographer, but it looks like /dev/urandom should be good enough for practical purposes: https://www.2uo.de/myths-about-urandom.

On Windows, glib uses rand_s, which the documentation says is "cryptographically secure" (https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/rand-s). If rand_s doesn't exist (Windows XP!), it uses the time as a seed and then applies a Mersenne Twister algorithm to generate random numbers from that.

I think we should be OK in preventing collisions.

Comment 8 Nicolas Dufresne (ndufresne) 2018-09-06 18:05:42 UTC

GLib have it's own RNG, which is documented not recommended for crypto. It's seeded from urandom on Linux, rand_s on Win32 and wallclock time otherwise. But this is not crypto, we just want to avoid collision, we don't care if one can predict the values. That UID cache was initially global, so it's not the first time it becomes a bottleneck. I think the explanation for the change are fair enough, and will improve performance over long period of time.

https://gitlab.gnome.org/GNOME/glib/blob/master/glib/grand.c#L62

Comment 9 Nicolas Dufresne (ndufresne) 2018-09-06 18:16:25 UTC

Pushed as be05515da797115739bf352d6efdbc34eda511bb

Comment 10 Nicolas Dufresne (ndufresne) 2018-09-06 18:17:14 UTC

Thanks, this will appear in the next release of GStreamer.