GNOME Bugzilla – Bug 709898
Expose thumbnail validity in GFile attributes
Last modified: 2013-10-24 17:56:48 UTC
The G_FILE_ATTRIBUTE_THUMBNAIL_PATH and G_FILE_ATTRIBUTE_THUMBNAILING_FAILED attributes don’t expose any concept of thumbnail validity, so programs which use them exclusively (without also using GnomeDesktopThumbnailFactory) can easily end up using outdated thumbnails which no longer represent the file, due to it being updated in the meantime. Patch coming up to add a new G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute, which will indicate whether the thumbnail is valid. It can’t be 100% accurate without adding a dependency on gdk-pixbuf (to check the tEXt:Thumb::Mtime metadata). Instead, it compares the mtime of the thumbnail and the original file, and marks the thumbnail as invalid if the original file has been modified more recently. This will fail in situations where the original file has been reverted to an older version with an older mtime, but should otherwise be OK in practice.
Created attachment 256988 [details] [review] file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute This indicates whether the thumbnail (given by G_FILE_ATTRIBUTE_THUMBNAIL_PATH) is likely to be valid — i.e. to represent the file in its current state. If G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID is FALSE (for a normal _or_ failed thumbnail) it means the file has changed since the thumbnail was generated, and the thumbnail is out of date. This attribute cannot be 100% accurate without adding a dependency on gdk-pixbuf, so may report that a thumbnail is valid in some cases where it isn’t. For example, if the file is reverted to an older version with an older modification time, the thumbnail will still be marked as valid. Thumbnail validity is calculated by comparing the modification times of the file and the thumbnail.
I wonder if we could have some very small piece of code for reading metadata from PNG files. That seems like a problem that's reasonably well-contained.
OK. Turns out PNG is an extremely simple file format -- I'm going to write a (very) small piece of code to extract the headers and then we can do this properly.
Created attachment 257659 [details] [review] file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute This indicates whether the thumbnail (given by G_FILE_ATTRIBUTE_THUMBNAIL_PATH) is valid — i.e. to represent the file in its current state. If G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID is FALSE (for a normal _or_ failed thumbnail) it means the file has changed since the thumbnail was generated, and the thumbnail is out of date. Part of checking thumbnail validity (by the spec) involves parsing headers out of the thumbnail .png so we include some (small) code to do that. Heavily based on a patch from Philip Withnall <philip.withnall@collabora.co.uk> who suggested the feature and designed the API.
Review of attachment 257659 [details] [review]: This looks to be quite well written, I looked for integer overflow flaws and the like, couldn't find any. But: It'd be nice to have a test case for this with a static small .png file or two. One other comment: ::: gio/glocalfileinfo.c @@ +1284,3 @@ + * is relatively easy. + */ +struct expected_info Don't we normally use CamelCase for structs?
Review of attachment 257659 [details] [review]: A couple of minor comments. I also couldn’t find any integer overflows. I’ve got an updated patch with a couple of tests and fixes for the comments so far. ::: gio/glocalfileinfo.c @@ +1361,3 @@ + { + if (!check_integer_match (expected_info->size, value, value_size)) + return FALSE; Might want to document that Thumb::Size is not _required_ for a match, but if it’s set and mismatched to the stat()ed size, matching fails. @@ +1378,3 @@ + + if (memcmp (contents, "\x89PNG\r\n\x1a\n", 8) != 0) + return FALSE; Might be useful to add a comment pointing to the PNG documentation you used to write these functions. @@ +1403,3 @@ + */ + if (size < chunk_size || size < chunk_size + 8) + goto out; You should probably document here that (chunk_size + 8) is guaranteed not to overflow if (size >= chunk_size) because size is at most (G_MAXSIZE - 12) at this point.
Created attachment 257824 [details] [review] fixup! file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute
Created attachment 257825 [details] [review] fixup! file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute
Review of attachment 257825 [details] [review]: ::: gio/glocalfileinfo.c @@ +1284,3 @@ * is relatively easy. */ +struct ExpectedInfo i guess this should actually be a typedef. I'll fix that up. @@ +1407,3 @@ + * + * The (chunk_size + 8) addition is guaranteed not to wrap if + * (size >= chunk_size) because size is at most (G_MAXSIZE - 12) at I disagree with the logic here. I check size < chunk_size specifically because (chunk_size+8) could very well wrap. We just read chunk_size from the file, so it could be anything at all. If chunk_size were '(guint)-8' for example and 'size' was 0 then we would see size < chunk_size + 8 0 < ((uint)-8) + 8 0 < 0 false and assume that everything is OK, even though 'chunk_size' is very much larger than 'size'. We have to do the size<chunk_size check precisely to catch these cases that 'chunk_size' is very large. It has nothing to do with 'size'.
In retrospect, there is actually a bug here, but it's an extremely unlikely (in fact, impossible) one but since our threat model here appears to be that we have a hostile thumbnail directory, let's entertain it. Consider the case that we somehow ended up with a thumbnail that was 4294967292 bytes. That's ((guint)-4). Now consider that this thumbnail is also corrupted and contains a chunk_size of 4294967290 ((guint)-6). We check: size < chunk_size || size < chunk_size + 8 The first check will be false because: size < chunk_size 4294967292 < 4294967290 false The second check will also fail: size < chunk_size + 8 4294967292 < (4294967290 + 8) 4294967292 < 2 false So I'll just rewrite this to flatly reject implausibly large chunks. Thanks for all the attention to this suspicious line :)
Created attachment 257853 [details] [review] file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute This indicates whether the thumbnail (given by G_FILE_ATTRIBUTE_THUMBNAIL_PATH) is valid — i.e. to represent the file in its current state. If G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID is FALSE (for a normal _or_ failed thumbnail) it means the file has changed since the thumbnail was generated, and the thumbnail is out of date. Part of checking thumbnail validity (by the spec) involves parsing headers out of the thumbnail .png so we include some (small) code to do that. Heavily based on a patch from Philip Withnall <philip.withnall@collabora.co.uk> who suggested the feature and designed the API.
Review of attachment 257824 [details] [review]: This test isn't working for me. The hash values in the thumbnail filenames are md5s of absolute file URIs and you are probably using ones from your system (ie; including the full path of where you happen to be building glib). In addition to that, changing the XDG_* environment variables inside of a testcase like that (particularly when your test runs after another one) is extremely fragile and should be avoided. In this case it's usually better to spawn a subprocess with the environment already set from the outside (ie: using the envp argument to spawn/gsubprocess).
Review of attachment 257853 [details] [review]: Even if it is small, it might be nice to move the png parsing code into its own file ? Just an idea...
Review of attachment 257853 [details] [review]: looks good ::: gio/glocalfileinfo.c @@ +1458,3 @@ + * The common case is that the tEXt chunks come at the start + * of the file before any of the image data. This trick means + * that we will only fault in a single page (4k) whereas many Should we just guarantee this, by capping size at 4k ?
Review of attachment 257853 [details] [review]: Splitting it out into a separate file is fine by me and makes sense, even -- we will want to duplicate this in GVfs :/ ::: gio/glocalfileinfo.c @@ +1458,3 @@ + * The common case is that the tEXt chunks come at the start + * of the file before any of the image data. This trick means + * that we will only fault in a single page (4k) whereas many No. It's quite possible that we could have a thumbnailer that writes PNG files with the metadata chunks at the end. This hack doesn't assume that they are in any particular location -- only that they are all together.
Created attachment 257889 [details] [review] file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute This indicates whether the thumbnail (given by G_FILE_ATTRIBUTE_THUMBNAIL_PATH) is valid — i.e. to represent the file in its current state. If G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID is FALSE (for a normal _or_ failed thumbnail) it means the file has changed since the thumbnail was generated, and the thumbnail is out of date. Part of checking thumbnail validity (by the spec) involves parsing headers out of the thumbnail .png so we include some (small) code to do that in a separate file. We will likely want to copy this code to gvfs to do the same for GVfsFile. Heavily based on a patch from Philip Withnall <philip.withnall@collabora.co.uk> who suggested the feature and designed the API.
(In reply to comment #9) > Review of attachment 257825 [details] [review]: > @@ +1407,3 @@ > + * > + * The (chunk_size + 8) addition is guaranteed not to wrap if > + * (size >= chunk_size) because size is at most (G_MAXSIZE - 12) at > > We have to do the size<chunk_size check precisely to catch these cases that > 'chunk_size' is very large. It has nothing to do with 'size'. Yes, and that’s why my logic was qualified with “if (size >= chunk_size)”. (In reply to comment #10) > Consider the case that we somehow ended up with a thumbnail that was 4294967292 > bytes. That's ((guint)-4). Now consider that this thumbnail is also corrupted > and contains a chunk_size of 4294967290 ((guint)-6). I don’t think it is possible to hit that situation, since at the point of the (size < chunk_size || size < chunk_size + 8) check, size is at most (G_MAXSIZE - 12), which is less than (G_MAXSIZE - 4). This is due to the subtraction of 8 and 4 bytes for the header and stored chunk size. Anyway, you’ve rewritten it now so this is moot and I’ll take a look at the new version. :-) (In reply to comment #12) > Review of attachment 257824 [details] [review]: > > This test isn't working for me. The hash values in the thumbnail filenames are > md5s of absolute file URIs and you are probably using ones from your system > (ie; including the full path of where you happen to be building glib). The hash values should be of paths in /tmp — that’s why I copy the image files to /tmp before querying their thumbnail information. I agree that this is ugly, but I couldn’t think of a better way of doing it. Perhaps it would be possible to write a mock GFile implementation which returns a different URI from that of the file actually backing it? If you think that could work, let me know and I’ll put together a patch (but not otherwise, because it’s not a small amount of work). > In addition to that, changing the XDG_* environment variables inside of a > testcase like that (particularly when your test runs after another one) is > extremely fragile and should be avoided. In this case it's usually better to > spawn a subprocess with the environment already set from the outside (ie: using > the envp argument to spawn/gsubprocess). That’s why I deliberately changed them at the start of main(). Agreed, it is fragile. I’ll change it in the next iteration of the patch. Note that other test cases in gio/tests use g_setenv() similarly.
Review of attachment 257889 [details] [review]: ::: gio/thumbnail-verify.c @@ +156,3 @@ + * Make sure it won't wrap when we add 8 to it. + */ + if (chunk_size + 8 < chunk_size || size < chunk_size + 8) Isn’t it possible for some compilers to optimise out checks like (chunk_size + 8 < chunk_size)? The CERT-recommended way of performing this check would be: if (G_MAXUINT32 - chunk_size < 8 || size < chunk_size + 8) As per https://www.securecoding.cert.org/confluence/display/seccode/INT30-C.+Ensure+that+unsigned+integer+operations+do+not+wrap.
(In reply to comment #17) > The hash values should be of paths in /tmp — that’s why I copy the image files > to /tmp before querying their thumbnail information. I agree that this is ugly, > but I couldn’t think of a better way of doing it. Any approach here would rely on a fixed path in /tmp which is a bit of a non-starter. A new approach would indeed be appreciated. Maybe a hacked up GFile really is the best way... I leave that to you to decide. > That’s why I deliberately changed them at the start of main(). Agreed, it is > fragile. I’ll change it in the next iteration of the patch. Thanks for handling this. Testing this is proving to be not-very-fun. An idea for a different approach though: why not dynamically generate the thumbnails? You could have fun testing 'corruption' cases this way, too.
(In reply to comment #18) > Review of attachment 257889 [details] [review]: > Isn’t it possible for some compilers to optimise out checks like (chunk_size + > 8 < chunk_size)? Only if they are buggy. Wrapping of unsigned ints is a very-clearly-defined part of the C specification. > The CERT-recommended way of performing this check would be: > if (G_MAXUINT32 - chunk_size < 8 || size < chunk_size + 8) I'm happy to do this as well -- it's essentially equivalent.
Comment on attachment 257889 [details] [review] file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute Attachment 257889 [details] pushed as fe70697 - file-info: Add a G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID attribute Leaving open for the promised testcases.
Created attachment 258007 [details] [review] tests: Add tests for the thumbnail verification code in GIO This code was added for use by the G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID file attribute, but may end up being used elsewhere (e.g. in GVfs) as well. As it’s dealing with untrusted external files, and the non-trivial PNG file format, this commit adds several test cases to cover valid and invalid PNG files. The security model for the thumbnail verification code is that the user’s cache directory is untrusted, and potentially any PNG file which is passed to the verifier has been manipulated arbitrarily by an attacker. This is a follow-up to commit fe7069749fe39a006985ec266260a3c02ee8c855.
Review of attachment 258007 [details] [review]: I'm concerned about the size of the png files here -- we'd more or less be permanently adding 1MB to the size of a glib git download. I think the best approach here would be to replace the image data in the examples with all-black. That should compress nicely. I also think that the private-library thing worked out a bit more complicated than it's worth. If you can include the .c file directly instead I think it would be cleaner (and probably easier in the end).
Created attachment 258036 [details] [review] tests: Add tests for the thumbnail verification code in GIO This code was added for use by the G_FILE_ATTRIBUTE_THUMBNAIL_IS_VALID file attribute, but may end up being used elsewhere (e.g. in GVfs) as well. As it’s dealing with untrusted external files, and the non-trivial PNG file format, this commit adds several test cases to cover valid and invalid PNG files. The security model for the thumbnail verification code is that the user’s cache directory is untrusted, and potentially any PNG file which is passed to the verifier has been manipulated arbitrarily by an attacker. This is a follow-up to commit fe7069749fe39a006985ec266260a3c02ee8c855.
(In reply to comment #23) > Review of attachment 258007 [details] [review]: > > I'm concerned about the size of the png files here -- we'd more or less be Good point, I completely forgot about that when doing the work. I’ve truncated the files down to 512B, which means they’re no longer valid PNG images (missing the IEND chunk), but we don’t care because the headers are all in-tact. > I also think that the private-library thing worked out a bit more complicated > than it's worth. If you can include the .c file directly instead I think it > would be cleaner (and probably easier in the end). I think having it as a noinst library makes it easier to copy the code and Makefile around as a self-contained blob. It’s more effort than it’s worth to change to including the files directly, so I haven’t.
Attachment 258036 [details] pushed as fcd2f7e - tests: Add tests for the thumbnail verification code in GIO I trimmed the patch down to just add the testcase and avoid refactoring the build system at the same time. Thanks!