GNOME Bugzilla – Bug 626618
jpegparse doesn't handle APP12 marker
Last modified: 2011-04-12 08:07:51 UTC
The JPEG APP12 "Picture Info" segment was used by some older cameras, and contains ASCII-based meta information. [1] When jpegparser finds an APP12 marker it doesn't handle correctly and lose the rest of the fields. 1. http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/APP12.html
Created attachment 167579 [details] [review] add the app12 marker handling
What about all the other APP* markers that aren't handled currently? Why not just skip them as well here?
(In reply to comment #2) > What about all the other APP* markers that aren't handled currently? Why not > just skip them as well here? Ok. I'll do it.
Created attachment 167591 [details] [review] all the application markers (APP0-APP15) are ignored
The patch looks good, tested with some images I had around and they still work :) I'd just reword the commit message, because we are not really ignoring all APP markers, just the ones that aren't processed. For example, we use APP0 to get exif tags.
Stefan, any reason you didn't skip everything that wasn't used when you wrote this?
Created attachment 167768 [details] [review] Ignore unhandled application markers changelog: * v1: ignore only app12 * v2: ignore all the app markers * v3: more precise log message
(In reply to comment #6) > Stefan, any reason you didn't skip everything that wasn't used when you wrote > this? I actually wrote the first pushed version of this element[1] and it was my intention to be rigorous in parsing: if one marker is unknown, the image is not a jpeg. Nevertheless I must say that your suggest is appealing :) 1. http://blogs.igalia.com/vjaquez/2010/01/29/gstjpegparser/
(In reply to comment #8) > (In reply to comment #6) > > Stefan, any reason you didn't skip everything that wasn't used when you wrote > > this? > > I actually wrote the first pushed version of this element[1] and it was my > intention to be rigorous in parsing: if one marker is unknown, the image is not > a jpeg. Nevertheless I must say that your suggest is appealing :) heh, sorry, I took a quick look at the recent activity and assumed Stefan. I should've looked at the copyright notice instead :) > > 1. http://blogs.igalia.com/vjaquez/2010/01/29/gstjpegparser/
Comment on attachment 167768 [details] [review] Ignore unhandled application markers >+ } else if (marker >= APP0 && marker <= APP15) { >+ const gchar *id_str; >+ if (!gst_byte_reader_get_uint16_be (&reader, &size)) >+ goto error; >+ if (!gst_byte_reader_get_string_utf8 (&reader, &id_str)) >+ goto error; >+ if (!gst_byte_reader_skip (&reader, size - 3 - strlen (id_str))) >+ goto error; >+ GST_LOG_OBJECT (parse, "application marker %x: '%s' skiping %u bytes", >+ marker, id_str, size - 2); I'm not so sure about / keen on the ID string check. What's its purpose? Why not just skip the segment based on the size given and be done with it? I bet there are APP segments without a 0-terminated ID string..
If it's just for debugging purposes, you could use GST_MEMDUMP(), which will also include the ID then.
Attending Tim's suggestions and further cleaning up, I'm attaching a patch set
Created attachment 167786 [details] [review] Ignore unhandled application markers
Created attachment 167787 [details] [review] add gst_jpeg_parse_skip_marker()
Created attachment 167789 [details] [review] validate if the comment string is utf8
Created attachment 167793 [details] [review] check the skipping return value
Created attachment 167794 [details] [review] add gst_jpeg_parse_parse_metadata ()
Created attachment 167795 [details] [review] jpegparse: use byte reader accessors
Comment on attachment 167789 [details] [review] validate if the comment string is utf8 > case COM:{ /* read comment and post as tag */ > GstTagList *tags; >- const guint8 *comment = NULL; >+ const guint8 *comment; I think the initialization here might be needed to avoid compiler warnings somewhere (osx build bot?). > if (!gst_byte_reader_get_uint16_be (&reader, &size)) > goto error; > if (!gst_byte_reader_get_data (&reader, size - 2, &comment)) > goto error; > ... >+ if (g_utf8_validate ((char *) comment, -1, NULL)) { > ... >+ } > break; > } Is this safe? I think g_utf8_validate() may end up reading beyond the allocated memory if there's no 0 in the data, because of the -1 length.
Created attachment 167818 [details] [review] ignore unhandled application markers
Created attachment 167819 [details] [review] add gst_jpeg_parse_skip_marker()
Created attachment 167820 [details] [review] validate if the comment string is utf8
Created attachment 167821 [details] [review] check the skipping return value
Created attachment 167822 [details] [review] add gst_jpeg_parse_parse_metadata ()
Created attachment 167823 [details] [review] use byte reader accessors
Created attachment 167824 [details] [review] validate app1 metadata id
(Apologies for the bikeshe^Hnitpicking) IMHO the previous code was much nicer, and the new code doesn't look particularly correct (e.g. passing 1 to g_utf8_validate as bytecount..) and it also does an unnecessary alloc - wouldn't it have been enough to just use gst_byte_reader_get_string() instead of _get_data() and then do the g_utf8_validate() on that? (Since _get_string() makes sure it's 0-terminated)? Maybe something like: static gboolean ngst_byte_reader_get_string_utf8_checked (GstByteReader * b, const gchar ** s) { if (!gst_byte_reader_get_string_utf8 (b, s)) return FALSE; return g_utf8_validate (*s, -1, NULL); } would do the trick as well.
(In reply to comment #27) > > static gboolean > ngst_byte_reader_get_string_utf8_checked (GstByteReader * b, const gchar ** s) > { > if (!gst_byte_reader_get_string_utf8 (b, s)) > return FALSE; > > return g_utf8_validate (*s, -1, NULL); > } > What about add this functions into GstByteReader?? http://pastebin.com/nV0n2DjL
> What about add this functions into GstByteReader?? http://pastebin.com/nV0n2DjL Sure, that'd be nice to have, but let's not block on that. Please file a separate bug for that.
Ok. Back to business. Sorry for the delay.
Created attachment 174660 [details] [review] jpegparse: fix typo
Created attachment 174661 [details] [review] jpegparse: inline gst_jpeg_parse_sof () No functional changes (hopefully).
Created attachment 174662 [details] [review] jpegparse: use byte reader accessors
Created attachment 174663 [details] [review] jpegparse: add gst_jpeg_parse_skip_marker ()
Created attachment 174664 [details] [review] jpegparse: skip all APP markers, excepting APP1
Created attachment 174666 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and post_metadata ()
Created attachment 174667 [details] [review] jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format
Comment on attachment 174660 [details] [review] jpegparse: fix typo Thanks!
Comment on attachment 174661 [details] [review] jpegparse: inline gst_jpeg_parse_sof () Thanks!
Comment on attachment 174662 [details] [review] jpegparse: use byte reader accessors Thanks!
Comment on attachment 174663 [details] [review] jpegparse: add gst_jpeg_parse_skip_marker () Thanks!
Review of attachment 174664 [details] [review]: Not a big deal, but keeping the detail in the log might help actually implementing those app-markers in the future. ::: gst/jpegformat/gstjpegparse.c @@ -643,3 @@ - marker, id_str, size - 2); - break; - } The point of this code was that we know that it contains a "id_str" and we can log that to have more info about what is in the file. That is lost with the generic gst_jpeg_parse_skip_marker().
(In reply to comment #11) > If it's just for debugging purposes, you could use GST_MEMDUMP(), which will > also include the ID then. The point is to actually implement them at some point :)
Victor, do you have such a picture (containing APP12). Would be nice to attach it here. Someone can use the info e.g. in: http://cpansearch.perl.org/src/EXIFTOOL/Image-ExifTool-8.10/lib/Image/ExifTool/APP12.pm to actualy implement reading the app-marker.
Review of attachment 174666 [details] [review]: Looks good in general. To be sure, you only refactored the app1 code out of the switch and also extracted the post_metadata code. There are no other changes? ::: gst/jpegformat/gstjpegparse.c @@ +519,3 @@ +static inline void +post_metadata (GstJpegParse * parse, guint size, guint8 * data, + GstTagList * (*TagFunc) (const GstBuffer * buff)) tiny nitpick, we would name "TagFunc" -> "tag_func" also what about naming this function "extract_and_post_tags"
Review of attachment 174667 [details] [review]: ::: gst/jpegformat/gstjpegparse.c @@ +604,3 @@ + GST_MEMDUMP ("comment", data, size); + if (s && !g_utf8_validate (s, size, NULL)) + return NULL; I have done quite some research on that and as far as I know this is just ascii. Have you found any spec saying that this is UTF8? Sure, valid ascii is also valid utf8. @@ +640,3 @@ + parse->priv->srcpad, tags); + + g_slice_free (char, comment); I think this is not correct - it would need to be e.g. char[20]. I would just use an ordinary g_strdup above and g_free here.
Created attachment 174853 [details] image with APP12 marker Sample image with APP12 marker
(In reply to comment #42) > Review of attachment 174664 [details] [review]: > > Not a big deal, but keeping the detail in the log might help actually > implementing those app-markers in the future. > > ::: gst/jpegformat/gstjpegparse.c > @@ -643,3 @@ > - marker, id_str, size - 2); > - break; > - } > > The point of this code was that we know that it contains a "id_str" and we can > log that to have more info about what is in the file. That is lost with the > generic gst_jpeg_parse_skip_marker(). Yeah, but IMO just logging the first string in the block doesn't help a lot. I think is preferable wait for further implementation of the APP?? extraction.
(In reply to comment #44) > Victor, do you have such a picture (containing APP12). Would be nice to attach > it here. Someone can use the info e.g. in: > http://cpansearch.perl.org/src/EXIFTOOL/Image-ExifTool-8.10/lib/Image/ExifTool/APP12.pm > to actualy implement reading the app-marker. Image attached. But I'd would like to commit this patch set and then work on this new feature (maybe open a new bug).
(In reply to comment #45) > Review of attachment 174666 [details] [review]: > > Looks good in general. To be sure, you only refactored the app1 code out of the > switch and also extracted the post_metadata code. There are no other changes? No other changes in this regard. > > ::: gst/jpegformat/gstjpegparse.c > @@ +519,3 @@ > +static inline void > +post_metadata (GstJpegParse * parse, guint size, guint8 * data, > + GstTagList * (*TagFunc) (const GstBuffer * buff)) > > tiny nitpick, we would name "TagFunc" -> "tag_func" > also what about naming this function "extract_and_post_tags" Sure. I can do that.
(In reply to comment #46) > Review of attachment 174667 [details] [review]: > > ::: gst/jpegformat/gstjpegparse.c > @@ +604,3 @@ > + GST_MEMDUMP ("comment", data, size); > + if (s && !g_utf8_validate (s, size, NULL)) > + return NULL; > > I have done quite some research on that and as far as I know this is just > ascii. Have you found any spec saying that this is UTF8? Sure, valid ascii is > also valid utf8. I've a couple images with the char 0x20 as only comment and the taglist complains about a non-utf8 char: (gst-launch-0.10:18556): GStreamer-WARNING **: Trying to set string on taglist field 'comment', but string is not valid UTF-8. Please file a bug. Bearing that in mind I validate for utf8 chars. And now it works. > > @@ +640,3 @@ > + parse->priv->srcpad, tags); > + > + g_slice_free (char, comment); > > I think this is not correct - it would need to be e.g. char[20]. I would just > use an ordinary g_strdup above and g_free here. As a I copy (in order to modify it) the string with g_slice_copy, it must be free with g_slice_free. I imagine it would have better performance than a g_strdup/g_free + /* as the string may not end in NUL, we must allocate a new string */ + str = g_slice_copy (size, s); + str[size] = '\0'; If you think it is not useful, I can change it.
Created attachment 174870 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and post_metadata ()
Created attachment 174875 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and extract_and_post_tags ()
Created attachment 174876 [details] [review] jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format
oops... I need to rebase the patches.
Created attachment 174877 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and extract_and_post_tags ()
Created attachment 174878 [details] [review] jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format
mmmh... I didn't realized that Stefan changed the tag posting.
I have added a log message when an APP marker is found (as requested in comment #42). Also I rebased the patches according to the changes applied on #627211
Created attachment 174952 [details] [review] jpegparse: add get_tag_list ()
Created attachment 174953 [details] [review] jpegparse: skip all APP markers, excepting APP1
Created attachment 174954 [details] [review] jpegparse: log id when skipping an unhandled APP marker
Created attachment 174955 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and extract_and_queue_tags ()
Created attachment 174956 [details] [review] jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format
Created attachment 175093 [details] [review] jpegparse: add get_tag_list ()
Created attachment 175094 [details] [review] jpegparse: skip all APP markers, excepting APP1
Created attachment 175095 [details] [review] jpegparse: log id when skipping an unhandled APP marker
Created attachment 175096 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and extract_and_queue_tags ()
Created attachment 175097 [details] [review] jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format
Patches Rebased because of the changes in commit 7622328aab0aea21841a50e1fa4ab98b8d6389fa (jpegparse: Small optimization on tags parsing)
Review of attachment 175095 [details] [review]: ::: gst/jpegformat/gstjpegparse.c @@ +518,3 @@ + return FALSE; + } + can we write the whole thing as this to avoid unused variable warnings when debug logging is off. #ifndef GST_DISABLE_DEBUG if (marker >= APP0 && marker <= APP15) { const gchar *id_str = NULL; if (!gst_byte_reader_peek_string_utf8 (reader, &id_str)) return FALSE; GST_LOG_OBJECT (parse, "unhandled marker %x: '%s' skiping %u bytes", marker, id_str, size); } else #else GST_LOG_OBJECT (parse, "unhandled marker %x skiping %u bytes", marker, size); #endif
Sorrz victor, this has been fallen out of focus. Could you please take care of my last comment and rebase the commits. I will push them on the next occasion.
(In reply to comment #72) > Sorrz victor, this has been fallen out of focus. Could you please take care of > my last comment and rebase the commits. I will push them on the next occasion. Thanks Stefan. I'll allocate a slot this weekend to work in this task and I'll resubmit the patches.
Created attachment 185654 [details] [review] jpegparse: add get_tag_list ()
Created attachment 185655 [details] [review] jpegparse: skip all APP markers, excepting APP1
Created attachment 185656 [details] [review] jpegparse: log id when skipping an unhandled APP marker
Created attachment 185657 [details] [review] jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and extract_and_queue_tags ()
Created attachment 185658 [details] [review] jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format
Created attachment 185659 [details] [review] jpegparse: add gst_jpeg_parse_remove_marker() This function will remove the whole marker from the buffer. Also we set it as the default behavior for marker JPG{0-13}? in order to avoid a useless #if
Patches apply, tests pass. Pushed. commit 824364d152dbac2039e2bf39bbcdb25434297fc7 Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Sun Apr 10 19:53:35 2011 +0200 jpegparse: add gst_jpeg_parse_remove_marker() This function will remove the whole marker from the buffer. Also we set it as the default behavior for marker JPG{0-13}? in order to avoid a useless #if https://bugzilla.gnome.org/show_bug.cgi?id=626618 commit 0e27f49dd42623fe4f987381005a681389fd04ca Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Fri Aug 13 12:38:02 2010 +0200 jpegparse: refactor COM parsing add gst_jpeg_parse_com () and get_utf8_from_data () to extract and validate comment format https://bugzilla.gnome.org/show_bug.cgi?id=626618 commit c9b2c4abfe5b2d367007ad17ac1c44482ca7804a Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Tue Nov 16 18:22:07 2010 +0100 jpegparse: refactor APP1 parsing add gst_jpeg_parse_app1 () and extract_and_queue_tags () https://bugzilla.gnome.org/show_bug.cgi?id=626618 commit 81991e3323c2e2a84835017e6437ac037d66a367 Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Sun Nov 21 15:05:43 2010 +0100 jpegparse: log id when skipping an unhandled APP marker https://bugzilla.gnome.org/show_bug.cgi?id=626618 commit a1f77eda32e7f981ee84d729d69f086c47acec50 Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Tue Nov 16 17:47:17 2010 +0100 jpegparse: skip all APP markers, excepting APP1 https://bugzilla.gnome.org/show_bug.cgi?id=626618 commit 51e57d439e80e8b8e3cad98c9609df7817fa55ce Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Sun Nov 21 15:09:17 2010 +0100 jpegparse: add get_tag_list () https://bugzilla.gnome.org/show_bug.cgi?id=626618
I think the last patches broke the camerabin2 unit tests. jpegparse is not fully parsing the whole jpeg anymore. I will continue debuging this now :/
Some info - the first e1 marker is read and then we're stuck in some data :/ gst_jpeg_parse_get_image_length: Parsing jpeg image data (4096 bytes) gst_jpeg_parse_get_image_length: Parse state: offset=0, resync=0, entropy len=0 gst_jpeg_parse_get_image_length: 0x00000002: tag e1, frame_len=306 gst_jpeg_parse_get_image_length: 0x00000136: tag e1, frame_len=3081 gst_jpeg_parse_get_image_length: 0x00000d41: tag ee, frame_len=14 gst_jpeg_parse_get_image_length: 0x00000d51: tag fe, frame_len=8 gst_jpeg_parse_get_image_length: 0x00000d5b: tag db, frame_len=67 gst_jpeg_parse_get_image_length: 0x00000da0: tag c0, frame_len=17 gst_jpeg_parse_get_image_length: 0x00000db3: tag c4, frame_len=31 gst_jpeg_parse_get_image_length: 0x00000dd4: tag c4, frame_len=181 gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12 gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length gst_jpeg_parse_get_image_length: Parsing jpeg image data (8192 bytes) gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=357 gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12 gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length gst_jpeg_parse_get_image_length: Parsing jpeg image data (12288 bytes) gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=4453 gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12 gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length gst_jpeg_parse_get_image_length: Parsing jpeg image data (16384 bytes) gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=8549 gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12 gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length gst_jpeg_parse_get_image_length: Parsing jpeg image data (20480 bytes) gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=12645 gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12 gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length gst_jpeg_parse_get_image_length: Parsing jpeg image data (22653 bytes) gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=16741 gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12 gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length gst_jpeg_parse_get_image_length: entropy segment length=18914 => frame_len=18926 gst_jpeg_parse_get_image_length: 0x0000587b: EOI marker gst_jpeg_parse_chain:<jpegparse0> parsed image of size 22653 gst_jpeg_parse_read_header:<jpegparse0> marker = d8 gst_jpeg_parse_read_header:<jpegparse0> next marker = ff gst_jpeg_parse_read_header:<jpegparse0> marker = e1 extract_and_queue_tags:<jpegparse0> collected tags: taglist, description=(string)"some\ description", device-manufacturer=(string)MyFavoriteBrand, device-model=(string)123v42.1, application-name=(string)"camerabin2\ test", copyright=(string)"My\ copyright\ notice", geo-location-latitude=(double)36.600000000000001, geo-location-longitude=(double)-12.5, geo-location-elevation=(double)300.85000000000002; gst_jpeg_parse_app1:<jpegparse0> parsed marker e1: 'Exif' 304 bytes gst_jpeg_parse_read_header:<jpegparse0> next marker = 74 gst_jpeg_parse_set_new_caps:<jpegparse0> setting caps on srcpad (hdr_ok=0, have_fps=0) gst_jpeg_parse_set_new_caps:<jpegparse0> setting downstream caps on jpegparse0:src to image/jpeg, parsed=(boolean)true, framerate=(fraction)1/1 gst_jpeg_parse_push_buffer:<jpegparse0> Pushing tags: taglist, description=(string)"some\ description", device-manufacturer=(string)MyFavoriteBrand, device-model=(string)123v42.1, application-name=(string)"camerabin2\ test", copyright=(string)"My\ copyright\ notice", geo-location-latitude=(double)36.600000000000001, geo-location-longitude=(double)-12.5, geo-location-elevation=(double)300.85000000000002; gst_jpeg_parse_push_buffer:<jpegparse0> pushing buffer (ts=0:00:00.000000000, len=22653)
"jpegparse: refactor APP1 parsing" was the culprit. Victor, if you can think of a unit-test to ensure this does not regresses, that be great :) commit b84dd0a766ab4098e64ddc591c3c9031a313afe8 Author: Stefan Kost <ensonic@users.sf.net> Date: Mon Apr 11 18:31:45 2011 +0300 jpegparse: subtract id-str size from the remaining read Fixes a regression from the patches in bug #626618.
Thanks Stefan. I'll try to cook a unit test this weekend again :)