Bug 626618 – jpegparse doesn't handle APP12 marker

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 626618 - jpegparse doesn't handle APP12 marker


Summary:	jpegparse doesn't handle APP12 marker


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	0.10.22
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2010-08-11 11:06 UTC by Víctor Manuel Jáquez Leal
Modified:	2011-04-12 08:07 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
add the app12 marker handling (1.25 KB, patch) 2010-08-11 11:09 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
all the application markers (APP0-APP15) are ignored (2.65 KB, patch) 2010-08-11 12:35 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
Ignore unhandled application markers (2.65 KB, patch) 2010-08-12 19:55 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
Ignore unhandled application markers (2.66 KB, patch) 2010-08-13 11:27 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
add gst_jpeg_parse_skip_marker() (2.96 KB, patch) 2010-08-13 11:28 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
validate if the comment string is utf8 (1.76 KB, patch) 2010-08-13 11:29 UTC, Víctor Manuel Jáquez Leal	reviewed	Details \| Review
check the skipping return value (926 bytes, patch) 2010-08-13 11:29 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
add gst_jpeg_parse_parse_metadata () (5.68 KB, patch) 2010-08-13 11:30 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: use byte reader accessors (1001 bytes, patch) 2010-08-13 11:31 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
ignore unhandled application markers (2.66 KB, patch) 2010-08-13 16:13 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
add gst_jpeg_parse_skip_marker() (2.96 KB, patch) 2010-08-13 16:14 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
validate if the comment string is utf8 (2.68 KB, patch) 2010-08-13 16:16 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
check the skipping return value (926 bytes, patch) 2010-08-13 16:20 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
add gst_jpeg_parse_parse_metadata () (5.69 KB, patch) 2010-08-13 16:21 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
use byte reader accessors (1001 bytes, patch) 2010-08-13 16:21 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
validate app1 metadata id (3.54 KB, patch) 2010-08-13 16:22 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: fix typo (907 bytes, patch) 2010-11-17 10:56 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: inline gst_jpeg_parse_sof () (841 bytes, patch) 2010-11-17 10:56 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: use byte reader accessors (1.02 KB, patch) 2010-11-17 10:56 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: add gst_jpeg_parse_skip_marker () (2.32 KB, patch) 2010-11-17 10:56 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: skip all APP markers, excepting APP1 (1.81 KB, patch) 2010-11-17 10:56 UTC, Víctor Manuel Jáquez Leal	reviewed	Details \| Review
jpegparse: refactor APP1 parsing (5.41 KB, patch) 2010-11-17 10:56 UTC, Víctor Manuel Jáquez Leal	needs-work	Details \| Review
jpegparse: refactor COM parsing (2.84 KB, patch) 2010-11-17 10:57 UTC, Víctor Manuel Jáquez Leal	needs-work	Details \| Review
image with APP12 marker (1.92 KB, image/jpeg) 2010-11-19 15:49 UTC, Víctor Manuel Jáquez Leal		Details
jpegparse: refactor APP1 parsing (5.44 KB, patch) 2010-11-19 18:07 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor APP1 parsing (5.45 KB, patch) 2010-11-19 18:12 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor COM parsing (2.84 KB, patch) 2010-11-19 18:12 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor APP1 parsing (5.68 KB, patch) 2010-11-19 18:39 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor COM parsing (2.77 KB, patch) 2010-11-19 18:39 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: add get_tag_list () (1.42 KB, patch) 2010-11-21 14:28 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: skip all APP markers, excepting APP1 (1.81 KB, patch) 2010-11-21 14:28 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: log id when skipping an unhandled APP marker (1.41 KB, patch) 2010-11-21 14:28 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor APP1 parsing (5.60 KB, patch) 2010-11-21 14:28 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor COM parsing (2.66 KB, patch) 2010-11-21 14:28 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: add get_tag_list () (1.42 KB, patch) 2010-11-23 08:42 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: skip all APP markers, excepting APP1 (1.81 KB, patch) 2010-11-23 08:42 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: log id when skipping an unhandled APP marker (1.41 KB, patch) 2010-11-23 08:42 UTC, Víctor Manuel Jáquez Leal	needs-work	Details \| Review
jpegparse: refactor APP1 parsing (5.73 KB, patch) 2010-11-23 08:42 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: refactor COM parsing (2.66 KB, patch) 2010-11-23 08:43 UTC, Víctor Manuel Jáquez Leal	none	Details \| Review
jpegparse: add get_tag_list () (1.43 KB, patch) 2011-04-10 18:07 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: skip all APP markers, excepting APP1 (1.81 KB, patch) 2011-04-10 18:08 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: log id when skipping an unhandled APP marker (1.39 KB, patch) 2011-04-10 18:08 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: refactor APP1 parsing (5.73 KB, patch) 2011-04-10 18:08 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: refactor COM parsing (2.85 KB, patch) 2011-04-10 18:09 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review
jpegparse: add gst_jpeg_parse_remove_marker() (3.16 KB, patch) 2011-04-10 18:09 UTC, Víctor Manuel Jáquez Leal	committed	Details \| Review

Description Víctor Manuel Jáquez Leal 2010-08-11 11:06:36 UTC

The JPEG APP12 "Picture Info" segment was used by some older cameras, and contains ASCII-based meta information. [1]

When jpegparser finds an APP12 marker it doesn't handle correctly and lose the rest of the fields.

1. http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/APP12.html

Comment 1 Víctor Manuel Jáquez Leal 2010-08-11 11:09:40 UTC

Created attachment 167579 [details] [review]
add the app12 marker handling

Comment 2 Tim-Philipp Müller 2010-08-11 11:14:53 UTC

What about all the other APP* markers that aren't handled currently? Why not just skip them as well here?

Comment 3 Víctor Manuel Jáquez Leal 2010-08-11 11:21:08 UTC

(In reply to comment #2)
> What about all the other APP* markers that aren't handled currently? Why not
> just skip them as well here?

Ok. I'll do it.

Comment 4 Víctor Manuel Jáquez Leal 2010-08-11 12:35:45 UTC

Created attachment 167591 [details] [review]
all the application markers (APP0-APP15) are ignored

Comment 5 Thiago Sousa Santos 2010-08-12 19:13:27 UTC

The patch looks good, tested with some images I had around and they still work :)

I'd just reword the commit message, because we are not really ignoring all APP markers, just the ones that aren't processed. For example, we use APP0 to get exif tags.

Comment 6 Thiago Sousa Santos 2010-08-12 19:14:04 UTC

Stefan, any reason you didn't skip everything that wasn't used when you wrote this?

Comment 7 Víctor Manuel Jáquez Leal 2010-08-12 19:55:33 UTC

Created attachment 167768 [details] [review]
Ignore unhandled application markers

changelog:
* v1: ignore only app12
* v2: ignore all the app markers
* v3: more precise log message

Comment 8 Víctor Manuel Jáquez Leal 2010-08-12 19:58:37 UTC

(In reply to comment #6)
> Stefan, any reason you didn't skip everything that wasn't used when you wrote
> this?

I actually wrote the first pushed version of this element[1] and it was my intention to be rigorous in parsing: if one marker is unknown, the image is not a jpeg. Nevertheless I must say that your suggest is appealing :)

1. http://blogs.igalia.com/vjaquez/2010/01/29/gstjpegparser/

Comment 9 Thiago Sousa Santos 2010-08-12 20:02:03 UTC

(In reply to comment #8)
> (In reply to comment #6)
> > Stefan, any reason you didn't skip everything that wasn't used when you wrote
> > this?
> 
> I actually wrote the first pushed version of this element[1] and it was my
> intention to be rigorous in parsing: if one marker is unknown, the image is not
> a jpeg. Nevertheless I must say that your suggest is appealing :)

heh, sorry, I took a quick look at the recent activity and assumed Stefan. I should've looked at the copyright notice instead :)

> 
> 1. http://blogs.igalia.com/vjaquez/2010/01/29/gstjpegparser/

Comment 10 Tim-Philipp Müller 2010-08-12 20:14:38 UTC

Comment on attachment 167768 [details] [review]
Ignore unhandled application markers

>+        } else if (marker >= APP0 && marker <= APP15) {
>+          const gchar *id_str;
>+          if (!gst_byte_reader_get_uint16_be (&reader, &size))
>+            goto error;
>+          if (!gst_byte_reader_get_string_utf8 (&reader, &id_str))
>+            goto error;
>+          if (!gst_byte_reader_skip (&reader, size - 3 - strlen (id_str)))
>+            goto error;
>+          GST_LOG_OBJECT (parse, "application marker %x: '%s' skiping %u bytes",
>+              marker, id_str, size - 2);

I'm not so sure about / keen on the ID string check. What's its purpose? Why not just skip the segment based on the size given and be done with it? I bet there are APP segments without a 0-terminated ID string..

Comment 11 Tim-Philipp Müller 2010-08-12 20:17:35 UTC

If it's just for debugging purposes, you could use GST_MEMDUMP(), which will also include the ID then.

Comment 12 Víctor Manuel Jáquez Leal 2010-08-13 11:27:00 UTC

Attending Tim's suggestions and further cleaning up, I'm attaching a patch set

Comment 13 Víctor Manuel Jáquez Leal 2010-08-13 11:27:35 UTC

Created attachment 167786 [details] [review]
Ignore unhandled application markers

Comment 14 Víctor Manuel Jáquez Leal 2010-08-13 11:28:32 UTC

Created attachment 167787 [details] [review]
add gst_jpeg_parse_skip_marker()

Comment 15 Víctor Manuel Jáquez Leal 2010-08-13 11:29:21 UTC

Created attachment 167789 [details] [review]
validate if the comment string is utf8

Comment 16 Víctor Manuel Jáquez Leal 2010-08-13 11:29:54 UTC

Created attachment 167793 [details] [review]
check the skipping return value

Comment 17 Víctor Manuel Jáquez Leal 2010-08-13 11:30:55 UTC

Created attachment 167794 [details] [review]
add gst_jpeg_parse_parse_metadata ()

Comment 18 Víctor Manuel Jáquez Leal 2010-08-13 11:31:40 UTC

Created attachment 167795 [details] [review]
jpegparse: use byte reader accessors

Comment 19 Tim-Philipp Müller 2010-08-13 11:55:45 UTC

Comment on attachment 167789 [details] [review]
validate if the comment string is utf8

>       case COM:{               /* read comment and post as tag */
>         GstTagList *tags;
>-        const guint8 *comment = NULL;
>+        const guint8 *comment;

I think the initialization here might be needed to avoid compiler warnings somewhere (osx build bot?).

>         if (!gst_byte_reader_get_uint16_be (&reader, &size))
>           goto error;
>         if (!gst_byte_reader_get_data (&reader, size - 2, &comment))
>           goto error;
>         ...
>+       if (g_utf8_validate ((char *) comment, -1, NULL)) {
>             ...
>+       }
>         break;
>       }

Is this safe? I think g_utf8_validate() may end up reading beyond the allocated memory if there's no 0 in the data, because of the -1 length.

Comment 20 Víctor Manuel Jáquez Leal 2010-08-13 16:13:48 UTC

Created attachment 167818 [details] [review]
ignore unhandled application markers

Comment 21 Víctor Manuel Jáquez Leal 2010-08-13 16:14:20 UTC

Created attachment 167819 [details] [review]
add gst_jpeg_parse_skip_marker()

Comment 22 Víctor Manuel Jáquez Leal 2010-08-13 16:16:49 UTC

Created attachment 167820 [details] [review]
validate if the comment string is utf8

Comment 23 Víctor Manuel Jáquez Leal 2010-08-13 16:20:17 UTC

Created attachment 167821 [details] [review]
check the skipping return value

Comment 24 Víctor Manuel Jáquez Leal 2010-08-13 16:21:10 UTC

Created attachment 167822 [details] [review]
add gst_jpeg_parse_parse_metadata ()

Comment 25 Víctor Manuel Jáquez Leal 2010-08-13 16:21:52 UTC

Created attachment 167823 [details] [review]
use byte reader accessors

Comment 26 Víctor Manuel Jáquez Leal 2010-08-13 16:22:25 UTC

Created attachment 167824 [details] [review]
validate app1 metadata id

Comment 27 Tim-Philipp Müller 2010-08-13 17:12:49 UTC

(Apologies for the bikeshe^Hnitpicking)

IMHO the previous code was much nicer, and the new code doesn't look particularly correct (e.g. passing 1 to g_utf8_validate as bytecount..) and it also does an unnecessary alloc  - wouldn't it have been enough to just use gst_byte_reader_get_string() instead of _get_data() and then do the g_utf8_validate() on that? (Since _get_string() makes sure it's 0-terminated)?

Maybe something like:

static gboolean
ngst_byte_reader_get_string_utf8_checked (GstByteReader * b, const gchar ** s)
{
  if (!gst_byte_reader_get_string_utf8 (b, s))
    return FALSE;

  return g_utf8_validate (*s, -1, NULL);
}

would do the trick as well.

Comment 28 Víctor Manuel Jáquez Leal 2010-08-17 18:42:27 UTC

(In reply to comment #27)
> 
> static gboolean
> ngst_byte_reader_get_string_utf8_checked (GstByteReader * b, const gchar ** s)
> {
>   if (!gst_byte_reader_get_string_utf8 (b, s))
>     return FALSE;
> 
>   return g_utf8_validate (*s, -1, NULL);
> }
> 

What about add this functions into GstByteReader?? http://pastebin.com/nV0n2DjL

Comment 29 Tim-Philipp Müller 2010-08-20 18:24:42 UTC

> What about add this functions into GstByteReader?? http://pastebin.com/nV0n2DjL

Sure, that'd be nice to have, but let's not block on that. Please file a separate bug for that.

Comment 30 Víctor Manuel Jáquez Leal 2010-11-17 10:48:20 UTC

Ok. Back to business. Sorry for the delay.

Comment 31 Víctor Manuel Jáquez Leal 2010-11-17 10:56:07 UTC

Created attachment 174660 [details] [review]
jpegparse: fix typo

Comment 32 Víctor Manuel Jáquez Leal 2010-11-17 10:56:16 UTC

Created attachment 174661 [details] [review]
jpegparse: inline gst_jpeg_parse_sof ()

No functional changes (hopefully).

Comment 33 Víctor Manuel Jáquez Leal 2010-11-17 10:56:25 UTC

Created attachment 174662 [details] [review]
jpegparse: use byte reader accessors

Comment 34 Víctor Manuel Jáquez Leal 2010-11-17 10:56:35 UTC

Created attachment 174663 [details] [review]
jpegparse: add gst_jpeg_parse_skip_marker ()

Comment 35 Víctor Manuel Jáquez Leal 2010-11-17 10:56:44 UTC

Created attachment 174664 [details] [review]
jpegparse: skip all APP markers, excepting APP1

Comment 36 Víctor Manuel Jáquez Leal 2010-11-17 10:56:53 UTC

Created attachment 174666 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and post_metadata ()

Comment 37 Víctor Manuel Jáquez Leal 2010-11-17 10:57:03 UTC

Created attachment 174667 [details] [review]
jpegparse: refactor COM parsing

add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
validate comment format

Comment 38 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 14:39:35 UTC

Comment on attachment 174660 [details] [review]
jpegparse: fix typo

Thanks!

Comment 39 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 14:39:43 UTC

Comment on attachment 174661 [details] [review]
jpegparse: inline gst_jpeg_parse_sof ()

Thanks!

Comment 40 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 14:39:54 UTC

Comment on attachment 174662 [details] [review]
jpegparse: use byte reader accessors

Thanks!

Comment 41 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 14:40:03 UTC

Comment on attachment 174663 [details] [review]
jpegparse: add gst_jpeg_parse_skip_marker ()

Thanks!

Comment 42 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 14:48:58 UTC

Review of attachment 174664 [details] [review]:

Not a big deal, but keeping the detail in the log might help actually implementing those app-markers in the future.

::: gst/jpegformat/gstjpegparse.c
@@ -643,3 @@
-            marker, id_str, size - 2);
-        break;
-      }

The point of this code was that we know that it contains a "id_str" and we can log that to have more info about what is in the file. That is lost with the generic gst_jpeg_parse_skip_marker().

Comment 43 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 14:59:43 UTC

(In reply to comment #11)
> If it's just for debugging purposes, you could use GST_MEMDUMP(), which will
> also include the ID then.

The point is to actually implement them at some point :)

Comment 44 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 15:00:17 UTC

Victor, do you have such a picture (containing APP12). Would be nice to attach it here. Someone can use the info e.g. in:
http://cpansearch.perl.org/src/EXIFTOOL/Image-ExifTool-8.10/lib/Image/ExifTool/APP12.pm
to actualy implement reading the app-marker.

Comment 45 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 15:07:41 UTC

Review of attachment 174666 [details] [review]:

Looks good in general. To be sure, you only refactored the app1 code out of the switch and also extracted the post_metadata code. There are no other changes?

::: gst/jpegformat/gstjpegparse.c
@@ +519,3 @@
+static inline void
+post_metadata (GstJpegParse * parse, guint size, guint8 * data,
+    GstTagList * (*TagFunc) (const GstBuffer * buff))

tiny nitpick, we would name "TagFunc" -> "tag_func"
also what about naming this function "extract_and_post_tags"

Comment 46 Stefan Sauer (gstreamer, gtkdoc dev) 2010-11-19 15:15:27 UTC

Review of attachment 174667 [details] [review]:

::: gst/jpegformat/gstjpegparse.c
@@ +604,3 @@
+  GST_MEMDUMP ("comment", data, size);
+  if (s && !g_utf8_validate (s, size, NULL))
+    return NULL;

I have done quite some research on that and as far as I know this is just ascii. Have you found any spec saying that this is UTF8? Sure, valid ascii is also valid utf8.

@@ +640,3 @@
+        parse->priv->srcpad, tags);
+
+    g_slice_free (char, comment);

I think this is not correct - it would need to be e.g. char[20]. I would just use an ordinary g_strdup above and g_free here.

Comment 47 Víctor Manuel Jáquez Leal 2010-11-19 15:49:31 UTC

Created attachment 174853 [details]
image with APP12 marker

Sample image with APP12 marker

Comment 48 Víctor Manuel Jáquez Leal 2010-11-19 17:10:33 UTC

(In reply to comment #42)
> Review of attachment 174664 [details] [review]:
> 
> Not a big deal, but keeping the detail in the log might help actually
> implementing those app-markers in the future.
> 
> ::: gst/jpegformat/gstjpegparse.c
> @@ -643,3 @@
> -            marker, id_str, size - 2);
> -        break;
> -      }
> 
> The point of this code was that we know that it contains a "id_str" and we can
> log that to have more info about what is in the file. That is lost with the
> generic gst_jpeg_parse_skip_marker().

Yeah, but IMO just logging the first string in the block doesn't help a lot. I think is preferable wait for further implementation of the APP?? extraction.

Comment 49 Víctor Manuel Jáquez Leal 2010-11-19 17:13:38 UTC

(In reply to comment #44)
> Victor, do you have such a picture (containing APP12). Would be nice to attach
> it here. Someone can use the info e.g. in:
> http://cpansearch.perl.org/src/EXIFTOOL/Image-ExifTool-8.10/lib/Image/ExifTool/APP12.pm
> to actualy implement reading the app-marker.

Image attached.

But I'd would like to commit this patch set and then work on this new feature (maybe open a new bug).

Comment 50 Víctor Manuel Jáquez Leal 2010-11-19 17:16:22 UTC

(In reply to comment #45)
> Review of attachment 174666 [details] [review]:
> 
> Looks good in general. To be sure, you only refactored the app1 code out of the
> switch and also extracted the post_metadata code. There are no other changes?

No other changes in this regard.

> 
> ::: gst/jpegformat/gstjpegparse.c
> @@ +519,3 @@
> +static inline void
> +post_metadata (GstJpegParse * parse, guint size, guint8 * data,
> +    GstTagList * (*TagFunc) (const GstBuffer * buff))
> 
> tiny nitpick, we would name "TagFunc" -> "tag_func"
> also what about naming this function "extract_and_post_tags"

Sure. I can do that.

Comment 51 Víctor Manuel Jáquez Leal 2010-11-19 17:40:29 UTC

(In reply to comment #46)
> Review of attachment 174667 [details] [review]:
> 
> ::: gst/jpegformat/gstjpegparse.c
> @@ +604,3 @@
> +  GST_MEMDUMP ("comment", data, size);
> +  if (s && !g_utf8_validate (s, size, NULL))
> +    return NULL;
> 
> I have done quite some research on that and as far as I know this is just
> ascii. Have you found any spec saying that this is UTF8? Sure, valid ascii is
> also valid utf8.

I've a couple images with the char 0x20 as only comment and the taglist complains about a non-utf8 char:

(gst-launch-0.10:18556): GStreamer-WARNING **: Trying to set string on taglist field 'comment', but string is not valid UTF-8. Please file a bug.

Bearing that in mind I validate for utf8 chars. And now it works.

> 
> @@ +640,3 @@
> +        parse->priv->srcpad, tags);
> +
> +    g_slice_free (char, comment);
> 
> I think this is not correct - it would need to be e.g. char[20]. I would just
> use an ordinary g_strdup above and g_free here.

As a I copy (in order to modify it) the string with g_slice_copy, it must be free with g_slice_free. I imagine it would have better performance than a g_strdup/g_free

+  /* as the string may not end in NUL, we must allocate a new string */
+  str = g_slice_copy (size, s);
+  str[size] = '\0';

If you think it is not useful, I can change it.

Comment 52 Víctor Manuel Jáquez Leal 2010-11-19 18:07:22 UTC

Created attachment 174870 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and post_metadata ()

Comment 53 Víctor Manuel Jáquez Leal 2010-11-19 18:12:04 UTC

Created attachment 174875 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and extract_and_post_tags ()

Comment 54 Víctor Manuel Jáquez Leal 2010-11-19 18:12:09 UTC

Created attachment 174876 [details] [review]
jpegparse: refactor COM parsing

add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
validate comment format

Comment 55 Víctor Manuel Jáquez Leal 2010-11-19 18:38:46 UTC

oops... I need to rebase the patches.

Comment 56 Víctor Manuel Jáquez Leal 2010-11-19 18:39:38 UTC

Created attachment 174877 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and extract_and_post_tags ()

Comment 57 Víctor Manuel Jáquez Leal 2010-11-19 18:39:43 UTC

Created attachment 174878 [details] [review]
jpegparse: refactor COM parsing

add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
validate comment format

Comment 58 Víctor Manuel Jáquez Leal 2010-11-19 18:56:30 UTC

mmmh... I didn't realized that Stefan changed the tag posting.

Comment 59 Víctor Manuel Jáquez Leal 2010-11-21 14:24:52 UTC

I have added a log message when an APP marker is found (as requested in comment #42).

Also I rebased the patches according to the changes applied on #627211

Comment 60 Víctor Manuel Jáquez Leal 2010-11-21 14:28:13 UTC

Created attachment 174952 [details] [review]
jpegparse: add get_tag_list ()

Comment 61 Víctor Manuel Jáquez Leal 2010-11-21 14:28:21 UTC

Created attachment 174953 [details] [review]
jpegparse: skip all APP markers, excepting APP1

Comment 62 Víctor Manuel Jáquez Leal 2010-11-21 14:28:27 UTC

Created attachment 174954 [details] [review]
jpegparse: log id when skipping an unhandled APP marker

Comment 63 Víctor Manuel Jáquez Leal 2010-11-21 14:28:33 UTC

Created attachment 174955 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and extract_and_queue_tags ()

Comment 64 Víctor Manuel Jáquez Leal 2010-11-21 14:28:39 UTC

Created attachment 174956 [details] [review]
jpegparse: refactor COM parsing

add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
validate comment format

Comment 65 Víctor Manuel Jáquez Leal 2010-11-23 08:42:37 UTC

Created attachment 175093 [details] [review]
jpegparse: add get_tag_list ()

Comment 66 Víctor Manuel Jáquez Leal 2010-11-23 08:42:43 UTC

Created attachment 175094 [details] [review]
jpegparse: skip all APP markers, excepting APP1

Comment 67 Víctor Manuel Jáquez Leal 2010-11-23 08:42:48 UTC

Created attachment 175095 [details] [review]
jpegparse: log id when skipping an unhandled APP marker

Comment 68 Víctor Manuel Jáquez Leal 2010-11-23 08:42:55 UTC

Created attachment 175096 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and extract_and_queue_tags ()

Comment 69 Víctor Manuel Jáquez Leal 2010-11-23 08:43:00 UTC

Created attachment 175097 [details] [review]
jpegparse: refactor COM parsing

add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
validate comment format

Comment 70 Víctor Manuel Jáquez Leal 2010-11-23 08:44:51 UTC

Patches Rebased because of the changes in commit 7622328aab0aea21841a50e1fa4ab98b8d6389fa (jpegparse: Small optimization on tags parsing)

Comment 71 Stefan Sauer (gstreamer, gtkdoc dev) 2011-04-07 12:30:00 UTC

Review of attachment 175095 [details] [review]:

::: gst/jpegformat/gstjpegparse.c
@@ +518,3 @@
+      return FALSE;
+  }
+

can we write the whole thing as this to avoid unused variable warnings when debug logging is off.
#ifndef GST_DISABLE_DEBUG
if (marker >= APP0 && marker <= APP15) {
  const gchar *id_str = NULL;
  if (!gst_byte_reader_peek_string_utf8 (reader, &id_str))
    return FALSE;
  GST_LOG_OBJECT (parse, "unhandled marker %x: '%s' skiping %u bytes",
    marker, id_str, size);
} else
#else
  GST_LOG_OBJECT (parse, "unhandled marker %x skiping %u bytes", marker,
    size);
#endif

Comment 72 Stefan Sauer (gstreamer, gtkdoc dev) 2011-04-07 12:31:18 UTC

Sorrz victor, this has been fallen out of focus. Could you please take care of my last comment and rebase the commits. I will push them on the next occasion.

Comment 73 Víctor Manuel Jáquez Leal 2011-04-07 13:14:33 UTC

(In reply to comment #72)
> Sorrz victor, this has been fallen out of focus. Could you please take care of
> my last comment and rebase the commits. I will push them on the next occasion.

Thanks Stefan. I'll allocate a slot this weekend to work in this task and I'll resubmit the patches.

Comment 74 Víctor Manuel Jáquez Leal 2011-04-10 18:07:59 UTC

Created attachment 185654 [details] [review]
jpegparse: add get_tag_list ()

Comment 75 Víctor Manuel Jáquez Leal 2011-04-10 18:08:18 UTC

Created attachment 185655 [details] [review]
jpegparse: skip all APP markers, excepting APP1

Comment 76 Víctor Manuel Jáquez Leal 2011-04-10 18:08:31 UTC

Created attachment 185656 [details] [review]
jpegparse: log id when skipping an unhandled APP marker

Comment 77 Víctor Manuel Jáquez Leal 2011-04-10 18:08:47 UTC

Created attachment 185657 [details] [review]
jpegparse: refactor APP1 parsing

add gst_jpeg_parse_app1 () and extract_and_queue_tags ()

Comment 78 Víctor Manuel Jáquez Leal 2011-04-10 18:09:01 UTC

Created attachment 185658 [details] [review]
jpegparse: refactor COM parsing

add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
validate comment format

Comment 79 Víctor Manuel Jáquez Leal 2011-04-10 18:09:16 UTC

Created attachment 185659 [details] [review]
jpegparse: add gst_jpeg_parse_remove_marker()

This function will remove the whole marker from the buffer.

Also we set it as the default behavior for marker JPG{0-13}? in order to avoid
a useless #if

Comment 80 Stefan Sauer (gstreamer, gtkdoc dev) 2011-04-11 09:41:24 UTC

Patches apply, tests pass. Pushed.

commit 824364d152dbac2039e2bf39bbcdb25434297fc7
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Sun Apr 10 19:53:35 2011 +0200

    jpegparse: add gst_jpeg_parse_remove_marker()
    
    This function will remove the whole marker from the buffer.
    
    Also we set it as the default behavior for marker JPG{0-13}? in order to avoid
    a useless #if
    
    https://bugzilla.gnome.org/show_bug.cgi?id=626618

commit 0e27f49dd42623fe4f987381005a681389fd04ca
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Fri Aug 13 12:38:02 2010 +0200

    jpegparse: refactor COM parsing
    
    add gst_jpeg_parse_com () and get_utf8_from_data () to extract and
    validate comment format
    
    https://bugzilla.gnome.org/show_bug.cgi?id=626618

commit c9b2c4abfe5b2d367007ad17ac1c44482ca7804a
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Tue Nov 16 18:22:07 2010 +0100

    jpegparse: refactor APP1 parsing
    
    add gst_jpeg_parse_app1 () and extract_and_queue_tags ()
    
    https://bugzilla.gnome.org/show_bug.cgi?id=626618

commit 81991e3323c2e2a84835017e6437ac037d66a367
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Sun Nov 21 15:05:43 2010 +0100

    jpegparse: log id when skipping an unhandled APP marker
    
    https://bugzilla.gnome.org/show_bug.cgi?id=626618

commit a1f77eda32e7f981ee84d729d69f086c47acec50
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Tue Nov 16 17:47:17 2010 +0100

    jpegparse: skip all APP markers, excepting APP1
    
    https://bugzilla.gnome.org/show_bug.cgi?id=626618

commit 51e57d439e80e8b8e3cad98c9609df7817fa55ce
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Sun Nov 21 15:09:17 2010 +0100

    jpegparse: add get_tag_list ()
    
    https://bugzilla.gnome.org/show_bug.cgi?id=626618

Comment 81 Stefan Sauer (gstreamer, gtkdoc dev) 2011-04-11 15:16:46 UTC

I think the last patches broke the camerabin2 unit tests. jpegparse is not fully parsing the whole jpeg anymore. I will continue debuging this now :/

Comment 82 Stefan Sauer (gstreamer, gtkdoc dev) 2011-04-11 15:20:21 UTC

Some info - the first e1 marker is read and then we're stuck in some data :/

gst_jpeg_parse_get_image_length: Parsing jpeg image data (4096 bytes)
gst_jpeg_parse_get_image_length: Parse state: offset=0, resync=0, entropy len=0
gst_jpeg_parse_get_image_length: 0x00000002: tag e1, frame_len=306
gst_jpeg_parse_get_image_length: 0x00000136: tag e1, frame_len=3081
gst_jpeg_parse_get_image_length: 0x00000d41: tag ee, frame_len=14
gst_jpeg_parse_get_image_length: 0x00000d51: tag fe, frame_len=8
gst_jpeg_parse_get_image_length: 0x00000d5b: tag db, frame_len=67
gst_jpeg_parse_get_image_length: 0x00000da0: tag c0, frame_len=17
gst_jpeg_parse_get_image_length: 0x00000db3: tag c4, frame_len=31
gst_jpeg_parse_get_image_length: 0x00000dd4: tag c4, frame_len=181
gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12
gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length
gst_jpeg_parse_get_image_length: Parsing jpeg image data (8192 bytes)
gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=357
gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12
gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length
gst_jpeg_parse_get_image_length: Parsing jpeg image data (12288 bytes)
gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=4453
gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12
gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length
gst_jpeg_parse_get_image_length: Parsing jpeg image data (16384 bytes)
gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=8549
gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12
gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length
gst_jpeg_parse_get_image_length: Parsing jpeg image data (20480 bytes)
gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=12645
gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12
gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length
gst_jpeg_parse_get_image_length: Parsing jpeg image data (22653 bytes)
gst_jpeg_parse_get_image_length: Parse state: offset=3721, resync=0, entropy len=16741
gst_jpeg_parse_get_image_length: 0x00000e8b: tag da, frame_len=12
gst_jpeg_parse_get_image_length: 0x00000e8b: finding entropy segment length
gst_jpeg_parse_get_image_length: entropy segment length=18914 => frame_len=18926
gst_jpeg_parse_get_image_length: 0x0000587b: EOI marker
gst_jpeg_parse_chain:<jpegparse0> parsed image of size 22653
gst_jpeg_parse_read_header:<jpegparse0> marker = d8
gst_jpeg_parse_read_header:<jpegparse0> next marker = ff
gst_jpeg_parse_read_header:<jpegparse0> marker = e1
extract_and_queue_tags:<jpegparse0> collected tags: taglist, description=(string)"some\ description", device-manufacturer=(string)MyFavoriteBrand, device-model=(string)123v42.1, application-name=(string)"camerabin2\ test", copyright=(string)"My\ copyright\ notice", geo-location-latitude=(double)36.600000000000001, geo-location-longitude=(double)-12.5, geo-location-elevation=(double)300.85000000000002;
gst_jpeg_parse_app1:<jpegparse0> parsed marker e1: 'Exif' 304 bytes
gst_jpeg_parse_read_header:<jpegparse0> next marker = 74
gst_jpeg_parse_set_new_caps:<jpegparse0> setting caps on srcpad (hdr_ok=0, have_fps=0)
gst_jpeg_parse_set_new_caps:<jpegparse0> setting downstream caps on jpegparse0:src to image/jpeg, parsed=(boolean)true, framerate=(fraction)1/1
gst_jpeg_parse_push_buffer:<jpegparse0> Pushing tags: taglist, description=(string)"some\ description", device-manufacturer=(string)MyFavoriteBrand, device-model=(string)123v42.1, application-name=(string)"camerabin2\ test", copyright=(string)"My\ copyright\ notice", geo-location-latitude=(double)36.600000000000001, geo-location-longitude=(double)-12.5, geo-location-elevation=(double)300.85000000000002;
gst_jpeg_parse_push_buffer:<jpegparse0> pushing buffer (ts=0:00:00.000000000, len=22653)

Comment 83 Stefan Sauer (gstreamer, gtkdoc dev) 2011-04-11 15:36:23 UTC

"jpegparse: refactor APP1 parsing" was the culprit. Victor, if you can think of a unit-test to ensure this does not regresses, that be great :)

commit b84dd0a766ab4098e64ddc591c3c9031a313afe8
Author: Stefan Kost <ensonic@users.sf.net>
Date:   Mon Apr 11 18:31:45 2011 +0300

    jpegparse: subtract id-str size from the remaining read
    
    Fixes a regression from the patches in bug #626618.

Comment 84 Víctor Manuel Jáquez Leal 2011-04-12 08:07:51 UTC

Thanks Stefan. I'll try to cook a unit test this weekend again :)