Bug 702722 – opencv: add GrabCut segmentation element

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 702722 - opencv: add GrabCut segmentation element


Summary:	opencv: add GrabCut segmentation element


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	1.x
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	1.1.3
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:	704070
Blocks:

Reported:	2013-06-20 09:32 UTC by Miguel Casas
Modified:	2013-07-17 09:29 UTC

See Also:	704070
GNOME target:	---
GNOME version:	---

Attachments
1st full gstgrabcut element patch && facedetect sends downstream events (22.59 KB, patch) 2013-06-20 09:32 UTC, Miguel Casas	needs-work	Details \| Review
2nd full gstgrabcut element patch && facedetect sends downstream events (22.46 KB, patch) 2013-07-09 14:50 UTC, Miguel Casas	none	Details \| Review
3rd full gstgrabcut element patch && facedetect sends downstream events (22.36 KB, patch) 2013-07-10 15:11 UTC, Miguel Casas	needs-work	Details \| Review
4th full gstgrabcut element patch && facedetect sending meta (26.21 KB, patch) 2013-07-11 15:22 UTC, Miguel Casas	none	Details \| Review
5th full gstgrabcut element patch && facedetect sending meta (21.28 KB, patch) 2013-07-16 13:31 UTC, Miguel Casas	reviewed	Details \| Review
6th full gstgrabcut element patch && facedetect sending meta (21.31 KB, patch) 2013-07-16 13:50 UTC, Miguel Casas	committed	Details \| Review

Description Miguel Casas 2013-06-20 09:32:08 UTC

Created attachment 247303 [details] [review]
1st full gstgrabcut element patch && facedetect sends downstream events

This element is a wrapper around OpenCV grabcut implementation. GrabCut is an
image segmentation method based on graph cuts technique. It can be seen as a
way of fine-grain segmenting the image from some FG and BG "seed" areas. The
OpenCV implementation follows the article [1]. 
The "seed" areas are taken in this element from either an input bounding box
coming from a face detection, or from alpha channel values. The input box is
taken from a "face" event such as the one generated from the 'facedetect' 
element. The Alpha channel values should be one of the following: 
enum{  
 GC_BGD    = 0,  //!< background
 GC_FGD    = 1,  //!< foreground
 GC_PR_BGD = 2,  //!< most probably background
 GC_PR_FGD = 3   //!< most probably foreground
};
with values over GC_PR_FGD interpreted as GC_PR_FGD. If both foreground alpha
is not specified and there is no face detection, a by-default one is taken, 
roughly corresponding to a face in the middle of the frame.

[1] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: Interactive foreground 
extraction using iterated graph cuts, ACM Trans. Graph., vol. 23, pp. 309–314,
2004.

---------------
Also a small modification in gstfacedetect to send the detected face bounding 
box also via downstream events, and not only to the message bus.

Comment 1 Miguel Casas 2013-07-01 08:58:16 UTC

Ping.

Comment 2 Miguel Casas 2013-07-08 15:54:59 UTC

Ping ...
 Seems that Sebastian is on holidays and all others are too busy on other code areas :'(

Comment 3 Sebastian Dröge (slomo) 2013-07-09 09:19:14 UTC

Review of attachment 247303 [details] [review]:

::: ext/opencv/gstfacedetect.c
@@ +674,3 @@
+        GstEvent *ev;
+        GstStructure *str;
+        str = gst_structure_new ("face",

Prefix the structure with the element name at least, GstOpenCVFaceDetect-face or something like that.

Also this should probably be a GstMeta on the buffer instead of an event

::: ext/opencv/gstgrabcut.cpp
@@ +181,3 @@
+IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \
+downstream event is used to create a bbox of PR_FG elements.\n\
+IF nothing is present, then a by default bbox is taken.", "Miguel Casas-Sanchez <miguelecasassanchez@gmail.com>");

Put this into the documentation, not here

@@ +276,3 @@
+  grabcut->facepos.y = 77;
+  grabcut->facepos.width = 60;
+  grabcut->facepos.height = 70;

What are these magic number? :)

@@ +334,3 @@
+        if (abs (w) > 2)
+          grabcut->facepos.width = (int) w;
+        if (abs (h) > 2)

Why these checks for >2?

@@ +377,3 @@
+  }
+
+  //////////////////////////////////////////////////////////////////////////////

No C++/C99 comments please, use /* */

@@ +422,3 @@
+
+
+// copied, otherwise only available in C++

Can't you just use the C++ variants of these? Just make the file a C++ file :)

Comment 4 Miguel Casas 2013-07-09 14:48:33 UTC

(In reply to comment #3)
> Review of attachment 247303 [details] [review]:
> 
> ::: ext/opencv/gstfacedetect.c
> @@ +674,3 @@
> +        GstEvent *ev;
> +        GstStructure *str;
> +        str = gst_structure_new ("face",
> 
> Prefix the structure with the element name at least, GstOpenCVFaceDetect-face
> or something like that.
> 

Done.

> Also this should probably be a GstMeta on the buffer instead of an event

Not necessarily, face detect element could output a face position event at a lower frame rate than the data stream... But I'd follow you on this.

> 
> ::: ext/opencv/gstgrabcut.cpp
> @@ +181,3 @@
> +IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \
> +downstream event is used to create a bbox of PR_FG elements.\n\
> +IF nothing is present, then a by default bbox is taken.", "Miguel
> Casas-Sanchez <miguelecasassanchez@gmail.com>");
> 
> Put this into the documentation, not here

Done.

> 
> @@ +276,3 @@
> +  grabcut->facepos.y = 77;
> +  grabcut->facepos.width = 60;
> +  grabcut->facepos.height = 70;
> 
> What are these magic number? :)

They are the magic number of a person in a video conference environment looking at a laptop's front camera. Is just a failsafe option to avoid empty FG masks, see next comment.

> 
> @@ +334,3 @@
> +        if (abs (w) > 2)
> +          grabcut->facepos.width = (int) w;
> +        if (abs (h) > 2)
> 
> Why these checks for >2?
> 

In case of misconfigured face detectors, some faces are spurious and would have widths/heights of 1. This is to filter those out, GrabCut doesn't like too small seeds... (there's an ASSERT somewhere).

> @@ +377,3 @@
> +  }
> +
> + 
> //////////////////////////////////////////////////////////////////////////////
> 
> No C++/C99 comments please, use /* */

Done. (Keeps coming, sorry)

> 
> @@ +422,3 @@
> +
> +
> +// copied, otherwise only available in C++
> 
> Can't you just use the C++ variants of these? Just make the file a C++ file :)

Yes, actually  I can!

Comment 5 Miguel Casas 2013-07-09 14:50:11 UTC

Created attachment 248736 [details] [review]
2nd full gstgrabcut element patch && facedetect sends downstream events

Addressed comments.

Comment 6 Sebastian Dröge (slomo) 2013-07-10 07:14:46 UTC

(In reply to comment #4)
 
> > @@ +276,3 @@
> > +  grabcut->facepos.y = 77;
> > +  grabcut->facepos.width = 60;
> > +  grabcut->facepos.height = 70;
> > 
> > What are these magic number? :)
> 
> They are the magic number of a person in a video conference environment looking
> at a laptop's front camera. Is just a failsafe option to avoid empty FG masks,
> see next comment.

Can't you just check later if you received (valid) facepos before cutting?

> > @@ +334,3 @@
> > +        if (abs (w) > 2)
> > +          grabcut->facepos.width = (int) w;
> > +        if (abs (h) > 2)
> > 
> > Why these checks for >2?
> > 
> 
> In case of misconfigured face detectors, some faces are spurious and would have
> widths/heights of 1. This is to filter those out, GrabCut doesn't like too
> small seeds... (there's an ASSERT somewhere).

Same here :)

Comment 7 Miguel Casas 2013-07-10 15:10:33 UTC

On second thoughts after your comments, I removed all this by default numbers, and moved the check for bounding box correctness before calling grabcut (to avoid that assert mentioned before). Also updated the documentation and the comments on top of the file, hence now the idea would be to use alpha mask in, if not present bounding box, if not just skip the frame and show a warning. The original by default values made sense for certain applications, but to be generic here would not relevant, the warning+documentation should be enough. Patch coming!

Comment 8 Miguel Casas 2013-07-10 15:11:29 UTC

Created attachment 248843 [details] [review]
3rd full gstgrabcut element patch && facedetect sends downstream events

Comments addressed, see my reply above.

Comment 9 Sebastian Dröge (slomo) 2013-07-11 10:29:39 UTC

Review of attachment 248843 [details] [review]:

::: ext/opencv/gstfacedetect.c
@@ +681,3 @@
+        ev = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, str);
+        gst_pad_push_event (GST_BASE_TRANSFORM_SRC_PAD (GST_BASE_TRANSFORM_CAST
+                (filter)), ev);

I still believe this should be a GstMeta, attached to any buffer to which it is relevant... potentially many per buffer if there are many faces :)

Why do you use doubles for the coordinates? Can they be fractional?

::: ext/opencv/gstgrabcut.cpp
@@ +182,3 @@
+IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \
+downstream event is used to create a bbox of PR_FG elements.\n\
+IF nothing is present, then nothing is done.", "Miguel Casas-Sanchez <miguelecasassanchez@gmail.com>");

Please move this NOTE in the documentation still

@@ +364,3 @@
+  } else {
+
+    if ( (abs (gc->facepos.x) > 2) && (abs (gc->facepos.y) > 2) 

I can understand the checks for the width/height of the facepos, but why must x and y be > 2? Can't a face start at position (0,0)?

@@ +368,3 @@
+      GST_INFO ("running on bbox (%d,%d),(%d,%d)", gc->facepos.x, gc->facepos.y,
+		gc->facepos.width, gc->facepos.height);
+      run_graphcut_iteration2 (&(gc->GC), gc->cvRGBin, gc->grabcut_mask,

Why is the element called grabcut but the algorithm graphcut? :)

Comment 10 Miguel Casas 2013-07-11 10:36:17 UTC

(In reply to comment #9)
> Review of attachment 248843 [details] [review]:
> 
> ::: ext/opencv/gstfacedetect.c
> @@ +681,3 @@
> +        ev = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, str);
> +        gst_pad_push_event (GST_BASE_TRANSFORM_SRC_PAD
> (GST_BASE_TRANSFORM_CAST
> +                (filter)), ev);
> 
> I still believe this should be a GstMeta, attached to any buffer to which it is
> relevant... potentially many per buffer if there are many faces :)
> 

Yes but facedetect selects the largest face detected and grabcut does not operate well on non-connected foreground patches.

> Why do you use doubles for the coordinates? Can they be fractional?
> 

Face detection gives integer pixels coordinates and sizes. (Sub pixel accuracy is interesting thought hmmm ;) )

> ::: ext/opencv/gstgrabcut.cpp
> @@ +182,3 @@
> +IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \
> +downstream event is used to create a bbox of PR_FG elements.\n\
> +IF nothing is present, then nothing is done.", "Miguel Casas-Sanchez
> <miguelecasassanchez@gmail.com>");
> 
> Please move this NOTE in the documentation still
> 

I think I did, see the comments on top of the file:
 * is used to create a bbox of PR_FG elements. If both foreground alpha
 * is not specified and there is no face detection, nothing is done.
And then the element comments:
 IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \
 downstream event is used to create a bbox of PR_FG elements.\n\
 IF nothing is present, then nothing is done.", ...

> @@ +364,3 @@
> +  } else {
> +
> +    if ( (abs (gc->facepos.x) > 2) && (abs (gc->facepos.y) > 2) 
> 
> I can understand the checks for the width/height of the facepos, but why must x
> and y be > 2? Can't a face start at position (0,0)?
> 

Good question, hmmm at the time it made sense, I guess I should run some tests and see if it's still relevant, and if not  just check for correct face detection via (width+height > 2) or similar.

> @@ +368,3 @@
> +      GST_INFO ("running on bbox (%d,%d),(%d,%d)", gc->facepos.x,
> gc->facepos.y,
> +        gc->facepos.width, gc->facepos.height);
> +      run_graphcut_iteration2 (&(gc->GC), gc->cvRGBin, gc->grabcut_mask,
> 
> Why is the element called grabcut but the algorithm graphcut? :)

Grabcut is an implementation of the GraphCut algorithm :)

 * [1] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: Interactive foreground 
 * extraction using iterated graph cuts, ACM Trans. Graph., vol. 23, pp. 309–314,
 * 2004.

Comment 11 Sebastian Dröge (slomo) 2013-07-11 10:54:52 UTC

(In reply to comment #10)
> (In reply to comment #9)
> > Review of attachment 248843 [details] [review] [details]:
> > 
> > ::: ext/opencv/gstfacedetect.c
> > @@ +681,3 @@
> > +        ev = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, str);
> > +        gst_pad_push_event (GST_BASE_TRANSFORM_SRC_PAD
> > (GST_BASE_TRANSFORM_CAST
> > +                (filter)), ev);
> > 
> > I still believe this should be a GstMeta, attached to any buffer to which it is
> > relevant... potentially many per buffer if there are many faces :)
> > 
> 
> Yes but facedetect selects the largest face detected and grabcut does not
> operate well on non-connected foreground patches.

But that's IMHO an implementation detail that shouldn't be part of the API. grabcut can still just select the largest face from the list it gets, and facedetect can only detect a single face. Other algorithms later could do something more optimal :)

Comment 12 Miguel Casas 2013-07-11 15:22:02 UTC

I re implemented as GstMeta, that needed two extra files: gstboundingbox.{h,c}. Tried with crop but videosink reacted to that, hence the new stuff.

Since there would be an add_bounding_box_meta per detected face, all would be passed downstream while gstgrabcut only collects the first one (largest by implementation).

Comment 13 Miguel Casas 2013-07-11 15:22:51 UTC

Created attachment 248934 [details] [review]
4th full gstgrabcut element patch && facedetect sending meta

Added meta infra for boundingbox, used for face detection meta. Grabcut only takes the first meta.

Comment 14 Sebastian Dröge (slomo) 2013-07-12 07:22:47 UTC

Looks good, I'm just not sure if we might want the meta in libgstvideo as GstVideoRegionOfInterestMeta:

guint x, y, width, height;
GQuark roi_type;


Where roi_type would be "face" in your case. I'll get back to you once I know an answer :)

Comment 15 Sebastian Dröge (slomo) 2013-07-12 07:40:44 UTC

Yes, let's do it that way. Could you make this two patches, one adding the meta to libgstvideo and another for the opencv parts? :)

Comment 16 Miguel Casas 2013-07-12 08:39:33 UTC

(In reply to comment #15)
> Yes, let's do it that way. Could you make this two patches, one adding the meta
> to libgstvideo and another for the opencv parts? :)

Done to the best of my knowledge - let's get that done first.

Comment 17 Wim Taymans 2013-07-12 09:45:17 UTC

Let's add a unique ID to the ROI and a parent ID:

struct ROI {
  GstMeta meta

  gint id;
  gint parent_id;
  GQuark roi_type;
  guint x, y, width, height;
}

this way we can:

 - add more meta later to describe the ROI bounding rectangle with ID in more
   detail
 - Make a big ROI with sub ROI having the parent_id of the big ROI to make
   subregions.

For arbitrary shapes, I would now advertise the ROI as a bounding rectangle containing the object(s) of interest, for the shape of the actual object(s) we would need another meta to describe them.

Comment 18 Miguel Casas 2013-07-16 13:31:35 UTC

Created attachment 249271 [details] [review]
5th full gstgrabcut element patch && facedetect sending meta 

Rebased to latest version of gst-plugins-base, that defines GstVideoRegionOfInterestMeta.

How to hold this commit till the appropriate moment?

Comment 19 Sebastian Dröge (slomo) 2013-07-16 13:39:39 UTC

Review of attachment 249271 [details] [review]:

What do you mean with appropriate moment?

::: ext/opencv/gstgrabcut.cpp
@@ +315,3 @@
+    gc->facepos.height = meta->h * gc->scale * 1.1;
+  } else {
+    memset (&(gc->facepos), 0, sizeof (gc->facepos));

If there's no meta you can just return, right?

Comment 20 Miguel Casas 2013-07-16 13:46:34 UTC

(In reply to comment #19)
> Review of attachment 249271 [details] [review]:
> 
> What do you mean with appropriate moment?
> 

Yeah right now the autogen.sh/configure specifies a gst-plugins-base version (1.1.2.1?), if this patch lands, it'll fail to compile since it needs the last -base commit (the region of interest meta), hence my question.

> ::: ext/opencv/gstgrabcut.cpp
> @@ +315,3 @@
> +    gc->facepos.height = meta->h * gc->scale * 1.1;
> +  } else {
> +    memset (&(gc->facepos), 0, sizeof (gc->facepos));
> 
> If there's no meta you can just return, right?

There might be an input on the alpha channel (allowing for non rectangle roi's...), this is only known whenever the input image has been split in channels and the pixels of channel alpha counted.

But: I indeed forgot to return right away when "No face info present, skipping frame" :) So I'll send another patch in a jiffie :)

Comment 21 Miguel Casas 2013-07-16 13:50:10 UTC

Created attachment 249273 [details] [review]
6th full gstgrabcut element patch && facedetect sending meta

In case of no face input, no alpha channel -> return GST_FLOW_OK

Comment 22 Sebastian Dröge (slomo) 2013-07-17 09:29:30 UTC

Please attach patches in "git format-patch" style in the future :)

commit 7c2177b19f38448150cdb8980a80b8df1c1acd36
Author: Miguel Casas-Sanchez <miguelecasassanchez@gmail.com>
Date:   Wed Jul 17 11:28:28 2013 +0200

    grabcut: Add GrabCut segmentation element
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702722