GNOME Bugzilla – Bug 702722
opencv: add GrabCut segmentation element
Last modified: 2013-07-17 09:29:33 UTC
Created attachment 247303 [details] [review] 1st full gstgrabcut element patch && facedetect sends downstream events This element is a wrapper around OpenCV grabcut implementation. GrabCut is an image segmentation method based on graph cuts technique. It can be seen as a way of fine-grain segmenting the image from some FG and BG "seed" areas. The OpenCV implementation follows the article [1]. The "seed" areas are taken in this element from either an input bounding box coming from a face detection, or from alpha channel values. The input box is taken from a "face" event such as the one generated from the 'facedetect' element. The Alpha channel values should be one of the following: enum{ GC_BGD = 0, //!< background GC_FGD = 1, //!< foreground GC_PR_BGD = 2, //!< most probably background GC_PR_FGD = 3 //!< most probably foreground }; with values over GC_PR_FGD interpreted as GC_PR_FGD. If both foreground alpha is not specified and there is no face detection, a by-default one is taken, roughly corresponding to a face in the middle of the frame. [1] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: Interactive foreground extraction using iterated graph cuts, ACM Trans. Graph., vol. 23, pp. 309–314, 2004. --------------- Also a small modification in gstfacedetect to send the detected face bounding box also via downstream events, and not only to the message bus.
Ping.
Ping ... Seems that Sebastian is on holidays and all others are too busy on other code areas :'(
Review of attachment 247303 [details] [review]: ::: ext/opencv/gstfacedetect.c @@ +674,3 @@ + GstEvent *ev; + GstStructure *str; + str = gst_structure_new ("face", Prefix the structure with the element name at least, GstOpenCVFaceDetect-face or something like that. Also this should probably be a GstMeta on the buffer instead of an event ::: ext/opencv/gstgrabcut.cpp @@ +181,3 @@ +IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \ +downstream event is used to create a bbox of PR_FG elements.\n\ +IF nothing is present, then a by default bbox is taken.", "Miguel Casas-Sanchez <miguelecasassanchez@gmail.com>"); Put this into the documentation, not here @@ +276,3 @@ + grabcut->facepos.y = 77; + grabcut->facepos.width = 60; + grabcut->facepos.height = 70; What are these magic number? :) @@ +334,3 @@ + if (abs (w) > 2) + grabcut->facepos.width = (int) w; + if (abs (h) > 2) Why these checks for >2? @@ +377,3 @@ + } + + ////////////////////////////////////////////////////////////////////////////// No C++/C99 comments please, use /* */ @@ +422,3 @@ + + +// copied, otherwise only available in C++ Can't you just use the C++ variants of these? Just make the file a C++ file :)
(In reply to comment #3) > Review of attachment 247303 [details] [review]: > > ::: ext/opencv/gstfacedetect.c > @@ +674,3 @@ > + GstEvent *ev; > + GstStructure *str; > + str = gst_structure_new ("face", > > Prefix the structure with the element name at least, GstOpenCVFaceDetect-face > or something like that. > Done. > Also this should probably be a GstMeta on the buffer instead of an event Not necessarily, face detect element could output a face position event at a lower frame rate than the data stream... But I'd follow you on this. > > ::: ext/opencv/gstgrabcut.cpp > @@ +181,3 @@ > +IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \ > +downstream event is used to create a bbox of PR_FG elements.\n\ > +IF nothing is present, then a by default bbox is taken.", "Miguel > Casas-Sanchez <miguelecasassanchez@gmail.com>"); > > Put this into the documentation, not here Done. > > @@ +276,3 @@ > + grabcut->facepos.y = 77; > + grabcut->facepos.width = 60; > + grabcut->facepos.height = 70; > > What are these magic number? :) They are the magic number of a person in a video conference environment looking at a laptop's front camera. Is just a failsafe option to avoid empty FG masks, see next comment. > > @@ +334,3 @@ > + if (abs (w) > 2) > + grabcut->facepos.width = (int) w; > + if (abs (h) > 2) > > Why these checks for >2? > In case of misconfigured face detectors, some faces are spurious and would have widths/heights of 1. This is to filter those out, GrabCut doesn't like too small seeds... (there's an ASSERT somewhere). > @@ +377,3 @@ > + } > + > + > ////////////////////////////////////////////////////////////////////////////// > > No C++/C99 comments please, use /* */ Done. (Keeps coming, sorry) > > @@ +422,3 @@ > + > + > +// copied, otherwise only available in C++ > > Can't you just use the C++ variants of these? Just make the file a C++ file :) Yes, actually I can!
Created attachment 248736 [details] [review] 2nd full gstgrabcut element patch && facedetect sends downstream events Addressed comments.
(In reply to comment #4) > > @@ +276,3 @@ > > + grabcut->facepos.y = 77; > > + grabcut->facepos.width = 60; > > + grabcut->facepos.height = 70; > > > > What are these magic number? :) > > They are the magic number of a person in a video conference environment looking > at a laptop's front camera. Is just a failsafe option to avoid empty FG masks, > see next comment. Can't you just check later if you received (valid) facepos before cutting? > > @@ +334,3 @@ > > + if (abs (w) > 2) > > + grabcut->facepos.width = (int) w; > > + if (abs (h) > 2) > > > > Why these checks for >2? > > > > In case of misconfigured face detectors, some faces are spurious and would have > widths/heights of 1. This is to filter those out, GrabCut doesn't like too > small seeds... (there's an ASSERT somewhere). Same here :)
On second thoughts after your comments, I removed all this by default numbers, and moved the check for bounding box correctness before calling grabcut (to avoid that assert mentioned before). Also updated the documentation and the comments on top of the file, hence now the idea would be to use alpha mask in, if not present bounding box, if not just skip the frame and show a warning. The original by default values made sense for certain applications, but to be generic here would not relevant, the warning+documentation should be enough. Patch coming!
Created attachment 248843 [details] [review] 3rd full gstgrabcut element patch && facedetect sends downstream events Comments addressed, see my reply above.
Review of attachment 248843 [details] [review]: ::: ext/opencv/gstfacedetect.c @@ +681,3 @@ + ev = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, str); + gst_pad_push_event (GST_BASE_TRANSFORM_SRC_PAD (GST_BASE_TRANSFORM_CAST + (filter)), ev); I still believe this should be a GstMeta, attached to any buffer to which it is relevant... potentially many per buffer if there are many faces :) Why do you use doubles for the coordinates? Can they be fractional? ::: ext/opencv/gstgrabcut.cpp @@ +182,3 @@ +IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \ +downstream event is used to create a bbox of PR_FG elements.\n\ +IF nothing is present, then nothing is done.", "Miguel Casas-Sanchez <miguelecasassanchez@gmail.com>"); Please move this NOTE in the documentation still @@ +364,3 @@ + } else { + + if ( (abs (gc->facepos.x) > 2) && (abs (gc->facepos.y) > 2) I can understand the checks for the width/height of the facepos, but why must x and y be > 2? Can't a face start at position (0,0)? @@ +368,3 @@ + GST_INFO ("running on bbox (%d,%d),(%d,%d)", gc->facepos.x, gc->facepos.y, + gc->facepos.width, gc->facepos.height); + run_graphcut_iteration2 (&(gc->GC), gc->cvRGBin, gc->grabcut_mask, Why is the element called grabcut but the algorithm graphcut? :)
(In reply to comment #9) > Review of attachment 248843 [details] [review]: > > ::: ext/opencv/gstfacedetect.c > @@ +681,3 @@ > + ev = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, str); > + gst_pad_push_event (GST_BASE_TRANSFORM_SRC_PAD > (GST_BASE_TRANSFORM_CAST > + (filter)), ev); > > I still believe this should be a GstMeta, attached to any buffer to which it is > relevant... potentially many per buffer if there are many faces :) > Yes but facedetect selects the largest face detected and grabcut does not operate well on non-connected foreground patches. > Why do you use doubles for the coordinates? Can they be fractional? > Face detection gives integer pixels coordinates and sizes. (Sub pixel accuracy is interesting thought hmmm ;) ) > ::: ext/opencv/gstgrabcut.cpp > @@ +182,3 @@ > +IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \ > +downstream event is used to create a bbox of PR_FG elements.\n\ > +IF nothing is present, then nothing is done.", "Miguel Casas-Sanchez > <miguelecasassanchez@gmail.com>"); > > Please move this NOTE in the documentation still > I think I did, see the comments on top of the file: * is used to create a bbox of PR_FG elements. If both foreground alpha * is not specified and there is no face detection, nothing is done. And then the element comments: IN CASE OF no alpha mask input (all 0's or all 1's), the 'face' \ downstream event is used to create a bbox of PR_FG elements.\n\ IF nothing is present, then nothing is done.", ... > @@ +364,3 @@ > + } else { > + > + if ( (abs (gc->facepos.x) > 2) && (abs (gc->facepos.y) > 2) > > I can understand the checks for the width/height of the facepos, but why must x > and y be > 2? Can't a face start at position (0,0)? > Good question, hmmm at the time it made sense, I guess I should run some tests and see if it's still relevant, and if not just check for correct face detection via (width+height > 2) or similar. > @@ +368,3 @@ > + GST_INFO ("running on bbox (%d,%d),(%d,%d)", gc->facepos.x, > gc->facepos.y, > + gc->facepos.width, gc->facepos.height); > + run_graphcut_iteration2 (&(gc->GC), gc->cvRGBin, gc->grabcut_mask, > > Why is the element called grabcut but the algorithm graphcut? :) Grabcut is an implementation of the GraphCut algorithm :) * [1] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: Interactive foreground * extraction using iterated graph cuts, ACM Trans. Graph., vol. 23, pp. 309–314, * 2004.
(In reply to comment #10) > (In reply to comment #9) > > Review of attachment 248843 [details] [review] [details]: > > > > ::: ext/opencv/gstfacedetect.c > > @@ +681,3 @@ > > + ev = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, str); > > + gst_pad_push_event (GST_BASE_TRANSFORM_SRC_PAD > > (GST_BASE_TRANSFORM_CAST > > + (filter)), ev); > > > > I still believe this should be a GstMeta, attached to any buffer to which it is > > relevant... potentially many per buffer if there are many faces :) > > > > Yes but facedetect selects the largest face detected and grabcut does not > operate well on non-connected foreground patches. But that's IMHO an implementation detail that shouldn't be part of the API. grabcut can still just select the largest face from the list it gets, and facedetect can only detect a single face. Other algorithms later could do something more optimal :)
I re implemented as GstMeta, that needed two extra files: gstboundingbox.{h,c}. Tried with crop but videosink reacted to that, hence the new stuff. Since there would be an add_bounding_box_meta per detected face, all would be passed downstream while gstgrabcut only collects the first one (largest by implementation).
Created attachment 248934 [details] [review] 4th full gstgrabcut element patch && facedetect sending meta Added meta infra for boundingbox, used for face detection meta. Grabcut only takes the first meta.
Looks good, I'm just not sure if we might want the meta in libgstvideo as GstVideoRegionOfInterestMeta: guint x, y, width, height; GQuark roi_type; Where roi_type would be "face" in your case. I'll get back to you once I know an answer :)
Yes, let's do it that way. Could you make this two patches, one adding the meta to libgstvideo and another for the opencv parts? :)
(In reply to comment #15) > Yes, let's do it that way. Could you make this two patches, one adding the meta > to libgstvideo and another for the opencv parts? :) Done to the best of my knowledge - let's get that done first.
Let's add a unique ID to the ROI and a parent ID: struct ROI { GstMeta meta gint id; gint parent_id; GQuark roi_type; guint x, y, width, height; } this way we can: - add more meta later to describe the ROI bounding rectangle with ID in more detail - Make a big ROI with sub ROI having the parent_id of the big ROI to make subregions. For arbitrary shapes, I would now advertise the ROI as a bounding rectangle containing the object(s) of interest, for the shape of the actual object(s) we would need another meta to describe them.
Created attachment 249271 [details] [review] 5th full gstgrabcut element patch && facedetect sending meta Rebased to latest version of gst-plugins-base, that defines GstVideoRegionOfInterestMeta. How to hold this commit till the appropriate moment?
Review of attachment 249271 [details] [review]: What do you mean with appropriate moment? ::: ext/opencv/gstgrabcut.cpp @@ +315,3 @@ + gc->facepos.height = meta->h * gc->scale * 1.1; + } else { + memset (&(gc->facepos), 0, sizeof (gc->facepos)); If there's no meta you can just return, right?
(In reply to comment #19) > Review of attachment 249271 [details] [review]: > > What do you mean with appropriate moment? > Yeah right now the autogen.sh/configure specifies a gst-plugins-base version (1.1.2.1?), if this patch lands, it'll fail to compile since it needs the last -base commit (the region of interest meta), hence my question. > ::: ext/opencv/gstgrabcut.cpp > @@ +315,3 @@ > + gc->facepos.height = meta->h * gc->scale * 1.1; > + } else { > + memset (&(gc->facepos), 0, sizeof (gc->facepos)); > > If there's no meta you can just return, right? There might be an input on the alpha channel (allowing for non rectangle roi's...), this is only known whenever the input image has been split in channels and the pixels of channel alpha counted. But: I indeed forgot to return right away when "No face info present, skipping frame" :) So I'll send another patch in a jiffie :)
Created attachment 249273 [details] [review] 6th full gstgrabcut element patch && facedetect sending meta In case of no face input, no alpha channel -> return GST_FLOW_OK
Please attach patches in "git format-patch" style in the future :) commit 7c2177b19f38448150cdb8980a80b8df1c1acd36 Author: Miguel Casas-Sanchez <miguelecasassanchez@gmail.com> Date: Wed Jul 17 11:28:28 2013 +0200 grabcut: Add GrabCut segmentation element https://bugzilla.gnome.org/show_bug.cgi?id=702722