GNOME Bugzilla – Bug 747216
applemedia: implement GstAppleCoreVideoMemory
Last modified: 2018-11-03 13:32:44 UTC
In gst_core_media_buffer_wrap_pixel_buffer we're performing CVPixelBufferLockBaseAddress. The Apple documentation says: "When accessing pixel data with the GPU, locking is not necessary and can impair performance."

In pipelines where the CVPixelBuffer will be passed from one Apple element to another, this might harm performance, as we might be locking the pixel buffer into CPU memory needlessly.

It seems possible to solve this with a custom GstAllocator that stores the CVPixelBuffer and locks only upon mapping.
Yes, the solution would be to replace the current GstMeta solution (IMHO hack) with a custom GstAllocator and GstMemory... and then do locking in map().
Should I try and implement it?
Created attachment 300861 [details] [review] applemedia: Implement GstCoreVideoMemory

First implementation. Feedback wanted.

SHORTCOMINGS
* Locking was not implemented. Can gst_memory_lock be used? I see that GstGLMemory uses a separate mutex. Why?
* Code reuse: gst_core_media_buffer_wrap_pixel_buffer and gst_core_video_buffer_new can probably be unified into a helper function in corevideomemory.
* Performance didn't seem to improve noticeably :(
* mem_share not implemented. This is bad, right? Isn't this what kills intervideo* performance?
* "GstCoreVideoPlaneMemory.cvmem" looks oddly similar to "GstMemory.parent", in the sense that the plane memories are supposedly just pointers into the bigger GstCoreVideoMemory. Can we switch to using "GstMemory.parent"? How will that affect sharing, if we choose to implement it?
Please review recent patches in https://github.com/ikonst/gst-plugins-bad/commits/master: * applemedia: implement GstCoreVideoMemory * applemedia: implement copying of meta The latter patch is precursor for solving bug 747707.
^^ please see most recent patches on github repo above, not the links I posted on IRC :)
As Alessandro requested, in a separate branch: https://github.com/ikonst/gst-plugins-bad/tree/corevideomemory
Maybe this code should come with a test suite.
I've added a unit test for avfvideosrc which just tests the corevideomemory part. Surprisingly, no bugs were uncovered, but I've added some debug prints. https://github.com/ikonst/gst-plugins-bad/tree/corevideomemory
gst_core_media_buffer_wrap_block_buffer
TBD: There's gst_core_media_buffer_wrap_block_buffer, which (probably) exists for wrapping the audio data of CMSampleBuffers (coming from avfassetsrc). It uses GST_MEMORY_FLAG_NO_SHARE, which is bad. Luckily, we can do without a custom memory type here -- just change:

  gst_memory_new_wrapped (GST_MEMORY_FLAG_NO_SHARE, data, size, 0, size, NULL, NULL);

to:

  gst_memory_new_wrapped (0, data, size, 0, size, CFRetain (block_buf), (GDestroyNotify) CFRelease);
Two more patches:

* applemedia: CMBlockBuffer can be non-contiguous
  CMBlockBufferGetDataLength would return the entire data length, while the size of individual blocks can be smaller. Iterate over the block buffer and add the individual (possibly non-contiguous) memory blocks.

* applemedia: enable sharing of CMBlockBuffer data
  Instead of wrapping with GST_MEMORY_FLAG_NO_SHARE, we make the GstMemory object retain the underlying CMBlockBuffer.

https://github.com/ikonst/gst-plugins-bad/tree/corevideomemory
In my patch, I wrap the CVPixelBufferRef in a GstCoreVideoPixelBuffer. One might ask why I can't simply retain the CVPixelBufferRef? The reasons are basically: a) planar video, b) CVPixelBuffer(Un)LockBaseAddress.

Remember that for planar video, each GstCoreVideoMemory can represent a separate plane. While they're locking the same pixel buffer, they're getting different base addresses through CVPixelBufferGetBaseAddressOfPlane. It comes to mind that GstMemory has a hierarchical design, where planes can be offsets into a bigger block and have the common GstMemory as the 'parent'. But nothing in the Apple docs guarantees that "planes" are offsets into a bigger common block. (We could test this empirically, but I didn't.)

Onwards, to the details:

1) CVPixelBufferLockBaseAddress only decides upon the locking mode upon the first call. This was established through testing; the Apple docs do not elaborate :( When the buffer is locked, additional locks only increase a counter, but are otherwise no-ops. Even error checking is not done -- i.e.

  status = CVPixelBufferLockBaseAddress (pixbuf, kCVPixelBufferLock_ReadOnly);
  assert (status == noErr);
  status = CVPixelBufferLockBaseAddress (pixbuf, 0); /* r/w */
  assert (status == noErr);

CVPixelBufferUnlockBaseAddress should be called a matching number of times, of course. All calls except for the last are also no-ops. By keeping the lock status, I can refuse to r/w-lock a pixbuf that was previously r/o-locked (instead of blindly ignoring the error).

2) mem_unmap does not get flags, but CVPixelBufferUnlockBaseAddress wants the original lockFlags. I could theoretically make the GstCoreVideoMemory (e.g. for each plane) store its own flags, but again, what about:

1) first plane is mapped with GST_MAP_READ
2) second plane is mapped with GST_MAP_READWRITE --> no-op, since the pixbuf is already locked
3) first plane is unmapped --> CVPixelBufferUnlockBaseAddress (pixbuf, kCVPixelBufferLock_ReadOnly) is a no-op
4) second plane is unmapped --> CVPixelBufferUnlockBaseAddress (pixbuf, 0) is a CV error, since we're unlocking a ReadOnly-locked pixbuf with ReadWrite lockFlags

So yeah, I'd really rather not wrap CVPixelBuffer in another layer of refcounting, but I didn't feel like I had a choice if I wanted it to be solid.
Just updated repo with some minor gtk-docs tweaks.
Created attachment 304248 [details] [review] applemedia: implement GstCoreVideoMemory
Created attachment 304249 [details] [review] applemedia: implement copying of meta
Created attachment 304250 [details] [review] tests: applemedia: add tests for GstCoreVideoMemory
Created attachment 304251 [details] [review] applemedia: CMBlockBuffer can be non-contiguous
Created attachment 304252 [details] [review] applemedia: enable sharing of CMBlockBuffer data
For convenience and history, I've attached the patches from my github here. In any case, those are the same patches.
Created attachment 304254 [details] [review] applemedia: implement GstCoreVideoMemory Spelling fix: its -> it's
I must say, I really dislike the use of the unspecific GstCoreVideoMemory name.
"Core Video" is the name of an Apple technology, just as Microsoft calls its technologies "DirectX". Would you rather I call it GstAppleCoreVideoMemory? BTW, the follow-up patch is to split applemedia into a plug-in and a library. Sebastian Dröge actually proposed "libcoremedia", while I thought that rather unspecific, opting for "libapplemedia" instead.
Comment on attachment 304251 [details] [review] applemedia: CMBlockBuffer can be non-contiguous Unrelated, so moved to new bug 751071.
Created attachment 305423 [details] [review] [1/3] applemedia: implement GstAppleCoreVideoMemory Added Apple to all newly-introduced symbols, types and defines.
Created attachment 305424 [details] [review] [2/3] applemedia: implement copying of meta Apple-fication.
Created attachment 305425 [details] [review] [3/3] tests: applemedia: add tests for GstAppleCoreVideoMemory Apple-fication.
Comment on attachment 304252 [details] [review] applemedia: enable sharing of CMBlockBuffer data Unrelated, moved to bug 751072.
Created attachment 305435 [details] [review] [1/3] applemedia: implement GstAppleCoreVideoMemory Again, its -> it's :)
Review of attachment 305435 [details] [review]:

You'll see a few comments below. I didn't dig deeper into the code yet, but I have the impression that you duplicate some of the stuff we already implemented within GStreamer. There is a strange double wrapping of the CVPixelBuffer, while we designed GstMemory and GstMiniObject specifically so they can be a direct wrapper for what you need. I also think you skipped an important step: deriving GstAllocator. That would remove the weird init() spread across the code, and following this API would provide downstream elements with an API to detect whether a GstMemory is a GstAppleCoreVideoMemory.

::: sys/applemedia/corevideobuffer.c
@@ +69,1 @@
+  gst_apple_core_video_memory_init ();

Is that some kind of library init? It seems strange to have to call this randomly.

::: sys/applemedia/corevideomemory.h
@@ +32,3 @@
+  GST_APPLE_CORE_VIDEO_MEMORY_LOCKED_READONLY,
+  GST_APPLE_CORE_VIDEO_MEMORY_LOCKED_READ_WRITE
+} GstAppleCoreVideoLockState;

Isn't this duplicating GstLockFlags?

@@ +53,3 @@
+   * or for reading and writing, as reflected in @lock_state. */
+  guint lock_count;
+} GstAppleCoreVideoPixelBuffer;

Why not implement a GstMiniObject, which already handles this locking?

@@ +60,3 @@
+ * Indicates a non-planar pixel buffer.
+ */
+#define GST_APPLE_CORE_VIDEO_NO_PLANE ((size_t)-1)

"Non-planar" and NO_PLANE don't sound quite the same to me; can we use a better name? (I could propose GST_APPLE_CORE_VIDEO_HAS_ONE_PLANE.)

@@ +75,3 @@
+  GstAppleCoreVideoPixelBuffer *gpixbuf;
+  size_t plane;
+} GstAppleCoreVideoMemory;

In fact, I don't understand why this double wrapping.
(In reply to Nicolas Dufresne (stormer) from comment #29)
> +  gst_apple_core_video_memory_init ();

I'd have a look at how other allocators work. I copied it from GstGLMemory which, I understand, is not the most shining example of how to do things.

> > +  GST_APPLE_CORE_VIDEO_MEMORY_LOCKED_READONLY,
> > +  GST_APPLE_CORE_VIDEO_MEMORY_LOCKED_READ_WRITE
> > +} GstAppleCoreVideoLockState;
>
> Isn't this duplicating GstLockFlags ?

As comment 12 explains, locking is per-whole-CVPixelBuffer while individual Gst*Memory objects are per-plane. E.g.:

1. Plane 1 map -> lock the whole CVPixelBuffer
2. Plane 2 map -> CVPixelBuffer already locked
3. Plane 1 unmap -> shouldn't unlock CVPixelBuffer yet
4. Plane 2 unmap -> unlock CVPixelBuffer

Therefore, I must remember, per-whole-CVPixelBuffer, whether it's locked and how.

> @@ +53,3 @@
> +   * or for reading and writing, as reflected in @lock_state. */
> +  guint lock_count;
> +} GstAppleCoreVideoPixelBuffer;
>
> Why not implementing a GstMiniObject, which already handle the this locking ?

If GstMiniObject maps cleanly to what I'm doing here, then sure, why not.

> @@ +60,3 @@
> + * Indicates a non-planar pixel buffer.
> + */
> +#define GST_APPLE_CORE_VIDEO_NO_PLANE ((size_t)-1)
>
> non-planar and NO_PLANE does not fully sound the same to me, can we use a
> better name ? (I could propose GST_APPLE_CORE_VIDEO_HAS_ONE_PLANE)

It is passed as an argument when a plane is expected, i.e.

  gst_apple_core_video_memory_new_wrapped (gpixbuf, GST_APPLE_CORE_VIDEO_NO_PLANE, size)

It's like:
Q - Excuse me, what plane are you passing me? 1st? 2nd?
A - I'm passing you a special plane -- a "no-plane".

Also, ..._HAS_ONE_PLANE != non-planar.

> @@ +75,3 @@
> +  GstAppleCoreVideoPixelBuffer *gpixbuf;
> +  size_t plane;
> +} GstAppleCoreVideoMemory;
>
> In fact, I don't understand why this double wrapping.

Because Gst*Memory is per-plane, see comment 12.
(In reply to Ilya Konstantinov from comment #30)
> I'd have a look at how other allocators work. I copied it from GstGLMemory
> which, I understand, is not the most shining example of how to do things.
>
> As comment 12 explains, locking is per-whole-CVPixelBuffer while individual
> Gst*Memory objects are per-plane.
>
> Therefore, I must remember, per-whole-CVPixelBuffer, whether it's locked and
> how.
>
> > Why not implementing a GstMiniObject, which already handle the this locking ?
>
> If GstMiniObject maps cleanly to what I'm doing here, then sure, why not.

Ok, let me know.

> > non-planar and NO_PLANE does not fully sound the same to me, can we use a
> > better name ? (I could propose GST_APPLE_CORE_VIDEO_HAS_ONE_PLANE)
>
> It is passed as an argument when a plane is expected, i.e.
> gst_apple_core_video_memory_new_wrapped (gpixbuf,
> GST_APPLE_CORE_VIDEO_NO_PLANE, size)
>
> It's like:
> Q - Excuse me, what plane are you passing me? 1st? 2nd?
> A - I'm passing you a special plane -- a "no-plane".
>
> Also, ..._HAS_ONE_PLANE != non-planar.

Agreed.

> > In fact, I don't understand why this double wrapping.
>
> Because Gst*Memory is per-plane, see comment 12.

Ok, let's add comments to the header, so that everyone understands the why at first read. No one will automatically refer to this bug in the future.
Created attachment 307418 [details] [review] applemedia: implement GstAppleCoreVideoMemory

Implement a new memory type wrapping CVPixelBuffer. There are two immediate advantages:

a) Make the GstMemory itself retain the CVPixelBuffer. Previously, the containing GstBuffer was solely responsible for the lifetime of the backing CVPixelBuffer. With this change, we remove the GST_MEMORY_FLAG_NO_SHARE so that GstMemory objects can be referenced by multiple GstBuffers (doing away with the need to copy).

b) Delay locking the CVPixelBuffer into CPU memory until it's actually mapped -- possibly never.

The CVPixelBuffer object is shared among references, shares and (in planar formats) planes, so a wrapper GstAppleCoreVideoPixelBuffer structure was introduced to manage locking.
Changes in attachment 307418 [details] [review] (from attachment 305435 [details] [review]):

- moved gst_apple_core_video_memory_init to the applemedia plugin init; no longer calling it in "random" places
- gst_apple_core_video_pixel_buffer_create -> gst_apple_core_video_pixel_buffer_new
- improved docs for:
  - gst_apple_core_video_pixel_buffer_lock
  - gst_apple_core_video_memory_init
  - gst_is_apple_core_video_memory
  - GstAppleCoreVideoLockState
  - GstAppleCoreVideoPixelBuffer
  - GstAppleCoreVideoMemory

(for the last two, let me know whether the phrasing is better now)
Created attachment 307420 [details] [review] applemedia: implement copying of meta Before this, buffers would lose their Core Video / Core Media meta over intervideo* boundary.
Review of attachment 307418 [details] [review]:

Sorry, I forgot about this patchset. I did a quick pass through it and didn't find any evident issues. As it has been a long time, can you confirm this is still good to be merged into master?
Alessandro Decina has been working on memory management for OS X and iOS. Perhaps he has something better to share soon?
I definitely think this patchset is good. I spent countless hours polishing it. My interests have moved elsewhere, so I probably won't be very available to improve it further, but I definitely think it's up to the GStreamer standards.
I think this is still potentially relevant even with the IOSurface changes that were done. Using CVPixelBuffer and the like is orthogonal to IOSurface.
Ok, does this still apply cleanly after Alessandro's changes? Is Alessandro interested in discussing his work, or will he always push stuff at random points in time without further discussion with the others?
As said in comment 38, these changes here are completely orthogonal to the IOSurface changes he did. Getting both in (and both as public API ideally) would be a good idea.
(In reply to Nicolas Dufresne (stormer) from comment #39)
> Ok, does this still apply cleanly after Alessandro ? Is Alessandro
> interested in discussing his work or will he always push stuff at random
> point in time without further discussion with the others ?

Uh, what's with the polemical tone? :)

Texture sharing in vtdec had been disabled since September (f02425c4afcd85260a1b387aeddf863774257917). It turned out the fix involved switching to IOSurface (8ae003326157438c12c45589e050c5f446723f61), which is something that has actually been discussed since the IOSurface framework was introduced (years ago -- in the meantime Safari, Chrome and Firefox have all switched to it). I finally did it. I was aware of this patchset of course, but it has nothing to do with what I did, so I didn't say anything here.

Now, as for this patchset, I'm personally +0 on landing it _as it is_ (which is the reason why I haven't landed it), for the following reasons:

- performance: while the code is well written (although I think it's too complex in places, like the locking logic), it doesn't _practically_ improve anything performance-wise. Its purpose is to delay mapping pixel buffers, but for all practical purposes pixel buffers _always_ get mapped except in the avfvideosrc ! vtenc case, so it doesn't really improve anything. avfvideosrc ! vtenc as it is today in master already avoids mapping pixel buffers using the CoreMedia meta (which I don't think is ideal, see below);

- simplicity: we all agree with the GstMemory approach, but this code also keeps the metas around. I wrote the metas in my 1-day port to 1.x back in the day. In retrospect that was a poor choice, and the metas should likely be removed now. We've been talking about _at least_ unifying the CoreVideo and CoreMedia metas for a while. In fact, the whole coremedia/corevideo buffer stuff should probably just be one thing;

- maintenance: say we merge this as is (since it's got tests and it's probably good enough to work); it looks to me like we're adding code without improving performance and without reducing debt. Since Ilya is no longer interested, I will have to be the one to clean up the metas/buffer implementations anyway, and if we land this I will have to update even more code.
I've already done work towards unifying the metas here: https://bugzilla.gnome.org/show_bug.cgi?id=747707

That work depends on this patch, which is why it wasn't merged yet.

IIRC, one thing the current approach fails on is that it (rightfully) marks buffers as unshare-able, therefore resulting in memcpys -- not in the normal avfvideosrc ! vtenc flow, but surely if you do something more exotic like piping through the intervideo* elements.

I know that I'm not improving the common flows, since you avoid locking buffers if there's a GL-texture consumer downstream (detected through an allocation query). This is also a valid solution, although I believe what I've done here is more in line with the GStreamer architecture.
Re IOSurface, IIRC it's not public API on iOS.
I've been using all of this successfully in my iOS app's RTP pipeline. Moreover, I've been using appsrc/appsink to grab camera frames and to draw frames myself by getting Core Video objects directly (after applying patches in bug 747707) -- which is, of course, the way to go on an embedded platform where performance and power consumption are important. My interest moved elsewhere (i.e. to WebRtc.org) since I realized the RTP stack didn't work well enough for my needs without functional FEC, but otherwise, the whole thing worked pretty well.
(In reply to Ilya Konstantinov from comment #43)
> Re IOSurface, IIRC it's not public API on iOS.

The IOSurfaceAPI.h header is not public on iOS (yet) as it is on OS X. It is somewhat public API already, though; see https://developer.apple.com/library/ios/qa/qa1781/_index.html -- and AVF and VT on iOS do create IOSurface-based objects.
(In reply to Ilya Konstantinov from comment #42)
> I've already done work towards unifying the metas here:
> https://bugzilla.gnome.org/show_bug.cgi?id=747707
>
> This work depends on this patch, which is why it wasn't merged yet.

That's great. I would have liked to see the unification first, but I guess it's too late for that (as you say, the whole patchset is noisy).

> I know that I'm not improving the common flows since you avoid locking
> buffers if there's a GL-texture-consumer downstream (detected thru a
> allocation query). This is also a valid solution, although I believe what
> I've done here is more in line with the GStreamer architecture.

Not sure what makes you say that. What is implemented now is how all the GLMemory-aware elements work, and in fact working on the IOSurface memory required coordination with Matthew to make some changes to the libgstgl library.
I have started merging this. I haven't merged the tests yet since they don't seem to run properly (lots of CF warnings; I'll look into it tomorrow).

commit 8577224c74fde67884113c3b06b31ceadc11732c
Author: Ilya Konstantinov <ilya.konstantinov@gmail.com>
Date:   Sun Apr 19 00:30:48 2015 +0300

    applemedia: implement copying of meta

    Before this, buffers would lose their Core Video / Core Media meta
    over the intervideo* boundary.

    https://bugzilla.gnome.org/show_bug.cgi?id=747216

commit 936b2fdfbc71dfda8c3f51caba84daa20d56732f
Author: Ilya Konstantinov <ilya.konstantinov@gmail.com>
Date:   Thu Apr 2 20:04:18 2015 +0300

    applemedia: implement GstAppleCoreVideoMemory

    Implement a new memory type wrapping CVPixelBuffer.

    There are two immediate advantages:

    a) Make the GstMemory itself retain the CVPixelBuffer. Previously,
       the containing GstBuffer was solely responsible for the lifetime
       of the backing CVPixelBuffer. With this change, we remove the
       GST_MEMORY_FLAG_NO_SHARE so that GstMemory objects can be
       referenced by multiple GstBuffers (doing away with the need to
       copy).

    b) Delay locking the CVPixelBuffer into CPU memory until it's
       actually mapped -- possibly never.

    The CVPixelBuffer object is shared among references, shares and
    (in planar formats) planes, so a wrapper GstAppleCoreVideoPixelBuffer
    structure was introduced to manage locking.

    https://bugzilla.gnome.org/show_bug.cgi?id=747216
What's the status here with the tests?
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/226.