GNOME Bugzilla – Bug 737316
Add support for sending file-descriptors over Unix domain sockets
Last modified: 2015-03-20 10:49:27 UTC
Created attachment 287029 patch: gstreamer: Add GstUnixFDMeta

libs: Add GstUnixFdMeta for attaching file descriptors to buffers

Unix domain sockets allow passing file descriptors over them to other processes. They are passed as "ancillary data" and sent out-of-band along with some data written to a socket (see man cmsg). These patches add a new type of GstMeta which models attaching ancillary data to some data sent across a socket as attaching a GstMeta to a buffer.

This will allow one (or many) payloader(s) to be written, allowing seamless sending of dmabuf or memfd `GstMemory`s between processes. I've got a work-in-progress one which uses (unlinked) files on a tmpfs as the backing memory. It essentially operates as a more reliable version of shmsrc/shmsink (although with similar trust issues). The goal is that in the future you could also write payloaders/allocators targeting memfds and new FD-capable sinks using DBus signals or kdbus as a transport.

These patches are not ready to go in but I'm publishing them here in the hope of starting a conversation about this topic. My personal motivation is that I want to be able to write a system service that will distribute the video captured from a video capture device to different containers running on the same machine, but I know @wtay and David King are interested in solving some similar problems for the desktop, light-heartedly referred to as "PulseVideo".

This implementation patches gstreamer-core because that's where fdsrc and fdsink live. I also have a patch to gst-plugins-base adding support to multisocketsink. I chose this route, rather than writing specific src/sink elements for fd-passing, because fdsrc, fdsink and multisocketsink already know how to talk to Unix domain sockets; this just allows them to make use of one more feature that sockets provide. The alternative would have been to copy these elements wholesale into gst-plugins-bad under new names, which seemed particularly icky.

The other pleasant side-effect is that it makes it easier to experiment with FD payloaders/depayloaders.

My next steps are building out my example payloader/depayloader in gst-plugins-bad and using that with rtpvrawpay/rtpvrawdepay for zero-copy broadcast around my PC. This should help further prove the concept.
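For readers unfamiliar with the mechanism, here is a minimal Python sketch of passing a file descriptor as `SCM_RIGHTS` ancillary data alongside ordinary in-band data. It is only an illustration of the socket feature discussed above, not code from the patches; the helper names `send_fd`, `recv_fd` and `demo` are made up for this example, and an unlinked temporary file stands in for the tmpfs-backed memory.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, payload, fd):
    """Send `payload` in-band and `fd` out-of-band as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    return sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock, bufsize=4096):
    """Receive in-band data plus at most one file descriptor."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial int before decoding the descriptors.
            fds.frombytes(cdata[: len(cdata) - (len(cdata) % fds.itemsize)])
    return data, (fds[0] if len(fds) else None)


def demo():
    # A socketpair stands in for a connection between two processes.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as backing:  # unlinked file as backing memory
        backing.write(b"frame-bytes")
        backing.flush()
        send_fd(sender, b"hello", backing.fileno())
        data, fd = recv_fd(receiver)
    # The received descriptor still refers to the same open file, even though
    # the sender's own descriptor has been closed by now.
    with os.fdopen(fd, "rb") as received:
        received.seek(0)
        contents = received.read()
    sender.close()
    receiver.close()
    return data, contents
```

Calling `demo()` returns the in-band bytes and the file contents read through the received descriptor, which shows the two channels (payload and ancillary data) travelling together in one message.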
Created attachment 287030 patch: gstreamer: Teach fdsrc and fdsink about GstUnixFDMeta
I think having a GstMeta is the wrong level. The FD should be attached to the GstMemory, not the GstBuffer. Maybe as a GInterface on top of the GstAllocator, generalising the interface from GstDmabufAllocator. Ref: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-dmabuf.html
I think we should also have separate sources/sinks for Unix domain sockets. As they'll share a lot of code with the TCP elements, maybe they can go directly in that plugin to share some code. Apart from that, I agree with what Olivier said. This should be a new memory type instead of a meta. Maybe a base memory type for arbitrary fds, and specialized ones for memfd and dmabuf.
(In reply to comment #2)
> I think having a GstMeta is the wrong level. The FD should be attached to the
> GstMemory, not the GstBuffer. Maybe as a GInterface on top of the GstAllocator,
> generalising the interface from GstDmabufAllocator.

Right, so that was my initial approach too, but there is a rather subtle issue at play. In my design there *is* a custom allocator, but it lives in a payloader. Imagine that an fdsink receives a buffer with a dmabuf-backed GstMemory:

1. How does the sink know that you want the FD sent and not the contents of the GstMemory?
2. And if it decides that you want the FD sent, it knows what to send as ancillary data, but what should it send as the actual data over the socket?

I believe that the answer to 2 is that you would send a little data that says something like "I've got a memfd attached, please map from offset to size". As soon as you do that I believe that we've invented a new application-layer network protocol, like RTP, which happens to be transmitted over Unix domain sockets. My understanding is that the GStreamer way of handling new formats is via payloaders/depayloaders rather than custom src/sink types (cf. RTP).

So if you accept that this is a new protocol, imagine a muxer/payloader:

    gst-launch-1.0 videotestsrc ! memfdpay ! multisocketsink

Upstream of the payloader there are mappable GstMemorys backed with memfds. The allocator in memfdpay takes care of that. This is nice and composable, like the best of GStreamer. The upstream elements are oblivious, and the pipeline gets zero-copy for free.

But what do the downstream buffers contain? The data that will be written to the socket as before, in this case the header that says "I've got a memfd attached, please map from offset to size". In which case how is the FD passed from the payloader to the sink? Attached as metadata to the buffers.

The way I see it is that sockets have in-band and out-of-band data, and GstBuffer also has in-band and out-of-band data: the contents of the memory and the metadata respectively. These patches just map the socket-level concepts into GStreamer.

So this solution has the nice property that you can choose what protocol (payloader/depayloader/allocator) you want to use separately from choosing how you want to manage your clients (which sink to use, multisocketsink or fdsink). It also has the nice property (that a custom sink wouldn't) that you don't need to re-implement/duplicate all the excellent code in multisocketsink/fdsink for each fd-passing format.
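To make the "header in-band, FD out-of-band" idea above concrete, here is a rough Python sketch. It is not the wire format used by any of the attached patches: the 16-byte header layout (`offset`, `size` as little-endian 64-bit integers) and the names `HEADER`, `pay`, `depay` and `demo` are assumptions invented for this illustration.

```python
import array
import mmap
import os
import socket
import struct
import tempfile

# Hypothetical wire format: "please map from offset to size".
HEADER = struct.Struct("<QQ")


def pay(sock, fd, offset, size):
    """Payloader side: the header travels in-band, the FD as ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg(
        [HEADER.pack(offset, size)],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)],
    )


def depay(sock):
    """Depayloader side: read the header, take the FD, map the region."""
    fds = array.array("i")
    header, ancdata, _flags, _addr = sock.recvmsg(
        HEADER.size, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(cdata[: len(cdata) - (len(cdata) % fds.itemsize)])
    if len(header) != HEADER.size or len(fds) != 1:
        raise ValueError("expected one complete header and exactly one FD")
    offset, size = HEADER.unpack(header)
    fd = fds[0]
    try:
        # mmap offsets must be page-aligned, so map from 0 and slice instead.
        with mmap.mmap(fd, offset + size, prot=mmap.PROT_READ) as view:
            # Slicing copies the bytes; a real depayloader would instead wrap
            # the mapping itself to stay zero-copy.
            return view[offset : offset + size]
    finally:
        os.close(fd)


def demo():
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tx, rx, tempfile.TemporaryFile() as backing:
        backing.write(b"xxxxFRAMEyyyy")
        backing.flush()
        pay(tx, backing.fileno(), offset=4, size=5)
        return depay(rx)
```

Note that only 16 bytes cross the socket regardless of frame size; the frame itself is reached through the mapping on the receiving side.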
fdsink/src is specialized for writing data into file descriptors, not sockets. So as Sebastien says, this should be a new element. I'm not particularly a fan of a hybrid between tcpsrc/sink and shmsrc/sink. I would personally prefer a design closer to shmsrc/sink, but with a memfd allocator (hence memfd Memory). This same element, in reverse, should be able to zero-copy memory that is backed by FDs and can be mmapped.

From experience, the meta is really not the right place for this. In my opinion, we should make these new src/sink elements preserve the framing, so it becomes an inter-process, RTP-friendly mechanism.

The allocator interface proposed by Olivier is a really good starting point, since it's compatible with the existing code for GstDmabufAllocator, next to which we could add a GstMemfdAllocator (e.g. it can be reused by a Unix socket src/sink along with some kdbus adapter).

For the rest, a payloader does not need such a property. It should just keep doing what it's been doing so far. Payloaders already have an optimization: they do not merge Memory unless strictly required. So if upstream provides memfd-backed memory, it will remain that way. Then if downstream offers a memfd allocator, the headers will be placed in memfd memory too, and all the sink will have to do is pass the FDs over the Unix socket (or kdbus). In the presence of normal sinks, like tcpsink, udpsink, etc., the FDs will be mapped and streamed the conventional way (with a copy along the way).
Created attachment 289132 patch: gstreamer: libs: Add GstUnixFdMeta for attaching file descriptors to buffers
Created attachment 289133 patch: gstreamer: WIP: fdsrc, fdsink: Teach elements about GstUnixFDMeta
Created attachment 289134 patch: gstreamer-common: gst-check: Add AG_GST_CHECK_GST_UNIXFD for new library
Created attachment 289135 patch: gst-plugins-base: multisocketsink: Map `GstMemory`s individually when sending
Created attachment 289136 patch: gst-plugins-base: multisocketsink: Add support for Unix FD passing
Created attachment 289137 patch: gst-plugins-bad: Add tmpfile plugin
Created attachment 289138 patch: gst-plugins-bad: Add GstTmpFileAllocator
Created attachment 289139 patch: gst-plugins-bad: Add element fdpay
Created attachment 289140 patch: gst-plugins-bad: Add element fddepay
I've uploaded the GStreamer patches I used in the production of the GStreamer conference 2014 lightning talk "Zero-copy inter-process video with FD passing"[1]. [1]: http://gstconf.ubicast.tv/videos/zero-copy-video-with-file-descriptor-passing/
These patches have been uploaded mostly for posterity. Between your comments above, the pain I had to go through to implement this and a discussion I had with wtay during the conference, I agree a specialized sink and source could be a better way to go, at least in the short term.

I still think that mapping Unix sockets' ancillary data onto `GstBufferMeta` is a sound idea because it expands the usefulness of the elements that already know how to talk to sockets without reducing their usefulness in any other way. On the other hand I agree that making changes to GStreamer core itself to support (what is now) just a single user is a little heavy-handed. This may be worth re-visiting when/if there are more users for it in GStreamer (e.g. more fd-passing based wire protocols to implement).

For the time being I'm happy for this bug to be closed. I'll open a new one for my revised implementation when it's ready. Nonetheless I do think there has been some confusion about exactly what I was proposing/have implemented here, and I'd like to address this for completeness:

(In reply to comment #5)
> fdsink/src is specialized for writing data into file descriptors, not socket.
> So as Sebastien says, this should be a new element. I'm not particularly fan of
> an hybrid between tcpsrc/sink and shmsrc/sink.

Note, there is no hybrid proposed here. As before, fdsink/fdsrc/multisocketsink are used to talk to sockets; the only change is that they understand more of the capabilities that sockets have, i.e. the fact that they support sending ancillary data alongside the data payloads.

I think some of the confusion is caused by both sockets and memfds being referred to by a file descriptor from user-space, despite the fact that they are very different beasts in kernel-space.

> I would personally prefer a
> design closer to shmsrc/sink, but with memfd allocator (hence memfd Memory).
> This same element, in reverse, should be able to zero copy memory that are
> backed by FDs and can be mmaped.

Indeed. The intention was to cover the use cases that `shmsrc`/`shmsink` cover, and expand on them. `shmsink` and `shmsrc` work by using a listening Unix socket to create connections between sender and receiver and then communicate over that. `shmsink` behaves like "fdpay ! unixserversink" and `shmsrc` like "unixclientsrc ! fddepay". OTOH the "PulseVideo" use-case uses `socketpair` and DBus for creating connections, so the pipelines look like "fdpay ! multisocketsink" and "fdsrc ! fddepay". This is beyond the capabilities of `shmsrc` and `shmsink` because they implement their own socket code rather than reusing the code from other elements, whereas it's natural in the fdpay/fddepay model as you can re-use the connection creation mechanisms already implemented in other elements.

> By experience, the meta is really not the
> right place for this. In my opinion, we should make these new src/sink preserve
> the framing, so it becomes a inter process RTP friendly mechanism.

fdpay and fddepay preserve framing as currently implemented.

> The allocator interface proposed by Olivier is a really good starting point,
> since it's compatible with the existing code for GstDmabufAllocator, next to
> which we could add a GstMemfdAllocator (e.g. can be reused by a unix socket
> src/sink along with some kdbus adapter).

Agreed and this is what I've implemented.

> For the rest, a payloader does not need such a property. It should just keep
> doing what it's been doing so far.

Using `fdpay`/`fddepay` does not preclude also using rtpvrawpay/rtpvrawdepay, e.g. the pipeline:

    rtpvrawpay ! fdpay ! multisocketsink

is perfectly valid and shouldn't cause additional copying in theory. In practice I'd need to check how the RTP elements do their payloading to ensure they aren't causing additional copies.
(In reply to comment #16)
> (In reply to comment #5)
> > fdsink/src is specialized for writing data into file descriptors, not socket.
> > So as Sebastien says, this should be a new element. I'm not particularly fan of
> > an hybrid between tcpsrc/sink and shmsrc/sink.
>
> Note, there is no hybrid proposed here. As before fdsink/fdsrc/multisocketsink
> are used to talk to sockets, the only change is that they understand more of
> the capabilities that sockets have, e.g. the fact that they support sending
> ancillary data alongside the data payloads.
>
> I think some of the confusion is caused by both sockets and memfds being
> referred to by a file-descriptor from user-space, despite the fact that they
> are very different beasts in kernel-space.

fdsrc/fdsink are not specialized for sockets; normal file descriptors (files, pipes, etc.) are expected to work.

> > I would personally prefer a
> > design closer to shmsrc/sink, but with memfd allocator (hence memfd Memory).
> > This same element, in reverse, should be able to zero copy memory that are
> > backed by FDs and can be mmaped.
>
> Indeed. The intention was to cover the use cases that `shmsrc`/`shmsink`
> cover, and expand on them. `shmsink` and `shmsrc` work using a listening unix
> socket to create connections between sender and receiver and then communicates
> over that. `shmsink` behaves like "fdpay ! unixserversink" and `shmsrc` like
> "unixclientsrc ! fddepay". OTOH the "PulseVideo" use-case uses `socketpair`
> and DBus for creating connections so the pipelines look like "fdpay !
> multisocketsink" and "fdsrc ! fddepay". This is beyond the capabilities of
> `shmsrc` and `shmsink` because `shmsrc` and `shmsink` implemented their own
> socket code rather than reusing the code from other elements, whereas it's
> natural in the fdpay/fddepay model as you can re-use the connection creation
> mechanisms already implemented in other elements.

For reference, the shmsrc/shmsink design was tailored to very small pieces of data and framing (RTP). So shmsrc/shmsink implement a protocol that allows multiplexing a larger SHM area, reducing the overhead. The problem I've hit so far is that its protocol isn't easily extensible, so adding the ability to pass a sub-region of the main SHM area, along with passing other FDs that would pass by, would break the protocol's backward compatibility.

Other than that, I'm not convinced about overloading fdsrc and fdsink. I'd like to thank you for taking the time to experiment with this and for sharing your experience at the GStreamer Conference. Until your talk, it wasn't that clear to me what this design was about. My interest in this is mainly DMABUF fd passing, though looking at what is going on on the kernel side, I'm starting to foresee something that would allow a combination of memfd and KDBus to fulfill this task.
(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #5)
> > > fdsink/src is specialized for writing data into file descriptors, not socket.
> > > So as Sebastien says, this should be a new element. I'm not particularly fan of
> > > an hybrid between tcpsrc/sink and shmsrc/sink.
> >
> > Note, there is no hybrid proposed here. As before fdsink/fdsrc/multisocketsink
> > are used to talk to sockets, the only change is that they understand more of
> > the capabilities that sockets have, e.g. the fact that they support sending
> > ancillary data alongside the data payloads.
> >
> > I think some of the confusion is caused by both sockets and memfds being
> > referred to by a file-descriptor from user-space, despite the fact that they
> > are very different beasts in kernel-space.
>
> fdsrc/fdsink are not specialized for socket, normal file descriptors (files,
> pipe, etc.) are expected to work.

Agreed, and these different types of file descriptor worked fine with my patches too. The patch is a pure generalisation of fdsink and fdsrc. All previous behaviour is preserved unless I've messed up in some way.

> > > I would personally prefer a
> > > design closer to shmsrc/sink, but with memfd allocator (hence memfd Memory).
> > > This same element, in reverse, should be able to zero copy memory that are
> > > backed by FDs and can be mmaped.
> >
> > Indeed. The intention was to cover the use cases that `shmsrc`/`shmsink`
> > cover, and expand on them. `shmsink` and `shmsrc` work using a listening unix
> > socket to create connections between sender and receiver and then communicates
> > over that. `shmsink` behaves like "fdpay ! unixserversink" and `shmsrc` like
> > "unixclientsrc ! fddepay". OTOH the "PulseVideo" use-case uses `socketpair`
> > and DBus for creating connections so the pipelines look like "fdpay !
> > multisocketsink" and "fdsrc ! fddepay". This is beyond the capabilities of
> > `shmsrc` and `shmsink` because `shmsrc` and `shmsink` implemented their own
> > socket code rather than reusing the code from other elements, whereas it's
> > natural in the fdpay/fddepay model as you can re-use the connection creation
> > mechanisms already implemented in other elements.
>
> For the reference, the shmsrc/shmsink designed was tailored to very small piece
> of data and framing (RTP). So shmsrc/shmsink implement a protocol that allow
> multiplexing a larger SHM area, reducing the overhead. The problem I've hit so
> far, is that it's protocol isn't easily extensible, so adding ability to pass
> sub region of the main SHM area along with passing other FDs that would pass by
> would break the protocol backward compatibility.

Right. Compatibility is a concern for me also. I'm creating a system that runs test scripts provided by my clients inside a Docker container. Our clients depend on being able to run the same test script and being sure they are getting exactly the same result. There is no scope for breaking or changing the behaviour of these scripts. I don't want to be constrained by backwards compatibility when writing new versions of stb-tester though, so I run user scripts in Docker containers.

My solution to the protocol versioning is to separate connection establishment from the general communication when video is being passed. This allows feature negotiation to happen out-of-band before the connection is established, and once this negotiation is complete fdpay/fddepay can be configured to talk a particular version of the protocol, or even an entirely different payloader could be used.
More concretely I've defined a DBus interface that looks like this (in Vala):

    [DBus (name = "com.stb-tester.VideoSource1")]
    interface VideoSource : GLib.Object {
        public abstract string caps { owned get; }
        public abstract GLib.UnixInputStream attach () throws Error;
    }

The caps property has DBus type "s" and the attach() method has DBus type "h" (Unix FD). The idea is that if I come up with a new protocol (corresponding to some properties set on fdpay or a different payloader) I can also come up with a new interface name, e.g. com.stb-tester.VideoSource2. I then need to ensure the server side can offer both the VideoSource1 and VideoSource2 interfaces and the job's a good'un.

I've put the PulseVideo source that I used for the presentation on github: https://gist.github.com/wmanley/76974b124588c669c3b1

I'm considering also implementing a "zerocopydbusserversink" which would be a bin capable of exposing a stream on DBus using GDBus. The bin would be responsible for the negotiation and configuration, and the elements contained within would be responsible for actually sending the data. This is in contrast to the approach of multisocketsink and tcpserversink, for instance, where tcpserversink derives from multisocketsink and is thus responsible for both connection establishment and sending the data.

In summary I believe that the key to compatibility is:

* Separation between connection establishment and streaming
* And out-of-band feature negotiation.

> Outside that I'm not convince of the overload of fdsrc and fdsink I'd like to
> thanks you for taking the time to experiment this and sharing your experience
> at the GStreamer Conference. Until your talk, it wasn't that clear to me what
> this design was about. My interest into this is mainly DMABUF fd passing,
> though looking at what is going on the kernel side, I'm starting to foresee
> something that would allow combination of memfd and KDBus to fulfill this task.

Thanks for the kind words. Indeed, using KDBus certainly has an appeal. Video frames could then be individually sent as DBus signals, rather than my design where only the socket (i.e. the conduit for the video) is sent during stream-setup time. The reason I didn't take this approach is that I need to be making use of it now-ish rather than being able to wait for KDBus.
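The versioned-interface idea from this comment (negotiate first, then hand over a socket that is used purely for streaming) can be sketched in plain Python. This is only an illustration under stated assumptions: no DBus is involved, the `VideoSourceServer` class and the `interface_version` and `negotiate` helpers are hypothetical, and only the interface names `com.stb-tester.VideoSource1` and `com.stb-tester.VideoSource2` come from the text above.

```python
import re
import socket


def interface_version(name):
    """Split e.g. 'com.stb-tester.VideoSource2' into (base name, 2)."""
    match = re.fullmatch(r"(.+?)(\d+)", name)
    if match is None:
        raise ValueError("interface name has no trailing version: %r" % name)
    return match.group(1), int(match.group(2))


def negotiate(offered, understood):
    """Pick the newest interface both sides know, or None if there is none."""
    common = set(offered) & set(understood)
    if not common:
        return None
    return max(common, key=lambda name: interface_version(name)[1])


class VideoSourceServer:
    """Hypothetical stand-in for a service exposing several interface versions."""

    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.clients = []  # (interface name, server end of the socketpair)

    def attach(self, interface):
        # Connection establishment: create the conduit and hand one end over.
        # Over DBus this end would be returned as a Unix FD (type "h").
        if interface not in self.interfaces:
            raise LookupError("unsupported interface: %s" % interface)
        ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        self.clients.append((interface, ours))
        return theirs


V1 = "com.stb-tester.VideoSource1"
V2 = "com.stb-tester.VideoSource2"
```

An old client that only understands `V1` keeps working against a server offering both, while a newer client ends up on `V2`; in either case the streaming socket itself never has to carry any negotiation traffic.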
(In reply to comment #16)
> These patches have been uploaded mostly for posterity. Between your comments
> above, the pain I had to go through to implement this and a discussion I had
> with wtay during the conference I agree a specialized sink and source could be
> a better way to go, at least in the short term.

I'm flip-flopping once again. I tried creating a custom sink for this and it turned out horrible too. I ended up having to copy `multisocketsink` wholesale as I really need most of the logic in there for handling multiple clients who may or may not be cooperative. This in turn required copying `GstMultiHandleSink` as this is not a publicly available base class. While this solved some dependency/deployment issues of my previous approach, it resulted in a huge amount of duplicated code.

I'm working on a third approach which I hope will be more palatable, again using `GstMeta` but in a much less intrusive way, utilising GIO's `GSocketControlMessage` to take care of the portability and dependency issues and to make the intention behind the design clearer. I will also introduce a new `socketsrc` element based on `GSocket`, in a similar way to how `multisocketsink` uses `GSocket`.

I'll create a separate bugzilla ticket for these patches when they're more ready.
I've created [#739544] and [#739546] adding a new `socketsrc` element, currently without `GSocketControlMessage` capabilities. This lays some of the foundations.

[#739544]: https://bugzilla.gnome.org/show_bug.cgi?id=739544
[#739546]: https://bugzilla.gnome.org/show_bug.cgi?id=739546
I've got an implementation of zero-copy "pulsevideo" in my [pulsevideo repo] on github. It contains a "pulsevideo" program which is invoked like gst-launch-1.0 and a source element to read from pulsevideo.

[pulsevideo repo]: https://github.com/wmanley/pulsevideo

It's still a work in progress, but many parts are in a good state already and I've been using it in production reliably for the past month or so. It is intended to work against GStreamer 1.2 and later so I can support Ubuntu 14.04, and as such it contains a modified version of multisocketsink, a backport of watchdog from 1.4, a few new elements (socketsrc, fdpay, fddepay), a new allocator (GstTmpFileAllocator) and a new buffer meta (GstNetControlMessageMeta).

I hope to get much of this into GStreamer proper, and the next step is to get #739546 reviewed. So if anyone has any interest in this work I would appreciate a review of #739546.
I'm closing this as fixed as these patches are replaced by #739546, #746150, #746164. Thanks to @wtay for all his help at the recent GStreamer hackfest.