GNOME Bugzilla – Bug 744261
curl: add curlhttpsrc
Last modified: 2017-07-26 18:00:29 UTC
Created attachment 296494 [details] [review] curlhttpsrc patch Hi, As part of a recent trial to test HTTP/2 for media delivery, BBC R&D created a new GStreamer element that uses libcurl instead of libsoup as a http/https URI Handler. We released the code to github here: https://github.com/BBC/gst-curlhttpsrc, but we also wanted to contribute the code back to GStreamer as we believe that it has some benefits over the existing souphttpsrc that other people might want to use. Those benefits are as follows: > Support for the next generation HTTP/2 standard via nghttp2 support in libcurl > Supports persistant connections that allow reuse, meaning new sessions don't need to be set up for every subsequent media request > Supports HTTP/1.1 multiplexing across element instances to share a single connection and again reduce setup times > Full property compatibility with souphttpsrc You should find attached a patch that integrates curlhttpsrc into the gst-plugins-bad tree. Let me know what you think, if there's any changes needed or you have any questions whatsoever feel free to ask. Best Regards, Sam
Review of attachment 296494 [details] [review]: Thanks for the patch! The code looks good overall but there are some things to fix. Also on my tests if I try to use http 2.0 in a server that doesn't support it seems to hang rather than error out. Some other comments and questions below. ::: ext/curl/Makefile.am @@ +14,3 @@ gstcurlftpsink.c \ $(gstcurlsshsink_SOURCES) \ + gstcurlsmtpsink.c \ missing the gstcurlqueue.c here ::: ext/curl/gstcurlhttpsrc.c @@ +167,3 @@ + + gst_curl_http_src_curl_capabilities = curl_version_info (CURLVERSION_NOW); + http_env = g_getenv ("GST_CURL_HTTP_VER"); Why do you need this env var? @@ +602,3 @@ + * Check that the CURL worker thread is running. If it isn't, start it. + */ + gst_curl_http_src_ref_multi (source); It would make more sense to do this in the NULL -> READY transition. In NULL state the element should have resources allocated. @@ +678,3 @@ + + GST_INFO_OBJECT (src, "Closing instance, worker thread refcount is now %u", + --klass->multi_task_context.refcount); Don't do operations inside of GST_DEBUG and friends macros. They can be disabled at compile time and your decrement would never happen. @@ +702,3 @@ + + /* Unref the curl_multi task. If nothing else holds a reference, stop it. */ + gst_curl_http_src_unref_multi (src); And do this in the READY -> NULL state change @@ +1020,3 @@ + *buf = gst_buffer_new_allocate (NULL, src->len, NULL); + gst_buffer_map (*buf, &info, GST_MAP_READWRITE); + memcpy (info.data, src->msg, (size_t) src->len); Remember to unmap the buffer @@ +1041,3 @@ + src->caps = gst_caps_make_writable (src->caps); + gst_caps_set_simple (src->caps, "content-type", G_TYPE_STRING, + src->headers.content_type, NULL); What is the use case for this? ::: ext/curl/gstcurlqueue.h @@ +54,3 @@ + GMutex running; + GstCurlHttpSrcQueueElement *next; +}; Did you consider using the Glib helpers like GQueue to avoid implementing a queue yourself? https://developer.gnome.org/glib/stable/glib-Double-ended-Queues.html
Hi Thiago, Thanks very much for the comments, I've added some replies below: > Why do you need this [GST_CURL_HTTP_VER] env var? We wanted an easy way to force certain HTTP versions during testing, and I couldn't see a nice easy way of doing this via playbin. Also, I don't think any other HTTP uri handler actually allows you to set the HTTP Version number from another element. > + gst_curl_http_src_ref_multi (source); > > It would make more sense to do this in the NULL -> READY transition. In NULL state the element should have resources allocated. I'll be sure to move it. > + GST_INFO_OBJECT (src, "Closing instance, worker thread refcount is now %u", > + --klass->multi_task_context.refcount); > > Don't do operations inside of GST_DEBUG and friends macros. They can be disabled at compile time and your decrement would never happen. Got it. > + *buf = gst_buffer_new_allocate (NULL, src->len, NULL); > + gst_buffer_map (*buf, &info, GST_MAP_READWRITE); > + memcpy (info.data, src->msg, (size_t) src->len); > > Remember to unmap the buffer As far as I can tell, this should be done after the buffer has been pushed, so I'm going to take that to mean I should be putting it at the beginning of the _create() function as the GstPushSrc takes care of pushing the buffer upstream for me? > + src->caps = gst_caps_make_writable (src->caps); > + gst_caps_set_simple (src->caps, "content-type", G_TYPE_STRING, > + src->headers.content_type, NULL); > > What is the use case for this? A colleague of mine wanted me to set the caps to tell a downstream element what content it's getting for ease of use in his own module. > + GMutex running; > + GstCurlHttpSrcQueueElement *next; > +}; > > Did you consider using the Glib helpers like GQueue to avoid implementing a queue yourself? I did, and an initial attempt at a port to GSList ended in things breaking. Maybe in the future I can port it over but as I can't devote an awful lot of time to this at present I've elected to leave it as-is for now. As for your hang, that's not something I've ever seen before, and I have tested requesting HTTP/2 upgrades to HTTP/1.1 servers. The transfer should time out (assuming the timeout property isn't set as 0 which means it'll never time out) and if it doesn't that might be a libcurl bug. Were you attempting to connect via SSL or cleartext? Also, what versions of curl/nghttp are you running? I've been pretty much tracking git latest during development and it may be something that's been fixed. Thanks again for the feedback, I'll take all your suggestions on board and I'll put an amended patch up soon. Cheers, Sam
Created attachment 296783 [details] [review] Improved curlhttpsrc patch Hi Thiago, I've taken your suggestions on board and done the relevant changes, and attached a new version of the patch to the ticket. Cheers, Sam
Hi, Just a quick note that it would be useful to have a GstBaseHttpSrc to factorise common code with GstSoupHttpSrc and this new GstCurlHttpSrc. It could also benefit web engines which implement their own http src duplicating some code too. Cheers, Julien
(In reply to Julien Isorce from comment #4) > Hi, > Just a quick note that it would be useful to have a GstBaseHttpSrc to > factorise common code with GstSoupHttpSrc and this new GstCurlHttpSrc. It > could also benefit web engines which implement their own http src > duplicating some code too. > Cheers, > Julien Hi Julien, Let me introduce myself, My name is Nitesh Laller and I am working on HTTP streaming modules in Samsung organization. I think idea of GstBaseHttpSrc is useful and it will help in building different http source elements without duplicating code again and again. If someone has not already started developing this BaseKlass, then I would like to involve myself in this. Please reply. Thanks,
Hi Nitesh, Nobody started it already so feel free to go ahead. Good luck. Julien
(In reply to Nitesh Laller from comment #5) > > I think idea of GstBaseHttpSrc is useful What common code / functionality do you think can be put into the base class? I was thinking of an interface for HTTP sources before, but if actual functionality can also be shared that would of course be even better.
(In reply to Sebastian Dröge (slomo) from comment #7) > (In reply to Nitesh Laller from comment #5) > > > > I think idea of GstBaseHttpSrc is useful > > What common code / functionality do you think can be put into the base > class? I was thinking of an interface for HTTP sources before, but if actual > functionality can also be shared that would of course be even better. Hi Sebastian, As you know there are various third party libraries which provide interfaces for handling http requests and they all have their own type of implementations for handling such requests. Interfaces are different, way of implementation is different, So keeping those in mind currently I have following points which can be put in http base class. Functional Changes: 1) http header response parsing could be done in http base class as it almost same for all http based plugins. If in case third party library is already providing such parsing then client plugin can ignore default parsing , if not then anyone can use it. On the basis of header response parsing various parts of http plugin can be put in common place like , get_size check, Is_seekable check. Coding related changes could be: 1) Common properties can be included in base class example: PROP_LOCATION, PROP_USER_ID, PROP_USER_PW, PROP_PROXYURI, PROP_PROXY_ID, PROP_PROXY_PW, PROP_COOKIES, PROP_USER_AGENT, PROP_EXTRA_HEADERS, PROP_COMPRESS, PROP_AUTOMATIC_REDIRECT, PROP_MAXREDIRECT PROP_KEEP_ALIVE, PROP_TIMEOUT, PROP_SSL_STRICT, PROP_SSL_CA_FILE, PROP_RETRIES, PROP_HTTPVERSION, PROP_IS_LIVE All variables related to above properties will come under base class structures and client can use these variables. 2) “get_size” implementation will come inside base http class because header is already parsed in base http class. 3) "is_seekable” will come inside base http class, because on the basis of size of content this decision can be taken. Size of content comes in header response of request. 4) "open_session" function pointer can be provided in base class as compared to “start” function 5) Similarly "close_session" function pointer can be provided in base class ( as "open_session" and "close_session" give more sense related to http sessions) 6) "query" function can be handled in http base class to handle common queries like GST_QUERY_URI 7) Additionally change_state function can be handled in base class. Third party related resource allocation and de-allocation changes can be called using additional class function pointers. So currently I have these changes in mind and these changes can reduce the common code which every http based plugin need to include. If you have additional ideas for how more functionality and code can be made common for base class then please share, it would be highly helpful. Thanks, Nitesh Laller
Hey everybody, anything new from this thread? I'd be interested to contribute some to this out of actual interest. Also I know of applications/users that would like to have more metrics displayed by the default HTTP src (currently using souphttpsrc). These would include for example time to DNS resolution, TCP connection setup time, time until first byte, etc .. How about introducing also a conform way of passing HTTP metrics downstream in this novel HTTP base class? It may also be a more generic metric class suitable to any kind of network transaction using source elements? As already said, I am happy to contribute to already started work here, as well as starting it if nothing has been done yet. Cheers Stephan
Let's merge curlhttpsrc after 1.12 for now (with a lower rank than souphttpsrc), and handle the "HTTP source interface" discussion in another bug. Anybody interested working on that please open a new bug :)
Created attachment 351434 [details] [review] curl: Add curlhttpsrc element Merged from https://github.com/bbc/gst-curlhttpsrc commit 478ec021b92f08a211e33ee0d454ff1ebe612a42
Created attachment 351435 [details] [review] curl: Add curlhttpsrc element Merged from https://github.com/bbc/gst-curlhttpsrc commit 478ec021b92f08a211e33ee0d454ff1ebe612a42
Created attachment 351557 [details] [review] curlhttpsrc: Random cleanup
Ok, the main blocker for merging this now is that the *whole* response is first downloaded and only then placed into a single output buffer. This is not acceptable, the response should be read chunk by chunk and immediately forwarded downstream.
Created attachment 351558 [details] [review] curlhttpsrc: Random cleanup
Hi Sebastian, Apologies for the very delayed response, this got lost under a lot of other things. I agree that the whole response being placed into the buffer isn't ideal. I do remember trying to get it to deliver chunk by chunk, but it's been so long now that I can't quite remember where I got with it. Someone else from Fluendo did fork the github repository and made a version which does deliver chunk by chunk, so that would be a good place to start. As for the patch, is this something you're proposing to change, or would you need me to do it? (Just to make sure we don't duplicate effort)
Ok, might be best to merge that from the Fluendo repository then. Can you prepare a patch with that and also my changes from above? That would be great :)
Hi Sebastian, I've managed to get some time to look at this. Do you have any idea at the moment about what the release schedule for the 1.14 release is going to be? It would be nice to get this in for the next release, if at all possible. -Sam
You still have a month or two for 1.14
Created attachment 356056 [details] [review] Add curlhttpsrc with chunked data delivery Hi Sebastian, I've just attached a new patch file which is based on a reworked version of curlhttpsrc which delivers data in chunks rather than in the single buffer, as well as incorporating your changes. Let me know what you think! :) -Sam
Does this include the changes from the Fluendo code that you mentioned?
Review of attachment 356056 [details] [review]: Just very shortly went over the code, some comments below ::: ext/curl/gstcurlhttpsrc.c @@ +7,3 @@ + * Based on the GstElement template, courtesy of + * Copyright (C) 2005 Thomas Vander Stichele <thomas@apestaart.org> + * Copyright (C) 2005 Ronald S. Bultje <rbultje@ronald.bitfreak.net> This can probably go away, it's a template after all and not really doing anything @@ +1704,3 @@ + g_mutex_lock (&s->buffer_mutex); + s->buffer = + realloc (s->buffer, (s->buffer_len + chunk_len + 1) * sizeof (char)); Use g_realloc(), g_malloc(), g_free() please @@ +1711,3 @@ + memcpy (s->buffer + s->buffer_len, chunk, chunk_len); + s->buffer_len += chunk_len; + g_cond_signal (&s->signal); You also need to signal in GstBaseSrc::unlock(), and then go out of ::create() (etc) ASAP, returning GST_FLOW_FLUSHING. And all following calls until GstBaseSrc::unlock_stop should behave the same (immediately return GST_FLOW_FLUSHING).
Hi Sebastian, I'll incorporate your requested changes now. I just have one question though about the expected behaviour of the source element when it's sending flushing. In the documentation I can find, that means that the element should be discarding all buffers. In the case of a running HTTP transfer, does that mean I should drop any received buffers from the remote server that I haven't yet sent downstream? Given that the downstream is then not receiving some part of the resource that has been requested, should I cancel the in-flight HTTP request, and then send GST_FLOW_EOS the next time ::create() is called after ::unlock_stop()? At a cursory glance, this seems to be the behaviour of souphttpsrc. Or have I got the wrong end of the stick here?
Sounds correct. What happens next time ::create() is called is not entirely clear here, usually what would happen is that you go back to READY anyway and then next time in READY->PAUSED you would do a new request.
There should really be either a state change or a seek request after unlock_stop().
Created attachment 356287 [details] [review] Add curlhttpsrc with chunked data delivery - rework with ::unlock() Here's a new patch that fixes the issues that were raised. Do let me know if you find anything else.
Comment on attachment 356287 [details] [review] Add curlhttpsrc with chunked data delivery - rework with ::unlock() Crashes here with: gst-launch-1.0 curlhttpsrc location=http://google.com ! fakesink silent=false -v Thread 3 "curlhttpsrc0:sr" received signal SIGSEGV, Segmentation fault.
+ Trace 237693
Thread 140737152808704 (LWP 31351)
You probably also want to merge https://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/ext/soup?id=d0dfbb04c002bc94fb7d68d420f5276c32b9d991
Created attachment 356365 [details] [review] Add curlhttpsrc with chunked data delivery - fixed segv Oops, I was using GST_TYPE_STRUCTURE instead of G_TYPE_STRING to store the redirection URI. Stupid copy/paste mistake, I bet. I've also made sure to include a URI that will result in a redirection to my test set. I've also added in the element message as requested, along with a bit of reorganisation to fix an assertion fail I picked up earlier when running with lots of debugging.
commit e74b3a02ddf0e964b0ad8cc12b9e4967652f23e2 (HEAD -> master) Author: Sam Hurst <Sam.Hurst@bbc.co.uk> Date: Tue Jul 25 15:23:57 2017 +0100 curl: Add curlhttpsrc element Merged from https://github.com/bbc/gst-curlhttpsrc commit f8aabcfc5c50a44f3362de831377d6e86dcd2d49 https://bugzilla.gnome.org/show_bug.cgi?id=744261
This broke my macOS build because of the 3 variables declared in the curlsrc.h file => the linker complains about duplicated symbols. Why is there so much stuff defined in that header btw? At least the gst_curl_http_src_curl_capabilities, pref_http_ver and gst_curl_http_src_default_useragent variables should be moved to the C file.
Created attachment 356433 [details] [review] Reorganise header files In all honesty, before last week the last time I looked at most of this code was two and a half years ago and I've worked on a lot since so I can't remember decisions from back then. I do remember that it used to be spread across a lot more files, but I was encouraged (can't remember by who...) to try and merge it down to a smaller number. So I'm guessing those declarations are a relic of that. The #defines could all be moved into gstcurldefaults.h too, and that would skim down gstcurlhttpsrc.h. The build seems to work fine on all my Linux boxes, but I haven't tested it on a Mac as I don't have one. My apologies. I've attached a patch, does this help?
Thanks Sam! This patch looks good to me and also fixed my build :)
Comment on attachment 356433 [details] [review] Reorganise header files commit b922edce7b6d659f114a34f6e78768e9165584e7 (HEAD -> master, origin/master, origin/HEAD) Author: Sam Hurst <Sam.Hurst@bbc.co.uk> Date: Wed Jul 26 17:43:19 2017 +0100 curl: Reorganise header files to fix macOS builds https://bugzilla.gnome.org/show_bug.cgi?id=744261