After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 744261 - curl: add curlhttpsrc
curl: add curlhttpsrc
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
git master
Other All
: Normal enhancement
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-02-10 12:15 UTC by Sam Hurst
Modified: 2017-07-26 18:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
curlhttpsrc patch (84.08 KB, patch)
2015-02-10 12:15 UTC, Sam Hurst
reviewed Details | Review
Improved curlhttpsrc patch (84.08 KB, patch)
2015-02-13 14:28 UTC, Sam Hurst
none Details | Review
curl: Add curlhttpsrc element (89.58 KB, patch)
2017-05-09 13:45 UTC, Sebastian Dröge (slomo)
none Details | Review
curl: Add curlhttpsrc element (89.98 KB, patch)
2017-05-09 13:50 UTC, Sebastian Dröge (slomo)
none Details | Review
curlhttpsrc: Random cleanup (5.58 KB, patch)
2017-05-10 14:59 UTC, Sebastian Dröge (slomo)
none Details | Review
curlhttpsrc: Random cleanup (12.61 KB, patch)
2017-05-10 15:02 UTC, Sebastian Dröge (slomo)
none Details | Review
Add curlhttpsrc with chunked data delivery (93.67 KB, patch)
2017-07-20 16:42 UTC, Sam Hurst
none Details | Review
Add curlhttpsrc with chunked data delivery - rework with ::unlock() (95.36 KB, patch)
2017-07-24 11:06 UTC, Sam Hurst
none Details | Review
Add curlhttpsrc with chunked data delivery - fixed segv (97.35 KB, patch)
2017-07-25 15:21 UTC, Sam Hurst
committed Details | Review
Reorganise header files (4.72 KB, patch)
2017-07-26 16:48 UTC, Sam Hurst
committed Details | Review

Description Sam Hurst 2015-02-10 12:15:37 UTC
Created attachment 296494 [details] [review]
curlhttpsrc patch

Hi,

As part of a recent trial to test HTTP/2 for media delivery, BBC R&D created a new GStreamer element that uses libcurl instead of libsoup as a http/https URI Handler. We released the code to github here: https://github.com/BBC/gst-curlhttpsrc, but we also wanted to contribute the code back to GStreamer as we believe that it has some benefits over the existing souphttpsrc that other people might want to use.

Those benefits are as follows:
> Support for the next generation HTTP/2 standard via nghttp2 support in libcurl
> Supports persistant connections that allow reuse, meaning new sessions don't need to be set up for every subsequent media request
> Supports HTTP/1.1 multiplexing across element instances to share a single connection and again reduce setup times
> Full property compatibility with souphttpsrc

You should find attached a patch that integrates curlhttpsrc into the gst-plugins-bad tree.

Let me know what you think, if there's any changes needed or you have any questions whatsoever feel free to ask.

Best Regards,
Sam
Comment 1 Thiago Sousa Santos 2015-02-11 03:49:41 UTC
Review of attachment 296494 [details] [review]:

Thanks for the patch!

The code looks good overall but there are some things to fix.

Also on my tests if I try to use http 2.0 in a server that doesn't support it seems to hang rather than error out.
Some other comments and questions below.

::: ext/curl/Makefile.am
@@ +14,3 @@
 			gstcurlftpsink.c \
 			$(gstcurlsshsink_SOURCES) \
+			gstcurlsmtpsink.c \

missing the gstcurlqueue.c here

::: ext/curl/gstcurlhttpsrc.c
@@ +167,3 @@
+
+  gst_curl_http_src_curl_capabilities = curl_version_info (CURLVERSION_NOW);
+  http_env = g_getenv ("GST_CURL_HTTP_VER");

Why do you need this env var?

@@ +602,3 @@
+   * Check that the CURL worker thread is running. If it isn't, start it.
+   */
+  gst_curl_http_src_ref_multi (source);

It would make more sense to do this in the NULL -> READY transition. In NULL state the element should have resources allocated.

@@ +678,3 @@
+
+  GST_INFO_OBJECT (src, "Closing instance, worker thread refcount is now %u",
+      --klass->multi_task_context.refcount);

Don't do operations inside of GST_DEBUG and friends macros. They can be disabled at compile time and your decrement would never happen.

@@ +702,3 @@
+
+  /* Unref the curl_multi task. If nothing else holds a reference, stop it. */
+  gst_curl_http_src_unref_multi (src);

And do this in the READY -> NULL state change

@@ +1020,3 @@
+    *buf = gst_buffer_new_allocate (NULL, src->len, NULL);
+    gst_buffer_map (*buf, &info, GST_MAP_READWRITE);
+    memcpy (info.data, src->msg, (size_t) src->len);

Remember to unmap the buffer

@@ +1041,3 @@
+      src->caps = gst_caps_make_writable (src->caps);
+      gst_caps_set_simple (src->caps, "content-type", G_TYPE_STRING,
+          src->headers.content_type, NULL);

What is the use case for this?

::: ext/curl/gstcurlqueue.h
@@ +54,3 @@
+  GMutex running;
+  GstCurlHttpSrcQueueElement *next;
+};

Did you consider using the Glib helpers like GQueue to avoid implementing a queue yourself?

https://developer.gnome.org/glib/stable/glib-Double-ended-Queues.html
Comment 2 Sam Hurst 2015-02-11 15:58:47 UTC
Hi Thiago,

Thanks very much for the comments, I've added some replies below:

> Why do you need this [GST_CURL_HTTP_VER] env var?

We wanted an easy way to force certain HTTP versions during testing, and I couldn't see a nice easy way of doing this via playbin. Also, I don't think any other HTTP uri handler actually allows you to set the HTTP Version number from another element.

> +  gst_curl_http_src_ref_multi (source);
>
> It would make more sense to do this in the NULL -> READY transition. In NULL state the element should have resources allocated.

I'll be sure to move it.

> +  GST_INFO_OBJECT (src, "Closing instance, worker thread refcount is now %u",
> +      --klass->multi_task_context.refcount);
>
> Don't do operations inside of GST_DEBUG and friends macros. They can be disabled at compile time and your decrement would never happen.

Got it.

> +    *buf = gst_buffer_new_allocate (NULL, src->len, NULL);
> +    gst_buffer_map (*buf, &info, GST_MAP_READWRITE);
> +    memcpy (info.data, src->msg, (size_t) src->len);
>
> Remember to unmap the buffer

As far as I can tell, this should be done after the buffer has been pushed, so I'm going to take that to mean I should be putting it at the beginning of the _create() function as the GstPushSrc takes care of pushing the buffer upstream for me?

> +      src->caps = gst_caps_make_writable (src->caps);
> +      gst_caps_set_simple (src->caps, "content-type", G_TYPE_STRING,
> +          src->headers.content_type, NULL);
> 
> What is the use case for this?

A colleague of mine wanted me to set the caps to tell a downstream element what content it's getting for ease of use in his own module.

> +  GMutex running;
> +  GstCurlHttpSrcQueueElement *next;
> +};
>
> Did you consider using the Glib helpers like GQueue to avoid implementing a queue yourself?

I did, and an initial attempt at a port to GSList ended in things breaking. Maybe in the future I can port it over but as I can't devote an awful lot of time to this at present I've elected to leave it as-is for now. 

As for your hang, that's not something I've ever seen before, and I have tested requesting HTTP/2 upgrades to HTTP/1.1 servers. The transfer should time out (assuming the timeout property isn't set as 0 which means it'll never time out) and if it doesn't that might be a libcurl bug. Were you attempting to connect via SSL or cleartext? Also, what versions of curl/nghttp are you running? I've been pretty much tracking git latest during development and it may be something that's been fixed.

Thanks again for the feedback, I'll take all your suggestions on board and I'll put an amended patch up soon.

Cheers,
Sam
Comment 3 Sam Hurst 2015-02-13 14:28:53 UTC
Created attachment 296783 [details] [review]
Improved curlhttpsrc patch

Hi Thiago,

I've taken your suggestions on board and done the relevant changes, and attached a new version of the patch to the ticket.

Cheers,
Sam
Comment 4 Julien Isorce 2017-01-02 10:10:30 UTC
Hi,
Just a quick note that it would be useful to have a GstBaseHttpSrc to factorise common code with GstSoupHttpSrc and this new GstCurlHttpSrc. It could also benefit web engines which implement their own http src duplicating some code too.
Cheers,
Julien
Comment 5 Nitesh Laller 2017-01-24 09:29:10 UTC
(In reply to Julien Isorce from comment #4)
> Hi,
> Just a quick note that it would be useful to have a GstBaseHttpSrc to
> factorise common code with GstSoupHttpSrc and this new GstCurlHttpSrc. It
> could also benefit web engines which implement their own http src
> duplicating some code too.
> Cheers,
> Julien

Hi Julien,

Let me introduce myself, My name is Nitesh Laller and I am working on HTTP streaming modules in Samsung organization.

I think idea of GstBaseHttpSrc is useful and it will help in building different http source elements without duplicating code again and again.
If someone has not already started developing this BaseKlass, then I would like to involve myself in this.
Please reply.

Thanks,
Comment 6 Julien Isorce 2017-01-24 09:39:58 UTC
Hi Nitesh,
Nobody started it already so feel free to go ahead.
Good luck.
Julien
Comment 7 Sebastian Dröge (slomo) 2017-01-24 10:19:39 UTC
(In reply to Nitesh Laller from comment #5)
>
> I think idea of GstBaseHttpSrc is useful

What common code / functionality do you think can be put into the base class? I was thinking of an interface for HTTP sources before, but if actual functionality can also be shared that would of course be even better.
Comment 8 Nitesh Laller 2017-01-25 07:36:37 UTC
(In reply to Sebastian Dröge (slomo) from comment #7)
> (In reply to Nitesh Laller from comment #5)
> >
> > I think idea of GstBaseHttpSrc is useful
> 
> What common code / functionality do you think can be put into the base
> class? I was thinking of an interface for HTTP sources before, but if actual
> functionality can also be shared that would of course be even better.

Hi Sebastian,

As you know there are various third party libraries which provide interfaces for handling http requests and they all have their own type of implementations for handling such requests.
Interfaces are different, way of implementation is different, So keeping those in mind currently I have following points which can be put in http base class.

Functional Changes:
1) http header response parsing could be done in http base class as it almost same for all http based plugins.
If in case third party library is already providing such parsing then client plugin can ignore default parsing , if not then anyone can use it.
On the basis of header response parsing various parts of http plugin can be put in common place like , get_size check, Is_seekable check. 

Coding related changes could be: 
1) Common properties can be included in base class example:
PROP_LOCATION, PROP_USER_ID,  PROP_USER_PW,  PROP_PROXYURI,  PROP_PROXY_ID,  PROP_PROXY_PW,  PROP_COOKIES,  PROP_USER_AGENT,  PROP_EXTRA_HEADERS,  PROP_COMPRESS,  PROP_AUTOMATIC_REDIRECT,  PROP_MAXREDIRECT  PROP_KEEP_ALIVE,  PROP_TIMEOUT,  PROP_SSL_STRICT,  PROP_SSL_CA_FILE,  PROP_RETRIES,  PROP_HTTPVERSION,  PROP_IS_LIVE  
All variables related to above properties will come under base class structures and client can use these variables.

2) “get_size” implementation will come inside base http class because header is already parsed in base http class.
3) "is_seekable” will come inside base http class, because on the basis of size of content this decision can be taken. Size of content comes in header response of request.
4) "open_session" function pointer can be provided in base class as compared to “start” function
5) Similarly "close_session" function pointer can be provided in base class ( as "open_session" and "close_session" give more sense related to http sessions)
6) "query" function can be handled in http base class to handle common queries like GST_QUERY_URI 
7) Additionally change_state function can be handled in base class. Third party related resource allocation and de-allocation changes can be called using additional class function pointers.

So currently I have these changes in mind and these changes can reduce the common code which every http based plugin need to include.

If you have additional ideas for how more functionality and code can be made common for base class then please share, it would be highly helpful.

Thanks,
Nitesh Laller
Comment 9 Stephan Hesse 2017-03-02 13:14:42 UTC
Hey everybody, 

anything new from this thread?

I'd be interested to contribute some to this out of actual interest.

Also I know of applications/users that would like to have more metrics displayed by the default HTTP src (currently using souphttpsrc). These would include for example time to DNS resolution, TCP connection setup time, time until first byte, etc ..

How about introducing also a conform way of passing HTTP metrics downstream in this novel HTTP base class? It may also be a more generic metric class suitable to any kind of network transaction using source elements?

As already said, I am happy to contribute to already started work here, as well as starting it if nothing has been done yet.

Cheers
Stephan
Comment 10 Sebastian Dröge (slomo) 2017-04-18 11:48:42 UTC
Let's merge curlhttpsrc after 1.12 for now (with a lower rank than souphttpsrc), and handle the "HTTP source interface" discussion in another bug. Anybody interested working on that please open a new bug :)
Comment 11 Sebastian Dröge (slomo) 2017-05-09 13:45:56 UTC
Created attachment 351434 [details] [review]
curl: Add curlhttpsrc element

Merged from https://github.com/bbc/gst-curlhttpsrc commit
  478ec021b92f08a211e33ee0d454ff1ebe612a42
Comment 12 Sebastian Dröge (slomo) 2017-05-09 13:50:33 UTC
Created attachment 351435 [details] [review]
curl: Add curlhttpsrc element

Merged from https://github.com/bbc/gst-curlhttpsrc commit
  478ec021b92f08a211e33ee0d454ff1ebe612a42
Comment 13 Sebastian Dröge (slomo) 2017-05-10 14:59:13 UTC
Created attachment 351557 [details] [review]
curlhttpsrc: Random cleanup
Comment 14 Sebastian Dröge (slomo) 2017-05-10 15:01:22 UTC
Ok, the main blocker for merging this now is that the *whole* response is first downloaded and only then placed into a single output buffer. This is not acceptable, the response should be read chunk by chunk and immediately forwarded downstream.
Comment 15 Sebastian Dröge (slomo) 2017-05-10 15:02:11 UTC
Created attachment 351558 [details] [review]
curlhttpsrc: Random cleanup
Comment 16 Sam Hurst 2017-07-06 10:52:44 UTC
Hi Sebastian,

Apologies for the very delayed response, this got lost under a lot of other things.

I agree that the whole response being placed into the buffer isn't ideal. I do remember trying to get it to deliver chunk by chunk, but it's been so long now that I can't quite remember where I got with it. Someone else from Fluendo did fork the github repository and made a version which does deliver chunk by chunk, so that would be a good place to start. As for the patch, is this something you're proposing to change, or would you need me to do it? (Just to make sure we don't duplicate effort)
Comment 17 Sebastian Dröge (slomo) 2017-07-06 11:00:50 UTC
Ok, might be best to merge that from the Fluendo repository then. Can you prepare a patch with that and also my changes from above? That would be great :)
Comment 18 Sam Hurst 2017-07-19 08:48:31 UTC
Hi Sebastian,

I've managed to get some time to look at this. Do you have any idea at the moment about what the release schedule for the 1.14 release is going to be? It would be nice to get this in for the next release, if at all possible.

-Sam
Comment 19 Sebastian Dröge (slomo) 2017-07-19 09:28:17 UTC
You still have a month or two for 1.14
Comment 20 Sam Hurst 2017-07-20 16:42:20 UTC
Created attachment 356056 [details] [review]
Add curlhttpsrc with chunked data delivery

Hi Sebastian,

I've just attached a new patch file which is based on a reworked version of curlhttpsrc which delivers data in chunks rather than in the single buffer, as well as incorporating your changes. Let me know what you think! :)

-Sam
Comment 21 Sebastian Dröge (slomo) 2017-07-24 07:20:52 UTC
Does this include the changes from the Fluendo code that you mentioned?
Comment 22 Sebastian Dröge (slomo) 2017-07-24 07:26:02 UTC
Review of attachment 356056 [details] [review]:

Just very shortly went over the code, some comments below

::: ext/curl/gstcurlhttpsrc.c
@@ +7,3 @@
+ * Based on the GstElement template, courtesy of
+ * Copyright (C) 2005 Thomas Vander Stichele <thomas@apestaart.org>
+ * Copyright (C) 2005 Ronald S. Bultje <rbultje@ronald.bitfreak.net>

This can probably go away, it's a template after all and not really doing anything

@@ +1704,3 @@
+  g_mutex_lock (&s->buffer_mutex);
+  s->buffer =
+      realloc (s->buffer, (s->buffer_len + chunk_len + 1) * sizeof (char));

Use g_realloc(), g_malloc(), g_free() please

@@ +1711,3 @@
+  memcpy (s->buffer + s->buffer_len, chunk, chunk_len);
+  s->buffer_len += chunk_len;
+  g_cond_signal (&s->signal);

You also need to signal in GstBaseSrc::unlock(), and then go out of ::create() (etc) ASAP, returning GST_FLOW_FLUSHING. And all following calls until GstBaseSrc::unlock_stop should behave the same (immediately return GST_FLOW_FLUSHING).
Comment 23 Sam Hurst 2017-07-24 10:15:38 UTC
Hi Sebastian,

I'll incorporate your requested changes now. I just have one question though about the expected behaviour of the source element when it's sending flushing. In the documentation I can find, that means that the element should be discarding all buffers. In the case of a running HTTP transfer, does that mean I should drop any received buffers from the remote server that I haven't yet sent downstream?

Given that the downstream is then not receiving some part of the resource that has been requested, should I cancel the in-flight HTTP request, and then send GST_FLOW_EOS the next time ::create() is called after ::unlock_stop()? At a cursory glance, this seems to be the behaviour of souphttpsrc. Or have I got the wrong end of the stick here?
Comment 24 Sebastian Dröge (slomo) 2017-07-24 10:21:54 UTC
Sounds correct. What happens next time ::create() is called is not entirely clear here, usually what would happen is that you go back to READY anyway and then next time in READY->PAUSED you would do a new request.
Comment 25 Tim-Philipp Müller 2017-07-24 10:31:21 UTC
There should really be either a state change or a seek request after unlock_stop().
Comment 26 Sam Hurst 2017-07-24 11:06:38 UTC
Created attachment 356287 [details] [review]
Add curlhttpsrc with chunked data delivery - rework with ::unlock()

Here's a new patch that fixes the issues that were raised. Do let me know if you find anything else.
Comment 27 Sebastian Dröge (slomo) 2017-07-25 08:58:47 UTC
Comment on attachment 356287 [details] [review]
Add curlhttpsrc with chunked data delivery - rework with ::unlock()

Crashes here with: gst-launch-1.0 curlhttpsrc location=http://google.com ! fakesink silent=false -v

Thread 3 "curlhttpsrc0:sr" received signal SIGSEGV, Segmentation fault.

Thread 140737152808704 (LWP 31351)

  • #0 gst_structure_copy
    at gststructure.c line 346
  • #1 boxed_proxy_collect_value
    at ././gobject/gboxed.c line 240
  • #2 gst_structure_set_valist_internal
    at gststructure.c line 607
  • #3 gst_structure_set
    at gststructure.c line 638
  • #4 gst_curl_http_src_handle_response
    at gstcurlhttpsrc.c line 1093
  • #5 gst_curl_http_src_create
    at gstcurlhttpsrc.c line 807
  • #6 gst_base_src_get_range
    at gstbasesrc.c line 2504
  • #7 gst_base_src_loop
    at gstbasesrc.c line 2804
  • #8 gst_task_func
    at gsttask.c line 332
  • #9 g_thread_pool_thread_proxy
    at ././glib/gthreadpool.c line 307
  • #10 g_thread_proxy
    at ././glib/gthread.c line 784
  • #11 start_thread
    at pthread_create.c line 333
  • #12 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 97

Comment 28 Sebastian Dröge (slomo) 2017-07-25 08:59:13 UTC
You probably also want to merge https://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/ext/soup?id=d0dfbb04c002bc94fb7d68d420f5276c32b9d991
Comment 29 Sam Hurst 2017-07-25 15:21:32 UTC
Created attachment 356365 [details] [review]
Add curlhttpsrc with chunked data delivery - fixed segv

Oops, I was using GST_TYPE_STRUCTURE instead of G_TYPE_STRING to store the redirection URI. Stupid copy/paste mistake, I bet. I've also made sure to include a URI that will result in a redirection to my test set. I've also added in the element message as requested, along with a bit of reorganisation to fix an assertion fail I picked up earlier when running with lots of debugging.
Comment 30 Sebastian Dröge (slomo) 2017-07-26 06:53:12 UTC
commit e74b3a02ddf0e964b0ad8cc12b9e4967652f23e2 (HEAD -> master)
Author: Sam Hurst <Sam.Hurst@bbc.co.uk>
Date:   Tue Jul 25 15:23:57 2017 +0100

    curl: Add curlhttpsrc element
    
    Merged from https://github.com/bbc/gst-curlhttpsrc commit
      f8aabcfc5c50a44f3362de831377d6e86dcd2d49
    
    https://bugzilla.gnome.org/show_bug.cgi?id=744261
Comment 31 Philippe Normand 2017-07-26 16:08:49 UTC
This broke my macOS build because of the 3 variables declared in the curlsrc.h file => the linker complains about duplicated symbols. Why is there so much stuff defined in that header btw?

At least the gst_curl_http_src_curl_capabilities, pref_http_ver and gst_curl_http_src_default_useragent variables should be moved to the C file.
Comment 32 Sam Hurst 2017-07-26 16:48:29 UTC
Created attachment 356433 [details] [review]
Reorganise header files

In all honesty, before last week the last time I looked at most of this code was two and a half years ago and I've worked on a lot since so I can't remember decisions from back then. I do remember that it used to be spread across a lot more files, but I was encouraged (can't remember by who...) to try and merge it down to a smaller number. So I'm guessing those declarations are a relic of that. The #defines could all be moved into gstcurldefaults.h too, and that would skim down gstcurlhttpsrc.h.

The build seems to work fine on all my Linux boxes, but I haven't tested it on a Mac as I don't have one. My apologies. I've attached a patch, does this help?
Comment 33 Philippe Normand 2017-07-26 16:55:17 UTC
Thanks Sam! This patch looks good to me and also fixed my build :)
Comment 34 Philippe Normand 2017-07-26 18:00:13 UTC
Comment on attachment 356433 [details] [review]
Reorganise header files

commit b922edce7b6d659f114a34f6e78768e9165584e7 (HEAD -> master, origin/master, origin/HEAD)
Author: Sam Hurst <Sam.Hurst@bbc.co.uk>
Date:   Wed Jul 26 17:43:19 2017 +0100

    curl: Reorganise header files to fix macOS builds
    
    https://bugzilla.gnome.org/show_bug.cgi?id=744261