After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 557777 - SoupURILoader
SoupURILoader
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: Misc
unspecified
Other Linux
: Normal normal
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
: 524502 528862 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2008-10-24 15:22 UTC by Dan Winship
Modified: 2011-09-22 18:59 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
webkit patch to use new-io libsoup branch (16.01 KB, patch)
2009-12-20 19:00 UTC, Dan Winship
none Details | Review
Implement about:foo, the epiphany part (4.93 KB, patch)
2009-12-20 19:44 UTC, Dan Winship
none Details | Review
make about: override work (1.67 KB, patch)
2009-12-20 19:44 UTC, Dan Winship
none Details | Review
StartHttp now uses libsoup-io API (5.36 KB, patch)
2010-04-06 19:07 UTC, Sergio Villar
none Details | Review
WebKitGtk+ using SoupURILoader stuff for Http resources (15.29 KB, patch)
2010-06-04 09:26 UTC, Sergio Villar
none Details | Review

Description Dan Winship 2008-10-24 15:22:36 UTC
This is an idea I've had... I'm not totally sure about it. Basically, a libcurl-like API for libsoup that lets you do simple operations on http, ftp, file, and data URIs.

The primary consumer would be the WebKit soup backend; this would replace the current gunky gio-based code there, because as I understand it, there's no way the gio/gvfs-based ftp loader can ever work right, because gvfs URIs aren't the same as real URIs, and WebKit needs to work with real URIs.

(This could also maybe tie into some of the stuff in bug 524502 [soup_simple API].)

Since this API can't be implemented on top of gio/gvfs, and we don't want to duplicate the gvfs ftp code, this would mean moving the core of gvfsbackendftp.c into libsoup (probably as an undocumented, not-necessarily-ABI-stable API, unless it was possible to hide it completely behind the SoupURILoader API).

As I said, I'm not totally sure about this. Thoughts?
Comment 1 Benjamin Otte (Company) 2008-10-25 09:50:17 UTC
On one hand a simple thing that gives me a read (or write) data stream from any random URL is a quite useful thing. On the other hand I'm not sure you could (or should) move the gvfs ftp code into a library. For one, most of the ftp code is about ensuring compatibility between gvfs and ftp (figuring out the right gvfs error message from a "550 Failed" ftp reply is quite a job for example). Also, there's still some ftp servers that only allow one connection per user, so you likely want a daemon to handle your ftp needs instead of all apps working but not the browser or vice versa.

I think what you do want for web operations is an API that maps from gvfs urls to real URIs and back and then use gvfs directly. And I think this API should live in gio.
Comment 2 Dan Winship 2008-10-26 19:08:16 UTC
(In reply to comment #1)
> I'm not sure you could
> (or should) move the gvfs ftp code into a library. For one, most of the ftp
> code is about ensuring compatibility between gvfs and ftp (figuring out the
> right gvfs error message from a "550 Failed" ftp reply is quite a job for
> example).

The API would be gio-ish, and would return GIOErrors, if that helps at all...

> Also, there's still some ftp servers that only allow one connection
> per user, so you likely want a daemon to handle your ftp needs instead of all
> apps working but not the browser or vice versa.

In the case of WebKit, having separate ftp connections from nautilus/gvfs shouldn't be any more or less a problem than it currently is with Firefox. (Is it a problem with Firefox?)

In the general case, I don't see ftp making a massive resurgence, so the likelihood of conflicts between apps seems small. :)

> I think what you do want for web operations is an API that maps from gvfs urls
> to real URIs and back and then use gvfs directly. And I think this API should
> live in gio.

Well, the other issue is that you don't want the ftp server to suddenly show up in nautilus and the file chooser (and have a new nautilus window pop up) just because the user is viewing an ftp:// URI in WebKit.

Another example, currently libgweather uses libsoup to fetch METAR info from a CGI script on weather.noaa.gov, but the result it gets back is wrapped in HTML which might, in theory, change in the future. It would be nicer to get the raw METAR reports, but those are only available from ftp. But we don't want a noaa.weather.gov mount to show up in nautilus every time the clock applet updates itself.

So at a minimum we'd need g_file_new_for_standard_uri(), to get a GFile corresponding to a "real" URI, and a new GMountMountFlag G_MOUNT_MOUNT_HIDDEN, to create a mount that is not made visible to GVolumeMonitors. (And probably some magic to cause hidden mounts to become visible if they become referenced by a gvfs URI instead of a standard URI? Bluh.)

Comment 3 Dan Winship 2008-12-22 20:52:32 UTC
(Having just mentioned this on #gnome-hackers, I should probably update it a little.)

I think for the reasons mentioned in the previous comment, trying to make it possible to use gio+gvfs to do this would be a lose. Also, although one argument for using gio+gvfs would be "but then you can get support for sftp and bluetooth and other stuff too", I think that's out of scope for this API, which is really pretty much about (a) "implement the URL schemes that browsers implement", and/or (b) "provide easy access to publicly-available files on the net", neither of which requires sftp, bluetooth, etc support.

So I still like the idea of moving the core of gvfsd-ftp into libsoup itself. I'm not totally sure what level of API exposure for the FTP stuff would make sense. For the first release, it would probably all be hidden behind SoupURILoader, except that there'd be a private soup-ftp.h installed for gvfs, where you'd have to #define LIBSOUP_I_KNOW_SOUP_FTP_IS_UNSTABLE or something. Later on if we were happy with the API, we might make it official. Or not.
Comment 4 Christian Dywan 2009-02-16 23:23:24 UTC
I am very interested in an API that handles http, file, data and myfoo the same way (myfoo being a made up protocol defined by my application), maybe also ftp if it works. It is otherwise complicated to have to maintain several code paths for different protocols with essentially the same purpose.
Comment 5 Alexander V. Butenko 2009-03-30 19:09:25 UTC
probably its nice to collaborate with gio guys once they have big part of this job done. It will be nice to see only one library which implements network protocols with a good development power for supporting it.  And also it will avoid code duplication and
Comment 6 Dan Winship 2009-04-23 17:15:13 UTC
(In reply to comment #5)
> probably its nice to collaborate with gio guys once they have big part of this
> job done. It will be nice to see only one library which implements network
> protocols with a good development power for supporting it.  And also it will
> avoid code duplication and

We definitely want to avoid code duplication, but as discussed above, that means making gio and SoupURILoader both use the same code underneath, not making SoupURILoader use gio. Note also that many of gio's protocols (eg, sftp) are explicitly out of scope for this bug.

(In reply to comment #4)
> I am very interested in an API that handles http, file, data and myfoo the same
> way (myfoo being a made up protocol defined by my application), maybe also ftp
> if it works. It is otherwise complicated to have to maintain several code paths
> for different protocols with essentially the same purpose.

A good concrete example of "myfoo" is "cid:", the URI protocol for identifying images in an HTML email message by the Content-Id of the multipart/related message part. GtkHTML makes that easy to deal with, since it punts all URI handling back up to the application, but a WebKit-based mail client would need some way to teach WebKit how to fetch those resources.

Note, btw, that there is now a SoC student who will be working on this.
Comment 7 Dan Winship 2009-09-02 14:47:42 UTC
The SoC work is in the soup-uri-loader branch of git://github.com/gcorvala/gsoc-2009.git (which, last I checked, had not rebased/merged master since the start of summer). It implements ftp and file, both sync and async, using GInputStream rather than the SoupMessage style of emitting progress signals (thereby making it more like the proposed soup-internals rewrite from bug 591739.

The goal is to merge this for 2.30.

Some outstanding issues:

1. Miscellaneous bits of not-quite-finishedness

2. Duplication between SoupProtocolFTP and gvfs-ftp: at the time we started, gvfs hadn't been ported to GSocket yet, so reimplementing from scratch made a little more sense. Long term we obviously don't want two implementations. gvfs's implementation is more complete, but SoupProtocolFTP handles both sync and async. I still don't have a definite sense of how we'd want to expose the beyond-SoupURILoader bits of an ftp implementation in libsoup for gvfs's use.

3. Implement SoupProtocolData for data: URIs. (pretty easy)

4. Implement SoupProtocolHTTP. Will require some thought about how to expose various HTTP possibilities (message headers, SoupSessionFeatures) at the SoupURILoader level. Given the GInputStreaminess of SoupURILoader, this probably depends on bug 591739

5. Implement user-defined SoupProtocols. (Might be doable already?)

6. Figure out SoupSession/SoupURILoader integration issues. (Eg, HTTP redirects to FTP URIs). This involves fixing the longstanding "all URI schemes that aren't 'https' are http" thing.
Comment 8 Dan Winship 2009-12-19 12:44:21 UTC
Discussed this with Benjamin Otte yesterday...

We realized that we don't need a separate SoupURILoader class; we can just have this functionality in SoupSession, which then also solves the problem of "what session gets used when you load an http URI via SoupURILoader". The API is now looking something like

    SoupRequest *soup_session_request_uri (SoupSession *, const char *uri);

where SoupRequest is an interface type; the implementation for HTTP uris will be SoupMessage, again solving some API problems mentioned above. Then SoupRequest will have (gio-style) sync and async methods to send it and get back a GInputStream.

For the first draft, the HTTP interface to SoupRequest will use something like gvfs's SoupInputStream, though eventually we want to push the "streaminess" further down the stack. The initial implementation of FTP URIs will use Gabriel's SoC code, though eventually we want to figure out how to share gvfs's ftp code between gvfs and libsoup. There are various problems with all of the obvious solutions there.

Directory rendering will be handled via a kind of SoupSessionFeature that is given a GFileEnumerator (or something) and returns a GInputStream. An implementation will be provided that generates an HTML directory listing using icons fetched via gio.
Comment 9 Dan Winship 2009-12-20 19:00:55 UTC
Created attachment 150115 [details] [review]
webkit patch to use new-io libsoup branch

work is ongoing on the new-io branch in libsoup git. I'm attaching a patch to make webkit use it for non-http traffic.

Current status:
    SoupRequestFile: supports sync and async, does simple directory listings
    SoupRequestData: supports sync and async
    SoupRequestFTP: Uses the SoC code. Supports sync well, async mostly. Directory listings don't yet work (though the underlying code supports them) and content-type/content-length data isn't working in the async interface, so WebKit thinks all ftp: resources are application/octet-stream. Benjamin is planning to fix up the issues that keep us from using ftp-via-gvfs
    SoupRequestHTTP: (not used by the WebKit patch) Uses gvfs's SoupInputStream. Async works (a bit kludgily), sync does not, and won't until soup uses GSocket.
    Plugin types: you can write them. It doesn't go out of its way to make it easy yet

Also, no support for authentication yet.
Comment 10 Dan Winship 2009-12-20 19:44:19 UTC
Created attachment 150117 [details] [review]
Implement about:foo, the epiphany part
Comment 11 Dan Winship 2009-12-20 19:44:59 UTC
Created attachment 150118 [details] [review]
make about: override work
Comment 12 Benjamin Otte (Company) 2009-12-20 23:07:06 UTC
I've pushed some more reorganizations to the new-io branch, including the removal of the FTP code to use gvfs instead. That works fine with one little caveat: We need to find a way to mount synchronously, as the current code only allows accessing already mounted FTP shares. And it looks ugly.

Other than that it works perfectly fine.
Comment 13 Sergio Villar 2010-04-06 19:07:05 UTC
Created attachment 158073 [details] [review]
StartHttp now uses libsoup-io API

Patch on top of Dan's original patch to webkit.

I know there are missing thing but I'd like to get feedback ASAP in order to confirm that I'm on the right way
Comment 14 Dan Winship 2010-04-06 19:40:45 UTC
Comment on attachment 158073 [details] [review]
StartHttp now uses libsoup-io API

yeah, this looks like it's on the right path; I'm guessing that the code for handling restarted requests (eg, after a redirect or http authentication) is going to need to be adjusted as well.

also, you might have lost content-sniffing? not sure. I didn't bother to try actually applying this patch yet :)

>+    d->m_req = soup_session_request(session, url.string().utf8().data(), &error);
>+    g_object_set (d->m_msg, SOUP_MESSAGE_METHOD, request.httpMethod().utf8().data(), NULL);

It's a bit clunky that you have to first create the request and then later set the request method. it might make sense to make soup_session_request() a little more powerful so that you can specify the method there. (Maybe make it g_object_new-like, where you provide a NULL-terminated list of properties and values?)


Also, at one point I had thought about the possibility of making SoupRequestHTTP actually *be* a SoupMessage, but I don't remember if there was a good reason why we didn't. (Note that this would require turning SoupRequest into an interface rather than a class, because we can't change the size of SoupMessage.) Anyway, it's possible that that would be uglier than the way it is now.
Comment 15 Benjamin Otte (Company) 2010-04-06 20:05:59 UTC
Using a g_object_new()-style API seems like a bad idea as you do not know the supported properties (and their types) in advance as the request types can be registered dynamically.

And I do remember us deciding against making SoupMessage a SoupRequest at the hackfest and I do remember us being pretty sure it's the correct approach, but I don't remember why either.
Comment 16 Sergio Villar 2010-04-07 09:51:31 UTC
(In reply to comment #14)
> (From update of attachment 158073 [details] [review])
> yeah, this looks like it's on the right path; I'm guessing that the code for
> handling restarted requests (eg, after a redirect or http authentication) is
> going to need to be adjusted as well.

A lot of xmlhttprequest tests fail with this code, might be somehow related?

> also, you might have lost content-sniffing? not sure. I didn't bother to try
> actually applying this patch yet :)

content sniffing callback is still there. What I remember is removing the got-chunk callback that was causing weird side effects

> >+    d->m_req = soup_session_request(session, url.string().utf8().data(), &error);
> >+    g_object_set (d->m_msg, SOUP_MESSAGE_METHOD, request.httpMethod().utf8().data(), NULL);
> 
> It's a bit clunky that you have to first create the request and then later set
> the request method. it might make sense to make soup_session_request() a little
> more powerful so that you can specify the method there. (Maybe make it
> g_object_new-like, where you provide a NULL-terminated list of properties and
> values?)

Yeah, another possibility I was thinking about was to add a new method to ResourceRequestSoup called updateSoupMessage or something that will pretty much do what toSoupMessage is doing but message creation. Both could share the same code to set headers and flags.
 
> Also, at one point I had thought about the possibility of making
> SoupRequestHTTP actually *be* a SoupMessage, but I don't remember if there was
> a good reason why we didn't. (Note that this would require turning SoupRequest
> into an interface rather than a class, because we can't change the size of
> SoupMessage.) Anyway, it's possible that that would be uglier than the way it
> is now.

Well if you take a look at the repo you actually did that :). It was later removed during the hackfest, don't know your reasons though
Comment 17 Gustavo Noronha (kov) 2010-04-07 18:29:04 UTC
(In reply to comment #16)
> > It's a bit clunky that you have to first create the request and then later set
> > the request method. it might make sense to make soup_session_request() a little
> > more powerful so that you can specify the method there. (Maybe make it
> > g_object_new-like, where you provide a NULL-terminated list of properties and
> > values?)
> 
> Yeah, another possibility I was thinking about was to add a new method to
> ResourceRequestSoup called updateSoupMessage or something that will pretty much
> do what toSoupMessage is doing but message creation. Both could share the same
> code to set headers and flags.

I don't see how you'd use this. ResourceRe{quest,sponse}Soup do not store a SoupMessage currently, so what would you use such a method for?
Comment 18 Sergio Villar 2010-04-09 10:49:25 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > > It's a bit clunky that you have to first create the request and then later set
> > > the request method. it might make sense to make soup_session_request() a little
> > > more powerful so that you can specify the method there. (Maybe make it
> > > g_object_new-like, where you provide a NULL-terminated list of properties and
> > > values?)
> > 
> > Yeah, another possibility I was thinking about was to add a new method to
> > ResourceRequestSoup called updateSoupMessage or something that will pretty much
> > do what toSoupMessage is doing but message creation. Both could share the same
> > code to set headers and flags.
> 
> I don't see how you'd use this. ResourceRe{quest,sponse}Soup do not store a
> SoupMessage currently, so what would you use such a method for?

Well, you will obviously have to pass the SoupMessage to that method, so it will be something like

ResourceRequest::updateSoupMessage(SoupMessage* soupMessage)
{
  headers and flags stuff...
}

ResourceRequest::toSoupMessage()
{
   msg = new soupMessage
   updateSoupMessage(msg);
}

and in ResourceRequestHandler you'll call that updateSoupMessage method instead of g_object_set. Just an idea
Comment 19 Sergio Villar 2010-04-12 11:59:34 UTC
(In reply to comment #16)
> (In reply to comment #14)
> > (From update of attachment 158073 [details] [review] [details])
> > yeah, this looks like it's on the right path; I'm guessing that the code for
> > handling restarted requests (eg, after a redirect or http authentication) is
> > going to need to be adjusted as well.
> 
> A lot of xmlhttprequest tests fail with this code, might be somehow related?

BTW that was not the problem. They were failing because xml http requests use sync calls to libsoup and those do not work with Dan's original libsoup-io branch.

Good news are that I was able to make them work, I'll upload the patch to libsoup's bug. With the new libsoup with sync calls working all the xmlhttp tests work fine :)
Comment 20 Sergio Villar 2010-04-12 22:05:30 UTC
(In reply to comment #19)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > (From update of attachment 158073 [details] [review] [details] [details])
> > > yeah, this looks like it's on the right path; I'm guessing that the code for
> > > handling restarted requests (eg, after a redirect or http authentication) is
> > > going to need to be adjusted as well.
> > 
> > A lot of xmlhttprequest tests fail with this code, might be somehow related?
> 
> BTW that was not the problem. They were failing because xml http requests use
> sync calls to libsoup and those do not work with Dan's original libsoup-io
> branch.
> 
> Good news are that I was able to make them work, I'll upload the patch to
> libsoup's bug. With the new libsoup with sync calls working all the xmlhttp
> tests work fine :)

I must say that somehow I didn't test them correctly and are still failing, let's see why...
Comment 21 Sergio Villar 2010-06-04 09:26:35 UTC
Created attachment 162736 [details] [review]
WebKitGtk+ using SoupURILoader stuff for Http resources

This patch is not really a derived work from Dan's patch as the previous one. After trying hard to get http/ftp/data/file fully working I decided to give up and restart step by step.

With this first patch WebKitGtk+ uses SoupURILoader stuff only for http and good news is that all of the WebKit tests work fine. Note that I am not using the libsoup-io in the gnome repositories but this one http://gitorious.org/libsoup-io-cache in gitorious. Basically is the original libsoup-io branch with several fixes.

So I think that we should start to seriously consider the SoupURILoader stuff as an actual way to follow. This patch does not adapt WebKitGtk+ to use the new API for data or file. As I said I tried really hard to get that working but I found some problems need to be solved:
   * data: handling is a bit tricky. Current WebKit code uses a g_timeout_add to avoid pure synchronous handling of data: resources as it breaks tests. Basically WebCore receives events in different order than expected.
   * file: handling should be easily portable to the new API (I actually did it quite fast a couple of weeks ago) but it won't work until URI parsing is properly handled in SoupURI. Some file: URI's are not properly handled by libsoup currently and thus it breaks several WebKit tests.
Comment 22 Sergio Villar 2010-06-07 10:34:24 UTC
BTW I have just created a bug in WebKit https://bugs.webkit.org/show_bug.cgi?id=40222 for the WebKitGtk+ patch.

We can still talk about libsoup specifics here. Remember that the current implementation is here:  http://gitorious.org/libsoup-io-cache
Comment 23 Dan Winship 2010-08-16 02:22:54 UTC
*** Bug 528862 has been marked as a duplicate of this bug. ***
Comment 24 Dan Winship 2010-08-16 02:23:22 UTC
*** Bug 524502 has been marked as a duplicate of this bug. ***
Comment 25 Dan Winship 2010-12-10 15:08:03 UTC
This has now been moved to libsoup, but it is currently protected by #ifdef LIBSOUP_USE_UNSTABLE_REQUEST_API. It doesn't support SoupSessionSync, and there are network performance problems in epiphany currently that may be caused by this code. So, use with caution. Hopefully to be fixed for GNOME 3.0.
Comment 26 Dan Winship 2011-09-22 18:59:54 UTC
although this is still considered unstable API, I don't think there's any reason to keep this bug open