After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 523100 - caching support
caching support
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: HTTP Transport
2.4.x
Other Linux
: Normal normal
: GNOME 2.24
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on: 591739
Blocks:
 
 
Reported: 2008-03-18 04:16 UTC by Dan Winship
Modified: 2011-09-22 19:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
SoupCache, first draft. (24.58 KB, patch)
2009-03-26 14:34 UTC, Xan Lopez
none Details | Review
soupcachev2.diff (25.76 KB, patch)
2009-03-27 14:56 UTC, Xan Lopez
none Details | Review
couple cleanups (2.55 KB, patch)
2010-01-17 03:03 UTC, Alexander V. Butenko
none Details | Review
Adds a get_message method to SoupRequestHttp (2.23 KB, patch)
2010-04-06 19:04 UTC, Sergio Villar
none Details | Review
Sync methods in libsoup new-io branch working (3.83 KB, patch)
2010-04-12 15:44 UTC, Sergio Villar
none Details | Review

Description Dan Winship 2008-03-18 04:16:03 UTC
We need to provide support for caching:

  - Provide a SoupCache object that can be attached to a session
  - Should be transparent to the SoupSession API. Eg, you queue/send a
    GET, and the session/cache sends a conditional GET instead, and gets
    back a 304, and then creates a 200 response to return to the app.

Not yet clear who will handle storage of cache objects...
Comment 1 Benjamin Otte (Company) 2008-03-31 14:18:11 UTC
CC'ing myself as I might be interested in writing a gvfs backend for the browser cache when this becomes reality.
Comment 2 Alexander V. Butenko 2009-02-15 18:25:38 UTC
https://bugs.webkit.org/show_bug.cgi?id=21239
i was trying to implement cache for libsoup in webkit using Marco Barisione patch.  Now im limited on time and cant continue works on this. Probably this start will be good for somebody to adopt it directly to libsoup.
Comment 3 Xan Lopez 2009-03-26 14:34:26 UTC
Created attachment 131433 [details] [review]
SoupCache, first draft.

OK, here's a first draft of this. There's many things wrong/missing, but I prefer to get feedback early on.

Known issues:

- I'm ignoring most request cache-directives/headers (things like max-stale, min-fresh, no-cache, etc). I implement a bunch of the response ones, though.
- I'm ignoring re-validation for now.
- All I/O is sync and I'm not really checking errors. (yeah...)
- Storage is hardcoded in the base class, but shouldn't be too difficult to make it extensible.
- I basically ignore anything on the cache across sessions, not sure if I should do that (but it's easier).
- Most sites I've tested seem to work, but I'm sure there's zillions of bugs.
Comment 4 Xan Lopez 2009-03-26 14:35:14 UTC
Oh, and it's only hooked into the async session.
Comment 5 Xan Lopez 2009-03-27 14:56:23 UTC
Created attachment 131500 [details] [review]
soupcachev2.diff

This adds support for most request cache directives, and fixes a stupid bug in deleting cache entries by key (the cache files were not being updated).
Comment 6 Dan Winship 2009-03-27 17:51:27 UTC
some comments after a first reading:

+get_cacheability (SoupMessage *msg)

This seems to treat no-store and no-cache as identical (using the behavior for no-store for both). "no-cache" means (AFAICT) that you're allowed to cache it, but you always have to revalidate before sending the cached response to the client. (So it always requires at least a conditional request, but not always a full request.)

I'm not sure that the logic for status codes is correct. What is it based on?

Also, RFC 2616, 14.21:

   The presence of an Expires header field with a date value of some
   time in the future on a response that otherwise would by default be
   non-cacheable indicates that the response is cacheable, unless
   indicated otherwise by a Cache-Control header field (section 14.9).

+headers_foreach (const char *name, const char *value, SoupMessageHeaders *headers)

should have a better name, involving "copy" or something.

soup_message_get_cache_key() seems weird. Why not just use soup_uri_to_string()? http vs https and port number need to be taken into account as well.

+		expires_int = g_ascii_strtoll (expires, NULL, 10.0);
+		date_int = g_ascii_strtoll (date, NULL, 10.0);

Expires and Date are textual dates, not numbers. Use soup_date_new() and soup_date_to_time_t(). (Or add a method to get the number of seconds between two SoupDates, since some responses may have Expires headers that are later than the time_t rollover and so converting to time_t will lose accuracy.)

Also, g_ascii_strtoll's third argument is an int, not a double. Doesn't matter here since that will be going away, but should be fixed elsewhere in the file.

I don't really like the gotos in the freshness calculation. Maybe move that calculation to a separate function so you can just return rather than goto? Actually, several of the calculations (this, get_cacheability, etc) might make sense as virtual methods, so that cache subclasses could implement extensions or override the default freshness behavior, etc.

+	} else {
+		/* Mmmm, can this happen? */
+		entry->date = time (NULL);
+	}

Yes, servers are allowed to return responses without Date headers if they don't have a clock. (Eg, embedded devices.)

+static void
+msg_got_chunk_cb (SoupMessage *msg, SoupBuffer *chunk, SoupCacheEntry *entry)

+	if (!chunk->data || !chunk->length) return;

That can't happen. Either remove it or turn it into a g_return_if_fail.

I'm not sure I like the open-and-close-on-every-chunk thing. Couldn't it just keep an fd open while the request was running? Also, currently you're not persisting the SoupCacheEntry data, so the cached data is only good for the current session.

You don't really need to use gio if you don't want to; we're only going to support keeping the cache on local disk anyway.

It looks like currently if you send two requests for the same resource, the second request could end up using the cache entry from the first before the first response was actually fully processed? Or if you made two requests from different processes (or even different sessions in the same process), they could both end up taking turns appending the same data to the same cache file, and end up with garbage.

+	/* 1. The presented Request-URI and that of stored response
+	   match */
+	if (!entry) return FALSE;

Style nits on both the comment and the if. Should be:

+	/* 1. The presented Request-URI and that of stored response
+	 * match 
+	 */
+	if (!entry)
+ 		return FALSE;

+	/* 4. The presented request and stored response are free from
+	   directives that would prevent its use. */

You check the cached headers here, but you need to check for a Cache-control header in the request as well (where some of the tokens have slightly different semantics).

+		if (g_hash_table_lookup (hash, "no-cache") ||
+		    g_hash_table_lookup (hash, "no-store")) {
+			g_hash_table_destroy (hash);
+			return FALSE;
+		}

If you find "no-store" in the cache, you've already messed up. :-) And again, "no-cache" just means you must revalidate (always, as opposed to must-revalidate, which only means you must revalidate if it's not still fresh).

+		must_revalidate = (gboolean)g_hash_table_lookup_extended (hash, "must-revalidate", NULL, NULL);

the cast is superfluous

+	msg->response_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_RESPONSE);
+	msg->response_body = soup_message_body_new ();

msg already has these. Don't create new ones.

Obviously soup_cache_send_response() eventually needs to deal with the possibility of the request being cancelled, restarted, or paused. I'll keep thinking about how to do this.

+soup_cache_constructor (GType                  type,

For code that just needs to do some extra steps after construction, rather than actually modifying how construction happens, it's better to override _constructed() rather than _constructor(). (Existing code in libsoup gets this wrong.)

+++ b/libsoup/soup-cache.h

You should typedef SoupCache in soup-types.h (and then soup-session-private.h doesn't need to include soup-cache.h)

+} SoupCacheability;

doesn't actually need to be exposed in the header

+++ b/libsoup/soup-session-async.c

You will need to actually add the request to the SoupMessageQueue. Eg, so that soup_session_abort() will cancel it correctly. So queue_message() will need to actually call the superclass method, and then at that point, you no longer need QueueMessageData, because everything you need will be in the SoupMessageQueueItem. (And you won't need to manually call the callback because the ordinary session machinery will take care of that when soup_cache_send_response() calls soup_message_finished().)
Comment 7 Dan Winship 2009-03-27 17:51:50 UTC
oh, and also: Awesome!
Comment 8 Xan Lopez 2009-03-31 15:24:19 UTC
OK, I'm going through these now, some comments:

(In reply to comment #6)
> some comments after a first reading:
> 
> +get_cacheability (SoupMessage *msg)
> 
> This seems to treat no-store and no-cache as identical (using the behavior for
> no-store for both). "no-cache" means (AFAICT) that you're allowed to cache it,
> but you always have to revalidate before sending the cached response to the
> client. (So it always requires at least a conditional request, but not always 
> full request.)

Right, because I'm not doing any validation now, so I just throw away the entry. To do this I think I'd need to change the return value of "has_response" from boolean to an enum { HAS/HAS NOT/NEEDS VALIDATION }, and make the session send a conditional request, etc. But I think for now it's better to just ignore it and add it when the rest of the code is solid, do you agree?

> 
> I'm not sure that the logic for status codes is correct. What is it based on?

That I took from your initial soup-cache.c code (it was pretty much the only thing you had).

> 
> Also, RFC 2616, 14.21:
> 
>    The presence of an Expires header field with a date value of some
>    time in the future on a response that otherwise would by default be
>    non-cacheable indicates that the response is cacheable, unless
>    indicated otherwise by a Cache-Control header field (section 14.9).
>

OK (although I need more robust time/date handling in general).
 
> +headers_foreach (const char *name, const char *value, SoupMessageHeaders
> *headers)
> 
> should have a better name, involving "copy" or something.

Done.

> 
> soup_message_get_cache_key() seems weird. Why not just use
> soup_uri_to_string()? http vs https and port number need to be taken into
> account as well.

Done.

> 
> +               expires_int = g_ascii_strtoll (expires, NULL, 10.0);
> +               date_int = g_ascii_strtoll (date, NULL, 10.0);
> 
> Expires and Date are textual dates, not numbers. Use soup_date_new() and
> soup_date_to_time_t(). (Or add a method to get the number of seconds between
> two SoupDates, since some responses may have Expires headers that are later
> than the time_t rollover and so converting to time_t will lose accuracy.)

Mmm, right, I've already seen some Expire dates > 2038.

> 
> Also, g_ascii_strtoll's third argument is an int, not a double. Doesn't matter
> here since that will be going away, but should be fixed elsewhere in the file.

Done (not sure why I thought that had to be a floating point number, being the base...).

> 
> I don't really like the gotos in the freshness calculation. Maybe move that
> calculation to a separate function so you can just return rather than goto?

Done.

> Actually, several of the calculations (this, get_cacheability, etc) might make
> sense as virtual methods, so that cache subclasses could implement extensions
> or override the default freshness behavior, etc.

Mmm, what alternative behaviors do you have in mind?

> 
> +       } else {
> +               /* Mmmm, can this happen? */
> +               entry->date = time (NULL);
> +       }
> 
> Yes, servers are allowed to return responses without Date headers if they don't
> have a clock. (Eg, embedded devices.)

OK. Is time (NULL) a reasonable "gee, I have no idea" default?

> 
> +static void
> +msg_got_chunk_cb (SoupMessage *msg, SoupBuffer *chunk, SoupCacheEntry *entry)
> 
> +       if (!chunk->data || !chunk->length) return;
> 
> That can't happen. Either remove it or turn it into a g_return_if_fail.

Done (it's a g_return_if_fail now).

> 
> I'm not sure I like the open-and-close-on-every-chunk thing. Couldn't it just
> keep an fd open while the request was running?

Done.

> Also, currently you're not
> persisting the SoupCacheEntry data, so the cached data is only good for the
> current session.

Right, I mentioned that already too. I think I prefer to add this later when this code is working correctly. We just need to dump the hash table to disk somehow and read it on initialization, I don't think it will be hugely complicated.

> 
> You don't really need to use gio if you don't want to; we're only going to
> support keeping the cache on local disk anyway.

Mmm, it's not that bad? :)

> 
> It looks like currently if you send two requests for the same resource, the
> second request could end up using the cache entry from the first before the
> first response was actually fully processed? Or if you made two requests from
> different processes (or even different sessions in the same process), they
> could both end up taking turns appending the same data to the same cache file,
> and end up with garbage.

OK, I think I have covered the first (I set a 'dirty' bit on the entry while it's being created, and don't create more with the same key in that case). Not sure about the second case, probably needs something similar on disk? We don't really cover those cases now anyway.

> 
> +       /* 1. The presented Request-URI and that of stored response
> +          match */
> +       if (!entry) return FALSE;
> 
> Style nits on both the comment and the if. Should be:
> 
> +       /* 1. The presented Request-URI and that of stored response
> +        * match 
> +        */
> +       if (!entry)
> +               return FALSE;

Done.

> 
> +       /* 4. The presented request and stored response are free from
> +          directives that would prevent its use. */
> 
> You check the cached headers here, but you need to check for a Cache-control
> header in the request as well (where some of the tokens have slightly different
> semantics).

Right, silly mistake, fixed now.

> 
> +               if (g_hash_table_lookup (hash, "no-cache") ||
> +                   g_hash_table_lookup (hash, "no-store")) {
> +                       g_hash_table_destroy (hash);
> +                       return FALSE;
> +               }
> 
> If you find "no-store" in the cache, you've already messed up. :-) And again,
> "no-cache" just means you must revalidate (always, as opposed to
> must-revalidate, which only means you must revalidate if it's not still fresh).

Right.

> 
> +               must_revalidate = (gboolean)g_hash_table_lookup_extended (hash,
> "must-revalidate", NULL, NULL);
> 
> the cast is superfluous

Right.

> 
> +       msg->response_headers = soup_message_headers_new
> (SOUP_MESSAGE_HEADERS_RESPONSE);
> +       msg->response_body = soup_message_body_new ();
> 
> msg already has these. Don't create new ones.

Done.

> 
> Obviously soup_cache_send_response() eventually needs to deal with the
> possibility of the request being cancelled, restarted, or paused. I'll keep
> thinking about how to do this.
> 
> +soup_cache_constructor (GType                  type,
> 
> For code that just needs to do some extra steps after construction, rather than
> actually modifying how construction happens, it's better to override
> _constructed() rather than _constructor(). (Existing code in libsoup gets this
> wrong.)

Right.

> 
> +++ b/libsoup/soup-cache.h
> 
> You should typedef SoupCache in soup-types.h (and then soup-session-private.h
> doesn't need to include soup-cache.h)

OK.

> 
> +} SoupCacheability;
> 
> doesn't actually need to be exposed in the header

Right.

> 
> +++ b/libsoup/soup-session-async.c
> 
> You will need to actually add the request to the SoupMessageQueue. Eg, so that
> soup_session_abort() will cancel it correctly. So queue_message() will need to
> actually call the superclass method, and then at that point, you no longer need
> QueueMessageData, because everything you need will be in the
> SoupMessageQueueItem. (And you won't need to manually call the callback because
> the ordinary session machinery will take care of that when
> soup_cache_send_response() calls soup_message_finished().)
> 

OK, so, if I do this request_queued is called for each request before I know I'll get it from the cache or not. Since I was using this entry point as "We WON'T get this from cache, it's either obsolete or we want to refresh it or we didn't have it", it kinda screws the logic of the whole thing. Is using request_started now for this OK? (I'm trying it now and I get nasty crashes all over the place, but I guess it's just that some assumptions I had before are wrong now).
Comment 9 Dan Winship 2009-04-05 18:28:23 UTC
FWIW, it looks like Firefox does not compute heuristic expiration dates for resources with no explicit expiration info. Instead it just does a conditional request with "If-Modified-Since" and/or "If-None-Match".
Comment 10 Dan Winship 2009-04-06 19:17:15 UTC
(In reply to comment #8)
> But I think for now it's better to just ignore
> it and add it when the rest of the code is solid, do you agree?

Sure

> > I'm not sure that the logic for status codes is correct. What is it based on?
> 
> That I took from your initial soup-cache.c code (it was pretty much the only
> thing you had).

Hah. Well, I don't remember where I got those ideas from, so you should probably double-check all that.

> > Actually, several of the calculations (this, get_cacheability, etc) might make
> > sense as virtual methods, so that cache subclasses could implement extensions
> > or override the default freshness behavior, etc.
> 
> Mmm, what alternative behaviors do you have in mind?

Well, like making a SoupCache subclass that will serve non-fresh data in some circumstances. Or implementing cache-control extensions like http://www.mnot.net/drafts/draft-nottingham-http-stale-while-revalidate-00.txt.

> > Yes, servers are allowed to return responses without Date headers if they don't
> > have a clock. (Eg, embedded devices.)
> 
> OK. Is time (NULL) a reasonable "gee, I have no idea" default?

You want to err on the side of guessing that the document is too old, not too new. So it would be a little better to get time(NULL) from the time when you sent the request, rather than from the time when you received the response. But not so much better to make it worth keeping track of that. Maybe just put a comment saying that if at some point in the future, we're keeping track of the time when requests are sent, then we should use that instead.

> OK, so, if I do this request_queued is called for each request before I know
> I'll get it from the cache or not. Since I was using this entry point as "We
> WON'T get this from cache, it's either obsolete or we want to refresh it or we
> didn't have it", it kinda screws the logic of the whole thing. Is using
> request_started now for this OK? (I'm trying it now and I get nasty crashes all
> over the place, but I guess it's just that some assumptions I had before are
> wrong now).

request_started ought to work, although it will be emitted more than once if the request is requeued (eg, redirect, authentication, etc).
Comment 11 Dan Winship 2009-05-06 00:16:53 UTC
pierlux mentioned in http://mail.gnome.org/archives/desktop-devel-list/2009-May/msg00110.html that libchamplain is managing its own cache of OpenStreetMap tiles right now. Presumably it should eventually be possible for it to use SoupCache and we should make sure of that later on when this is closer to done.

Random related thought while pondering that: SoupCache should have a "maximum size" property.
Comment 12 Baybal Ni 2009-09-02 14:03:14 UTC
Does anybody work on this now?
Comment 13 Xan Lopez 2009-09-02 14:10:14 UTC
(In reply to comment #12)
> Does anybody work on this now?

The latest code is in the 'cache' branch in libsoup git. It's on hold pending some reestructuring of the libsoup internals that will happen during the 2.30 cycle.
Comment 14 baskion 2009-12-21 18:52:54 UTC
Any idea when 2.30 is expected? If I take the TOT from the master and merge it with cache, is it expected to work ?
Comment 15 Dan Winship 2009-12-29 16:09:14 UTC
(In reply to comment #14)
> Any idea when 2.30 is expected?

March 31

> If I take the TOT from the master and merge it
> with cache, is it expected to work ?

Some of it will work a little, but the cache branch was never fully working
Comment 16 Dan Winship 2009-12-29 16:20:07 UTC
so, after the "SoupURILoader" stuff lands, I think the right way forward on this is to make WebKit use SoupRequest for HTTP as well. For an HTTP request, it would call the as-yet-non-existent soup_request_http_get_message() method to get a SoupMessage from the SoupRequest so it could set up HTTP request headers, etc, as it needs them, but then after that it would call soup_request_send_async() just like in the non-HTTP case, and from that point the flow control within WebKit would be entirely determined by SoupRequest, rather than by SoupMessage signals, so we don't have to worry about faking those signals when substituting in cached data. (SoupRequestHTTP would check the cache, send a conditional GET to validate the cached data if needed, or send a full request if there was nothing good in the cache, and then once it had the data, by whatever means, it would fill in the SoupMessage headers, etc, call the soup_request_send_async() callback, and then provide a GInputStream coming either from the SoupMessage or from a disk file, and WebKit would read from that.)
Comment 17 Alexander V. Butenko 2010-01-17 03:03:02 UTC
some cleanups for caching branch. Dont need to do some casts
Comment 18 Alexander V. Butenko 2010-01-17 03:03:52 UTC
Created attachment 151579 [details] [review]
couple cleanups
Comment 19 Sergio Villar 2010-04-06 19:04:22 UTC
Created attachment 158071 [details] [review]
Adds a get_message method to SoupRequestHttp

Needed by WebKitGtk to set headers and flags in the SoupMessage
Comment 20 Sergio Villar 2010-04-12 15:44:52 UTC
Created attachment 158495 [details] [review]
Sync methods in libsoup new-io branch working

With this patch both sync and async methods of new-io branch are working.

I have used webkit's http tests to verify the implementation and seems to be working fine so far.
Comment 21 Sergio Villar 2010-04-12 22:06:25 UTC
(In reply to comment #20)
> Created an attachment (id=158495) [details] [review]
> Sync methods in libsoup new-io branch working
> 
> With this patch both sync and async methods of new-io branch are working.
> 
> I have used webkit's http tests to verify the implementation and seems to be
> working fine so far.

xmlhttp tests that use sync connections are still failing, let's see why
Comment 22 Sergio Villar 2010-04-19 17:05:10 UTC
I've just uploaded my work related to caching support to gitoriuos. You can check it out it here:

http://gitorious.org/libsoup-io-cache/libsoup-io-cache
Comment 23 Sergio Villar 2010-07-13 18:47:16 UTC
Several fixes added related to cache. I have now a more or less reliable SoupCache working with WebKitGtk+

Sniffing a bit network traffic shown that we can go from 2Mb of exchanged packets (for a 5 tab browse session) down to 250kb, something that's pretty impressive IMHO (and very useful for slow connections)
Comment 24 Sergio Villar 2010-07-23 11:26:17 UTC
Some more changes to the cache, I added a new API to set/get the maximum cache size.

After that I added a LRU replacement mechanism in cache to support the maximum cache size.
Comment 25 Sergio Villar 2010-07-23 11:27:43 UTC
Comment on attachment 158495 [details] [review]
Sync methods in libsoup new-io branch working

See the gitorious branch for latest code
Comment 26 Sergio Villar 2010-07-23 11:28:17 UTC
Comment on attachment 158071 [details] [review]
Adds a get_message method to SoupRequestHttp

See the gitorious branch for latest code
Comment 27 Dan Winship 2010-07-30 09:42:51 UTC
Xan and I talked about this a little bit this week. The problem is that the underlying new-io code is just not ready to be committed to master (and become vaguely-api-stable). However, the new-io stuff is mostly separate from the rest of libsoup, and it ought to be possible to just temporarily import it into webkitgtk (with some renaming to avoid future namespace clashes).

diffing Sergio's branch against master, I see:

  - soup-cache, tpl: These don't include any private headers, so there
    shouldn't be any problem moving them out of the libsoup tree

  - soup_message_headers_get_headers_type: This could be upstreamed, but
    it isn't really needed anyway, since it will always be
    SOUP_MESSAGE_HEADERS_RESPONSE in the situations where SoupCache calls
    it.

  - SoupMessage changes: I don't think these are needed. I think it's
    just leftover from when SoupRequest was an interface implemented by
    SoupMessage rather than SoupRequestHTTP being its own class.

  - SoupSession uri-loader stuff: doesn't actually depend on being part
    of SoupSession. This could all be moved outside of SoupSession

  - soup_session_get_cache: Again, this would just be moved out of
    SoupSession

  - SoupSessionSync changes: I think these are wrong, but anyway,
    WebKitGTK doesn't use SoupSessionSync, so we don't need them anyway

  - SoupURI file: decoding changes: What was this change for? It seems
    like it shouldn't be necessary.

  - uri_decoded_copy: Probably we should just get rid of the "fixup" flag
    and always fix broken %-encoding, in which case soup-request-data
    could just use soup_uri_decode() instead of needing
    uri_decoded_copy()
Comment 28 Sergio Villar 2010-08-10 10:36:31 UTC
(In reply to comment #27)
> Xan and I talked about this a little bit this week. The problem is that the
> underlying new-io code is just not ready to be committed to master (and become
> vaguely-api-stable). However, the new-io stuff is mostly separate from the rest
> of libsoup, and it ought to be possible to just temporarily import it into
> webkitgtk (with some renaming to avoid future namespace clashes).

OK, looks like a plan

> diffing Sergio's branch against master, I see:
> 
>   - soup-cache, tpl: These don't include any private headers, so there
>     shouldn't be any problem moving them out of the libsoup tree

We have an issue here, if we move it to webkitgtk we'd have to expose that API to clients because we don't want webkitgtk to automatically use the cache. And clients have to set some other parametters like the location of the cache for example.
 
>   - soup_message_headers_get_headers_type: This could be upstreamed, but
>     it isn't really needed anyway, since it will always be
>     SOUP_MESSAGE_HEADERS_RESPONSE in the situations where SoupCache calls
>     it.
> 
>   - SoupMessage changes: I don't think these are needed. I think it's
>     just leftover from when SoupRequest was an interface implemented by
>     SoupMessage rather than SoupRequestHTTP being its own class.

yeah seem useless
 
>   - SoupSession uri-loader stuff: doesn't actually depend on being part
>     of SoupSession. This could all be moved outside of SoupSession

Where do you suggest to put all that stuff into?

>   - soup_session_get_cache: Again, this would just be moved out of
>     SoupSession

Was only there for avoiding continuous calls to soup_session_get_feature, but indeed easy to remove

>   - SoupSessionSync changes: I think these are wrong, but anyway,
>     WebKitGTK doesn't use SoupSessionSync, so we don't need them anyway

Haven't touched anything there I think

>   - SoupURI file: decoding changes: What was this change for? It seems
>     like it shouldn't be necessary.

Apparently it is, I came across it when a WK test started to fail.  The reason is that for example "%25.txt" is a valid file name and should not be interpreted as  "%.txt"
 
>   - uri_decoded_copy: Probably we should just get rid of the "fixup" flag
>     and always fix broken %-encoding, in which case soup-request-data
>     could just use soup_uri_decode() instead of needing
>     uri_decoded_copy()

That would break API of soup_uri_normalize, wouldn't it?
Comment 29 Dan Winship 2010-08-10 12:46:32 UTC
(In reply to comment #28)
> >   - soup-cache, tpl: These don't include any private headers, so there
> >     shouldn't be any problem moving them out of the libsoup tree
> 
> We have an issue here, if we move it to webkitgtk we'd have to expose that API
> to clients because we don't want webkitgtk to automatically use the cache. And
> clients have to set some other parametters like the location of the cache for
> example.

hm... well, that could be done with a webkit-level API

> >   - SoupSession uri-loader stuff: doesn't actually depend on being part
> >     of SoupSession. This could all be moved outside of SoupSession
> 
> Where do you suggest to put all that stuff into?

it would just be internal to webkit somewhere.

> >   - SoupURI file: decoding changes: What was this change for? It seems
> >     like it shouldn't be necessary.
> 
> Apparently it is, I came across it when a WK test started to fail.  The reason
> is that for example "%25.txt" is a valid file name and should not be
> interpreted as  "%.txt"

Can you point me to the specific test?

> That would break API of soup_uri_normalize, wouldn't it?

Eh. The exact behavior on malformed URIs is not really well-defined. :)
Comment 30 Sergio Villar 2010-08-11 09:03:48 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > >   - soup-cache, tpl: These don't include any private headers, so there
> > >     shouldn't be any problem moving them out of the libsoup tree
> > 
> > We have an issue here, if we move it to webkitgtk we'd have to expose that API
> > to clients because we don't want webkitgtk to automatically use the cache. And
> > clients have to set some other parametters like the location of the cache for
> > example.
> 
> hm... well, that could be done with a webkit-level API

Yeah I know, just let's see what webkitgtk+ guys say about this "born to be killed" API :-)
 
> > >   - SoupSession uri-loader stuff: doesn't actually depend on being part
> > >     of SoupSession. This could all be moved outside of SoupSession
> > 
> > Where do you suggest to put all that stuff into?
> 
> it would just be internal to webkit somewhere.

OK just asked for some suggestion :-).

> > >   - SoupURI file: decoding changes: What was this change for? It seems
> > >     like it shouldn't be necessary.
> > 
> > Apparently it is, I came across it when a WK test started to fail.  The reason
> > is that for example "%25.txt" is a valid file name and should not be
> > interpreted as  "%.txt"
> 
> Can you point me to the specific test?

Sure, it's LayoutTests/fast/encoding/percent-escaping.html
Comment 31 Xan Lopez 2010-08-11 09:07:36 UTC
(In reply to comment #30)
> > hm... well, that could be done with a webkit-level API
> 
> Yeah I know, just let's see what webkitgtk+ guys say about this "born to be
> killed" API :-)
> 

As for myself, I can live with a short-lived API (that we would deprecate in favor of the libsoup equivalent when it's ready) if that gets my HTTP caching ;)
Comment 32 Gustavo Noronha (kov) 2010-08-11 12:55:03 UTC
(In reply to comment #31)
> As for myself, I can live with a short-lived API (that we would deprecate in
> favor of the libsoup equivalent when it's ready) if that gets my HTTP caching
> ;)

+1 - specially if we document it properly =)
Comment 33 Ross Burton 2010-08-13 14:32:15 UTC
I'd really like to use libsoup's caching rather than rolling my own, so what is the problem with the new-io branch that is stopping it reach master?  If it's nothing too serious then using this branch might be an option, living with the API churn until it's ready.
Comment 34 Sergio Villar 2010-08-19 14:21:30 UTC
I've just created a bug in webkit bz, will post a patch there soon.

https://bugs.webkit.org/show_bug.cgi?id=44261
Comment 35 Sergio Villar 2010-09-23 17:14:32 UTC
Uploaded today several changes in the cache stuff. I'm cherry-picking all the changes/improvements in webkit patch to this branch in order to keep it up-to-date.

A lot of changes mainly related to optimizations and several fixes regarding LRU algorithm for discarding cache entries and cache behaviour the cache limits are reached. Cache should be far more robust right now.
Comment 36 Sergio Villar 2010-10-20 15:58:09 UTC
Patches landed in WebKitGtk+. I'll try to update the gitorious branch with the latest changes I committed only in the WebKitGtk+ imported code
Comment 37 Dan Winship 2010-12-10 15:09:48 UTC
This has now been moved to libsoup, but it is currently protected by #ifdef
LIBSOUP_USE_UNSTABLE_REQUEST_API. It doesn't support SoupSessionSync, and there
are network performance problems in epiphany currently that may be caused by
this code (or more specifically, the SoupRequest stuff from bug 557777). So, use with caution. Hopefully to be fixed for GNOME 3.0.
Comment 38 Dan Winship 2011-09-22 19:00:27 UTC
as with the SoupURILoader bug, this is still considered unstable, but there's no reason to keep this bug open