After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 682112 - SoupCache: make it "streamable"
SoupCache: make it "streamable"
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: API
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Sergio Villar
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2012-08-17 17:03 UTC by Sergio Villar
Modified: 2017-12-01 09:44 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Added new SoupCacheInputStream (10.04 KB, patch)
2012-08-17 17:52 UTC, Sergio Villar
none Details | Review
Added SoupContentProcessor interface (4.04 KB, patch)
2012-08-17 17:53 UTC, Sergio Villar
none Details | Review
SoupCache implements the SoupContentProcessor interface (7.59 KB, patch)
2012-08-17 17:55 UTC, Sergio Villar
none Details | Review
SoupContentSnifferStream: implement can_poll() & cleanup (5.31 KB, patch)
2012-08-17 17:56 UTC, Sergio Villar
committed Details | Review
Added soup_content_decoder_get_decoders_for_msg() (4.18 KB, patch)
2012-08-17 17:58 UTC, Sergio Villar
none Details | Review
SoupCache: use a SoupCacheInputStream to write data (1.36 KB, patch)
2012-08-17 17:59 UTC, Sergio Villar
none Details | Review
SoupCache: use decoders & sniffers in send_response() (2.05 KB, patch)
2012-08-17 18:05 UTC, Sergio Villar
needs-work Details | Review
SoupCache: remove old writing stuff (17.30 KB, patch)
2012-08-17 18:07 UTC, Sergio Villar
none Details | Review
SoupCacheInputStream (12.77 KB, patch)
2012-09-04 15:54 UTC, Sergio Villar
reviewed Details | Review
SoupContentProcessor interface (5.24 KB, patch)
2012-09-04 15:56 UTC, Sergio Villar
reviewed Details | Review
SoupContentProcessor support to soup-message-io (5.82 KB, patch)
2012-09-04 16:00 UTC, Sergio Villar
reviewed Details | Review
SoupContentDecoder is now a SoupContentProcessor (7.75 KB, patch)
2012-09-04 16:03 UTC, Sergio Villar
reviewed Details | Review
SoupContentSniffer is now a SoupContentProcessor (2.54 KB, patch)
2012-09-04 16:04 UTC, Sergio Villar
reviewed Details | Review
SoupCache is now a SoupContentProcessor (6.52 KB, patch)
2012-09-04 16:08 UTC, Sergio Villar
reviewed Details | Review
SoupCache: remove old writing stuff (17.90 KB, patch)
2012-09-04 16:11 UTC, Sergio Villar
reviewed Details | Review
SoupCacheInputStream (12.52 KB, patch)
2012-09-10 12:01 UTC, Sergio Villar
none Details | Review
SoupContentProcessor interface (5.11 KB, patch)
2012-09-10 12:07 UTC, Sergio Villar
committed Details | Review
SoupContentProcessors support in SoupMessage, SoupContentDecoder & SoupContentSniffer (16.92 KB, patch)
2012-09-10 12:10 UTC, Sergio Villar
committed Details | Review
SoupCache is now a SoupContentProcessor (26.15 KB, patch)
2012-09-10 12:14 UTC, Sergio Villar
none Details | Review
SoupCacheInputStream (13.84 KB, patch)
2012-09-12 14:17 UTC, Sergio Villar
needs-work Details | Review
SoupCache is now a SoupContentProcessor (24.55 KB, patch)
2012-09-12 14:20 UTC, Sergio Villar
reviewed Details | Review
SoupCacheInputStream: new input stream filter that writes to the cache (13.51 KB, patch)
2012-12-12 12:27 UTC, Dan Winship
committed Details | Review
SoupCache: add SoupContentProcessor support (24.53 KB, patch)
2012-12-12 12:28 UTC, Dan Winship
committed Details | Review

Description Sergio Villar 2012-08-17 17:03:19 UTC
Once upon a time there as a SoupCache that was implemented using the well known "got-headers", "got-body", "got-chunk"... signals from the SoupMessage. Although we more or less managed to get it working, it was a huge hack as we had to simulate all those signals from the cache when retrieving locally stored data in order not to break applications.

Apart from that there were some other issues, for example the cache was storing the decoded data from network, so we were wasting a lot of disk space. Also content sniffing was never properly implemented for cached resources.

The recent effort to create a stack of streams to process data from network gives us an unique opportunity to rewrite part of the cache and integrate it in this new "pipeline".

After some afternoons working on a prototype I managed to get an (incomplete) implementation of the cache integrated in the streams stack that works smoothly so far and that heavily simplifies the SoupCache messy code. I'm early uploading here the patches to check with Dan & Co what they think about the approach I followed.

This is the current status:
* raw data from network is cached by a SoupCacheInputStream. In order to simplify this, data is written synchronously to disk.
* requires the SoupBodyInputStream to be pollable (addressed in bug 665884)
* writing data to disk is not cancellable ATM. I am not sure that we were using it anyway but can be added later
* this is how the "stream pipeline" looks like:
- from network:
SoupContentSnifferStream->SoupConverterWrapper*->SoupCacheInputStream->SoupBodyInputStream
-from cache:
SoupContentSnifferStream->SoupConverterWrapper*->SoupBodyInputStream

this means that content sniffing and content decoding work independently of the origin of data
Comment 1 Sergio Villar 2012-08-17 17:52:22 UTC
Created attachment 221650 [details] [review]
Added new SoupCacheInputStream

New input stream that transparently reads data from a base stream and writes it to a given output stream.
Comment 2 Sergio Villar 2012-08-17 17:53:31 UTC
Created attachment 221651 [details] [review]
Added SoupContentProcessor interface

Interface to be implemented by those streams that process data from a base stream without modifying it.
Comment 3 Sergio Villar 2012-08-17 17:55:30 UTC
Created attachment 221652 [details] [review]
SoupCache implements the SoupContentProcessor interface

soup-cache & soup-message-io: added SoupContentProcessor support
    
SoupCache implements the SoupContentProcessor interface. It adds itself to
the list of content processort in the SoupMessage in the callback of
got-headers.
    
Also SoupMessageIO uses those processors to properly setup the stack of
streams used to read a particular resource.
Comment 4 Sergio Villar 2012-08-17 17:56:56 UTC
Created attachment 221653 [details] [review]
SoupContentSnifferStream: implement can_poll() & cleanup

Done in terms of the base_stream can_poll() interface. It was not needed to
read resources from network as the underlying streams were all pollable. Now
we will use it also to sniff locally cached resources. The file streams are
not pollable so we must implement the can_poll() in order to use "normal"
read() calls instead of the read_nonblocking().
    
This patch also removes the SoupMessage argument from various sniffing
functions as it isn't really used for sniffing.
Comment 5 Sergio Villar 2012-08-17 17:58:12 UTC
Created attachment 221654 [details] [review]
Added soup_content_decoder_get_decoders_for_msg()

This "private" function is used by the SoupContentDecoder to fill in the
SoupMessage with the proper decoders. It will be also used by the SoupCache
when creating the stream stack that will read a cached resource.
Comment 6 Sergio Villar 2012-08-17 17:59:29 UTC
Created attachment 221655 [details] [review]
SoupCache: use a SoupCacheInputStream to write data
Comment 7 Sergio Villar 2012-08-17 18:05:27 UTC
Created attachment 221657 [details] [review]
SoupCache: use decoders & sniffers in send_response()

Instead of sending a plain GFileInputStream for cached resources, wrap it using the appropriate decoder streams and sniffer stream.
Comment 8 Sergio Villar 2012-08-17 18:07:09 UTC
Created attachment 221658 [details] [review]
SoupCache: remove old writing stuff

The patch removes all the methods required to cache network data from the SoupBuffers as they aren't needed anymore.

Also added the callbacks for the two signals coming from the SoupCacheInputStream notifying about success or error caching the resources.
Comment 9 Sergio Villar 2012-08-17 18:07:38 UTC
That's all folks! :)
Comment 10 Dan Winship 2012-08-18 16:19:47 UTC
Comment on attachment 221650 [details] [review]
Added new SoupCacheInputStream

>+soup_cache_input_stream_dispose (GObject *object)
>+{
>+	static gboolean dispose_has_run = FALSE;
>+	SoupCacheInputStream *self = (SoupCacheInputStream *)object;
>+	SoupCacheInputStreamPrivate *priv = self->priv;
>+
>+	if (dispose_has_run)
>+		return;

Hm? This means only the first SoupCacheInputStream to be disposed will actually unref its priv->output_stream...

You shouldn't need any is-already-disposed check; g_clear_object() is a no-op the if the object gets disposed again.
Comment 11 Dan Winship 2012-08-18 16:25:57 UTC
Comment on attachment 221652 [details] [review]
SoupCache implements the SoupContentProcessor interface

>+soup_cache_content_processor_wrap (SoupContentProcessor *processor,
>+				   GInputStream *base_stream,
>+				   SoupMessage *msg)
>+{
>+}

So it adds a no-op implementation? I don't think that's a good way to break up the patches

>+	for (p = priv->content_processors; p; p = p->next)
>+		istream = soup_content_processor_wrap (p->data, istream, msg);
> 
> 	for (d = priv->decoders; d; d = d->next) {

Likewise, it seems weird to not implement SoupContentProcessor in SoupContentDecoder and SoupContentSniffer at this point. Although then you'll need to define a priority system as well...
Comment 12 Dan Winship 2012-08-18 16:27:20 UTC
Comment on attachment 221653 [details] [review]
SoupContentSnifferStream: implement can_poll() & cleanup

good, although it would be even better as two separate patches
Comment 13 Dan Winship 2012-08-18 16:30:40 UTC
Comment on attachment 221654 [details] [review]
Added soup_content_decoder_get_decoders_for_msg()

I think with SoupContentProcessor we should be able to get rid of priv->decoders; that was only needed as a way for SoupContentDecoder to communicate with soup-message-io, but if soup-message-io is calling SoupContentDecoder's wrap() method instead, then SoupContentDecoder can just keep all of that info internally...
Comment 14 Dan Winship 2012-08-18 16:33:31 UTC
Comment on attachment 221655 [details] [review]
SoupCache: use a SoupCacheInputStream to write data

right, so following up on my earlier comment, rather than including a dummy implementation in the original SoupContentProcessor patch, and then filling it in here, you should move the implementation stuff from that patch here.

>+	if (!output_stream)
>+		return g_object_ref (base_stream);

Hm... is that really the behavior we want?

It seems like maybe in general we might want _wrap() to be able to return a GError?
Comment 15 Dan Winship 2012-08-18 16:34:29 UTC
Comment on attachment 221657 [details] [review]
SoupCache: use decoders & sniffers in send_response()

Yeah, this is no good. We shouldn't need all that duplicated code
Comment 16 Dan Winship 2012-08-18 16:50:46 UTC
(In reply to comment #11)
> Likewise, it seems weird to not implement SoupContentProcessor in
> SoupContentDecoder and SoupContentSniffer at this point. Although then you'll
> need to define a priority system as well...

So, the decoding stages would seem to be:

   SOUP_LEVEL_MESSAGE_BODY,      /* Raw network data */
   SOUP_LEVEL_TRANSFER_ENCODING, /* SoupBodyInputStream is here */
   SOUP_LEVEL_ENTITY_BODY,       /* Has Transfer-Encoding removed */
   SOUP_LEVEL_CONTENT_ENCODING,  /* SoupContentDecoder works here */
   SOUP_LEVEL_BODY_DATA          /* Actual body data */

SoupContentSniffer would operate at SOUP_LEVEL_BODY_DATA, SoupCache would operate at SOUP_LEVEL_ENTITY_BODY.

soup_message_setup_body_istream() would wrap() all registered processors in order from MESSAGE_BODY innermost to BODY_DATA outermost. To solve the "read from cache" problem, you could add another argument to it indicating what level to start at; then soup-message-io would pass MESSAGE_BODY, but SoupCache would pass CONTENT_ENCODING.
Comment 17 Dan Winship 2012-08-18 16:52:42 UTC
(in general, this is great though)
Comment 18 Sergio Villar 2012-08-20 08:47:49 UTC
Review of attachment 221653 [details] [review]:

Committed in master as 1b812ae and 6d370df
Comment 19 Sergio Villar 2012-08-21 17:49:49 UTC
(In reply to comment #15)
> (From update of attachment 221657 [details] [review])
> Yeah, this is no good. We shouldn't need all that duplicated code

So, now it's time to think how to do it. Decoders and sniffers will be SoupContentProcessors and thus, they will add themselves to the list of content processors on "got-headers" callback. Thing is that they are all SoupSessionFeatures and they connect to that signal when the SoupMessage is queued. That won't happen for resources returned by the cache so that isn't a valid approach.

Summing up, I think that the only thing we could do is to get the list of decoders and sniffers for a given message, add them all to the list of SoupMessage's content processors and then let setup_message_body() wrap them all.
Comment 20 Dan Winship 2012-08-21 18:00:43 UTC
I was thinking more that soup-message-io would do something like:

  processors = soup_session_get_features (io->item->session, SOUP_TYPE_CONTENT_PROCESSOR);
  foreach (p in processors) {
      if (!soup_message_disables_feature (msg, p))
          soup_content_processor_wrap (p, ...);
  }

except it would have to sort them in the right order too

(possibly there should be soup_session_get_features_for_message())
Comment 21 Sergio Villar 2012-09-04 15:54:37 UTC
Created attachment 223439 [details] [review]
SoupCacheInputStream

New version of the SoupCacheInputStream. These are the main changes from the previous version:

* added API to stop caching
* now using asynchronous writing
* added code to verify that buffers are completely written
Comment 22 Sergio Villar 2012-09-04 15:56:21 UTC
Created attachment 223441 [details] [review]
SoupContentProcessor interface

New version of the content processor interface:

* Added the SoupDecodingStage
* Added soup_content_processor_get_decoding_stage

to be used to properly sort content processors
Comment 23 Sergio Villar 2012-09-04 16:00:59 UTC
Created attachment 223442 [details] [review]
SoupContentProcessor support to soup-message-io

Some changes to soup_message_setup_body_istream():

* soup-message-io now calls the wrap() method for every available content processor sorting them by SoupDecodingStage.
* added a new parameter called start_at that discards content processors with decoding stages < start_at.
Comment 24 Sergio Villar 2012-09-04 16:03:00 UTC
Created attachment 223443 [details] [review]
SoupContentDecoder is now a SoupContentProcessor

SoupContentDecoder implements now the SoupContentProcessor interface. With this change we can get rid of priv->decoders in SoupMessage.

SoupContentDecoder works at SOUP_LEVEL_CONTENT_ENCODING stage.
Comment 25 Sergio Villar 2012-09-04 16:04:36 UTC
Created attachment 223444 [details] [review]
SoupContentSniffer is now a SoupContentProcessor

SoupContentSniffer is now a SoupContentProcessor.

It works at SOUP_LEVEL_BODY_DATA decoding stage.
Comment 26 Sergio Villar 2012-09-04 16:08:12 UTC
Created attachment 223445 [details] [review]
SoupCache is now a SoupContentProcessor

SoupCache implements the SoupContentProcessor interface. Several changes from the previous version:

* removed duplicated code in send_response(), the body istream is now only created in soup_message_setup_body_istream()
* the cache works at SOUP_LEVEL_ENTITY_BODY decoding stage
* use soup_message_disable_feature() in order not to process non-cacheable resources (and also conditional requests)
Comment 27 Sergio Villar 2012-09-04 16:11:14 UTC
Created attachment 223447 [details] [review]
SoupCache: remove old writing stuff

Some changes from the previous version:

* use soup_message_disable_feature() in order not to cache non cacheable resources
* refactored resource_cached_cb() and resource_caching_failed_cb(). Both of them had a common teardown code.
Comment 28 Sergio Villar 2012-09-04 16:13:29 UTC
These are the new patches. They came after some time because I was on holidays.
Comment 29 Dan Winship 2012-09-09 20:32:44 UTC
Comment on attachment 223439 [details] [review]
SoupCacheInputStream

>+ * soup-cache-input-stream.c - Source for SoupCacheInputStream

hm?

>+	priv->bytes_read = priv->bytes_written = 0;

priv is initialized to all-bytes-0, so the only field you need to initialize is priv->buffer_queue.

>+static gssize
>+soup_cache_input_stream_read_nonblocking (GPollableInputStream  *stream,

for future-proofing I think you should implement plain read() too.
Comment 30 Dan Winship 2012-09-09 20:39:42 UTC
Comment on attachment 223441 [details] [review]
SoupContentProcessor interface

>+static SoupDecodingStage
>+soup_content_processor_real_get_content_stage (SoupContentProcessor *processor)
>+{
>+	return SOUP_LEVEL_INVALID;
>+}

Since this is constant for any given processor, I'd rather have it as a class member, like SoupAuthClass->scheme_name.

You can keep soup_content_processor_get_content_stage() as an accessor method for it though.

>+} SoupDecodingStage;

This may eventually be used for output stream processing as well, so it shouldn't have "decoding" in its name. How about "SoupProcessingStage"? Also, the constants should probably use STAGE rather than LEVEL, to match the enum name.

Hm... of course, wrap() takes a GInputStream, so it can't actually be used for output too... I guess rename wrap() to wrap_input(), and later we can add wrap_output().
Comment 31 Dan Winship 2012-09-09 20:43:31 UTC
Comment on attachment 223442 [details] [review]
SoupContentProcessor support to soup-message-io

looks right
Comment 32 Dan Winship 2012-09-09 20:46:13 UTC
(In reply to comment #31)
> (From update of attachment 223442 [details] [review])
> looks right

Well, although, it's nice if everything still works right at every commit, because otherwise it makes it harder to "git bisect". So probably this patch and the next two (SoupContentDecoder and SoupContentSniffer) should be squashed together.
Comment 33 Dan Winship 2012-09-09 20:49:40 UTC
Comment on attachment 223443 [details] [review]
SoupContentDecoder is now a SoupContentProcessor

>SoupContentDecoder adds itself to the list of the SoupMessage content
>processors in the "got-headers" callback and then wraps the given base
>stream with a list of decoders when required.

Need to update to match the new code. The rest of it looks good.
Comment 34 Dan Winship 2012-09-09 20:51:56 UTC
Comment on attachment 223444 [details] [review]
SoupContentSniffer is now a SoupContentProcessor

>SoupContentSniffer adds itself as content processor at the
>SOUP_LEVEL_BODY_DATA stage.

It doesn't really "add itself". "is" works.

>+static GInputStream *
>+soup_content_sniffer_content_processor_wrap (SoupContentProcessor *processor,
>+					     GInputStream *base_stream,
>+					     SoupMessage *msg,
>+					     GError **error)
>+{
>+	return soup_content_sniffer_stream_new (SOUP_CONTENT_SNIFFER (processor), msg, base_stream);
>+}

just inline the code, and remove soup_content_sniffer_stream_new(), since it's not needed any more.
Comment 35 Dan Winship 2012-09-09 21:02:39 UTC
Comment on attachment 223445 [details] [review]
SoupCache is now a SoupContentProcessor

This seems too simple... :-)

>-	SoupCacheEntry *entry;
>+	SoupCacheEntry *entry = NULL;

not really needed

>+	if (!(cacheable & SOUP_CACHE_CACHEABLE))
>+		soup_message_disable_feature (msg, SOUP_TYPE_CACHE);

hm... but that will stick with the message, so that, eg, an uncacheable 401 response would still be marked uncacheable when it got resent after authenticating. (This applies to all the places you use disable_feature().)

In this case, you don't need to disable the cache; just make wrap() be a no-op if the response isn't cacheable.

(Can you get away with moving all of msg_got_headers_cb() into wrap() ?)

>+	/* Create the body stream. */
>+	soup_message_disable_feature (msg, SOUP_TYPE_CACHE);
>+	cache_stream = soup_message_setup_body_istream (body_stream, msg,

should be "/* Create the cache stream. */" ?


Doesn't this patch leave it so that everything gets cached twice? (Since all the old signal handlers are still connected too.) Probably should be merged with the next patch.
Comment 36 Dan Winship 2012-09-09 21:04:57 UTC
Comment on attachment 223447 [details] [review]
SoupCache: remove old writing stuff

look at all those minus signs!

>+	if (soup_message_headers_get_encoding (entry->headers) == SOUP_ENCODING_CHUNKED) {

SOUP_ENCODING_EOF will have the same problem
Comment 37 Sergio Villar 2012-09-10 12:01:26 UTC
Created attachment 223898 [details] [review]
SoupCacheInputStream

In this new version:
* removes unneeded initializations
* removed priv->bytes_read
* implemented read()
Comment 38 Sergio Villar 2012-09-10 12:07:17 UTC
Created attachment 223899 [details] [review]
SoupContentProcessor interface

In this new version:
* renamed SoupDecodingStage -> SoupProcessingStage
* removed virtual get_decoding_stage()
* processing_stage is now an interface member
* renamed wrap() to wrap_input()
* default wrap_input() implementation returns NULL instead of a new reference to the base stream
Comment 39 Sergio Villar 2012-09-10 12:10:52 UTC
Created attachment 223900 [details] [review]
SoupContentProcessors support in SoupMessage, SoupContentDecoder & SoupContentSniffer

In this new version:
* SoupContentDecoder, SoupContentSniffer and SoupMessage changes merged in a single patch
* renamed wrap() to wrap_input()
* moved processing stage to the interface initialization
* removed soup_content_sniffer_stream_new()
Comment 40 Sergio Villar 2012-09-10 12:14:42 UTC
Created attachment 223902 [details] [review]
SoupCache is now a SoupContentProcessor

In this new version:
* SoupContentProcessor changes merged with the old writing code removal
* return NULL in wrap() for non cacheable messages instead of disabling the cache feature
* moved msg_got_headers_cb() code to wrap_input()
* request_time and response_time temporarily stored in the SoupMessage. Allows us to remove a helper struct.
Comment 41 Sergio Villar 2012-09-12 08:59:08 UTC
(In reply to comment #40)
> Created an attachment (id=223902) [details] [review]
> SoupCache is now a SoupContentProcessor
> 
> In this new version:
> * SoupContentProcessor changes merged with the old writing code removal
> * return NULL in wrap() for non cacheable messages instead of disabling the
> cache feature
> * moved msg_got_headers_cb() code to wrap_input()
> * request_time and response_time temporarily stored in the SoupMessage. Allows
> us to remove a helper struct.

There are a couple of things I am not comfortable with yet:
* the use of 2 signals to notify the finish of the caching operation
* lack of cancellation support
* we're using the synchronous g_file_replace() because we need to return something in wrap_input()

I think we could fix that by doing something like:
    file = get_file_from_entry (cache, entry);
    istream = soup_cache_input_stream_new (base_stream);
    soup_cache_input_stream_cache (istream, file, entry->cancellable, cache_cb, user_data);

and then we'll handle the result of the cache operation in "cache_cb" with the typical soup_cache_input_stream_cache_finish(). That means:
* using a well-known pattern for handling an asynchronous operation
* the SoupCacheInputStream will internally use g_file_replace_async() to get a GOutputStream. We don't have any limitation there because we already store SoupBuffers until we can write them.
* we wouldn't need the signals anymore, the way I was using them didn't ever look "natural" to me.
* we would get caching cancellation "for free", as we could use the already existing entry->cancellable

What do you think?
Comment 42 Sergio Villar 2012-09-12 14:17:01 UTC
Created attachment 224121 [details] [review]
SoupCacheInputStream

New in this version:

* new method: soup_cache_input_stream_cache(): starts the caching of data, and provides a mechanism for the client to cancel it and to get notified once the operation finishes
* new method: soup_cache_input_stream_cache_finish(): to be called by the callback passed to the above method. Will return the number of bytes written (or 0 if error)
* removed the old "cached" and "failed" signals
* the SoupCacheInputStream is the one asynchronously creating the GOutputStream from a GFile provided by the client.
Comment 43 Sergio Villar 2012-09-12 14:20:27 UTC
Created attachment 224122 [details] [review]
SoupCache is now a SoupContentProcessor

In this new version:

* the SoupCache calls soup_cache_input_stream_cache/_finish() to trigger/complete the caching of a new resource.
* removed the handlers for the old "cached" and "failed" stream signals
* caching cancellation is back in place
Comment 44 Dan Winship 2012-10-29 14:32:23 UTC
Comment on attachment 223899 [details] [review]
SoupContentProcessor interface

>New SoupContentProcessor interface. It defines a _wrap_input() function which
>implementors will use to add their own stream on top of the given base
>stream. Content processors do not modify data retrieved from the base stream
>by definition.

Remove that last sentence; in some cases you may be dealing with a known-broken server, and want to insert a SoupContentProcessor to fix up its response so libsoup will parse it the way you want. Eg, see bug 588658 and bug 588660.

>+static GInputStream*
>+soup_content_processor_real_wrap_input (SoupContentProcessor *processor,

space before "*" on the first line


OK to commit with those changes.
Comment 45 Dan Winship 2012-10-29 14:44:12 UTC
Comment on attachment 223900 [details] [review]
SoupContentProcessors support in SoupMessage, SoupContentDecoder & SoupContentSniffer

>either downloaded from the network or read from a local cached file. The
>list of decoders in SoupMessage was removed from priv and replaced by a list
>of processors.

That last part isn't true any more. Everything else is good.
Comment 46 Dan Winship 2012-10-29 15:50:15 UTC
Comment on attachment 224121 [details] [review]
SoupCacheInputStream

Hm... I'm not sure if I like it better with the signals or the async callback.

Thinking out loud: SoupClientInputStream::eof is another similar "tell me when the request is done" hack... maybe we need something like "soup_message_io_add_completion_callback()", so arbitrary code can register stuff to happen after the request is done (and then the cache can finish its update from there, rather than needing to connect to a signal or call an async method).

Let's stick with the async method for now though.

>+	GSimpleAsyncResult *result;

Eww! GSimpleAsyncResult! The rest of libsoup master is now using GTask, which is much more fun.

>+	GCancellable *cancellable;

Not needed with GTask, because you can always just call g_task_get_cancellable(priv->task)

>+	if (error)
>+		g_simple_async_result_take_error (priv->result, error);
>+
>+	g_simple_async_result_complete (priv->result);

  if (error)
    g_task_return_error (priv->task, error);
  else
    g_task_return_int (priv->task, priv->bytes_written);

>+gsize
>+soup_cache_input_stream_cache_finish (SoupCacheInputStream  *istream,
>+				      GAsyncResult          *result,
>+				      GError               **error)
>+{
          return g_task_propagate_int (G_TASK (result), error);
>+}

>+				     _("Unexpected closed input stream"));

I'd say "Input stream unexpectedly closed". Or even "Network stream unexpectedly closed".

>+	if (error) {
>+		g_simple_async_result_take_error (priv->result, error);
>+		g_simple_async_result_complete (priv->result);
>+	} else {

  if (error)
    g_task_return_error (priv->task, error);

>+	priv->result = g_simple_async_result_new (G_OBJECT (istream), callback, user_data,
>+						  soup_cache_input_stream_cache);
>+
>+	if (cancellable) {
>+		priv->cancellable = g_object_ref (cancellable);
>+		g_simple_async_result_set_check_cancellable (priv->result, cancellable);
>+	}

  priv->task = g_task_new (istream, cancellable, callback, user_data);

>+	g_queue_foreach (priv->buffer_queue, (GFunc) soup_buffer_free, NULL);
>+	g_queue_free (priv->buffer_queue);

  g_queue_free_full (priv->buffer_queue, (GDestroyNotify) soup_buffer_free);

>+	g_output_stream_write_async (priv->output_stream, buffer->data, buffer->length,
>+				     G_PRIORITY_LOW, NULL,
>+				     (GAsyncReadyCallback) write_ready_cb,
>+				     g_object_ref (istream));

If you want the cancellable passed to cache() to actually do anything, you need to pass it to g_output_stream_write_async() here.

Also, maybe use G_PRIORITY_DEFAULT if priv->buffer_queue gets above a certain size, so it doesn't eat up too much memory?

>+	g_object_class_install_property (gobject_class, PROP_OUTPUT_STREAM,
>+					 g_param_spec_object ("output-stream", "Output stream",
>+							      "the output stream where to write.",
>+							      G_TYPE_OUTPUT_STREAM,
>+							      G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS));

and G_PARAM_CONSTRUCT_ONLY.
Comment 47 Dan Winship 2012-10-29 15:54:45 UTC
Comment on attachment 224122 [details] [review]
SoupCache is now a SoupContentProcessor

>+	/* Create the cache stream. */
>+	soup_message_disable_feature (msg, SOUP_TYPE_CACHE);
>+	cache_stream = soup_message_setup_body_istream (body_stream, msg,

is the disable_feature call actually needed? (Does it even have any effect at this point?)

>+static void
>+msg_got_headers_cb (SoupMessage *msg, gpointer user_data)
>+{
>+	g_object_set_data (G_OBJECT (msg), "response-time", GINT_TO_POINTER (time (NULL)));
>+	g_signal_handlers_disconnect_by_func (msg, msg_got_headers_cb, user_data);
> }

maybe we should eventually just keep track of that stuff in soup-message-io / soup-message. WebKit has code to keep track of these things too, for the web timing stuff.
Comment 48 Sergio Villar 2012-10-29 16:39:46 UTC
(In reply to comment #46)
> (From update of attachment 224121 [details] [review])
> Hm... I'm not sure if I like it better with the signals or the async callback.
> 
> Thinking out loud: SoupClientInputStream::eof is another similar "tell me when
> the request is done" hack... maybe we need something like
> "soup_message_io_add_completion_callback()", so arbitrary code can register
> stuff to happen after the request is done (and then the cache can finish its
> update from there, rather than needing to connect to a signal or call an async
> method).
> 
> Let's stick with the async method for now though.
> 
> >+	GSimpleAsyncResult *result;
> 
> Eww! GSimpleAsyncResult! The rest of libsoup master is now using GTask, which
> is much more fun.

Heh, I uploaded this like 2 months ago, the GTask was still under the hood by that time :). I'll rework it whenever I have some time.
Comment 49 Sergio Villar 2012-11-01 22:28:17 UTC
(In reply to comment #33)
> (From update of attachment 223443 [details] [review])
> >SoupContentDecoder adds itself to the list of the SoupMessage content
> >processors in the "got-headers" callback and then wraps the given base
> >stream with a list of decoders when required.
> 
> Need to update to match the new code. The rest of it looks good.

So there is actually one issue in this patch. The problem is that I need the session to get the list of content processors, and I get it from the SoupMessageQueueItem. The problem is that the io->item is NULL for some messages and that makes most of the tests sigsev with this patch.
Comment 50 Dan Winship 2012-11-02 12:12:34 UTC
(In reply to comment #49)
> So there is actually one issue in this patch. The problem is that I need the
> session to get the list of content processors, and I get it from the
> SoupMessageQueueItem. The problem is that the io->item is NULL for some
> messages and that makes most of the tests sigsev with this patch.

Right, those are server-side messages; just change it to only call soup_message_setup_body_istream() if io->mode == SOUP_MESSAGE_IO_CLIENT.

(Some day we'll want to be able to use content processors for server messages too, but we can worry about that later.)
Comment 51 Sergio Villar 2012-11-02 16:57:22 UTC
Comment on attachment 223900 [details] [review]
SoupContentProcessors support in SoupMessage, SoupContentDecoder & SoupContentSniffer

Committed in master c79bf27
Comment 52 Sergio Villar 2012-11-02 16:58:02 UTC
Comment on attachment 223899 [details] [review]
SoupContentProcessor interface

Committed in master: 4bec9eb
Comment 53 Dan Winship 2012-12-12 12:27:56 UTC
Created attachment 231350 [details] [review]
SoupCacheInputStream: new input stream filter that writes to the cache

ported to GTask. not tested
Comment 54 Dan Winship 2012-12-12 12:28:19 UTC
Created attachment 231351 [details] [review]
SoupCache: add SoupContentProcessor support

mostly unchanged, just rebased, etc
Comment 55 Sergio Villar 2012-12-18 10:54:34 UTC
I've been using these last 2 remaining patches for some time and they seem still to work fine. They pass the make check (well it lacks a new entry in POTFILES) and WebKit http tests seem to be happy as well. Were we planning any additional change?
Comment 56 Dan Winship 2012-12-18 13:52:31 UTC
nope. if the webkit tests pass, then let's commit it
Comment 57 Sergio Villar 2012-12-18 15:50:59 UTC
Last bits committed in master
Comment 58 Claudio Saavedra 2017-11-30 16:03:38 UTC
Review of attachment 231351 [details] [review]:

::: libsoup/soup-cache.c
@@ +836,3 @@
+
+	request_time = GPOINTER_TO_INT (g_object_get_data (G_OBJECT (msg), "request-time"));
+static GInputStream*

Sergio, is this a typo? I was reading this code today and I realized that the stored response-time is never used. Should it be

  response_time = GPOINTER_TO_INT (g_object_get_data (G_OBJECT (msg), "response-time"));

?
Comment 59 Sergio Villar 2017-11-30 16:10:51 UTC
Yeah good catch!
Comment 60 Claudio Saavedra 2017-12-01 09:44:35 UTC
Thanks, fixed that in bug 791031.