Bug 656684 – Should support multipart/x-mixed-replace

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 656684 - Should support multipart/x-mixed-replace


Summary:	Should support multipart/x-mixed-replace


Status:	RESOLVED FIXED

Product:	libsoup
Classification:	Core
Component:	HTTP Transport
Version:	2.34.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	libsoup-maint@gnome.bugs
QA Contact:	libsoup-maint@gnome.bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-08-16 18:14 UTC by Gustavo Noronha (kov)
Modified:	2012-08-20 19:33 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
draft implementation (5.18 KB, patch) 2011-08-16 18:14 UTC, Gustavo Noronha (kov)	none	Details \| Review
avoid dropping out of got-headers when handling main multipart/x-mixed-replace response (1.36 KB, patch) 2011-08-16 18:26 UTC, Gustavo Noronha (kov)	none	Details \| Review
proposed implementation (22.07 KB, patch) 2011-09-14 23:03 UTC, Gustavo Noronha (kov)	reviewed	Details \| Review
necessary webkit modifications to make it use the new support (1.70 KB, patch) 2011-09-14 23:04 UTC, Gustavo Noronha (kov)	none	Details \| Review
proposed implementation (29.36 KB, patch) 2011-10-17 12:55 UTC, Gustavo Noronha (kov)	needs-work	Details \| Review
draft for the multipart stream (37.94 KB, patch) 2012-01-11 16:07 UTC, Gustavo Noronha (kov)	none	Details \| Review
webkit changes to use the new stuff (7.16 KB, patch) 2012-01-11 17:35 UTC, Gustavo Noronha (kov)	none	Details \| Review
check point (58.23 KB, patch) 2012-01-20 18:47 UTC, Gustavo Noronha (kov)	none	Details \| Review
patch (64.01 KB, patch) 2012-01-26 14:06 UTC, Gustavo Noronha (kov)	none	Details \| Review
alternate patch (64.03 KB, patch) 2012-01-27 14:27 UTC, Gustavo Noronha (kov)	none	Details \| Review
updated patch (38.54 KB, patch) 2012-02-04 00:11 UTC, Gustavo Noronha (kov)	needs-work	Details \| Review
patch updated for the brave new world (40.90 KB, patch) 2012-05-09 11:51 UTC, Gustavo Noronha (kov)	reviewed	Details \| Review
webkit changes (9.04 KB, patch) 2012-05-09 11:54 UTC, Gustavo Noronha (kov)	none	Details \| Review
patch (44.66 KB, patch) 2012-05-24 22:59 UTC, Gustavo Noronha (kov)	none	Details \| Review
patch with Content-Length-based optimization (45.89 KB, patch) 2012-05-25 20:20 UTC, Gustavo Noronha (kov)	reviewed	Details \| Review
patch with comments addressed (46.67 KB, patch) 2012-05-28 22:06 UTC, Gustavo Noronha (kov)	committed	Details \| Review

Description Gustavo Noronha (kov) 2011-08-16 18:14:52 UTC

Created attachment 193983 [details] [review]
draft implementation

This is a "server push" technique that was added to Netscape at some point and is still suprisingly common, specially for online security cameras services, like this one for instance:

http://91.143.176.52/mjpg/video.mjpg

Right now if you try to open this in Epiphany you'll get an infinite stream of garbage that will eventually lead to the OOM handler killing Epiphany.

I started looking at this recently and my approach has been to modify the message I/O state machinery to make soup go through multiple headers->body cycles, after the initial headers are received.

I think this breaks API expectations, so maybe it would need to be protected under a flag or feature, what do you think? I have a working draft and a small patch to WebKit that I'll post in a few to help us discuss the design.

Comment 1 Gustavo Noronha (kov) 2011-08-16 18:26:34 UTC

Created attachment 193985 [details] [review]
avoid dropping out of got-headers when handling main multipart/x-mixed-replace response

Comment 2 Dan Winship 2011-08-16 18:44:46 UTC

"Ew"

Alternate possibility #1: keep the soup-message-io hackery, but emit "restarted" at each boundary (but without calling soup_message_cleanup_response()). That is at least in theory not an API break, though I bet it would still break things...

Alternate possibility #2: don't deal with it at the soup-message-io level. Instead, make a SoupMixedReplaceFilterStream that you wrap the SoupRequest's output stream with, and have the special handling be there. This may be more useful in the all-singing all-dancing jetpacks and flying cars future where libsoup is all gio-like and you can insert arbitrary filters into the mix (bug 591739). There could also be shared code between this and a solution to bug 637845 (although that's kind of the opposite of this).

Comment 3 Gustavo Noronha (kov) 2011-08-16 18:55:09 UTC

Some useful contextual conversations I remember we had on IRC:

Feb 14 14:08:32 <svillar>       danw: is it me or we don't have proper support for multipart/x-mixed-replace content in libsoup?
Feb 14 14:13:09 <danw>  svillar: that's mostly a browser-level thing. although i think there's a bug suggesting it would be nice if soup-multipart could parse-while-streaming instead of needing the whole body ahead of time
Feb 14 14:13:24 <danw>  svillar: but i assume webkit has its own multipart parser for that anyway
Feb 14 14:15:26 <svillar>       yeah I was thinking in a decoder stream or something like that
Feb 14 14:15:51 <svillar>       but it indeed sounds like a browser specific use case
Feb 14 14:18:35 <svillar>       NS library treats them like multiple messages issuing the equivalent to libsoup's got_headers or got_body multiple times
Feb 14 14:19:40 <danw>  i didn't realize people still used it actually. i thought it wasn't supported by all browsers

Legend says Internet Explorer only supports it through a plugin, but everyone else seems to support it. I remember asking about svillar's plans about it at some point as well, but my logs seem to have disappeared. =(

Comment 4 Gustavo Noronha (kov) 2011-08-16 18:57:23 UTC

Thanks Dan =) I'll give alternate possibility #2 some thought, given I like the idea of flying cars quite a bit, but it's good to know we could probably have a non-API-breaking solution with #1!

Comment 5 Gustavo Noronha (kov) 2011-09-14 23:03:44 UTC

Created attachment 196559 [details] [review]
proposed implementation

As we discussed on IRC, I'd like to go ahead and add support for this while we figure out how things will look in the future, since there are some open questions. Here's what I did:

I added a new session feature, SoupMultipartMixedReplace, that just toggles a boolean in SoupMessage's private structure. If the boolean is set to true, then the multipart/x-mixed-replace content type is handled specially, and a flag is set on the message when it is found, to indicate that it is being handled as such. This was necessary to make the message queue do the right thing when handling the restarted signal, but should be useful for applications as well, which is why I made it a SoupMessageFlag.

Comments? Bikeshedding on the naming of the various components would be welcome. Also notice I used 2.37 in the Since: tag, since it's too late for 2.36 I'm afraid.

Comment 6 Gustavo Noronha (kov) 2011-09-14 23:04:47 UTC

Created attachment 196560 [details] [review]
necessary webkit modifications to make it use the new support

Comment 7 Dan Winship 2011-10-11 17:46:40 UTC

Comment on attachment 196559 [details] [review]
proposed implementation

So, one problem I see is that this depends on each multipart part having a Content-Length header. Not sure how common that is. The original "spec" (http://web.archive.org/web/19981203153836/fishcam.netscape.com/assist/net_sites/pushpull.html) doesn't use them. (Multipart types are inherently self-delimiting, so it shouldn't be necessary...) Basically you'd need to scan for the boundary in read_body_chunk() as well as in read_metadata(), and if you find it... hm... fiddle with the iostate to ensure that it doesn't try to read again, truncate the read buffer, and copy the unused part of it to the metadata buffer.


>  * Copyright (C) 2000-2003, Ximian, Inc.
>+ * Copyright (C) 2011 Collabora Ltd.

I suppose I should update those one of these days :-)

>+	if (g_str_has_prefix (data, io->multipart_replace_boundary)) {

The boundary is actually "--" plus the string specified as boundary param. Except apparently lots of people are dumb and include the "--" as part of the boundary param. Yay. Anyway, you need to deal with the boundary param either having the "--" or not having it.

>+		/* Replace the boundary with a response so that we

s/response/Status-Line/

>+		g_byte_array_free (io->read_meta_buf, TRUE);
>+		io->read_meta_buf = g_byte_array_new ();

g_byte_array_set_size (io->read_meta_buf, 0) ?

Comment 8 Gustavo Noronha (kov) 2011-10-17 12:55:19 UTC

Created attachment 199199 [details] [review]
proposed implementation

This makes responses with no Content-Length work (even with no headers at all), and makes the test more complete by checking that the headers we expect are there, and only those. I changed the read function in io_read to read lines like read_metadata for the multipart/x-mixed-replace case, to make detecting the boundary less crazy. This works with the test URL I was using (which seems to be down right now) and with other URLs I found on the web such as this:

http://vefmyndavel.akureyri.is/mjpg/video.mjpg

Comment 9 Gustavo Noronha (kov) 2011-10-17 13:06:04 UTC

A few other links I found in the wild, most of them work! =D

http://vcf.vn.ua:8081/mjpg/video.mjpg
http://vefmyndavel.akureyri.is/mjpg/video.mjpg
http://cam.meteoweb.it:8080/mjpg/video.mjpg (broken, says 'connection terminated unexpectedly', I'll see what's up)
http://webcam.bonhoeffer.nl/mjpg/video.mjpg
http://cam.thesylvaherald.com/mjpg/video.mjpg
http://cf162wc.cs.wwu.edu/mjpg/video.mjpg
http://82.207.68.254/mjpg/video.mjpg

Comment 10 Dan Winship 2011-11-09 15:42:13 UTC

Comment on attachment 199199 [details] [review]
proposed implementation

ok... I don't really like the changes that are needed to soup-message-queue and soup-session-async to handle "restarted" differently, so maybe the previous patch (without "restarted") is better... though that version of the patch would need to be updated for my pending changes to SoupHTTPInputStream and ResourceHandleSoup.

if it matters, it's fine to have this only work when you're using SoupRequest, and thus to be implemented (in part or in full) at the SoupHTTPInputStream / SoupRequestHTTP level rather than entirely in soup-message-io.


Is there any particular rush to get this in, or is it just that you want to make sure it definitely gets in before GNOME 3.4? If the latter, I'd prefer to wait a little bit, since I'll hopefully have the giobased stuff landed by the end of the hackfest, and that ought to make this a bunch more sane (and even if it doesn't make it more sane, it would require rewriting a lot of it anyway, so it would be a little annoying to commit this now and then have to rewrite most of it before it's ever released anyway).

Comment 11 Gustavo Noronha (kov) 2011-11-11 15:45:45 UTC

Yeah, I also don't like those changes. Making sure it definitely gets in before GNOME 3.4 sounds good to me, I was more worried about getting the changes through some bigger testing as early as possible given they were a bit invasive, but it sounds good doing it during the hackfest =D

Comment 12 Gustavo Noronha (kov) 2011-11-14 00:41:26 UTC

Oh, look what I just found: http://blog.whatwg.org/weekly-xbl-observers looks like they're going to officially standardize the thing!

Comment 13 Gustavo Noronha (kov) 2012-01-11 16:07:45 UTC

Created attachment 205025 [details] [review]
draft for the multipart stream

OK, here's my draft for you to validate the approach; some things are a bit (read a lot) hacky because sync reading on the http input stream doesn't work, and the read method requires sync reads; I also did not provide sync version of the next part method for now, for simplicity (although it currently _is_ sync hah).

Of note regarding the general approach is that current code (those that do not handle multipart) will work the same way it does currently; I decided against trying to be smart and saving the API users because I imagine people may want to do their own multipart processing and we would be getting in their way, what do you think?

If you're OK with this approach in general we can start deciding on the name of the object and I can write doc strings and fix next_part_async to be async for real, and add a sync version.

Comment 14 Gustavo Noronha (kov) 2012-01-11 17:35:51 UTC

Created attachment 205034 [details] [review]
webkit changes to use the new stuff

And here are the webkit changes to help visualizing how the API is used in a real world scenario =)

Comment 15 Dan Winship 2012-01-14 21:19:12 UTC

(In reply to comment #13)
> Created an attachment (id=205025) [details] [review]
> draft for the multipart stream
> 
> OK, here's my draft for you to validate the approach; some things are a bit
> (read a lot) hacky because sync reading on the http input stream doesn't work,

should be fixed in the latest giobased3...

> Of note regarding the general approach is that current code (those that do not
> handle multipart) will work the same way it does currently; I decided against
> trying to be smart and saving the API users because I imagine people may want
> to do their own multipart processing and we would be getting in their way, what
> do you think?

makes sense

> If you're OK with this approach in general

It basically all seems fine. Of course, the internals (of both this and giobased3 in general) will hopefully look a lot less hacky by the time 3.4 is out...

I forget if we discussed this, but maybe it would make more sense to have soup_multipart_stream_next_part_finish() return a new GInputStream rather than having it restart the SoupMultipartStream itself? (I'm thinking that I'm going to do this for SoupInputStream/SoupOutputStream.)

soup_message_io_is_read_blocking() appears to be cruft?

> we can start deciding on the name of the object

SoupMultipartInputStream, since we eventually want an Output version too (bug 637845).

Comment 16 Dan Winship 2012-01-14 21:19:39 UTC

(In reply to comment #14)
> Created an attachment (id=205034) [details] [review]
> webkit changes to use the new stuff

does this imply that webkit actually works against giobased3?

Comment 17 Gustavo Noronha (kov) 2012-01-16 13:18:49 UTC

I don't think we discussed the next_part_finish() bit, how would that work? We would create a new SoupMultipartInputStream, and wrap our base_stream in it, then die? I'll update giobased3 and make those fixes.

(In reply to comment #16)
> > webkit changes to use the new stuff
> 
> does this imply that webkit actually works against giobased3?

It does, at least the URLs I tried worked fine =)

Comment 18 Dan Winship 2012-01-16 14:24:04 UTC

(In reply to comment #17)
> I don't think we discussed the next_part_finish() bit, how would that work? We
> would create a new SoupMultipartInputStream, and wrap our base_stream in it,
> then die?

Look at giobased4; now SoupInputStream has been split into SoupSocketInputStream (which there's only ever one of per socket) and SoupBodyInputStream (which is per-message, and does chunked encoding, or cuts off Content-Length-encoded bodies at the right point).

So the idea would be that the app would create the SoupMultipartInputStream, call next_part() on it to get a stream for the first body part, read that part until you get back a 0 read, and then close and unref that stream and call next_part() on the SoupMultipartInputStream again.

You might be able to reuse SoupBodyInputStream as the body-part-stream type if you added a SOUP_ENCODING_MULTIPART. ALthough SoupBodyInputStream requires its base-stream to be a SoupSocketInputStream. But maybe SoupSocketInputStream should be "SoupBufferedInputStream", and SoupMultipartInputStream would be a subclass of that... and then you wouldn't need your own buffer-management code.

Comment 19 Gustavo Noronha (kov) 2012-01-20 12:25:00 UTC

I think the idea of sharing the buffer management code is great, I can make a new base class SoupBufferedInputStream from which both SoupMultipartInputStream and SoupSocketInputStream inherit, what do you think?

I'm having a hard time visualizing how to return a new input stream for each next_part call, though; I guess I'd have SoupMultipartInputStream be just an empty shell, and have, say, SoupPartInputStream (which would be the one inheritting from SoupBufferedInputStream), which would be created to wrap the SoupBodyInputStream when next_part is called, and would return 0 and cleanup itself when the part is done. Does that sound like what you were thinking?

Comment 20 Dan Winship 2012-01-20 12:28:35 UTC

yes, exactly

Comment 21 Gustavo Noronha (kov) 2012-01-20 18:47:16 UTC

Created attachment 205720 [details] [review]
check point

OK, check point to see if it's going in the direction you envisioned, I think handing over buffer handling and boundary detection to SoupBufferedInputStream is a great idea =D

Comment 22 Dan Winship 2012-01-24 14:17:35 UTC

looks generally sane

Comment 23 Gustavo Noronha (kov) 2012-01-26 14:06:32 UTC

Created attachment 206187 [details] [review]
patch

OK, I cleaned it up a bit, wrote docs, made a sync API for next part, and made the async one really async. A couple notes:

* I haven't tried this one in WebKit as of yet, because giobased4 is still not working there - I think because of the broken content sniffer, I'll take a closer look at that later today.
* The test is failing currently, because the read for the last part stops short a few bytes, the next read returns 0, and later reads return garbage, I haven't figured out why just yet, could be a bug in SoupBufferedInputStream I think

Other than those bugs, I'm fairly satisfied with the way it looks, I think it's a straight-forward API!

Open question: SoupMultipartInputStream obviously needs to be public; should we really install soup-buffered-input-stream.h, do you see some way around it?

Comment 24 Dan Winship 2012-01-26 14:37:47 UTC

(In reply to comment #23)
> Open question: SoupMultipartInputStream obviously needs to be public; should we
> really install soup-buffered-input-stream.h, do you see some way around it?

Hm... I hadn't been planning to.

Oh, I guess I didn't read your proposal earlier well enough... I had been thinking that SoupSocketInputStream would simply be *renamed* to SoupBufferedInputStream, not made a subclass.

And then (although I was less certain about this part) SoupMultipartInputStream just has-a SoupBufferedInputStream (internally), not is-a SoupBufferedInputStream.

However, the subclassing way may make more sense...

Comment 25 Gustavo Noronha (kov) 2012-01-26 16:19:14 UTC

Yeah, I think I prefer subclassing. However, I don't feel too strongly about it, I can make it a has-a relation for SoupMultipartInputStream, while keeping SoupSocketInputStream as a subclass, if you think it makes sense.

I started writing a simple "mjpeg viewer" to help me with tests, btw, it's here: https://gitorious.org/kov-play/kov-play/blobs/master/soup/multipart/main.c

Comment 26 Gustavo Noronha (kov) 2012-01-27 14:27:57 UTC

Created attachment 206265 [details] [review]
alternate patch

Here's the same patch but with SoupMultipartInputStream using a SoupBufferedInputStream instead of being one. This keeps the SoupSocketInputStream as a subclass, but avoids exposing SoupBufferedInputStream.

Comment 27 Gustavo Noronha (kov) 2012-02-04 00:11:07 UTC

Created attachment 206746 [details] [review]
updated patch

Updated patch - decided to make it a full patch because I updated the commit message.

This patch essentially turns the check into a multipart/* check. I don't think sniffing each part works and disabling sniffing seems to be the wrong thing to do on a second thought, for the only sensible place to do it would be when we tag the message with the multipart flag, but to actually avoid sniffing to happen we would have to unref/set-to-NULL the priv->sniffer of the message or track whether a message should be sniffed in a different way - it may also be unexpected behaviour for the API users, I think.

One thing I noticed testing with my simple mjpeg viewer app was that sometimes reads that should be blocking are returning with a G_IO_ERROR_WOULD_BLOCK error - repeating the read does make it work. I believe the problem is in SoupBodyInputStream, I'll take a look at that and update the webkit changes tomorrow.

Comment 28 Dan Winship 2012-04-18 15:05:42 UTC

Comment on attachment 206746 [details] [review]
updated patch

the giobased stuff has landed in master now, but it's slightly different now from the version you based this patch against (and it also requires glib master)

Comment 29 Gustavo Noronha (kov) 2012-05-09 11:51:51 UTC

Created attachment 213731 [details] [review]
patch updated for the brave new world

This includes all of the required changes, I believe =)

Comment 30 Gustavo Noronha (kov) 2012-05-09 11:54:36 UTC

Created attachment 213732 [details] [review]
webkit changes

I updated the webkit bits too, but it does not seem to work with a more recent WebKit - not because the parts are incorrect, but because WebCore refuses to update the image, even though it's getting all the notifications it needs; I checked out the same WebKit revision I used for the earlier patch and it worked fine, fwiw.

Comment 31 Dan Winship 2012-05-09 12:39:27 UTC

Comment on attachment 213731 [details] [review]
patch updated for the brave new world

quick review...

>+			if (content_type && g_str_has_prefix (content_type, "multipart/")) {
>+				/* Flag message as multipart, and fall through to get got_headers emitted. */
>+				priv->msg_flags |= SOUP_MESSAGE_MULTIPART;
>+				io->read_encoding = SOUP_ENCODING_EOF;

SOUP_ENCODING_EOF is only for the multipart/x-mixed-replace case, not multipart in general.

And really, it's not even true in general for multipart/x-mixed-replace. Presumably servers send mixed-replace streams with "Connection: close" (or HTTP/1.0) anyway... So is that actually needed?

>+	nread = soup_filter_input_stream_read_until (priv->base_stream, buffer, count,
>+						     priv->boundary, priv->boundary_size,
>+						     blocking, &got_boundary,
>+						     cancellable, error);

soup_filter_input_stream_read_until() will currently fail if priv->boundary_size > count, which could more plausibly happen here than in soup-message-io where boundary_size is usually 1 or 2... so we'll have to fix that. (Probably change read_until() to read MIN(length, boundary_length) bytes?)

>+static gssize
>+soup_multipart_input_stream_skip (GInputStream	*stream,
>+				  gsize		 count,
>+				  GCancellable	*cancellable,
>+				  GError       **error)
>+{
>+	return soup_multipart_input_stream_read (stream, NULL, count,
>+						 cancellable, error);
>+}

You don't need to do that, since that's the default implementation anyway.

>+	g_object_get (object, "base-stream", &base_stream, NULL);
>+	priv->base_stream = SOUP_FILTER_INPUT_STREAM (soup_filter_input_stream_new (base_stream));

leaks base_stream because g_object_get() will have reffed it. You can use "G_FILTER_INPUT_STREAM (object)->base_stream".

>+	if (g_str_has_prefix (boundary, "--"))
>+		priv->boundary = g_strdup_printf ("\r\n%s\r\n", boundary);
>+	else
>+		priv->boundary = g_strdup_printf ("\r\n--%s\r\n", boundary);

A few things:

  1) Is there guaranteed to be a blank line before the initial boundary?

  2) People are morons and you can't assume they'll correctly use \r\n rather
     than plain \n. Sigh

  3) The final boundary in the multipart is "\n--%s--". Probably x-mixed-replace
     streams never have a final boundary, but if this is a generic multipart-
     parsing class then it should deal with that. (And if the server does decide
     to close a multipart/x-mixed-replace stream correctly before closing the
     connection, you don't want to misparse the boundary as being part of the
     data. Plus, if you're trying to deal with both --%s\r\n and --%s\n anyway,
     it's not hard to deal with --%s-- too.)


>+	g_source_set_dummy_callback (base_source);
>+	pollable_source = g_pollable_source_new (G_OBJECT (stream));
>+	g_source_add_child_source (pollable_source, base_source);
>+	g_source_unref (base_source);

Use g_pollable_source_new_full() to simplify this

>+	if (!success) {
>+		GString *headers_string =
>+			g_string_new_len ((const char*) priv->meta_buf->data,
>+					  priv->meta_buf->len);
>+		g_warning ("Failed to parse headers:\n%s", headers_string->str);

If that's not just for debug purposes then it would be better to have that turn into a GError returned from some method.

>+soup_multipart_input_stream_next_part_thread (GSimpleAsyncResult *simple,

mmm... I'm not sure using run_in_thread() is going to work... I guess we'll find out

>+	return SOUP_MULTIPART_INPUT_STREAM (g_object_new (SOUP_TYPE_MULTIPART_INPUT_STREAM,

you don't need to cast the return value of g_object_new().

>+	if (!g_input_stream_set_pending (stream, &error)) {
>+		g_simple_async_result_set_op_res_gpointer (simple, NULL, NULL);
>+		g_simple_async_result_take_error (simple, error);

don't need to call set_op_res_gpointer in the error case

Comment 32 Gustavo Noronha (kov) 2012-05-24 22:59:39 UTC

Created attachment 214909 [details] [review]
patch

Here's an updated patch, sorry it took so long, I've been busy with random other stuff. I believe this addresses all your comments, and it uses the idea we discussed on IRC today. One additional comment:

> soup_filter_input_stream_read_until() will currently fail if
> priv->boundary_size > count, which could more plausibly happen here than in
> soup-message-io where boundary_size is usually 1 or 2... so we'll have to fix
> that. (Probably change read_until() to read MIN(length, boundary_length)
> bytes?)

I had to ensure this on the caller side, in SoupMultipartInputStream, because I need to give it a buffer that has enough space to fit the boundary and two additional bytes anyway. Unfortunately this means I have to do some buffering in SoupMultipartInputStream, for those cases =(. I added a test that tests small reads work. I haven't tested this code with WebKit yet, but I'll do it tomorrow, I believe it'll work just fine, though =)

Comment 33 Gustavo Noronha (kov) 2012-05-25 20:20:24 UTC

Created attachment 215000 [details] [review]
patch with Content-Length-based optimization

This patch uses Content-Length to guide its usage of read_until. On the pro side, the code still works with sites which do not use Content-Length, and those which give a number but send more data, on the con side, the read implementation has grown way more complex than I hoped originally, but I guess that's the price we pay for supporting crazy servers =)

This brings CPU usage to around as low as Firefox's on the sites I tried.

Comment 34 Dan Winship 2012-05-26 23:54:47 UTC

Review of attachment 215000 [details] [review]:

::: libsoup/soup-filter-input-stream.c
@@ +206,3 @@
 
+	/* This is the number of bytes we will read but not compare when looking for
+	 * the boundary.

I was about to say "you need to update the docs", but then I realized there aren't any. :)

@@ +208,3 @@
+	 * the boundary.
+	 */
+	boundary_wildcard_bytes = boundary_length - strlen(boundary);

space before (

@@ +246,3 @@
 		end -= boundary_length;
 	for (p = buf; p <= end; p++) {
+		if (!memcmp (p, boundary, boundary_length - boundary_wildcard_bytes)) {

not related to this change, but: the vast vast majority of the memcmps will fail on the first byte. So if memcmp is showing up as expensive in the profiling, it has to be because just making a function call is expensive. In which case doing:

    if (*p == *boundary && !memcmp ...

should help a lot... (and not just for multiparts; for header parsing too)

::: libsoup/soup-multipart-input-stream.c
@@ +22,3 @@
+
+/**
+ * SECTION:soup-multipart-stream

missing "-input-"

you also need to update docs/reference/libsoup-2.4-docs.sgml and libsoup-2.4-sections.txt

@@ +33,3 @@
+ * which are not wrapped will be treated like non-multipart responses.
+ *
+ * Since: 2.40.0

you're inconsistent about 2.40.0 vs 2.40. The latter is probably better.

@@ +75,3 @@
+	if (multipart->priv->msg) {
+		g_object_unref (multipart->priv->msg);
+		multipart->priv->msg = NULL;

the whole if() here can be replaced with just

  g_clear_object (&multipart->priv->msg);

@@ +135,3 @@
+
+static gssize
+read_from_buf (SoupMultipartInputStream *multipart, gpointer buffer, gsize count)

you had originally added this to deal with the boundary_len + 2 problem, right? now that that's resolved inside SoupFilterInputStream, does SoupMultipartInputStream still need to do its own buffering?

(more discussion of this further down...)

@@ +174,3 @@
+	 * necessary for keeping CPU usage civil.
+	 */
+	if (priv->remaining_bytes > 0) {

The logic in here doesn't work. Eg, if priv->remaining_bytes is 20, count is 10, and priv->boundary_size is 15, then it will read 20 bytes (ie, more than count).

I think what you meant is:

    if (priv->remaining_bytes > priv->boundary_size * 2) {
        goffset bytes_to_read = MIN (count, priv->remaining_bytes - priv->boundary_size * 2);

        // read @bytes_to_read bytes, update priv->remaining_bytes, return nread
    }

ie, if we're not too close to the end of the part, then read up to @count bytes, but not beyond the "too close to the end" point

@@ +253,3 @@
+		 * belongs to the next part's headers.
+		 */
+		g_byte_array_append (priv->meta_buf, buf + 1, 1);

hm. this is another reason you'd need internal buffering I guess. Idea for dealing with this below...

@@ +259,3 @@
+	nread -= priv->boundary_size;
+
+	/* And finally we ignore the newline that preceeded the boundary. */

"preceded"

@@ +304,3 @@
+	 * no further checks are needed.
+	 */
+	priv->need_boundary_check = TRUE;

OK, idea for simplifying the buffering/boundary parsing stuff: give soup_filter_input_stream_read_until() a flag to say whether you want it to include the boundary in your buffer, or if it should leave it still in its internal buffer instead. And tell it you want the latter behavior. Then read_real() wouldn't have to deal with all the interior-boundary-vs-last-boundary and crlf-vs-lf, because it would only ever see up to the start of the boundary, not the end. soup_multipart_input_stream_read_headers() would then deal with actually parsing the boundary. (And it would always deal with it, so you don't need priv->need_boundary_check any more.) And since you're using read_line() there instead of read_until(), you don't need the "wildcard" magic any more either.

@@ +323,3 @@
+
+	params = g_hash_table_new_full (g_str_hash, g_str_equal,
+					g_free, g_free);

this is unused and leaked; soup_message_headers_get_content_type() overwrites it with a new hash table

@@ +329,3 @@
+
+	boundary = g_hash_table_lookup (params, "boundary");
+	if (g_str_has_prefix (boundary, "--"))

you need to check if boundary is NULL.

Actually, probably you should check in soup-message-io, and don't set the MULTIPART flag if it's missing. (But if it doesn't require hacks in too many places, it would be nice if SoupMultipartInputStream did *something* other than crash in that case. I guess it can just g_return_if_fail here, and g_return_if_fail (priv->boundary != NULL) at the top of any other method that needs to look at priv->boundary.

@@ +348,3 @@
+	SoupMultipartInputStreamPrivate *priv = multipart->priv;
+
+	return g_pollable_input_stream_is_readable (G_POLLABLE_INPUT_STREAM (priv->base_stream));

if you have to keep priv->buffered_bytes, then this is wrong; it's readable when there's buffered data too. at least sometimes

@@ +422,3 @@
+	gboolean success;
+
+	priv->current_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_RESPONSE);

SOUP_MESSAGE_HEADERS_MULTIPART. (already exists)

@@ +428,3 @@
+		return;
+
+	success = soup_headers_parse_response ((const char*) priv->meta_buf->data,

soup_headers_parse()

@@ +457,3 @@
+	if (priv->current_headers) {
+		soup_message_headers_free (priv->current_headers);
+		priv->current_headers = NULL;

g_clear_pointer (&priv->current_headers, soup_message_headers_free);

(might have to up the glib requirement in configure.ac for that...)

@@ +470,3 @@
+
+		if (!got_lf)
+			continue;

If nread > 0 && !got_lf, then either:
  a. nread == sizeof (read_buf), which probably indicates something went horribly wrong somewhere, OR
  b. you read the last line of the multipart, and it didn't have a terminating newline (in which
     case the line had better be the final boundary)

@@ +477,3 @@
+
+			/* Discard boundary if it's still there. */
+			if (!strncmp ((char*)priv->meta_buf->data,

This still assumes there will be no blank lines before the first boundary, which I don't think is guaranteed.

@@ +487,3 @@
+		if (nread == 1 &&
+		    !strncmp ((char *)priv->meta_buf->data +
+			      priv->meta_buf->len - 2,

You need to check that priv->meta_buf->len >= 2. Likewise >= 3 below. (Not your fault; you copied this from soup-message-io before this bug got fixed there...)

@@ +502,3 @@
+
+	if (found_headers) {
+		g_byte_array_prepend (priv->meta_buf, (const guint8 *)"HTTP/1.0 200 OK\r\n", strlen ("HTTP/1.0 200 OK\r\n"));

You don't need this if you're using soup_headers_parse() rather than soup_headers_parse_response() above. You just need *some* line there (which can be the boundary line if that would be convenient, or just a blank line if not).

@@ +520,3 @@
+	gboolean success;
+
+	success = soup_multipart_input_stream_read_headers (multipart, cancellable, &error);

you can just call soup_multipart_input_stream_next_part() here rather than doing the read_headers and g_object_new separately. (and I'd move it to just before next_part_async)

@@ +555,3 @@
+ * Since: 2.40.0
+ **/
+SoupMultipartInputStream*

space before * (likewise in several other functions below)

@@ +638,3 @@
+
+	if (!g_input_stream_set_pending (stream, &error)) {
+		g_simple_async_result_set_op_res_gpointer (simple, NULL, NULL);

don't need that

@@ +674,3 @@
+	simple = G_SIMPLE_ASYNC_RESULT (result);
+
+	g_return_val_if_fail (g_simple_async_result_get_source_tag (simple) == soup_multipart_input_stream_next_part_async, FALSE);

this and the G_IS_SIMPLE_ASYNC_RESULT check can be combined into

g_return_val_if_fail (g_simple_async_result_is_valid (result, G_OBJECT (multipart), soup_multipart_input_stream_next_part_async), FALSE);

@@ +677,3 @@
+
+	new_stream = g_simple_async_result_get_op_res_gpointer (simple);
+	if (g_simple_async_result_propagate_error (simple, error)) {

the standard idiom is:

    if (g_simple_async_result_propagate_error (simple, error))
        return NULL;

    return g_simple_async_result_get_op_res_gpointer (simple);

oh, except you need to g_object_ref() it too; the copy you passed to set_op_res_gpointer is now owned by the GSimpleAsyncResult and will be freed when the GSimpleAsyncResult is destroyed.

@@ +693,3 @@
+ * #SoupMultipartInputStream and will be replaced when a call is made
+ * to soup_multipart_input_stream_next_part() or its async
+ * counter-part, so if keeping the headers is required, a copy must be

"counterpart"

also, the docs should note that the headers may be empty, if the part didn't include any headers

::: libsoup/soup-multipart-input-stream.h
@@ +39,3 @@
+							                GInputStream              *base_stream);
+
+GInputStream*             soup_multipart_input_stream_next_part        (SoupMultipartInputStream  *multipart,

put GInputStream's "*" on the other side of the gap

Comment 35 Dan Winship 2012-05-26 23:54:55 UTC

Review of attachment 215000 [details] [review]:

::: libsoup/soup-filter-input-stream.c
@@ +206,3 @@
 
+	/* This is the number of bytes we will read but not compare when looking for
+	 * the boundary.

I was about to say "you need to update the docs", but then I realized there aren't any. :)

@@ +208,3 @@
+	 * the boundary.
+	 */
+	boundary_wildcard_bytes = boundary_length - strlen(boundary);

space before (

@@ +246,3 @@
 		end -= boundary_length;
 	for (p = buf; p <= end; p++) {
+		if (!memcmp (p, boundary, boundary_length - boundary_wildcard_bytes)) {

not related to this change, but: the vast vast majority of the memcmps will fail on the first byte. So if memcmp is showing up as expensive in the profiling, it has to be because just making a function call is expensive. In which case doing:

    if (*p == *boundary && !memcmp ...

should help a lot... (and not just for multiparts; for header parsing too)

::: libsoup/soup-multipart-input-stream.c
@@ +22,3 @@
+
+/**
+ * SECTION:soup-multipart-stream

missing "-input-"

you also need to update docs/reference/libsoup-2.4-docs.sgml and libsoup-2.4-sections.txt

@@ +33,3 @@
+ * which are not wrapped will be treated like non-multipart responses.
+ *
+ * Since: 2.40.0

you're inconsistent about 2.40.0 vs 2.40. The latter is probably better.

@@ +75,3 @@
+	if (multipart->priv->msg) {
+		g_object_unref (multipart->priv->msg);
+		multipart->priv->msg = NULL;

the whole if() here can be replaced with just

  g_clear_object (&multipart->priv->msg);

@@ +135,3 @@
+
+static gssize
+read_from_buf (SoupMultipartInputStream *multipart, gpointer buffer, gsize count)

you had originally added this to deal with the boundary_len + 2 problem, right? now that that's resolved inside SoupFilterInputStream, does SoupMultipartInputStream still need to do its own buffering?

(more discussion of this further down...)

@@ +174,3 @@
+	 * necessary for keeping CPU usage civil.
+	 */
+	if (priv->remaining_bytes > 0) {

The logic in here doesn't work. Eg, if priv->remaining_bytes is 20, count is 10, and priv->boundary_size is 15, then it will read 20 bytes (ie, more than count).

I think what you meant is:

    if (priv->remaining_bytes > priv->boundary_size * 2) {
        goffset bytes_to_read = MIN (count, priv->remaining_bytes - priv->boundary_size * 2);

        // read @bytes_to_read bytes, update priv->remaining_bytes, return nread
    }

ie, if we're not too close to the end of the part, then read up to @count bytes, but not beyond the "too close to the end" point

@@ +253,3 @@
+		 * belongs to the next part's headers.
+		 */
+		g_byte_array_append (priv->meta_buf, buf + 1, 1);

hm. this is another reason you'd need internal buffering I guess. Idea for dealing with this below...

@@ +259,3 @@
+	nread -= priv->boundary_size;
+
+	/* And finally we ignore the newline that preceeded the boundary. */

"preceded"

@@ +304,3 @@
+	 * no further checks are needed.
+	 */
+	priv->need_boundary_check = TRUE;

OK, idea for simplifying the buffering/boundary parsing stuff: give soup_filter_input_stream_read_until() a flag to say whether you want it to include the boundary in your buffer, or if it should leave it still in its internal buffer instead. And tell it you want the latter behavior. Then read_real() wouldn't have to deal with all the interior-boundary-vs-last-boundary and crlf-vs-lf, because it would only ever see up to the start of the boundary, not the end. soup_multipart_input_stream_read_headers() would then deal with actually parsing the boundary. (And it would always deal with it, so you don't need priv->need_boundary_check any more.) And since you're using read_line() there instead of read_until(), you don't need the "wildcard" magic any more either.

@@ +323,3 @@
+
+	params = g_hash_table_new_full (g_str_hash, g_str_equal,
+					g_free, g_free);

this is unused and leaked; soup_message_headers_get_content_type() overwrites it with a new hash table

@@ +329,3 @@
+
+	boundary = g_hash_table_lookup (params, "boundary");
+	if (g_str_has_prefix (boundary, "--"))

you need to check if boundary is NULL.

Actually, probably you should check in soup-message-io, and don't set the MULTIPART flag if it's missing. (But if it doesn't require hacks in too many places, it would be nice if SoupMultipartInputStream did *something* other than crash in that case. I guess it can just g_return_if_fail here, and g_return_if_fail (priv->boundary != NULL) at the top of any other method that needs to look at priv->boundary.

@@ +348,3 @@
+	SoupMultipartInputStreamPrivate *priv = multipart->priv;
+
+	return g_pollable_input_stream_is_readable (G_POLLABLE_INPUT_STREAM (priv->base_stream));

if you have to keep priv->buffered_bytes, then this is wrong; it's readable when there's buffered data too. at least sometimes

@@ +422,3 @@
+	gboolean success;
+
+	priv->current_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_RESPONSE);

SOUP_MESSAGE_HEADERS_MULTIPART. (already exists)

@@ +428,3 @@
+		return;
+
+	success = soup_headers_parse_response ((const char*) priv->meta_buf->data,

soup_headers_parse()

@@ +457,3 @@
+	if (priv->current_headers) {
+		soup_message_headers_free (priv->current_headers);
+		priv->current_headers = NULL;

g_clear_pointer (&priv->current_headers, soup_message_headers_free);

(might have to up the glib requirement in configure.ac for that...)

@@ +470,3 @@
+
+		if (!got_lf)
+			continue;

If nread > 0 && !got_lf, then either:
  a. nread == sizeof (read_buf), which probably indicates something went horribly wrong somewhere, OR
  b. you read the last line of the multipart, and it didn't have a terminating newline (in which
     case the line had better be the final boundary)

@@ +477,3 @@
+
+			/* Discard boundary if it's still there. */
+			if (!strncmp ((char*)priv->meta_buf->data,

This still assumes there will be no blank lines before the first boundary, which I don't think is guaranteed.

@@ +487,3 @@
+		if (nread == 1 &&
+		    !strncmp ((char *)priv->meta_buf->data +
+			      priv->meta_buf->len - 2,

You need to check that priv->meta_buf->len >= 2. Likewise >= 3 below. (Not your fault; you copied this from soup-message-io before this bug got fixed there...)

@@ +502,3 @@
+
+	if (found_headers) {
+		g_byte_array_prepend (priv->meta_buf, (const guint8 *)"HTTP/1.0 200 OK\r\n", strlen ("HTTP/1.0 200 OK\r\n"));

You don't need this if you're using soup_headers_parse() rather than soup_headers_parse_response() above. You just need *some* line there (which can be the boundary line if that would be convenient, or just a blank line if not).

@@ +520,3 @@
+	gboolean success;
+
+	success = soup_multipart_input_stream_read_headers (multipart, cancellable, &error);

you can just call soup_multipart_input_stream_next_part() here rather than doing the read_headers and g_object_new separately. (and I'd move it to just before next_part_async)

@@ +555,3 @@
+ * Since: 2.40.0
+ **/
+SoupMultipartInputStream*

space before * (likewise in several other functions below)

@@ +638,3 @@
+
+	if (!g_input_stream_set_pending (stream, &error)) {
+		g_simple_async_result_set_op_res_gpointer (simple, NULL, NULL);

don't need that

@@ +674,3 @@
+	simple = G_SIMPLE_ASYNC_RESULT (result);
+
+	g_return_val_if_fail (g_simple_async_result_get_source_tag (simple) == soup_multipart_input_stream_next_part_async, FALSE);

this and the G_IS_SIMPLE_ASYNC_RESULT check can be combined into

g_return_val_if_fail (g_simple_async_result_is_valid (result, G_OBJECT (multipart), soup_multipart_input_stream_next_part_async), FALSE);

@@ +677,3 @@
+
+	new_stream = g_simple_async_result_get_op_res_gpointer (simple);
+	if (g_simple_async_result_propagate_error (simple, error)) {

the standard idiom is:

    if (g_simple_async_result_propagate_error (simple, error))
        return NULL;

    return g_simple_async_result_get_op_res_gpointer (simple);

oh, except you need to g_object_ref() it too; the copy you passed to set_op_res_gpointer is now owned by the GSimpleAsyncResult and will be freed when the GSimpleAsyncResult is destroyed.

@@ +693,3 @@
+ * #SoupMultipartInputStream and will be replaced when a call is made
+ * to soup_multipart_input_stream_next_part() or its async
+ * counter-part, so if keeping the headers is required, a copy must be

"counterpart"

also, the docs should note that the headers may be empty, if the part didn't include any headers

::: libsoup/soup-multipart-input-stream.h
@@ +39,3 @@
+							                GInputStream              *base_stream);
+
+GInputStream*             soup_multipart_input_stream_next_part        (SoupMultipartInputStream  *multipart,

put GInputStream's "*" on the other side of the gap

Comment 36 Dan Winship 2012-05-27 12:51:39 UTC

Comment on attachment 213732 [details] [review]
webkit changes

This patch looks like it will break defersLoading for mixed-replace responses. Though I'm not sure that will have any real-world effect.

Two half-related bugs involving messages with multiple responses:

  1. bug 675306; the fact that http auth happens before soup_request_send_async
     returns makes us need to keep soup_session_pause_message around, which is
     ugly

  2. (various wk bugs); we fail certain redirection tests because libsoup won't
     redirect them automatically (eg, http: to about:, or some redirects involving
     POSTs), and ResourceHandleSoup doesn't do redirects on its own

The patch here pulls ResourceResponseSoup and SoupMessage a little bit further apart... maybe we should do that even more, and get rid of most automatic request restarting inside SoupRequest and instead have ResourceHandleSoup control "restarts" like it does here for mixed-replace "restarts"?

(This is not an objection to the current SoupMultipartInputStream API, just an attempt to brain dump my current thoughts on trying to figure out if we need to change the SoupRequest API any more before calling it stable...)

Comment 37 Gustavo Noronha (kov) 2012-05-28 17:59:41 UTC

(In reply to comment #36)
>   1. bug 675306; the fact that http auth happens before soup_request_send_async
>      returns makes us need to keep soup_session_pause_message around, which is
>      ugly

It seems to me the only way out of this is to add an alternative authentication API that requires an explicit decision about whether the authentication will be handled or not - say, by returning TRUE - similar to the one we have in WebKit for geolocation policy decisions (see geolocation-policy-decision-requested). That way soup itself knows whether the message should be held, and the callback to send_async won't be called before the decision has been set, keeping the nice linear flow we want to have.
 
>   2. (various wk bugs); we fail certain redirection tests because libsoup won't
>      redirect them automatically (eg, http: to about:, or some redirects
> involving
>      POSTs), and ResourceHandleSoup doesn't do redirects on its own
> 
> The patch here pulls ResourceResponseSoup and SoupMessage a little bit further
> apart... maybe we should do that even more, and get rid of most automatic
> request restarting inside SoupRequest and instead have ResourceHandleSoup
> control "restarts" like it does here for mixed-replace "restarts"?

Hmm. That sounds like it would be a good way out. In fact, I think we discussed something like this during the webkit hackfest. I think having things be less automatic by default can be a bit inconvenient for simpler use cases, but I also think convenience can be bolted on later on, whereas serving the needs of a webkit working around automation has proven challenging.

> (This is not an objection to the current SoupMultipartInputStream API, just an
> attempt to brain dump my current thoughts on trying to figure out if we need to
> change the SoupRequest API any more before calling it stable...)

Sure, I'm happy this helps us pound a bit more on the API to make it as bullet proof as possible ;) I'll start updating the patch from your comments, should have it up in a few.

Comment 38 Gustavo Noronha (kov) 2012-05-28 22:06:16 UTC

Created attachment 215153 [details] [review]
patch with comments addressed

(In reply to comment #35)
> Review of attachment 215000 [details] [review]:
> @@ +246,3 @@
>          end -= boundary_length;
>      for (p = buf; p <= end; p++) {
> +        if (!memcmp (p, boundary, boundary_length - boundary_wildcard_bytes))
> {
> 
> not related to this change, but: the vast vast majority of the memcmps will
> fail on the first byte. So if memcmp is showing up as expensive in the
> profiling, it has to be because just making a function call is expensive. In
> which case doing:
> 
>     if (*p == *boundary && !memcmp ...
> 
> should help a lot... (and not just for multiparts; for header parsing too)

I tried this, and tried even removing the memcmp call and doing all of the comparison by hand. The first bit helped a bit, but after that there was no improvement. Even with this and the Content-Length optimization my "mjpeg viewer" still managed to use 50% of a core here on this site: http://82.207.68.254/mjpg/video.mjpg

Most sites are more sensible, though, so I think this is acceptable, we can revisit if it turns out to be a more broad problem.

> OK, idea for simplifying the buffering/boundary parsing stuff: give

Brilliant idea, thanks!

> g_clear_pointer (&priv->current_headers, soup_message_headers_free);
> 
> (might have to up the glib requirement in configure.ac for that...)

> You don't need this if you're using soup_headers_parse() rather than
> soup_headers_parse_response() above. You just need *some* line there (which can
> be the boundary line if that would be convenient, or just a blank line if not).

I left the boundary, since it's very convenient with read_headers handling the boundary.
 
> you can just call soup_multipart_input_stream_next_part() here rather than
> doing the read_headers and g_object_new separately. (and I'd move it to just
> before next_part_async)

I can't remember why I didn't do this in the first place. Probably noobed something related to GIO. 

Thanks for the review!

Comment 39 Gustavo Noronha (kov) 2012-06-06 18:41:25 UTC

I bisected the regression in WebKit, and filed a bug: https://bugs.webkit.org/show_bug.cgi?id=88436

Comment 40 Dan Winship 2012-07-26 13:13:55 UTC

Review of attachment 215153 [details] [review]:

Sorry for the delay. Almost there.

The soup_filter_input_stream_read_until() parts (soup-filter-input-stream.c, soup-filter-input-stream.h, soup-socket.c) are ready to go and can be committed now as a separate commit if you like.

Does the existence of %SOUP_MESSAGE_MULTIPART really simplify the WebKit side much (as compared to just checking the Content-Type in WebKit rather than checking the message flags)?

Actually, I guess all the changes below are pretty small, so you can commit with those changes (with or without SOUP_MESSAGE_MULTIPART depending on what you think about that.)

::: libsoup/soup-message-io.c
@@ +593,3 @@
+			    g_str_has_prefix (content_type, "multipart/") &&
+			    g_hash_table_lookup (params, "boundary")) {
+				/* Flag message as multipart, and fall through to get got_headers emitted. */

i don't think the comment is really needed at this point

@@ +595,3 @@
+				/* Flag message as multipart, and fall through to get got_headers emitted. */
+				priv->msg_flags |= SOUP_MESSAGE_MULTIPART;
+			}

g_clear_pointer (&params, g_hash_table_unref);

::: libsoup/soup-multipart-input-stream.c
@@ +32,3 @@
+ * soup_multipart_input_stream_next_part() before reading. Responses
+ * which are not wrapped will be treated like non-multipart responses.
+ *

"Note that although #SoupMultipartInputStream is a #GInputStream, you should not read directly from it, and the results are undefined if you do."

(I want to leave open the possibility that in the future, soup_request_send() would return a SoupMultipartInputStream directly, which you can then treat either as a multipart stream, or as a plain body stream.)

@@ +53,3 @@
+	gboolean	         done_with_part;
+
+	GByteArray	        *buffered_bytes;

@buffered_bytes is created and destroyed, but never actually used

@@ +88,3 @@
+
+	if (multipart->priv->meta_buf)
+		g_byte_array_free (multipart->priv->meta_buf, TRUE);

g_clear_pointer (&multipart->priv->meta_buf, g_byte_array_unref);

@@ +175,3 @@
+
+	/* Ignore the newline that preceded the boundary. */
+	buf = ((guint8*)buffer) + nread - 2;

nread might be 1

@@ +192,3 @@
+{
+	return soup_multipart_input_stream_read_real (stream, buffer, count,
+						       TRUE, cancellable, error);

second line looks indented one space too far, though that might just be bugzilla

@@ +263,3 @@
+
+	return soup_multipart_input_stream_read_real (G_INPUT_STREAM (multipart),
+						       buffer, count,

maybe wrong indent again

@@ +378,3 @@
+
+			/* Now check for possible multipart termination. */
+			buf = &read_buf[nread - 2];

if nread >= 2 && ...

Comment 41 Gustavo Noronha (kov) 2012-08-20 13:49:56 UTC

Thanks for the review and sorry for my delay - turns out I went on some vacations after GUADEC, getting to know snow and skiing =). I have applied the changes you suggested, will just do a final test with WebKit before I push. I'll split the SoupFilterInputStream changes in a separate commit.

(In reply to comment #40)
> Does the existence of %SOUP_MESSAGE_MULTIPART really simplify the WebKit side
> much (as compared to just checking the Content-Type in WebKit rather than
> checking the message flags)?

It does not, actually. I guess the flag made sense at first, when we were thinking of not having a response that did not include the original multipart header, but right now it doesn't really add much. In WebKit I can even use d->m_response.isMultipart(), I believe, which already has the code to decide based on the content type. I'll remove the flag, I guess we can always readd it if we feel it's useful.

Comment 42 Gustavo Noronha (kov) 2012-08-20 18:03:48 UTC

Comment on attachment 215153 [details] [review]
patch with comments addressed

Pushed as 207b8aa56d272e20d5d3e052753e15b8fb43214f, for the SoupFilterInputStream changes, and d2a1f47dd173cc6642832c33b554c506dc97b7b9 for the SoupMultipartInputStream.