GNOME Bugzilla – Bug 640414
Try to force keep-alive for HTTP/1.0 connections
Last modified: 2011-02-16 10:08:50 UTC
Trying to keep connections alive is a good idea if we plan to issue multiple consecutive requests to the same host in a row, as new connections would not have to be created, we can just reuse the old ones. This is the default behaviour for HTTP/1.1, but that is not the case for servers using HTTP/1.0. You might think that nobody is using that but huge websites as Wikipedia actually do.
Created attachment 179164 [details] [review] Patch This patch adds a new field to SoupSessionHost that tracks the HTTP version used by the remote host. If HTTP/1.0 is used then we add the "Connection: keep-alive" header to the following requests to that host.
Just some numbers. I tracked the amount of new connections performing the following test: 1) Went to http://www.wikipedia.org 2) Clicked on Portuguese version 3) Clicked on the featured article of the day 4) Clicked on the first link of the article This threw the following numbers: * current version: 152 new connections * patched version: 33 new connections
Comment on attachment 179164 [details] [review] Patch >+ soup_message_headers_append (item->msg->request_headers, >+ "Connection", "keep-alive"); The standard "spelling" is "Keep-Alive" with capital letters, and there are probably some stupid servers out there that will only recognize it that way. Additionally, you should make sure it's not already set: conn_header = soup_message_headers_get_list (item->msg->request_headers); if (!conn_header || (!soup_header_contains (conn_header, "Keep-Alive") && !soup_header_contains (conn_header, "close"))) soup_message_headers_append (...) Beyond that... I'm not sure it's really worth doing the whole keeping-track-of-1.1-vs-1.0 thing: the "Connection: Keep-Alive" is harmless (other than being a few additional bytes) when sent to an HTTP/1.1 server... So we could just add the header unconditionally. If we are keeping the 1.1-vs-1.0 split, then: >+ /* The session will listen to changes in http version if >+ * needed when the message is started. >+ */ >+ g_signal_handlers_disconnect_by_func (msg, http_version_changed, session); redirect_handler is definitely the wrong place for this; you should connect to the signal from queue_message() rather than from soup_session_send_queue_item(), and you should use the SoupMessageQueueItem rather than the SoupSession as the user_data. And then you don't have to worry about removing it because soup_session_unqueue_item() will do it for you. And then you should extend do_msg_reuse_test() in misc-test.c to verify that the handler does get disconnected in this case. (Although I think that currently only tests SoupMessage signals and won't notice if a GObject::notify signal handler is leaked...) >+ g_object_get (msg, SOUP_MESSAGE_HTTP_VERSION, &version, NULL); Use the direct accessor (soup_message_get_http_version()). It's faster. >+ /* Do never upgrade the HTTP version for a host. Note that all >+ * the response fields (including the http version) are reset >+ * just before sending a message. >+ */ Yeah... so really, what you want is to connect to got-headers, not notify::http-version. (Also, grammar note: Lose the word "Do" in that comment.)
(In reply to comment #3) > (From update of attachment 179164 [details] [review]) > >+ soup_message_headers_append (item->msg->request_headers, > >+ "Connection", "keep-alive"); > > The standard "spelling" is "Keep-Alive" with capital letters, and there are > probably some stupid servers out there that will only recognize it that way. > > Additionally, you should make sure it's not already set: > > conn_header = soup_message_headers_get_list > (item->msg->request_headers); > if (!conn_header || (!soup_header_contains (conn_header, "Keep-Alive") > && > !soup_header_contains (conn_header, "close"))) > soup_message_headers_append (...) Good point > Beyond that... I'm not sure it's really worth doing the whole > keeping-track-of-1.1-vs-1.0 thing: the "Connection: Keep-Alive" is harmless > (other than being a few additional bytes) when sent to an HTTP/1.1 server... So > we could just add the header unconditionally. Me neither. That's the first thing I thought, for HTTP/1.1 we would be just sending a couple more bytes than needed. Actually firefox always send that header unconditionally. Chrome only sends it for HTTP/1.0 > If we are keeping the 1.1-vs-1.0 split, then: > > >+ /* The session will listen to changes in http version if > >+ * needed when the message is started. > >+ */ > >+ g_signal_handlers_disconnect_by_func (msg, http_version_changed, session); > > redirect_handler is definitely the wrong place for this; you should connect to > the signal from queue_message() rather than from > soup_session_send_queue_item(), and you should use the SoupMessageQueueItem > rather than the SoupSession as the user_data. And then you don't have to worry > about removing it because soup_session_unqueue_item() will do it for you. The disconnection in redirect handler was just used to avoid double connection, but indeed it'd be better to do it how you say. > And then you should extend do_msg_reuse_test() in misc-test.c to verify that > the handler does get disconnected in this case. (Although I think that > currently only tests SoupMessage signals and won't notice if a GObject::notify > signal handler is leaked...) I'll check > >+ g_object_get (msg, SOUP_MESSAGE_HTTP_VERSION, &version, NULL); > > Use the direct accessor (soup_message_get_http_version()). It's faster. > > >+ /* Do never upgrade the HTTP version for a host. Note that all > >+ * the response fields (including the http version) are reset > >+ * just before sending a message. > >+ */ > > Yeah... so really, what you want is to connect to got-headers, not > notify::http-version. (Also, grammar note: Lose the word "Do" in that comment.) Yeah that was also my first idea but I changed my mind afterwards, don't recall the reason tough :)
Created attachment 179274 [details] [review] Patch v2 Second version of the patch with the requested changes. The logic to keep track of HTTP/1.0 hosts is really straightforward and IMHO will not cause a noticeable impact on performance. So we could keep it instead of unconditionally adding a new header, which is not that bad by itself, but will save us some network traffic. Changing the test is not needed now as we connect to "got-headers"
Comment on attachment 179274 [details] [review] Patch v2 > The logic to keep track of HTTP/1.0 hosts is really straightforward and IMHO > will not cause a noticeable impact on performance. OK, I always worry about adding more cases that have to lock mutexes, but it's probably not a big deal.
Committed 382ae1da1a23631e5c1274129077be4fcfcf84c4