Bug 792124 – Corrupt message error when trying to load some linkedin pages

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 792124 - Corrupt message error when trying to load some linkedin pages


Summary:	Corrupt message error when trying to load some linkedin pages


Status:	RESOLVED FIXED

Product:	libsoup
Classification:	Core
Component:	HTTP Transport
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	libsoup-maint@gnome.bugs
QA Contact:	libsoup-maint@gnome.bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2018-01-02 11:59 UTC by Carlos Garcia Campos
Modified:	2018-01-09 15:07 UTC

See Also:	https://bugs.webkit.org/show_bug.cgi?id=179583
GNOME target:	---
GNOME version:	---

Attachments
soup-headers: accept any 3 digit number as message status code (2.56 KB, patch) 2018-01-03 08:01 UTC, Carlos Garcia Campos	committed	Details \| Review

Description Carlos Garcia Campos 2018-01-02 11:59:53 UTC

It happens with profiles pages, try https://www.linkedin.com/in/foo for example. This is because linkedin is not HTTP compliant and uses invalid HTTP status codes like 999. RFC says that status code are extensible, the client doesn't need to know all possible codes, but it must know the category, and 9 is not a valid category. The thing is that all other browsers I've tried (chrome, ff and safari) allow 999 as HTTP status code. WebKit works just fine if we simply remove the check from libsoup. What linkedin does is replying with 999 Request denied but provides several HTTP headers and a response body:

date: Tue, 02 Jan 2018 10:53:14 GMT
x-li-pop: prod-tln1
x-li-proto: http/2
x-li-uuid: J3+Cp8n4BRVAM1atiysAAA==
set-cookie: trkCode=gf; Max-Age=5
trkInfo=AQEJwHSeldJuQAAAAWC2gP14p1rRFTVITbEz_d43U700TyP7BH97B1OXIgMVX6AmftRXHs7upsMPC5Rd-YJ0lFBSD_GR_KFiMTXmlai-BhWDfy4ccETeSFTagvLv0Ni0zYQQEbY=; Max-Age=5
rtc=AQF78MbC7rz4wAAAAWC2gP14p9OMkZX7bBp2YDYUVJfKcLjXaQ7LcgWVoOmHb5nGYy_WiDJ19_MA4fGBCrVTMvCicxlUTY-Lyw8XlZsD3M_oB-8dk-aBZX--PmthjUhvE8jgquFDDlrxKUC84hKlEUT3288EdersG6Gocb4r7kOjwlpGmrfhfHh7ZnK7e-hSRVTCK6qglxXxwJAV2FbVEj0xW0dQbYRrSV9yu7wpE6YH-jCDci2q1vJUGWMsDA==; Max-Age=120; path=/; domain=.linkedin.com
content-length: 1461
content-type: text/html
X-Firefox-Spdy: h2

<html><head>
<script type="text/javascript">
window.onload = function() {
  // Parse the tracking code from cookies.
  var trk = "bf";
  var trkInfo = "bf";
  var cookies = document.cookie.split("; ");
  for (var i = 0; i < cookies.length; ++i) {
    if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
      trk = cookies[i].substring(8);
    }
    else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
      trkInfo = cookies[i].substring(8);
    }
  }

  if (window.location.protocol == "http:") {
    // If "sl" cookie is set, redirect to https.
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
        return;
      }
    }
  }

  // Get the new domain. For international domains such as
  // fr.linkedin.com, we convert it to www.linkedin.com
  var domain = "www.linkedin.com";
  if (domain != location.host) {
    var subdomainIndex = location.host.indexOf(".linkedin");
    if (subdomainIndex != -1) {
      domain = "www" + location.host.substring(subdomainIndex);
    }
  }

  window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +
      "&originalReferer=" + document.referrer.substr(0, 200) +
      "&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
</head></html>

I read that linkedin could be blacklisting user agents or even IP address, but I've checked that both safari and firefox are indeed getting the 999 and simply handling it. I've checked chromium code and the HTTP header parser doesn't check the parsed status code either. I hate to say this, but we should probably ignore the parsed code (as long as it's a 3 digit number, of course).

What do you guys think?

Comment 1 Carlos Garcia Campos 2018-01-02 14:11:30 UTC

It seems XMLHTTPRequest allows any status code in the response:

http://w3c-test.org/XMLHttpRequest/status-error.htm

Comment 2 Michael Catanzaro 2018-01-02 15:46:56 UTC

IMO we should do exactly what you suggest.

Comment 3 Carlos Garcia Campos 2018-01-02 16:18:57 UTC

Ok, I'll submit a patch tomorrow.

Comment 4 Carlos Garcia Campos 2018-01-03 08:01:33 UTC

Created attachment 366238 [details] [review]
soup-headers: accept any 3 digit number as message status code

Comment 5 Claudio Saavedra 2018-01-03 09:28:47 UTC

Review of attachment 366238 [details] [review]:

This looks good to me. Thanks for also adding the test case! Dan, do you mind giving it a final OK?

Comment 6 Dan Winship 2018-01-04 18:00:03 UTC

Well.. that's appalling.

I suppose if you squint, it's possible to read the spec text as merely saying that if you don't know what "999" means, you have to treat it the same as "900", without actually saying that you have to know how to handle "900"...

I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in soup-status.h to include everything >= 500. There's a vague unstated assumption that all statuses are covered by one of those macros... OTOH, nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it probably doesn't matter either way.

Comment 7 Claudio Saavedra 2018-01-09 10:36:26 UTC

(In reply to Dan Winship from comment #6)

> I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in
> soup-status.h to include everything >= 500. There's a vague unstated
> assumption that all statuses are covered by one of those macros... OTOH,
> nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it
> probably doesn't matter either way.


Strictly speaking > 600 is not really a server error, so I'm not sure it would be good to use the same macro for spec-defined server errors and... this. We could add another macro for >= 600 so that they're still covered, even if in the end they are invalid. What do you think?

Comment 8 Carlos Garcia Campos 2018-01-09 10:44:39 UTC

(In reply to Claudio Saavedra from comment #7)
> (In reply to Dan Winship from comment #6)
> 
> > I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in
> > soup-status.h to include everything >= 500. There's a vague unstated
> > assumption that all statuses are covered by one of those macros... OTOH,
> > nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it
> > probably doesn't matter either way.
> 
> 
> Strictly speaking > 600 is not really a server error, so I'm not sure it
> would be good to use the same macro for spec-defined server errors and...
> this. We could add another macro for >= 600 so that they're still covered,
> even if in the end they are invalid. What do you think?

Error codes are extensible, it's fine that we don't cover all possible errors with macros, we only have macros for the ones defined in the spec.

Comment 9 Claudio Saavedra 2018-01-09 10:50:33 UTC

Review of attachment 366238 [details] [review]:

I think it would be good to land this in the stable branch at least as is so that linkedin can work again. As for master, I'd say let's land it and add whatever is needed later.

Comment 10 Carlos Garcia Campos 2018-01-09 10:52:44 UTC

Comment on attachment 366238 [details] [review]
soup-headers: accept any 3 digit number as message status code

Pushed to both branches

Comment 11 Dan Winship 2018-01-09 15:07:00 UTC

(In reply to Claudio Saavedra from comment #7)
> Strictly speaking > 600 is not really a server error

It's not a "5xx Server Error", but it's a "server error"; HTTP/1.1 doesn't define error codes greater than 599, so if the server returns one in an HTTP/1.1 response, then the server has erred, right? :)

Anyway, as I said, I wasn't sure if it would be a good idea or not, and it sounds like you think it's not a good idea, so the overall vote looks like "no".