After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 792124 - Corrupt message error when trying to load some linkedin pages
Corrupt message error when trying to load some linkedin pages
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: HTTP Transport
unspecified
Other Linux
: Normal normal
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2018-01-02 11:59 UTC by Carlos Garcia Campos
Modified: 2018-01-09 15:07 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
soup-headers: accept any 3 digit number as message status code (2.56 KB, patch)
2018-01-03 08:01 UTC, Carlos Garcia Campos
committed Details | Review

Description Carlos Garcia Campos 2018-01-02 11:59:53 UTC
It happens with profiles pages, try https://www.linkedin.com/in/foo for example. This is because linkedin is not HTTP compliant and uses invalid HTTP status codes like 999. RFC says that status code are extensible, the client doesn't need to know all possible codes, but it must know the category, and 9 is not a valid category. The thing is that all other browsers I've tried (chrome, ff and safari) allow 999 as HTTP status code. WebKit works just fine if we simply remove the check from libsoup. What linkedin does is replying with 999 Request denied but provides several HTTP headers and a response body:

date: Tue, 02 Jan 2018 10:53:14 GMT
x-li-pop: prod-tln1
x-li-proto: http/2
x-li-uuid: J3+Cp8n4BRVAM1atiysAAA==
set-cookie: trkCode=gf; Max-Age=5
trkInfo=AQEJwHSeldJuQAAAAWC2gP14p1rRFTVITbEz_d43U700TyP7BH97B1OXIgMVX6AmftRXHs7upsMPC5Rd-YJ0lFBSD_GR_KFiMTXmlai-BhWDfy4ccETeSFTagvLv0Ni0zYQQEbY=; Max-Age=5
rtc=AQF78MbC7rz4wAAAAWC2gP14p9OMkZX7bBp2YDYUVJfKcLjXaQ7LcgWVoOmHb5nGYy_WiDJ19_MA4fGBCrVTMvCicxlUTY-Lyw8XlZsD3M_oB-8dk-aBZX--PmthjUhvE8jgquFDDlrxKUC84hKlEUT3288EdersG6Gocb4r7kOjwlpGmrfhfHh7ZnK7e-hSRVTCK6qglxXxwJAV2FbVEj0xW0dQbYRrSV9yu7wpE6YH-jCDci2q1vJUGWMsDA==; Max-Age=120; path=/; domain=.linkedin.com
content-length: 1461
content-type: text/html
X-Firefox-Spdy: h2

<html><head>
<script type="text/javascript">
window.onload = function() {
  // Parse the tracking code from cookies.
  var trk = "bf";
  var trkInfo = "bf";
  var cookies = document.cookie.split("; ");
  for (var i = 0; i < cookies.length; ++i) {
    if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
      trk = cookies[i].substring(8);
    }
    else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
      trkInfo = cookies[i].substring(8);
    }
  }

  if (window.location.protocol == "http:") {
    // If "sl" cookie is set, redirect to https.
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
        return;
      }
    }
  }

  // Get the new domain. For international domains such as
  // fr.linkedin.com, we convert it to www.linkedin.com
  var domain = "www.linkedin.com";
  if (domain != location.host) {
    var subdomainIndex = location.host.indexOf(".linkedin");
    if (subdomainIndex != -1) {
      domain = "www" + location.host.substring(subdomainIndex);
    }
  }

  window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +
      "&originalReferer=" + document.referrer.substr(0, 200) +
      "&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
</head></html>

I read that linkedin could be blacklisting user agents or even IP address, but I've checked that both safari and firefox are indeed getting the 999 and simply handling it. I've checked chromium code and the HTTP header parser doesn't check the parsed status code either. I hate to say this, but we should probably ignore the parsed code (as long as it's a 3 digit number, of course).

What do you guys think?
Comment 1 Carlos Garcia Campos 2018-01-02 14:11:30 UTC
It seems XMLHTTPRequest allows any status code in the response:

http://w3c-test.org/XMLHttpRequest/status-error.htm
Comment 2 Michael Catanzaro 2018-01-02 15:46:56 UTC
IMO we should do exactly what you suggest.
Comment 3 Carlos Garcia Campos 2018-01-02 16:18:57 UTC
Ok, I'll submit a patch tomorrow.
Comment 4 Carlos Garcia Campos 2018-01-03 08:01:33 UTC
Created attachment 366238 [details] [review]
soup-headers: accept any 3 digit number as message status code
Comment 5 Claudio Saavedra 2018-01-03 09:28:47 UTC
Review of attachment 366238 [details] [review]:

This looks good to me. Thanks for also adding the test case! Dan, do you mind giving it a final OK?
Comment 6 Dan Winship 2018-01-04 18:00:03 UTC
Well.. that's appalling.

I suppose if you squint, it's possible to read the spec text as merely saying that if you don't know what "999" means, you have to treat it the same as "900", without actually saying that you have to know how to handle "900"...

I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in soup-status.h to include everything >= 500. There's a vague unstated assumption that all statuses are covered by one of those macros... OTOH, nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it probably doesn't matter either way.
Comment 7 Claudio Saavedra 2018-01-09 10:36:26 UTC
(In reply to Dan Winship from comment #6)

> I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in
> soup-status.h to include everything >= 500. There's a vague unstated
> assumption that all statuses are covered by one of those macros... OTOH,
> nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it
> probably doesn't matter either way.


Strictly speaking > 600 is not really a server error, so I'm not sure it would be good to use the same macro for spec-defined server errors and... this. We could add another macro for >= 600 so that they're still covered, even if in the end they are invalid. What do you think?
Comment 8 Carlos Garcia Campos 2018-01-09 10:44:39 UTC
(In reply to Claudio Saavedra from comment #7)
> (In reply to Dan Winship from comment #6)
> 
> > I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in
> > soup-status.h to include everything >= 500. There's a vague unstated
> > assumption that all statuses are covered by one of those macros... OTOH,
> > nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it
> > probably doesn't matter either way.
> 
> 
> Strictly speaking > 600 is not really a server error, so I'm not sure it
> would be good to use the same macro for spec-defined server errors and...
> this. We could add another macro for >= 600 so that they're still covered,
> even if in the end they are invalid. What do you think?

Error codes are extensible, it's fine that we don't cover all possible errors with macros, we only have macros for the ones defined in the spec.
Comment 9 Claudio Saavedra 2018-01-09 10:50:33 UTC
Review of attachment 366238 [details] [review]:

I think it would be good to land this in the stable branch at least as is so that linkedin can work again. As for master, I'd say let's land it and add whatever is needed later.
Comment 10 Carlos Garcia Campos 2018-01-09 10:52:44 UTC
Comment on attachment 366238 [details] [review]
soup-headers: accept any 3 digit number as message status code

Pushed to both branches
Comment 11 Dan Winship 2018-01-09 15:07:00 UTC
(In reply to Claudio Saavedra from comment #7)
> Strictly speaking > 600 is not really a server error

It's not a "5xx Server Error", but it's a "server error"; HTTP/1.1 doesn't define error codes greater than 599, so if the server returns one in an HTTP/1.1 response, then the server has erred, right? :)

Anyway, as I said, I wasn't sure if it would be a good idea or not, and it sounds like you think it's not a good idea, so the overall vote looks like "no".