GNOME Bugzilla – Bug 792124
Corrupt message error when trying to load some linkedin pages
Last modified: 2018-01-09 15:07:00 UTC
It happens with profiles pages, try https://www.linkedin.com/in/foo for example. This is because linkedin is not HTTP compliant and uses invalid HTTP status codes like 999. RFC says that status code are extensible, the client doesn't need to know all possible codes, but it must know the category, and 9 is not a valid category. The thing is that all other browsers I've tried (chrome, ff and safari) allow 999 as HTTP status code. WebKit works just fine if we simply remove the check from libsoup. What linkedin does is replying with 999 Request denied but provides several HTTP headers and a response body: date: Tue, 02 Jan 2018 10:53:14 GMT x-li-pop: prod-tln1 x-li-proto: http/2 x-li-uuid: J3+Cp8n4BRVAM1atiysAAA== set-cookie: trkCode=gf; Max-Age=5 trkInfo=AQEJwHSeldJuQAAAAWC2gP14p1rRFTVITbEz_d43U700TyP7BH97B1OXIgMVX6AmftRXHs7upsMPC5Rd-YJ0lFBSD_GR_KFiMTXmlai-BhWDfy4ccETeSFTagvLv0Ni0zYQQEbY=; Max-Age=5 rtc=AQF78MbC7rz4wAAAAWC2gP14p9OMkZX7bBp2YDYUVJfKcLjXaQ7LcgWVoOmHb5nGYy_WiDJ19_MA4fGBCrVTMvCicxlUTY-Lyw8XlZsD3M_oB-8dk-aBZX--PmthjUhvE8jgquFDDlrxKUC84hKlEUT3288EdersG6Gocb4r7kOjwlpGmrfhfHh7ZnK7e-hSRVTCK6qglxXxwJAV2FbVEj0xW0dQbYRrSV9yu7wpE6YH-jCDci2q1vJUGWMsDA==; Max-Age=120; path=/; domain=.linkedin.com content-length: 1461 content-type: text/html X-Firefox-Spdy: h2 <html><head> <script type="text/javascript"> window.onload = function() { // Parse the tracking code from cookies. var trk = "bf"; var trkInfo = "bf"; var cookies = document.cookie.split("; "); for (var i = 0; i < cookies.length; ++i) { if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) { trk = cookies[i].substring(8); } else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) { trkInfo = cookies[i].substring(8); } } if (window.location.protocol == "http:") { // If "sl" cookie is set, redirect to https. for (var i = 0; i < cookies.length; ++i) { if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) { window.location.href = "https:" + window.location.href.substring(window.location.protocol.length); return; } } } // Get the new domain. For international domains such as // fr.linkedin.com, we convert it to www.linkedin.com var domain = "www.linkedin.com"; if (domain != location.host) { var subdomainIndex = location.host.indexOf(".linkedin"); if (subdomainIndex != -1) { domain = "www" + location.host.substring(subdomainIndex); } } window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo + "&originalReferer=" + document.referrer.substr(0, 200) + "&sessionRedirect=" + encodeURIComponent(window.location.href); } </script> </head></html> I read that linkedin could be blacklisting user agents or even IP address, but I've checked that both safari and firefox are indeed getting the 999 and simply handling it. I've checked chromium code and the HTTP header parser doesn't check the parsed status code either. I hate to say this, but we should probably ignore the parsed code (as long as it's a 3 digit number, of course). What do you guys think?
It seems XMLHTTPRequest allows any status code in the response: http://w3c-test.org/XMLHttpRequest/status-error.htm
IMO we should do exactly what you suggest.
Ok, I'll submit a patch tomorrow.
Created attachment 366238 [details] [review] soup-headers: accept any 3 digit number as message status code
Review of attachment 366238 [details] [review]: This looks good to me. Thanks for also adding the test case! Dan, do you mind giving it a final OK?
Well.. that's appalling. I suppose if you squint, it's possible to read the spec text as merely saying that if you don't know what "999" means, you have to treat it the same as "900", without actually saying that you have to know how to handle "900"... I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in soup-status.h to include everything >= 500. There's a vague unstated assumption that all statuses are covered by one of those macros... OTOH, nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it probably doesn't matter either way.
(In reply to Dan Winship from comment #6) > I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in > soup-status.h to include everything >= 500. There's a vague unstated > assumption that all statuses are covered by one of those macros... OTOH, > nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it > probably doesn't matter either way. Strictly speaking > 600 is not really a server error, so I'm not sure it would be good to use the same macro for spec-defined server errors and... this. We could add another macro for >= 600 so that they're still covered, even if in the end they are invalid. What do you think?
(In reply to Claudio Saavedra from comment #7) > (In reply to Dan Winship from comment #6) > > > I can't decide if you should update SOUP_STATUS_IS_SERVER_ERROR() in > > soup-status.h to include everything >= 500. There's a vague unstated > > assumption that all statuses are covered by one of those macros... OTOH, > > nothing in libsoup even uses SOUP_STATUS_IS_SERVER_ERROR() anyway so it > > probably doesn't matter either way. > > > Strictly speaking > 600 is not really a server error, so I'm not sure it > would be good to use the same macro for spec-defined server errors and... > this. We could add another macro for >= 600 so that they're still covered, > even if in the end they are invalid. What do you think? Error codes are extensible, it's fine that we don't cover all possible errors with macros, we only have macros for the ones defined in the spec.
Review of attachment 366238 [details] [review]: I think it would be good to land this in the stable branch at least as is so that linkedin can work again. As for master, I'd say let's land it and add whatever is needed later.
Comment on attachment 366238 [details] [review] soup-headers: accept any 3 digit number as message status code Pushed to both branches
(In reply to Claudio Saavedra from comment #7) > Strictly speaking > 600 is not really a server error It's not a "5xx Server Error", but it's a "server error"; HTTP/1.1 doesn't define error codes greater than 599, so if the server returns one in an HTTP/1.1 response, then the server has erred, right? :) Anyway, as I said, I wasn't sure if it would be a good idea or not, and it sounds like you think it's not a good idea, so the overall vote looks like "no".