GNOME Bugzilla – Bug 777714
Adblocker prevents uploading pictures to Twitter and breaks Twitter autocompletion
Last modified: 2017-02-07 07:39:11 UTC
Every time I try to upload a picture to twitter I get an error (Media upload failed, try again later). This is with Epiphany 3.22.4 and WebKitGTK+ 2.14.3, but it's been happening for a while. I suspect the problem is related to the User Agent, because I can do it with the MiniBrowser shipped with WebKitGTK+.
Anything in the web inspector? Does it work if you disable the adblocker? (In reply to Alberto Garcia from comment #0) > I suspect the problem is related to the User Agent, because I can do it with > the MiniBrowser shipped with WebKitGTK+. Well maybe, but it seems kinda unlikely; more likely some other Epiphany feature is to blame. If it was a user agent problem then I would expect MiniBrowser would not work either. Anyway, this is really easy to confirm or disprove. Visit https://techblog.willshouse.com/2012/01/03/most-common-user-agents/ in MiniBrowser to check your user agent there. Mine says: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1 And if I visit in Epiphany, it says: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1 Epiphany/3.23.4 So you see the only difference is the bit at the end. You can run MiniBrowser with Epiphany's user agent to see if that breaks it: MiniBrowser --user-agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1 Epiphany/3.23.4' You can also try running Epiphany with MiniBrowser's user agent: gsettings set org.gnome.Epiphany user-agent 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1' If it really turns out to be a user agent problem, then Twitter has lost the right to receive an accurate user agent from WebKit and I'll have it on the quirks list tomorrow, but I bet something else is wrong in Epiphany.
(In reply to Michael Catanzaro from comment #1) > Anything in the web inspector? Does it work if you disable the > adblocker? Actually it does work if I disable the adblocker, so it seems it was that!
(In reply to Alberto Garcia from comment #0) > Every time I try to upload a picture to twitter I get an error (Media upload > failed, try again later). > > This is with Epiphany 3.22.4 and WebKitGTK+ 2.14.3, but it's been happening > for a while. > > I suspect the problem is related to the User Agent, because I can do it with > the MiniBrowser shipped with WebKitGTK+. Strange, I logged into your test account but can't reproduce it. When I click the "add photos or video" button and select something with the file chooser, it seems to upload with no problems; I see a thumbnail of my image and if I click on it it appears in a lightbox. Epiphany 3.22.5 with WebKitGTK+ 2.14.3. Does it break for you before that point? Just to be sure we're using similar adblock filters, please open ~/.config/epiphany/adblock and confirm the last modified date is recent.
*** Bug 757983 has been marked as a duplicate of this bug. ***
(In reply to Michael Catanzaro from comment #3) > When I click the "add photos or video" button and select something > with the file chooser, it seems to upload with no problems; I see a > thumbnail of my image and if I click on it it appears in a lightbox. Yes, the preview works. It fails when you actually try to publish the tweet with that image. > please open ~/.config/epiphany/adblock and confirm the last modified > date is recent. 9th of January 2017.
So here are the different requests I see blocked by our adblocker: Request 'https://analytics.twitter.com/tpm/p?_=1485990140542' blocked (page: 'https://twitter.com/testaccount50/with_replies') Probably good to block that one. Request 'https://twitter.com/i/search/typeahead.json?count=1200&media_tagging_in_prefetch=true&prefetch=true&result_type=users&users_cache_age=0' blocked (page: 'https://twitter.com/testaccount50/with_replies') That one is probably what's breaking autocompletion. Request 'https://syndication.twitter.com/i/jot/syndication?l=%7B%22_category_%22%3A%22syndicated_impression%22%2C%22event_namespace%22%3A%7B%22client%22%3A%22web%22%2C%22page%22%3A%22me%22%2C%22action%22%3A%22impression%22%7D%2C%22triggered_on%22%3A1485990149363%7D' blocked (page: 'https://twitter.com/testaccount50/with_replies') Not sure. Request 'https://www.google-analytics.com/analytics.js' blocked (page: 'https://twitter.com/testaccount50/with_replies') Good. ** (WebKitWebProcess:18252): WARNING **: Request 'https://twitter.com/i/jot' blocked (page: 'https://twitter.com/testaccount50/with_replies') Not sure. Request 'https://upload.twitter.com/i/media/upload.json?command=INIT&total_bytes=56389&media_type=image%2Fjpeg&media_category=tweet_image' blocked (page: 'https://twitter.com/testaccount50/with_replies') Probably what's breaking uploads.
typeahead.json and upload.json are both being blocked by this filter pattern: .ad.json? which is exceedingly generic and obviously a bug.
Epiphany is escaping .ad.json? to .ad.json\?. I asked Christian about it. He observed that it needs to be escaped to \.ad\.json\? or it matches too much.
So there is existing code to escape periods, commented out. So can anybody think of a good reason to not escape these? I'm almost positive this is going to break something, because if not it would not have been commented out. But I have no clue what it will break, and this does fix Twitter. We need content extensions. :(
Created attachment 344753 [details] [review] uri-tester: escape parens Midori has been doing this since 2011, so probably we should too.
Created attachment 344754 [details] [review] uri-tester: Escape periods We are passing this into a regex, so need to escape it for it to match a period character rather than any old character. This fixes the adblocker being too aggressive and blocking important scripts on twitter.com. Note: I presume this was commented out because it broke something, but I've traveled far back into Midori's git history and have no clue what. bzr's blame feature is sufficiently broken that I can't tie it down to a particular commit. Well, we know the current code is wrong, and this fixes Twitter, so let's try it and see what happens.
(In reply to Michael Catanzaro from comment #10) > Created attachment 344753 [details] [review] [review] > uri-tester: escape parens > > Midori has been doing this since 2011, so probably we should too. Er 2013, got to remember to fix that when committing.
Both patches look reasonable to me
Comment on attachment 344753 [details] [review] uri-tester: escape parens I don't agree with the reason, though, we should do this because it's correct not because Midori does it.
Comment on attachment 344754 [details] [review] uri-tester: Escape periods Yes, let's try. We know for sure what this fixes, so let's not assume it breaks something else :-)
Review of attachment 344753 [details] [review]: ::: embed/web-extension/ephy-uri-tester.c @@ +244,1 @@ g_string_append_printf (str, "\\%c", *src); I wonder whether it would be safer to use g_regex_escape_string() first, and then transform the already-escaped string. For example, instead of transforming: foo*bar -> foo.*bar we would first escape the string with g_regex_escape_string(), then transform: foo\*bar -> foo.*bar Unless I am missing something, this would work because there is no way of escaping a “*” in the AdBlock filter syntax (according to https://adblockplus.org/filters), so we wouldn't run in double-escaping issues. This has the added benefit that the Epiphany code would contain less knowledge about how GRegex works, which would make it more stable and very likely cover a few more corner cases related to escaping that might not be being handled right now.
Review of attachment 344754 [details] [review]: ::: embed/web-extension/ephy-uri-tester.c @@ -262,3 @@ - if (str->str && str->str[len - 1] == '*' && str->str[len - 2] == '.') - g_string_erase (str, len - 2, 2); - Why is this being removed? It is true that “.*” at the end of the regular expression is kind of useless. If we prefer to keep it, then it would be better to transform: foo* into: foo.*$ This anchors the regexp match to the end of the string explicitly, which can be more easily optimized by regular expression engines.
(In reply to Adrian Perez from comment #17) > Review of attachment 344754 [details] [review] [review]: > > ::: embed/web-extension/ephy-uri-tester.c > @@ -262,3 @@ > - if (str->str && str->str[len - 1] == '*' && str->str[len - 2] == '.') > - g_string_erase (str, len - 2, 2); > - > > Why is this being removed? Because I think it's legit to have a . at the end of the rule now, matching an actual . and not any character. .* in a regex would just be stupid as the . there is redundant, and that's what this was removing before. But \.* is meaningful. > It is true that “.*” at the end of the regular > expression is kind of useless. If we prefer to keep it, then it would be > better to transform: > > foo* > > into: > > foo.*$ > > This anchors the regexp match to the end of the string explicitly, which can > be more easily optimized by regular expression engines. The filters are supposed to match any substring of the rule. I'm really not sure why the original code removed the trailing * (In reply to Adrian Perez from comment #16) > I wonder whether it would be safer to use g_regex_escape_string() first, and > then transform the already-escaped string. For example, instead of > transforming: > > foo*bar -> foo.*bar > > we would first escape the string with g_regex_escape_string(), then > transform: > > foo\*bar -> foo.*bar I think I don't want to try that in this patch. It might be good to do indeed, but I'm worried about corner cases since I don't really understand the code. Also, I have no clue how the adblocker handles differences between normal pattern filters and actual regex filters, and don't want to break those.
(In reply to Michael Catanzaro from comment #18) > I think I don't want to try that in this patch. It might be good to do > indeed, but I'm worried about corner cases since I don't really understand > the code. Also, I have no clue how the adblocker handles differences between > normal pattern filters and actual regex filters, and don't want to break > those. I mean, are there really *NO* special characters in adblock filters besides * that have the same meaning as in a regexp? We'd have to be sure we're un-escaping all the right characters if we do this. I think it needs an expert on adblock filters (hi ;)
g_regex_escape_string escapes the following characters: '\0', '\\', '|', '(', ')', '[', ']', '{', '}', '^', '$', '*', '+', '?', '.' Our current code escapes the following characters: '?', '[', ']' With my patches, it additionally escapes: '.', '(', ')' Our code removes (completely ignores) the following characters: '|', '^', '+' Hmm. (In reply to Michael Catanzaro from comment #18) > Because I think it's legit to have a . at the end of the rule now, matching > an actual . and not any character. .* in a regex would just be stupid as the > . there is redundant, and that's what this was removing before. But \.* is > meaningful. But I missed this rule: case '*': g_string_append (str, ".*"); break; That has to be wrong; this is not ready to commit until we understand what's going on here. Is it supposed to be "\.*" perhaps? That would at least make more sense.
Review of attachment 344754 [details] [review]: ::: embed/web-extension/ephy-uri-tester.c @@ +235,3 @@ break; + case '.': + g_string_append (str, "\\."); I should fall through to the next cases instead of doing anything here. The fallthrough is better and does the right thing.
(In reply to Michael Catanzaro from comment #18) > (In reply to Adrian Perez from comment #17) > > Review of attachment 344754 [details] [review] [review] [review]: > > > > ::: embed/web-extension/ephy-uri-tester.c > > @@ -262,3 @@ > > - if (str->str && str->str[len - 1] == '*' && str->str[len - 2] == '.') > > - g_string_erase (str, len - 2, 2); > > - > > > > Why is this being removed? > > Because I think it's legit to have a . at the end of the rule now, matching > an actual . and not any character. .* in a regex would just be stupid as the > . there is redundant, and that's what this was removing before. But \.* is > meaningful. I have re-read the page I linked earlier and you are right: having a period at the end of and AdBlock rule is valid, and of course “.*” also is valid. They translate as: AdBlock → GRegex ------- ------- foo. foo\. foo.* foo\..* (also works: foo\..*$) Both of the above match the same set of patterns. > > It is true that “.*” at the end of the regular > > expression is kind of useless. If we prefer to keep it, then it would be > > better to transform: > > > > foo* > > > > into: > > > > foo.*$ > > > > This anchors the regexp match to the end of the string explicitly, which can > > be more easily optimized by regular expression engines. > > The filters are supposed to match any substring of the rule. I'm really not > sure why the original code removed the trailing * AdBlock rules use the pipe character “|” for anchoring the match: AdBlock → GRegex ------- ------- |foo ^foo bar| bar$ The documentation at https://adblockplus.org/filters list as an example a rule to block Flash movie URLs which end exactly in “.swf” with: swf| That won't prevent loading other URLs which contain the string “swf” in the middle. > (In reply to Adrian Perez from comment #16) > > I wonder whether it would be safer to use g_regex_escape_string() first, and > > then transform the already-escaped string. For example, instead of > > transforming: > > > > foo*bar -> foo.*bar > > > > we would first escape the string with g_regex_escape_string(), then > > transform: > > > > foo\*bar -> foo.*bar > > I think I don't want to try that in this patch. It might be good to do > indeed, but I'm worried about corner cases since I don't really understand > the code. Also, I have no clue how the adblocker handles differences between > normal pattern filters and actual regex filters, and don't want to break > those. Yep, I understand your concerns about breaking things. Regarding the regexp vs. non-regexp rules, AdBlock parses the rule as a regexp when the rule is contained between slashes “/”, as in: .swf| → non-regexp rule /\.swf$/ → regexp rule It's explained here: https://adblockplus.org/filters#regexps The syntax used for regexps is that of JavaScript regular expressions, and unfortunately that's not exactly the same syntax as GRegex, which is compatible with PCRE/Perl. If there were some function in the JavaScriptCore API to easily construct JS-compatible regexps and check for matches, that would be the perfect solution... Unfortunately there isn't and we would need to go through many hoops to reuse JSC's regexp engine without modifying its API :-\
Note: we use GRegex's JS-compatible mode.
(In reply to Michael Catanzaro from comment #19) > (In reply to Michael Catanzaro from comment #18) > > I think I don't want to try that in this patch. It might be good to do > > indeed, but I'm worried about corner cases since I don't really understand > > the code. Also, I have no clue how the adblocker handles differences between > > normal pattern filters and actual regex filters, and don't want to break > > those. > > I mean, are there really *NO* special characters in adblock filters besides > * that have the same meaning as in a regexp? We'd have to be sure we're > un-escaping all the right characters if we do this. I think it needs an > expert on adblock filters (hi ;) I am no expert writing AdBlock rulesets, just reading the docs here O:-) (I have a pretty complete understanding about how regexps work, though!) From the AdBlock documentation, for no-regexp rules, those are the characters with special meanings: - “*” matches zero or more characters (like “.*” in a regexp). - “|” anchors a match to the beginning/end (like “^” and “$” in a regexp). - “||” anchors a match to the beginning, ignoring the URL scheme (this would be something like “^[a-zA-Z\-\+\.]:\/\/” in a regexp). - “^” matches a separator character. Documentation defines those as “anything but a letter, a digit, or one of the following: _ - . %” (regexp: “([^a-zA-Z\d]|[_\-\.%])”). - “$” separates the rule match from its options. Oddly enough, there is no explanation on how to match a literal dollar sign in a rule. That's probably fine because I understand that the URLs are matched in the %-encoded form, so the rule can match it with “%24”.
(In reply to Michael Catanzaro from comment #23) > Note: we use GRegex's JS-compatible mode. Somehow I missed this, awesome! As we discussed in chat, I will try to make a patch that handles all the special characters in AdBlock rules that I listed in my previous comment.
See also: bug #778054
Created attachment 344815 [details] [review] Ensure that all regexp special characters are escaped This version of the patch handles a few more escaping cases, and adds support for “^” to match a “delimiter character” and “|” at the end of a pattern to anchor it to the end of the string. IMHO it could obsolete both of the patches previously proposed. To make sure of not leaving anything behind, I took care of going through the GRegex docs (https://developer.gnome.org/glib/stable/glib-regex-syntax.html) and writing down each possible metacharacter as “handled already”, “not handled/needs work”, or “not handled, but cannot happen in AdBlock patterns”. Half of the patch are comments with a brief explanations — let's make it easier for others (or ourselves) coming back to this code in the future :-) I have tried with a few sites already, including Twitter (image upload and completions work), a few Bugzillas, Feedly, Google Inbox, WordReference, DuckDuckGo... all seem fine.
Review of attachment 344815 [details] [review]: OK great, let's do this. Regarding the commit message: please fix "conerting" -> "converting" first of all. Also prefix the first line of the commit message with "uri-tester: ". Also, since this patch does more than just ensure all special characters are escaped, please make the first line a bit more generic: e.g. "uri-tester: Ensure regex is properly constructed" Also please commit this to gnome-3-22. The uri-tester file moved to a different place, but other than that it should apply fine there. ::: embed/web-extension/ephy-uri-tester.c @@ +387,3 @@ + /* NOTE: The '$' is used as separator for the rule options, so rule patterns + cannot ever contain them. If a rule needs to match it, it uses "%24". Epiphany style is to put a * on each line of a multi-line comment: /* foo * foo * foo * foo */ @@ +411,3 @@ + break; + } + /* Fall-through to escape the vertical bar. */ I would escape this manually. It's too fragile to have a fall-through past this huge comment.
Created attachment 344851 [details] [review] [PATCH v2] uri-tester: Ensure regexps are properly constructed (for the gnome-3-33 branch) Updated patch to apply on th “gnome-3-22” branch.
Created attachment 344852 [details] [review] [PATCH v2] uri-tester: Ensure regexps are properly constructed (for the master branch) Same, checrry-picked and ready to apply onto “master”.
Note: Andres, I'm told this fixed the search page on Flickr, can you confirm? I bet this fixed odd problems on thousands or tens of thousands of websites. Dogfooding is good!
(In reply to Michael Catanzaro from comment #31) > Note: Andres, I'm told this fixed the search page on Flickr, can you > confirm? I bet this fixed odd problems on thousands or tens of thousands of > websites. Dogfooding is good! I'm a bit lost. What was broken on Flickr's search page?
See https://bugzilla.redhat.com/show_bug.cgi?id=1239249.
(In reply to Michael Catanzaro from comment #33) > See https://bugzilla.redhat.com/show_bug.cgi?id=1239249. Sorry, didn't realize about the "See also" field. Arguably, I've never had the adblock activated (I didn't even know, maybe that's why I hit much more bugs than most :P) Now, I think I have it but I still cannot reproduce this bug with 3.22.5.
(In reply to Andres Gomez from comment #34) > (In reply to Michael Catanzaro from comment #33) > > See https://bugzilla.redhat.com/show_bug.cgi?id=1239249. > > Sorry, didn't realize about the "See also" field. > > Arguably, I've never had the adblock activated (I didn't even know, maybe > that's why I hit much more bugs than most :P) > > Now, I think I have it but I still cannot reproduce this bug with 3.22.5. I mean, the problem with the flickr search. I've not tested the bug with Twitter.