Bug 777714 – Adblocker prevents uploading pictures to Twitter and breaks Twitter autocompletion

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 777714 - Adblocker prevents uploading pictures to Twitter and breaks Twitter autocompletion


Summary:	Adblocker prevents uploading pictures to Twitter and breaks Twitter autocompl...


Status:	RESOLVED FIXED

Product:	epiphany
Classification:	Core
Component:	General
Version:	3.22.x (obsolete)
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	Epiphany Maintainers
QA Contact:	Epiphany Maintainers

URL:
Whiteboard:

Duplicates:	757983 (view as bug list)
Depends on:
Blocks:	776514

Reported:	2017-01-24 20:30 UTC by Alberto Garcia
Modified:	2017-02-07 07:39 UTC

See Also:	https://bugzilla.redhat.com/show_bug.cgi?id=1239249
GNOME target:	---
GNOME version:	---

Attachments
uri-tester: escape parens (870 bytes, patch) 2017-02-02 00:31 UTC, Michael Catanzaro	accepted-commit_now	Details \| Review
uri-tester: Escape periods (1.77 KB, patch) 2017-02-02 00:31 UTC, Michael Catanzaro	needs-work	Details \| Review
Ensure that all regexp special characters are escaped (4.33 KB, patch) 2017-02-02 23:43 UTC, Adrian Perez	none	Details \| Review
[PATCH v2] uri-tester: Ensure regexps are properly constructed (for the gnome-3-33 branch) (4.32 KB, patch) 2017-02-03 13:16 UTC, Adrian Perez	committed	Details \| Review
[PATCH v2] uri-tester: Ensure regexps are properly constructed (for the master branch) (4.32 KB, patch) 2017-02-03 13:20 UTC, Adrian Perez	committed	Details \| Review

Description Alberto Garcia 2017-01-24 20:30:41 UTC

Every time I try to upload a picture to twitter I get an error (Media upload failed, try again later).

This is with Epiphany 3.22.4 and WebKitGTK+ 2.14.3, but it's been happening for a while.

I suspect the problem is related to the User Agent, because I can do it with the MiniBrowser shipped with WebKitGTK+.

Comment 1 Michael Catanzaro 2017-01-24 21:54:15 UTC

Anything in the web inspector? Does it work if you disable the adblocker?

(In reply to Alberto Garcia from comment #0)
> I suspect the problem is related to the User Agent, because I can do it with
> the MiniBrowser shipped with WebKitGTK+.

Well maybe, but it seems kinda unlikely; more likely some other Epiphany feature is to blame. If it was a user agent problem then I would expect MiniBrowser would not work either. Anyway, this is really easy to confirm or disprove. Visit https://techblog.willshouse.com/2012/01/03/most-common-user-agents/ in MiniBrowser to check your user agent there. Mine says:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1

And if I visit in Epiphany, it says:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1 Epiphany/3.23.4

So you see the only difference is the bit at the end. You can run MiniBrowser with Epiphany's user agent to see if that breaks it:

MiniBrowser --user-agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1 Epiphany/3.23.4'

You can also try running Epiphany with MiniBrowser's user agent:

gsettings set org.gnome.Epiphany user-agent 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/604.1 (KHTML, like Gecko) Version/10.0 Safari/604.1'

If it really turns out to be a user agent problem, then Twitter has lost the right to receive an accurate user agent from WebKit and I'll have it on the quirks list tomorrow, but I bet something else is wrong in Epiphany.

Comment 2 Alberto Garcia 2017-01-25 13:00:55 UTC

(In reply to Michael Catanzaro from comment #1)
> Anything in the web inspector? Does it work if you disable the
> adblocker?

Actually it does work if I disable the adblocker, so it seems it was
that!

Comment 3 Michael Catanzaro 2017-01-25 18:04:46 UTC

(In reply to Alberto Garcia from comment #0)
> Every time I try to upload a picture to twitter I get an error (Media upload
> failed, try again later).
> 
> This is with Epiphany 3.22.4 and WebKitGTK+ 2.14.3, but it's been happening
> for a while.
> 
> I suspect the problem is related to the User Agent, because I can do it with
> the MiniBrowser shipped with WebKitGTK+.

Strange, I logged into your test account but can't reproduce it. When I click the "add photos or video" button and select something with the file chooser, it seems to upload with no problems; I see a thumbnail of my image and if I click on it it appears in a lightbox. Epiphany 3.22.5 with WebKitGTK+ 2.14.3. Does it break for you before that point?

Just to be sure we're using similar adblock filters, please open ~/.config/epiphany/adblock and confirm the last modified date is recent.

Comment 4 Michael Catanzaro 2017-01-26 13:55:52 UTC

*** Bug 757983 has been marked as a duplicate of this bug. ***

Comment 5 Alberto Garcia 2017-01-26 13:59:57 UTC

(In reply to Michael Catanzaro from comment #3)
> When I click the "add photos or video" button and select something
> with the file chooser, it seems to upload with no problems; I see a
> thumbnail of my image and if I click on it it appears in a lightbox.

Yes, the preview works. It fails when you actually try to publish the
tweet with that image.

> please open ~/.config/epiphany/adblock and confirm the last modified
> date is recent.

9th of January 2017.

Comment 6 Michael Catanzaro 2017-02-01 23:05:41 UTC

So here are the different requests I see blocked by our adblocker:

Request 'https://analytics.twitter.com/tpm/p?_=1485990140542' blocked (page: 'https://twitter.com/testaccount50/with_replies')

Probably good to block that one.

Request 'https://twitter.com/i/search/typeahead.json?count=1200&media_tagging_in_prefetch=true&prefetch=true&result_type=users&users_cache_age=0' blocked (page: 'https://twitter.com/testaccount50/with_replies')

That one is probably what's breaking autocompletion.

Request 'https://syndication.twitter.com/i/jot/syndication?l=%7B%22_category_%22%3A%22syndicated_impression%22%2C%22event_namespace%22%3A%7B%22client%22%3A%22web%22%2C%22page%22%3A%22me%22%2C%22action%22%3A%22impression%22%7D%2C%22triggered_on%22%3A1485990149363%7D' blocked (page: 'https://twitter.com/testaccount50/with_replies')

Not sure.

Request 'https://www.google-analytics.com/analytics.js' blocked (page: 'https://twitter.com/testaccount50/with_replies')

Good.

** (WebKitWebProcess:18252): WARNING **: Request 'https://twitter.com/i/jot' blocked (page: 'https://twitter.com/testaccount50/with_replies')


Not sure.

Request 'https://upload.twitter.com/i/media/upload.json?command=INIT&total_bytes=56389&media_type=image%2Fjpeg&media_category=tweet_image' blocked (page: 'https://twitter.com/testaccount50/with_replies')

Probably what's breaking uploads.

Comment 7 Michael Catanzaro 2017-02-01 23:13:12 UTC

typeahead.json and upload.json are both being blocked by this filter pattern:

.ad.json?

which is exceedingly generic and obviously a bug.

Comment 8 Michael Catanzaro 2017-02-01 23:22:23 UTC

Epiphany is escaping .ad.json? to .ad.json\?. I asked Christian about it. He observed that it needs to be escaped to \.ad\.json\? or it matches too much.

Comment 9 Michael Catanzaro 2017-02-02 00:30:25 UTC

So there is existing code to escape periods, commented out.

So can anybody think of a good reason to not escape these? I'm almost positive this is going to break something, because if not it would not have been commented out. But I have no clue what it will break, and this does fix Twitter.

We need content extensions. :(

Comment 10 Michael Catanzaro 2017-02-02 00:31:15 UTC

Created attachment 344753 [details] [review]
uri-tester: escape parens

Midori has been doing this since 2011, so probably we should too.

Comment 11 Michael Catanzaro 2017-02-02 00:31:18 UTC

Created attachment 344754 [details] [review]
uri-tester: Escape periods

We are passing this into a regex, so need to escape it for it to match a
period character rather than any old character. This fixes the adblocker
being too aggressive and blocking important scripts on twitter.com.

Note: I presume this was commented out because it broke something, but
I've traveled far back into Midori's git history and have no clue what.
bzr's blame feature is sufficiently broken that I can't tie it down to a
particular commit. Well, we know the current code is wrong, and this
fixes Twitter, so let's try it and see what happens.

Comment 12 Michael Catanzaro 2017-02-02 00:33:51 UTC

(In reply to Michael Catanzaro from comment #10)
> Created attachment 344753 [details] [review] [review]
> uri-tester: escape parens
> 
> Midori has been doing this since 2011, so probably we should too.

Er 2013, got to remember to fix that when committing.

Comment 13 Christian Hergert 2017-02-02 00:38:33 UTC

Both patches look reasonable to me

Comment 14 Carlos Garcia Campos 2017-02-02 06:17:51 UTC

Comment on attachment 344753 [details] [review]
uri-tester: escape parens

I don't agree with the reason, though, we should do this because it's correct not because Midori does it.

Comment 15 Carlos Garcia Campos 2017-02-02 06:18:38 UTC

Comment on attachment 344754 [details] [review]
uri-tester: Escape periods

Yes, let's try. We know for sure what this fixes, so let's not assume it breaks something else :-)

Comment 16 Adrian Perez 2017-02-02 08:04:45 UTC

Review of attachment 344753 [details] [review]:

::: embed/web-extension/ephy-uri-tester.c
@@ +244,1 @@
         g_string_append_printf (str, "\\%c", *src);

I wonder whether it would be safer to use g_regex_escape_string() first, and then transform the already-escaped string. For example, instead of transforming:

  foo*bar -> foo.*bar

we would first escape the string with g_regex_escape_string(), then transform:

  foo\*bar -> foo.*bar

Unless I am missing something, this would work because there is no way of escaping a “*” in the AdBlock filter syntax (according to https://adblockplus.org/filters), so we wouldn't run in double-escaping issues.

This has the added benefit that the Epiphany code would contain less knowledge about how GRegex works, which would make it more stable and very likely cover a few more corner cases related to escaping that might not be being handled right now.

Comment 17 Adrian Perez 2017-02-02 08:11:03 UTC

Review of attachment 344754 [details] [review]:

::: embed/web-extension/ephy-uri-tester.c
@@ -262,3 @@
-  if (str->str && str->str[len - 1] == '*' && str->str[len - 2] == '.')
-    g_string_erase (str, len - 2, 2);
-

Why is this being removed? It is true that “.*” at the end of the regular expression is kind of useless. If we prefer to keep it, then it would be better to transform:

  foo*

into:

  foo.*$

This anchors the regexp match to the end of the string explicitly, which can be more easily optimized by regular expression engines.

Comment 18 Michael Catanzaro 2017-02-02 15:53:21 UTC

(In reply to Adrian Perez from comment #17)
> Review of attachment 344754 [details] [review] [review]:
> 
> ::: embed/web-extension/ephy-uri-tester.c
> @@ -262,3 @@
> -  if (str->str && str->str[len - 1] == '*' && str->str[len - 2] == '.')
> -    g_string_erase (str, len - 2, 2);
> -
> 
> Why is this being removed?

Because I think it's legit to have a . at the end of the rule now, matching an actual . and not any character. .* in a regex would just be stupid as the . there is redundant, and that's what this was removing before. But \.* is meaningful.

> It is true that “.*” at the end of the regular
> expression is kind of useless. If we prefer to keep it, then it would be
> better to transform:
> 
>   foo*
> 
> into:
> 
>   foo.*$
> 
> This anchors the regexp match to the end of the string explicitly, which can
> be more easily optimized by regular expression engines.

The filters are supposed to match any substring of the rule. I'm really not sure why the original code removed the trailing *

(In reply to Adrian Perez from comment #16)
> I wonder whether it would be safer to use g_regex_escape_string() first, and
> then transform the already-escaped string. For example, instead of
> transforming:
> 
>   foo*bar -> foo.*bar
> 
> we would first escape the string with g_regex_escape_string(), then
> transform:
> 
>   foo\*bar -> foo.*bar

I think I don't want to try that in this patch. It might be good to do indeed, but I'm worried about corner cases since I don't really understand the code. Also, I have no clue how the adblocker handles differences between normal pattern filters and actual regex filters, and don't want to break those.

Comment 19 Michael Catanzaro 2017-02-02 16:28:15 UTC

(In reply to Michael Catanzaro from comment #18)
> I think I don't want to try that in this patch. It might be good to do
> indeed, but I'm worried about corner cases since I don't really understand
> the code. Also, I have no clue how the adblocker handles differences between
> normal pattern filters and actual regex filters, and don't want to break
> those.

I mean, are there really *NO* special characters in adblock filters besides * that have the same meaning as in a regexp? We'd have to be sure we're un-escaping all the right characters if we do this. I think it needs an expert on adblock filters (hi ;)

Comment 20 Michael Catanzaro 2017-02-02 16:35:13 UTC

g_regex_escape_string escapes the following characters: '\0', '\\', '|', '(', ')', '[', ']', '{', '}', '^', '$', '*', '+', '?', '.'

Our current code escapes the following characters: '?', '[', ']'

With my patches, it additionally escapes: '.', '(', ')'

Our code removes (completely ignores) the following characters:

'|', '^', '+'

Hmm.

(In reply to Michael Catanzaro from comment #18)
> Because I think it's legit to have a . at the end of the rule now, matching
> an actual . and not any character. .* in a regex would just be stupid as the
> . there is redundant, and that's what this was removing before. But \.* is
> meaningful.

But I missed this rule:

case '*':
    g_string_append (str, ".*");
    break;

That has to be wrong; this is not ready to commit until we understand what's going on here. Is it supposed to be "\.*" perhaps? That would at least make more sense.

Comment 21 Michael Catanzaro 2017-02-02 16:36:22 UTC

Review of attachment 344754 [details] [review]:

::: embed/web-extension/ephy-uri-tester.c
@@ +235,3 @@
         break;
+      case '.':
+        g_string_append (str, "\\.");

I should fall through to the next cases instead of doing anything here. The fallthrough is better and does the right thing.

Comment 22 Adrian Perez 2017-02-02 16:52:40 UTC

(In reply to Michael Catanzaro from comment #18)
> (In reply to Adrian Perez from comment #17)
> > Review of attachment 344754 [details] [review] [review] [review]:
> > 
> > ::: embed/web-extension/ephy-uri-tester.c
> > @@ -262,3 @@
> > -  if (str->str && str->str[len - 1] == '*' && str->str[len - 2] == '.')
> > -    g_string_erase (str, len - 2, 2);
> > -
> > 
> > Why is this being removed?
> 
> Because I think it's legit to have a . at the end of the rule now, matching
> an actual . and not any character. .* in a regex would just be stupid as the
> . there is redundant, and that's what this was removing before. But \.* is
> meaningful.

I have re-read the page I linked earlier and you are right: having a period
at the end of and AdBlock rule is valid, and of course “.*” also is valid.
They translate as:

  AdBlock → GRegex
  -------   -------
  foo.      foo\.
  foo.*     foo\..*  (also works: foo\..*$)
   
Both of the above match the same set of patterns.

> > It is true that “.*” at the end of the regular
> > expression is kind of useless. If we prefer to keep it, then it would be
> > better to transform:
> > 
> >   foo*
> > 
> > into:
> > 
> >   foo.*$
> > 
> > This anchors the regexp match to the end of the string explicitly, which can
> > be more easily optimized by regular expression engines.
> 
> The filters are supposed to match any substring of the rule. I'm really not
> sure why the original code removed the trailing *

AdBlock rules use the pipe character “|” for anchoring the match:

  AdBlock → GRegex
  -------   -------
  |foo      ^foo
  bar|      bar$

The documentation at https://adblockplus.org/filters list as an example
a rule to block Flash movie URLs which end exactly in “.swf” with:

  swf|

That won't prevent loading other URLs which contain the string “swf” in
the middle.

> (In reply to Adrian Perez from comment #16)
> > I wonder whether it would be safer to use g_regex_escape_string() first, and
> > then transform the already-escaped string. For example, instead of
> > transforming:
> > 
> >   foo*bar -> foo.*bar
> > 
> > we would first escape the string with g_regex_escape_string(), then
> > transform:
> > 
> >   foo\*bar -> foo.*bar
> 
> I think I don't want to try that in this patch. It might be good to do
> indeed, but I'm worried about corner cases since I don't really understand
> the code. Also, I have no clue how the adblocker handles differences between
> normal pattern filters and actual regex filters, and don't want to break
> those.

Yep, I understand your concerns about breaking things.

Regarding the regexp vs. non-regexp rules, AdBlock parses the rule as a regexp
when the rule is contained between slashes “/”, as in:

  .swf|     → non-regexp rule
  /\.swf$/  → regexp rule

It's explained here: https://adblockplus.org/filters#regexps

The syntax used for regexps is that of JavaScript regular expressions, and
unfortunately that's not exactly the same syntax as GRegex, which is compatible
with PCRE/Perl. If there were some function in the JavaScriptCore API to
easily construct JS-compatible regexps and check for matches, that would be
the perfect solution... Unfortunately there isn't and we would need to go
through many hoops to reuse JSC's regexp engine without modifying its API :-\

Comment 23 Michael Catanzaro 2017-02-02 17:01:55 UTC

Note: we use GRegex's JS-compatible mode.

Comment 24 Adrian Perez 2017-02-02 17:08:23 UTC

(In reply to Michael Catanzaro from comment #19)
> (In reply to Michael Catanzaro from comment #18)
> > I think I don't want to try that in this patch. It might be good to do
> > indeed, but I'm worried about corner cases since I don't really understand
> > the code. Also, I have no clue how the adblocker handles differences between
> > normal pattern filters and actual regex filters, and don't want to break
> > those.
> 
> I mean, are there really *NO* special characters in adblock filters besides
> * that have the same meaning as in a regexp? We'd have to be sure we're
> un-escaping all the right characters if we do this. I think it needs an
> expert on adblock filters (hi ;)

I am no expert writing AdBlock rulesets, just reading the docs here O:-)
(I have a pretty complete understanding about how regexps work, though!)

From the AdBlock documentation, for no-regexp rules, those are the characters
with special meanings:

 - “*” matches zero or more characters (like “.*” in a regexp).

 - “|” anchors a match to the beginning/end (like “^” and “$” in a regexp).

 - “||” anchors a match to the beginning, ignoring the URL scheme (this
   would be something like “^[a-zA-Z\-\+\.]:\/\/” in a regexp).

 - “^” matches a separator character. Documentation defines those as
   “anything but a letter, a digit, or one of the following: _ - . %”
   (regexp: “([^a-zA-Z\d]|[_\-\.%])”).

 - “$” separates the rule match from its options. Oddly enough, there is
   no explanation on how to match a literal dollar sign in a rule. That's
   probably fine because I understand that the URLs are matched in the
   %-encoded form, so the rule can match it with “%24”.

Comment 25 Adrian Perez 2017-02-02 17:18:37 UTC

(In reply to Michael Catanzaro from comment #23)
> Note: we use GRegex's JS-compatible mode.

Somehow I missed this, awesome! As we discussed in chat, I will try to
make a patch that handles all the special characters in AdBlock rules
that I listed in my previous comment.

Comment 26 Michael Catanzaro 2017-02-02 18:08:30 UTC

See also: bug #778054

Comment 27 Adrian Perez 2017-02-02 23:43:15 UTC

Created attachment 344815 [details] [review]
Ensure that all regexp special characters are escaped

This version of the patch handles a few more escaping cases, and adds support
for “^” to match a “delimiter character” and “|” at the end of a pattern to
anchor it to the end of the string. IMHO it could obsolete both of the patches
previously proposed.

To make sure of not leaving anything behind, I took care of going through the
GRegex docs (https://developer.gnome.org/glib/stable/glib-regex-syntax.html)
and writing down each possible metacharacter as “handled already”, “not
handled/needs work”, or “not handled, but cannot happen in AdBlock patterns”.
Half of the patch are comments with a brief explanations — let's make it
easier for others (or ourselves) coming back to this code in the future :-)

I have tried with a few sites already, including Twitter (image upload and
completions work), a few Bugzillas, Feedly, Google Inbox, WordReference,
DuckDuckGo... all seem fine.

Comment 28 Michael Catanzaro 2017-02-02 23:53:16 UTC

Review of attachment 344815 [details] [review]:

OK great, let's do this.

Regarding the commit message: please fix "conerting" -> "converting" first of all. Also prefix the first line of the commit message with "uri-tester: ". Also, since this patch does more than just ensure all special characters are escaped, please make the first line a bit more generic: e.g. "uri-tester: Ensure regex is properly constructed"

Also please commit this to gnome-3-22. The uri-tester file moved to a different place, but other than that it should apply fine there.

::: embed/web-extension/ephy-uri-tester.c
@@ +387,3 @@
 
+  /* NOTE: The '$' is used as separator for the rule options, so rule patterns
+     cannot ever contain them. If a rule needs to match it, it uses "%24".

Epiphany style is to put a * on each line of a multi-line comment:

/* foo
 * foo
 * foo
 * foo */

@@ +411,3 @@
+          break;
+        }
+      /* Fall-through to escape the vertical bar. */

I would escape this manually. It's too fragile to have a fall-through past this huge comment.

Comment 29 Adrian Perez 2017-02-03 13:16:11 UTC

Created attachment 344851 [details] [review]
[PATCH v2] uri-tester: Ensure regexps are properly constructed (for the gnome-3-33 branch)

Updated patch to apply on th “gnome-3-22” branch.

Comment 30 Adrian Perez 2017-02-03 13:20:31 UTC

Created attachment 344852 [details] [review]
[PATCH v2] uri-tester: Ensure regexps are properly constructed (for the master branch)

Same, checrry-picked and ready to apply onto “master”.

Comment 31 Michael Catanzaro 2017-02-06 00:55:10 UTC

Note: Andres, I'm told this fixed the search page on Flickr, can you confirm? I bet this fixed odd problems on thousands or tens of thousands of websites. Dogfooding is good!

Comment 32 Andres Gomez 2017-02-06 10:04:52 UTC

(In reply to Michael Catanzaro from comment #31)
> Note: Andres, I'm told this fixed the search page on Flickr, can you
> confirm? I bet this fixed odd problems on thousands or tens of thousands of
> websites. Dogfooding is good!

I'm a bit lost. What was broken on Flickr's search page?

Comment 33 Michael Catanzaro 2017-02-06 17:04:09 UTC

See https://bugzilla.redhat.com/show_bug.cgi?id=1239249.

Comment 34 Andres Gomez 2017-02-07 07:35:07 UTC

(In reply to Michael Catanzaro from comment #33)
> See https://bugzilla.redhat.com/show_bug.cgi?id=1239249.

Sorry, didn't realize about the "See also" field.

Arguably, I've never had the adblock activated (I didn't even know, maybe that's why I hit much more bugs than most :P)

Now, I think I have it but I still cannot reproduce this bug with 3.22.5.

Comment 35 Andres Gomez 2017-02-07 07:39:11 UTC

(In reply to Andres Gomez from comment #34)
> (In reply to Michael Catanzaro from comment #33)
> > See https://bugzilla.redhat.com/show_bug.cgi?id=1239249.
> 
> Sorry, didn't realize about the "See also" field.
> 
> Arguably, I've never had the adblock activated (I didn't even know, maybe
> that's why I hit much more bugs than most :P)
> 
> Now, I think I have it but I still cannot reproduce this bug with 3.22.5.

I mean, the problem with the flickr search. I've not tested the bug with Twitter.