After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 751524 - Network changes cause authentication token timeouts (Evolution)
Network changes cause authentication token timeouts (Evolution)
Status: RESOLVED FIXED
Product: gnome-online-accounts
Classification: Core
Component: general
3.16.x
Other All
: Normal normal
: ---
Assigned To: GNOME Online Accounts maintainer(s)
GNOME Online Accounts maintainer(s)
: 751604 756277 757395 759515 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2015-06-26 05:11 UTC by Berend De Schouwer
Modified: 2016-01-11 16:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
daemon: Improve the debug message by adding more context (857 bytes, patch)
2015-11-25 18:55 UTC, Debarshi Ray
committed Details | Review
daemon: Enqueue EnsureCredentials calls instead of firing them at once (8.48 KB, patch)
2015-11-25 18:55 UTC, Debarshi Ray
committed Details | Review
daemon: Style fix (893 bytes, patch)
2015-11-25 18:56 UTC, Debarshi Ray
committed Details | Review
daemon: Simplify the code (2.32 KB, patch)
2015-11-25 18:56 UTC, Debarshi Ray
committed Details | Review
daemon: Prioritize D-Bus calls over internal ones (2.41 KB, patch)
2015-11-25 18:57 UTC, Debarshi Ray
committed Details | Review
daemon: Coalesce pending EnsureCredentials calls on the same GoaObject (5.03 KB, patch)
2015-11-26 15:53 UTC, Debarshi Ray
none Details | Review
daemon: Coalesce pending EnsureCredentials calls on the same GoaObject (5.41 KB, patch)
2015-11-27 10:17 UTC, Debarshi Ray
committed Details | Review
daemon: Log a debug message after completing EnsureCredentials (2.92 KB, patch)
2015-11-27 10:17 UTC, Debarshi Ray
committed Details | Review
daemon: Timestamp the EnsureCredentials logs and shorten their length (4.46 KB, patch)
2015-11-27 14:31 UTC, Debarshi Ray
committed Details | Review

Description Berend De Schouwer 2015-06-26 05:11:22 UTC
When the network changes, evolution can get stuck retrieving contacts, mail, etc.  The error message in Evolution says "Cannot obtain authentication token: timeout"

The fix is to run 'goa-daemon --replace'.  This is why I filed it gnome-online-accounts.

Two network changes that seem to trigger this:
- suspend/un-suspend on a different network.
- work move from wireless to wired (same IP range, no suspend)

GOA 3.16.3 (Fedora)
Google accounts.
Comment 1 Debarshi Ray 2015-06-26 05:53:56 UTC
If this only started happening in 3.16.3, then it could be due to the patches in bug 693578 So, can you reproduce it using 3.16.2?
Comment 2 Berend De Schouwer 2015-06-26 06:09:39 UTC
This is a new laptop, and it's also the first time I'm trying GOA, so I can't definitively say it's a regression.

I will track down a 3.16.2, though, and try it.

Speculation: it could also be that the evolution timeout (which appears to be a few seconds) might be too short for GOA.
Comment 3 Debarshi Ray 2015-06-26 06:55:10 UTC
(In reply to Berend De Schouwer from comment #2)
> This is a new laptop, and it's also the first time I'm trying GOA, so I
> can't definitively say it's a regression.
> 
> I will track down a 3.16.2, though, and try it.

Thanks for your effort. You can find a 3.16.2 build for Fedora here: http://koji.fedoraproject.org/koji/buildinfo?buildID=636397

> Speculation: it could also be that the evolution timeout (which appears to
> be a few seconds) might be too short for GOA.

With 3.16.3, whenever the network changes (wired to wireless, VPN, etc.), GOA tries to ensure that all the credentials are valid. It will try to renew the OAuth2 tokens and Kerberos tickets, if required. This is supposed to help in situations where your OAuth2 token or Kerberos ticket expired while you were offline or off-the-VPN.

GOA doesn't implement IMAP or CalDAV, though. It only lets you add an account, checks if the credentials are valid (for which it implements a very very minimal subset of IMAP that is completely internal to itself), and publishes the account on DBus for applications to consume.

This makes me wonder why Evolution is saying: "Cannot obtain authentication token: timeout" I wonder if Evolution tries to re-fetch the credentials from GOA on network changes, and if GOA is too busy doing something else to respond in time.
Comment 4 Berend De Schouwer 2015-07-03 17:42:01 UTC
I've downgraded for most of the week to 3.16.2, and the problem disappeared.

I've upgraded back to 3.16.3 today, and the problem occurred again.

It does appear to be a regression in 3.16.3 -- or it triggers a bug in Evolution in 3.16.3.
Comment 5 Berend De Schouwer 2015-07-23 08:16:32 UTC
I've now updated Evolution to 3.16.4 to see if that helps.
Comment 6 Berend De Schouwer 2015-07-24 07:56:02 UTC
seems better with gnome-online-accounts 3.16.3 and evolution 3.16.4.  I've looked at the Evolution changelog, and I can't see a specific reason for the better behaviour.
Comment 7 Berend De Schouwer 2015-07-25 17:12:24 UTC
Spoke too soon.  The problem still exists in evolution 3.16.4 with gnome-online-accounts 3.16.3
Comment 8 Milan Crha 2015-10-06 18:49:24 UTC
I would also install debuginfo package for gnome-online-accounts and get a backtrace of the running goa-daemon, to see what it does. A caveat is that the timeout can be followed by finished token refresh (if it's it), thus the backtrace will not show where the goa-daemon is stuck.

The last time I tried something similar involved ownCloud server configured in GOA. The evolution-data-server (most of the important things are done there, not in the evolution) tried to open all of my 4 calendars, tasks lists and memo lists, in total (3 * 4 =) 12 sources, which resulted in 12 password refresh attempts, though the server and the password were the same.

We debugged this scenario with Rishi and the result of our discussion was that the goa-daemon would ideally queue password requests and group together those towards the same server and response with the same result for the whole group, not re-doing the same thing multiple times. Such behaviour will also make it easier for the server, bandwidth and so on - it has only benefits (maybe except one case, if the ongoing request is cancelled for whatever reason, the pending requests in that group shouldn't inherit the cancelled state, but should retry once again; other than cancelled errors should be inherited for the whole group).
Comment 9 Berend De Schouwer 2015-10-07 13:58:35 UTC
There are two accounts, each with multiple calendars.
Comment 10 Horea Christian 2015-10-30 13:36:24 UTC
I have also been experiencing this issue for quite a while now, and it is annoying to the point that it really had noticeable negative effects on my productivity. I am often on unstable WLANS at the university, so I get this almost daily...

I have had to resort to writing a script and adding it to my path. Here is the script that fixes this until it breaks again:

```
cd /usr/libexec
./goa-daemon --replace
evolution --force-shutdown
./goa-daemon --replace
```

Side note: I am navigating to /usr/libexec because on my distribution (Gentoo) goa-daemon is not by default added to my path.
Comment 11 Berend De Schouwer 2015-10-30 14:20:17 UTC
You can run that from /etc/NetworkManager/dispatcher.d/ every time the network changes.  Put in a sleep if your network is like this corporate network (requires a sign-on)

You just have to get it running as the right user (sudo?)

You don't need "cd ..." -- you can just run it with "/usr/libexec/goa-daemon ..." if it's not in your path.

At some point Gentoo might replace NetworkManager with systemd-networkd, then you'll have to find a different location to put your script.


(I can't really debug it further right now since I've hit a kernel bug at the moment that means suspend=>crash every second suspend, which means I can't duplicate the problem without crashing my system right now.)
Comment 12 Berend De Schouwer 2015-11-22 05:59:20 UTC
Possible duplicate #703078
Comment 13 Berend De Schouwer 2015-11-22 06:05:40 UTC
Still and issue with 3.18.2.1

In 3.18.2.1 goa-daemon --replace no longer works.  I need to kill the old one, and start a new one.
Comment 14 Debarshi Ray 2015-11-25 18:51:36 UTC
*** Bug 751604 has been marked as a duplicate of this bug. ***
Comment 15 Debarshi Ray 2015-11-25 18:52:33 UTC
*** Bug 756277 has been marked as a duplicate of this bug. ***
Comment 16 Debarshi Ray 2015-11-25 18:55:19 UTC
Created attachment 316260 [details] [review]
daemon: Improve the debug message by adding more context
Comment 17 Debarshi Ray 2015-11-25 18:55:43 UTC
Created attachment 316261 [details] [review]
daemon: Enqueue EnsureCredentials calls instead of firing them at once
Comment 18 Debarshi Ray 2015-11-25 18:56:07 UTC
Created attachment 316262 [details] [review]
daemon: Style fix
Comment 19 Debarshi Ray 2015-11-25 18:56:30 UTC
Created attachment 316263 [details] [review]
daemon: Simplify the code
Comment 20 Debarshi Ray 2015-11-25 18:57:02 UTC
Created attachment 316264 [details] [review]
daemon: Prioritize D-Bus calls over internal ones
Comment 21 Debarshi Ray 2015-11-25 19:09:43 UTC
Reading through the bug reports and also from my own experience, I am reasonably sure that this a regression from the patches in bug 693578

Now that we check the credentials whenever the network changes, there is the possibility that we choke EnsureCredentials D-Bus calls that were made by applications. This is due to:

  (i) Sometimes NetworkManager sends a constant stream of network change notifications even when nothing seems to have changed. I am talking about only a IPv4 wired connection without any WiFi or VPN or IPv6 or anything fancy. I have seen this before (bug 739731), and I know that there was atleast one NetworkManager bug that caused this in the past but that was fixed. I don't know what is happening in this case.

  (ii) We fire all the network calls at the same time. Older GLib versions had a limit on the number of threads a "typical" asynchronous operation could spawn, and that could cause a deadlock., Newer versions keep increasing the limit, so we can hoard resources by firing off a huge number (possibly due to (i)) of calls.

Therefore, I have added a queue to handle the calls sequentially, and give higher priority to calls that we received over D-Bus over the ones that were made internally due to network changes.

The next thing to do would be to coalesce all pending calls against the same GoaObject.
Comment 22 Milan Crha 2015-11-26 12:07:44 UTC
Thanks. I gave a try to your changes (git branch origin/wip/rishi/ensure-credentials) and it's still not enough (I get timeouts on the password prompts when trying to open many calendar from one server), thus the last paragraph applies here ("The next thing to do...").
Comment 23 Debarshi Ray 2015-11-26 13:08:10 UTC
(In reply to Milan Crha from comment #22)
> (I get
> timeouts on the password prompts when trying to open many calendar from one
> server)

Can you confirm that it is the EnsureCredentials call that is timing out? Or is it some other D-Bus call?

I am also curious how evolution / evolution-data-server behaves internally. Does it fire all the D-Bus calls at once? Or is there some kind of queueing in place? 

While gnome-online-accounts should try to be smarter and avoid time outs from happening, I can think of cases where the calls would still time out. eg., if goa-daemon is stuck talking to a very slow server over a slow network, then subsequent EnsureCredentials calls can time out. Maybe the application should retry the call if it receives a G_IO_ERROR_TIMED_OUT? Ideally with some kind of exponential back-off.

In the meantime I will implement "the next thing to do ...".
Comment 24 Berend De Schouwer 2015-11-26 13:21:44 UTC
I frequently find Evolution marks the network as unavailable when it is available (bottom left icon indicates off-line).  At the same time various apps are quite happy with the network.

This *may* be related to goa timing out, but it could be symptom or cause.

This seems to happen more at work.  At work, the network has a captive portal.  gnome-shell-portal-helper fails horribly with work's captive portal, although a few seconds after un-suspending the laptop the network *is* available (due to a custom NetworkManager/dispatcher script)


If you specify what D-Bus calls you're looking for, I can leave dbus-monitor running and suspend/un-suspend.


If you want me to try a specific branch of goa, I can do that to.
Comment 25 Debarshi Ray 2015-11-26 13:59:51 UTC
(In reply to Berend De Schouwer from comment #24)

> If you specify what D-Bus calls you're looking for, I can leave dbus-monitor
> running and suspend/un-suspend.

These patches deal with the EnsureCrendentials method which is part of the org.gnome.OnlineAccounts.Account interface that each account object implements.

> If you want me to try a specific branch of goa, I can do that to.

I pushed them to master, so you can use that. They should also apply cleanly to the gnome-3-18 branch, so you can locally apply and try them out there too.

Remember to restart goa-daemon ($prefix/libexec/goa-daemon --replace) and also all the evolution processes after you do so.
Comment 26 Milan Crha 2015-11-26 15:28:33 UTC
(In reply to Berend De Schouwer from comment #24)
> I frequently find Evolution marks the network as unavailable when it is
> available (bottom left icon indicates off-line).  At the same time various
> apps are quite happy with the network.

That's bug #758152 for the Evolution itself.

(In reply to Debarshi Ray from comment #23)
> Can you confirm that it is the EnsureCredentials call that is timing out? Or
> is it some other D-Bus call?

I'm not sure how to do that easily.
 
> I am also curious how evolution / evolution-data-server behaves internally.
> Does it fire all the D-Bus calls at once? Or is there some kind of queueing
> in place? 

Some client (evolution, gnome-calendar, gnome-contacts,...) asks to open some calendar/address book/... and then the backend (on the evolution-addressbook/calendar-factory side) tryes to connect to the server and asks for the password of that book/calendar/.... The Google, ownCloud and other (not only GOA-based accounts) check for the list of available books/calendars/... on the start of the evolution-source-registry.

As an example, I've configured an ownCloud account, which advertises 4 calendars. The ownCloud supports events/memos/tasks in each of these calendars, thus I have 12 sources pointing to the same server, using the same credentials and so on, everything asked through the GOA. When I open the Evolution in the Calendar view, it also runs the Memos and Tasks views, and opens all selected calendars/memo lists/task lists. When I have all 12 ownCloud sources selected, it means the calendar backends for each single one ask the GOA for the password, more or less at the same time.

> Maybe the application should retry the call if it receives
> a G_IO_ERROR_TIMED_OUT? Ideally with some kind of exponential back-off.

I'd rather not do that, even the issue is partially valid. The main problem in my case is that GOA tries to open 12 connections doing the same thing at the same time. When the server has a connection limit or anything like that, then it'll fail anyway.
Comment 27 Debarshi Ray 2015-11-26 15:44:38 UTC
(In reply to Milan Crha from comment #26)
> (In reply to Debarshi Ray from comment #23)
> > Can you confirm that it is the EnsureCredentials call that is timing out? Or
> > is it some other D-Bus call?
> 
> I'm not sure how to do that easily.

Can't you locate the call site in your code? If it is either goa_account_call_ensure_credentials or goa_account_call_ensure_credentials_sync, then it is EnsureCredentials.
Comment 28 Debarshi Ray 2015-11-26 15:53:35 UTC
Created attachment 316323 [details] [review]
daemon: Coalesce pending EnsureCredentials calls on the same GoaObject
Comment 29 Milan Crha 2015-11-26 16:21:49 UTC
Okay, getting backtrace of the evolution-source-registry, one of the objects which do the password prompt(s), show that there are multiple threads in the EnsureCredentials() call.

I still got "Failed to lookup credentials: Failed to get password from GOA: Timeout was reached" in the evolution, but it succeeded afterwards and the error message disappeared. It might be the case you suggested, one server call took longer than expected.

I also got: "Failed to obtain an access token for 'public_cal': Timeout was reached" (that's a Google calendar). Maybe it's unrelated.

Looking into the log of the goa-daemon, I see it tried password credentials for my ownCloud account 4 times. I'd expect it being done only once, but I can be wrong. I sent you a link for the log in private.
Comment 30 Debarshi Ray 2015-11-27 10:17:01 UTC
Created attachment 316370 [details] [review]
daemon: Coalesce pending EnsureCredentials calls on the same GoaObject
Comment 31 Debarshi Ray 2015-11-27 10:17:25 UTC
Created attachment 316371 [details] [review]
daemon: Log a debug message after completing EnsureCredentials
Comment 32 Milan Crha 2015-11-27 10:58:11 UTC
Thanks for the patches. I gave them a try and it looks to me like it's it. I still get some odd responses from the ownCloud server (code 7, sometimes "Socket I/O timeout", sometimes "Connection terminated unexpectedly"), but it doesn't seem to be related to the goa-daemon itself.
Comment 33 Debarshi Ray 2015-11-27 14:31:43 UTC
Created attachment 316386 [details] [review]
daemon: Timestamp the EnsureCredentials logs and shorten their length
Comment 34 Debarshi Ray 2015-11-27 14:32:29 UTC
(In reply to Milan Crha from comment #32)
> Thanks for the patches. I gave them a try and it looks to me like it's it. I
> still get some odd responses from the ownCloud server (code 7, sometimes
> "Socket I/O timeout", sometimes "Connection terminated unexpectedly"), but
> it doesn't seem to be related to the goa-daemon itself.

Cool. Thanks for testing them!
Comment 35 Debarshi Ray 2015-12-17 10:08:34 UTC
*** Bug 759515 has been marked as a duplicate of this bug. ***
Comment 36 Debarshi Ray 2016-01-11 16:41:25 UTC
*** Bug 757395 has been marked as a duplicate of this bug. ***