After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 733775 - Connection to IMAP server dies after 5 minutes of inactivity, indefinitely waits with "scanning folders in..."
Connection to IMAP server dies after 5 minutes of inactivity, indefinitely wa...
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: Mailer
3.12.x (obsolete)
Other Linux
: Normal major
: ---
Assigned To: evolution-mail-maintainers
Evolution QA team
: 733802 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2014-07-26 06:06 UTC by Mehmet Giritli
Modified: 2015-12-02 13:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt (13.96 KB, text/plain)
2014-09-09 18:03 UTC, Mehmet Giritli
Details
CAMEL_DEBUG=imapx:io,imapx:conman evolution &>log.txt (82.38 KB, text/plain)
2014-09-09 18:04 UTC, Mehmet Giritli
Details
CAMEL_DEBUG=imapx evolution &>log.txt (12.79 KB, application/x-compressed-tar)
2014-09-11 18:53 UTC, Mehmet Giritli
Details
gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt (13.19 KB, text/plain)
2014-09-11 18:53 UTC, Mehmet Giritli
Details
CAMEL_DEBUG=imapx evolution &>log.txt (61.27 KB, text/plain)
2014-10-26 19:22 UTC, Mehmet Giritli
Details
gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt (15.64 KB, text/plain)
2014-10-26 19:23 UTC, Mehmet Giritli
Details

Description Mehmet Giritli 2014-07-26 06:06:51 UTC
I can connect to my courier imap mail server no problems. If I do not interact for 5 minutes with the mailer, like changing folders for example, then the next time I try to change to a different folder evolution keeps waiting with "scanning folders in ....blah blah" and it can not check for any new mail in the changed folder until I restart evolution. I do not see any error messages.

I tried disabling IDLE mode and quick resync but did not make any difference. 

I did not have the problem until 3.10 and I now have it with 3.12 as well. If you give me some tips how to help debug this, I'd try my best.
Comment 1 Mehmet Giritli 2014-09-09 18:03:22 UTC
Created attachment 285761 [details]
gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Comment 2 Mehmet Giritli 2014-09-09 18:04:03 UTC
Created attachment 285762 [details]
 CAMEL_DEBUG=imapx:io,imapx:conman evolution &>log.txt
Comment 3 Mehmet Giritli 2014-09-09 18:05:35 UTC
The above debug info were taken when evo was stuck as described in the original report above. The running version is 3.12.6.
Comment 4 Milan Crha 2014-09-10 12:07:43 UTC
Thanks for the bug report. This looks like a bug #733081, as I mentioned on the mailing list in your thread [1], the circumstances and everything points into that bug report. It's possible I missed certain details, but just in case, do you have evolution-data-server of the version 3.12.6 too? It's evolution-data-server there this fix was done, not the evolution.

You also mentioned that you disabled IDLE, but the imapx log (comment #2) shows the IDLE being in effect. It's better to close and run evolution to be sure the changes you do in account properties are properly used by the underlying code. Many of the options do not require it, but there are still some of them with which it's safer to restart evolution.

[1] https://mail.gnome.org/archives/evolution-list/2014-September/msg00059.html
Comment 5 Mehmet Giritli 2014-09-10 12:20:01 UTC
I tried disabling IDLE but when it did not make any difference I turned it back on.

Do you want me to get the log files when IDLE is off as well?

Yes, I re-checked and e-d-s is 3.12.6 as well. The gentoo package system says that it is 3.12.6...
Comment 6 Milan Crha 2014-09-11 06:53:39 UTC
My courier machine is practically dead, thus I cannot recheck against it (the issue can be related to certain things being done before the crash, and/or actual messages, thus not a full assurance about reproducibility).

I still think of the previous case from bug #733081. The problem was that one command (in this case IDLE) wasn't stopped before another command was issued, which made the two commands interleave in a way which the server could interpret as a "big client failure" and just give up to send completion responses for the previous (running) commands, which leads to that never ending jobs in a status bar and the backtrace.

The backtrace shows what you said, two operations waiting for a response, but the imapx_parser_thread() is idle. The confusing thing to me is that the imapx log doesn't show any issue, everything ther eis properly started and finished and no interleaving of IMAP commands is indicated in the log.

It can be that my assumption about the previous bug report is wrong and your issue doesn't interfere with it at all, but I also have run evolution for more than 5 minutes with multiple IMAPx accounts (none connecting to a courier serer) and I didn't notice any such failure here.

I do not know whether you use gdb, or know to read backtrace, but please let me try one thing. When you look into your backtrace, then there is Thread 4 with "imapx_parser_thread", this one is responsible to read/write data from/to the server. The Thread 3 has "camel_imapx_server_idle", as the name suggests, it is there for IDLE command, waiting for the completion. Finally, Thread 2 has "imapx_refresh_info_sync", which does a folder refresh. If everything works fine, then the refresh cancels IDLE thread and then gets to be served. I'd like you to try to get a status of the imapx_store in this stuck state. This can be done only under gdb. When you have evolution in this odd state, do these commands:
   - attach gdb to the running evolution
   $ gdb -pid=`pidof evolution`

   - see what thread does what
   (gdb) t a a bt

   - find the one with imapx_refresh_info_sync, say it's Thread 2
   - switch to Thread 2
   (gdb) t 2

   - numbers at the beginning of the line in the backtrace are frame
     numbers, in your backtrace it's frame #5 where imapx_refresh_info_sync
     is; do switch to that frame
   (gdb) f 5

   - obtain status information on the connections
   (gdb) p camel_imapx_store_dump_queue_status(imapx_store)

   - if the above will claim any errors, like variables being optimized
     out, then run this alternative command (it's one long line)
   (gdb) p camel_imapx_store_dump_queue_status(camel_folder_get_parent_store (folder))

This is supposed to print the information about scheduled and active jobs and about connections as such for that particular IMAPx account.

   - finally quit gdb and get back to the terminal
   (gdb) q

The queue status may not explain "why and what", but it should show whether anything else didn't break for some reason. There might be shown that one of the two commands is active, probably the IDLE one, and another is pending, probably the refresh (FETCH).

You can also get more verbose logging from IMAPx, when you run evolution as
   $ CAMEL_DEBUG=imapx evolution
the problem is that the log will be quite large, with logged everything on the wire, like the log before. I'm still interested only in a little piece of it, its end before this state. I'm also afraid that if you for example close evolution, then the queue gets flushed, which makes it work properly. Another issue can be that the redirect to log.txt didn't write the debugging information immediately, but some buffering was involved, thus the log contains proper part "before the issue". Not redirecting to log.txt may help, as terminal usually shows printed text by lines.

I'd also like to try with IDLE disabled, just to avoid one possible part of the issue. Having quick resync disabled may also help, for the similar reason.

(In reply to comment #0)
> ... I try to change to a different folder evolution keeps waiting with
> "scanning folders in ....blah blah"

Hmm, really "Scanning folders"? The backtrace doesn't show exactly that, neither the imapx log, but it would look like you've got disconnected from the server for some reason and Evolution was trying to reconnect, probably on the old connection, which might eventually (theoretically) timeout.
Comment 7 Mehmet Giritli 2014-09-11 17:44:29 UTC
>    - attach gdb to the running evolution
>    $ gdb -pid=`pidof evolution`
> 
>    - see what thread does what
>    (gdb) t a a bt
> 
>    - find the one with imapx_refresh_info_sync, say it's Thread 2
>    - switch to Thread 2
>    (gdb) t 2
> 
>    - numbers at the beginning of the line in the backtrace are frame
>      numbers, in your backtrace it's frame #5 where imapx_refresh_info_sync
>      is; do switch to that frame
>    (gdb) f 5
> 
>    - obtain status information on the connections
>    (gdb) p camel_imapx_store_dump_queue_status(imapx_store)

This retuned:

$1 = void

So I run the next one:
> 
>    - if the above will claim any errors, like variables being optimized
>      out, then run this alternative command (it's one long line)
>    (gdb) p camel_imapx_store_dump_queue_status(camel_folder_get_parent_store
> (folder))

Which returned

$2 = void

Am I doing something wrong?
Comment 8 Mehmet Giritli 2014-09-11 18:52:11 UTC
So, I now took the logs with Quick resync and listen for server change notifications turned off. The operation evolution stuck now is "storing folder blah blah". I guess you will see the details from the full logs I am attaching now.
Comment 9 Mehmet Giritli 2014-09-11 18:53:16 UTC
Created attachment 285946 [details]
CAMEL_DEBUG=imapx evolution &>log.txt
Comment 10 Mehmet Giritli 2014-09-11 18:53:45 UTC
Created attachment 285947 [details]
gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Comment 11 Milan Crha 2014-09-12 05:46:30 UTC
(In reply to comment #7)
> >    (gdb) p camel_imapx_store_dump_queue_status(imapx_store)
> 
> This retuned:
> 
> $1 = void

Both are returning void, but they print on stdout what the state of the connections is.
Comment 12 Milan Crha 2014-09-12 05:59:23 UTC
I think I see the issue. The first opened connection did as its last command:
> [imapx:A] I/O: 'A00019 UID STORE 5618 +FLAGS.SILENT (\SEEN)'

then there was opened another connection (you've setup to open up to 5 connections), because the first connection was busy and an expensive (FETCH) operation was initiated. This FETCH is finished properly, then the first connection is also finished, but with an error:
> [imapx:A] Data read failed with error 'Error receiving data: Connection timed out'

This leads to the connection close. There was opened the third (actually second active) connection to the server, on which another STORE is run. The second connection is also in a STORE command.

I guess that if you'll wait enough, then even the second connection will finish with the 'Connection timed out' error, the same as the first connection did. It seems to me that your server has disabled mutliple concurrent connections for some reason (by some way) and it kicks off previous connection when a new is established. Thus I would try to limit concurrent connections to 1 only. You can find the option in Edit->Preferences->Mail Accounts-><IMAPx account>->Edit->Receiving Options tab, Connection to Server sections, approximately in the middle of the window.
Comment 13 Mehmet Giritli 2014-09-13 10:07:06 UTC
(In reply to comment #11)
> (In reply to comment #7)
> > >    (gdb) p camel_imapx_store_dump_queue_status(imapx_store)
> > 
> > This retuned:
> > 
> > $1 = void
> 
> Both are returning void, but they print on stdout what the state of the
> connections is.

And how do I get this?
Comment 14 Mehmet Giritli 2014-09-13 10:08:13 UTC
(In reply to comment #12)
> I think I see the issue. The first opened connection did as its last command:
> > [imapx:A] I/O: 'A00019 UID STORE 5618 +FLAGS.SILENT (\SEEN)'
> 
> then there was opened another connection (you've setup to open up to 5
> connections), because the first connection was busy and an expensive (FETCH)
> operation was initiated. This FETCH is finished properly, then the first
> connection is also finished, but with an error:
> > [imapx:A] Data read failed with error 'Error receiving data: Connection timed out'
> 
> This leads to the connection close. There was opened the third (actually second
> active) connection to the server, on which another STORE is run. The second
> connection is also in a STORE command.
> 
> I guess that if you'll wait enough, then even the second connection will finish
> with the 'Connection timed out' error, the same as the first connection did. It
> seems to me that your server has disabled mutliple concurrent connections for
> some reason (by some way) and it kicks off previous connection when a new is
> established. Thus I would try to limit concurrent connections to 1 only. You
> can find the option in Edit->Preferences->Mail Accounts-><IMAPx
> account>->Edit->Receiving Options tab, Connection to Server sections,
> approximately in the middle of the window.

I reduced it to 1 connection only, rebooted and the same thing happening. No difference at all. Do you want me to get the same logs when connections are limited to 1?
Comment 15 Milan Crha 2014-09-15 09:01:52 UTC
(In reply to comment #13)
> > Both are returning void, but they print on stdout what the state of the
> > connections is.
> 
> And how do I get this?

They should be just printed at the console where evolution was run.

(In reply to comment #14)
> I reduced it to 1 connection only, rebooted and the same thing happening. No
> difference at all. Do you want me to get the same logs when connections are
> limited to 1?

Yes, please. The CAMEL_DEBUG log should show what it does. I suppose that one connection dies (on timeout) and only then another is created. If that's true, then the workaround would be to change the time limit for automatic new mail checks for lower than 5 minutes interval - as it looks like your server has set pretty low connection activity timeout.
Comment 16 Mehmet Giritli 2014-10-26 19:20:44 UTC
Hi and sorry for late reply. Well, I think I managed to get it all this time.

First, I have the following:

> The queue status may not explain "why and what", but it should show whether
> anything else didn't break for some reason. There might be shown that one of
> the two commands is active, probably the IDLE one, and another is pending,
> probably the refresh (FETCH).

Queue status for server 0x17de840: jobs:1 queued:0 active:1 done:0
      Content of 'jobs':
         [0] job:0x23830b0 (type:0x10 REFRESH_INFO) with pending commands:0
      Content of 'active':
         [0] command:0x7f8c380110e0 for job:0x23830b0 (type:0x10 REFRESH_INFO)
Comment 17 Mehmet Giritli 2014-10-26 19:22:29 UTC
Created attachment 289378 [details]
CAMEL_DEBUG=imapx evolution &>log.txt
Comment 18 Mehmet Giritli 2014-10-26 19:23:03 UTC
Created attachment 289379 [details]
gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Comment 19 Mehmet Giritli 2014-10-26 19:24:57 UTC
> (In reply to comment #14)
> > I reduced it to 1 connection only, rebooted and the same thing happening. No
> > difference at all. Do you want me to get the same logs when connections are
> > limited to 1?
> 
> Yes, please. The CAMEL_DEBUG log should show what it does. I suppose that one
> connection dies (on timeout) and only then another is created. If that's true,
> then the workaround would be to change the time limit for automatic new mail
> checks for lower than 5 minutes interval - as it looks like your server has set
> pretty low connection activity timeout.

Above two attachments are taken when the connections are limited to 1 and "listen to server change notifications" disabled.
Comment 20 Mehmet Giritli 2014-10-26 19:30:44 UTC
> ...If that's true,
> then the workaround would be to change the time limit for automatic new mail
> checks for lower than 5 minutes interval - as it looks like your server has set
> pretty low connection activity timeout.

But Milan, this is one of the most commonly used imap servers, it is working perfectly well with thunderbird and it also used to work with evolution until couple of versions ago. I think (but not sure) it was working with 3.8...

And I don't think that I messed with the server settings either. I don't think I touched any configuration settings of that server in years. So I think there is something else that we are missing here.
Comment 21 Milan Crha 2014-10-30 11:06:53 UTC
Thanks for the update. I made couple IMAP changes meanwhile, some partly related to this. For example, there could happen that the requests failed to finish due to some connectivity intermittent outage, but the IMAPx was not able to recognize it on its own (there was no timeout on the connection). It's only one example, there were done more changes (for 3.12.8+).

The backtrace is showing what you indicated above, there is a pending refresh request and a pending store request (to save changes in the folder to the server). It seems to me from the imapx log that the store request had being ongoing, but then the log suddenly stops, it seems to me like it being cut for some reason.

I'm thinking how to debug this in a way to make it easier for both of us. Would it be possible to get a test account on that server and see whether I would be able to reproduce it? In case you've the server reachable from the outside. If you have and you can create me the test account, please send me the IMAP/SMTP connection details privately, to the bugzilla email, with a reference to this bug report in the message subject, thus I'd not overlook it in my junk folder.
Comment 22 Mehmet Giritli 2014-10-30 12:21:03 UTC
Account details are sent over to your email. Thanks for looking into this Milan. Appreciated.
Comment 23 Milan Crha 2014-10-30 18:47:04 UTC
Also received and the account configured. Thank you for the help on this. Let's see whether I'll be able to reproduce it too.
Comment 24 Milan Crha 2014-10-31 14:11:58 UTC
Hmm, so I tried to reproduce this with and without the Quick Resync against your server with eds 3.12.7.1, but it works as expected. What I did:
- I have two messages in Inbox, and couple other messages in a created
  subfolder of the Inbox
- I enter the subfolder and mark few messages read and few unread (by clicking
  the envelope column)
- then I keep evolution untouched for several minutes (more than 5 minutes)
- then I move to Inbox - the changes are saved and the Inbox doesn't claim
  any issue too, nothing what would be shown in the UI or console

Can there be any connection issue on your route to the server? I know it's a stupid question and an unlikely condition, but as it works for me, and I connect from a different place, then it is one of the differences between your and mine environment.
Comment 25 Milan Crha 2014-11-28 10:22:34 UTC
*** Bug 733802 has been marked as a duplicate of this bug. ***
Comment 26 Mehmet Giritli 2015-01-18 17:35:37 UTC
Milan, just to answer a possibility you mentioned before: I tried this from a friend's home, a different router and internet connection, and I still got the problem, both on my laptop and PC.
Comment 27 Ben 2015-02-06 10:15:38 UTC
Hi, I seem to have the same issue. Some extra investigations: 
Issue appears on my Home Network, connected with an Ubee modem/router to my Internet Provider (Ziggo)

a) On several other networks (Hotel, 4G via my telephone) the issue does not appear.
b) Next I used another router (an Old Motorola) to my provider (Ziggo). Even than the issue does NOT appear. So working fine
c) Ziggo (provider) switched my router (Ubee) for a Cisco, but that didn't solve the issue. 
d) The old Motorla does not support IP6, so I switched of IP6 on my laptop but this did not solve the issue.

So it seems a combination of Evolution, probably laptop HW and Network components (Router - new network features?)

Using OpenSuse 13.2, Evolution 3.12.7, Network chipset: netlink BCM5787M gigabit Ethernet PCI Express chips, firmware tg3 version v3.137.
Comment 28 Milan Crha 2015-05-26 20:56:34 UTC
I believe this had been fixed for 3.16.0 at bug #720197 and with its followup fix for 3.16.3+ as well, thus I'm closing this as a duplicate of that bug report.

*** This bug has been marked as a duplicate of bug 720197 ***
Comment 29 Mehmet Giritli 2015-07-07 10:31:14 UTC
(In reply to Milan Crha from comment #28)
> I believe this had been fixed for 3.16.0 at bug #720197 and with its
> followup fix for 3.16.3+ as well, thus I'm closing this as a duplicate of
> that bug report.
> 
> *** This bug has been marked as a duplicate of bug 720197 ***

Milan, I'm sorry to say this but I'm still expriencing the same problems after a while of inactivity. Do you want me to get the same debug info with 16.3 as well?
Comment 30 Milan Crha 2015-07-07 16:10:22 UTC
Yes, please.
Comment 31 Milan Crha 2015-08-07 06:23:03 UTC
I made some more changes in the bug #745545, but only for 3.18.0 (to be released in the autumn). The changes were quite large, thus I do not want to break the stable release.
Comment 32 Mehmet Giritli 2015-08-16 13:31:46 UTC
So Milan I guess getting debug info is unnecessary now. I'll try to find live ebuilds for gentoo so that I can give this new code a try.
Comment 33 Milan Crha 2015-08-17 12:04:05 UTC
Correct. The code which is in 3.17.90 (development version) and will be in the 3.18.0+, is the one where the focus will be. And vice versa, 3.16.x is technically dead now.
Comment 34 Mehmet Giritli 2015-12-01 09:20:21 UTC
Milan, I have been trialling 3.18.2 for a week now and I can so far say that the bug is not present anymore. I think you can close the bug now.
Comment 35 Milan Crha 2015-12-02 13:56:10 UTC
Nice, thanks for the update. Please note that 3.18.2 has a bad bug #757789.