GNOME Bugzilla – Bug 362638
EThread is obsolete
Last modified: 2007-12-20 18:02:48 UTC
EThread is a data structure in libedataserver that provides various means of dispatching messages asynchronously, using a single thread or a pool of threads. It was written before GLib provided GThreadPool [1], and based on my analysis of EThread usage in Evolution and Evolution-Data-Server, I think it's time to remove or at least deprecate EThread in favor of GThreadPool. EThread provides three modes of operation, but Evolution only uses two: E_THREAD_QUEUE A single thread processes messages as they arrive on an asynchronous queue. This guarantees that messages are processed in the order they arrive. Evolution never limits the size of the queue. E_THREAD_NEW Multiple threads process messages as they arrive on an asynchronous queue. Messages spend less time waiting on the queue, but processing is unordered. Evolution uses a single EThread in E_THREAD_NEW mode in mail-mt.c, where it allocates 10 threads to process mail messages. All other uses of EThread are in E_THREAD_QUEUE mode with unlimited queues. GThreadPool can easily cover these use cases by setting its max_threads limit to 1 to simulate E_THREAD_QUEUE behavior, and some N > 1 to simulate E_THREAD_NEW behavior. GThreadPool's callback function is equivalent to EThread's "received" callback, except the function will need to issue a reply or destroy the message itself as appropriate. [1] GThreadPool has been available since (at least) GLib 2.0. http://developer.gnome.org/doc/API/2.0/glib/glib-Thread-Pools.html
Created attachment 74829 [details] [review] Proposed patch for Evolution In this patch, I use a simple pattern for creating thread pools: Each thread pool has its own "push message" function. In each of those functions I use a static GOnce to create a private GThreadPool on-demand, and then just let the thread(s) run forever. They're all session-long threads so I didn't see much point in explicitly shutting them down.
Created attachment 74830 [details] [review] Proposed patch for Evolution-Data-Server This patch modifies CamelSession, and also removes EThread from libedataserver. The latter change can be snipped from the patch if we decide to just deprecate EThread.
Oops, a typo slipped into attachment #74829 [details] (Evolution). "mail_msg_unordered_t" should read "mail_msg_unordered_push" in the following patch hunk: Index: mail/mail-ops.c =================================================================== RCS file: /cvs/gnome/evolution/mail/mail-ops.c,v retrieving revision 1.459 diff -u -p -r1.459 mail-ops.c --- mail/mail-ops.c 16 Oct 2006 14:25:28 -0000 1.459 +++ mail/mail-ops.c 16 Oct 2006 16:37:40 -0000 @@ -426,7 +426,7 @@ mail_fetch_mail (const char *source, int if (status) camel_filter_driver_set_status_func (fm->driver, status, status_data); - e_thread_put (mail_thread_new, (EMsg *)m); + mail_msg_unordered_t ((mail_msg_t *) m); } /* ********************************************************************** */
Awesome work, Mathew. In my original tasklist, I had similar TODO wherein, I planned to replace EThread with an implementation that uses GThreadPool and implements a priority queue. How about adding a priorty-based-queue handling for messages of type E_THREAD_QUEUE? This ensures that operations are processed in expected order, even if the order is not same in the queue. This closes couple of loop-holes that result in a race condition.
That's a great idea. Would it be sufficient to just use a simple high-priority boolean flag or do we need something more sophisticated? This should be easy to implement. Just embed the priority directly in the base message structure (CamelSessionThreadMsg, mail_msg_t, etc.), and then provide a suitable sort function for each GThreadPool (by way of g_thread_pool_set_sort_function). Also, I'm curious. What are some examples of loop-holes that cause a race condition?
Created attachment 75469 [details] [review] Amended patch for Evolution This patch adds a few enhancements to the previous one: 1) Adds a 'priority' field to mail_msg_t (type gint). Messages default to priority 0, and the mail queues will prioritize them numerically. Just set the 'priority' field appropriately before pushing the message. Note that I'm not actually setting priorities anywhere. I've just implemented the mechanism for Varadhan so he can fix the race conditions he mentioned. 2) I wanted to get away from polling a FIFO pipe for internal messages and trying to keep it in sync with an asynchronous queue, which seems overly complex and brittle. So I've replaced that mechanism with a function that gets called periodically from the main loop. The function checks for and processes any messages that are waiting on what used to be the "GUI" and "reply" EMsgPorts, but are now just asynchronous queues. The periodic function gets called at 10 Hz, which seems to yield pretty good reponsiveness in the GUI without much overhead. You can tune this by changing PERIODIC_RATE_HZ in mail-mt.c. I've also added a new function for pushing messages onto the "GUI" (or main) thread queue: void mail_msg_main_thread_push (mail_msg_t *msg) 3) It looks like the only reason mail_gui_thread was exposed was so that it could be compared to the current thread by calling pthread_equal(). I've provided a more convenient way of doing that: gboolean mail_in_main_thread (void)
Thanks for the patch, Mathew. Will update the bug with my observations by this week.
Created attachment 77744 [details] [review] Revised patch for evolution-data-server In this revision, instead of removing the EThread code I tagged it as deprecated by wrapping the relevant declarations in a preprocessor conditional like so: #ifndef EDS_DISABLE_DEPRECATED /* ... EMutex declarations ... */ #endif This is similar to what other GNOME libraries do for deprecated symbols.
Varadhan: Ping... Ever get a chance to look at this again?
(In reply to comment #9) > Varadhan: Ping... Ever get a chance to look at this again? > I have committed the EDS part of the fix to HEAD. Will look into Evo part soon. Thanks for the patches. :)
Created attachment 82646 [details] [review] Revised patch for Evolution A downstream bug [1] revealed a flaw in my Evolution patch. The bug was that choosing either response on the dialog that asks the user whether to accept a certificate caused Evolution to crash. Here's what the problem was and how I fixed it: The periodic_processing() function introduced by the patch was intended to replace both GUI EMsgPorts (mail_gui_port and mail_gui_port2), but I overlooked the fact that they have different semantics. Sending a message to either GUI port triggered the message's receive_msg() callback (if it exists). But then mail_gui_port tried to trigger a reply via either the message's reply port or its reply callback, whereas mail_gui_port2 left it to the receive_msg() callback to issue the reply. The periodic_process() function mimicked the mail_gui_port behavior, which caused functions that relied on mail_gui_port2's behavior to malfunction. alert_user() in mail-session.c is the only function that used mail_gui_port2. This function, along with its callback function do_user_message(), displays a dialog window to the user and registers another callback function -- user_message_response() -- to handle the user's response. user_message_response() issues a reply over the message's reply port, which is picked up by the waiting alert_user() function. This chain of events requires mail_gui_port2's behavior of not issuing a reply message itself. The only other function that used either mail_gui_port or mail_gui_port2, AND submitted messages with a reply port defined is mail_call_main(). It, and its callback function do_call(), expected the mail_gui_port to automatically issue reply messages. But one can easily make this compatible with mail_gui_port2's behavior, but having do_call() issue its own reply messages. That, along with rewriting the periodic_processing() logic to not issue reply messages (similar to mail_gui_port2's behavior), was the solution to the bug. So only do_call() and periodic_processing() were changed in this revision. [1] http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=220714
I verified the patch in comment #11 applies cleanly to Subversion trunk and that Evolution successfully compiles with EDS_DISABLE_DEPRECATED defined. Still waiting for a review of the Evolution work.
Created attachment 85410 [details] [review] Corrected patch for Evolution It seems some changes to mail/em-camel-stream.c somehow got lost in my previous patch. The patch replaced an "if" expression in a few places: - if (pthread_self() == mail_gui_thread) + if (mail_in_main_thread ()) This patch restores the mail/em-camel-stream.c changes.
Okay, so apparently em-camel-stream.c is not distributed. 2003-09-03 Not Zed <NotZed@Ximian.com> * em-camel-stream.[ch]: removed. Which means that any subsequent maintenance on these files was wasted time and effort. These and any other source files that we don't distribute should be deleted. The code we _do_ distribute is big and complex enough as it is without having these dead weeds littering the repository.
Created attachment 85582 [details] [review] Revised patch for Evolution At the risk of letting the scope of this bug creep a little, this revision makes the following additional changes: - Use an EFlag [1] instead of an EMsgPort when synchronizing mail messages between threads. This alters the definition of mail_msg_t slightly in that it no longer "inherits" from EMsg. - Refactor and simplify EMSyncStream. Use an EFlag [1] and an idle callback to synchronize message passing, instead of two EMsgPorts and a GIOChannel. This makes the threading logic a lot easier to understand. Varadhan: Ping, can you review this please? [1] EFlag was proposed in bug #415891. It's more light-weight than an EMsgPort and makes thread synchronization more intuitive.
Created attachment 86149 [details] [review] Yet another revision for Evolution Ray Strode flushed out another bug (hopefully the last) when his IMAP account approached quota and the server started issuing warning messages [1]. The bug was hiding in a rarely used code path in that pesky do_user_message() function. This revision corrects the bug. [1] http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=235096
Changing the product since only the Evolution patch remains.
FYI: Attachment #77744 [details] is in Tinymail's camel-lite (verified)
Created attachment 88358 [details] [review] Revised patch for Evolution This will hopefully be the final revision of the Evolution patch for this bug. There's been a lot of revisions here so let me quickly recap what all I've done until now, then I'll review what this revision adds. - The primary goal is to eliminate the use of EThread, which is now deprecated in libedataserver. GLib provides GThreadPool, which is a suitable replacement for EThread. - The various message queues are now priority-queues (as requested in comment #4), though the patch does not utilize this capability. - As the patch evolved, a secondary goal emerged which was to substitute all remaining uses of EMsgPort in Evolution with the new EFlag structure in libedataserver (bug #415891). This secondary goal led to some refactoring work in EMSyncStream to eliminate the last EMsgPort there. See comment #15. - Fixed a couple bugs reported by Fedora users along the way. See comment #11 and comment #16. This revision adds the following: - All of the reported bugs in this patch have been related to message ownership and trying to figure out who should free the message and when. It occurred to me that the solution to these kind of ownership problems is to use reference counting, so I added a reference count to each message and modified the API accordingly. Now, instead of calling mail_msg_free() we call mail_msg_unref(). This simplifies the parts of the logic that were causing all the bugs. - Add an EFlag pointer only to the message types that actually need one. Previously I had added an EFlag pointer to the base mail message struct. This eliminates a parameter in mail_msg_new(). - Store the message struct size in the same structure as the message callback function pointers. This eliminates another parameter in mail_msg_new(). - Define type names for each of the mail message callback functions (e.g. MailMsgExecFunc, MailMsgFreeFunc, etc). This eliminates a lot of needless type casting in the code and makes things a bit more readable. - Use idle callbacks instead of the 10 Hz timers I added in comment #6. This should satisfy bug #438817, though I'm not using Linux PowerTOP's Evolution patch directly. - Refactor the mail_cancel_hook_*() functions to use a GHookList instead of an EDList. To be honest, this change has nothing to do with EThread but it was adjacent to code I was already hacking on and I gave in to the temptation to tack it on. We can eliminate this part or break it into a separate bug if desired, but it's pretty straight-forward. I'd really like to get this into 2.11.3 if possible so that it gets plenty of testing before 2.12 ships.
2.11.3 sounds good, the sooner, the better.
Created attachment 89488 [details] [review] Revised patch for Evolution A small reference counting error in mail-mt.c:mail_call_main() was causing a bug reported downstream [1] where Evolution would hang on exit if the Send/Receive button was clicked during the session (probably by other triggers too). This revision fixes the problem. [1] http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240507
Created attachment 93752 [details] [review] Revised patch for Evolution - sync'ed with latest Subversion
Created attachment 94588 [details] [review] Revised patch fixes bug #357175 This patch modifies the alert_user() and user_message_exec() functions in mail-session.c to fix bug #357175. I think I finally have the reference counting correct for these two very tricky functions. I also discovered that the non-cancellable dialogs were not calling user_message_response(), and so many of the queued dialogs were never getting a chance to run. That was why Evolution was hanging on exit for many users.
varadhan: ping Do you think you can find time to review this for Evo 2.13.1 (or 2.21.1) ? I realize it's a large patch but we shipped it in Fedora 7 and I'm pretty confident that all the major bugs have been shaken out. Let me know if you have any questions or reservations.
See comment #19 for a summary of the changes. Subsequent comments were just about reference counting bugs.
Created attachment 98128 [details] [review] Revised patch for Evolution 2.21.1 Would be nice to get this in for 2.21.2...
Created attachment 101131 [details] [review] Revised patch for Evolution 2.21.4 Merged with Srini's non-intrusive error feature.
Should get in not later than 2.21.5. I'm favor of getting this getting in
...then please commit it NOW so we have two weeks to beta-test and fix any regressions. also please announce the commit to gnome-i18n@ - the patch adds new strings and we are in string change announcement period.
The em-filter-i18n.h hunk should not be there and I'll remove it before I commit the patch. I don't know how that booger keeps getting in there. Apart from that I don't think I changed any strings. In fact I recall making it a point not to. I'll double-check though, of course. Srini asked me to have Milan smoketest this and I'll do so, but really he and I and the rest of the Fedora community have already been using it for months since it shipped in Fedora 7 and 8. I had to modify the patch a bit yesterday to accommodate Srini's error reporting changes, so I'll give that a couple days to bake in Rawhide and have Milan scrutinize the code in the meantime. Then if all goes well, commit later this week. Sound good?
Matt, perfect!
(In reply to comment #30) > Apart from that I don't think I changed any strings. ah, okay, just compared again against an updated pot file, you're right.
Created attachment 101200 [details] [review] Same patch, sans the em-filter-i18n.h hunk Just so I don't forget...
With this patch, I got this crash every time I start Evolution (aka I cannot start it). Without the patch it works just fine. I found that you changed mail/mail-mt.c:end_event_callback prototype too much, it should be same as before. (the "void *data" part should not be there, the function is called with only 3 parameters). Regarding to this, mail-mt.c: line 261: Is it possible that there is activity_id=0 and the error is set? In that case, the error is not initialized properly and it will crash/leak a bit later probably, so the cleaner solution can be: ... } else if (mail_msg->priv->activity_id) { activity_id = mail_msg->priv->activity_id; ... mail/mail-ops.c: em_filter_folder_element_filter: From my point of view, the check for non-NULL is not necessary and can be removed here: + camel_folder_sync (folder, FALSE, camel_exception_is_set (&m->base.ex) ? NULL : &m->base.ex); mail/mail-ops.c: fetch_mail_fetch: do the check with camel_exception_is_set, not with ex.id comparision, (Fejj told me it's not the evo style, when I did almost the same in other bug/patch.) prepare_offline_desc (struct _set_offline_msg *m, int done) - remove that "int done" parameter, as you did everywhere else. I also noticed your g_assert calls. You know, srag needs that as something non-crashing. I would personally prefer g_slice_new0, instead of g_slice_new (just because of future possible enhancement of the structure and in case of forgotten initialization on one place in the code by anyone), so if you like (aka not mandatory), you can change it to g_slice_new0. All other things seems to be fine for me, so after you fix things above, let it be in trunk and see what will users report.
(In reply to comment #34) > With this patch, I got this crash every time I start Evolution (aka I cannot > start it). Without the patch it works just fine. I think that was the extra parameter in end_event_callback(). I fixed in Fedora but I guess I forgot to fix it in the upstream patch. > I found that you changed mail/mail-mt.c:end_event_callback prototype too much, > it should be same as before. (the "void *data" part should not be there, the > function is called with only 3 parameters). Fixed. > Regarding to this, mail-mt.c: line 261: Is it possible that there is > activity_id=0 and the error is set? In that case, the error is not > initialized properly and it will crash/leak a bit later probably, so the > cleaner solution can be: > ... > } else if (mail_msg->priv->activity_id) { > activity_id = mail_msg->priv->activity_id; > ... Not sure, that's Srini's logic and I didn't mess with it. Let's deal with that in the context of his non-intrusive errors work rather than here. > mail/mail-ops.c: em_filter_folder_element_filter: From my point of view, the > check for non-NULL is not necessary and can be removed here: > + camel_folder_sync (folder, FALSE, camel_exception_is_set > (&m->base.ex) ? NULL : &m->base.ex); The check is to make sure an existing error doesn't get overwritten. > mail/mail-ops.c: fetch_mail_fetch: do the check with camel_exception_is_set, > not with ex.id comparision, (Fejj told me it's not the evo style, when I did > almost the same in other bug/patch.) Fixed. > prepare_offline_desc (struct _set_offline_msg *m, int done) - remove that > "int done" parameter, as you did everywhere else. Fixed. > I also noticed your g_assert calls. You know, srag needs that as something > non-crashing. Fixed. > I would personally prefer g_slice_new0, instead of g_slice_new (just because > of future possible enhancement of the structure and in case of forgotten > initialization on one place in the code by anyone), so if you like (aka not > mandatory), you can change it to g_slice_new0. Fixed. Thanks for the review! It'll be nice to finally get this off my hands. A quick smoketest shows the modified patch is working okay for me. I'll write up the ChangeLogs and commit this.
(In reply to comment #35) > > > mail/mail-ops.c: em_filter_folder_element_filter: From my point of view, the > > check for non-NULL is not necessary and can be removed here: > > + camel_folder_sync (folder, FALSE, camel_exception_is_set > > (&m->base.ex) ? NULL : &m->base.ex); > > The check is to make sure an existing error doesn't get overwritten. > Oh, my fault, I didn't realize it as opposite check and I misread brackets. Oops.
Committed to trunk (revision 34730).