GNOME Bugzilla – Bug 563954
summary list order is 'random'
Last modified: 2009-10-22 14:20:18 UTC
Please describe the problem: Another regression between 2.22 (and typical, expected mail summary behavior) and 2.24: If a folder summary list is not specifically sorted by any particular column, the normal behavior of mailers, including Evo up to 2.22, is to show the messages in the arrival order (ie, what in an mbox-format mailbox would be the on-disk order). This is such an old convention that it should be considered de-facto standard. Evo 2.24 will show them basically whatever order the sqlite backend returns them in. Worse yet, every time the summary is updated by eg deleting one message, the entire summary is reordered and the messages which one second ago were "next in list" may now be somewhere else, often "behind" the new selection. This basically makes it impossible to traverse efficiently through a busy INBOX. Sorting via sent date is not a good workaround, because sent dates are a) not necessarily correct as they depend on sender having accurate clock b) often out-of-order due to message delivery delays. Steps to reproduce: Actual results: Expected results: Does this happen every time? Other information:
Infact, things are according to the order of UID. Even though we store in db, the UIDS are ordered and fetched and they both are in the same order wrt 2.23, 2.24. But anyways, lemme test out, how it looks.
*** Bug 563123 has been marked as a duplicate of this bug. ***
Installed 2.24.4 and tried again. Yes, you may be ordering by UID, but since you're storing UID as text instead of integer, that's pretty much meaningless: sqlite> .schema INBOX CREATE TABLE 'INBOX' ( uid TEXT PRIMARY KEY , flags INTEGER , msg_type INTEGER , read INTEGER , deleted INTEGER , replied INTEGER , important INTEGER , junk INTEGER , attachment INTEGER , msg_security INTEGER , size INTEGER , dsent NUMERIC , dreceived NUMERIC , subject TEXT , mail_from TEXT , mail_to TEXT , mail_cc TEXT , mlist TEXT , followup_flag TEXT , followup_completed_on TEXT , followup_due_by TEXT , part TEXT , labels TEXT , usertags TEXT , cinfo TEXT , bdata TEXT ); sqlite> select uid, date(dreceived,'unixepoch') from INBOX order by uid; 124876|2009-01-19 124877|2009-01-19 124879|2009-01-19 [snipsnip] 35896|2007-11-16 35973|2007-11-20 35975|2007-11-20 36027|2007-11-22 The more recent the folder is you're looking at, the more often digits roll over and the sorting gets screwed up.
But in fact, the UID order is not how an "unsorted" mailbox is shown, either. I did manage to narrow it down, though. If I select messages from the SQLite database without any ORDER BY clause, I get them in the same order as Evolution shows them; that is, a semi-random order which might be SQLite's preferred index order. You should always set an ORDER BY column, which by default should either be ABS(uid) (which should convert them to integers and sort in numeric order, though I don't know if IMAP uids are guaranteed to by int), or by dreceived.
Actually I was wrong initially. We get the uids and sort initially. But the message list does the search and picks the uid, which isn't sorted. I could do a simple fix to sort it, but it gonna result in slowness. I think I need to make the db-sort with a integer column. But I got the issue. THanks
I'm butting in totally out of context here, but you're aware of the built-in ROWID column, right? Could you use that for the default sort order?
At least on my INBOX, order by rowid provides the exact same results as no order by clause at all; ie, semi-random. Ordering by dreceived would make a whole lot of sense to me, since if this was an mbox-format folder, it would be a very good approximation of what would be the on-disk sequence. However, it's not INTEGER but NUMERIC, and seems to be slow to use for sort. Why is it NUMERIC? If the point was to have larger-than-32 bits values, then INTEGER(8) would have been the right choice, and NUMERIC nearly the worst possible choice: http://sqlite.org/datatype3.html: "A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, an attempt is made to convert it to an integer or real number before it is stored." Then again, I don't see why it isn't a DATETIME either, nor do I see why there are n tables with the same structure repeated for each folder rather than having a relation to a folder list table and just a single message list table.. Anyways, some timing data now that I ran it. Results stripped, each query ran three times in sequence to approximate cache heat and get some repeatable timings. sqlite> select count(*) from INBOX; 902 sqlite> select uid,date(dsent,'unixepoch'),subject from INBOX; ... CPU Time: user 0.012998 sys 0.006999 CPU Time: user 0.008999 sys 0.007998 CPU Time: user 0.014998 sys 0.006000 sqlite> select uid,date(dsent,'unixepoch'),subject from INBOX order by uid; ... CPU Time: user 0.013998 sys 0.007999 CPU Time: user 0.015998 sys 0.006000 CPU Time: user 0.013998 sys 0.007998 sqlite> select uid,date(dsent,'unixepoch'),subject from INBOX order by dreceived; ... CPU Time: user 0.026996 sys 0.004999 CPU Time: user 0.023996 sys 0.004999 CPU Time: user 0.024997 sys 0.005999
Created attachment 133112 [details] [review] proposed eds patch for evolution-data-server; I really like the word "random". It's misused sometimes, like here, mostly means 'we didn't find the pattern/exact circumstances when this happen, thus we call it random'. Here the order isn't broken on sqlite side, but on camels. It stores infos based on the order given by GHashTable, which might seem as random to most people :) With this patch, just change something on all messages in the folder, go out of the folder and back and it should be fine again. I thought it will require to bump table version to let it recreate rows, but it wasn't. I consider it as a bug, but let me investigate first. Necessary patches to let this work fully and properly for eex and ema follows.
Created attachment 133113 [details] [review] proposed eex patch for evolution-exchange;
Created attachment 133114 [details] [review] proposed ema patch for evolution-mapi;
Might it be easier to use a GTree instead of a GHashTable? GTree works like a hash table but keeps the keys sorted for you.
Nope, it won't work for existing data. What about modification/inserts that are done in between? We don't rewrite entire summary as before . Just the modified items are written. Sqlite updates are costly, and so we delete+insert records. It wont work well. It should be done like described below. When message list does a search for UIDS, in message-list.c:regen_* udis = camel_folder_search_by_expression it gets the UIDS. and adds as rows. These uid needs to be sorted depending on the backend. If mbox, the sorting should be based on minfo->from_pos, else if IMAP sort uids based on strtoul(minfo->uid) and it varies based on providers. Milan, it has to be done something like what you started. But this is the fix. Do compare_uids. And in the providers, override them and compare based on what is reqd by the providers. I hope Im clear.
(In reply to comment #11) > Might it be easier to use a GTree instead of a GHashTable? > GTree works like a hash table but keeps the keys sorted for you. Nope, tried that the first time. If the provider doesn't subclass cmp_uids properly, then two different UIDs with the same decimal int prefix results in 0 when comparing, which in the GTree means getting rid of one of them. Thus I rather used the safe way.
Sorry the last two patches would be still needed.
Created attachment 133119 [details] [review] proposed eds patch ][ for evolution-data-server; Just a relevant part left.
Created attachment 133121 [details] [review] proposed evo patch for evolution; Hmm, I do not touch all the family often.
Created attachment 133130 [details] [review] proposed eds patch ]I[ for evolution-data-server; Final patch. The code in evo should stay as is, it's fine that with this change. srag, consider using camel_folder_summary_ensure_infos_loaded, it's the correct way to ensure all infos are loaded in memory. You cannot do the same in the way you do that now, by comparing showuids->len with the summary cache size (loaded infos), as you do not know which ids are loaded and which not.
Created attachment 133180 [details] [review] proposed eds patch IV for evolution-data-server; I hope this is the final resend. Added changes to mimic the actual "logic" in camel_folder_summary_ensure_infos_loaded though I believe it'll not work when changing from two totally different filters (half messages with one label and the other half with other). I also noticed the remove_cache is not called before the first reload_from_db call (or maybe some other function, I didn't check fully). Just a note, nothing more. Notice also a little change in maildir and mbox sort_uids functions.
Awesome.. to trunk. Just take care of GW case as we dicussed. Thanks a lot Milan
Created commit 6f92ee2 in eds master. (with a gw addon) Created commit 70fe80d in evo master. Created commit c1362a3 in eex master. Created commit ae23535 in ema master.
The change here breaks Camel's ABI due to the new cmp_uids member of CamelFolderClass, which I'm fine with. Just a reminder to bump the soname.
(In reply to comment #21) > The change here breaks Camel's ABI due to the new cmp_uids member of > CamelFolderClass, which I'm fine with. Just a reminder to bump the soname. I never did such thing before, when breaking object sizes, neither I know where to and how. I thought it's fine as the package version requirement on eds matches its own version.
The version requirement is fine for evo, but anything else linking to libcamel may need to be rebuilt. I think you just need to bump LIBCAMEL_CURRENT from 14 to 15 and reset LIBCAMEL_REVISION to 0.
Matt, I want to keep off the version bumps and use the version dependcy. I dont think any one depends explicity on it.
If we're not going to keep the soname version accurate, why are we versioning it at all?
*** Bug 580476 has been marked as a duplicate of this bug. ***
(In reply to comment #24) > Matt, I want to keep off the version bumps and use the version dependcy. I > dont think any one depends explicity on it. At least totem-pl-parser depends on libcamel. Please, when you change ABIs, change the sonames in the same way. Otherwise it makes our life as distributors a nightmare.
Josselin, But totel-pl-parser would have no business with CamelFolder. It should be safe for it to use it. Beliveme, the totem would work seamlessly with both the versions on binary form. Matt, I want to bump versions only when there is a change, what everyone should adopt with. Disk summary was one such thing.
*** Bug 554542 has been marked as a duplicate of this bug. ***