Bug 563954 – summary list order is 'random'

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 563954 - summary list order is 'random'


Summary:	summary list order is 'random'


Status:	RESOLVED FIXED

Product:	evolution
Classification:	Applications
Component:	Mailer
Version:	2.24.x (obsolete)
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Milan Crha
QA Contact:	Evolution QA team

URL:
Whiteboard:	evolution[disk-summary]

Duplicates:	554542 563123 580476 (view as bug list)
Depends on:
Blocks:	543389

Reported:	2008-12-10 09:12 UTC by oa
Modified:	2009-10-22 14:20 UTC

See Also:
GNOME target:	---
GNOME version:	2.23/2.24

Attachments
proposed eds patch (7.06 KB, patch) 2009-04-22 13:53 UTC, Milan Crha	rejected	Details \| Review
proposed eex patch (1.85 KB, patch) 2009-04-22 13:57 UTC, Milan Crha	committed	Details \| Review
proposed ema patch (1.57 KB, patch) 2009-04-22 13:59 UTC, Milan Crha	committed	Details \| Review
proposed eds patch ][ (4.50 KB, patch) 2009-04-22 15:11 UTC, Milan Crha	none	Details \| Review
proposed evo patch (933 bytes, patch) 2009-04-22 15:14 UTC, Milan Crha	committed	Details \| Review
proposed eds patch ]I[ (15.98 KB, patch) 2009-04-22 18:13 UTC, Milan Crha	none	Details \| Review
proposed eds patch IV (16.44 KB, patch) 2009-04-23 14:11 UTC, Milan Crha	committed	Details \| Review

Description oa 2008-12-10 09:12:06 UTC

Please describe the problem:
Another regression between 2.22 (and typical, expected mail summary behavior) and 2.24:

If a folder summary list is not specifically sorted by any particular column, the normal behavior of mailers, including Evo up to 2.22, is to show the messages in the arrival order (ie, what in an mbox-format mailbox would be the on-disk order). This is such an old convention that it should be considered de-facto standard.

Evo 2.24 will show them basically whatever order the sqlite backend returns them in. Worse yet, every time the summary is updated by eg deleting one message, the entire summary is reordered and the messages which one second ago were "next in list" may now be somewhere else, often "behind" the new selection. 

This basically makes it impossible to traverse efficiently through a busy INBOX. Sorting via sent date is not a good workaround, because sent dates are a) not necessarily correct as they depend on sender having accurate clock b) often out-of-order due to message delivery delays.


Steps to reproduce:


Actual results:


Expected results:


Does this happen every time?


Other information:

Comment 1 Srinivasa Ragavan 2008-12-10 16:28:40 UTC

Infact, things are according to the order of UID. Even though we store in db, the UIDS are ordered and fetched and they both are in the same order wrt 2.23, 2.24.

But anyways, lemme test out, how it looks.

Comment 2 Akhil Laddha 2008-12-31 02:57:45 UTC

*** Bug 563123 has been marked as a duplicate of this bug. ***

Comment 3 oa 2009-02-02 10:02:53 UTC

Installed 2.24.4 and tried again. Yes, you may be ordering by UID, but since you're storing UID as text instead of integer, that's pretty much meaningless:

sqlite> .schema INBOX
CREATE TABLE 'INBOX' (  uid TEXT PRIMARY KEY , flags INTEGER , msg_type INTEGER , read INTEGER , deleted INTEGER , replied INTEGER , important INTEGER , junk INTEGER , attachment INTEGER , msg_security INTEGER , size INTEGER , dsent NUMERIC , dreceived NUMERIC , subject TEXT , mail_from TEXT , mail_to TEXT , mail_cc TEXT , mlist TEXT , followup_flag TEXT , followup_completed_on TEXT , followup_due_by TEXT , part TEXT , labels TEXT , usertags TEXT , cinfo TEXT , bdata TEXT );

sqlite> select uid, date(dreceived,'unixepoch') from INBOX order by uid;
124876|2009-01-19
124877|2009-01-19
124879|2009-01-19
[snipsnip]
35896|2007-11-16
35973|2007-11-20
35975|2007-11-20
36027|2007-11-22

The more recent the folder is you're looking at, the more often digits roll over and the sorting gets screwed up.

Comment 4 oa 2009-02-02 10:16:34 UTC

But in fact, the UID order is not how an "unsorted" mailbox is shown, either. I did manage to narrow it down, though. If I select messages from the SQLite database without any ORDER BY clause, I get them in the same order as Evolution shows them; that is, a semi-random order which might be SQLite's preferred index order.

You should always set an ORDER BY column, which by default should either be ABS(uid) (which should convert them to integers and sort in numeric order, though I don't know if IMAP uids are guaranteed to by int), or by dreceived.

Comment 5 Srinivasa Ragavan 2009-02-02 16:43:53 UTC

Actually I was wrong initially. We get the uids and sort initially. But the message list does the search and picks the uid, which isn't sorted. I could do a simple fix to sort it, but it gonna result in slowness. I think I need to make the db-sort with a integer column. But I got the issue. THanks

Comment 6 Matthew Barnes 2009-02-02 17:27:23 UTC

I'm butting in totally out of context here, but you're aware of the built-in ROWID column, right?  Could you use that for the default sort order?

Comment 7 oa 2009-02-02 18:11:59 UTC

At least on my INBOX, order by rowid provides the exact same results as no order by clause at all; ie, semi-random.

Ordering by dreceived would make a whole lot of sense to me, since if this was an mbox-format folder, it would be a very good approximation of what would be the on-disk sequence. However, it's not INTEGER but NUMERIC, and seems to be slow to use for sort. Why is it NUMERIC? If the point was to have larger-than-32 bits values, then INTEGER(8) would have been the right choice, and NUMERIC nearly the worst possible choice:

http://sqlite.org/datatype3.html: "A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, an attempt is made to convert it to an integer or real number before it is stored."

Then again, I don't see why it isn't a DATETIME either, nor do I see why there are n tables with the same structure repeated for each folder rather than having a relation to a folder list table and just a single message list table..

Anyways, some timing data now that I ran it. Results stripped, each query ran three times in sequence to approximate cache heat and get some repeatable timings.

sqlite> select count(*) from INBOX;
902
sqlite> select uid,date(dsent,'unixepoch'),subject from INBOX;
...
CPU Time: user 0.012998 sys 0.006999
CPU Time: user 0.008999 sys 0.007998
CPU Time: user 0.014998 sys 0.006000
sqlite> select uid,date(dsent,'unixepoch'),subject from INBOX order by uid;
...
CPU Time: user 0.013998 sys 0.007999
CPU Time: user 0.015998 sys 0.006000
CPU Time: user 0.013998 sys 0.007998
sqlite> select uid,date(dsent,'unixepoch'),subject from INBOX order by dreceived;
...
CPU Time: user 0.026996 sys 0.004999
CPU Time: user 0.023996 sys 0.004999
CPU Time: user 0.024997 sys 0.005999

Comment 8 Milan Crha 2009-04-22 13:53:52 UTC

Created attachment 133112 [details] [review]
proposed eds patch

for evolution-data-server;

I really like the word "random". It's misused sometimes, like here, mostly means 'we didn't find the pattern/exact circumstances when this happen, thus we call it random'. Here the order isn't broken on sqlite side, but on camels. It stores infos based on the order given by GHashTable, which might seem as random to most people :)

With this patch, just change something on all messages in the folder, go out of the folder and back and it should be fine again. I thought it will require to bump table version to let it recreate rows, but it wasn't. I consider it as a bug, but let me investigate first.

Necessary patches to let this work fully and properly for eex and ema follows.

Comment 9 Milan Crha 2009-04-22 13:57:28 UTC

Created attachment 133113 [details] [review]
proposed eex patch

for evolution-exchange;

Comment 10 Milan Crha 2009-04-22 13:59:28 UTC

Created attachment 133114 [details] [review]
proposed ema patch

for evolution-mapi;

Comment 11 Matthew Barnes 2009-04-22 14:01:36 UTC

Might it be easier to use a GTree instead of a GHashTable?
GTree works like a hash table but keeps the keys sorted for you.

Comment 12 Srinivasa Ragavan 2009-04-22 14:22:13 UTC

Nope, it won't work for existing data. What about modification/inserts that are done in between? We don't rewrite entire summary as before . Just the modified items are written. Sqlite updates are costly, and so we delete+insert records. It wont work well. It should be done like described below.

When message list does a search for UIDS, in

message-list.c:regen_* 
udis = camel_folder_search_by_expression

it gets the UIDS. and adds as rows. These uid needs to be sorted depending on the backend. 

If mbox, the sorting should be based on 
 minfo->from_pos,
else if IMAP
 sort uids based on strtoul(minfo->uid)
and it varies based on providers.

Milan, it has to be done something like what you started. But this is the fix.

Do compare_uids. And in the providers, override them and compare based on what is reqd by the providers.

I hope Im clear.

Comment 13 Milan Crha 2009-04-22 14:25:17 UTC

(In reply to comment #11)
> Might it be easier to use a GTree instead of a GHashTable?
> GTree works like a hash table but keeps the keys sorted for you.

Nope, tried that the first time. If the provider doesn't subclass cmp_uids properly, then two different UIDs with the same decimal int prefix results in 0 when comparing, which in the GTree means getting rid of one of them. Thus I rather used the safe way.

Comment 14 Srinivasa Ragavan 2009-04-22 14:46:07 UTC

Sorry the last two patches would be still needed.

Comment 15 Milan Crha 2009-04-22 15:11:17 UTC

Created attachment 133119 [details] [review]
proposed eds patch ][

for evolution-data-server;

Just a relevant part left.

Comment 16 Milan Crha 2009-04-22 15:14:54 UTC

Created attachment 133121 [details] [review]
proposed evo patch

for evolution;

Hmm, I do not touch all the family often.

Comment 17 Milan Crha 2009-04-22 18:13:45 UTC

Created attachment 133130 [details] [review]
proposed eds patch ]I[

for evolution-data-server;

Final patch. The code in evo should stay as is, it's fine that with this change.

srag, consider using camel_folder_summary_ensure_infos_loaded, it's the correct way to ensure all infos are loaded in memory. You cannot do the same in the way you do that now, by comparing showuids->len with the summary cache size (loaded infos), as you do not know which ids are loaded and which not.

Comment 18 Milan Crha 2009-04-23 14:11:55 UTC

Created attachment 133180 [details] [review]
proposed eds patch IV

for evolution-data-server;

I hope this is the final resend. Added changes to mimic the actual "logic" in 
camel_folder_summary_ensure_infos_loaded though I believe it'll not work when changing from two totally different filters (half messages with one label and the other half with other). I also noticed the remove_cache is not called before the first reload_from_db call (or maybe some other function, I didn't check fully). Just a note, nothing more.

Notice also a little change in maildir and mbox sort_uids functions.

Comment 19 Srinivasa Ragavan 2009-04-24 13:58:17 UTC

Awesome.. to trunk. Just take care of GW case as we dicussed. Thanks a lot Milan

Comment 20 Milan Crha 2009-04-24 18:37:41 UTC

Created commit 6f92ee2 in eds master. (with a gw addon)
Created commit 70fe80d in evo master.
Created commit c1362a3 in eex master.
Created commit ae23535 in ema master.

Comment 21 Matthew Barnes 2009-04-27 14:22:46 UTC

The change here breaks Camel's ABI due to the new cmp_uids member of CamelFolderClass, which I'm fine with.  Just a reminder to bump the soname.

Comment 22 Milan Crha 2009-04-27 14:35:53 UTC

(In reply to comment #21)
> The change here breaks Camel's ABI due to the new cmp_uids member of
> CamelFolderClass, which I'm fine with.  Just a reminder to bump the soname.

I never did such thing before, when breaking object sizes, neither I know where to and how. I thought it's fine as the package version requirement on eds matches its own version.

Comment 23 Matthew Barnes 2009-04-27 14:55:45 UTC

The version requirement is fine for evo, but anything else linking to libcamel may need to be rebuilt.  I think you just need to bump LIBCAMEL_CURRENT from 14 to 15 and reset LIBCAMEL_REVISION to 0.

Comment 24 Srinivasa Ragavan 2009-04-27 15:00:20 UTC

Matt, I want to keep  off the version bumps and use the version dependcy. I dont think any one depends explicity on it.

Comment 25 Matthew Barnes 2009-04-27 15:03:15 UTC

If we're not going to keep the soname version accurate, why are we versioning it at all?

Comment 26 Matthew Barnes 2009-04-27 16:22:22 UTC

*** Bug 580476 has been marked as a duplicate of this bug. ***

Comment 27 Josselin Mouette 2009-04-27 16:29:42 UTC

(In reply to comment #24)
> Matt, I want to keep  off the version bumps and use the version dependcy. I
> dont think any one depends explicity on it.

At least totem-pl-parser depends on libcamel.

Please, when you change ABIs, change the sonames in the same way. Otherwise it makes our life as distributors a nightmare.

Comment 28 Srinivasa Ragavan 2009-04-27 17:17:50 UTC

Josselin, But totel-pl-parser would have no business with CamelFolder. It should be safe for it to use it. Beliveme, the totem would work seamlessly with both the versions on binary form. 

Matt, I want to bump versions only when there is a change, what everyone should adopt with. Disk summary was one such thing.

Comment 29 Milan Crha 2009-10-22 14:20:18 UTC

*** Bug 554542 has been marked as a duplicate of this bug. ***