After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 771896 - Improve performance in CoreCache handling
Improve performance in CoreCache handling
Status: RESOLVED WONTFIX
Product: banshee
Classification: Other
Component: general
git master
Other All
: Normal normal
: ---
Assigned To: Banshee Maintainers
Banshee Maintainers
gnome[unmaintained]
Depends on:
Blocks:
 
 
Reported: 2016-09-23 19:14 UTC by Nicolas
Modified: 2020-03-17 10:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch that avoids performance degradation potentizing with increasing number of tracks (1.07 KB, text/plain)
2016-10-02 14:55 UTC, Nicolas
  Details
Patch that avoids performance degradation potentizing with increasing number of tracks (1.04 KB, patch)
2016-10-13 17:36 UTC, Nicolas
none Details | Review
Patch that avoids performance degradation potentizing with increasing number of tracks (1.07 KB, patch)
2016-10-19 17:16 UTC, Nicolas
none Details | Review

Description Nicolas 2016-09-23 19:14:41 UTC
Hi,

this is a request for a performance improvement at populating the temporary CoreCache table at startup with ModelID 9.

For me, this takes more than 12 seconds and it is done 3 times at start!

[4 Debug 20:27:07.706] Executed in 12581ms 
                    DELETE FROM CoreCache WHERE ModelID = 9;
                        INSERT INTO CoreCache (ModelID, ItemID) SELECT 9, CoreTracks.TrackID 
                FROM (SELECT MIN(CoreTracks.TrackID) AS TrackID, CoreTracks.Year FROM CoreTracks GROUP BY CoreTracks.Year) AS CoreTracks
                WHERE CoreTracks.Year IN
                    (SELECT CoreTracks.Year FROM CoreTracks, CoreCache
                        WHERE CoreCache.ModelID = 71 AND
                              CoreCache.ItemID = CoreTracks.TrackID )
                    ORDER BY Year

Can s. o. please fix this with adding the DISTINCT keyword to the subselect that lists all available years:

DELETE FROM CoreCache WHERE ModelID = 9;
INSERT INTO CoreCache (ModelID, ItemID)
	SELECT 9, CoreTracks.TrackID FROM (SELECT MIN(CoreTracks.TrackID) AS TrackID, CoreTracks.Year FROM CoreTracks GROUP BY CoreTracks.Year) AS CoreTracks
    WHERE CoreTracks.Year IN (SELECT DISTINCT CoreTracks.Year FROM CoreTracks, CoreCache WHERE CoreCache.ModelID = 71 AND CoreCache.ItemID = CoreTracks.TrackID)
    ORDER BY Year;

(I replayed all sql statements that were printed to the console manually in a sql client and the execution time fell to zero for this statement after adding DISTINCT)

Thanks,
Nicolas
Comment 1 Nicolas 2016-10-02 14:55:32 UTC
Created attachment 336760 [details]
Patch that avoids performance degradation potentizing with increasing number of tracks

While my proposal improved the performance flaw at startup, I found another proposal that improved the performance at rescanning the library as well (there my improvement did not worked).

For avoiding the need to make major modifications to the banshee code architecture, I altered the proposal by Marcello Perathoner from this bug comment https://bugzilla.gnome.org/show_bug.cgi?id=740879#c28 to fit into the ReloadFragmentFormat variable (FROM part only of the query).

All the delays are gone now!
So please incorporate the patch into master :)
Comment 2 Andrés G. Aragoneses (IRC: knocte) 2016-10-04 03:27:50 UTC
Hi Nicolas thanks for your patch!

I'm wondering why this bug is titled "by using DISTINCT", but your patch doesn't add DISTINCT at all?
Comment 3 Nicolas 2016-10-04 10:39:10 UTC
Hi Andrés,

the title is obsolete as my first improvement only added the DISTINCT keyword to the original query, but the second improvement based on a completely rewritten query by Marcello Perathoner that used the GROUP BY mechanism to reduce all records in the result set to one per year.
Comment 4 Andrés G. Aragoneses (IRC: knocte) 2016-10-04 10:50:41 UTC
(In reply to Nicolas from comment #3)
> ...but the second improvement...

Second improvement? Oh, so you're going to provide two patches here?
Comment 5 Nicolas 2016-10-04 12:51:59 UTC
(In reply to Andrés G. Aragoneses (IRC: knocte) from comment #4)
> (In reply to Nicolas from comment #3)
> > ...but the second improvement...
> 
> Second improvement? Oh, so you're going to provide two patches here?

Nooo ... noooo ... the first improvement was written into the original description only.

The story (optional):
At that time I worked on the debug output on the console only (where all sql queries are printed out) and I hadn't identified the source code class where this comes from - meanwhile I teached myself some banshee code architecture stuff and therefore I a) found the class responsible for these queries which allowed me to create the patch and b) was able to really compile banshee (with some stuff deactivated)
Comment 6 Adam Szmigin 2016-10-09 23:25:29 UTC
(In reply to Nicolas from comment #5)
> At that time I worked on the debug output on the console only (where all sql
> queries are printed out) and I hadn't identified the source code class where
> this comes from - meanwhile I teached myself some banshee code architecture
> stuff and therefore I a) found the class responsible for these queries which
> allowed me to create the patch and b) was able to really compile banshee
> (with some stuff deactivated)

Hi Nicolas,

I am able to confirm this behaviour (slow reload SQL statements from DatabaseYearListModel) on version 2.6.3+sarah, which is the version currently provided by my distro, Linux Mint 18:
http://packages.linuxmint.com/pool/upstream/b/banshee/

I was able to rebuild from source to test your patch, but I don't think it's quite right.  Two issues:
* You forgot to select CoreTracks.Year in the SELECT statement
* You left off the 0th positional parameter {0} after "FROM .. CoreCache"

I used:

    ReloadFragmentFormat = @"
        FROM (SELECT MIN(CoreTracks.TrackID) AS TrackID, CoreTracks.Year
              FROM CoreTracks, CoreCache{0}
              WHERE (CoreCache.ModelID = {1}) AND CoreCache.ItemID = {2} {3}
              GROUP BY Year) AS CoreTracks";

Note I also removed {4} from after the WHERE keyword, but that is because the Linux Mint version doesn't have code for sqlite's likely/unlikely query hints; I don't think there is any problem regarding {4} in your patch.

Once the above issues were corrected, I could verify that the new SQL is indeed much more efficient, and I didn't see any further slowdown on the cache reloads for tracks by year.

Kind regards,

Adam


Side note: Apologies for referencing a distro-specific codebase in this reply.  I am trying to replicate the problem on the stable-2.6 branch from https://git.gnome.org/browse/banshee/ but I am also experiencing other issues when building stable-2.6 that lead me to not trust the results at present.
Comment 7 Andrés G. Aragoneses (IRC: knocte) 2016-10-10 09:25:33 UTC
Comment on attachment 336760 [details]
Patch that avoids performance degradation potentizing with increasing number of tracks

Thanks for the review Adam! Marking patch as such.
Comment 8 Nicolas 2016-10-12 18:39:26 UTC
Hi Adam,

thank you for your great feedback!

(In reply to Adam Szmigin from comment #6)
> I am able to confirm this behaviour (slow reload SQL statements from
> DatabaseYearListModel) on version 2.6.3+sarah, which is the version
> currently provided by my distro, Linux Mint 18:
> http://packages.linuxmint.com/pool/upstream/b/banshee/
This is my distro as well. I encountered and investigated the issue with the distribution package version first. Then I fetched the source code from git-master to check whether it is improved there already - but it isn't.

> I was able to rebuild from source to test your patch, but I don't think it's
> quite right.  Two issues:
> * You forgot to select CoreTracks.Year in the SELECT statement
Are you sure that this field is needed there? I admit that I always have to investigate a lot of the banshee code architecture before I am able to identify anything, cause it is quite complex - but I found no usage of the CoreTracks.Year field anywhere.
In fact, the only usages of this specific query fragment seem to be in SqliteModelCache.cs:337, where it is appended to reload_sql that is only populated in the same class at lines 123, 137 and 155. From these lines, only 155 seem to apply on the DatabaseYearListModel, as the conditions around lines 123 and 137 are DatabaseYearListModel.CachesJoinTableEntries and DatabaseYearListModel.CachesValues which are both false. In line 155, the outer SELECT of our fragment ist build with the variables uid and provider.PrimaryKey only - no year here.

But maybe I missed something somewhere in the overloaded derived landscape? Though I scanned them as well, I can not tell for sure.

The year information for populating the UI comes from the DatabaseYearInfo : YearInfo class that seems to be populated by an additional select.

> * You left off the 0th positional parameter {0} after "FROM .. CoreCache"
Ouh, I simply missed this, sorry!

> Once the above issues were corrected, I could verify that the new SQL is
> indeed much more efficient, and I didn't see any further slowdown on the
> cache reloads for tracks by year.
Credits go to Marcello Perathoner :)

> Side note: Apologies for referencing a distro-specific codebase in this
> reply.  I am trying to replicate the problem on the stable-2.6 branch from
> https://git.gnome.org/browse/banshee/ but I am also experiencing other
> issues when building stable-2.6 that lead me to not trust the results at
> present.
I had some hard times as well, until getting git master to work - some problems I had because of my lack of expierences with a) developing on Linux in general, b) Mono/Monodevelop and c) GTK, of course.
But additionally there seem to be some implementations of additional abstract elements of newer gtk libraries missing in master.

Nicolas
Comment 9 Adam Szmigin 2016-10-12 19:22:32 UTC
Hi Nicolas,

(In reply to Nicolas from comment #8)
> (In reply to Adam Szmigin from comment #6)
> > I am able to confirm this behaviour (slow reload SQL statements from
> > DatabaseYearListModel) on version 2.6.3+sarah, which is the version
> > currently provided by my distro, Linux Mint 18:
> > http://packages.linuxmint.com/pool/upstream/b/banshee/
> This is my distro as well. I encountered and investigated the issue with the
> distribution package version first.

You might be interested in the Linux Mint-specific bug that I raised here:
https://bugs.launchpad.net/linuxmint/+bug/1632249

I am going to try to test whether the UNLIKELY query hint to SQLite is an alternative way to resolve this bug (although, as Marcello points out in the other thread, it isn't a robust fix as the SQL remains suboptimal).

Will report back here..

> > * You forgot to select CoreTracks.Year in the SELECT statement
> Are you sure that this field is needed there?

I admit to not checking the usage of the field; I assumed it was an accidental omission whilst doing a cursory check, and erred on the side of not changing any more than strictly necessary as I'm also not a Banshee architecture expert.

Many thanks,

Adam
Comment 10 Nicolas 2016-10-12 20:20:59 UTC
Hi Adam,

(In reply to Adam Szmigin from comment #9)
> You might be interested in the Linux Mint-specific bug that I raised here:
> https://bugs.launchpad.net/linuxmint/+bug/1632249
Good description, I flagged me as affected.

> I am going to try to test whether the UNLIKELY query hint to SQLite is an
> alternative way to resolve this bug (although, as Marcello points out in the
> other thread, it isn't a robust fix as the SQL remains suboptimal).
As far as I understand (which is not really far, as I never heard from the UNLIKELY thing before) the likelihood stuff originally is for a different case - for LIKE queries, as the name suggests.
The problem with our query here is that the ".. WHERE .. year IN (SELECT .." query part checks each of these year values against all year values in the subselect. The subselect returns a list of redundant years as it seems to simply return all the year values of all tracks in cache. This means for e. g. 1000 tracks from 2016, the subselect lists 1000 times the value 2016 and each year from the outer select is compared 1000 times against 2016 ..... to shoot with likelihood cannons on an armee of redundant integer value ants is not quite appropriate. :D

> I admit to not checking the usage of the field; I assumed it was an
> accidental omission whilst doing a cursory check, and erred on the side of
> not changing any more than strictly necessary as I'm also not a Banshee
> architecture expert.
You are right, this field does not harm anything and it is best to reduce the changes to the necessary parts.

Thank you for all your input, I will create a new patch.

Nicolas
Comment 11 Andrés G. Aragoneses (IRC: knocte) 2016-10-13 02:30:30 UTC
(In reply to Nicolas from comment #10)
> You are right, this field does not harm anything and it is best to reduce
> the changes to the necessary parts.

Yes please, the least unrelated changes (which should go in different patch), the better, as it will have less chances of creating regressions, and it would be easier to review.

> Thank you for all your input, I will create a new patch.

Cool, keep up the good work!
Comment 12 Nicolas 2016-10-13 17:36:32 UTC
Created attachment 337656 [details] [review]
Patch that avoids performance degradation potentizing with increasing number of tracks

Updated patch that includes the two parts, missing previously
Comment 13 Andrés G. Aragoneses (IRC: knocte) 2016-10-14 08:36:45 UTC
Aren't you missing the "ORDER BY Year" at the end?
Comment 14 Nicolas 2016-10-16 16:04:58 UTC
(In reply to Andrés G. Aragoneses (IRC: knocte) from comment #13)
> Aren't you missing the "ORDER BY Year" at the end?
Good catch, Andrés!

Though the output of GROUP BY seems to be sorted, I just investigated that this isn't guaranteed.

While adding the order statement, I just gave it a try to invert the order - and it really affected the output in the banshee year list!
I wasn't aware of this until now, but the ascending order of the years list made me turn it off a long time ago, as I am mainly interested in current stuff.

Now ... just a question ..... what would you think of descending the order of years in this list?

I know that this is kind of a personal preference and changes like these may upset some people. But IMHO I have a tendency to assume that most people would prefer a "current to older" sorted list and "only" (no offense!) the people that mainly listen to classical music would prefer it the other way round.
Comment 15 Andrés G. Aragoneses (IRC: knocte) 2016-10-17 05:11:41 UTC
(In reply to Nicolas from comment #14)
> Now ... just a question .....

Sounds fair enough, but this is unrelated to this bug. Please open a new bug for discussing that.
Comment 16 Nicolas 2016-10-19 17:16:38 UTC
Created attachment 338038 [details] [review]
Patch that avoids performance degradation potentizing with increasing number of tracks

New patch, fixing the performance issue with the year list for big track collections.
* The previously missing parameter is included
* The original field list in output is preserved
* The original order of records is assured

(sorry for slow reaction, I was really busy)
Comment 17 Andrés G. Aragoneses (IRC: knocte) 2016-10-20 06:17:56 UTC
Thanks! So, by the looks of it, this is a generalized problem that could also be fixed in the other classes like DatabaseYearListModel, such as DatabaseAlbumListModel, DatabaseArtistListModel, DatabaseTrackListModel, etc, right?
Comment 18 Nicolas 2016-10-20 10:31:25 UTC
(In reply to Andrés G. Aragoneses (IRC: knocte) from comment #17)
> Thanks! So, by the looks of it, this is a generalized problem that could
> also be fixed in the other classes like DatabaseYearListModel, such as
> DatabaseAlbumListModel, DatabaseArtistListModel, DatabaseTrackListModel,
> etc, right?
This is possible, but I didn't checked this out so far as there were no further performance issues when building the cache.
But I can check this, if needed.
Comment 19 Adam Szmigin 2016-10-24 21:55:43 UTC
(In reply to Andrés G. Aragoneses (IRC: knocte) from comment #17)
> Thanks! So, by the looks of it, this is a generalized problem that could
> also be fixed in the other classes like DatabaseYearListModel, such as
> DatabaseAlbumListModel, DatabaseArtistListModel, DatabaseTrackListModel,
> etc, right?

I'm sure the SQL could be better, but I am unable to identify any slowness from queries in those other files.  Only DatabaseYearListModel appears to run slowly on my machine.

FWIW, I've confirmed the patch as working.

Kind regards,
Adam
Comment 20 André Klapper 2020-03-17 10:00:55 UTC
Banshee is not under active development anymore and had its last code changes more than three years ago. Its codebase has been archived.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect
reality. Please feel free to reopen this ticket (or rather transfer the project
to GNOME Gitlab, as GNOME Bugzilla is being shut down) if anyone takes the
responsibility for active development again.
See https://gitlab.gnome.org/Infrastructure/Infrastructure/issues/264 for more info.