GNOME Bugzilla – Bug 623891
last_insert_rowid's performance is bad.
Last modified: 2018-09-21 13:41:18 UTC
After so much benchmarking and hacking on symbol-db population flow trying to understand why it was so slow, I probably found the reason. last_insert_rowid information retrieved with a gda_connection_statement_execute_non_select () statement is the bottle neck. When libgda retrieves last_insert_rowid information it becomes 10x times slower than SQLite native apis! 7.70 vs 0.80 When the sqlite3_last_insert_rowid (db) is used on sqlite side it's performance grows up only for some milliseconds, but outperforms libgda in any case. Please check the benchmark programs here http://git.gnome.org/browse/anjuta/tree/plugins/symbol-db/benchmark/libgda?h=sdb-core-trans and http://git.gnome.org/browse/anjuta/tree/plugins/symbol-db/benchmark/sqlite?h=sdb-core-trans to have some tests. you don't need to compile Anjuta to have them working, you can just compile them out of the box.
I've committed some changes to improve performances, but it only goes down to 5.5s instead of 8, which still leaves a big gap... However last_insert_rowid in Libgda actually retreives a complete row of data, which is more than only getting a rowid, so I guess the SQlite test should also include retreiving the complete row. Anyway, Libgda can't compete on the raw performances with SQLite so my advice would either be to use SQLite directly (but you'll the ability to use another dbms in the future, if you ever wanted to and you'll use the other benefits of Libgda if you use them), and also I don't think it would solve the performances problem completely because you'll always find a project with more symbols and thus a longer time to insert into a db; so IMHO you'll still have, at one point or another, to use one or more worker threads to actually do the job (if this is already the case, please forget this).
Committed to git/master BTW. I'll commit the changes to LIBGDA_4.0 before the next release.
(In reply to comment #1) > I've committed some changes to improve performances, but it only goes down to > 5.5s instead of 8, which still leaves a big gap... > However last_insert_rowid in Libgda actually retreives a complete row of data, > which is more than only getting a rowid, so I guess the SQlite test should also > include retreiving the complete row. > > Anyway, Libgda can't compete on the raw performances with SQLite so my advice > would either be to use SQLite directly (but you'll the ability to use another > dbms in the future, if you ever wanted to and you'll use the other benefits of > Libgda if you use them), and also I don't think it would solve the performances > problem completely because you'll always find a project with more symbols and > thus a longer time to insert into a db; so IMHO you'll still have, at one point > or another, to use one or more worker threads to actually do the job (if this > is already the case, please forget this). thanks for the patch. Now it goes a little bit faster. I asked to the other Anjuta devs whether they wanted to switch to pure SQLite apis but I stil haven't a answer. I suppose that for them it's ok to use libgda in the core. We're using libgda master so there's no hurry to committing in LIBGDA_4.0. I'm wondering anyway why last_insert GdaSet keeps track of all the info just inserted in a table. Wouldn't it be better to just have an id? On my experience I hardly found a case where I had to use the data already inserted. Maybe this could be a feature request for libgda next version: let the user choice whether to have only an id (fastest method) or having also additional data. With this feature we can keep libgda without problems on the core because it would automatically become faster than it's now, even if it has a little (but it's ok) overhead due to gobject architecture.
feel free to close the bug, but please consider the feature request on comment #3.
(In reply to comment #3) > > thanks for the patch. Now it goes a little bit faster. > I asked to the other Anjuta devs whether they wanted to switch to pure SQLite > apis but I stil haven't a answer. I suppose that for them it's ok to use libgda > in the core. > We're using libgda master so there's no hurry to committing in LIBGDA_4.0. > I'm wondering anyway why last_insert GdaSet keeps track of all the info just > inserted in a table. Wouldn't it be better to just have an id? On my experience > I hardly found a case where I had to use the data already inserted. > Maybe this could be a feature request for libgda next version: let the user > choice whether to have only an id (fastest method) or having also additional > data. The problem is once you've got an ID, how do you go to fetch the actual value? SQLite's got a rowID, postgreSQL's got one sometimes, for MySQL there is something slightly different, hence the current API which returns something useable but at a performance cost. But I agree something should be done to improve performances, so I think it'd be better to: * actually execute the SELECT statement only if needed * subclass the GdaSet returned * allow the user to specify which column of data should be fetched. I'll add this to the TODO list.
Oh well, you're right, there are different dbs to manage... Thanks in advance anyway
Hi Vivien, have you scheduled some TODO improvements for this bug? thanks
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/libgda/issues/49.