Bug 575235 - gda_connection_update_meta_store() is slow
Status: RESOLVED FIXED
Product: libgda
Classification: Other
Component: Client library
Version: 3.99.x
OS: Other Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assigned To: malerba
QA Contact: gnome-db Maintainers
Depends on:
Blocks:
 
 
Reported: 2009-03-13 11:28 UTC by Murray Cumming
Modified: 2011-11-03 21:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gda_meta_update_tables_with_schema_fails.c (5.03 KB, text/x-csrc)
2009-08-25 13:24 UTC, Murray Cumming
Details

Description Murray Cumming 2009-03-13 11:28:09 UTC
Compared to the equivalent schema functionality in libgda-3.0, gda_connection_update_meta_store() is very slow, at least for PostgreSQL. This is very noticeable in Glom, which takes around 1 minute in this call when opening databases. We try to reduce how often we call it, and we call it for single tables when possible (which is fast), but we can't avoid calling it at least once.

I don't know yet what is causing the slowness. KCachegrind is not showing me sensible information. Could it all be network I/O?

If this really must be this slow then it really should be async, with a callback. Applications shouldn't be forced to block for 1 minute.
Comment 1 Murray Cumming 2009-03-13 11:42:44 UTC
I notice that libgda seems to use SQLite (regardless of which provider is used) to store this meta data. I wonder if that's a factor.
Comment 2 Murray Cumming 2009-03-13 12:22:49 UTC
I wonder if this is the old syncing issue that Firefox had with SQLite. I hope that all the SQLite work happens in memory; lots of disk I/O would make it slow.
Comment 3 malerba 2009-03-13 12:49:07 UTC
Comparing to libgda V3 is unfair, since far less meta data was retrieved then (the big problem was that it did not handle schemas at all). I tried to make things better by providing two update modes: one where everything is updated (if you pass a NULL GdaMetaContext to gda_connection_update_meta_store()) and one where only a part of the meta data is updated.

I believe things could be improved for PostgreSQL by rewriting/simplifying some of the SELECT statements used when retrieving meta data: as you're probably aware, you can use the pg_catalog.* tables or the information_schema.* tables and views to retrieve meta data, but information_schema.* is slower, so some benefit could be gained by using only the pg_catalog.* tables.

Anyway, the best solution to this problem is (as done by the gda-sql tool):
* if the meta data is retrieved for the first time, run the update in a thread other than the GUI one, and cache the result in a local database
* if a cached version already exists, use it
* update parts of the meta data when you know they have changed (which is quick, but can also be done in a non-GUI thread)

(Be aware, however, that depending on how PostgreSQL was compiled, it may not support multi-threading, in which case the GdaConnection will only work in the thread which created it.)
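The cache-or-fetch strategy above can be sketched in plain C. This is a toy illustration, not real libgda API: `fetch_meta_data()` and the `cache` buffer are hypothetical stand-ins. In a real application the fetch would wrap gda_connection_update_meta_store(), run in a worker thread (e.g. via g_thread_new()) so the GUI never blocks, and persist the result in a local database between runs.

```c
#include <stdio.h>
#include <string.h>

/* Toy sketch of the caching strategy above, NOT real libgda API. */

static char cache[64] = "";               /* empty = nothing cached yet */

static void fetch_meta_data (void)
{
    /* In a real program, the slow network round trips happen here,
     * ideally in a worker thread rather than the GUI thread. */
    snprintf (cache, sizeof cache, "meta data");
}

/* Returns the cached meta data, performing the slow full update only
 * on the first call; later calls reuse the cached copy. */
const char *get_meta_data (void)
{
    if (cache[0] == '\0')                 /* first use: full, slow update */
        fetch_meta_data ();
    return cache;                         /* later calls reuse the cache */
}
```

The second and subsequent calls return immediately, which is exactly why gda-sql feels fast after its first metadata fetch.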

As for SQLite, it does not introduce any significant slowdown.
Comment 4 Murray Cumming 2009-03-13 15:49:02 UTC
I still doubt that there is so much extra meta data that it can explain an order-of-magnitude slowdown. However, I have no useful information. Do you know if it syncs every change to disk immediately?

I have changed the Glom code to get just the table names and the supported field types, as well as getting the per-table meta data only when it is needed. That is slightly faster, though getting the table list is still surprisingly slow (5 or 6 seconds here). Hopefully that is enough for us.
Comment 5 Armin Burgmeier 2009-05-19 15:13:33 UTC
Getting the list of all tables seems to be slow because it also fetches a lot of (internal?) PostgreSQL tables. We can prevent these from being fetched by filtering on the table's schema. The tables Glom uses are all in the "public" schema. I tried this out, and it significantly speeds up the meta store update, and Glom still seems to have all the information it needs.

However, the schema name for the non-internal tables doesn't seem to be common across database systems. For example, for PostgreSQL it is "public", but for SQLite it is "main". Is there another way to make it update the meta store only for the non-internal tables? Otherwise I think we can just hard-code the corresponding schema for each supported backend in Glom.
Comment 6 malerba 2009-05-19 15:27:07 UTC
Have you then tried restricting the meta store update to the schemas which are not public, as in:
GdaMetaContext mcontext = {"_schemata", 1, NULL, NULL};
mcontext.column_names = g_new (gchar *, 1);
mcontext.column_names[0] = "schema_internal";
mcontext.column_values = g_new (GValue *, 1);
g_value_set_boolean ((mcontext.column_values[0] = gda_value_new (G_TYPE_BOOLEAN)), FALSE);
gda_connection_update_meta_store (connection, &mcontext, &error);
Comment 7 malerba 2009-05-19 15:30:26 UTC
Typo correction: "restrict the meta store update to the schemas which are not internal"
Comment 8 Armin Burgmeier 2009-05-19 15:59:51 UTC
I don't think I fully understand: Basically I would like to get information about tables, not schemas, which is why I told it to update the "_tables" table, not the "_schemata" table. Am I supposed to call gda_connection_update_meta_store afterwards again for each non-internal schema? However, the code you posted already takes even longer to execute than just updating "_tables" for all schemas.
Comment 9 malerba 2009-05-19 18:52:25 UTC
The meta store update is done in a way that ensures the database used internally is never left in an inconsistent state. For example, if table B references table A, then when table A is updated, table B is also updated. This means that when the context specifies that the "_tables" table must be updated, all the tables which reference "_tables" are also updated (recursively).

To illustrate this, you can edit the gda-connection.c file and change the #ifdef from GDA_DEBUG_NO to GDA_DEBUG, and you'll see some debug output.

For example with:
GdaMetaContext mcontext = {"_schemata", 1, NULL, NULL};
mcontext.column_names = g_new (gchar *, 1);
mcontext.column_names[0] = "schema_internal";
mcontext.column_values = g_new (GValue *, 1);
g_value_set_boolean ((mcontext.column_values[0] = gda_value_new (G_TYPE_BOOLEAN)), FALSE);

you'll get:
UP: GdaMetaContext for table _information_schema_catalog_name: ---
->: GdaMetaContext for table _schemata:  [schema_internal => FALSE]
DN: GdaMetaContext for table _udt: ---
DN: GdaMetaContext for table _domains: ---
DN: GdaMetaContext for table _tables: ---
DN: GdaMetaContext for table _collations: ---
DN: GdaMetaContext for table _routines: ---
DN: GdaMetaContext for table _character_sets: ---
DN: GdaMetaContext for table _triggers: ---
[...MORE...]

and with:
GdaMetaContext mcontext2 = {"_tables", 2, NULL, NULL};
mcontext2.column_names = g_new (gchar *, 2);
mcontext2.column_names[0] = "table_name";
mcontext2.column_values = g_new (GValue *, 2);
g_value_set_string ((mcontext2.column_values[0] = gda_value_new (G_TYPE_STRING)), "customers");
mcontext2.column_names[1] = "table_schema";
g_value_set_string ((mcontext2.column_values[1] = gda_value_new (G_TYPE_STRING)), "main");

you'll get:
UP: GdaMetaContext for table _information_schema_catalog_name: ---
UP: GdaMetaContext for table _schemata:  [schema_name => main]
->: GdaMetaContext for table _tables:  [table_name => customers]  [table_schema => main]
DN: GdaMetaContext for table _triggers: ---
DN: GdaMetaContext for table _table_constraints: ---
DN: GdaMetaContext for table _columns: ---
DN: GdaMetaContext for table _referential_constraints: ---
DN: GdaMetaContext for table _referential_constraints: ---
DN: GdaMetaContext for table _key_column_usage: ---
DN: GdaMetaContext for table _check_column_usage: ---
DN: GdaMetaContext for table _key_column_usage: ---
DN: GdaMetaContext for table _check_column_usage: ---
DN: GdaMetaContext for table _view_column_usage: ---

You can see that in the second case far fewer tables are updated, which is why it's much faster.
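The recursive propagation described above can be pictured with a toy model. The table names and dependency edges here are illustrative only, not libgda's real internal schema: updating one meta store table also updates every table that references it, so a narrower starting context touches fewer tables.

```c
/* Toy model: deps[i][j] != 0 means table j references table i,
 * so updating i forces an update of j. Edges are made up. */
#define NTABLES 4

const char *table_names[NTABLES] = { "_schemata", "_tables", "_columns", "_triggers" };

const int deps[NTABLES][NTABLES] = {
    { 0, 1, 0, 0 },   /* _tables references _schemata */
    { 0, 0, 1, 1 },   /* _columns and _triggers reference _tables */
    { 0, 0, 0, 0 },
    { 0, 0, 0, 0 },
};

/* Recursively mark table i and everything referencing it as updated;
 * returns how many tables were newly updated. */
int update_table (int i, int updated[NTABLES])
{
    if (updated[i])
        return 0;                 /* already done: stop the recursion */
    updated[i] = 1;
    int count = 1;
    for (int j = 0; j < NTABLES; j++)
        if (deps[i][j])
            count += update_table (j, updated);
    return count;
}

/* Convenience wrapper: count the tables updated starting from 'start'. */
int count_updated_from (int start)
{
    int updated[NTABLES] = { 0 };
    return update_table (start, updated);
}
```

In this toy model, starting from "_schemata" (index 0) updates all four tables, while starting from "_tables" (index 1) updates only three, mirroring why the second context above is so much cheaper.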

Now, the current API does not allow one to specify several GdaMetaContexts at once, so you can't simultaneously restrict the update to non-internal schemas and to the "_tables" table. Maybe I should add that possibility in 4.2; in the meantime, you'll have to hard-code the "public" schema name as you proposed.
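Until such an API exists, the per-backend hard-coding Armin proposed could be sketched like this. The function name is hypothetical, and only the two schema names actually discussed in this thread ("public" for PostgreSQL, "main" for SQLite) are included; any other provider would need to be verified separately.

```c
#include <string.h>
#include <stddef.h>

/* Maps a database provider name to the schema that holds user tables.
 * "public" and "main" come from the discussion above; unknown
 * providers return NULL so the caller can fall back to a full update. */
const char *default_user_schema (const char *provider)
{
    if (provider == NULL)
        return NULL;
    if (strcmp (provider, "PostgreSQL") == 0)
        return "public";
    if (strcmp (provider, "SQLite") == 0)
        return "main";
    return NULL;   /* unknown provider: caller decides */
}
```

Glom could then pass the returned schema name into its per-table meta store updates instead of updating every schema.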
Comment 10 Murray Cumming 2009-05-29 15:35:28 UTC
Armin, this has made Glom much faster when doing this update.

However, I think you may have caused Glom to not show any data immediately after creating a database from an example (it works after closing and reloading the file), with these errors:

Glom  Base_DB::query_execute_select(): Error while executing SQL
  SELECT "pg_group"."groname" FROM "pg_group"
Internal error (Database): Internal error: invalid provider handle
Glom  Base_DB::query_execute_select(): Error while executing SQL
  SELECT "pg_group"."groname" FROM "pg_group"
Internal error (Database): Internal error: invalid provider handle
Glom  Base_DB::query_execute_select(): Error while executing SQL
  SELECT "pg_group"."groname" FROM "pg_group"
Internal error (Database): Internal error: invalid provider handle
Glom  Base_DB::query_execute_select(): Error while executing SQL
  SELECT "pg_group"."groname", "pg_group"."grolist" FROM "pg_group" WHERE "pg_group"."groname" = 'glom_developer'
Internal error (Database): Internal error: invalid provider handle
DEBUG: User=glom_default_developer_user _is_ in the developer group on the server.
Glom  get_choice_values(): Error while executing SQL
  SELECT "artists"."artist_id", "artists"."name" FROM "artists" ORDER BY "artists"."artist_id"
Glom  get_choice_values(): Error while executing SQL
  SELECT "publishers"."publishers_id", "publishers"."name" FROM "publishers" ORDER BY "publishers"."publishers_id"
Glom  Base_DB::query_execute_select(): Error while executing SQL
  SELECT "pg_group"."groname" FROM "pg_group"
Internal error (Database): Internal error: invalid provider handle


Yeah, I should really file this as a Glom bug.
Comment 11 Armin Burgmeier 2009-05-29 18:06:01 UTC
I have also noticed this, and fixed it in git master now. The problem was unrelated to the metastore update, though.
Comment 12 Murray Cumming 2009-06-01 11:53:35 UTC
Thanks.

Armin, I think libgdamm needs more explanation of what this schema parameter is. So far it's just:

  /** Update the meta store information for the table named @a table_name.
   *
   * This can speed up the update of the meta store if you only need the
   * information for a specific table.
   * @param table_name Name of the table for which the information is needed.
   * @param schema_name Name of the schema @a table_name is in, or "" to update
   * all tables with the given name.
   * @return <tt>true</tt> if no error occurred.
   */
  bool update_meta_store_table(const Glib::ustring& table_name, const Glib::ustring& schema_name = Glib::ustring());
Comment 13 Armin Burgmeier 2009-06-02 17:14:34 UTC
I improved the documentation for it.
Comment 14 Murray Cumming 2009-07-17 11:29:51 UTC
The documentation is still really vague about what this schema parameter actually is or means. There's now a TODO in the code about that, though.
Comment 15 Murray Cumming 2009-08-25 13:24:39 UTC
Created attachment 141640 [details]
gda_meta_update_tables_with_schema_fails.c

Using these schema names no longer works with libgda from git (at least on the master branch). Here is a test case; it produces this output:

SUGGEST META UPDATE Returned FALSE: syntax error at or near "SELECT"
LINE 1: ...espace n, pg_roles u WHERE n.nspowner = u.oid AND SELECT cur...
Comment 16 malerba 2009-08-25 17:37:59 UTC
This has been corrected in git, on both the master and LIBGDA_4.0 branches; it was due to an over-hasty copy/paste...
Comment 17 Murray Cumming 2011-11-03 21:35:47 UTC
I think we forgot to close this bug.