After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 792105 - Startup takes several minutes
Startup takes several minutes
Status: RESOLVED FIXED
Product: GnuCash
Classification: Other
Component: General
3.0
Other Windows
: Normal normal
: ---
Assigned To: gnucash-general-maint
gnucash-general-maint
Depends on:
Blocks:
 
 
Reported: 2018-01-01 21:34 UTC by Alen Siljak
Modified: 2018-06-30 00:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Printscreen of count of registry calls to \NLS key in Windows (60.90 KB, image/png)
2018-04-11 17:59 UTC, mkubat
Details
Count of registry key calls - version 2.6.21 after downgrade from 3.0 (63.58 KB, image/png)
2018-04-12 19:04 UTC, mkubat
Details
Count of registry calls - version 2.6.16 (56.05 KB, image/png)
2018-04-12 19:15 UTC, mkubat
Details
Screenshot of registry query stack showing what leads to the call (38.70 KB, image/png)
2018-04-13 05:42 UTC, mkubat
Details
Stack during registry query for v3.0 (45.52 KB, image/png)
2018-04-13 06:28 UTC, mkubat
Details

Description Alen Siljak 2018-01-01 21:34:51 UTC
The 2.7 series keeps having the same issue. Reporting now for 2.7.3 on Windows 10.

The application is stuck on 'Loading user data' for several minutes on startup, making it almost unusable for everyday scenarios.
I've experienced this with the last three test versions on two different PCs and am wondering if others are experiencing the same.

During this "stuck" time, the app takes 25% of processor time and uses all 4 cores. The process is shown as "Not responding" in Resource Monitor.
There is no file access during this time.

Initially I thought this was due to the initial migration but it happens every time the app is started.

While using the application, the responsiveness is generally sluggish. This was not the case with 2.7.1.

Is there a parameter or a log that I can look into for more information?
Comment 1 Alen Siljak 2018-01-01 21:43:17 UTC
I have already removed Perl completely earlier (weeks ago), so that does not affect the startup performance, btw.
Comment 2 Alen Siljak 2018-01-01 22:01:25 UTC
The only issues in the log are:

* 21:49:48  WARN <gnc.app-utils> Could not spawn perl: Failed to execute helper program (Invalid argument)
* 21:50:53  WARN <gnc.pricedb> [add_price()] no commodity
Comment 3 John Ralls 2018-03-11 01:31:40 UTC
I did some profiling a couple of weeks ago and fixed a couple of bottlenecks. Can you test with 2.7.6, which will be released tomorrow?
Comment 4 Alen Siljak 2018-03-12 22:10:06 UTC
Certainly!
Comment 5 John Ralls 2018-03-12 22:16:32 UTC
Coming soon... there was an issue with the Windows bundle.
Comment 6 Alen Siljak 2018-03-13 11:20:16 UTC
The initial run still takes a long time (less than 45 min. though) on "Loading user data..." splash screen. No disk activity, just all processor cores running at around 30%, out of which 25 is for GnuCash. This is the migration phase for a 3MB book.
Here is the trace for the first run:

* 11:26:00  WARN <gnc.app-utils> Could not spawn perl: Failed to execute child process (No such file or directory)
* 11:26:01  CRIT <GLib> g_str_has_prefix: assertion 'str != NULL' failed
* 13:00:55  WARN <gnc.gui.sx.slr> error parsing value [(Need Value)]
Comment 7 Alen Siljak 2018-03-13 11:41:33 UTC
The next run took some 35 minutes on "Loading user data...". The process is marked as "not responding". During that time, this was the only line in the log file:

* 10:56:06  WARN <gnc.app-utils> Could not spawn perl: Failed to execute child process (No such file or directory)

This run was done in Sandboxie but it is almost the same as when the installation was done without a sandbox.

Other than that, opening the Scheduled Transactions windows is quick but closing it takes several seconds. The same with the register window for an account.
Simply switching focus from/to the application makes the processor going to 25% and hanging for a couple of seconds. The memory would also increase by a few megabytes for only switch the focus to the desktop and then also when switching back to the app (clicking on it or alt+tab).

Every couple of seconds the app would feel sluggish and would hang for a second or two, missing a click, selection, etc. 
Apart from these hiccups, it feels fairly fast and responsive.

I may try tomorrow on a different PC.
Comment 8 Alen Siljak 2018-03-13 11:44:20 UTC
While running with just Accounts and Scheduled Transactions open, switching from Sch/T to Accounts takes a couple of seconds while switching from Accounts to Sch/T tab is immediate.
Comment 9 Alen Siljak 2018-03-13 11:47:19 UTC
ok, so the reported delays when switching focus or tabs are the longest when the Accounts tab is becoming active. It is still there but much shorter when an account register or scheduled transactions is getting the focus, either by switching focus with the desktop or clicking the tabs in the application.
Comment 10 Alen Siljak 2018-03-13 21:55:11 UTC
More or less the same thing on another PC. A little bit faster but otherwise the same behaviour.
Comment 11 John Ralls 2018-03-14 00:31:44 UTC
I dunno. I just loaded a 4M compressed file in 82 seconds according to the tracefile.

Maybe if you run from the commandline and pass --debug or even --debug --extra we can use the timestamps as a sort of rough profiler.

You might also try creating a new user account and running GnuCash from there just as a check that you don't have something in your user environment that's getting in the way. While it might seem unlikely it wouldn't be the first time.
Comment 12 Alen Siljak 2018-03-16 09:53:01 UTC
Alright. I re-run the application with --debug --extra parameters. The performance is more-or-less the same, with about half an hour for start-up.
I have the logs available, that I can send via email, if you wish.

They don't tell me much except that the log goes on for about 10 minutes and then there is a big gap of about 15-20 minutes without any messages. Once the main window shows, I try opening and switching a few windows for about less than 1 minute and then close the application. Only then the second part of the log is written.
Comment 13 John Ralls 2018-03-16 14:10:51 UTC
This is an already-migrated book, right?

I'd normally interpret the 15-20 minutes without any messages as meaning that GnuCash was loaded and ready, waiting for you to notice. Even if that's the case, 10 minutes is  rather long to load when compared to less than a minute and a half for me. Go ahead and send the log, I'll try to work out a perl script to do a rough profile.

Do you have a Linux box or VM that you could try opening the same file? I'm curious if this is a Windows-only problem.
Comment 14 Alen Siljak 2018-03-16 14:20:42 UTC
No, this is a book in use with 2.6. That's why I do two runs - the first with migration and then the regular one. There is no significant difference between the two, though. Something else is messed up (at least for me; but having the same results on two separate machines makes it suspicious).

No, ~30 minutes is the time from running the shortcut to having the Since Last Run and Main windows show up.
However, the log seems to flush only when I close the app so that later part is written only at that stage and has the same timestamp. 

Good idea with the Linux VM. I'll try that over the weekend. But my gut feeling is that this is a Windows-only problem. I wonder if any other beta testers experience the same. I haven't seen any reports, though, even as it is a pretty drastic show-stopper.

Sending the logs...
Comment 15 John Ralls 2018-03-16 14:34:35 UTC
Let's hope it replicates on Linux where you can point a profiler at it.

The timestamps are from the time the log message is created, not when the log buffer is flushed to disk. So the gap is really a gap in logging even if it isn't in execution.

It occurs to me that one very expensive part of loading an XML file is checking that all of the strings are valid UTF-8. That's not in the SQL backend because the database engines take care of it, so another thing to try is to save as SQLite3 and see if that loads any faster.
Comment 16 Alen Siljak 2018-03-16 14:38:04 UTC
Oh, I did not mention here but my book is SQLite3.
OK, then hopefully the logs contain some interesting info for you.
Comment 17 John Ralls 2018-03-16 15:40:23 UTC
Oh, so it's only 7M uncompressed? Why don't you zip that up and mail it to me?
Comment 18 Alen Siljak 2018-03-16 16:05:08 UTC
Do you have any ready sanitization sql scripts available that I could use beforehand?
Comment 19 John Ralls 2018-03-16 16:35:26 UTC
What, you don't trust me?
No, I don't have any scripts. Depending on how sanitized you want to make it you might find simply changing the asset and liability account names sufficient. You could do that by hand in less time than it would take to run a script. If you use online banking or import data you might want to change any online-id slots. That's easily done in SQL
update slots string_val = "123456789" where name = "online-id";

The next level would be to replace all of the description, notes, and memo fields with some string, easy in SQL: update transactions description = "Lorem Ipsum" notes = NULL; update splits memo = NULL; should do it.

The catch to fuzzing the text fields is that if the problem is associated with text processing then changing the values of the fields might affect the load time, so check to make sure that you still see the slow load with the sanitized database.
Comment 20 John Ralls 2018-03-16 19:32:42 UTC
Thanks for sending the DB. When I tried it in my regular account it loaded (including filling in the SLR transaction amounts) in about 90 seconds. So I tried with a non-admin user and there it took 7 minutes 20 seconds. Nowhere near as awful as your experience but bad enough. Note that I don't get the 20-minute gap either.

Now to figure out how to profile it in the absence of the normal tools (Microsoft's profiler can't read the symbols created by gcc so it doesn't produce useful results.)
Comment 21 Alen Siljak 2018-03-16 19:53:27 UTC
Thanks a lot for all your efforts!
Comment 22 Alen Siljak 2018-03-19 11:42:37 UTC
Just a quick update - no changes with v2.7.7.
Comment 23 John Ralls 2018-03-23 16:10:44 UTC
--log gnc.engine=debug reveals what its doing during the gap: It's loading the splits.

The same file converted to XML loads in under 3 minutes, so the issue is in the new C++ SQL backend. I've found one instance where a query was inside a loop where it should not have been (loading slots for commodities), but that's not a big enough win. I'll keep looking for other optimization opportunities, but unfortunately it seems that the fundamental problem is that gcc under Mingw64 is generating poor machine code compared to what it does under Linux; your file loads in under 20 seconds on a similarly-provisioned Debian 9 VM. That's obviously a problem.
Comment 24 Alen Siljak 2018-03-26 08:53:32 UTC
Thank you for the info. Unfortunately, that's not-so-good news just before the release. But essentially not a big impact since most of the changes are under the hood and I guess Windows users would be advised to skip the update to 3.0 until a solution emerges. 

In any case, this is just to report that I've tested 2.7.8 with no significant changes in behaviour. The first run, including the data conversion from 2.6, without the --debug option took approximately 40 minutes.
Comment 25 John Ralls 2018-03-26 14:04:49 UTC
I'd say rather that they should avoid the SQL backend until the problem is resolved; perhaps you could test that on your system.
Comment 26 Alen Siljak 2018-03-26 14:24:10 UTC
Ah, yes, you're right. The option I keep forgetting about due to the Python scripts that read from my sqlite db.

I converted the book to XML format with 2.6 and yes, I can confirm, the 2.7 application now opens in less than a minute.

It's a valid workaround. I might keep using the XML book and export to sqlite periodically for the scripts/reports.
Comment 27 Alen Siljak 2018-03-26 14:33:58 UTC
There is still a couple of seconds pause when switching to the Accounts tab, though. In 2.6 it is (almost) not noticable.
Comment 28 John Ralls 2018-03-29 23:52:51 UTC
Well, I've managed to shave the actual query time for transactions and splits to negligible by fixing bug 772776. It's a sizable change so I won't push it until after release and unfortunately gives only a 20% improvement overall. The rest of the split processing time is in actually loading the splits: Only 25/second. Transactions (of which there are at most half as many) load at 148/sec.

I had a thought that maybe it was our use of exceptions in C++; that's been reported as a problem for gcc on Windows, but removing the string and date ones from the SQL backend reduced the load time by only a couple of seconds.
Comment 29 mkubat 2018-04-10 07:38:14 UTC
Hi,

I'm experiencing similar behavior and I wanted to add some more info that might be of relevance. 

I have been using GnuCash 2.6.16 with MariaDB 10.2.13 on Windows 10 Pro for some time. For size reference the accounts table is 200 rows, slots is 12104 rows, splits is 20050 rows and transactions are 9214 rows. it dates back to 2014-04 and has been uppgraded as each new version of GnuCash has been released up to 2.6.16. 

It is running on an Intel i7-6820HQ with 32GB of RAM and a NVMe drive. 

I upgraded to 3.0 and during the first start, after asking me to enter DB credentials, it got "stuck" for 10-15 minutes before it started prompting me with "Unable to save database" errors, a couple of them.

It then loaded the GUI and when I tried to close it ( no changes made ) it said it was unable and that DB was corrupt. 

I reviewed the trace file and it hade the following errors:
1) DBI error: 1292: Incorrect datetime value: '1970-01-01 00:00:00' for column 'reconcile date' at row 1

2) For the table versions and for each entry in it I got an error that it was unable to run INSERT INTO versions VALUES('table',version)

I then modified my my.cnf to exclude STRICT_TRANS and I compared what the log said it was trying to INSERT vs the versions already in the version table, all matched except for the gnucash version number, mine was 19000 something while it tried to store it as 3000000. 

I updated the value manually in the database to 3000000 and ran GNUCash again. I received no more "Unable to save database" or "corrupt data" errors but the load time was still very long. 

I ran --debug --extra to see what it was doing and here are my results:

07:19:15 it started the first log entry
07:19:27 it inserts values into gnc_commodity_table
07:19:37 -> 07:23:49 it repeats over and over again "[xaccOpenLog] Attempt to write disabled transaction log" and "[xaccTransWriteLog] Attempt to write disabled transaction log" - it logged 19589 rows so at least it did around 9700 xaccOpenLog - I had around 90 or so translogs before I deleted the older ones

07:23:49 -> 07:39:57 it does not log anything using --debug --extra. I will try with a sql profiler and I will run a procmon capture to see if I can see anything

07:39:57 -> 07:39:58 it perfoms a lot of "[trans_cleanup_commit] get rid of rollback trans=different ID" and "[xaccTransWriteLog] attempt to write disabled transaction log" - same amount of rows logged as before

So in my case as far as I can see it:

- Takes around 4 minutes to go over the translog part
- Takes around 15 minutes to do what it does when it does not log anything - presumably related to John Ralls fix in comment 28. 

I went and deleted som old translog files only keeping 21 of them and reran above but also using --log gnu.engine=debug ( woho, 111MB and 965458 rows ) and here are the results:

08:04:02 -> 08:08:33 - It loads trans logs
08:08:33 -> 08:14:09 - it does not log anything
08:14:09 -> 08:26:09 - It runs "xaccSplitSetValue()/xaccTransCommitEdit()" - in some cases there old value and new value or the same and in others they are not the same

So the second run has more or less the same times, however confiriming that it is splits that it is doing:

- Takes around 4 minutes to load trans logs - even if I went from 90 to 21 of them
- Takes around 12 minutes to do "xaccSplitSetValue()"
- Does "something" for 6 minutes that --debug --extra --log gnu.engine=debug is not catching. 

I will see if I get the time to also run ProcMon and see what it might be doing, as well as a SQL profiler ( if such exists for MariaDB... ).

Please note that I have not made any entries in GnuCash between the runs but I know that at least upon first start the first time it had some scheduled transactions that ran, I think it was a total of 6 of them.

I hope this can be of any assistance in solving this bug. Also, as others have stated, many many many thanks for giving us this fantastic piece of software!
Comment 30 mkubat 2018-04-11 13:33:20 UTC
Adding some more info as I have now looked at the queries sent to MariaDB as well as looked at a procmon capture.

From a query PoV I looked at the ones having syntax outlined in bug 772776 and I executed them directly in MariaDB using HeidiSQL as frontend. 

- SELECT * FROM slots WHERE obj_guid IN (total of 9795 GUIDS ) executed in 0,063 seconds

- SELECT * FROM splits WHERE tx_guid IN (total of 20050 GUIDS ) executed in 0,141 seconds

- SELECT * FROM slots WHERE obj_guid IN (total of 9795 GUIDS ) executed in 0,125 seconds

Backend resides on the same machine as GnuCash so no network traffic outside of localhost communication is involved. 

I also ran a Procmon capture focusing on gnucash.exe and procmon captured a total of 70 868 592 events ( yes, that is 70 million events... ) related to the gnucash.exe process only before ProcMon crashed on me... For reference, it captured a total of 2.8 million events for the rest of Windowsd (during the same time) and the apps I had running at the same time that it filtered out.

The vast majority were "hammering" the registry, more specifically HKLM\System\CurrentControlSet\Control\Nls and its subkeys for Locale and Language Groups.

 15:03:25 - Starts Process
 15:03:47 - Starts hammering HKLM\System\CurrentControlSet\Control\Nls and its   subkeys

 15:18:33 - Still hammering HKLM\System\CurrentControlSet\Control\Nls and its  subkeys but capture stopped

I was unable to count the events related to the hammering as ProcMon crashed when I tried to save the file but it did nothing else then write about the HKLM hammering for those 15 minutes before it crashed so vast majority, I would say at least 70 out of the 70.xxxx million were the registry calls as Procmon saw them. For reference, NLS\Language has a total of ~260 values and all subkeys under nls have a total of 594 entries so something is making it loop over those over and over again. 

I tried looking at the source code and closest I could find that might be related was the g_win32_getLocale() function but I have no idea if that is the root cause for the amount of queries towards that reg key.

Anyhow, might be worth looking into as I suspect that the SQL part is only a minor part as it executes so quick when run directly in SQL.
Comment 31 mkubat 2018-04-11 17:59:18 UTC
Created attachment 370818 [details]
Printscreen of count of registry calls to \NLS key in Windows
Comment 32 mkubat 2018-04-11 18:05:12 UTC
I did some further tests on a brand new install of Windows 10, MariaDB and GnuCash 3.0 without any history. 

I created a new SQL DB with "Common Accounts", saved to SQL and restared GnuCash. During the restart I ran a Procmon capture and this time it only generated around 368.000 events of which vast majority were calls to NLS. I've attached an attachment in comment 31 which shows the count per registry key call, one has over 100.000 calls and many others seem to be around 656 calls per sub-key. 

Even if they are many calls which imho seem to be unneccessary to be performed that many times the load time with an empty database was only was ca 30 seconds, running in a VM with 4GB of RAM. 

I will see if I can copy over piece by piece from my other install to see what will trigger the load time, the settings, translogs or the database itself.
Comment 33 mkubat 2018-04-12 17:50:13 UTC
One more update, I ran some more tests in my clean VM.  

Startup times from entering DBA credentials to main window open:
 - Clean install - Common Accounts and otherwise empty DB = ~20 seconds
 - Clean install - Restored SQL backup only, no %appdata% settings = 21 minutes and 22 seconds
 - Clean install - Restored SQL + %appdata% settings = 20 minutes and 45 seconds

So by just restoring my database backup the time from entering DB credentials to the main windows showing up went from ca 20 seconds to ~21 minutes. 

Adding back my %appdata%\gnucash directory did not make any difference indicating this is database related - and since the SQL queries are fast to execute in SQL directly it is the processing within GnuCash, as has already been pointed out. 

I have not yet figured out what the connection is between the hammering of the registry key and why it increases when there are many entries in the database but I suspect it will be faster for you who wrote the software than me who does not even understand C++ ;)
Comment 34 John Ralls 2018-04-12 18:05:18 UTC
mkubat, could you try one more test,  install whatever GnuCash 2.6 you were using and see how many registry queries there are?
Comment 35 mkubat 2018-04-12 18:24:51 UTC
(In reply to John Ralls from comment #34)
> mkubat, could you try one more test,  install whatever GnuCash 2.6 you were
> using and see how many registry queries there are?

Hi John,

Due to upgrading the DB to 3.0 schema I'm unable to test the 2.6.16 version that I had with my DB. I have instead tried with 2.6.21.

If you want I can try to see if I can find some old backup prior to the upgrade to 3.0 so I can test the old db schema with 2.6.16 but I suspect there has been no change between 2.6.16 and 2.6.21 that would warrant that? 

As for 2.6.21, I tested on my clean vm as I'm not near my main computer - on the clean vm with my production database I got 7 260 418 events related to gnucash.exe ( compared to the 70 million for v3 ) and vast majority were the same registry keys. 

I will upload an attachment as soon as procmon has finished counting occurances, at the moment it estimates +1 hour to complete, unless it crashes...
Comment 36 mkubat 2018-04-12 19:04:52 UTC
Created attachment 370871 [details]
Count of registry key calls - version 2.6.21 after downgrade from 3.0
Comment 37 mkubat 2018-04-12 19:15:37 UTC
Created attachment 370872 [details]
Count of registry calls - version 2.6.16
Comment 38 mkubat 2018-04-12 20:04:58 UTC
Minor update, I reset the virtual machine tried the following:

- installed 2.6.16 with standard accounts and clean sql db - ~35.000 events. 
- Reset VM
- installed 2.6.21 ( skipping going from 3.0 to 2.6.21 ) and I got around 35.000 events
- Reset VM
- installed 3.0 and I got 379.150 events. 

I will try to get hold of an older backup prior to the 3.0 upgrade and see how 2.6.16 vs 2.6.21 behaves with my prod database from that time. 

I can see that on the 2.6 series it increases from 35.000 to 7 million events when I load my data but when I tried going from 3.0 to 2.6.21 with clean database it was still at 35.000 events so this might be related to number of entries in the db, maybe transactions. 

In any case it is evident that 3.0 even during clean install and db seems to perform the action many more times by an order of 10 or so and whatever triggers the jump from 35.000 to 7 million events due to number of entries in the DB increases in magnitude with v3.0
Comment 39 mkubat 2018-04-13 05:41:15 UTC
I have now found an older backup from february so the number of records should about 100 or so less than the tests I did with the 3.0 data. 

Clean install of 2.6.16 with same amount of data as initial tests in comment 29 resulted in 7.2 million events with vast majority calls to the registry, more or less same output as 2.6.21 with 3.0 data. 

So as far as I can see the "excessive quering of registry" has existed since 2.6.16 (at least), it seems to be somehow related to number of entries in the database ( not sure which but I suspect slots or splits ) as it increases when the database is bigger. 

Also, something in 3.0 has changed that seems to have increased the calls by a magnitude of 10 - which is likely related to the slow start up experience.

I've tried to take a peek in the source code to figure out what and how but coming from a small knowledge in C# and Visual Studio and going to C++ and VS Code makes me lack the possibility to click on methods to see which calls what and how... 

I was however able to see the stack from within procmon for the registry calls - see latest attachment. It seems to go like this:

gnucash.exe -> gnc_file_open_file -> gnc_file_query_save -> qof_session_load -> liggncmod-backend-dbi.dll -> gnc_sql_load -> qof_object_foreach_backend -> q_hash_table_foreach_sorted -> qof_object_lookup_backend -> gnc_sql_set_load_order -> gnc_sql_transaction_load_all_tx -> gnc_taxtable_sql_initilize -> gnc_sql_slots_load_for_list -> gnc_sql_slots_load -> gnc_sql_load_object -> gnc_sql_add_subtable_colnames_to_list -> qof_backend_module_finalize -> libintl_swprintf

after the call to libintl_swprintf it calls msvcrt.dll and setlocale / get_current_locale.

So as far as I can see, during the load and for loop processing slots it makes some call that triggers libintl_swprintf which runs the get/set locale in msvrct which in turn calls upwards and does the registry query as part of that operation. 

Looking at the qof_backend_module_finalize function in the source code I cannot see anything that would result in such a call but you guys know the code much better than I do :)
Comment 40 mkubat 2018-04-13 05:42:27 UTC
Created attachment 370878 [details]
Screenshot of registry query stack showing what leads to the call
Comment 41 mkubat 2018-04-13 06:27:49 UTC
I did the same check for 3.0 and here is the stack:

gnucash.exe -> gnc_file_open_file -> gnc_file_query_save -> qof_session_load -> sessionImplload -> qof_backend_module_finalize -> gncsqlbackend qofbook qofbackendloadtype -> gncsqlaccountbackend load all gncsqlbackend -> gncsqlaccountbackend -> gnc_sql_load_object gncsqlbackend gncsqlrowpk gnc sql columntable -> gncsqlrow get string at col -> qof_backend_module_finalize -> libint_swprintf

Not sure if it has to do with the code in gnc-dbisqlresult.cpp but it has a call to gnc_push and gnc_pop_locale which seem to be executed each time one runs get_string_at_col. gnc_push and pop seem to call g_win32_getlocale() which I suspect is what triggers the call to libint_swprintf and queries it. 

It would also explain why it is related to number of entries in the slot table as get_string_at_col seems to be called for each row. 

std::string
GncDbiSqlResult::IteratorImpl::get_string_at_col(const char* col) const
{
    auto type = dbi_result_get_field_type (m_inst->m_dbi_result, col);
    auto attrs = dbi_result_get_field_attribs (m_inst->m_dbi_result, col);
    if(type != DBI_TYPE_STRING)
        throw (std::invalid_argument{"Requested string from non-string column."});
    gnc_push_locale (LC_NUMERIC, "C");
    auto strval = dbi_result_get_string(m_inst->m_dbi_result, col);
    if (strval == nullptr)
    {
        gnc_pop_locale (LC_NUMERIC);
        throw (std::invalid_argument{"Column empty."});
    }
    auto retval =  std::string{strval};
    gnc_pop_locale (LC_NUMERIC);
    return retval;
}
Comment 42 mkubat 2018-04-13 06:28:41 UTC
Created attachment 370879 [details]
Stack during registry query for v3.0
Comment 43 John Ralls 2018-04-15 00:32:48 UTC
Good catch. Commenting out those locale_push/locale_pop statements Ngot Alen's test file to load in 46 seconds. Removing all of them (there are ones guarding the loading of floats and doubles as well as every select query) speeds it up even more, to 31 seconds.

The one in get_string_at_col wasn't in 2.6 so I'm pretty comfortable that adding it was a mistake. 

It looks like the DBI folks have fixed the bug that requires the others, but unfortunately haven't released that change yet.

I've pushed the commit with the change. I'll get a nightly up for testing tomorrow.
Comment 44 mkubat 2018-04-15 06:55:11 UTC
Excellent! 

I'll gladly try out the nightly build in my vm as well if you want. Just let me know where I can download it :)

Not being a developer myself but just curious, would it not be good enough to read the system locale and other locale specific settings once during start up, store them internally and do all validations towards the internal variables instead of calling the OS each time?

I doubt ones locale settings change that frequently that validations towards OS during runtime would be necessary.
Comment 45 Alen Siljak 2018-04-15 09:05:24 UTC
That's some fantastic news! Thanks for getting right at it, mkubat.
Are you saying that my Linux migration yesterday was in vain?! 
Comment 46 Alen Siljak 2018-04-15 09:08:17 UTC
Oops, half a message went missing.
Looking forward to that nightly. Having an SQL backend makes Python scripts and reports always have access to up-to-date data. Thanks a lot to both of you!
Comment 47 Alen Siljak 2018-04-15 09:10:09 UTC
I see. The Unicode smiley cut the message, it seems.
Comment 48 Alen Siljak 2018-04-16 07:22:08 UTC
Which nightly should we be looking for, btw?
There is currently no builds for v3 at https://code.gnucash.org/builds/win32/
Comment 49 John Ralls 2018-04-16 13:49:52 UTC
Yes, the windows build is broken again. I spent most of yesterday trying to fix it and haven't gotten there yet.
Comment 50 Alen Siljak 2018-04-16 13:53:46 UTC
:( 
Sorry to hear that. OK, no worries. Just let us know once it's available. I'll happily test on the windows machine.
Today I'll dismantle what's left of my Windows system at home so will have to test only in VM there, I guess. :)
Comment 51 John Ralls 2018-04-16 23:17:33 UTC
It turned out to be a borked Mingw-w64 upgrade.

Please test out https://wiki.gnucash.org/builds/win32/maint/gnucash-3.0-2018-04-16-git-3.0-75-g87f94abc8+.setup.exe.
Comment 52 mkubat 2018-04-17 05:38:34 UTC
After install it fails to start saying "The code execution cannot proceed because libfribidi-0.dll was not found". 

I noticed that the original 3.0 is ca 120mb where as this one is 90mb so it feels like some stuff were not included? 

I'll see if I can install the org 3.0 and copy over the new 3.0 files manually but I suspect it would be best if we could get a new build to avoid getting inconsistent test results as there would be a mix of files from original 3.0 and new 3.0.
Comment 53 Alen Siljak 2018-04-17 07:27:04 UTC
I can confirm having the same results.
I've also tried the suggestion above, as I've installed in a sandbox so had the original installation readily available, but the results are the same (?!).
Definitely borked upgrade. ;)
Comment 54 mkubat 2018-04-17 07:53:39 UTC
Yeah, I cannot even find the libfribidi-0.dll in the "org" 3.0 installation so it seems to be a new file...
Comment 56 mkubat 2018-04-17 11:05:18 UTC
Excellent news!

I'll however be offline for a week so I will not have the possibility to test the fix out until sometime next week but hopefully Alens test will be more than enough :) 

PS: Alen, unless you have already jumped to Linux take a peek at Windows Subsystem for Linux - have not tested GnuCash in it but I would not be surprised if it would not work considering it runs Ubuntu, on Windows, native...
Comment 57 Alen Siljak 2018-04-17 11:08:18 UTC
I've converted my home box to linux irreversibly now.
At work, though, I'm still limited to using Windows so I have that option. I'm also looking at setting up a Windows VM at home.

Thanks for the tip. The original version of linux subsystem worked on my old workstation. Now, however, that is deprecated and the subsystem must be installed through Marketplace, which is unfortunately disabled on our workstations. :(

Have fun!
Comment 58 Alen Siljak 2018-04-17 12:19:27 UTC
The new build is already available: https://wiki.gnucash.org/builds/win32/maint/gnucash-3.0-2018-04-17-git-3.0-75-g87f94abc8+.setup.exe
Comment 59 Geert Janssens 2018-04-17 12:38:07 UTC
Wow Alan, you're following closely :)

Yes, that's the new installer I created and this one starts successfully on my test system. However I was not affected by any of the bugs it covers so others will have to test and report back.
Comment 60 Alen Siljak 2018-04-17 12:42:15 UTC
OK! This is a fantastic improvement in SQLite db load time.
The application starts in exactly one minute in a sandbox. "Loading user data" now goes for ~40 seconds out of 60.
Significantly better than previous versions. 

Starting the application with an XML book, comparatively, takes 20 seconds until "loading user data" and then 10 seconds more until the app shows up.

Thanks a lot to everyone involved for their contributions on improving this!
Comment 61 mkubat 2018-04-17 12:50:08 UTC
Install now works with gnucash-3.0-2018-04-17-git-3.0-75-g87f94abc8+.setup.exe. 

However the slow load time and hammering of registry is still present due to the ( as John mentioned earlier ) other calls to gnc_push/pop_locale.

It loaded in 11 minutes and generated at least 26 million events before procmon gave up on me. I looked at a few random ones and the stack showed that it came from either "get_double_at_col" or "get_time64_at_col". I checked in the code and I could find gnc_pop/push in these places which matches what I saw on the stack. 

gnc-dbisqlresult.cpp - GncDbiSqlResult::IteratorImpl::get_float_at_col
gnc-dbisqlresult.cpp - GncDbiSqlResult::IteratorImpl::get_double_at_col
gnc-dbisqlresult.cpp - GncDbiSqlResult::IteratorImpl::get_string_at_col ( already fixed )
gnc-dbisqlresult.cpp - GncDbiSqlResult::IteratorImpl::get_time64_at_col

Any chance we could get a test build without any gnc_pop/push_locale in the above four methods to see how it behaves on our datasets?
Comment 62 Alen Siljak 2018-04-17 12:55:09 UTC
Replaced previous installation (without sandbox). The results after converting xml to sqlite book and restarting the application.

~15 seconds until "loading user data"
~10 seconds until main window appears
~10 seconds until Since Last Run appears

Not too bad.
Comment 63 mkubat 2018-04-17 13:01:16 UTC
I compared the 2.6.21 source code with the 3.0 and in the 2.6.21 gnc_pop/push_locale is only called in row_get_value_at_col_name if the datatype is Decimal which in above case would count for get_double_at_col and get_float_at_col.

So in theory, removing get_pop/push_locale from get_time64_at_col should return the performance back to 2.6.21 levels however even 2.6.21 has a flaw, imho, where it still asks the system for locale for each row/column it processes of that data type. 

I realize it is a bigger change but I would still propose the approach of reading once from the system, store the data in a variable and read it from there during validation instead of asking the system.
Comment 64 John Ralls 2018-04-17 14:14:19 UTC
We have to use the system calls setlocale() and getlocale() because libdbi, the database abstraction library we use, uses a msvcrt call to strtod to convert the string returned from the database into a double or a float. The wrapper function "pushlocale" stores the old locale and poplocale restores it, and "C" is a default locale specified by the C standard so setlocale and getlocale *shouldn't* be hitting the registry. That they do is a bug in msvcrt.

time64 is an int and doesn't need that work-around, so I'll remove it. I would have done so Saturday if I'd noticed it. That should knock out most of the remaining excess events, there aren't many doubles--only a few slots--in a GnuCash database.
Comment 65 Alen Siljak 2018-04-17 20:20:01 UTC
Wow, just for comparison - the same file loads on linux in about 5 seconds. :S
Comment 66 Alen Siljak 2018-04-18 07:50:09 UTC
Testing with the 2018-04-18 nightly on a production machine.
The complete start, until Since Last Run window is open, takes ~30 seconds. Thumbs up from my side.
The reduced download file size is appreciated, too.
Comment 67 John Ralls 2018-04-18 13:31:54 UTC
Yay! The changes yesterday included removing the time64 locale push-pop and removing the use of g_win32_get_locale that I think was the source of the registry usage: It retrieves locale information to convert the Microsoft string to a POSIX string, which is totally unnecessary for this use-case.

The reduction in download size is from removing a bunch of translations that we don't use.
Comment 68 John Ralls 2018-06-30 00:02:23 UTC
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=792105. Please update any external references or bookmarks.