GNOME Bugzilla – Bug 608032
MySQL timeout and no attempt reconnect.
Last modified: 2018-06-29 22:34:04 UTC
If I leave Gnucash open and I'm using MySQL dbi the connection eventually times out, 8 hours is default for MySQL I think. If I make any changes to an account I get a message box "Unable to save to database" (twice). I would have thought that the Gnucash should attempt a reconnect rather than give up and warn the user. Confusingly though the account change is shown in the account which may lead to people thinking that the change has been recorded when in fact it hasn't. Of course on closing, Gnucash again warns that it was unable to save to database. On re-starting Gnucash the new or changed entry is, of course, missing. Last entries in gnucash.trace... * 00:26:04 DEBUG <gnc.backend.dbi> [conn_commit_transaction] COMMIT * 00:26:04 DEBUG <gnc.backend.dbi> [enter gnc-backend- dbi.c:gnc_dbi_session_end()] * 00:26:04 DEBUG <gnc.backend.dbi> [leave gnc_dbi_session_end()] As you can see session end occurred in the night when Gnucash was not being used. Nothing was logged on attempting to change and account. Steps to reproduce: 1. Open Gnucash using a MySQL database. 2. Leave Gnucash open for langer than SQL timeout. (8 hours is default but can be set to anything in /etc/my.cnf for testing. 3. Make some change to any of the accounts, or close Gnucash. Result: In either case a messagebox pops up to warn "Unable to save to database".
I've confirmed this by adding: wait_timeout=10 interactive_timeout=10 to /etc/mysql/my.cnf in the [mysqld] section.
Created attachment 152994 [details] [review] Here's a patch for this bug. Rationale: When we try to open a database transaction, and the database reports that the "server has gone away", we try to reconnect before failing hard. I've also taken the liberty to add myself to the AUTHORS file. I hope that's customary here, and I apologise if it's not...
Review of attachment 152994 [details] [review]: Thanks you for your patch. It looks like a good start, but to my limited knowledge of the sql backend, it seems incomplete. Here are my thoughts on the patch: * you test for an error by comparing with a string. I think it would probably be safer and definitely be more efficient to simply test for the error number returned by dbi_conn_error. It's possible that the error strings returned by MySQL are translated into other languages on other systems, so your string wouldn't always match. * what happens if not only the connection had timed out, but something else changed as well (an administrator changed the login password for example) ? I mean you don't check the results of dbi_connect or the second dbi_conn_queryf. * I also think there may be more useful places to check for a connection timeout, not only at the beginning of transactions. I found several occurences of dbi_conn_query and dbi_conn_queryf some of which could be called passed a timeout. As said in the beginning, I'm not the dbi expert, so maybe my comments are off mark or plain silly.
Created attachment 153022 [details] [review] Here's an improved patch. For the full discussion, see the mailing list: https://lists.gnucash.org/pipermail/gnucash-devel/2010-February/027323.html
There might be an easier way to handle all of this. libdbi can call back an error handler, and there is an error handler set up. This error handler suffers from some of the problems Geert identified above, so I need to fix it, but the handler could look for specific conditions (already looks for "unknown database" when trying to connect) and set an error code. Your patch should then be something like: dbi_conn->last_error = 0; do { result = dbi_conn_queryf( dbi_conn->conn, "BEGIN" ); if (result == 0 && dbi_conn->last_error == CONNECTION_TIMED_OUT) { DEBUG( "MySQL server has gone away, reconnecting and retrying...\n" ); (void)dbi_conn_connect( dbi_conn->conn ); } } while(dbi_conn->last_error == CONNECTION_TIMED_OUT); We could even make a macro for this: #define HANDLE_TIMEOUT(X) \ dbi_conn->last_error = 0; \ do { \ X; . . . } while(dbi_conn->last_error == CONNECTION_TIMED_OUT); and then use: HANDLE_TIMEOUT(result = dbi_conn_queryf( dbi_conn->conn, "BEGIN" )); which would allow the same code to be used other places that a timeout could occur and need to be detected.
Comment on attachment 153022 [details] [review] Here's an improved patch. To me, this bug seems to be an improvement compared to the current situation, so it is fine. Comment#5 describes even more improvement, which IMHO can very well be implemented as a separate step. (I wouldn't recommend the described macro, though, because it most probably will have way too many side-effects for being a safe macro.)
Comment on attachment 153022 [details] [review] Here's an improved patch. r18672, thanks!
Created attachment 154217 [details] [review] Solution based on the error callback function I wasn't really happy with the committed patch for reasons explained earlier in this bug and the thread on https://lists.gnucash.org/pipermail/gnucash-devel/2010-February/027323.html So here's another version that implements the error checking and reconnecting inside the dbi error handler callback. Additionally, I have added checks in several other places that may require a query rerun due to a lost db connection. I have thoroughly tested the patch and it seems to work fine on my machine. I didn't apply straight away though as it's a fairly large patch and I am interested in feedback.
Seems to work fine on mine too. Nothing scary in /tmp/gnucash.trace Fedora 11. MySQL Server version: 5.1.42
Comment on attachment 154217 [details] [review] Solution based on the error callback function Looks fine for me. I don't build with dbi enabled, so I can't even verify this compiles. IMHO you can commit right away.
I have committed a slightly altered version (r18706). I discovered there were already some error constants defined. Instead of creating my own, I preferred to reuse the existing ones. Other than that, the patch is identical.
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=608032. Please update any external references or bookmarks.