GNOME Bugzilla – Bug 562947
False positives in string addition detection
Last modified: 2009-02-11 21:41:39 UTC
There is currently a race condition in Damned Lies that is triggered when multiple commits on a single module are launching the update-stats script in a short period of time. This produces false alerts to gnome-i18n. We should prevent multiple simultaneous executions of the stats updating process on a single module (or single branch or single domain). One possible implementation of this inter-process locking would be to add a new database boolean field (e.g. domain.update_running) that is set to True before a new update process. It seems to be a little more difficult to use a file system lock if we want to keep portability.
We could add an explicit SQL lock in Branch.update_stats().
Here's my pseudo-code idea: # Get Lock while True: rawsql("UPDATE domain SET current_upate=1 WHERE module='module' AND current_update=0") if one row affected: break else: sleep 5 secs <update code here> # Release lock rawsql("UPDATE domain SET current_upate=0 WHERE module='module'")
The sleep is awful! The clean way is to use the lock mechanism provided by the db http://www.postgresql.org/docs/8.3/static/explicit-locking.html To acquire an exclusive row-level lock on a row without actually modifying the row, select the row with SELECT FOR UPDATE. Note that once the row-level lock is acquired, the transaction can update the row multiple times without fear of conflicts. Django recommends to use PostgreSQL, good opportunity to switch! I also need a solid transaction system to integrate some Vertimus features. PS: I'll try to stress your idea, just for fun.
And it's a bad idea to try to lock the domain table (everyone is concerned).
Stephane, the basic problem is not a database concurrency problem, it's about filesystem commands which shouldn't run simultaneously during the update process. The database is only used as the support of concurrency control in the application. I don't want to prevent any database read during update processes. @ #4: The concurrency is about pot files, and pot files (and statistics objects) are domain-specific, that's why I place the lock at the domain level. It is meant to be inside the "for dom in domains.all()" loop.
Yes (block on write) but all update-stats call will be blocked (different module/branch too), so only one update-stats call will be able to run at one time (I'm not sure it's english :). We can also certainly find a better solution to filter the commit signals (neither of the solutions (lock and sleep) resolve this problem) but it's for another bug report. > @ #4: > The concurrency is about pot files, and pot files (and statistics objects) are > domain-specific, that's why I place the lock at the domain level. It is meant > to be inside the "for dom in domains.all()" loop. domains = Domain.objects.filter(module=self.module).all() for dom in domains: so it's only for some domains of the concerned module. and a simple lock on branch _row_ (SELECT FOR UPDATE) at the beginning of Branch.update_stats (step 0) is required to avoid write concurrency on files of the branch (just after the checkout and released at the very end). Note: The sleep using can have a strange behaviour. Cn: Call n for the same branch C1 (lock) C2 (sleep) C1 (unlock) C3 (lock) C2 (unsleep) C2 (sleep) C3 (unlock) C2 (lock) C2 (do something useless) (I saw the pot_has_changed = False to avoid CPU wasting) C2 (unlock)
I finally implemented a file-system based lock (very basic). http://svn.gnome.org/viewvc/damned-lies?view=revision&revision=1443