After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 676209 - Collation breaks in C locale
Collation breaks in C locale
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: Store
0.14.x
Other Linux
: Normal normal
: ---
Assigned To: Aleksander Morgado
Jamie McCracken
Depends on:
Blocks:
 
 
Reported: 2012-05-17 07:07 UTC by Sam Thursfield
Modified: 2014-08-21 09:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch. (3.58 KB, patch)
2013-10-11 08:37 UTC, Aleksander Morgado
committed Details | Review
Allow cancellation in the bus backend (2.84 KB, patch)
2013-10-11 19:02 UTC, Aleksander Morgado
committed Details | Review
Allow calling tracker_db_manager_locale_changed() before init (1.42 KB, patch)
2013-10-15 16:51 UTC, Sam Thursfield
committed Details | Review
Remove duplicated version of tracker_db_manager_init_locations() (2.10 KB, patch)
2013-10-15 16:52 UTC, Sam Thursfield
committed Details | Review

Description Sam Thursfield 2012-05-17 07:07:45 UTC
I don't know much about locales and encodings, so I don't know if this is expected to work or not .. but it seems very wrong that this behaviour should be allowed.

-bash-3.2$ LC_COLLATE=en_US tracker-sparql -q 'SELECT ?u { ?u nie:url "file:///usr_data/media/Picture/Buildings" }'
Results:
  urn:uuid:f00ce087-2676-7e88-11cb-82499da74b23

-bash-3.2$ LC_COLLATE=C tracker-sparql -q 'SELECT ?u { ?u nie:url "file:///usr_data/media/Picture/Buildings" }'
Results:
  None

There are lots of filenames with non-Western characters in the tracker-store at this point. The problem is that tracker_collation_utf8 (ICU version) isn't work correctly, so string comparisons in the SQL query don't work either.

If this is a 'feature' of the Unix locale machinery, we should definitely be warning users if they are using C/POSIX locale ... a knock-on effect of this bug is that the miner enters an infinite loop because when a directory is not returned from the store, it thinks it hasn't been indexed and tries again.
Comment 1 Martyn Russell 2012-05-17 08:20:35 UTC
I checked with Aleksander briefly, we know we can't use the C locale generally. We force en_US in test cases even specifically for this reason.

I think you make a good point about warning users. On the flip side, does anyone actually use the C locale these days any more?

Aleksander suggested this might be a feature/bug.

Jürg any input here? I know you've been involved in the locale black magic :)
Comment 2 Sam Thursfield 2012-05-17 08:31:47 UTC
If it's known that C locale doesn't work, surely we should default to en_US instead of C in tracker-locale.c ?
Comment 3 Jürg Billeter 2012-05-17 08:47:58 UTC
While the C locale is the worst case, it's also an issue if we use different UTF-8 locales in tracker-store and applications using tracker direct access. I propose the following approach:

 * In libtracker-direct read ~/.cache/tracker/db-locale.txt and raise an error (and log a warning) if it doesn't match the locale of the current process. This will cause libtracker-sparql to fallback to D-Bus for all queries.

 * In tracker-store warn or even abort if the current locale is not a UTF-8 locale.
Comment 4 Aleksander Morgado 2013-10-11 08:37:19 UTC
Created attachment 256975 [details] [review]
Patch.

> 
>  * In libtracker-direct read ~/.cache/tracker/db-locale.txt and raise an error
> (and log a warning) if it doesn't match the locale of the current process. This
> will cause libtracker-sparql to fallback to D-Bus for all queries.

The attached patch implements this logic. If the DB locale and the current process locale are different, no direct connections are allowed.
Comment 5 Aleksander Morgado 2013-10-11 08:39:34 UTC
> 
>  * In tracker-store warn or even abort if the current locale is not a UTF-8
> locale.

As for this second one, I don't believe we should be restricting to use unicode locales. If a system setup doesn't want to use a unicode locale, Tracker should still run, even if it won't have unicode-based collation (i.e. libicu will fallback to non-unicode collation).

I actually do have a patch implementing it, but I really don't think we should be doing it... and we would also need to force en_US locale in *all* our tests for example. I don't think it's worth it.
Comment 6 Jürg Billeter 2013-10-11 08:45:44 UTC
I don't remember whether we have code that breaks in non-UTF8 locales. If we do, I believe it's better to error out as early as possible to avoid silent issues. If all code is fine in all locales, no need to worry about it, of course.
Comment 7 Aleksander Morgado 2013-10-11 08:55:14 UTC
(In reply to comment #6)
> I don't remember whether we have code that breaks in non-UTF8 locales. If we
> do, I believe it's better to error out as early as possible to avoid silent
> issues. If all code is fine in all locales, no need to worry about it, of
> course.

Given that most (actually, all except for a couple) of the unit tests run in the C locale, I believe it shouldn't be an issue.
Comment 8 Martyn Russell 2013-10-11 08:58:52 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > I don't remember whether we have code that breaks in non-UTF8 locales. If we
> > do, I believe it's better to error out as early as possible to avoid silent
> > issues. If all code is fine in all locales, no need to worry about it, of
> > course.
> 
> Given that most (actually, all except for a couple) of the unit tests run in
> the C locale, I believe it shouldn't be an issue.

What would be a massive improvement would be to sandbox all the tests with the script Sam worked on which I improved and converted to python. That way we can run tests:

1. Without requiring an installed Tracker
2. With specific environments like the locale and G_MESSAGES_DEBUG export (needed to pass distcheck).

The sandbox is in utils/sandbox/tracker-sandbox.py
Comment 9 Martyn Russell 2013-10-11 09:02:14 UTC
Comment on attachment 256975 [details] [review]
Patch.

Patch looks good.

One thing I wondered is if we need a check in tracker-store for locale mismatch with the DB. I can't remember, but maybe we already handle that condition?
Comment 10 Aleksander Morgado 2013-10-11 09:09:50 UTC
(In reply to comment #9)
> (From update of attachment 256975 [details] [review])
> Patch looks good.
> 
> One thing I wondered is if we need a check in tracker-store for locale mismatch
> with the DB. I can't remember, but maybe we already handle that condition?

Yes, we do handle that. If the mismatch is found in tracker-store, we actually rebuild all indexes.
Comment 11 Aleksander Morgado 2013-10-11 09:14:18 UTC
Pushed the patch to git master; resolving the bug.
Comment 12 Sam Thursfield 2013-10-11 10:40:00 UTC
Sadly, on my system this has broken tests/libtracker-sparql/tracker-test:

/libtracker-sparql/tracker/tracker_sparql_cursor_next_async: ASYNC query 0 starting:
OK
  0: http://www.tracker-project.org/ontologies/tracker#added
  ... query results ...
  0: http://www.tracker-project.org/temp/slo#location
Finished 1
ASYNC query 1 starting:
  1: rdf
Cancelling cancellable:0x7f954c005e20 at count:1
**
Tracker:ERROR:tracker-test.c:145:test_tracker_sparql_cursor_next_async_cb: assertion failed (error == (g-io-error-quark, 19)): error is NULL


If I run with TRACKER_VERBOSITY=3 I see this sort of thing:

(/home/sam/gnome/src/tracker/tests/libtracker-sparql/.libs/lt-tracker-test:24852): Tracker-DEBUG: tracker-backend.vala:183: Using backend = 'AUTO'
Tracker-Message:   Could not find database locale file:''
Tracker-Message: Locale change detected from 'unknown' to 'C'...
(/home/sam/gnome/src/tracker/tests/libtracker-sparql/.libs/lt-tracker-test:24852): Tracker-DEBUG: tracker-backend.vala:191: Unable to initialize direct backend: Locale mismatch, cannot use direct connection


I'm not sure exactly what the problem is at a glance, but for a start we should probably force 'TRACKER_SPARQL_BACKEND=direct' in the tests. I agree with Martyn that the best thing to do is run all the tests using tracker-sandbox.
Comment 13 Aleksander Morgado 2013-10-11 11:00:11 UTC
(In reply to comment #12)
> Sadly, on my system this has broken tests/libtracker-sparql/tracker-test:
> 
> /libtracker-sparql/tracker/tracker_sparql_cursor_next_async: ASYNC query 0
> starting:
> OK
>   0: http://www.tracker-project.org/ontologies/tracker#added
>   ... query results ...
>   0: http://www.tracker-project.org/temp/slo#location
> Finished 1
> ASYNC query 1 starting:
>   1: rdf
> Cancelling cancellable:0x7f954c005e20 at count:1
> **
> Tracker:ERROR:tracker-test.c:145:test_tracker_sparql_cursor_next_async_cb:
> assertion failed (error == (g-io-error-quark, 19)): error is NULL
> 
> 
> If I run with TRACKER_VERBOSITY=3 I see this sort of thing:
> 
> (/home/sam/gnome/src/tracker/tests/libtracker-sparql/.libs/lt-tracker-test:24852):
> Tracker-DEBUG: tracker-backend.vala:183: Using backend = 'AUTO'
> Tracker-Message:   Could not find database locale file:''
> Tracker-Message: Locale change detected from 'unknown' to 'C'...
> (/home/sam/gnome/src/tracker/tests/libtracker-sparql/.libs/lt-tracker-test:24852):
> Tracker-DEBUG: tracker-backend.vala:191: Unable to initialize direct backend:
> Locale mismatch, cannot use direct connection
> 
> 
> I'm not sure exactly what the problem is at a glance, but for a start we should
> probably force 'TRACKER_SPARQL_BACKEND=direct' in the tests. I agree with
> Martyn that the best thing to do is run all the tests using tracker-sandbox.

Yeah, got that too here. Seems the test behaves differently depending on whether the direct or bus backends are used.
Comment 14 Aleksander Morgado 2013-10-11 11:27:15 UTC
> I'm not sure exactly what the problem is at a glance, but for a start we should
> probably force 'TRACKER_SPARQL_BACKEND=direct' in the tests. I agree with
> Martyn that the best thing to do is run all the tests using tracker-sandbox.

But yes, we should definitely run all tests in a sandbox, with a sandboxed tracker-store, so that unit tests do not depend on the tracker-store running in your system. In this specific case, the test is running with the C locale, while your tracker-store is running with some other locale.
Comment 15 Aleksander Morgado 2013-10-11 11:57:06 UTC
So the issue with the test is that the cancellable is only checked for being cancellable when cursor_next() runs the TrackerDBCursor implementation, i.e. the one running a thread and the sqlite cursor. The CANCELLED GIO error actually comes from the g_simple_async_result_run_in_thread() used there.

So what to do here? The proper fix would be to make sure that all tests are run in a sandboxed setup, where tracker-store and the tests both share the same locale; and actually force using the direct backend for this specific test, where the cancellable check is applicable.

Another fix would be to make sure that we check whether the cancellable is cancelled in other implementations of TrackerCursor, e.g. the ones implemented by the Bus backend.

Thoughts?
Comment 16 Martyn Russell 2013-10-11 14:20:44 UTC
(In reply to comment #15)
> So the issue with the test is that the cancellable is only checked for being
> cancellable when cursor_next() runs the TrackerDBCursor implementation, i.e.
> the one running a thread and the sqlite cursor. The CANCELLED GIO error
> actually comes from the g_simple_async_result_run_in_thread() used there.
> 
> So what to do here? The proper fix would be to make sure that all tests are run
> in a sandboxed setup, where tracker-store and the tests both share the same
> locale; and actually force using the direct backend for this specific test,
> where the cancellable check is applicable.
> 
> Another fix would be to make sure that we check whether the cancellable is
> cancelled in other implementations of TrackerCursor, e.g. the ones implemented
> by the Bus backend.
> 
> Thoughts?

I agree.
Comment 17 Aleksander Morgado 2013-10-11 19:02:07 UTC
Created attachment 257048 [details] [review]
Allow cancellation in the bus backend

The attached patch implements cancellation in the cursor iteration when using the bus backend, to match the behavior of the direct backend.

The unit test is also updated to make sure that the cancellation doesn't have any race condition; i.e. we actually cancel the cancellable before running the async action.

Comments?
Comment 18 Martyn Russell 2013-10-14 10:20:58 UTC
Comment on attachment 257048 [details] [review]
Allow cancellation in the bus backend

Nice patch Aleksander.

Out of interest, why did the _next() call need to be moved? It changes the logic there AFAICS. Now it looks like we do: next, cancel; before we did: next, next, cancel.
Comment 19 Aleksander Morgado 2013-10-14 10:24:07 UTC
> 
> Out of interest, why did the _next() call need to be moved? It changes the
> logic there AFAICS. Now it looks like we do: next, cancel; before we did: next,
> next, cancel.

the _next() call was just moved after the cancel(). In other words, we now cancel the operation before actually calling _next(), so that we make sure that it is always cancelled.
Comment 20 Martyn Russell 2013-10-14 10:34:13 UTC
Right, actually makes sense :)
Comment 21 Aleksander Morgado 2013-10-14 12:49:57 UTC
Pushed, re-resolving as fixed.
Comment 22 Sam Thursfield 2013-10-15 16:24:34 UTC
This really is the bug that keeps giving. ...

In the 'direct' backend we now call tracker_db_manager_locale_changed() before calling tracker_data_manager_init(). However, global 'data_dir' is set to NULL until tracker_data_manager_init() is called, so apps using the 'direct' backend will always see the locale as 'unknown'. That leads to errors such as this:

utils/sandbox/tracker-sandbox.py --debug -i /tmp/tracker-test-miner-2/ --list --prefix=/opt/tracker-0.16/ 
Using prefix location "/opt/tracker-0.16/"
Using index location "/tmp/tracker-test-miner-2"
Using new D-Bus session with address "unix:abstract=/tmp/dbus-7c3Iaw2i6q,guid=5d991fbee30b8355ba7f8c12525d6b6e" with PID 32469
Using query to list files indexed...
Tracker-Message:   Could not find database locale file:''
Tracker-Message: Locale change detected from 'unknown' to 'C'...
Traceback (most recent call last):
  • File "../../utils/sandbox/tracker-sandbox.py", line 466 in <module>
    db_query_list_files()
  • File "../../utils/sandbox/tracker-sandbox.py", line 144 in db_query_list_files
    conn = Tracker.SparqlConnection.get(None)
gi._glib.GError: Locale mismatch, cannot use direct connection


I will cook up a patch to check and call tracker_db_manager_init_locations() from tracker_db_manager_locale_changed(). That seems like the best solution.
Comment 23 Sam Thursfield 2013-10-15 16:51:37 UTC
Created attachment 257370 [details] [review]
Allow calling tracker_db_manager_locale_changed() before init

This fixes an issue introduced in commit fa6317fbabb1a7, where the
direct backend would fail to find the 'db-locale.txt' marker due to the
global 'data_dir' path variable being NULL at the time it called
tracker_db_manager_locale_changed(). It would then report that the
locale had changed from 'unknown' to the system locale, and abort.
Comment 24 Sam Thursfield 2013-10-15 16:52:06 UTC
Created attachment 257371 [details] [review]
Remove duplicated version of tracker_db_manager_init_locations()

The return value of g_get_user_data_dir() and friends is constant
through the lifecycle of the process, so there's no need to initialise
paths more than once or in different ways.
Comment 25 Martyn Russell 2013-10-19 15:41:17 UTC
Comment on attachment 257371 [details] [review]
Remove duplicated version of tracker_db_manager_init_locations()

Great patch. At first I thought removing the dir= ... bit would be problematic because we used to be able to shutdown and re-init() the locations way back (hence the g_free()), but I don't think it's useful these days. Thanks Sam ;)
Comment 26 Martyn Russell 2013-10-19 15:41:29 UTC
Comment on attachment 257370 [details] [review]
Allow calling tracker_db_manager_locale_changed() before init

Very nice! :)
Comment 27 Martyn Russell 2013-10-19 15:45:31 UTC
I've not marked this bug as fixed just yet. I want to test some more with it. Using the sandbox (utils/sandbox/tracker-sandbox.py), I also found some issues with tests failing due to missing setlocale() calls. I have patched some of those, but there are some tests here which completely b0rk my machine somehow by making the machine unresponsive. Mostly, it's the DB based tests (libtracker-fts, etc) and I think it's related to the sandbox too.
Comment 28 Martyn Russell 2014-08-21 09:02:33 UTC
Marking this as fixed finally.
I've been setting the locale in a lot of test cases lately and I know this patch works, I've seen it in action :)

This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.