Bug 666749 - Empty window in LANG=ko_KR.UTF-8
Status: RESOLVED OBSOLETE
Product: tracker
Classification: Core
Component: General
Version: 0.14.x
Hardware: Other Linux
Importance: Normal blocker
Target Milestone: ---
Assigned To: tracker-general
QA Contact: Jamie McCracken
Duplicates: 673224 676368 679316 684640
Depends on:
Blocks:
Reported: 2011-12-23 05:57 UTC by sangu
Modified: 2014-03-21 12:52 UTC
See Also:
GNOME target: ---
GNOME version: 3.5/3.6



Description sangu 2011-12-23 05:57:07 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=770024

Description of problem:
Empty window in LANG=ko_KR.UTF-8.

By contrast, many document icons appear when gnome-documents is run with
LANG=en_US.UTF-8 or LANG=ja_JP.UTF-8.

Version-Release number of selected component (if applicable):
0.2.1-1.fc16.x86_64

How reproducible:
always

Steps to Reproduce:
1. $ LANG=ko_KR.UTF-8 gnome-documents

---
gjs-1.30.0-1.fc16.x86_64
tracker-0.12.8-2.fc16.x86_64
Comment 1 Cosimo Cecchi 2012-03-01 18:26:05 UTC
-> tracker

This looks like a bug in tracker's implementation of the fn:starts-with function, which we use to filter out files from URIs we're not interested in.

As a testcase, run the following commands

tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc/Pictures\")) } "
Results:
  [ a lot of results ]

LANG=ko_KR.UTF-8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc/Pictures\")) } "
Results:
  None

Replacing fn:starts-with with fn:contains fixes the bug on my machine; I will probably commit such a workaround to gnome-documents if we don't manage to get this fixed in a better way before 3.4.
Comment 2 Cosimo Cecchi 2012-03-01 18:26:31 UTC
Actually reassigning.
Comment 3 sangu 2012-03-17 07:23:40 UTC
This issue still happens in tracker 0.14.0
Comment 4 Martyn Russell 2012-03-30 09:52:26 UTC
(In reply to comment #1)
> -> tracker
> 
> This looks like a bug in tracker's implementation of the fn:starts-with
> function, which we use to filter out files from URIs we're not interested in.
> 
> As a testcase, run the following commands
> 
> tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
> (fn:starts-with(nie:url(?u), \"file:///home/cosimoc/Pictures\")) } "
> Results:
>   [ a lot of results ]
> 
> LANG=ko_KR.UTF-8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo
> FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc/Pictures\")) } "
> Results:
>   None
> 
> Replacing fn:starts-with with fn:contains fixes the bug on my machine; I will
> probably commit such a workaround to gnome-documents if we don't manage to get
> this fixed in a better way before 3.4.

Hello Cosimo, I just tested this with master and it seems to work fine, but I don't see any differences in master from the 0.14 branch which would relate to this.

This is what I get:

$ LANG=ko_KR.UTF-8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
(fn:starts-with(nie:url(?u), \"file:///home/martyn/Pictures\")) } "|wc -l
1000

$ tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
(fn:starts-with(nie:url(?u), \"file:///home/martyn/Pictures\")) } "|wc -l
1000

I was considering whether this is a collation issue, but I am unable to reproduce the issue locally. Is there anything special about the file names (i.e. are they non-ASCII)?
Comment 5 Cosimo Cecchi 2012-03-31 00:55:13 UTC
(In reply to comment #4)

> Hello Cosimo, I just tested this with master and it seems to work fine, but I
> don't see any differences in master from the 0.14 branch which would relate to
> this.
> 
> This is what I get:
> 
> $ LANG=ko_KR.UTF-8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a
> nmm:Photo FILTER
> (fn:starts-with(nie:url(?u), \"file:///home/martyn/Pictures\")) } "|wc -l
> 1000
> 
> $ tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
> (fn:starts-with(nie:url(?u), \"file:///home/martyn/Pictures\")) } "|wc -l
> 1000

Weird; with the same test query I get (using Tracker 0.14)

LANG=ko_KR.UTF-8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc/Pictures\")) } "|wc -l
3

(which is "Results", "None", and newline)

But without setting the locale to ko_KR.UTF-8 I get:

tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc/Pictures\")) } "|wc -l
292

I can try to test with git master, but I don't see any relevant commits either. Is there any information I can provide to debug this further? Which distribution are you using (I'm on Fedora)? Maybe this is triggered by a different configuration in the underlying locale plumbing between distros?
Comment 7 Cosimo Cecchi 2012-03-31 13:24:08 UTC
*** Bug 673224 has been marked as a duplicate of this bug. ***
Comment 8 Julian Sikorski 2012-03-31 13:41:56 UTC
I just tested this on Fedora 16:
tracker-0.12.10-1.fc16.x86_64
gnome-documents-0.2.1-1.fc16.x86_64

[julas@snowball2 ~]$ LANG=pl_PL.utf8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
(fn:starts-with(nie:url(?u), \"file:///home/julas/Obrazy\")) } " | wc -l
3
[julas@snowball2 ~]$ LANG=en_US.utf8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
(fn:starts-with(nie:url(?u), \"file:///home/julas/Obrazy\")) } " | wc -l
3766

and on Fedora 17:
tracker-0.14.0-1.fc17.x86_64
gnome-documents-0.4.0.1-1.fc17.x86_64

[julas@branched Obrazy]$ LANG=pl_PL.utf8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
(fn:starts-with(nie:url(?u), \"file:///home/julas/Obrazy\")) } " | wc -l
3
[julas@branched Obrazy]$ LANG=en_US.utf8 tracker-sparql -q "SELECT ?u nie:url(?u) WHERE {?u a nmm:Photo FILTER
(fn:starts-with(nie:url(?u), \"file:///home/julas/Obrazy\")) } " | wc -l
4
Comment 9 Martyn Russell 2012-04-02 08:44:53 UTC
To be fair, I tested on my laptop with Ubuntu on it. Testing here with F16 (desktop), I get the same results (using tracker-0.12.10-1.fc16.x86_64) in the en_US, pl_PL and ko_KR locales we've been bouncing around here.

I know that the locale affects the collation and hence the sorting for results generally, but I wouldn't expect a different number of results.

Out of curiosity, what unicode backend are you using? Presumably libunistring as is used for my system here:

  $ rpm -qR tracker-0.12.10-1.fc16.x86_64|grep -i unistring
  libunistring.so.0()(64bit)  

Do the files you search for have interesting names at all? I wonder if I lack the material to test with at this end.
Comment 10 Cosimo Cecchi 2012-04-02 14:04:01 UTC
(In reply to comment #9)

> Out of curiosity, what unicode backend are you using? Presumably libunistring
> as is used for my system here:
> 
>   $ rpm -qR tracker-0.12.10-1.fc16.x86_64|grep -i unistring
>   libunistring.so.0()(64bit)  

Same here, tracker 0.14 is compiled against the unistring backend on F17.

> Do the files you search for have interesting names at all? I wonder if I lack
> the material to test with this end?

I think the problem doesn't lie in the names of the files, but it's in the way the FILTER directive is processed; as a data point supporting this, these two (almost equivalent) queries, both with Korean locale, give two completely different results:

$ LANG=ko_KR.utf-8 tracker-sparql -q "SELECT ?u WHERE { ?u a rdfs:Resource }" | wc -l
4938

$ LANG=ko_KR.utf-8 tracker-sparql -q "SELECT ?u WHERE { ?u a rdfs:Resource FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc\")) }" | wc -l
3
Comment 11 Martyn Russell 2012-04-02 15:25:34 UTC
(In reply to comment #10)
> I think the problem doesn't lie in the names of the files, but it's in the way
> the FILTER directive is processed; as a data point supporting this, these two
> (almost equivalent) queries, both with Korean locale, give two completely
> different results:
> 
> $ LANG=ko_KR.utf-8 tracker-sparql -q "SELECT ?u WHERE { ?u a rdfs:Resource }" |
> wc -l
> 4938
> 
> $ LANG=ko_KR.utf-8 tracker-sparql -q "SELECT ?u WHERE { ?u a rdfs:Resource
> FILTER (fn:starts-with(nie:url(?u), \"file:///home/cosimoc\")) }" | wc -l
> 3

Indeed. I discussed this with Jürg in the #tracker room today and I checked the code too. The reason fn:starts-with fails where fn:contains works is (AFAICS) that one uses the GLOB keyword in SQL and the other uses BETWEEN. Now, we do something fancy by comparing between 'A' and 'B'+TRACKER_COLLATION_LAST_CHAR. We do this for collation reasons. What makes this complicated is that TRACKER_COLLATION_LAST_CHAR is not the same for all backends. You can see this in the Tracker source directory:

$ git grep COLLATION_LAST_CHAR . | grep -i define
src/libtracker-data/tracker-collation.h:#define TRACKER_COLLATION_LAST_CHAR ((gunichar) 0x10fffd)
src/libtracker-data/tracker-collation.h:#define TRACKER_COLLATION_LAST_CHAR ((gunichar) 0x9fa5)

One is for libunistring and the other for libicu. So we switch depending on the implementation we were built with (according to configure).

One way this would cause your situation is if the TRACKER_COLLATION_LAST_CHAR is *not* the last character any more or it's now sorted into a position which breaks things for us.
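
To make that failure mode concrete, here is a minimal Python sketch that uses SQLite directly. It is not Tracker's code: the table, the collation name SKETCH, the sample URLs and both comparison functions are hypothetical, and it assumes the reading above, i.e. that fn:starts-with ends up as a BETWEEN range closed by the last-char sentinel while fn:contains ends up as a GLOB pattern. The second collation simply gives the sentinel no weight, which is only one possible way for it to stop sorting last.

```python
import sqlite3

# Value quoted in this thread for the libunistring build.
LAST_CHAR = "\U0010FFFD"


def codepoint_order(a, b):
    """Collation in which LAST_CHAR really does sort after everything else."""
    return (a > b) - (a < b)


def ignores_last_char(a, b):
    """Hypothetical locale collation that gives LAST_CHAR no weight at all."""
    a = a.replace(LAST_CHAR, "")
    b = b.replace(LAST_CHAR, "")
    return (a > b) - (a < b)


def count_matches(collation, prefix, urls):
    """Return (BETWEEN count, GLOB count) for the given prefix."""
    db = sqlite3.connect(":memory:")
    db.create_collation("SKETCH", collation)
    db.execute("CREATE TABLE files (url TEXT COLLATE SKETCH)")
    db.executemany("INSERT INTO files VALUES (?)", [(u,) for u in urls])
    # Range form of a prefix match: prefix <= url <= prefix + LAST_CHAR.
    # The comparison goes through the column's collation.
    (between,) = db.execute(
        "SELECT COUNT(*) FROM files WHERE url BETWEEN ? AND ?",
        (prefix, prefix + LAST_CHAR),
    ).fetchone()
    # Pattern form of a substring match; GLOB does not consult the collation.
    (glob,) = db.execute(
        "SELECT COUNT(*) FROM files WHERE url GLOB ?",
        ("*" + prefix + "*",),
    ).fetchone()
    db.close()
    return between, glob


PREFIX = "file:///home/user/Pictures"
URLS = [
    "file:///home/user/Pictures/a.jpg",
    "file:///home/user/Pictures/b.png",
    "file:///home/user/Music/c.mp3",
]

print(count_matches(codepoint_order, PREFIX, URLS))    # (2, 2)
print(count_matches(ignores_last_char, PREFIX, URLS))  # (0, 2)
```

With the well-behaved collation both forms find the two pictures. Once the sentinel no longer sorts last, the upper bound of the range collapses onto the prefix itself and the BETWEEN form finds nothing, while the GLOB form is unaffected.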

What I wonder is, if you switch to libicu in your build of Tracker, does this change anything for you?

I've CCd Aleksander into this bug so he can comment. He is our resident Unicode specialist :) and may correct me on what I've said above and provide some additional input.
Comment 12 Cosimo Cecchi 2012-04-02 16:16:54 UTC
(In reply to comment #11)
 
> Indeed. I discussed this with Jürg in the #tracker room today and I checked the
> code too. The reason fn:starts-with doesn't work the same way but it does with
> fn:contains is because (AFAICS) one uses a GLOB and the other uses BETWEEN
> keywords in SQL. Now, we do something fancy by comparing between 'A' and
> 'B'+TRACKER_COLLATION_LAST_CHAR. We do this for collation reasons. What makes
> this complicated is, the TRACKER_COLLATION_LAST_CHAR is not the same for all
> backends. You can see this in the Tracker source directory:
> 
> $ git grep COLLATION_LAST_CHAR . | grep -i define
> src/libtracker-data/tracker-collation.h:#define TRACKER_COLLATION_LAST_CHAR
> ((gunichar) 0x10fffd)
> src/libtracker-data/tracker-collation.h:#define TRACKER_COLLATION_LAST_CHAR
> ((gunichar) 0x9fa5)
> 
> One is for libunistring and the other for libicu. So we switch depending on the
> implementation we were built with (according to configure).
> 
> One way this would cause your situation is if the TRACKER_COLLATION_LAST_CHAR
> is *not* the last character any more or it's now sorted into a position which
> breaks things for us.
> 
> What I wonder is, if you switch to libicu in your build of Tracker, does this
> change anything for you?

Martyn, thanks for the investigation and the time you are spending on this.
I now tested with Tracker git master rebuilt with the libicu backend, and I can confirm that your analysis is right: with that backend I get the correct number of results when testing with the Korean locale.
Comment 13 Cosimo Cecchi 2012-05-29 13:38:12 UTC
*** Bug 676368 has been marked as a duplicate of this bug. ***
Comment 14 Cosimo Cecchi 2012-05-29 13:41:23 UTC
Martyn, I felt free to raise the importance of this report to blocker, since it renders applications such as Documents completely unusable for users with non-English locales.

Do you suggest just switching to libicu as the default backend?
Comment 15 Aleksander Morgado 2012-05-29 14:05:51 UTC
This thread on the libunistring mailing list discusses the issue, but it didn't get any reply about when libunistring will provide a non-strcoll() UTS#10-based collation:
http://lists.gnu.org/archive/html/bug-libunistring/2010-11/msg00008.html

If switching to libicu, just note that it will make the FTS parsing much slower due to extra conversions to/from UTF-16. But of course, if that is the only way to have a proper collation...
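
As a rough illustration of those extra conversions, here is a small Python sketch. ICU's string APIs work on UTF-16 code units, so UTF-8 text has to be transcoded on the way in and again on the way out. The function name and the sample string are made up for this example, and nothing here measures Tracker's actual FTS cost.

```python
def roundtrip_via_utf16(utf8_bytes):
    """Transcode UTF-8 bytes to UTF-16 and back, as a UTF-16 based API would require.

    Returns the UTF-8 bytes after the round trip and the size of the
    intermediate UTF-16 buffer in bytes.
    """
    text = utf8_bytes.decode("utf-8")
    utf16_bytes = text.encode("utf-16-le")  # conversion on the way in
    # ... a UTF-16 based tokenizer or collator would work on utf16_bytes here ...
    back = utf16_bytes.decode("utf-16-le").encode("utf-8")  # conversion on the way out
    return back, len(utf16_bytes)


sample = "문서 Documents".encode("utf-8")
result, utf16_size = roundtrip_via_utf16(sample)
print(result == sample, len(sample), utf16_size)
```

Every piece of text pays for two full passes like this in addition to the real work, which is where the slowdown would come from.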

Just wondering, can't we re-work Jürg's fix in order to handle these new cases with libunistring? Maybe providing a custom collation method which would treat 0x10fffd really as the last char always and calling libunistring's collator internally?
Comment 16 Martyn Russell 2012-05-29 15:52:00 UTC
(In reply to comment #14)
> Martyn, I felt free to raise the importance of this report to blocker, since it
> renders applications such as Documents completely unusable for users with
> non-english locales.
> 
> Do you suggest to just switch to libicu as the default backend?

Cosimo, thanks for raising it. I wasn't actually sure of the best way forward here. After consideration, perhaps the best approach is twofold...

1. We attempt to patch the issue Aleksander mentions on the unistring mailing list. This may end up meaning we fix strcoll() since I believe libunistring is using that under the hood.

2. We try to fix it in Tracker as Aleksander suggests for the short term.

I am slightly concerned that libicu is the less perfect choice of the two because of the reasons Aleksander pointed out around performance.

We've seen some bugs reported against libicu use recently too:
  
   https://bugzilla.gnome.org/show_bug.cgi?id=675660
   https://bugzilla.gnome.org/show_bug.cgi?id=676989

   Though, I suspect these are related to incorrectly set up environments:

   https://bbs.archlinux.org/viewtopic.php?id=140435

   I have a suspicion fixing this bug would resolve the above issues:

   https://bugzilla.gnome.org/show_bug.cgi?id=676209

We could default to libicu over libunistring (in the order of discovery in configure.ac). That would likely help here.

But I would like to see bugs/patches related to other components in the stack filed/created (i.e. for glibc, perhaps Tracker improvements and for libunistring).

(In reply to comment #15)
> This thread in the libunistring mailing list talks about the issue; but didn't
> get any reply about when libunistring will provide a non-strcoll() UTS#10-based
> collation:
> http://lists.gnu.org/archive/html/bug-libunistring/2010-11/msg00008.html

Is it worth asking again?
 
> If switching to libicu, just note that it will make the FTS parsing much slower
> due to extra conversions to/from UTF-16. But of course, if that is the only way
> to have a proper collation...

Any idea how much slower? 1/2 speed? Of course it depends on your end machine.
 
> Just wondering, can't we re-work Jürg's fix in order to handle these new cases
> with libunistring? Maybe providing a custom collation method which would treat
> 0x10fffd really as the last char always and calling libunistring's collator
> internally?

I've CCd Jürg. Any comments Jürg?
Comment 17 Julian Sikorski 2012-05-31 20:11:03 UTC
I am not 100% certain, but gnome-contacts-3.4.1-1.fc17.x86_64 might be suffering from this problem too. It is a bigger issue since in gnome 3.4 you need it to group empathy contacts.
Comment 18 Matej 2012-05-31 21:28:55 UTC
Gnome-contacts 3.4.0-1 seems to work on my system where gnome-documents doesn't.
Comment 19 Cosimo Cecchi 2012-05-31 21:53:14 UTC
I doubt it's related - Contacts 3.4 does not use Tracker in any way AFAIK.
Comment 20 sangu 2012-08-29 13:34:19 UTC
This bug still happens in GNOME 3.6Beta.

tracker-0.14.2-2.fc18.x86_64
gnome-documents-3.5.90-2.fc18.x86_64
Comment 21 Martyn Russell 2012-08-30 08:36:26 UTC
Sangu, sadly, nothing has changed here. My comment #16 suggests approaches to fix/improve this situation, but we're at a bit of a stalemate here.
Comment 22 Matthias Clasen 2012-09-18 11:05:21 UTC
gnome-documents has a (performance-reducing) workaround in place now:

http://git.gnome.org/browse/gnome-documents/commit/?id=29b6bc7d2db52955117a3340bd2ff5434b39dc56
Comment 23 Matthias Clasen 2012-09-18 18:08:36 UTC
And at least for Fedora, we'll get tracker built against icu. Maybe that's worth recommending on distributor-list.
Comment 24 Martyn Russell 2012-09-18 19:01:38 UTC
I was hoping to patch master and release a 0.14.3 some time soon with this too.

Thanks Matthias.
Comment 25 Matthias Clasen 2012-09-19 02:47:28 UTC
Dropping off the blocker list, workaround is in place.
Comment 26 Cosimo Cecchi 2012-09-25 15:13:25 UTC
*** Bug 684640 has been marked as a duplicate of this bug. ***
Comment 27 Martyn Russell 2012-10-17 11:18:26 UTC
I've now defaulted to icu for the unicode support in master to try to avoid this problem. Release 0.14.3 should also have this change.
Comment 28 Cosimo Cecchi 2012-12-17 16:03:46 UTC
*** Bug 679316 has been marked as a duplicate of this bug. ***
Comment 29 Martyn Russell 2014-03-21 11:10:54 UTC
Cosimo, did you tell me on IRC some time ago that this is no longer reproducible, i.e. a possible ICU bug fix?
Comment 30 Debarshi Ray 2014-03-21 11:54:21 UTC
It was actually me.

I found that sometime after comment 23 an unrelated change to the tracker package switched it back to libunistring in Fedora. But that did not cause this bug to reappear. I tried the reproducers on this bug, but could not make it fail. While I have switched the package back to use libicu, it may be that something was fixed somewhere.
Comment 31 Martyn Russell 2014-03-21 11:58:29 UTC
Thanks Rishi, I wonder if we can close the bug, that's all. Any problem with me closing as OBSOLETE?
Comment 32 Debarshi Ray 2014-03-21 12:07:40 UTC
Let's close this as OBSOLETE.
Comment 33 Michael Biebl 2014-03-21 12:30:38 UTC
Hi Martyn,
what is the current state of this issue: Is it ok now to use libunistring or is libicu still recommended?
/me wonders which library I should use for the Debian package
Comment 34 Martyn Russell 2014-03-21 12:38:55 UTC
Michael, we defaulted to libicu in master to improve the situation (see the comments above). Rishi is saying that the Fedora package had switched back to libunistring, but now uses libicu again.

So everyone (me included) seems to go for libicu, and it makes more sense to me because we also have MP3 encoding detection with libicu.

However, it's not clear if it's fixed with libicu.

So I don't think that answers your question, but that's where we are.
If Rishi could verify this bug is obsolete for libicu, that would certainly give you a more concrete way forward here.
Comment 35 Debarshi Ray 2014-03-21 12:51:54 UTC
Using libicu was originally the suggested fix or workaround for this bug. See comment 27 and a few above it. It appears that it now works properly with libunistring, but I don't know why.