Bug 675660 - Any search will fail with error about libicu
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: Needle
Version: 0.14.x
Hardware: Other Linux
Importance: Normal critical
Target Milestone: ---
Assigned To: tracker-general
QA Contact: Jamie McCracken
Duplicates: 676989
Depends on:
Blocks:
Reported: 2012-05-08 09:28 UTC by Raphael Rochet
Modified: 2012-07-30 13:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proposed patch (1.44 KB, patch)
2012-07-20 15:37 UTC, Alexandre Rostovtsev
committed

Description Raphael Rochet 2012-05-08 09:28:38 UTC
Whatever I type in the search, tracker-needle shows an empty list and prints a lot of errors like this on the console:

(tracker-needle:25809): Tracker-WARNING **: Error initializing libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'

I tried tracker-search but got the same result:

(tracker-search:25833): Tracker-WARNING **: Error initializing libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'

I use Arch Linux (x64) with the latest updates (Tracker 0.14.1).

tracker-stats shows that indexing is done.
Comment 1 Luis Medinas 2012-05-18 14:18:48 UTC
Got this problem too on a Gentoo system.
Comment 2 Raphael Rochet 2012-05-18 14:29:28 UTC
Hi, I have a fix for this.

In fact, while UTF-8 and UTF8 are the same for many applications, tracker-needle and tracker-search only accept UTF-8.

Running:
LANG=fr_FR.utf8 tracker-needle
makes search fail.

But running:
LANG=fr_FR.UTF-8 tracker-needle
makes search work.
Comment 3 Raphael Rochet 2012-05-18 14:30:19 UTC
Sorry, that's not a 'fix' but rather a workaround...
Comment 4 Martyn Russell 2012-05-18 15:40:33 UTC
Which locales are listed by locale -a?
Comment 5 Luis Medinas 2012-05-18 15:43:29 UTC
my case:

locale -a
C
de_DE
de_DE@euro
de_DE.iso88591
de_DE.iso885915@euro
de_DE.utf8
deutsch
en_US
en_US.iso88591
en_US.utf8
german
portuguese
POSIX
pt_PT
pt_PT.iso88591
pt_PT.utf8

Also, I have ICU 49.1.1 installed and I think it's the cause of these problems.
Comment 6 Raphael Rochet 2012-05-18 15:50:25 UTC
$ locale -a
C
en_US
en_US.iso88591
en_US.utf8
français
french
fr_FR
fr_FR@euro
fr_FR.iso88591
fr_FR.iso885915@euro
fr_FR.utf8
POSIX

I'm surprised to see 'utf8' here! That may be why, when I choose my locale from a GUI, 'utf8' is used instead of 'UTF-8'.
Comment 7 Raphael Rochet 2012-05-18 15:52:49 UTC
(In reply to comment #5)

Can you confirm that when running 
LANG=de_DE.UTF-8 tracker-needle
there is no error anymore?
Comment 8 Luis Medinas 2012-05-18 16:24:54 UTC
(In reply to comment #7)
> (In reply to comment #5)
> 
> Can you confirm that when running 
> LANG=de_DE.UTF-8 tracker-needle
> there is no more error ?

Same thing here:

(tracker-needle:2400): Tracker-WARNING **: Error initializing libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'

Also, tracker makes gnome-shell and Empathy crash.
Comment 9 Raphael Rochet 2012-05-18 16:31:39 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #5)
> > 
> > Can you confirm that when running 
> > LANG=de_DE.UTF-8 tracker-needle
> > there is no more error ?
> 
> same thing here (tracker-needle:2400): Tracker-WARNING **: Error initializing
> libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'
> 
> also tracker makes gnome-shell and empathy crash.

What does
LANG=de_DE.UTF-8 TRACKER_VERBOSITY=3 tracker-needle | grep TRACKER_LOCALE
give?

Are any other locale variables set to something else? I had to set some LC_* variables too.

Maybe just running
locale
will help show which LC_* variables are incorrect.
Comment 10 Martyn Russell 2012-05-21 10:58:51 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #5)
> > 
> > Can you confirm that when running 
> > LANG=de_DE.UTF-8 tracker-needle
> > there is no more error ?
> 
> same thing here (tracker-needle:2400): Tracker-WARNING **: Error initializing
> libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'
> 
> also tracker makes gnome-shell and empathy crash.

That's quite a bold claim which I would say is unfounded and untrue. Please provide evidence as to how we're making those crash.
Comment 11 Luis Medinas 2012-05-21 18:04:55 UTC
Forget it, Martyn, that was caused by a mistake of mine. Sorry for the trouble, but the warning is still there.
Comment 12 Alexandre Rostovtsev 2012-07-20 12:43:29 UTC
According to comments in the Gentoo bugzilla (https://bugs.gentoo.org/show_bug.cgi?id=426276):

> Bernd Feige 2012-07-20 10:27:46 UTC
> 
> Update: With *all of* LANG=C LC_CTYPE=C LC_NUMERIC=C set, both tracker-store and tracker-search work (though I needed to remove ~/.local/share/tracker/data to get any matches, despite a quite sizable tracker-store.journal...)
> 
> Bernd Feige 2012-07-20 12:37:23 UTC
> 
> Update 2: I think I found the reason for the less-than-overwhelming response now: The problem only occurs with "mixed" LC_* settings such as my own.
> 
> When not touching LC_NUMERIC (i.e. with LC_NUMERIC unset), everything is fine, also with LANG=de_DE.UTF-8.
> 
> Now I'm sure that a relatively recent change caused this; could have either been
> 
> sys-libs/glibc-2.15-r2
> 
> or
> 
> dev-libs/icu-49.1.2
Comment 13 Alexandre Rostovtsev 2012-07-20 15:37:10 UTC
Created attachment 219332 [details] [review]
proposed patch

I think I see what happened.

In the FTS parser, tracker_parser_reset() calls ubrk_open(UBRK_WORD, setlocale (LC_ALL, NULL), ...).

Depending on your locale setup, setlocale(LC_ALL, NULL) as implemented by glibc can easily return a string more than 200 characters long.

ICU uses a fixed-size buffer (welcome to the year 1990!) to process locale strings. The size of this buffer is ULOC_FULLNAME_CAPACITY bytes, where ULOC_FULLNAME_CAPACITY is defined as 157.

In the chain of calls from ubrk_open() (BreakIterator::createInstance → BreakIterator::makeInstance → BreakIterator::buildInstance → ures_open → uloc_getBaseName → _canonicalize → u_terminateChars), an overly long locale definition would overflow a ULOC_FULLNAME_CAPACITY-size buffer; rather than corrupting memory, ICU reports an error.

This in turn causes tracker_parser_reset() to fail.

Ideally, ICU ought to make its buffers bigger. However, since ULOC_FULLNAME_CAPACITY is part of ICU's API and ABI (it's defined in unicode/uloc.h) and is used in over 100 places in the ICU source, this is unlikely to happen in the near future.

Fortunately, tracker can easily work around the problem by calling ubrk_open(UBRK_WORD, setlocale (LC_CTYPE, NULL), ...), since that should be sufficient for detecting word boundaries, and the LC_CTYPE definition is certainly less than 157 characters.
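The workaround can be sketched in C roughly as follows. This is a minimal illustration, not the committed patch: ICU_FULLNAME_CAPACITY is a hypothetical local constant mirroring the 157-byte ULOC_FULLNAME_CAPACITY cited above (so the sketch builds without ICU headers), fits_icu_locale_buffer() and word_break_locale() are hypothetical helper names, and the length check assumes the capacity counts the terminating NUL.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local constant mirroring ICU's ULOC_FULLNAME_CAPACITY (157),
 * so this sketch compiles without the ICU headers. */
#define ICU_FULLNAME_CAPACITY 157

/* Returns nonzero if `name` plus its terminating NUL fits in a buffer of
 * ICU_FULLNAME_CAPACITY bytes (assumes the capacity counts the NUL). */
int fits_icu_locale_buffer(const char *name)
{
    return name != NULL && strlen(name) < ICU_FULLNAME_CAPACITY;
}

/* Returns the locale name to pass to ubrk_open(UBRK_WORD, ...).
 * Querying a single category yields a single locale name such as
 * "de_DE.UTF-8", never the composite "LC_CTYPE=...;LC_NUMERIC=..."
 * string glibc returns from setlocale(LC_ALL, NULL) for mixed settings.
 * The returned pointer may be overwritten by a later setlocale() call,
 * so copy it if it must outlive such a call. */
const char *word_break_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}
```

A caller would then do something like `ubrk_open (UBRK_WORD, word_break_locale (), NULL, 0, &status)` and check `U_FAILURE (status)`. The explicit length check is only an extra guard for illustration; the fix described here relies solely on switching from LC_ALL to LC_CTYPE.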
Comment 14 Alexandre Rostovtsev 2012-07-20 16:03:19 UTC
After thinking about this a bit further: calling ubrk_open(UBRK_WORD, setlocale(LC_ALL, NULL), ...) was not merely bad practice for some complex locale setups, but *conceptually wrong* in the first place.

ubrk_open expects the name of just a single locale (e.g. "en_US.UTF-8"), not the full definition of your various locale variables and their values as returned by glibc's setlocale(LC_ALL, NULL).
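To make the distinction concrete, here is a small C sketch that tells a single locale name apart from glibc's composite string and pulls one category's value out of the latter. It assumes glibc's composite format of `;`-separated `CATEGORY=value` pairs; is_composite_locale() and composite_category() are hypothetical helpers for illustration only. The committed fix avoids any parsing by querying LC_CTYPE directly.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: glibc reports mixed settings from setlocale(LC_ALL, NULL)
 * as "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=de_DE.UTF-8;...", while a single
 * locale name such as "en_US.UTF-8" contains no ';'. */
int is_composite_locale(const char *name)
{
    return strchr(name, ';') != NULL;
}

/* Hypothetical helper: copy the value of `category` (e.g. "LC_CTYPE")
 * from a composite string into `out`. Returns 0 on success, or -1 if the
 * category is absent or `out` is too small. */
int composite_category(const char *composite, const char *category,
                       char *out, size_t outlen)
{
    size_t catlen = strlen(category);
    const char *p = composite;

    while (*p != '\0') {
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "CATEGORY=" exactly, so "LC_C" does not match "LC_CTYPE". */
        if (len > catlen && strncmp(p, category, catlen) == 0 && p[catlen] == '=') {
            size_t vlen = len - catlen - 1;
            if (vlen + 1 > outlen)
                return -1;
            memcpy(out, p + catlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        if (end == NULL)
            break;
        p = end + 1; /* advance to the next "CATEGORY=value" pair */
    }
    return -1;
}
```

Only the extracted value (or simply the result of setlocale(LC_CTYPE, NULL)) is the kind of single locale name that ubrk_open expects.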
Comment 15 Jürg Billeter 2012-07-21 09:26:32 UTC
I agree that LC_CTYPE sounds like the best choice here. The setlocale manpage is not exactly clear that setlocale (LC_ALL, NULL) may return more than a single locale but it definitely does in some setups.
Comment 16 Jürg Billeter 2012-07-21 12:26:00 UTC
commit 48713ba26af38a15a97fc7ebb0828cd287ef2447
Author: Alexandre Rostovtsev <tetromino@gentoo.org>
Date:   Fri Jul 20 10:46:33 2012 -0400

    libtracker-fts: ICU cannot handle complex locale descriptions
    
    ubrk_open expects the name of just a single locale (e.g. "en_US.UTF-8"),
    not the full definition of your various locale variables and their
    values as returned by glibc's setlocale(LC_ALL, NULL).
    
    Instead, limit ourselves to LC_CTYPE, since after all, that's all we
    need to determine word boundaries.
    
    Fixes GB#675660.
Comment 17 Jürg Billeter 2012-07-30 13:32:36 UTC
*** Bug 676989 has been marked as a duplicate of this bug. ***