Bug 675660 – Any search will fail with error about libicu



Summary:	Any search will fail with error about libicu


Status:	RESOLVED FIXED

Product:	tracker
Classification:	Core
Component:	Needle
Version:	0.14.x
Hardware:	Other Linux

Importance:	Normal critical
Target Milestone:	---
Assigned To:	tracker-general
QA Contact:	Jamie McCracken

URL:
Whiteboard:

Duplicates:	676989
Depends on:
Blocks:

Reported:	2012-05-08 09:28 UTC by Raphael Rochet
Modified:	2012-07-30 13:32 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
proposed patch (1.44 KB, patch) 2012-07-20 15:37 UTC, Alexandre Rostovtsev	committed

Description Raphael Rochet 2012-05-08 09:28:38 UTC

Whatever I type in the search box, tracker-needle shows an empty list and prints many errors like this on the console:

(tracker-needle:25809): Tracker-WARNING **: Error initializing libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'

I tried tracker-search but got the same result:

(tracker-search:25833): Tracker-WARNING **: Error initializing libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'

I use Arch Linux, x64, with the latest updates (Tracker 0.14.1).

tracker-stats shows that indexing is done.

Comment 1 Luis Medinas 2012-05-18 14:18:48 UTC

Got this problem too on a Gentoo system.

Comment 2 Raphael Rochet 2012-05-18 14:29:28 UTC

Hi, I have a fix for this.

While many applications treat UTF-8 and utf8 as the same, tracker-needle and tracker-search only accept UTF-8.

Running:
LANG=fr_FR.utf8 tracker-needle
makes the search fail.

But running:
LANG=fr_FR.UTF-8 tracker-needle
makes the search work.

Comment 3 Raphael Rochet 2012-05-18 14:30:19 UTC

Sorry, that's not a 'fix' but rather a workaround ...

Comment 4 Martyn Russell 2012-05-18 15:40:33 UTC

Which locales does locale -a list as supported?

Comment 5 Luis Medinas 2012-05-18 15:43:29 UTC

my case:

locale -a
C
de_DE
de_DE@euro
de_DE.iso88591
de_DE.iso885915@euro
de_DE.utf8
deutsch
en_US
en_US.iso88591
en_US.utf8
german
portuguese
POSIX
pt_PT
pt_PT.iso88591
pt_PT.utf8

Also, I have icu 49.1.1 installed, and I think it is the cause of these problems.

Comment 6 Raphael Rochet 2012-05-18 15:50:25 UTC

$ locale -a
C
en_US
en_US.iso88591
en_US.utf8
français
french
fr_FR
fr_FR@euro
fr_FR.iso88591
fr_FR.iso885915@euro
fr_FR.utf8
POSIX

I'm surprised to see 'utf8' here! That may be why 'utf8' is used instead of 'UTF-8' when I choose my locale from a GUI.

Comment 7 Raphael Rochet 2012-05-18 15:52:49 UTC

(In reply to comment #5)

Can you confirm that when running 
LANG=de_DE.UTF-8 tracker-needle
there is no more error ?

Comment 8 Luis Medinas 2012-05-18 16:24:54 UTC

(In reply to comment #7)
> (In reply to comment #5)
> 
> Can you confirm that when running 
> LANG=de_DE.UTF-8 tracker-needle
> there is no more error ?

same thing here (tracker-needle:2400): Tracker-WARNING **: Error initializing libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'

also tracker makes gnome-shell and empathy crash.

Comment 9 Raphael Rochet 2012-05-18 16:31:39 UTC

(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #5)
> > 
> > Can you confirm that when running 
> > LANG=de_DE.UTF-8 tracker-needle
> > there is no more error ?
> 
> same thing here (tracker-needle:2400): Tracker-WARNING **: Error initializing
> libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'
> 
> also tracker makes gnome-shell and empathy crash.

What does
LANG=de_DE.UTF-8 TRACKER_VERBOSITY=3 tracker-needle | grep TRACKER_LOCALE
give?

Are any locale variables set to something else? I had to add some LC_* variables too.

Running just
locale
may help show which LC_* variables are not correct.

Comment 10 Martyn Russell 2012-05-21 10:58:51 UTC

(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #5)
> > 
> > Can you confirm that when running 
> > LANG=de_DE.UTF-8 tracker-needle
> > there is no more error ?
> 
> same thing here (tracker-needle:2400): Tracker-WARNING **: Error initializing
> libicu support: 'U_ILLEGAL_ARGUMENT_ERROR'
> 
> also tracker makes gnome-shell and empathy crash.

That's quite a bold claim which I would say is unfounded and untrue. Please provide evidence as to how we're making those crash.

Comment 11 Luis Medinas 2012-05-21 18:04:55 UTC

Forget it, Martyn, that was caused by a mistake of mine. Sorry for the trouble, but the warning is still there.

Comment 12 Alexandre Rostovtsev 2012-07-20 12:43:29 UTC

According to comments in the Gentoo bugzilla (https://bugs.gentoo.org/show_bug.cgi?id=426276):

> Bernd Feige 2012-07-20 10:27:46 UTC
> 
> Update: With *all of* LANG=C LC_CTYPE=C LC_NUMERIC=C set, both tracker-store and tracker-search work (I needed to remove ~/.local/share/tracker/data though to get any matches, despite a quite sizable tracker-store.journal...)
> 
> Bernd Feige 2012-07-20 12:37:23 UTC
> 
> Update 2: I think I found the reason for the less-than-overwhelming response now: The problem only occurs with "mixed" LC_* settings such as my own.
> 
> When not touching LC_NUMERIC (i.e. unset LC_NUMERIC), everything is fine, even with LANG=de_DE.UTF-8.
> 
> Now I'm sure that a relatively recent change caused this; could have either been
> 
> sys-libs/glibc-2.15-r2
> 
> or
> 
> dev-libs/icu-49.1.2

Comment 13 Alexandre Rostovtsev 2012-07-20 15:37:10 UTC

Created attachment 219332
proposed patch

I think I see what happened.

In the FTS parser, tracker_parser_reset() calls ubrk_open(UBRK_WORD, setlocale (LC_ALL, NULL), ...).

Depending on your locale setup, setlocale(LC_ALL, NULL) as implemented by glibc can easily return a string more than 200 characters long.

ICU uses a fixed-size buffer (welcome to the year 1990!) to process locale strings. The size of this buffer is ULOC_FULLNAME_CAPACITY bytes, where ULOC_FULLNAME_CAPACITY is defined as 157.

In the chain of calls from ubrk_open() (BreakIterator::createInstance → BreakIterator::makeInstance → BreakIterator::buildInstance → ures_open → uloc_getBaseName → _canonicalize → u_terminateChars), an overly long locale definition does not fit in ICU's ULOC_FULLNAME_CAPACITY-sized buffer. Rather than writing past the buffer and corrupting memory, ICU reports an error.

Which in turn causes tracker_parser_reset() to fail.
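
For illustration only, here is a toy model of that failure mode. It is not ICU's code: copy_locale_bounded and EXAMPLE_COMPOSITE_LOCALE are hypothetical names, the capacity of 157 is the ULOC_FULLNAME_CAPACITY value quoted above, and the sample string is hand-written in the style of what glibc returns for mixed LC_* settings. The sketch shows how a bounded copy can refuse an over-long locale string instead of writing past the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed value: mirrors ULOC_FULLNAME_CAPACITY as quoted in this report. */
#define LOCALE_BUFFER_CAPACITY 157

/* Hand-written example of a composite locale string (mixed LC_* settings). */
static const char EXAMPLE_COMPOSITE_LOCALE[] =
    "LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;"
    "LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;"
    "LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=de_DE.UTF-8;"
    "LC_ADDRESS=de_DE.UTF-8;LC_TELEPHONE=de_DE.UTF-8;"
    "LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=de_DE.UTF-8";

/*
 * Hypothetical helper: copy src into dst (capacity cap, including the
 * terminating NUL). Returns 0 on success. If src does not fit, dst is set
 * to the empty string and -1 is returned; nothing is written past dst.
 */
static int copy_locale_bounded(const char *src, char *dst, size_t cap)
{
    size_t len;

    assert(src != NULL && dst != NULL && cap > 0);

    len = strlen(src);
    if (len + 1 > cap) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```

With a 157-byte buffer, a single name such as "de_DE.UTF-8" is copied without trouble, while the composite example above is rejected.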

Ideally, ICU ought to make its buffers bigger. However, since ULOC_FULLNAME_CAPACITY is a part of ICU's API and ABI (it's defined in unicode/uloc.h) and is used in over 100 places in the ICU source, this is unlikely to happen in the near future.

Fortunately, tracker can easily work around the problem by calling ubrk_open(UBRK_WORD, setlocale (LC_CTYPE, NULL), ...), since that should be sufficient for detecting word boundaries, and the LC_CTYPE definition is certainly less than 157 characters.
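
A minimal sketch of the tracker-side idea, assuming only standard C: ask for the LC_CTYPE name and copy it, since the string returned by setlocale() may be overwritten by a later setlocale() call. get_ctype_locale_name is a hypothetical helper, not the function in the attached patch, and the ubrk_open() call appears only in a comment so that the example builds without ICU:

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: store the current LC_CTYPE locale name in out
 * (capacity cap, including the terminating NUL).
 * Returns 0 on success, -1 if the name is unavailable or does not fit.
 */
static int get_ctype_locale_name(char *out, size_t cap)
{
    const char *name;
    int written;

    assert(out != NULL && cap > 0);

    /* Query only; passing NULL does not change the locale. */
    name = setlocale(LC_CTYPE, NULL);
    if (name == NULL) {
        out[0] = '\0';
        return -1;
    }

    /* snprintf never writes more than cap bytes and always terminates. */
    written = snprintf(out, cap, "%s", name);
    if (written < 0 || (size_t) written >= cap) {
        out[0] = '\0';
        return -1;
    }

    /*
     * The copied name is what would then be handed to ICU, e.g.:
     *   ubrk_open(UBRK_WORD, out, text, text_length, &status);
     */
    return 0;
}
```

A caller could size the buffer to ICU's limit and treat a -1 result as a reason to fall back to a default locale; that fallback policy is a design choice left open here.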

Comment 14 Alexandre Rostovtsev 2012-07-20 16:03:19 UTC

After thinking about this a bit further: calling ubrk_open(UBRK_WORD, setlocale(LC_ALL, NULL), ...) was not merely bad practice for some complex locale setups, but *conceptually wrong* in the first place.

ubrk_open expects the name of just a single locale (e.g. "en_US.UTF-8"), not the full definition of your various locale variables and their values as returned by glibc's setlocale(LC_ALL, NULL).
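
To make the difference concrete, the following sketch pulls a single category's value out of a composite string of the form LC_CTYPE=...;LC_NUMERIC=.... This is an illustration under stated assumptions: the NAME=value; layout is what glibc produces in practice for mixed settings and is not a documented interface, locale_category_value is a hypothetical helper, and the proposed patch does no parsing at all (it simply asks setlocale() for LC_CTYPE).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find the value of one category (e.g. "LC_CTYPE")
 * in a locale string and store it in out (capacity cap).
 * A plain name such as "en_US.UTF-8" is taken to apply to every category.
 * Returns 0 on success, -1 if the category is missing or does not fit.
 */
static int locale_category_value(const char *locale, const char *category,
                                 char *out, size_t cap)
{
    const char *p, *end;
    size_t cat_len, val_len;

    assert(locale != NULL && category != NULL && out != NULL && cap > 0);
    out[0] = '\0';

    /* No '=' means this is a single locale name, not a composite. */
    if (strchr(locale, '=') == NULL) {
        val_len = strlen(locale);
        if (val_len + 1 > cap)
            return -1;
        memcpy(out, locale, val_len + 1);
        return 0;
    }

    /* Walk the ';'-separated NAME=value entries. */
    cat_len = strlen(category);
    for (p = locale; p != NULL && *p != '\0'; ) {
        end = strchr(p, ';');
        if (strncmp(p, category, cat_len) == 0 && p[cat_len] == '=') {
            const char *val = p + cat_len + 1;
            val_len = end != NULL ? (size_t) (end - val) : strlen(val);
            if (val_len + 1 > cap)
                return -1;
            memcpy(out, val, val_len);
            out[val_len] = '\0';
            return 0;
        }
        p = end != NULL ? end + 1 : NULL;
    }
    return -1;
}
```

For "LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8", asking for LC_CTYPE yields the single name "de_DE.UTF-8", which is the kind of argument ubrk_open expects.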

Comment 15 Jürg Billeter 2012-07-21 09:26:32 UTC

I agree that LC_CTYPE sounds like the best choice here. The setlocale manpage is not exactly clear that setlocale (LC_ALL, NULL) may return more than a single locale, but it definitely does in some setups.

Comment 16 Jürg Billeter 2012-07-21 12:26:00 UTC

commit 48713ba26af38a15a97fc7ebb0828cd287ef2447
Author: Alexandre Rostovtsev <tetromino@gentoo.org>
Date:   Fri Jul 20 10:46:33 2012 -0400

    libtracker-fts: ICU cannot handle complex locale descriptions
    
    ubrk_open expects the name of just a single locale (e.g. "en_US.UTF-8"),
    not the full definition of your various locale variables and their
    values as returned by glibc's setlocale(LC_ALL, NULL).
    
    Instead, limit ourselves to LC_CTYPE, since after all, that's all we
    need to determine word boundaries.
    
    Fixes GB#675660.

Comment 17 Jürg Billeter 2012-07-30 13:32:36 UTC

*** Bug 676989 has been marked as a duplicate of this bug. ***