Bug 440458 - non-UTF8 folders are not indexed by beagle
Status: RESOLVED FIXED
Product: beagle
Classification: Other
Component: General
Version: 0.2.17
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Beagle Bugs
QA Contact: Beagle Bugs
Depends on:
Blocks:
 
 
Reported: 2007-05-22 14:25 UTC by Frederic Crozat
Modified: 2008-07-10 19:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Use local encoding for all local file operations (9.28 KB, patch)
2008-03-23 15:06 UTC, Debajyoti Bera

Description Frederic Crozat 2007-05-22 14:25:32 UTC
This is a follow-up to a blog entry regarding beagle and non-ASCII directory names.

To reproduce the bug:
- change your test account to use a non-UTF8 locale (for instance en_US.ISO-8859-1)
- export the G_FILENAME_ENCODING='@locale' environment variable
- create a text file containing a string that is easy to search for: foobar.txt containing "I love Beagle", and put it in the home directory
- query beagle for the "I love Beagle" string and check that the file is displayed by beagle
- create a directory whose name contains non-ASCII characters, for instance "Téléchargements". Make sure you create this directory under a non-UTF8 locale, so the directory name won't be UTF-8 on disk.
- move the foobar.txt file to this directory
- beagle won't find it anymore.

When using a UTF-8 locale, beagle is able to index the file and display it when queried.

I'm guessing beagle is not using the Mono equivalent of g_filename_to_utf8 to handle such cases.
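
To illustrate why the two locales behave differently, here is a minimal Python sketch (Python is used only for illustration; Beagle itself runs on Mono, and the helper name decodes_as_utf8 is made up for this example). It takes the directory name from the steps above and shows that the bytes written to disk under an ISO-8859-1 locale are not a valid UTF-8 sequence, so code that assumes UTF-8 names needs a conversion step such as g_filename_to_utf8.

```python
# "Téléchargements", written with escapes so the source file encoding does not matter.
name = "T\u00e9l\u00e9chargements"

# Bytes stored on disk when the directory is created under each locale.
latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes)                   # each "é" is the single byte 0xE9
print(decodes_as_utf8(utf8_bytes))    # True
print(decodes_as_utf8(latin1_bytes))  # False
```
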
Comment 1 Joe Shaw 2007-05-22 14:49:10 UTC
Do you see any warnings or anything from the Beagle logs when indexing or searching for this file?
Comment 2 Frederic Crozat 2007-05-22 16:04:50 UTC
No, I've just redone the tests and there is no warning or exception at all, on either side. And I'm not sure beagled is setting an inotify watch on those directories.
Comment 3 Debajyoti Bera 2008-03-19 01:02:40 UTC
(In reply to comment #2)
> No, I've just redone the tests and there is no warning/exception at all, on
> both side. And I'm not sure beagled is setting an inotify watcher on those
> directories

Frederic, I noticed your comment in https://qa.mandriva.com/show_bug.cgi?id=38744

I didn't get a chance to test this; I am extremely sorry for that. But based on your comment in the mdv bug, can I assume this bug somehow got fixed and is resolved?
Comment 4 Frederic Crozat 2008-03-19 11:04:27 UTC
Unfortunately no, it is still broken.

Testing it is really easy:
* run in a non-UTF8 locale (en_US.ISO-8859-1 for instance)
* run a query on "foobarutf8"
* in your home, echo "foobarutf8" > foobar.txt
=> you'll see the file appear in the beagle search
* rm -f foobar.txt
then
* mkdir Télé
* echo "foobarutf8" > Télé/foobar.txt

and you'll notice that beagle doesn't show the file.
Comment 5 Debajyoti Bera 2008-03-19 11:25:11 UTC
I cannot seem to reproduce it.
$ export LC_ALL=en_US.ISO-8859-1
$ export LANG=en_US.ISO-8859-1
$ BEAGLE_INOTIFY_VERBOSE=1 BEAGLE_EXERCISE_THE_DOG=1 BEAGLE_HOME=index/ beagled --fg --debug --backend Files --indexing-delay 0
... <log shows up here>

From another terminal,
1. <same export lang and lc_all>
2. Created the files and directories as you said above.
3. Noticed they were picked up by beagle and I was also able to search in the files.
Comment 6 Frederic Crozat 2008-03-19 14:01:02 UTC
OK, I've run beagle with the options you gave:

When I create a non-UTF8 directory inside a monitored directory, I get:

Error: Can't add directory: '/home/a/test_beagl?ÿ" 
when test_beaglé is created

I forgot to say that G_FILENAME_ENCODING is set to '@locale'.

It seems beagle is not converting the filename back from UTF-8 to the current locale encoding (using g_filename_from_utf8 or an equivalent in Mono) when creating inotify watches.

I have a similar issue if I name a file with non-UTF8 characters.
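
The missing conversion can be sketched roughly as follows, in Python for illustration only. The two helpers are hypothetical stand-ins that mimic the idea behind g_filename_from_utf8 and g_filename_to_utf8; they are not GLib's or Beagle's actual code, and the ISO-8859-1 encoding is assumed from the locale used in this report.

```python
def filename_from_utf8(utf8_name: bytes, fs_encoding: str) -> bytes:
    """Convert a UTF-8 name to the bytes used on disk (hypothetical helper)."""
    return utf8_name.decode("utf-8").encode(fs_encoding)


def filename_to_utf8(disk_name: bytes, fs_encoding: str) -> bytes:
    """Convert on-disk bytes to UTF-8 (hypothetical helper)."""
    return disk_name.decode(fs_encoding).encode("utf-8")


# Name as created by mkdir under an ISO-8859-1 locale ("é" is the byte 0xE9).
on_disk = b"/home/a/test_beagl\xe9"

# The same path as an application would hold it internally, in UTF-8.
internal = filename_to_utf8(on_disk, "iso-8859-1")

# Handing the UTF-8 bytes straight to the kernel names a different path...
print(internal == on_disk)                                    # False
# ...while converting back first yields the bytes that are really on disk.
print(filename_from_utf8(internal, "iso-8859-1") == on_disk)  # True
```

A watch requested with the unconverted UTF-8 bytes would refer to a path that does not exist on disk, which would be consistent with the "Can't add directory" error above.
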
Comment 7 Frederic Crozat 2008-03-19 14:01:50 UTC
One thing I forgot: you need to start your bash and your terminal in the wanted locale for mkdir to really create the files in the correct encoding :)
Comment 8 Debajyoti Bera 2008-03-19 18:40:50 UTC
(In reply to comment #7)
> one thing I forgot : you need to start your bash and your terminal in the
> wanted locale for mkdir to really create the files in the correct locale :)

How do I "start bash and terminal in the wanted locale"? I am not good with these things. Would just exporting LANG and LC_ALL in the terminal where I run mkdir work?
Comment 9 Frederic Crozat 2008-03-20 08:44:36 UTC
No, it won't work, because the strings you type are handled by your terminal and bash, which are still running with the previous locale.

I suggest you configure a new user with a non-UTF8 locale (it should be easy to switch locale using gdm at login) to do your test.
Comment 10 Debajyoti Bera 2008-03-23 00:25:44 UTC
Hmm... how do I do it with kdm? I tried fiddling with .bashrc and .bash_profile without much success. I will try to find a GNOME live CD sometime and try to create a user and reproduce it there.

In the meantime, could you try setting MONO_EXTERNAL_ENCODINGS="iso-8859-1" and retry indexing the non-UTF8 folder?
Comment 11 Debajyoti Bera 2008-03-23 15:06:43 UTC
Created attachment 107872
Use local encoding for all local file operations

Can you try this patch? I did some limited testing and it seems to work. You have to set MONO_EXTERNAL_ENCODINGS to the right encoding name (for example "iso-8859-1").

The problem here is mainly that we use some native functions (libc and in-house C functions) for certain file and directory operations, and there was a problem in marshalling and unmarshalling the strings between Mono's internal UTF-16 encoding and the platform encoding. The patch uses a custom marshaller (copied from Mono).

I am not very comfortable with this encoding business, so I have tried to make it work as before when the locale is UTF-8. Only with MONO_EXTERNAL_ENCODINGS set will it use the platform encoding for local filenames. It needs thorough testing to make sure it works for all kinds of encodings, UTF-8 included.
Comment 12 Debajyoti Bera 2008-03-25 14:58:27 UTC
Let me know how this patch works.
Comment 13 Debajyoti Bera 2008-07-10 19:36:41 UTC
I checked in a modified version of the patch. If MONO_EXTERNAL_ENCODINGS is set, the platform encoding is used; otherwise UTF-8 is used. I created a new account with iso-8859-1 as the locale and did some testing with it. Beagle seems to be indexing fine now, and new changes to the filesystem are being picked up nicely. I think this should fix this bug; if you find any problem, please reopen or file a new bug.
Thanks.
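
The selection rule described in this comment can be sketched as below. This is an illustrative Python approximation, not the checked-in Mono code: the function names are hypothetical, and it assumes the variable holds a single encoding name such as "iso-8859-1", as in the example earlier in this report.

```python
from typing import Mapping


def filename_encoding(env: Mapping[str, str]) -> str:
    """Pick the encoding for local file names (illustrative, not Beagle's code)."""
    # Assumption: the variable, when set, names exactly one encoding.
    value = env.get("MONO_EXTERNAL_ENCODINGS", "").strip()
    return value if value else "utf-8"


def encode_filename(name: str, env: Mapping[str, str]) -> bytes:
    """Encode a file name to the bytes passed to native file operations."""
    return name.encode(filename_encoding(env))


folder = "T\u00e9l\u00e9"  # "Télé"

# Variable unset: names are encoded as UTF-8, as before the patch.
print(encode_filename(folder, {}))
# Variable set: names are encoded in the platform encoding.
print(encode_filename(folder, {"MONO_EXTERNAL_ENCODINGS": "iso-8859-1"}))
```
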