GNOME Bugzilla – Bug 440458
non-UTF8 folders are not indexed by beagle
Last modified: 2008-07-10 19:36:41 UTC
This is a follow-up from a blog entry regarding beagle and non-ASCII directories.

To reproduce the bug:
- change your test account to use a non-UTF8 locale (for instance en_US.ISO-8859-1)
- export the G_FILENAME_ENCODING='@locale' environment variable
- create a text file containing a string you can search for easily: foobar.txt containing "I love Beagle", and put it in your home directory
- query beagle for the "I love Beagle" string and check that beagle displays the file
- create a directory containing non-ASCII characters, for instance "Téléchargements". Make sure you create this directory under a non-UTF8 locale, so the directory name won't be UTF-8 on disk.
- move foobar.txt to this directory
- beagle won't find it anymore.

If using a UTF-8 locale, beagle is able to index the file and display it when queried. I'm guessing beagle is not using the Mono equivalent of g_filename_to_utf8 to handle such cases.
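To illustrate why the locale matters in the steps above, here is a minimal Python sketch (not beagle code) showing that the same directory name is stored as different bytes depending on the encoding, and that the ISO-8859-1 form is not valid UTF-8. The helper name `is_valid_utf8` is hypothetical and exists only for this illustration.

```python
# Illustration only: one directory name, two on-disk byte representations.
# "\u00e9" is the character "é"; escapes keep this file ASCII-only.
name = "T\u00e9l\u00e9chargements"

latin1_bytes = name.encode("iso-8859-1")  # what a non-UTF8 (Latin-1) locale would store
utf8_bytes = name.encode("utf-8")         # what a UTF-8 locale would store


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(latin1_bytes, is_valid_utf8(latin1_bytes))
    print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

A program that assumes every filename is UTF-8 would fail to decode the first form, which is consistent with the behaviour reported here.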
Do you see any warnings or anything from the Beagle logs when indexing or searching for this file?
No, I've just redone the tests and there is no warning or exception at all, on either side. And I'm not sure beagled is setting an inotify watcher on those directories.
(In reply to comment #2)
> No, I've just redone the tests and there is no warning/exception at all, on
> either side. And I'm not sure beagled is setting an inotify watcher on those
> directories

Frederic, I noticed your comment in https://qa.mandriva.com/show_bug.cgi?id=38744. I didn't get a chance to test this, and I am extremely sorry for that. But based on your comment in the mdv bug, can I assume this bug somehow got fixed and is resolved?
Unfortunately no, it is still broken. It is really easy to test:
* run in a non-UTF8 locale (en_US.ISO-8859-1 for instance)
* run a query on "foobarutf8"
* in your home, echo "foobarutf8" > foobar.txt
  => you'll see the file appear in the beagle search
* rm -f foobar.txt
then
* mkdir Télé
* echo "foobarutf8" > Télé/foobar.txt
and you'll notice that beagle doesn't show the file.
I can not seem to reproduce it.

$ export LC_ALL=en_US.ISO-8859-1
$ export LANG=en_US.ISO-8859-1
$ BEAGLE_INOTIFY_VERBOSE=1 BEAGLE_EXERCISE_THE_DOG=1 BEAGLE_HOME=index/ beagled --fg --debug --backend Files --indexing-delay 0
...
<log shows up here>

From another terminal,
1. <same export lang and lc_all>
2. Created the files and directories as you said above.
3. Noticed they were picked up by beagle and I was also able to search in the files.
OK, I've run beagle with the options you gave. When I create a non-UTF8 directory inside a monitored directory, I get:

Error: Can't add directory: '/home/a/test_beagl?ÿ"

when test_beaglé is created.

I forgot to say that G_FILENAME_ENCODING is set to '@locale'. It seems beagle is not converting the filename back from UTF-8 to the current locale (using g_filename_from_utf8 or its equivalent in Mono) when creating inotify watches. I have a similar issue if I name a file with non-UTF8 characters.
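The suspected failure can be sketched in Python. This is not beagle's code and does not call GLib; `filename_to_unicode` and `filename_from_unicode` are hypothetical stand-ins for the roles played by g_filename_to_utf8 and g_filename_from_utf8. The sketch assumes the directory was created under ISO-8859-1 and shows that re-encoding the name as UTF-8 produces a byte sequence that names a different, nonexistent path.

```python
# Hypothetical stand-ins for the conversions discussed above (not GLib bindings).
def filename_to_unicode(raw: bytes, fs_encoding: str) -> str:
    """Decode on-disk filename bytes using the filesystem (locale) encoding."""
    return raw.decode(fs_encoding)


def filename_from_unicode(name: str, fs_encoding: str) -> bytes:
    """Encode a Unicode filename back into the filesystem (locale) encoding."""
    return name.encode(fs_encoding)


# Bytes as a Latin-1 locale would store "test_beaglé" on disk (assumption).
on_disk = b"test_beagl\xe9"

# Correct round trip: decode and re-encode with the same locale encoding.
as_text = filename_to_unicode(on_disk, "iso-8859-1")
round_trip = filename_from_unicode(as_text, "iso-8859-1")

# Suspected bug: the name is handed to the native call as UTF-8 instead.
wrong_bytes = filename_from_unicode(as_text, "utf-8")

if __name__ == "__main__":
    print(round_trip == on_disk)   # same path as on disk
    print(wrong_bytes == on_disk)  # a different byte sequence, so a different path
```

If a watch were requested on `wrong_bytes`, the kernel would look for a path that does not exist, which would explain a "Can't add directory" error.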
one thing I forgot : you need to start your bash and your terminal in the wanted locale for mkdir to really create the files in the correct locale :)
(In reply to comment #7)
> one thing I forgot : you need to start your bash and your terminal in the
> wanted locale for mkdir to really create the files in the correct locale :)

How do I "start bash and terminal in the wanted locale"? I am not good with these things. Would just exporting LANG and LC_ALL in the terminal where I run mkdir work?
No, it won't work, because the strings you type are handled by your terminal and bash, which are still running with the previous locale. I suggest you configure a new user with a non-UTF8 locale (it should be easy to switch locale using gdm at login) to do your test.
Hmm... how do I do it with kdm? I tried fiddling with .bashrc and .bash_profile without much success. I will try to find a GNOME live CD sometime, create a user, and reproduce it there. In the meantime, could you set MONO_EXTERNAL_ENCODINGS="iso-8859-1" and retry indexing the non-UTF8 folder?
Created attachment 107872
Use local encoding for all local file operations

Can you try this patch? I did limited testing and it seems to work. You have to set MONO_EXTERNAL_ENCODINGS to the right encoding name (for example "iso-8859-1").

The problem here is mainly that we use some native libc functions (and in-house C functions) for certain file and directory operations, and there was a problem in marshalling and unmarshalling the strings between Mono's internal UTF-16 encoding and the platform encoding. The patch uses a custom marshaller (copied from Mono).

I am not very comfortable with such encoding business, so I have tried to make it work as before when the locale is UTF-8. Only with MONO_EXTERNAL_ENCODINGS set will it use the platform encoding for local filenames. It needs some thorough testing to make sure it works for all kinds of encodings, UTF-8 included.
Let me know how this patch works.
I checked in a modified version of the patch. If MONO_EXTERNAL_ENCODINGS is set, then the platform encoding is used; otherwise UTF-8 is used. I created a new account with iso-8859-1 as the locale and did some testing with it. Beagle seems to be indexing fine now, and new changes to the filesystem are being picked up nicely. I think this should fix this bug; if you find any problem, please reopen or file a new bug. Thanks.
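The selection rule described in this comment can be sketched as follows. This is a Python illustration of the rule only, not the checked-in patch (which is a Mono marshaller); the function names are hypothetical. It assumes the variable holds a single encoding name as used in this thread. My understanding is that Mono accepts a colon-separated list, so the sketch simply takes the first entry and ignores any special values.

```python
import os


def local_filename_encoding(environ=os.environ) -> str:
    """Hypothetical helper: pick the encoding used for local filenames.

    Uses the first entry of MONO_EXTERNAL_ENCODINGS when it is set and
    non-empty; otherwise falls back to UTF-8, as described in the comment.
    """
    value = environ.get("MONO_EXTERNAL_ENCODINGS", "")
    if not value:
        return "utf-8"
    # Simplification (assumption): only the first list entry is considered.
    return value.split(":")[0]


def encode_local_path(path: str, environ=os.environ) -> bytes:
    """Hypothetical helper: bytes to hand to a native file operation."""
    return path.encode(local_filename_encoding(environ))


if __name__ == "__main__":
    env = {"MONO_EXTERNAL_ENCODINGS": "iso-8859-1"}
    print(encode_local_path("T\u00e9l\u00e9", env))  # Latin-1 bytes
    print(encode_local_path("T\u00e9l\u00e9", {}))   # UTF-8 bytes
```

Keeping UTF-8 as the default preserves the previous behaviour for users who never set the variable.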