Bug 354161 - Not All Files Found: Too Large Music Directory to Index?
Status: RESOLVED FIXED
Product: beagle
Classification: Other
Component: General
Version: 0.2.6
Hardware: Other
OS: All
Importance: Normal, major
Target Milestone: ---
Assigned To: Beagle Bugs
QA Contact: Beagle Bugs
Duplicates: 341841
Depends on:
Blocks:
Reported: 2006-09-03 19:11 UTC by Kent Borg
Modified: 2006-09-10 01:36 UTC
See Also:
GNOME target: ---
GNOME version: ---

Attachments
Log files from debug run (270.00 KB, application/x-compressed-tar), 2006-09-05 22:01 UTC, Kent Borg
Test program (543 bytes, text/plain), 2006-09-06 19:44 UTC, Joe Shaw
Output from test-directorywalker.exe (13.69 KB, text/plain), 2006-09-06 20:29 UTC, Kent Borg
complete list of music files (145.03 KB, text/plain), 2006-09-06 20:37 UTC, Kent Borg
Updated test program, this time precompiled (8.50 KB, application/octet-stream), 2006-09-06 21:18 UTC, Joe Shaw
Try this one, precompiled (8.50 KB, application/octet-stream), 2006-09-06 21:27 UTC, Joe Shaw
Output from latest test-directorywalker.exe (74.00 KB, text/plain), 2006-09-06 21:32 UTC, Kent Borg
Hopefully a test program which fixes it (8.50 KB, application/octet-stream), 2006-09-06 21:53 UTC, Joe Shaw
Output from latest test-directorywalker.exe (154.49 KB, text/plain), 2006-09-06 21:57 UTC, Kent Borg

Description Kent Borg 2006-09-03 19:11:49 UTC
Please describe the problem:
I have a large flat directory (3220 .mp3 files), and it seems not all of it is indexed.  Is this a known limit?  (Possibly fixed upstream of my distro, Ubuntu 6.06?)

For example, I ask beagle for "Beatles" (using beagle-query or the GUI) and I get 13 songs (plus two Beatles mentions in PDFs and one item that appears to be contents from a web page in the Firefox cache, 16 hits total).  I run "locate Beatles" and I get 71 songs (only songs).  Interestingly, the first 13 hits from "locate" are the same 13 songs beagle-query finds (though in a different order).  Spot checking the extra songs found by "locate" confirms they do currently exist on my disk.  This happens with other music in that directory.  I don't think there has been any change in the contents of that large directory since before I installed beagle.  The directory was created by gtkpod.

An example file name that beagle does not find: 

  /home/kentborg/gtkpodmusic/The Beatles - Wild Honey Pie.mp3

An example that it does find: 

  /home/kentborg/gtkpodmusic/The Beatles - Words Of Love.mp3

I don't know if things other than music are not being indexed.

beagle-status says: 

  Scheduler:
  Count: 20888
  Status: Waiting on empty queue
  
  Pending Tasks:
  Scheduler queue is empty.

Steps to reproduce:
Every song query seems to reproduce the problem, haven't tried a reindex from scratch.  (What are the correct steps for that?  Any valuable experiments I should try first?)

Actual results:


Expected results:


Does this happen every time?


Other information:
"beagled -fg" reports version 0.2.6, this is on a notebook, running Ubuntu 6.06.  The initial indexing ran overnight and got things hot, upset the battery (blinking the charge LED, which I had not seen before), but the OS did not crash.
Comment 1 Kevin Kubasik 2006-09-04 22:36:53 UTC
Just thinking aloud here, could we use the static indexer to try and diagnose this elusive problem?
Comment 2 Debajyoti Bera 2006-09-05 16:03:15 UTC
This problem has been reported before in #341841. I tried to reproduce it with lots of mp3 files in one directory but wasn't able to. It looks like something other than just lots_of_files is playing nasty here: maybe something in the names of the files, or xdgmime, or something even worse.

Some information which might be helpful:
* Is it reproducible every time? If so, it's probably not an xdgmime issue.
* If that directory is added as an indexing root, does the problem still happen? If not, it's probably a directory traversal bug.
* Maybe keep removing half of the files from the directory to see if any particular file is causing this.
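The halving approach in the last bullet is a binary search over the set of files. Below is a minimal Python sketch of it. It assumes exactly one file triggers the bug, and that any subset containing that file reproduces the problem while no subset without it does. `find_culprit` and the `reproduces` callback are hypothetical names. In practice, `reproduces` would copy the subset into a fresh directory, index it in a sandbox, and check whether files go missing.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(items: Sequence[T], reproduces: Callable[[List[T]], bool]) -> T:
    """Halve the candidate set until a single item is left.

    Assumes exactly one item triggers the bug, and that a subset reproduces
    the problem if and only if it contains that item.
    """
    candidates = list(items)
    if not candidates or not reproduces(candidates):
        raise ValueError("the full set does not reproduce the bug")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still shows the bug; under the single-culprit
        # assumption, if the first half is clean the culprit is in the second.
        candidates = first_half if reproduces(first_half) else candidates[mid:]
    return candidates[0]
```

With 3220 files this takes at most 12 halving rounds plus the initial check, so roughly a dozen reindexing runs rather than thousands.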
Comment 3 Joe Shaw 2006-09-05 18:05:14 UTC
(In reply to comment #1)
> Just thinking aloud here, could we use the static indexer to try and diagnose
> this elusive problem?

Static indexer might help, but it's a different code path.  It's probably just as easy to set up a sandbox and test it using the daemon, like so:

BEAGLE_HOME=/tmp/sandbox BEAGLE_EXERCISE_THE_DOG=1 beagled --debug --fg --allow-backend files
Comment 4 Kent Borg 2006-09-05 19:02:51 UTC
I am not well experienced with beagle, so please be patient with me...

Not wanting to destroy a captive instance of a possibly elusive bug, I did the following:

 - create a new user
 - "cp -a" my music directory to the new user's home
 - fire up "beagled --fg"
 - wait
 - "beagle-query Beatles"

Result: Same results.   Identical files in identical order.  Only 13 hits.   

Conclusion: I can reproduce this bug.

Another clue: The 13 files returned are the same as the first 13 files returned by locate.  I think there is a file that beagled somehow barfed on, someplace between the 13th and 14th Beatles song.  

Note, these files do not have traditional computer names.  They came from loading CDs into iTunes on a Mac, syncing with an iPod, and using gtkpod to suck those songs into my Ubuntu notebook (BTW, gtkpod is getting confused on these files too).  The file names include: spaces, apostrophes, ampersands, dashes, parentheses, exclamation points, underscores, multiple periods, upper and lowercase letters, digits, long names (e.g., 230 characters long for one), diacriticals (e-acute, e-grave, o-diaeresis, e-diaeresis, E-dot, i-diaeresis, o-acute, n-tilde, i-acute), sexed (curly) single close quotes, cross-hash pound signs...and others I am sure I missed in my survey (scrolling through the directory in emacs).

How well tested is beagle against in-band data in names being interpreted as delimiters?  I am guessing some name is a problem.

Any suggestions for what experiments I might try next?  (Please be explicit, I am not a beagle expert.)

Thanks,

-kb
Comment 5 Joe Shaw 2006-09-05 19:21:05 UTC
Kent: Filenames should not be a problem, with perhaps the exception of the diacritics if the filename isn't UTF-8.  But it probably is if you're seeing the names correctly and not garbage.

You might want to try it again in your second home directory; nuke the ~/.beagle directory and rerun it; make sure you pass in --debug to the command-line.

It would be helpful if you could tar up your ~/.beagle/Log directory and attach it to the bug; that would tell us if files are being detected incorrectly, if there was an error that caused indexing to stop, etc.
Comment 6 Debajyoti Bera 2006-09-05 19:51:54 UTC
(In reply to comment #4)
> Note, these files do not have traditional computer names.  They came from
> loading CDs into Itunes on a Mac, syncing with an Ipod, and using gtkpod to
> suck those songs into my Ubuntu notebook (BTW, gtkpod is getting confused on
> these files too).  The files names include: spaces, apostrophies, ampersands,
> dashes, parentheses, exclamation points, underscores, multiple periods, upper
> and lowercase letters, digits, long names (e.g., 230 characters long for one),
> diacriticals (e-acute, e-grave, o-diaeresis, e-diaeresis, E-dot, i-diaeresis,
> o-acute, n-tilda, i-acute), sexed single close quotes, cross-hash pound
> signs...and others I am sure I missed in my survey (scrolling though the
> directory in emacs).
> 
> How well tested is beagle against inband data in names being interpreted as
> delimiters?  I am guessing some name is a problem.

Now and then name problems do show up, but nothing this severe. As Joe said, Beagle generally handles odd names well enough. One way to test for a name problem would be to create 3220 copies of one small, known-good mp3 file, one under each of the names in that directory. Write the output of "ls -1" to a file, then run a script that reads each name from that file and copies the fixed mp3 to that name.
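The script described above could look like the following minimal Python sketch. It assumes the name list came from `ls -1` redirected to a file. That output has one raw name per line, so a name containing a newline would be split, which is a limitation of the approach. `make_dummy_copies` is a hypothetical helper, not part of beagle. Reading with `surrogateescape` lets names that are not valid UTF-8 round-trip to the same bytes on disk.

```python
import shutil
from pathlib import Path


def make_dummy_copies(names_file: Path, sample_mp3: Path, dest_dir: Path) -> int:
    """Copy one known-good mp3 to every name listed (one per line) in names_file.

    names_file is assumed to be `ls -1` output from the problem directory.
    Returns the number of files created in dest_dir.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    # surrogateescape keeps undecodable bytes intact, so non-UTF-8 names
    # are recreated with exactly the same bytes.
    with names_file.open(encoding="utf-8", errors="surrogateescape") as fh:
        for line in fh:
            name = line.rstrip("\n")
            if not name:
                continue  # skip blank lines
            shutil.copyfile(sample_mp3, dest_dir / name)
            count += 1
    return count
```

Indexing `dest_dir` then separates the two suspects. If the copies are also only partly indexed, the names (not the mp3 contents) are implicated.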

Whatever you do, the log files would be very helpful. Don't let the instance run away.
Comment 7 Kent Borg 2006-09-05 21:56:26 UTC
(In reply to comment #5)
> You might want to try it again in your second home directory; nuke the
> ~/.beagle directory and rerun it; make sure you pass in --debug to the
> command-line.

OK.  The new indexing still returns the incomplete 13 Beatles songs.

I attach the logs.  You will note the user has a Google Earth installation from a few months ago, but I don't think it interfered, there is no other Google Earth installation on this machine.
Comment 8 Kent Borg 2006-09-05 22:01:40 UTC
Created attachment 72279 [details]
Log files from debug run
Comment 9 Joe Shaw 2006-09-06 19:30:33 UTC
Weird!  There are basically no errors here; it seems as though it's just not seeing some of the files.

I am going to whip up a test program for you to try to see if the bug is in our DirectoryWalker code.
Comment 10 Joe Shaw 2006-09-06 19:44:07 UTC
Created attachment 72334 [details]
Test program

Compile the program like so:

mcs -debug test-directorywalker.cs -r:Util.dll

(You might have to specify a full path for it; it comes from beagle, so it'd be something like -r:/usr/lib/beagle/Util.dll)

Then to run it you'll need something like:

LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono test-directorywalker.exe

Run it in your directory with the thousands of mp3 files.  It should output them one by one and at the end say how many files there were.  Compare this with the output of:

find . -maxdepth 1 -type f | wc -l

They should be roughly equal.  If not, then we have a problem.
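When the counts differ, knowing *which* files were skipped is more useful than the totals. This minimal Python sketch diffs the two listings. It assumes each listing has one path per line, which holds for the `find` output. The test program's exact output format is an assumption, and lines with extra prefixes would need filtering first. Paths are reduced to their basename, so `./song.mp3` and a full path to the same file compare equal. `missing_files` is a hypothetical helper.

```python
import os


def missing_files(find_lines, walker_lines):
    """Return names listed by `find` that the directory walker never reported.

    Both arguments are iterables of lines (e.g. open file objects), one path
    per line; blank lines are ignored and only basenames are compared.
    """
    def names(lines):
        return {os.path.basename(line.rstrip("\n")) for line in lines if line.strip()}

    return sorted(names(find_lines) - names(walker_lines))
```

For example, with both outputs saved to files, `missing_files(open("find.txt"), open("walker.txt"))` lists the skipped names. The file names here are hypothetical.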
Comment 11 Kent Borg 2006-09-06 20:14:23 UTC
(In reply to comment #10)
> Compile the program like so:
> 
> mcs -debug test-directorywalker.cs -r:Util.dll

I don't see an "mcs" on my machine.  So I cast about in Synaptic and install mono-mcs package.

Now run "$ mcs -debug test-directorywalker.cs -r:/usr/lib/beagle/Util.dll" and that works.

So I think you want to see this (or, I guess maybe you don't want to see this):

  google-earth-user@bottom:~$ cd gtkpodmusic/     
  google-earth-user@bottom:~/gtkpodmusic$ find . -maxdepth 1 -type f | wc -l 
  3220
  google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe | wc -l
  320
  google-earth-user@bottom:~/gtkpodmusic$

Let me know if I did it wrong.

-kb
Comment 12 Kevin Kubasik 2006-09-06 20:24:48 UTC
Could you take this line, which gave you

  google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe | wc -l
  320

and run it like this instead

  google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe

so we can see the output of the test program? On a separate note, that would mean our code is catching only about 1/10th of the files in that directory....
Comment 13 Kent Borg 2006-09-06 20:28:15 UTC
So I try this:

  google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe > /tmp/318_music_list.txt

And I will attach the output...


-kb
Comment 14 Kent Borg 2006-09-06 20:29:51 UTC
Created attachment 72337 [details]
Output from test-directorywalker.exe
Comment 15 Joe Shaw 2006-09-06 20:33:28 UTC
You did it absolutely right, and there must be a bug in our directorywalker code.

Can you also attach the output of the find command, sans the "| wc -l" ?
Comment 16 Kent Borg 2006-09-06 20:37:54 UTC
Created attachment 72340 [details]
complete list of music files

google-earth-user@bottom:~/gtkpodmusic$ find . -maxdepth 1 -type f > /tmp/3220_music_list.txt
Comment 17 Joe Shaw 2006-09-06 21:18:02 UTC
Created attachment 72344 [details]
Updated test program, this time precompiled

I've attached a precompiled test program, so you don't need to bother with the mcs step.

Can you please run it and attach the output?  Also the results of piping it through "grep ^got | wc -l" would be helpful.
Comment 18 Kent Borg 2006-09-06 21:23:54 UTC
google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe

Unhandled Exception: System.IO.FileNotFoundException: No such file or directory ----> Mono.Unix.UnixIOException: No such file or directory
in <0x00013> Mono.Unix.UnixMarshal:ThrowExceptionForLastError ()
in <0x00063> Beagle.Util.DirectoryWalker2:readdir (IntPtr dir, System.Text.StringBuilder buffer)
in <0x0002f> Beagle.Util.DirectoryWalker2+FileEnumerator:MoveNext ()
in <0x000ec> X:Main ()


Did I run it the wrong way?  (Did you build it right for me to use?)

-kb


Comment 19 Joe Shaw 2006-09-06 21:26:42 UTC
It's supposed to do that, but it should output more data too.  Lemme update the test program just in case.
Comment 20 Joe Shaw 2006-09-06 21:27:20 UTC
Created attachment 72348 [details]
Try this one, precompiled
Comment 21 Kent Borg 2006-09-06 21:32:07 UTC
Created attachment 72350 [details]
Output from latest test-directorywalker.exe

I don't know what the output means, but it is very interesting--I bet you have it cornered once you see this.  

-kb, the Kent who can't spell "output".


google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe > /tmp/test-directory-outout.txt

Unhandled Exception: System.IO.FileNotFoundException: No such file or directory ----> Mono.Unix.UnixIOException: No such file or directory
in <0x00013> Mono.Unix.UnixMarshal:ThrowExceptionForLastError ()
in <0x0007b> Beagle.Util.DirectoryWalker2:readdir (IntPtr dir, System.Text.StringBuilder buffer)
in <0x0002f> Beagle.Util.DirectoryWalker2+FileEnumerator:MoveNext ()
in <0x000ec> X:Main ()
Comment 22 Joe Shaw 2006-09-06 21:42:57 UTC
Yeah, that is very, very interesting.  Thanks for the info, I'll dig into it.
Comment 23 Joe Shaw 2006-09-06 21:53:59 UTC
Created attachment 72352 [details]
Hopefully a test program which fixes it

Can you try this one?  Also precompiled.
Comment 24 Kent Borg 2006-09-06 21:57:47 UTC
Created attachment 72353 [details]
Output from latest test-directorywalker.exe

Sounds like you have the fix.  'fess up, what was wrong?

-kb

google-earth-user@bottom:~/gtkpodmusic$ LD_LIBRARY_PATH=/usr/lib/beagle:$LD_LIBRARY_PATH MONO_PATH=/usr/lib/beagle mono ../test-directorywalker.exe > /tmp/test-directory-output2.txt
Comment 25 Joe Shaw 2006-09-06 22:06:37 UTC
Apparently mono was transparently resizing the internal buffer we were using to get the file names to something pretty small.  I'm not entirely sure why, but adding a call to StringBuilder.EnsureCapacity() before we used it fixed it.

The reason why I never saw it is because I only ever tried it with short filenames: up to 6 characters in length.  You probably would only ever see this if you had some extremely long filenames.
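The failure mode Joe describes can be illustrated with a toy model in Python. This is not Beagle's C# or native code. `FixedBuffer`, `list_entries` and the capacity values are hypothetical. The model assumes only what the thread shows: an entry name that does not fit the buffer ends the enumeration, and the error is swallowed, consistent with the logs showing basically no errors. Every later entry is therefore skipped. Sizing the buffer for the longest possible name up front plays the role that calling StringBuilder.EnsureCapacity() played in the fix. The sketch uses 255 bytes plus a terminator, the common Linux NAME_MAX.

```python
NAME_MAX = 255  # common Linux limit for one file-name component, in bytes


class FixedBuffer:
    """Hypothetical stand-in for a native string buffer with fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.value = ""

    def write(self, name):
        # One extra byte for a C-style NUL terminator.
        if len(name.encode("utf-8")) + 1 > self.capacity:
            raise OSError("entry name does not fit in buffer")
        self.value = name


def list_entries(entries, capacity):
    """Collect names until one does not fit, then stop like a dying enumerator."""
    buf = FixedBuffer(capacity)
    found = []
    try:
        for name in entries:
            buf.write(name)
            found.append(buf.value)
    except OSError:
        pass  # enumeration ends here; later entries are never reported
    return found
```

With entries `["short.mp3", "x" * 230 + ".mp3", "later.mp3"]`, a 64-byte buffer reports only `short.mp3`, while a buffer of `NAME_MAX + 1` bytes reports all three. This matches the observation that short test names never triggered the bug.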

I just checked in the fix to CVS.  Thanks a ton for your help tracking this down!
Comment 26 Joe Shaw 2006-09-06 22:07:59 UTC
*** Bug 341841 has been marked as a duplicate of this bug. ***
Comment 27 Kent Borg 2006-09-06 22:18:23 UTC
> I just checked in the fix to CVS.  Thanks a ton for your help tracking this
> down!

Thanks for the quick fix.

-kb, the Kent who hopes this might be the kind of bug that would arrive in a bug fix in Ubuntu.
Comment 28 Kevin Kubasik 2006-09-07 02:04:52 UTC
Joe: Maybe this should make us do a micro-release (like 0.2.9.1) just for packagers since this is a pretty big showstopper for a lot of people?
Comment 29 Joe Shaw 2006-09-07 03:13:23 UTC
I'd prefer to push through with another few bug fixes and do a 0.2.10 in a week or two.  It's only affected 2 or 3 people (at least, who have reported it) from what I can tell.
Comment 30 Joe Shaw 2006-09-07 03:14:11 UTC
(btw, the reason being that a 0.2.9.1 release isn't really any less work than a 0.2.10 release.)
Comment 31 Kevin Kubasik 2006-09-07 03:31:48 UTC
Not an issue, sounds good, just wanted to put it out there that we should get it out pretty soon. Any specific things on that bug list that need help?
Comment 32 Serge Gavrilov 2006-09-10 00:08:10 UTC
Would you be so kind as to post a link to the patch here?