GNOME Bugzilla – Bug 169678
Beagle can't chew on filenames containing #
Last modified: 2005-03-09 17:42:43 UTC
Version details: cvs

Looks like our little dog has some trouble with these files and spits out some exceptions. To reproduce, just add some content to a file called "a#b". Bugzilla needs an EVIL keyword for this bug :P

So here's what happens: we feed the filename into a Uri, relying on our UriFu magic to convert # into %23. This is where the oddness starts. Take the following code and its output:

    Uri uri = new Uri("file:///home/dsd/beagle-index/a%23b", true);
    Console.WriteLine(uri);
    Console.WriteLine(uri.LocalPath);

This outputs:

    file:///home/dsd/beagle-index/a%23b
    /home/dsd/beagle-index/a#b

This is different behaviour from other characters (e.g. @), where you'd get something like:

    Uri uri = new Uri("file:///home/dsd/beagle-index/a%40b", true);
    Console.WriteLine(uri);
    Console.WriteLine(uri.LocalPath);

Output:

    file:///home/dsd/beagle-index/a@b
    /home/dsd/beagle-index/a@b

Notice how neither line is %escaped for @, whereas the ToString output is %escaped for #.

Anyway, none of this matters on its own, because we use uri.LocalPath pretty much everywhere for files, and that returns the correctly unescaped version.

But here's the next problem. When we serialize Uris, we serialize them as strings from their ToString value. So take the Uri which gives a ToString of:

    file:///home/dsd/beagle-index/a%23b

When we come to deserialize this, we escape it again. (Yes, this is needed for the more normal characters such as @.) In this case, after escaping, we get:

    file:///home/dsd/beagle-index/a%2523b

(i.e. the % has been interpreted as a normal character and escaped to %25)

We now construct a Uri from this as part of the deserialization process. A Uri of this form gives these properties:

    Uri uri = new Uri("file:///home/dsd/beagle-index/a%2523b", true);
    Console.WriteLine(uri);
    Console.WriteLine(uri.LocalPath);

Output:

    file:///home/dsd/beagle-index/a%23b
    /home/dsd/beagle-index/a%23b

And this isn't what we want at all: even our LocalPath is wrong, so we get FileNotFound exceptions, etc.
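The double-escaping described above can be reproduced outside Beagle. The following is only a sketch: it uses Uri.EscapeDataString and Uri.UnescapeDataString as stand-ins for the UriFu escaping (which is not shown in this report), and the class name is made up for illustration.

```csharp
using System;

class DoubleEscapeDemo
{
    static void Main()
    {
        // The file name from the report, as it exists on disk.
        string name = "a#b";

        // First escape, applied when the Uri is built: '#' becomes %23.
        string escapedOnce = Uri.EscapeDataString(name);

        // Second escape, applied on deserialization: the '%' itself becomes %25.
        string escapedTwice = Uri.EscapeDataString(escapedOnce);

        // A single unescape only removes one layer of escaping.
        string localName = Uri.UnescapeDataString(escapedTwice);

        Console.WriteLine(escapedOnce);   // a%23b
        Console.WriteLine(escapedTwice);  // a%2523b
        Console.WriteLine(localName);     // a%23b, not a#b
    }
}
```

The last line is the name that ends up in LocalPath in the failing case, which is why the file cannot be found.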
One approach I took to solving this is simply not %escaping the # character. This works to an extent. The only problem is that the .LocalPath property of such a Uri strips off the # and everything after it:

    Uri uri = new Uri("file:///home/dsd/beagle-index/a#b", true);
    Console.WriteLine(uri);
    Console.WriteLine(uri.LocalPath);

Output:

    file:///home/dsd/beagle-index/a#b
    /home/dsd/beagle-index/a

So I did a quick hack to make every user of .LocalPath use .ToString().Substring(7) instead (dropping the leading "file://"), which seems to work. But then we meet another exception: Can't get MIME type. Turns out that gnome-vfs won't take filenames/URIs with a # in them; the # has to be escaped to %23. Sigh.

I'm not really sure how to solve this. There are a few obvious solutions, but I can't think of any that isn't hack-like or that is obviously correct. Any suggestions?
Created attachment 38463
Solution

Ok, I thought about it some more. Decided that fixing up the URIs before serialization would be most sensible. Dropped a function into UriFu, fixed the XML serialization routines to use it, job done.

Not.

Ran into further problems with the Lucene glue's Document <--> Uri handling.
Ran into further problems with our binary serialization (for sending results to the client).

So while it's not as clean as I had hoped it would be, this is the best solution I've come up with. Any comments? OK to commit?
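For readers without the attachment, here is one possible reading of "fixing up the URIs before serialization", as a rough sketch only. It is not the attached patch: ToStored and FromStored are hypothetical names, and the sketch assumes the stored form should be the raw, unescaped path, so that the escape applied on deserialization happens exactly once.

```csharp
using System;

static class UriFixupSketch
{
    // Hypothetical helper: strip %-escapes so the stored string holds the raw path.
    public static string ToStored(string escapedPath)
    {
        return Uri.UnescapeDataString(escapedPath);
    }

    // Hypothetical helper: escape each path segment exactly once when reading back,
    // leaving the '/' separators alone.
    public static string FromStored(string rawPath)
    {
        string[] segments = rawPath.Split('/');
        for (int i = 0; i < segments.Length; i++)
            segments[i] = Uri.EscapeDataString(segments[i]);
        return string.Join("/", segments);
    }

    static void Main()
    {
        string escaped = "/home/dsd/beagle-index/a%23b";
        string stored = ToStored(escaped);
        string restored = FromStored(stored);

        Console.WriteLine(stored);    // /home/dsd/beagle-index/a#b
        Console.WriteLine(restored);  // /home/dsd/beagle-index/a%23b
    }
}
```

With this shape the round trip ends at %23 rather than %2523. A file name that contains a literal % sequence would still need separate care, which this sketch does not attempt.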
Looks good... they can't all be as clean as we'd like. Feel free to commit.
In cvs.