Bug 169678 - Beagle can't chew on filenames containing #
Status: RESOLVED FIXED
Product: beagle
Classification: Other
Component: General
Version: 0.0.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Beagle Bugs
QA Contact: Beagle Bugs
Depends on:
Blocks:
 
 
Reported: 2005-03-09 01:18 UTC by Daniel Drake
Modified: 2005-03-09 17:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Solution (3.53 KB, patch)
2005-03-09 16:45 UTC, Daniel Drake

Description Daniel Drake 2005-03-09 01:18:23 UTC
Version details: cvs

Looks like our little dog has some trouble with these files and spits out some
exceptions. To reproduce, just add some content to a file called "a#b".

Bugzilla needs an EVIL keyword for this bug :P

So here's what happens:

We input a filename into a Uri, relying on our UriFu magic to convert # into %23.
This is where the oddness starts. Take the following code and its output:

		Uri uri = new Uri("file:///home/dsd/beagle-index/a%23b", true);
		Console.WriteLine(uri);
		Console.WriteLine(uri.LocalPath);

This outputs:
                file:///home/dsd/beagle-index/a%23b
                /home/dsd/beagle-index/a#b

This is different behaviour from other characters (e.g. @) where you'd get
something like:

		Uri uri = new Uri("file:///home/dsd/beagle-index/a%40b", true);
		Console.WriteLine(uri);
		Console.WriteLine(uri.LocalPath);

                file:///home/dsd/beagle-index/a@b
                /home/dsd/beagle-index/a@b

Notice how neither is %escaped for @, whereas ToString is %escaped for #.

Anyway, this all doesn't matter because we use uri.LocalPath pretty much
everywhere for files, which returns the correctly unescaped version.

But here's the next problem. When we serialize Uris, we serialize them as
strings from their ToString value. So taking the Uri which gives a ToString of:
    file:///home/dsd/beagle-index/a%23b

When we come to deserialize this, we escape it again. (Yes, this is needed for
the more normal characters such as @)
In this case, after escaping, we get:
    file:///home/dsd/beagle-index/a%2523b

(i.e. the % has been interpreted as a normal character and escaped)
We now construct a Uri from this as part of the deserialization process. And a
Uri of this form gives these properties:
		Uri uri = new Uri("file:///home/dsd/beagle-index/a%2523b", true);
		Console.WriteLine(uri);
		Console.WriteLine(uri.LocalPath);

                file:///home/dsd/beagle-index/a%23b
                /home/dsd/beagle-index/a%23b

And this isn't what we want at all, since even our localpath is wrong, so we get
FileNotFound exceptions, etc.
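
The double escaping described above can be reproduced outside the Beagle tree.
What follows is only a minimal sketch: it assumes Uri.EscapeDataString and
Uri.UnescapeDataString as stand-ins for the UriFu escaping step and for the
unescaping that LocalPath performs, and it only looks at the filename segment.
The class and method names are made up for illustration and are not Beagle code.

```csharp
using System;

public static class DoubleEscapeDemo
{
    // Hypothetical stand-in for the escaping applied on deserialization.
    public static string EscapeSegment(string segment)
    {
        return Uri.EscapeDataString(segment);
    }

    // Hypothetical stand-in for the unescaping done when asking for a local path.
    public static string UnescapeSegment(string segment)
    {
        return Uri.UnescapeDataString(segment);
    }

    public static void Main()
    {
        string onDisk = "a#b";

        // First escape: this is the form that ends up in the serialized string.
        string serialized = EscapeSegment(onDisk);
        Console.WriteLine(serialized);   // a%23b

        // Deserialization escapes again, so the % itself becomes %25.
        string reEscaped = EscapeSegment(serialized);
        Console.WriteLine(reEscaped);    // a%2523b

        // Unescaping once now yields the escaped name, not the name on disk.
        string localName = UnescapeSegment(reEscaped);
        Console.WriteLine(localName);    // a%23b
    }
}
```

The last line printed is a%23b rather than a#b, which corresponds to the wrong
local path that leads to the FileNotFound exceptions.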

One approach I took to solving this is simply not %escaping the # character.
This works to an extent. The only problem is that the .LocalPath property of
something like that strips off the # and everything after it:

		Uri uri = new Uri("file:///home/dsd/beagle-index/a#b", true);
		Console.WriteLine(uri);
		Console.WriteLine(uri.LocalPath);

                file:///home/dsd/beagle-index/a#b
                /home/dsd/beagle-index/a

So I do a quick hack to make every user of .LocalPath use
.ToString().Substring(7) which seems to work. But then we meet another
exception: Can't get MIME type. Turns out that gnome-vfs won't take
filenames/URIs containing a # - they have to be escaped to %23. Sigh.

I'm not really sure how to solve this. There are a few obvious solutions but
they are all hack-like, and none of them is obviously correct. Any suggestions?
Comment 1 Daniel Drake 2005-03-09 16:45:57 UTC
Created attachment 38463
Solution

Ok, I thought about it some more. Decided that fixing up the URIs before
serialization would be most sensible. Dropped a function into UriFu, fixed the
XML serialization routines to use it, job done.

Not.

Ran into further problems with the Lucene glue's Document <--> Uri handling.
Ran into further problems with our binary serialization (for sending results to
client).

So while it's not as clean as I had hoped it would be, this is the best solution
I've come up with. Any comments? OK to commit?
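
For readers without the attachment, the general idea of fixing up a URI string
before serialization might look roughly like the sketch below. This is not the
attached patch. It assumes the deserializer always escapes what it reads, so the
serializer undoes one level of escaping first. ToSerializable and FromSerialized
are hypothetical names, and Uri.UnescapeDataString / Uri.EscapeDataString stand
in for the real UriFu routines. Only the filename segment is handled.

```csharp
using System;

public static class SerializationFixSketch
{
    // Hypothetical: undo one level of %escaping before writing the string out,
    // so that the escape performed on deserialization restores the original.
    public static string ToSerializable(string escapedSegment)
    {
        return Uri.UnescapeDataString(escapedSegment);
    }

    // Hypothetical: the deserializer escapes whatever it reads.
    public static string FromSerialized(string stored)
    {
        return Uri.EscapeDataString(stored);
    }

    public static void Main()
    {
        // Escaped forms of "a#b", "a@b" and "100%" respectively.
        string[] escapedNames = { "a%23b", "a%40b", "100%25" };

        foreach (string name in escapedNames)
        {
            string stored = ToSerializable(name);
            string restored = FromSerialized(stored);
            // A correct round trip gives back exactly the escaped form we started with.
            Console.WriteLine("{0} -> {1} -> {2}", name, stored, restored);
        }
    }
}
```

With this arrangement the # case and the @ case both survive the round trip,
because the string is escaped exactly once by the time a Uri is built from it.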
Comment 2 Jon Trowbridge 2005-03-09 17:11:58 UTC
Looks good... they can't all be as clean as we'd like.  Feel free to commit.
Comment 3 Daniel Drake 2005-03-09 17:42:43 UTC
In cvs.