GNOME Bugzilla – Bug 569460
Unescape special characters in filenames when doing (Flickr, Picasa Web, Zip, ...) export
Last modified: 2009-08-03 19:59:45 UTC
I'm currently exporting some of my pictures to Flickr. I give all my pictures good, usable filenames, like this one:

[2008-11-12--10.40.37] (IMG_0017) Cédric, Photoshooting, Petra, Fotograf.jpg

The picture name, as it appears on Flickr at http://www.flickr.com/photos/alexs77/3233306805/, is:

%5B2008-11-12--10.40.37%5D%20(IMG_0017)%20C%C3%A9dric%2C%20Photoshooting%2C%20Petra%2C%20Fotograf.jpg

As you can see, the special characters (e.g. [, ], ",", é, " ", ...) are still percent-encoded on Flickr. This makes the filenames on Flickr a bit unwieldy (even more so than the original, nice'n'good filenames *G*).
This also happens when exporting to Picasa Web. At http://picasaweb.google.com/Fam.Skwar/Bis2003#5296359208335347250 the "Dateiname" (file name) field shows:

Sandra%201,%20Wohnung%20Wuppertal-Ronsdorf.jpg
This also happens when doing exports to Zip:

    --($ ~)-- unzip -l f-spot_export.zip
    Archive:  f-spot_export.zip
      Length     Date   Time    Name
     --------    ----   ----    ----
       211020  01-28-09 16:21   %5B2002-12-24%2016-56-00%5D%20(094-041_002_2)%20Sandra,%20Weihnachten%202002,%20Wohnung%20Gelsenkirchen.jpg
       249949  01-28-09 16:21   %5B2002-12-24%2016-55-00%5D%20(094-041_001_1)%20Sandra,%20Weihnachten%202002,%20Wohnung%20Gelsenkirchen.jpg
       126418  01-28-09 16:21   %5B2002-08-25%2018-40-36%5D%20(P0003927)%20Wohnung%20Gelsenkirchen.jpg
     --------                   -------
       587387                   3 files

And I suppose it'll also happen with the other export types (but I haven't checked yet, to be honest). So that seems to be a general breakage in F-Spot.

It wasn't always that broken, though. Some time ago, after 2008-07-17, I did an export to Picasaweb -> http://picasaweb.google.com/lh/photo/IrpiISfd9duzjN36j_YWpQ?feat=directlink. As you can see there, the filename isn't quite as broken. It's not good, but only mildly defective:

Dateiname (file name): [2008-06-17--14.43.00] (525477.11a) Cassandra%2C Schwimmreifen%2C Schwimmt%2C Im Pool%2C Pool%2C Unterwasserkamera%2C Sommerurlaub 2008%2C El Ràfol d'Almúnia%2C Urbanización L'Almúnia%2C Pego%2C Spanien%2C {2.4 MB}.jpg

Here you can see that only the "," hasn't been decoded from %2C to ",". Spaces, [ and ], and other characters were correctly decoded.

And even earlier (see http://picasaweb.google.com/lh/photo/67MHo8ifiCbRmRJK6kinJQ?feat=directlink), no breakage at all is to be found in the filename:

[2007-10-30 13-34-11] (cimg4253) Cédric, In Hochstuhl, Esstisch, Zu Hause, Winterthur {2,8MB}.jpg

If you could tell me whether there's a way to find out when a picture has been exported (or rather "imported") at Picasaweb, then I'd be happy to supply that information, as this would allow you to figure out when things started to go wrong.

Additionally, I'd like to mention that I wrote a program which uses the f-spot sqlite database file to create a web gallery.
When doing so, I noticed that in the 0.3.x series of f-spot, the database format started to change - in the old days, the file name was stored completely unencoded in the database (and it was just the filename, not a URI like a file://-thingie). Over time, the filename got more and more escaped in the database. Maybe that's related?
Created attachment 128298
Unescape the picture name

I tested this with Zip Export, Flickr and Picasa, and there it works reasonably well.

There's still one issue, though - a string like C%C3%A9dric is decoded as something like C├Г┬йdric (Zip) or CÃ©dric (Flickr). Both are wrong - it should be decoded as Cédric. I guess the UnescapeString method in UriUtils needs to be made UTF-8 capable. How would that be done?
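On the UTF-8 question: the symptom above is what one would expect if each %XX escape is turned into one character on its own, instead of collecting the decoded bytes and interpreting the whole byte sequence as UTF-8. The following is only a sketch of that difference in Python, not F-Spot's UriUtils code (which is C#); the function names `unescape_utf8` and `unescape_bytewise` are made up for illustration.

```python
import string


def _is_hex_pair(s: str) -> bool:
    """True if s is exactly two hexadecimal digits."""
    return len(s) == 2 and all(c in string.hexdigits for c in s)


def unescape_utf8(escaped: str) -> str:
    """Percent-decode into bytes first, then decode those bytes as UTF-8."""
    raw = bytearray()
    i = 0
    while i < len(escaped):
        pair = escaped[i + 1:i + 3]
        if escaped[i] == "%" and _is_hex_pair(pair):
            raw.append(int(pair, 16))  # one escape == one byte
            i += 3
        else:
            raw.extend(escaped[i].encode("utf-8"))
            i += 1
    # Invalid UTF-8 sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


def unescape_bytewise(escaped: str) -> str:
    """Hypothetical buggy variant: every %XX becomes one character."""
    out = []
    i = 0
    while i < len(escaped):
        pair = escaped[i + 1:i + 3]
        if escaped[i] == "%" and _is_hex_pair(pair):
            out.append(chr(int(pair, 16)))  # treats the byte as a code point
            i += 3
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


print(unescape_utf8("C%C3%A9dric"))      # Cédric
print(unescape_bytewise("C%C3%A9dric"))  # CÃ©dric
```

In Python itself, `urllib.parse.unquote` already decodes as UTF-8 by default, so the hand-written loop is only there to make the byte-versus-character distinction visible.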
two notes:
- the patch doesn't apply to CDExport extension
- in GalleryRemote.cs, just apply the modification few lines below, at the gallery.AddItem call
- remove all comments referencing to the bug
- why in some cases you use a variable and sometimes not? if there is no multiple use of the string in the same method i'd not use a temporary var.
- please add Changelog entry and bump addin version number in .addin.xml file to a new minor version

don't know about the utf8 issue...
Created attachment 129244
Unescape the picture name (revised per comment #4)

(In reply to comment #4)
> two notes:
> - the patch doesn't apply to CDExport extension

Oh. It did, at the time when I wrote it - but no longer. Strange. Going to attach a revised patch.

> - in GalleryRemote.cs, just apply the modification few lines below, at the
> gallery.AddItem call

Instead of in GalleryExport.cs, or do both? And for what? For the "path" variable or for "filename"?

> - remove all comments referencing to the bug

Ok.

> - why in some cases you use a variable and sometimes not? if there is no
> multiple use of the string in the same method i'd not use a temporary var.

If you prefer that, then I can change the patch accordingly, sure. I prefer to use temp vars because they make the code a bit more readable and easier to understand, especially when similar modules (like all those exporters) are compared.

> - please add Changelog entry and bump addin version number in .addin.xml file
> to a new minor version

Yep.

> don't know about the utf8 issue...

Well, but that's also an important part :) Without it, filenames still look shitty.

Why are the names encoded in the first place? Why not store the filename exactly the way it's found on disk in the database?

But is all of that really needed? In the database, you store the filename encoded. But you're of course not able to read the file from disk under its encoded name (because there's no file called "/tmp/foo%20bar" on disk - on disk, there's "/tmp/foo bar"). So somewhere, f-spot is already decoding what's stored in the DB into the real name on the filesystem. Do you happen to know where that's done?
(In reply to comment #5)
> Instead of in GalleryExport.cs or do both? And for what? For the "path"
> variable or for "filename"?

in GalleryRemote, where you wrote the comment about "how to...". the method just below has a gallery.AddItem call, where you can add the unescape to item.Name

> > - please add Changelog entry and bump addin version number in .addin.xml file
> > to a new minor version
>
> Yep.

please note, extension has own ChangeLog (extensions/Exporters/ChangeLog)

> Why are the names encoded in the first place? Why not store the filename
> exactly the way it's found on disk in the database?

because it's a uri. it is correct to store them encoded.
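To illustrate the point about URIs: a local path such as "/tmp/foo bar" is percent-encoded when it is turned into a file:// URI for storage, and has to be decoded again before the file can be opened on disk. Below is a minimal Python sketch of that round trip; it is not how F-Spot does it, and the helper names `path_to_uri` and `uri_to_path` are assumptions made up for this example.

```python
from urllib.parse import quote, unquote, urlsplit


def path_to_uri(path: str) -> str:
    """Encode an absolute POSIX path as a file:// URI (what gets stored)."""
    # quote() leaves "/" alone and percent-encodes spaces, commas, brackets
    # and the UTF-8 bytes of non-ASCII characters.
    return "file://" + quote(path)


def uri_to_path(uri: str) -> str:
    """Decode a file:// URI back to the path that exists on disk."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    return unquote(parts.path)


stored = path_to_uri("/tmp/foo bar")
print(stored)               # file:///tmp/foo%20bar
print(uri_to_path(stored))  # /tmp/foo bar
```

The encoded form is the right thing to keep in the database; the decoded form is what belongs anywhere a human reads the name, such as an exported filename.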
Created attachment 129254
Unescape the picture name (revised again per comment #6)

(In reply to comment #6)
> (In reply to comment #5)
> > Instead of in GalleryExport.cs or do both? And for what? For the "path"
> > variable or for "filename"?
>
> in GalleryRemote, where you wrote the comment about "how to...". the method
> just below has a gallery.AddItem call, where you can add the escape to
> item.Name

Ah, okay. Thanks. Understood.

> > > - please add Changelog entry and bump addin version number in .addin.xml file
> > > to a new minor version
> >
> > Yep.
>
> please note, extension has own ChangeLog (extensions/Exporters/ChangeLog)

Oh. Understood.

> > Why are the names encoded in the first place? Why not store the filename
> > exactly the way it's found on disk in the database?
>
> because it's a uri. it is correct to store them encoded.

Okay. And by using URIs, it would, at least in theory, also be possible to use http-hosted files (or whatever GIO supports). Okay.

You don't happen to know where f-spot currently decodes the URI? Or doesn't it do that by itself?
A more elegant fix: http://gitorious.org/f-spot/mainline/merge_requests/1087
hi. we've pushed a different patch in git master now. Seems to solve all the issues you raised, with few lines of code :) Please reopen the bug if you still experience problems. thanks!

commit 5127d5dc3d648534221d5148ad94d2233451d1f7
Author: Anton Keks <anton@azib.net>
Date: Sun Aug 2 00:15:51 2009 +0300

    Photo.Name should always be unescaped. Removing escaping from InfoBox.
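The idea in the commit message, unescaping once where the name is produced rather than in every exporter, can be sketched roughly as follows. This is an illustrative Python model only and not the contents of the commit: the `Photo` class, its `uri` field and the example directory are hypothetical stand-ins.

```python
from dataclasses import dataclass
from posixpath import basename
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class Photo:
    """Hypothetical stand-in for a photo record that stores an encoded URI."""

    uri: str

    @property
    def name(self) -> str:
        """Human-readable file name: last path segment, percent-decoded."""
        # Take the basename before decoding, so an encoded "/" (%2F)
        # inside a file name is not mistaken for a path separator.
        return unquote(basename(urlsplit(self.uri).path))


photo = Photo("file:///home/user/Photos/Sandra%201,%20Wohnung%20Wuppertal-Ronsdorf.jpg")
print(photo.name)  # Sandra 1, Wohnung Wuppertal-Ronsdorf.jpg
```

With the decoding in one place, each exporter (Flickr, Picasa Web, Zip, ...) can use the name as-is, and no exporter can forget to unescape it.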