GNOME Bugzilla – Bug 502302
Improve catalogs and categories (replace with tags)
Last modified: 2015-12-17 15:23:32 UTC
gthumb is mainly organized around the filesystem (e.g. images are organized in folders). I like this approach because it allows me to easily browse my image collection in exactly the same way I use a file manager for regular files. The major weakness of this system is that every image can exist in only one folder. But we also have "Libraries and catalogs" and "Categories" to organize images without this problem.

Libraries and catalogs

Libraries and catalogs allow the user to organize images into a "virtual" filesystem. gthumb provides a very intuitive interface to switch between folders and catalogs, and using this feature is no more complicated than using regular folders. I consider this a very powerful feature for organizing images.

Pro:
* Hierarchical structure is very flexible.
* Browsing is easy and works exactly the same as with regular folders.

Contra:
* Searching is impossible (with the exception of viewing the contents of one catalog): bug 313811
* Changing (or seeing) the catalogs for one image is impossible (without a lot of work): bug 452845

The information for the catalogs is stored in the user's home directory, separate from the images themselves. This makes moving a file problematic (e.g. the catalog is not updated automatically, bug 142930). Changing the catalogs (renaming, moving, removal...) is very easy.

Categories

The user can also assign categories to individual images by means of an additional dialog. The user interface is a little more difficult to use (compared to folders and/or catalogs).

Pro:
* Searching is possible.
* Changing (or seeing) the categories assigned to one image is easy.

Contra:
* Flat list only: bug 149709
* Browsing is impossible (without creating a search first): bug 313822

The information for the categories is stored with the images (along with the comments in the ".comments" directory). This makes updating all the categories more difficult (e.g. removing a category does not update all images) or even impossible (renaming). Moving an entire directory with images preserves the categories automatically.

Proposal: Tags

I propose to create a new "tag" system that combines the strengths of both catalogs and categories. We keep the hierarchical structure and the browsing interface of the catalogs, and incorporate the searching capabilities and the way tags are assigned and viewed from the categories.

To make this work, we'll need to be able to retrieve all tags for one image (to provide a viewing experience similar to the categories). However, with the current catalog system, this requires reading the entire catalog store for each image. This is doable but probably very inefficient (I didn't test that). I think this problem can be avoided by using an sqlite database to store the tags (along with other data like comments and metadata). This will give us all the advantages of a centralized storage system (think catalogs) and quick access to single-image data (think categories).

The only disadvantage I can think of is that we'll lose the possibility to store data (comments, categories, ...) along with the images (think removable media). But I believe this is negligible compared to the benefits. The comment files are lost anyway after external modifications of the directory structure (with the exception of moving an entire directory to a new location). But because the tags (and other data) are still in the database, we could offer to search for the new location of the image (for images moved to a new location) or just keep the data (for images on removable media). See bug 142931 and bug 142932.
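A minimal sketch of what a hierarchical tag store in sqlite could look like, assuming a tag tree (parent/child, like catalogs) plus a table linking image URIs to tags. The table names (tag, image_tag) and helper functions are hypothetical, and Python is used only to keep the example short; gthumb itself is written in C. The point is that "all tags of one image" becomes one indexed query instead of a scan of every catalog, while browsing a tag still includes its sub-tags.

```python
import sqlite3

# Hypothetical schema: a tag tree plus an image -> tag link table.
SCHEMA = """
CREATE TABLE tag (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES tag(id),  -- NULL for a top-level tag
    name      TEXT NOT NULL
);
CREATE TABLE image_tag (
    uri    TEXT NOT NULL,
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (uri, tag_id)              -- also serves lookups by uri
);
CREATE INDEX image_tag_by_tag ON image_tag(tag_id);
"""

def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def resolve_tag(db, tag_path, create=False):
    """Map a 'Parent/Child' path to a tag id, optionally creating missing levels."""
    parent = None
    for name in tag_path.split("/"):
        row = db.execute("SELECT id FROM tag WHERE parent_id IS ? AND name = ?",
                         (parent, name)).fetchone()
        if row is not None:
            parent = row[0]
        elif create:
            parent = db.execute("INSERT INTO tag (parent_id, name) VALUES (?, ?)",
                                (parent, name)).lastrowid
        else:
            return None
    return parent

def add_tag(db, uri, tag_path):
    tid = resolve_tag(db, tag_path, create=True)
    db.execute("INSERT OR IGNORE INTO image_tag (uri, tag_id) VALUES (?, ?)", (uri, tid))

def tags_for_image(db, uri):
    """All tags of one image as full paths, built by walking up the tree."""
    rows = db.execute("""
        WITH RECURSIVE chain(id, parent_id, full_path) AS (
            SELECT t.id, t.parent_id, t.name
              FROM tag t JOIN image_tag it ON it.tag_id = t.id
             WHERE it.uri = ?
            UNION ALL
            SELECT chain.id, p.parent_id, p.name || '/' || chain.full_path
              FROM chain JOIN tag p ON p.id = chain.parent_id
        )
        SELECT full_path FROM chain WHERE parent_id IS NULL ORDER BY full_path
    """, (uri,)).fetchall()
    return [r[0] for r in rows]

def images_under(db, tag_path):
    """Browse a tag like a catalog: images with this tag or any sub-tag."""
    root = resolve_tag(db, tag_path)
    if root is None:
        return []
    rows = db.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION ALL
            SELECT t.id FROM tag t JOIN sub ON t.parent_id = sub.id
        )
        SELECT DISTINCT it.uri FROM image_tag it JOIN sub ON it.tag_id = sub.id
         ORDER BY it.uri
    """, (root,)).fetchall()
    return [r[0] for r in rows]

if __name__ == "__main__":
    db = open_store()
    add_tag(db, "file:///photos/a.jpg", "Travel/Italy/Rome")
    add_tag(db, "file:///photos/a.jpg", "People/Anna")
    add_tag(db, "file:///photos/b.jpg", "Travel/France")
    print(tags_for_image(db, "file:///photos/a.jpg"))
    print(images_under(db, "Travel"))
```

In this layout, renaming or moving a tag is a single UPDATE on one tag row, which avoids the category problem of having to rewrite every image's data.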
Hi Jef!

It'll take me a while to digest everything you've said. I don't use the catalog or categories features at all, personally. However, much of this ties in closely with the comments system, which I have been thinking about. I've added XMP-reading support to trunk, and I think it makes sense to ditch the XML comments files and stick the comments (and geographical location stuff) in XMP fields instead; people seem to want shareable metadata, unsurprisingly. The category tags should go in XMP fields too. (The exempi library might have to be made mandatory in that case.) The XMP scheme has nice fields for this, more so than Exif.

We could then use your sqlite database (or whatever) as the fast cache for all of this data (date-stamping everything to detect out-of-date info). The database would be the primary location for this data for read-only files, and the secondary location for read/write files.

We'd have to give some thought to when exactly the database needs to be checked for freshness. You'd want to check the file mtime every time the database returned a reference to a file, of course, but what about files that have been changed outside of gthumb and now match a search criterion? How will the database know that? I guess that's a secondary issue.

Furthermore... should we be implementing our own data store, as you propose, or use something like tracker (http://live.gnome.org/Tracker)? Interestingly, the tracker web page says "Tracker trawls through your data and organises it so that it can be retrieved extremely quickly later on via simple searches. This organisation puts your data into categories so that application like photo managers and music players can instantly find relevant content automatically." Seems topical... I know tracker has some level of XMP support, but we might need to patch it to make it work the way we want. Not sure about that.
Anyway, the comments, categories, and catalog systems are all kind of crufty and could definitely use improvement! Beware, though, that these changes would reach deep into gthumb... a lot of code would be touched... - Mike
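To illustrate "the category tags should go in XMP fields", here is a minimal sketch, assuming keywords map to the Dublin Core dc:subject property (an rdf:Bag of strings) and the comment to dc:description (an rdf:Alt with an x-default entry). It only builds and parses the RDF/XML; it does not add the xpacket wrapper or embed anything into an image file, which would be the job of a library such as exempi. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # serialize as x:, rdf:, dc:

def q(prefix, local):
    """Qualified name in ElementTree's {namespace}local form."""
    return "{%s}%s" % (NS[prefix], local)

def build_xmp(keywords, comment=None):
    """Keywords -> dc:subject (unordered Bag); comment -> dc:description (Lang Alt)."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    bag = ET.SubElement(ET.SubElement(desc, q("dc", "subject")), q("rdf", "Bag"))
    for kw in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = kw
    if comment is not None:
        alt = ET.SubElement(ET.SubElement(desc, q("dc", "description")), q("rdf", "Alt"))
        ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = comment
    return ET.tostring(meta, encoding="unicode")

def read_keywords(xmp):
    """Return the dc:subject entries of an XMP document."""
    root = ET.fromstring(xmp)
    return [li.text for li in root.iterfind(".//dc:subject/rdf:Bag/rdf:li", NS)]

if __name__ == "__main__":
    packet = build_xmp(["Travel", "Rome"], comment="Colosseum at dusk")
    print(packet)
    print(read_keywords(packet))
```

dc:subject is a flat bag, so hierarchical tags would need either a path encoding inside each entry or an additional property; that choice is left open here.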
I've just started using gThumb as the best option I found for image management under linux having moved from using QPict and iPhoto on OS X. Using Tracker would be great - having been able to use Spotlight on OS X to create smart folders of "All images taken with 300mm lens" or "All outdoor portraits" was a wonderful feature. If using tracker would give that sort of power to gThumb with the minimum pain then that is ideal. Of course, metadata is only as useful as what you put in it and so the ability to read and write XMP data and (more importantly) search based on it would make gThumb a "proper" image management tool. Congrats on a great tool - it sounds like you have a firm grasp of what users really need from image management tools. Andrew
(In reply to comment #1)
> However, much of this ties in closely with the comments system, which I have
> been thinking about. I've added XMP-reading support to trunk, and I think it
> makes sense to ditch the xml comments files and stick the comments (and
> geographical location stuff) in xmp fields instead - people seem to want
> shareable metadata, unsurprisingly. The category tags should go in XMP fields
> too. (The exempi library might have to be made mandatory in that case). The XMP
> scheme has nice fields for this, more so than exif.

I also believe the possibility to store the metadata (comments, keywords, ...) inside the image is a very useful feature. It would certainly solve many of the problems related to external modifications (e.g. moving files with nautilus) and improve interoperability with other applications (e.g. tags can be shared with eog, f-spot, ...). But that is not directly related to the discussion in this bug report. Even if the metadata is stored in the files themselves, we'll still need some "cache" for fast searching. And as far as I know, not all image types support storing (XMP) metadata, so in that case we also need the "cache" as a fallback.

> We could then use your sqlite database (or whatever) as the fast cache for all
> of this data (date-stamping everything to detect out-of-date info). The
> database would be the primary location for this data for read-only files. It
> would be the secondary location for this data for read/write files.

I wouldn't make a distinction between read-only and read/write files. I think it is better to always use the cache and use the file only if the cache is out of date. I believe this is also what we already do for the thumbnails.

> We'd have to give some thought about when exactly the database needs to be
> checked for freshness. You'd want to check file mtime every time the database
> returned a reference to a file, of course - but what about files that have been
> changed outside of gthumb, and now they match a search criteria? How will the
> database know that? I guess that's a secondary issue.

Synchronizing the database with external modifications will be difficult, but I'm not sure that is a real problem. I already mentioned that gthumb has two different "modes" to organize/browse your images: "Folders" (the filesystem) and/or "Catalogs" (the internal library or database). I certainly don't want to change that situation. That means I want to keep the ability to search the filesystem and find images with changes (as we already can in gthumb now). But if you are searching your library (i.e. the database), you won't notice the changes, but searching will be really fast.

We already have the same problem with saved searches. Search results are cached, and if you view the search again, you get the cached results. The results are only updated if you execute the search again, which is slow. But if we use a database, we can perform the search on the fly, and it will automatically pick up changes made within gthumb.

Have you ever tried the new Firefox 3.0 bookmark system? It is very powerful (hierarchical structure, saved searches, ...) and works really well. If you map URLs to images and bookmarks to the library, it becomes very similar to the gthumb library mode. In Firefox, too, the object a URL refers to can change (be removed, updated, ...) without any notification. But even if the bookmarks database is not updated, it is still very powerful.

Keep in mind we are talking about two closely related items in this bug: (a) improving the library mode by using a database and (b) caching metadata in the same database.

> Furthermore... should we be implementing our own data store, as you propose, or
> use something like tracker (http://live.gnome.org/Tracker)? Interestingly, the
> tracker web page says "Tracker trawls through your data and organises it so
> that it can be retrieved extremely quickly later on via simple searches. This
> organisation puts your data into categories so that application like photo
> managers and music players can instantly find relevant content automatically."
> Seems topical... I know tracker has some level of xmp support, but we might
> need to patch it to make it work the way we want. Not sure about that.

I don't know much about tracker, but I have used its search tool a few times. I think it might be a solution for the metadata (and searching) because it does monitor files for changes. But I wouldn't use it for the library. For instance, I only want to organize some of my photos in the library, not all the images tracker can find on my disk. That was one of the problems I encountered when using the tracker search tool. Tracker is also quite new and maybe not stable yet. Is there, for instance, an API available?

> Beware, though, that these changes would reach deep into gthumb... a lot of
> code would be touched...

I know :-)
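The "always use the cache, and read the file only if the cache is out of date" rule discussed above can be sketched as follows, assuming the cache is keyed by path and stamped with the file's modification time in nanoseconds. The table name, the get_comment helper and the read_from_file callback (standing in for an XMP or .comments reader) are hypothetical. A missing file returns the cached data, matching the removable-media case from the original proposal.

```python
import os
import sqlite3

def open_cache(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS file_data (
                      path    TEXT PRIMARY KEY,
                      mtime   INTEGER NOT NULL,   -- st_mtime_ns at caching time
                      comment TEXT)""")
    return db

def get_comment(db, path, read_from_file):
    """Return the cached comment if still fresh, otherwise re-read the file."""
    row = db.execute("SELECT mtime, comment FROM file_data WHERE path = ?",
                     (path,)).fetchone()
    try:
        mtime = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        # File moved away or on unmounted removable media: keep what we know.
        return row[1] if row else None
    if row is not None and row[0] == mtime:
        return row[1]                      # cache hit, file not touched
    comment = read_from_file(path)         # slow path: parse the file itself
    db.execute("INSERT OR REPLACE INTO file_data (path, mtime, comment) "
               "VALUES (?, ?, ?)", (path, mtime, comment))
    return comment
```

This only catches changes when an entry is looked up; as noted in the thread, a file changed outside gthumb that would now match a search is not detected until it is checked or the filesystem search is run again. An mtime comparison can also miss an edit that leaves the timestamp unchanged.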
Jef,

OK, I understand a bit better now. Yes, it makes sense to have a fast database store both the comment metadata and the cataloging info. I guess the tracker idea is a bit premature. However, hopefully the database can be designed so that we can share metadata with tracker and beagle and whatever fancy data services pop up in the future.

So... what sort of tables do we need? Something like this?

1. library definition table
   fields: library
2. catalog definition table
   fields: catalog, library (library can be NULL)
3. catalog membership table
   fields: catalog, uri
4. keyword members file data table:
   fields: uri, mtime, comment, locations, dates, etc.

The data needs to be designed so that we can easily extract it from the current data structures (XML files for comments, catalog files) and also map it to the relevant XMP tags.

Anyway, to summarize, I seem to agree with everything you've said... Can you take the lead on coding this? I do plan to look at the comments <> XMP relationship, which can be done in parallel with the database work.

- Mike
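One possible reading of the library, catalog, catalog membership and file data tables above, sketched as an sqlite schema. Column names and helpers are assumptions; the keyword item is left out because its intended fields are unclear. Two queries that are hard with the current catalog files become simple: listing the catalogs of one image (bug 452845) and updating every catalog when a file is moved (bug 142930).

```python
import sqlite3

SCHEMA = """
CREATE TABLE library (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE catalog (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    library_id INTEGER REFERENCES library(id)   -- NULL: catalog outside any library
);
CREATE TABLE catalog_member (
    catalog_id INTEGER NOT NULL REFERENCES catalog(id) ON DELETE CASCADE,
    uri        TEXT NOT NULL,
    PRIMARY KEY (catalog_id, uri)
);
CREATE INDEX catalog_member_by_uri ON catalog_member(uri);
CREATE TABLE file_data (
    uri      TEXT PRIMARY KEY,
    mtime    INTEGER NOT NULL,
    comment  TEXT,
    location TEXT,
    date     TEXT
);
"""

def open_library_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    db.executescript(SCHEMA)
    return db

def catalogs_for_image(db, uri):
    """'Library/Catalog' (or just 'Catalog') for every catalog containing uri."""
    rows = db.execute("""
        SELECT COALESCE(l.name || '/', '') || c.name
          FROM catalog_member m
          JOIN catalog c ON c.id = m.catalog_id
          LEFT JOIN library l ON l.id = c.library_id
         WHERE m.uri = ?
         ORDER BY 1""", (uri,)).fetchall()
    return [r[0] for r in rows]

def move_image(db, old_uri, new_uri):
    """Follow a file move in all catalogs and in the cached file data at once."""
    with db:  # one transaction
        db.execute("UPDATE OR REPLACE catalog_member SET uri = ? WHERE uri = ?",
                   (new_uri, old_uri))
        db.execute("UPDATE OR REPLACE file_data SET uri = ? WHERE uri = ?",
                   (new_uri, old_uri))
```

UPDATE OR REPLACE is used so that moving onto a URI that is already a member of the same catalog collapses into one row instead of failing.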
Any news on this issue? I'm looking for a replacement for F-Spot. I think gthumb would be a great choice, but I'm somewhat confused by the catalog/categories system, and it isn't as comfortable to use. I think implementing some of the ideas here would be really helpful.
*** Bug 452842 has been marked as a duplicate of this bug. ***
How about user extended file-system attributes (xattrs)? This feature has been present in the kernel for many years but is not widely used. With xattrs there are no problems with renaming/moving/changing files, because the attributes are preserved. AFAIK user_xattr is not enabled by default on many distributions, but kernel support is usually available (and adding user_xattr to the mount options in fstab is sufficient). According to Wikipedia, Beagle uses xattrs to store some file-related data. freedesktop.org has some recommendations about xattr usage: http://www.freedesktop.org/wiki/CommonExtendedAttributes
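A minimal sketch of storing a comment and tags in user xattrs, using Python's Linux-only os.setxattr/os.getxattr. user.xdg.comment follows the freedesktop.org naming convention; the user.xdg.tags name and the comma-separated encoding are assumptions to check against that page. The helpers return False or None instead of raising when xattrs are unavailable (non-Linux systems, or a filesystem mounted without user_xattr), so a database copy could serve as the fallback.

```python
import os

COMMENT_ATTR = "user.xdg.comment"
TAGS_ATTR = "user.xdg.tags"   # assumed name and encoding, see lead-in

def encode_tags(tags):
    # Naive comma-separated encoding: a tag containing ',' would be split.
    return ",".join(tags).encode("utf-8")

def decode_tags(raw):
    return [t for t in raw.decode("utf-8").split(",") if t]

def write_xattrs(path, comment, tags):
    """Store comment and tags on the file itself; False if xattrs are unusable."""
    setxattr = getattr(os, "setxattr", None)   # missing on non-Linux platforms
    if setxattr is None:
        return False
    try:
        setxattr(path, COMMENT_ATTR, comment.encode("utf-8"))
        setxattr(path, TAGS_ATTR, encode_tags(tags))
    except OSError:   # e.g. not supported when mounted without user_xattr
        return False
    return True

def read_xattrs(path):
    """Return (comment, tags), or None if the attributes are absent or unsupported."""
    getxattr = getattr(os, "getxattr", None)
    if getxattr is None:
        return None
    try:
        comment = getxattr(path, COMMENT_ATTR).decode("utf-8")
        tags = decode_tags(getxattr(path, TAGS_ATTR))
    except OSError:   # attribute missing, or xattrs unsupported
        return None
    return comment, tags
```

Whether the attributes survive a copy to another filesystem or to removable media depends on the target filesystem and the copying tool.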
Marking as obsolete, as this was reported for a now-unsupported version and no recent activity has occurred. Please feel free to reopen this bug report if the problem still occurs with a current version of gThumb.