GNOME Bugzilla – Bug 660784
Media art detection enhancements
Last modified: 2013-03-14 16:47:22 UTC
Here is a patch set which adds support for detecting video posters as well as album art. I refactored the album art detection heuristic completely so that it will be much clearer to extend etc. in the future. There's a couple of more intrusive changes I'd like to make following on from this, but I'll discuss these on the ML.
Created attachment 198076 [details] [review] tracker-extract: Rework albumart API to allow multiple media types * tracker-extract-gstreamer, tracker-extract-mp3: Update to use new API * tracker-extract-mp3: Pass URI instead of filename for consistency
Created attachment 198077 [details] [review] tracker-extract: Extract tracker_albumart_find_external() This is code from the albumart_heuristic() function but extracted to be testable and more flexible.
Created attachment 198078 [details] [review] tracker-extract: Rewrite tracker_albumart_find_external() The new code reads through the directory in one pass and allows more detailed logic to run on the results, also taking into account the type of media file.
Created attachment 198079 [details] [review] tracker-extract: Minor albumart code cleanups
Created attachment 198209 [details] [review] tracker-extract: Rewrite tracker_albumart_find_external() The new code reads through the directory in one pass and allows more detailed logic to run on the results, also taking into account the type of media file.
(return NULL correctly instead of an undefined pointer when no album art found)
Hi Sam, can you push these into a branch for easier review for me please? Thanks.
Pushed to branch 'media-art-enhancements' I've also gone ahead and done the rename albumart to media-art on the branch
Here's the test case I started writing, if that helps you review: http://afuera.me.uk/junk/tracker-media-art-test.c As mentioned on tracker-list I'm not entirely sure how to integrate it
Hi Sam, I have some comments based on review of your branch: - There are some whitespace issues if you git diff --check origin/master (after rebase) - There is a FIXME in there, is that planned to be fixed before merging anything? - Some of the renames were missed, like: albumart_request_download(), albumart_queue_cb()... - Given we're changing API here, I would suggest we move tracker_media_art_remove_add() --> tracker_media_art_queue_remove() tracker_media_art_check_clean() --> tracker_media_art_queue_empty() - The name of this function seems a bit ambiguous to me tracker_albumart_find_external(). - This part of the patch looks like it breaks current behaviour, we use "" if there is no artist or title/album before, now we don't AFAICS: """ - artist_strdown = artist_stripped ? g_utf8_strdown (artist_stripped, -1) : g_strdup (""); - album_strdown = album_stripped ? g_utf8_strdown (album_stripped, -1) : g_strdup (""); + if (artist) { + temp = tracker_albumart_strip_invalid_entities (artist); + artist_strdown = g_utf8_strdown (temp, -1); + } - for (retval = FALSE, i = 0; i < 2 && !retval; i++) { + temp = tracker_albumart_strip_invalid_entities (title); """ - We always define variables at the top of the code block in C in the Tracker project (in tracker_media_art_find_external()): const gchar *name; gchar *name_utf8, *name_strdown; - Can you explain in a comment why the [2] makes more sense than [0] or [1] here? if (g_list_length (image_list[2]) > 0) { /* Use the obvious choices, if available */ art_file_name = g_strdup (image_list[2]->data); } else if (g_list_length (image_list[1]) > 0) { art_file_name = g_strdup (image_list[1]->data); - Commit a7e2396042b6007f0aaf3697cd3da4de1b23b302 is waay too complicated to figure out what real changes you're making. Philip also said something similar. As such, I can't check if you have broken existing behaviour in detail. Can you make sure you make commits much smaller in the future and split this commit up perhaps? It's hard to follow and the comment in the commit doesn't really help me understand the changes. The changes are also not just simple code moving to make things more modular - they change ordering/behaviour as far as I can see. - Coding style is not upheld here unfortunately (see http://live.gnome.org/Tracker/Documentation/CodingStyle): """ +typedef struct { + gchar *uri; + TrackerMediaArtType type; + gchar *artist_strdown; + gchar *title_strdown; +} TrackerMediaArtSearch; """ - There is no need for extra parenthesis here: """ + if ((*dirname) == NULL) { """ - In classify_image_file() returning the value associated with the array filled values for image_list is not well defined or commented anywhere. I would like to see these values/information commented on and/or properly encapsulated with functions to make it clear what the purpose and priority of these values is. - I would create the search object AFTER the dir part to avoid freeing it in error conditions here: """ search = tracker_media_art_search_new (uri, type, artist, title); dir = get_parent_g_dir (uri, &dirname, &error); """ - Given the extent of these changes, I think we need test cases to make sure we're not breaking existing behaviour. Did you plan to add any of these? -- As far as I can see this branch would only be suitable for master (not even 0.12) given the API breaks. In the future, I think it makes sense to fix issues and do feature development outside major renames like this. It's quite hard to see what actual changes have been made without reviewing the git log independently of the whole diff. Thanks for your effort though! :)
Thanks Martyn; I should be able to fix things up in the next few days but just to check - would it be easier if I backed out the mass rename and put it as a separate issue? I presume the patch still wouldn't work for 0.12 because albumart API gets broken in other ways too. Wasn't aware of git diff --check, that's going to save me some time in future :)
(In reply to comment #11) > Thanks Martyn; I should be able to fix things up in the next few days but just > to check - would it be easier if I backed out the mass rename and put it as a > separate issue? I presume the patch still wouldn't work for 0.12 because I don't think that's necessary here. You at least did that in a separate commit which is good enough here. It's just harder when diffing the entire branch to master (which is mostly how reviews are done because single commits often are fixed in later commits during a branch). > albumart API gets broken in other ways too. Actually, I rebased on top of master to do the diff so I have no reason to believe it won't apply to 0.12. But given the API breaks, we can't do that anyway yea. > Wasn't aware of git diff --check, that's going to save me some time in future > :) No problem ;)
I've rebased the branch against master and fixed most of what you mention. A couple of responses: - I missed out albumart_request_download() and albumart_queue_cb() from the rename because they do just download album art, and also it's a fairly Nokia-specific piece of code which presumably we want to replace with a more generic system eventually. I'm happy to change them if you want to maintain consistency :) - I thought it might be clearer to do: tracker_media_art_remove_add() --> tracker_media_art_queue_removal() tracker_media_art_check_clean() --> tracker_media_art_execute_queue() What do you think of this colour for the shed? - I did tracker_media_art_find_external() => tracker_media_art_process_external_images() - You're right that I was generating filenames wrongly (doing " "-title for videos instead of title-" " amongst other things), that's fixed. I rewrote tracker_media_art_get_path() a little too because it was getting messy. - The image priorities are an enum now, no idea why I thought it was a good idea to use integers. - I'm don't know which commit you mean (I guess you rebased the branch so the SHA1 is different :), there is one which I consider a rewrite of the album art detection heuristic rather than any incremental change, so I'm not sure how much it could have been broken up into smaller units. I also made some additional changes/fixes: - (since I went API-break crazy already) tracker_guarantee_title_from_file() now returns the guaranteed title to the caller. Otherwise video posters only get stored for videos which have the title in the tags, which is not very many of them. - A couple of tiny cleanups in the GStreamer extractor I'm all in favour of writing some tests for it (in addition to the one above), what would you like to see? I'm not sure how we could usefully fit this into the functional tests, but some further unit tests for tracker_media_art_process_external_images(), maybe tests for tracker_media_art_get_path() - would be fine. In terms of moving tracker-media-art.c to libtracker-extract, would there be any real issue if we used GVolumeMonitor directly to check if a path is on a removable device rather than using TrackerStorage? I'm unclear as to the purpose of TrackerStorage so I can't tell if this would subvert its intentions or not.
(In reply to comment #13) > I've rebased the branch against master and fixed most of what you mention. > > A couple of responses: > > - I missed out albumart_request_download() and albumart_queue_cb() from the > rename because they do just download album art, and also it's a fairly > Nokia-specific piece of code which presumably we want to replace with a > more generic system eventually. I'm happy to change them if you want to > maintain consistency :) Yes please. > - I thought it might be clearer to do: > tracker_media_art_remove_add() --> tracker_media_art_queue_removal() > tracker_media_art_check_clean() --> tracker_media_art_execute_queue() > > What do you think of this colour for the shed? I am fairly specific about the colours I use for my shed :P I prefer to use tracker_media_art_queue_ prefix both APIs instead of using execute_queue() there. > - I did > tracker_media_art_find_external() => > tracker_media_art_process_external_images() Hmm, I had to look up what this function did, which is bad IMO. The issue for me here is, we use _images() when we process artist/title and return only 1 image. Sorry but can we change this to something more obvious like: tracker_media_art_find_by_artist_and_title (); Also, I noticed this calls tracker_media_art_search_{new|free}() and we don't give tracker_ prefixes to static (or non-external) function calls. > - You're right that I was generating filenames wrongly (doing " "-title for > videos instead of title-" " amongst other things), that's fixed. I rewrote > tracker_media_art_get_path() a little too because it was getting messy. Great. > - The image priorities are an enum now, no idea why I thought it was a good > idea to use integers. Thanks. Usually what you do when you just want to get something working ;) Just to note for the future: + IMAGE_MATCH_EXACT = 0, + IMAGE_MATCH_EXACT_SMALL = 1, + IMAGE_MATCH_SAME_DIRECTORY = 2, + IMAGE_MATCH_TYPE_COUNT You don't need to set values for any other than the first UNLESS they're not incremental of course. > - I'm don't know which commit you mean (I guess you rebased the branch so the > SHA1 is different :), there is one which I consider a rewrite of the album > art > detection heuristic rather than any incremental change, so I'm not sure how > much it could have been broken up into smaller units. Separate branches are really the best way here to avoid confusing the branch and what it's supposed to do with multiple fixes interlaced along the timeline. > I also made some additional changes/fixes: > > - (since I went API-break crazy already) tracker_guarantee_title_from_file() > now > returns the guaranteed title to the caller. Otherwise video posters only get > stored for videos which have the title in the tags, which is not very many of > them. Just one note about this: + gchar **p_new_value) Because the parameter is "p_new_value" and the docs use "new_value", gtkdoc will complain, you need to keep API parameter names consistent in docs and the actual function declarations. It's only a warning, so not critical. > - A couple of tiny cleanups in the GStreamer extractor > > I'm all in favour of writing some tests for it (in addition to the one above), > what would you like to see? I'm not sure how we could usefully fit this into > the functional tests, but some further unit tests for > tracker_media_art_process_external_images(), > maybe tests for tracker_media_art_get_path() - would be fine. Yes, they make sense. Did you need more time to add those before we merge to master then? > In terms of moving tracker-media-art.c to libtracker-extract, would there be > any real issue if we used GVolumeMonitor directly to check if a path is on a > removable device rather than using TrackerStorage? I'm unclear as to the > purpose of TrackerStorage so I can't tell if this would subvert its intentions > or not. TrackerStorage is complicated. It's been re-written several times and what GVolumeMonitor et al report as a removable device is not always the case. It depends on a number of things including if GVFS is installed as I recall. You *could* get away with copying the mount_add() function here: http://git.gnome.org/browse/tracker/tree/src/libtracker-miner/tracker-storage.c#n543 Most of the code in that function is debugging and general identification of each mount point. But I would be careful here, I usually prefer to re-use code like this and not-rework it given the problems it can cause. Also, if we re-use it we can be sure our method for identifying "removable devices" is at least consistent ;) -- Other than the unit tests which you say you could write, there are just a few small things there. Let me know when you're done with those and I can merge this. Thanks again Sam.
As discussed on IRC: - need to add support for storing media art resources as nmm:MediaArt, as is now done in 0.12 - should use a const char * for media art type instead of the enum Further, the time is not right to land a complete rewrite of this code (or anything else) - two things can be done 1. add the media-art-type parameter to the existing code, and possibly add support for detecting video art without doing any major rewriting 2. wait until the team decide to rework the album art code and start from this branch; OR, make the detection algorithm in this branch really really good so there's a reason to merge it sooner
Hi Sam, has there been any more work here since our comments? i.e. is there a branch I can go from here i.e. did the nmm:MediaArt suggest get done yet? Let me know, I think it makes sense to get this in before 0.14.0 if we can.
I've pushed the minor fixes mentioned above into media-art-enhancements-2 branch Regarding nmm:MediaArt - the way the media-art-detect branch works isn't really optimal, it works by just ignoring certain file name patterns. It would be better if the video/audio extractors could set nmm:MediaArt when they find such an image - but if the album art comes later in the extraction queue than the music/video files that will get overwritten again unless we somehow query the store on each image extraction (which of course we don't want). Looks like that's quite a hard problem to solve after all. For a simplistic solution the existing media-art-detect branch would work fine. As for tests - I have one, but it still requires that the media-art functions get moved to libtracker-extract. I'll do that if you like and duplicate tracker-storage.c into libtracker-extract, let me know what you want
The patches on this bug were from the media-art-enhancements branch which was merged for 0.13.0. I've deleted that branch from git now and obsoleted the patches. media-art-enhancements-2 contained some minor fixes from your review above, and it seems that got missed in the original merge :) The branch rebases cleanly on top of master, if you want to merge it now. Nothing else I want to add to this bug.
Thanks Sam, appreciate the effort here. I've merged this to master now! :) thanks. This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.