After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 657183 - Cue sheet parsing for GStreamer extractor
Cue sheet parsing for GStreamer extractor
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: Extractor
git master
Other Linux
: Normal enhancement
: ---
Assigned To: tracker-extractor
Jamie McCracken
Depends on:
Blocks:
 
 
Reported: 2011-08-23 17:51 UTC by Sam Thursfield
Modified: 2011-10-03 10:34 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tracker-extract-gstreamer: Remove 'needs_audio' flag (2.99 KB, patch)
2011-08-23 17:53 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Remove check for GST_VERSION > 0.10.20 (1.33 KB, patch)
2011-08-23 17:53 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Use gst_tag_list_insert() (1.62 KB, patch)
2011-08-23 17:53 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Don't pass file URL needlessly (9.14 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Rework album info reading (9.70 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Avoid duplicate nmm:Artist inserts (5.28 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Break up extract_metadata() into functions (27.97 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Factor out type setting code (3.03 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Pass tag list explicitly (10.28 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
libtracker-common: Add tracker_case_match_filename_without_extension() (5.12 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract: Add libcue-based CUE sheet parser (15.73 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Use ToC information (7.74 KB, patch)
2011-08-23 17:54 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Simplify tagcache behaviour (3.19 KB, patch)
2011-08-24 11:57 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Don't pass file URL needlessly (9.14 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Rework album info reading (9.72 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Avoid duplicate nmm:Artist inserts (5.29 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Break up extract_metadata() into functions (28.04 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Factor out type setting code (3.08 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Pass tag list explicitly (10.28 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
libtracker-common: Add tracker_case_match_filename_without_extension() (5.12 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract: Add libcue-based CUE sheet parser (15.73 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Use ToC information (7.80 KB, patch)
2011-08-24 11:58 UTC, Sam Thursfield
none Details | Review
tracker-extract: Add libcue-based CUE sheet parser (15.92 KB, patch)
2011-08-24 16:55 UTC, Sam Thursfield
none Details | Review
tracker-extract-gstreamer: Use ToC information (7.48 KB, patch)
2011-08-24 16:55 UTC, Sam Thursfield
none Details | Review

Description Sam Thursfield 2011-08-23 17:51:22 UTC
Hi!
Here's a patch set with the first draft of my cue sheet extraction code. In short:
  - it works for the GStreamer extractor only
  - adds a new optional dependency on libcue[1]. 
  - supports both external and embedded (as Vorbis comment) cuesheets

I took a guess how to model an audio file that contains multiple tracks: basically the file itself is stored as nfo:Audio only while each track is stored as nmm:MusicPiece. The nie:url of the tracks has a fragment identifier marking where the particular track starts in the audio. Let me know if this is a good approach or if there's a better way.

Another concern is that the time values aren't going to be sample-accurate; the offset specified in the track nie:url is a double seconds value (which isn't too bad) but the nie:duration is for some reason an int64 that holds seconds; for an audio player this is of no real use, and it seems a shame to require them to parse the cuesheet again just because we lack precision. I don't really have any ideas how to fix this (or any direct need to, it just worries me).

The patches are mostly refactoring the existing gstreamer extractor to prepare for the new code, only the last two actually add functionality. It adds an API which should be relatively compatible with that proposed in #540890.

I plan to add some functional tests, I'll push these in a feature branch at a later date since there'll be binary data etc.

Thanks!

[1] http://www.sourceforge.net/projects/libcue
Comment 1 Sam Thursfield 2011-08-23 17:53:52 UTC
Created attachment 194509 [details] [review]
tracker-extract-gstreamer: Remove 'needs_audio' flag

Refactor away this variable, it's not needed
Comment 2 Sam Thursfield 2011-08-23 17:53:55 UTC
Created attachment 194510 [details] [review]
tracker-extract-gstreamer: Remove check for GST_VERSION > 0.10.20

Tracker now requires at least 0.10.30 in any case.
Comment 3 Sam Thursfield 2011-08-23 17:53:58 UTC
Created attachment 194511 [details] [review]
tracker-extract-gstreamer: Use gst_tag_list_insert()

Remove add_tags() function which needlessly reimplements
gst_tag_list_insert().
Comment 4 Sam Thursfield 2011-08-23 17:54:01 UTC
Created attachment 194512 [details] [review]
tracker-extract-gstreamer: Don't pass file URL needlessly

Also some changes to avoid confusion:

* Rename 'uri' to 'file_url' when there are lots of URIs around

* Rename add_date_time_gst_tag() (which still uses the file URL
  to guarantee metadata) to add_date_time_gst_tag_with_mtime_fallback()
Comment 5 Sam Thursfield 2011-08-23 17:54:05 UTC
Created attachment 194513 [details] [review]
tracker-extract-gstreamer: Rework album info reading

* Use GST_TAG_ALBUM_ARTIST for album artist if available (otherwise fall back
  on performer/artist)

* Be consistant with variable names

* Remove unused 'scount' parameter
Comment 6 Sam Thursfield 2011-08-23 17:54:08 UTC
Created attachment 194514 [details] [review]
tracker-extract-gstreamer: Avoid duplicate nmm:Artist inserts

Maintain an internal list to avoid creating more than one INSERT per
nmm:Artist. Once CUE reading lands, a single file may contain 15 songs
by one artist and previously this would have resulted in 15 identical
INSERT statements in the preupdate.
Comment 7 Sam Thursfield 2011-08-23 17:54:12 UTC
Created attachment 194515 [details] [review]
tracker-extract-gstreamer: Break up extract_metadata() into functions
Comment 8 Sam Thursfield 2011-08-23 17:54:15 UTC
Created attachment 194516 [details] [review]
tracker-extract-gstreamer: Factor out type setting code
Comment 9 Sam Thursfield 2011-08-23 17:54:18 UTC
Created attachment 194517 [details] [review]
tracker-extract-gstreamer: Pass tag list explicitly

This allows smarter processing in future using information from cue
sheets and other sources.
Comment 10 Sam Thursfield 2011-08-23 17:54:21 UTC
Created attachment 194518 [details] [review]
libtracker-common: Add tracker_case_match_filename_without_extension()

This is used by the cue sheet parser to match audio files which are
listed with incorrect extensions.
Comment 11 Sam Thursfield 2011-08-23 17:54:25 UTC
Created attachment 194519 [details] [review]
tracker-extract: Add libcue-based CUE sheet parser

This is currently only usable by the GStreamer extractor due to the
use of GstTagList.
Comment 12 Sam Thursfield 2011-08-23 17:54:28 UTC
Created attachment 194520 [details] [review]
tracker-extract-gstreamer: Use ToC information

Fixes GB#657183
Comment 13 Sam Thursfield 2011-08-24 11:57:57 UTC
Created attachment 194577 [details] [review]
tracker-extract-gstreamer: Simplify tagcache behaviour

Avoid having a NULL tagcache, which allows us to use
gst_tag_list_insert() to merge new tags.
Comment 14 Sam Thursfield 2011-08-24 11:58:01 UTC
Created attachment 194578 [details] [review]
tracker-extract-gstreamer: Don't pass file URL needlessly

Also some changes to avoid confusion:

* Rename 'uri' to 'file_url' when there are lots of URIs around

* Rename add_date_time_gst_tag() (which still uses the file URL
  to guarantee metadata) to add_date_time_gst_tag_with_mtime_fallback()
Comment 15 Sam Thursfield 2011-08-24 11:58:09 UTC
Created attachment 194579 [details] [review]
tracker-extract-gstreamer: Rework album info reading

* Use GST_TAG_ALBUM_ARTIST for album artist if available (otherwise fall back
  on performer/artist)

* Be consistant with variable names

* Remove unused 'scount' parameter
Comment 16 Sam Thursfield 2011-08-24 11:58:12 UTC
Created attachment 194580 [details] [review]
tracker-extract-gstreamer: Avoid duplicate nmm:Artist inserts

Maintain an internal list to avoid creating more than one INSERT per
nmm:Artist. Once CUE reading lands, a single file may contain 15 songs
by one artist and previously this would have resulted in 15 identical
INSERT statements in the preupdate.
Comment 17 Sam Thursfield 2011-08-24 11:58:16 UTC
Created attachment 194581 [details] [review]
tracker-extract-gstreamer: Break up extract_metadata() into functions
Comment 18 Sam Thursfield 2011-08-24 11:58:19 UTC
Created attachment 194582 [details] [review]
tracker-extract-gstreamer: Factor out type setting code
Comment 19 Sam Thursfield 2011-08-24 11:58:22 UTC
Created attachment 194583 [details] [review]
tracker-extract-gstreamer: Pass tag list explicitly

This allows smarter processing in future using information from cue
sheets and other sources.
Comment 20 Sam Thursfield 2011-08-24 11:58:26 UTC
Created attachment 194584 [details] [review]
libtracker-common: Add tracker_case_match_filename_without_extension()

This is used by the cue sheet parser to match audio files which are
listed with incorrect extensions.
Comment 21 Sam Thursfield 2011-08-24 11:58:29 UTC
Created attachment 194585 [details] [review]
tracker-extract: Add libcue-based CUE sheet parser

This is currently only usable by the GStreamer extractor due to the
use of GstTagList.
Comment 22 Sam Thursfield 2011-08-24 11:58:33 UTC
Created attachment 194586 [details] [review]
tracker-extract-gstreamer: Use ToC information

Fixes GB#657183
Comment 23 Sam Thursfield 2011-08-24 16:55:33 UTC
Created attachment 194641 [details] [review]
tracker-extract: Add libcue-based CUE sheet parser

This is currently only usable by the GStreamer extractor due to the
use of GstTagList.
Comment 24 Sam Thursfield 2011-08-24 16:55:37 UTC
Created attachment 194642 [details] [review]
tracker-extract-gstreamer: Use ToC information

Fixes GB#657183
Comment 25 Sam Thursfield 2011-08-26 13:17:25 UTC
(Note these patches are also available in the cuesheets branch of Tracker git)
Comment 26 Sam Thursfield 2011-09-12 16:04:09 UTC
I've rebased this branch on top of 0.12 and pushed it as cuesheets-0.12

I have also found and fixed a couple of bugs
Comment 27 Sam Thursfield 2011-09-23 12:47:03 UTC
Branch is updated, hopefully this is the final version.

I submitted a bug to Nepomuk upstream about the nfo:containerStartTime property: 
https://sourceforge.net/apps/trac/oscaf/ticket/123
Comment 28 Martyn Russell 2011-10-03 10:34:22 UTC
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.

This will be in 0.12.4.