Bug 607548 - Parse video filenames to get extra metadata
Status: RESOLVED OBSOLETE
Product: tracker
Classification: Core
Component: Extractor
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-extractor
Jamie McCracken
Depends on:
Blocks:
 
 
Reported: 2010-01-20 13:13 UTC by iain
Modified: 2021-05-26 22:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Parse the filename to attempt to get extra missing metadata (11.71 KB, patch)
2010-02-02 15:18 UTC, iain
reviewed

Description iain 2010-01-20 13:13:11 UTC
Video files rarely have metadata such as Series name, season number, year, etc, stored in their files. Instead they have it stored in the filename. Attached is a patch to the GStreamer parser that will parse the filename to attempt to get extra metadata. It can extract Movie Name and Year from movie filenames, and Series name, season number, episode number and episode title from TV filenames.

There are 2 FIXMEs in the code. The first is that there is currently no way to store the TV series name in the Tracker ontology. Ivan said this is going to be fixed soon in some ontology changes. The second is that it doesn't store nmm:isSeries yet.

In the future, if there are video specific extractors, it would be useful to separate this code out into a separate file to be shared.
Comment 1 Martyn Russell 2010-02-02 15:07:49 UTC
(In reply to comment #0)
> Video files rarely have metadata such as Series name, season number, year, etc,
> stored in their files. Instead they have it stored in the filename. Attached is
> a patch to the GStreamer parser that will parse the filename to attempt to get
> extra metadata. 

Hi Iain, did you forget to attach the patch?
Comment 2 iain 2010-02-02 15:18:48 UTC
Created attachment 152838
Parse the filename to attempt to get extra missing metadata
Comment 3 iain 2010-02-02 15:19:11 UTC
Whoops, yes, I did :)
Comment 4 Mikael Ottela 2010-02-02 16:33:53 UTC
We used to do guessing in the past with varying levels of success. There are many problems with this approach though, mostly related to operations such as renames and moves which do not trigger a complete new re-extraction of the file. There is also no way of knowing whether information for some property comes from a good source or whether it was a guess when trying to merge existing information with results of a new extraction.

Just to give some examples:

Applications that rely on Tracker providing some title for each valid video will show the old filename as the title even after the user has renamed their files to something that makes more sense. This is very unintuitive, as the user doesn't understand why the title in their video player is not being updated.

Many apps use some kind of temporary filename when creating a file, which is then renamed to something valid. For instance, when recording a wav file, rec.wav might be used while recording, and when done the user saves the file as something else. Every single file recorded with the application will have the title 'rec'.

The current approach for making sure every clip has some valid title is to use SPARQL functions (and coalesce) instead:

SELECT tracker:coalesce(?title, tracker:string-from-filename(?filename), "unknown") WHERE { ... }

This picks the first option that has a non-NULL value. This much is already in Tracker.
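
As a rough illustration of the coalesce behaviour described above, here is a minimal Python sketch. It only models the "first non-NULL value wins" rule. The `string_from_filename` helper is a hypothetical stand-in and is not claimed to match what tracker:string-from-filename actually does.

```python
from pathlib import PurePosixPath


def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


def string_from_filename(filename):
    # Hypothetical stand-in (an assumption, not Tracker's implementation):
    # drop the extension and turn '_' and '.' separators into spaces.
    if filename is None:
        return None
    stem = PurePosixPath(filename).stem
    text = stem.replace("_", " ").replace(".", " ").strip()
    return text or None


def display_title(title, filename):
    # Mirrors the shape of the query: stored title, then a name derived
    # from the filename, then the literal "unknown".
    return coalesce(title, string_from_filename(filename), "unknown")
```

With this sketch, a stored title always wins, and the filename is only consulted at query time when no title was stored.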

Of course, your parsing routine is more advanced and provides more information than just the title, but the situation is still similar. SPARQL functions for providing this information are the preferred way to do this. Unfortunately, in this case it may mean repeated parsing of the filename for all the different fields you obtain from it.
Comment 5 iain 2010-02-02 16:56:44 UTC
(In reply to comment #4)
> 
> many apps use some kind of a temporary filename when creating a file which is
> then renamed to something valid. For instance when recording a wav file rec.wav
> might be used when recording and when done the user saves the file as something
> else. Every single file recorded with the application will have title 'rec'.

The filename parsing only happens for video.
Using SPARQL functions also means that the filename has to be parsed every time, for every field, for every result, which I imagine would slow down the query.

Also, only filenames matching the regexes (showname.sXXeXX.title or title(YYYY)) would get the extra metadata; filename.avi would not trigger it.
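
To make the two filename shapes concrete, here is a small Python sketch of this kind of matching. The regular expressions and the function name `parse_video_filename` are illustrative assumptions; the attached patch is a separate implementation and its actual patterns may differ.

```python
import re
from pathlib import PurePosixPath

# Illustrative patterns only (assumptions); not the regexes from the patch.
TV_RE = re.compile(
    r"^(?P<show>.+?)\.s(?P<season>\d{1,2})e(?P<episode>\d{1,2})"
    r"(?:\.(?P<title>.+))?$",
    re.IGNORECASE,
)
MOVIE_RE = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


def _words(text):
    # Turn '.' and '_' separators into spaces.
    return text.replace(".", " ").replace("_", " ").strip()


def parse_video_filename(filename):
    """Return guessed metadata, or None if the name matches neither shape."""
    stem = PurePosixPath(filename).stem  # file name without its extension
    match = TV_RE.match(stem)
    if match:
        title = match.group("title")
        return {
            "kind": "tv",
            "show": _words(match.group("show")),
            "season": int(match.group("season")),
            "episode": int(match.group("episode")),
            "title": _words(title) if title else None,
        }
    match = MOVIE_RE.match(stem)
    if match:
        return {
            "kind": "movie",
            "title": _words(match.group("title")),
            "year": int(match.group("year")),
        }
    return None
```

A plain name such as filename.avi matches neither pattern, so the function returns None and no guessed metadata would be produced for it.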
Comment 6 Martyn Russell 2010-02-02 18:33:19 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > many apps use some kind of a temporary filename when creating a file which is
> > then renamed to something valid. For instance when recording a wav file rec.wav
> > might be used when recording and when done the user saves the file as something
> > else. Every single file recorded with the application will have title 'rec'.
> 
> it only happens for video.

But from our past experience, we have had the same thing with MP3s; generally the success is the same.

> Using sparql functions also means that the filename has to be parsed every time
> for every field for every result which I imagine would slow down the query.

This is true, but as you say, you would only really notice this if there was no title already, which should rarely be the case but often is, IMO :/

I would be interested to know how slow it really is.

> Also only filenames matching the regexes - showname.sXXeXX.title, or
> title(YYYY)  would get the extra metadata, filename.avi would not trigger it.

This still doesn't avoid the problem that ottela describes, namely that file name changes are not then reflected in the database. If your application creates a temporary file foo.avi, writes data to it, and then moves it to showname.sXXeXX.title.avi later, Tracker won't pick that change up because we don't re-extract on file moves.

I also checked with Juergbi about this and he agreed with ottela and me: it is a problem, and we have been burnt before trying to be clever with file name use for metadata when files get renamed.
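
The rename problem can be sketched with a toy model in Python. This is not Tracker code: `ToyIndex` and `guess_title` are hypothetical names, and the model only assumes what is stated above, that extraction runs when a file is first seen and is not repeated when the file is moved.

```python
from pathlib import PurePosixPath


def guess_title(filename):
    # Hypothetical guess: the file name without its extension.
    return PurePosixPath(filename).stem


class ToyIndex:
    """Toy model of an index that extracts on creation but not on moves."""

    def __init__(self):
        self.rows = {}  # file id -> {"filename": ..., "title": ...}

    def file_created(self, file_id, filename):
        # Extraction happens here, so the guess is stored once.
        self.rows[file_id] = {
            "filename": filename,
            "title": guess_title(filename),
        }

    def file_moved(self, file_id, new_filename):
        # Only the stored name changes; nothing is re-extracted.
        self.rows[file_id]["filename"] = new_filename

    def stored_title(self, file_id):
        # Title guessed at extraction time; may be stale after a move.
        return self.rows[file_id]["title"]

    def query_time_title(self, file_id):
        # Title derived from the current file name at query time.
        return guess_title(self.rows[file_id]["filename"])
```

After creating foo.avi and moving it to showname.s01e02.title.avi, `stored_title` still reports the guess made from the temporary name, while `query_time_title` follows the rename.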
Comment 7 iain 2010-02-02 19:03:03 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > 
> > > many apps use some kind of a temporary filename when creating a file which is
> > > then renamed to something valid. For instance when recording a wav file rec.wav
> > > might be used when recording and when done the user saves the file as something
> > > else. Every single file recorded with the application will have title 'rec'.
> > 
> > it only happens for video.
> 
> But from our past experience we have had the same thing with MP3s generally the
> success is the same.

Sure, I was just replying to the misconception that every file recorded with this hypothetical sound recording application will trigger this code and have the title "rec".

Thing is, with MP3s you can point to an existing standard and say "that's how we support MP3 metadata", but there's no way to do this for video.

> > Using sparql functions also means that the filename has to be parsed every time
> > for every field for every result which I imagine would slow down the query.
> 
> This is true, but as you say, you would only really notice this if there was no
> title already which should rarely be the case but seldom is IMO :/

Title, season number, episode number, show name and film year:
these are all the things that would trigger it if I were doing it in a SPARQL query, so a SELECT query would parse the filename 5 times.
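
The "5 times" figure can be made concrete with a small counting sketch in Python. It is only a model of the argument: `CountingParser` and the two row helpers are hypothetical, and the parser returns placeholder values rather than real metadata.

```python
FIELDS = ("title", "season", "episode", "show", "year")


class CountingParser:
    """Placeholder parser that only counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def parse(self, filename):
        self.calls += 1
        # A real parser would fill these in from the file name.
        return dict.fromkeys(FIELDS)


def row_via_field_functions(parser, filename):
    # Models one independent query function per field: each one parses
    # the file name again because nothing is shared between them.
    return [parser.parse(filename)[field] for field in FIELDS]


def row_via_single_parse(parser, filename):
    # Models parsing once (for example at extraction time) and reading
    # all five fields from that single result.
    parsed = parser.parse(filename)
    return [parsed[field] for field in FIELDS]
```

For one result row, the per-field variant calls the parser five times and the single-parse variant calls it once; the difference scales with the number of rows returned.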

> > Also only filenames matching the regexes - showname.sXXeXX.title, or
> > title(YYYY)  would get the extra metadata, filename.avi would not trigger it.
> 
> This still doesn't avoid the problem that ottela describes. That being that
> file name changes are not then reflected in the database - so if you
> application creates temporary file foo.avi and writes data to it and then moves
> it to showname.sXXeXX.title.avi later, Tracker won't pick that change up
> because we don't re-extract on file moves.

Sure. That's a very, very rare hypothetical situation though. I would say don't try to be too clever about it, and just parse the metadata from the filename when you find the file. It may break sometimes, but breaking sometimes is surely better than never working?

The other solution would be to parse it client side, but it seems wrong to make the application do that, and potentially update the tracker database as it finds the information (which again potentially loses sync with the filename).

We'll probably just include this patch in Moblin then.
Comment 8 Martyn Russell 2010-03-19 21:24:06 UTC
Comment on attachment 152838
Parse the filename to attempt to get extra missing metadata

Changing patch status
Comment 9 Philip Van Hoof 2010-04-28 09:51:42 UTC
I guess this patch should be rewritten to be in the tracker:string-from-filename SPARQL function?

Can we close this bug as invalid then?
Comment 10 Martyn Russell 2010-04-28 10:45:06 UTC
(In reply to comment #9)
> I guess this patch should be rewritten to be in the
> tracker:string-from-filename SPARQL function?
> 
> Can we close this bug as invalid then?

That's actually a really good idea. I think doing this allows applications to decide. I am up for the function. Mikael, any comments?
Comment 11 Mikael Ottela 2010-04-28 11:24:21 UTC
As mentioned by iain, that function would actually be a series of functions for things such as episode number, season, title... and they'd all have to parse independently, which is not ideal at all.

I am not against adding those functions, but I am not convinced they will perform adequately. Of course, the ideal solution here would be to have applications with user interaction initiate this and confirm the results (in the same manner as online scraping would be; maybe we should have some library with modules or something), but that's a somewhat different topic.

As already mentioned, automated guessing with no way for the user to confirm or reject has burned us before.

Whether you want to close the bug or keep it around is up to you but let's not lose the patch as there is nothing wrong with it as such.
Comment 12 Martyn Russell 2010-04-28 13:21:56 UTC
This is a bit of a corner case, but the point here is to provide the functionality in some form or another so it is not duplicated in all applications. It might not be performant, but that's not necessarily the point here. Applications can always do this themselves, as you say.

> As already mentioned, automated guessing with no way for the user to confirm or
> reject has burned us before.

I agree, but it is still something that can be useful even if it is only a suggestion in an application UI.
Comment 13 Martyn Russell 2011-03-17 18:32:04 UTC
*** Bug 645052 has been marked as a duplicate of this bug. ***
Comment 14 Lionel Landwerlin 2011-03-17 23:20:42 UTC
Sorry for the duplication, I couldn't find this bug, and thought it was closed, my mistake.

What I wanted to point out in #645052 is that something is broken in the current nie:title selection. I would rather the nie:title be the same as nfo:fileName, or nfo:fileName without the extension, than just 'The' when the file name is 'The.Coin.Truc.2010.S01E02.HDTV.XviD-LOL.avi'.

I understand that the patch proposed by Iain might not be a solution for all files.
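
The fallback suggested in this comment (use the file name, optionally without its extension, as the title) could look roughly like the Python sketch below. `fallback_title` is a hypothetical helper, not part of Tracker, and it deliberately does no guessing beyond removing the final extension.

```python
from pathlib import PurePosixPath


def fallback_title(filename, strip_extension=True):
    """Use the file name itself as the title, optionally without extension."""
    name = PurePosixPath(filename).name
    # .stem removes only the final suffix, so dots inside the name survive.
    return PurePosixPath(name).stem if strip_extension else name
```

For the file name quoted above, this keeps the whole 'The.Coin.Truc.2010.S01E02.HDTV.XviD-LOL' string rather than truncating it at the first dot.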
Comment 15 Martyn Russell 2011-03-18 10:09:17 UTC
Actually, I have been thinking that we should be using this patch with the --enable-guarantee-metadata configure switch because we already do similar things for other properties (like nie:title based on filename for other formats).
Comment 16 Martyn Russell 2014-08-21 09:07:23 UTC
Sam, now that we "guarantee" metadata, even from file titles if we have to, should we incorporate this work / patch to be more thorough with our guessing?
Comment 17 Bastien Nocera 2014-08-21 19:59:04 UTC
FWIW, we do that work in grilo directly for totem/Videos, as it's more flexible.
Comment 18 Martyn Russell 2014-08-22 09:39:34 UTC
(In reply to comment #17)
> FWIW, we do that work in grilo directly for totem/Videos, as it's more
> flexible.

Link?
Comment 19 Sam Thursfield 2021-05-26 22:26:16 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new enhancement request ticket at
  https://gitlab.gnome.org/GNOME/tracker/-/issues/

Thank you for your understanding and your help.