GNOME Bugzilla – Bug 324186
Smarter (deterministic!) typefind decisions
Last modified: 2006-03-05 13:50:49 UTC
I have just encountered a file (from http://bugzilla.gnome.org/show_bug.cgi?id=324123) that highlights a problem with current typefinding. The file has an ID3v2 tag at the start and an ID3v1 tag at the end. Between the tags is a WAV file containing MP3 data. Normally (ignoring the problem in #324123) this works OK, because the file is detected as ID3 and runs through id3demux, then wavparse, then the mp3 decoder.

HOWEVER, when I remove the ID3v2 tag from the start, we have what looks like a WAV file with ID3v1 at the end. The typefind functions return MAXIMUM for both WAV and ID3v1, but the order in which they run varies (registry reasons?), so sometimes the ID3 is found and sometimes the WAV is found. This SUCKS!

The solution is that ID3 somehow has to take precedence over WAV in this case. The best suggestion so far is to ensure that the ID3 typefind function has a higher rank than anything that might be contained in ID3, and that typefind functions are sorted by rank when they are used.
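To make the ambiguity concrete, here is a minimal Python model of the situation, not GStreamer code. The two probes are simplified magic-byte checks (a RIFF/WAVE header, and a 128-byte trailer starting with "TAG"), and `first_maximum` is a hypothetical stand-in for a typefinder that takes the first MAXIMUM result in whatever order the functions happen to run.

```python
TYPEFIND_MAXIMUM = 100  # probability value meaning "certain"


def probe_wav(data: bytes) -> int:
    """Simplified WAV check: 'RIFF' <4-byte size> 'WAVE' at the start."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return TYPEFIND_MAXIMUM
    return 0


def probe_id3v1(data: bytes) -> int:
    """Simplified ID3v1 check: the last 128 bytes start with 'TAG'."""
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        return TYPEFIND_MAXIMUM
    return 0


def first_maximum(data: bytes, probes):
    """Return the name of the first probe reporting MAXIMUM, in call order."""
    for name, probe in probes:
        if probe(data) == TYPEFIND_MAXIMUM:
            return name
    return None


# A WAV header, some payload, and an ID3v1 trailer: both probes match.
sample = b"RIFF" + bytes(4) + b"WAVE" + bytes(64) + b"TAG" + bytes(125)

wav_first = [("wav", probe_wav), ("id3v1", probe_id3v1)]
id3_first = [("id3v1", probe_id3v1), ("wav", probe_wav)]
```

With the same bytes, `first_maximum(sample, wav_first)` and `first_maximum(sample, id3_first)` give different answers, which is the behaviour reported above when the call order is not fixed.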
Is this fixed now by Tim's recent typefind changes? Perhaps not entirely... It should now be deterministic (due to sorting), but I think ID3 and WAV will both still typefind with MAXIMUM probability, so I guess we'd still need to change that... if we care. The determinism is the important bit.
I was also thinking that we could extend typefind to return how 'early' in the stream the type was found, and prefer 'earlier' over 'later', somehow.
I think typefinding factory ranks plus probabilities should cover that API-wise. The rest is fine-tuning. Typefindelement might need some more fixing though: the sort it does is wrong and it might want to use the new helper function; the 'maximum' property never worked; and I don't know if it uses the helper function for pullrange-based typefinding.
I think this is fixed now. We have typefind helpers for typefinding in pull-mode and for typefinding buffers. Those typefind helpers call typefind functions in order of rank, and then in (reverse?) alphabetical order as a secondary sort criterion if the rank is the same. The rest is done via probabilities. Typefinding stops immediately when a function returns a probability of TYPEFIND_MAXIMUM; otherwise the highest probability found is used.

2006-03-05  Tim-Philipp Müller  <tim at centricular dot net>

	* gst/typefind/gsttypefindfunctions.c: (plugin_init):
	  Give id3 and ape tag typefinders a rank slightly higher than
	  PRIMARY to ensure they're always run before any of the other
	  typefinders (in particular wav and mp3) (#324186).
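The helper behaviour described in the last comment (rank order first, name as tie-break, early stop on TYPEFIND_MAXIMUM, otherwise highest probability) can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the GStreamer C API: `TypeFinder`, `typefind` and the probes are hypothetical names, the tie-break is ascending by name (the comment itself is unsure of the direction), and `RANK_PRIMARY + 1` merely stands in for "slightly higher than PRIMARY".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

TYPEFIND_MAXIMUM = 100  # probability value meaning "certain"
RANK_PRIMARY = 256      # illustrative base rank


@dataclass(frozen=True)
class TypeFinder:
    name: str
    rank: int
    probe: Callable[[bytes], int]  # returns a probability in 0..100


def probe_wav(data: bytes) -> int:
    """Simplified WAV check: 'RIFF' <4-byte size> 'WAVE' at the start."""
    ok = data[:4] == b"RIFF" and data[8:12] == b"WAVE"
    return TYPEFIND_MAXIMUM if ok else 0


def probe_id3v1(data: bytes) -> int:
    """Simplified ID3v1 check: the last 128 bytes start with 'TAG'."""
    ok = len(data) >= 128 and data[-128:-125] == b"TAG"
    return TYPEFIND_MAXIMUM if ok else 0


def typefind(data: bytes, finders: Iterable[TypeFinder]) -> Optional[str]:
    """Run finders by descending rank, then by name, so the result
    does not depend on the order in which they were registered."""
    ordered = sorted(finders, key=lambda f: (-f.rank, f.name))
    best_name, best_prob = None, 0
    for finder in ordered:
        prob = finder.probe(data)
        if prob >= TYPEFIND_MAXIMUM:
            return finder.name          # stop at the first certain match
        if prob > best_prob:
            best_name, best_prob = finder.name, prob
    return best_name                    # None if nothing matched


# The tag typefinder gets a rank just above PRIMARY (assumed value).
id3 = TypeFinder("id3v1", RANK_PRIMARY + 1, probe_id3v1)
wav = TypeFinder("wav", RANK_PRIMARY, probe_wav)

# WAV header, payload, ID3v1 trailer: both probes report MAXIMUM.
sample = b"RIFF" + bytes(4) + b"WAVE" + bytes(64) + b"TAG" + bytes(125)
```

In this model `typefind(sample, [wav, id3])` and `typefind(sample, [id3, wav])` both pick the tag typefinder, because the sort fixes the call order before any probe runs.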