Bug 721676 - typefind does not find the correct media type for mpg with http streaming
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
Version: git master
OS: Other Linux
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
Reported: 2014-01-07 03:53 UTC by satish kumar
Modified: 2018-11-03 12:19 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments:
* The patch analyzes the data to find the max probability until the max data size is reached (6.01 KB, patch; 2014-01-07 03:53 UTC, satish kumar; status: needs-work)
* patch with git master (4.97 KB, patch; 2014-01-09 06:06 UTC, satish kumar)
* Test stream (1.24 MB, video/mpeg; 2014-01-10 06:24 UTC, satish kumar)

Description satish kumar 2014-01-07 03:53:23 UTC
Created attachment 265502 [details] [review]
The patch analyzes the data to find the max probability until the max data size is reached.

For some MPEG streams in the HTTP streaming case, only audio is decoded and no video is displayed, even though the stream contains both audio and video content.

Analysis:
The issue comes from the typefind element. There, the minimum data size required for parsing is set to 2048 bytes and the maximum to 128*1024 bytes.

Failing case: the httpsrc (soup) source gives 2625 bytes to typefind as the first buffer. This buffer is parsed to find a suitable media type for further auto-plugging. Within this amount of data, five consecutive MP3 frames are found, so typefind declares the stream to be audio/mpeg with probability 99, the maximum for this run. Hence only an audio pipeline is created, and there is no video at all.

Passing case: when playback works (both video and audio are shown), httpsrc (soup) first delivers 1165 bytes, which is less than the minimum size (2048), so the data is stored; soup then delivers another 4096 bytes, giving typefind a total of 5261 bytes. With this data length, video/mpeg-sys with probability 100 is declared as the typefind output. This triggers further auto-plugging of the demuxer and the audio/video decoding paths, so everything works fine.

These figures are based on logs and may vary from run to run, but the issue remains the same. The main problem is that once some format is found, typefind does not check whether another format might be found with a higher probability if more data were used for parsing.
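To make the failure mode concrete, here is a minimal sketch in C of the decision rule described above. The names, the acceptance threshold and scan_for_type() are hypothetical stand-ins, not the actual GStreamer code; the stub merely mirrors the log figures from this report.

#include <stddef.h>
#include <stdio.h>

#define MIN_SIZE 2048            /* minimum data before typefinding runs */
#define MAX_SIZE (128 * 1024)    /* maximum data typefind will buffer */
#define MIN_PROBABILITY 50       /* hypothetical acceptance threshold */

struct type_result { const char *media_type; int probability; };

/* Hypothetical stand-in for running the registered typefind functions.
 * Mirrors the logs above: with only 2625 bytes, five consecutive MP3
 * frames are seen (audio/mpeg, 99); with 5261 bytes the stream is
 * recognised as video/mpeg-sys with probability 100. */
static struct type_result scan_for_type(const unsigned char *data, size_t size)
{
    struct type_result r = { "audio/mpeg", 99 };
    (void) data;
    if (size >= 5261) {
        r.media_type = "video/mpeg-sys";
        r.probability = 100;
    }
    return r;
}

/* Failure mode: the first result above the threshold is accepted, so a
 * 2625-byte first buffer locks in audio-only auto-plugging even though
 * more data would reveal the correct type. */
static const char *classify(const unsigned char *data, size_t size)
{
    struct type_result r;

    if (size < MIN_SIZE)
        return NULL;                  /* keep buffering */
    r = scan_for_type(data, size);
    if (r.probability >= MIN_PROBABILITY)
        return r.media_type;          /* accepted immediately */
    return NULL;                      /* wait for more data, up to MAX_SIZE */
}

int main(void)
{
    static unsigned char buf[5261];
    printf("first buffer (2625 bytes): %s\n", classify(buf, 2625));
    printf("more data    (5261 bytes): %s\n", classify(buf, 5261));
    return 0;
}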

A patch to fix the issue is also proposed.
Comment 1 Sebastian Dröge (slomo) 2014-01-07 08:01:10 UTC
Comment on attachment 265502 [details] [review]
The patch analyzes the data to find the max probability until the max data size is reached.

Please update this patch to apply cleanly against 1.2 or git master. 0.10 has not been maintained for a very long time.

Chances are also good that this is already fixed in 1.2. Please provide a test case or check whether that is so.
Comment 2 satish kumar 2014-01-08 11:56:55 UTC
This is still reproducible with the latest git master.

I will submit a new patch for it.
Comment 3 satish kumar 2014-01-09 06:06:42 UTC
Created attachment 265818 [details] [review]
patch with git master

New patch, rebased against git master.
Comment 4 satish kumar 2014-01-09 09:50:01 UTC
Bugzilla shows an error message because the test file is larger than 1600 kB, the maximum size allowed for non-patch attachments.
Comment 5 Tim-Philipp Müller 2014-01-09 10:00:10 UTC
Maybe you could upload it somewhere else? Or maybe the bug is reproducible with just a small part of the clip? (head --bytes=1500k foo.mpg > head.mpg)
Comment 6 satish kumar 2014-01-10 06:24:04 UTC
Created attachment 265893 [details]
Test stream

Test stream attached.
Comment 7 Tim-Philipp Müller 2014-01-10 12:02:30 UTC
Thank you for the patch and the test file. I can confirm the issue with git master.
Comment 8 satish kumar 2014-01-10 15:07:03 UTC
Thanks for the confirmation and for the suggestion on uploading the file.
Comment 9 pavan goyal 2014-09-09 17:14:40 UTC
Hello Tim,
Is the patch OK to go ahead?
Comment 10 Sebastian Dröge (slomo) 2014-09-12 12:32:45 UTC
If I'm not mistaken, this is not very efficient: if we get hundreds of little buffers, we will combine them into a single big one, one by one, copying the data over and over again. And then we run typefind over that data over and over again, each time with a little more data at the end.

This doesn't seem like a great default behaviour.
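For illustration, here is a minimal sketch of the accumulate-and-rescan pattern described above (hypothetical helper names, not the actual typefind code): with k small buffers the copying alone does quadratic work, and every iteration re-scans the whole accumulation from the start.

#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one full typefind pass over the data. */
static void scan_for_type(const unsigned char *data, size_t size)
{
    (void) data;
    (void) size;   /* stub: real code would run all typefind functions */
}

static unsigned char *acc = NULL;
static size_t acc_len = 0;

/* Naive accumulation: every new buffer may copy everything seen so far
 * (realloc) and then triggers a complete re-scan from the beginning. */
static void on_new_buffer(const unsigned char *buf, size_t len)
{
    acc = realloc(acc, acc_len + len);   /* may copy all previous data */
    memcpy(acc + acc_len, buf, len);
    acc_len += len;

    scan_for_type(acc, acc_len);         /* re-scans from byte 0 */
}

int main(void)
{
    unsigned char chunk[1024] = { 0 };
    int i;

    for (i = 0; i < 100; i++)            /* 100 small buffers: 100 re-scans */
        on_new_buffer(chunk, sizeof chunk);
    free(acc);
    return 0;
}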
Comment 11 pavan goyal 2014-09-12 12:43:07 UTC
(In reply to comment #10)

Thanks, Sebastian. Do you have any suggestion on how to proceed, and on how this use case should be handled?
Comment 12 Rajesh 2014-11-28 10:04:01 UTC
(In reply to comment #10)

Since we know that the data size used by typefind is not sufficient to correctly detect the media type for some streams, I suppose it would be better to have a maximum data size property on typefind for detection. An application could then simply set that property for outlier streams that do not play with the default typefind data size, as sketched below.
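For example, application code might then look like this. The "max-size" property name is purely hypothetical, since typefind exposes no such property today; this only illustrates what the proposal could look like.

#include <gst/gst.h>

/* Hypothetical: assumes typefind gained a "max-size" property as proposed
 * above. No such property exists in GStreamer today. */
static void configure_typefind(GstElement *typefind)
{
    /* Keep collecting up to 512 KiB before settling on a media type,
     * for outlier streams that misdetect with the default sizes. */
    g_object_set(typefind, "max-size", (guint) (512 * 1024), NULL);
}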
Comment 13 satish kumar 2014-11-29 05:47:15 UTC
(In reply to comment #10)

Hi Sebastian,

With the current default implementation, we already iterate again and again with more data until we find a probability greater than the minimum probability. The minimum probability is not always sufficient to find the correct media type (as in the reported case).

In the patch, I have extended the same philosophy to look for the maximum probability over the given scanning range of data. When a 100% probability is found along the way, we simply break and declare the valid type found. In most cases, a 100% probability should be found within very few iterations.
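A minimal sketch of that strategy (hypothetical names, not the patch itself): remember the best result seen so far across growing scan windows, and stop early on a perfect match or once the maximum scan size is reached.

#include <stddef.h>

#define MAX_SIZE (128 * 1024)    /* upper bound on data scanned */

struct type_result { const char *media_type; int probability; };

/* Hypothetical stand-in for one typefind pass over the first `size` bytes;
 * the stub mirrors this report: little data looks like MP3, enough data
 * reveals video/mpeg-sys with certainty. */
static struct type_result scan_for_type(const unsigned char *data, size_t size)
{
    struct type_result r = { "audio/mpeg", 99 };
    (void) data;
    if (size >= 5261) {
        r.media_type = "video/mpeg-sys";
        r.probability = 100;
    }
    return r;
}

/* Keep the best result across growing windows; break early on 100%. */
static struct type_result find_best_type(const unsigned char *data, size_t size)
{
    struct type_result best = { NULL, 0 };
    size_t scanned = 0;

    while (scanned < size && scanned < MAX_SIZE) {
        struct type_result r;

        scanned += 2048;                 /* grow the scan window stepwise */
        if (scanned > size)
            scanned = size;

        r = scan_for_type(data, scanned);
        if (r.probability > best.probability)
            best = r;
        if (best.probability >= 100)
            break;                       /* perfect match, stop early */
    }
    return best;
}

int main(void)
{
    static unsigned char buf[8192];
    struct type_result r = find_best_type(buf, sizeof buf);
    return r.probability == 100 ? 0 : 1;
}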

BR/satish
Comment 14 GStreamer system administrator 2018-11-03 12:19:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/48.