GNOME Bugzilla – Bug 653768
inline HTML subtitle parser plugin
Last modified: 2011-07-10 19:31:48 UTC
Hello, I have written a small subtitle parser plugin that converts inline HTML in text/plain subtitles to Pango markup. The subtitleoverlay element plugs it in automatically, so it works out of the box. If the htmlparse plugin makes it into GStreamer, commit 74e0c05ff7d2270494d in gst-plugins-good should be reverted. I have attached a patch against gst-plugins-bad git head. Thanks
Created attachment 191052: htmlparse patch against gst-plugins-bad git head
text/plain is not the appropriate type for HTML subtitles. Also, htmlparse is not the best name for this element. It does not parse HTML.
I'm using text/plain because there is no metadata in the input stream specifying that the subtitles contain tags. There is no spec for tags in plain text subtitles. We could parse the subtitles in the demuxers to detect a tag and change the MIME type, but I don't think it is necessary. Since there is no spec for this kind of subtitles and the tags are taken from the HTML spec, I'm not sure what other name could be used.
You can invent a new caps format, e.g. "subtitle/html". This should also go together with a typefinder in gst-plugins-base/gst/typefind, but since this seems to be plain HTML, it will be hard to distinguish from non-subtitle HTML... Where are subtitles like this used?
This element was useful with subtitles embedded in MKV files, but I just found out they are actually in the SubRip format; text/plain is only the output format of matroskademux. The SubRip format can have the following tags: <b>, <i>, <u>, <s> and <font color="#xxxxxx">. The subparse element handles <b>, <i>, <u> and <s>. The matroskademux element has handled <b>, <i>, <u>, <s> and <span> tags since 74e0c05ff7d2270494d. So my plugin is actually not needed; only font color support is missing in these elements. Sorry for my lack of research, I am closing the issue.
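For illustration, the tag mapping discussed in this thread can be sketched in a few lines of Python. This is not the code from attachment 191052, nor what subparse or matroskademux do internally. The function name subrip_to_pango is hypothetical, and rendering <font color="#xxxxxx"> as a Pango <span foreground="#xxxxxx"> is an assumption about how the missing font color support could look. The sketch passes <b>, <i>, <u> and <s> through unchanged, escapes everything else, and does not check that tags are balanced.

```python
import re
from xml.sax.saxutils import escape

# Tags that are already valid Pango markup and can pass through as-is.
_PASSTHROUGH = {"b", "i", "u", "s"}

# Matches something that looks like an opening or closing tag.
_TAG_RE = re.compile(r"<(/?)([a-zA-Z]+)([^<>]*)>")

# Extracts a #rrggbb value from the attributes of a <font> tag.
_COLOR_RE = re.compile(r"""color\s*=\s*["']?(#[0-9a-fA-F]{6})["']?""",
                       re.IGNORECASE)


def subrip_to_pango(text):
    """Convert SubRip inline tags to Pango markup (hypothetical helper)."""
    out = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        # Escape the plain text between the previous tag and this one.
        out.append(escape(text[pos:match.start()]))
        pos = match.end()

        closing, name, attrs = match.group(1), match.group(2).lower(), match.group(3)
        if name in _PASSTHROUGH and not attrs.strip():
            out.append(f"<{closing}{name}>")
        elif name == "font" and closing:
            out.append("</span>")
        elif name == "font":
            color = _COLOR_RE.search(attrs)
            if color:
                out.append(f'<span foreground="{color.group(1)}">')
            else:
                # Keep the span so the later </font> still has a partner.
                out.append("<span>")
        else:
            # Unknown tag: show it literally instead of passing it to Pango.
            out.append(escape(match.group(0)))

    # Escape whatever follows the last tag.
    out.append(escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(subrip_to_pango('<font color="#ff0000">Hi</font> <b>A & B</b>'))
```

Escaping the text between tags matters because a bare & or < in a subtitle line would otherwise make the whole string invalid markup.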