GNOME Bugzilla – Bug 552237
UTF-16 srt confuses gstreamer, misdetected as mp3
Last modified: 2008-10-13 09:13:47 UTC
To reproduce:
- Take a video file, put the attached srt file in the same directory and rename it to match the video (e.g. foo.avi and foo.srt).
- Launch Totem, go to Preferences and check that "Automatically load subtitle files when movie is loaded" is enabled.
- Try to open the video.
- Totem stops responding.

I had the same problem with Elisa, so I guess it's a GStreamer bug. The srt file is probably broken, but GStreamer should handle it more gracefully. VLC was able to read it mostly fine (some subtitles were displayed incorrectly).
Created attachment 118699: bugged srt file
The file seems to be encoded in UTF-16 (LE) with a BOM at the start. There are three problems:
- subparse doesn't yet handle (or even recognise) files that are not UTF-8/ASCII. I think there's a bug for that somewhere.
- typefinding thinks it could be an mp3 because of the BOM.
- mp3parse doesn't error out when it gets EOS without having seen any frames. I guess it should post a STREAM WRONG_TYPE error or something like that in this case.
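To illustrate the second problem: the UTF-16LE BOM is the byte pair FF FE, which happens to satisfy the MPEG audio frame sync (0xFF followed by a byte with its top three bits set) and decodes as an MPEG-1 Layer I header with CRC. The sketch below is a simplified header check, not GStreamer's actual mp3 typefinder; the function name is hypothetical, and the example assumes the subtitle file starts with the subtitle number `1` (bytes 31 00 in UTF-16LE) right after the BOM.

```python
def looks_like_mpeg_audio_header(data: bytes) -> bool:
    """Return True if the first 4 bytes pass basic MPEG audio header checks.

    Simplified sketch (hypothetical helper): checks the 11-bit frame sync and
    rejects reserved version, layer, bitrate, sample-rate and emphasis values.
    """
    if len(data) < 4:
        return False
    b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
    # 11-bit frame sync: 0xFF, then the top three bits of the next byte set.
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return False
    version = (b1 >> 3) & 0x3           # 0b01 is reserved
    layer = (b1 >> 1) & 0x3             # 0b00 is reserved
    bitrate_index = (b2 >> 4) & 0xF     # 0b1111 is invalid
    samplerate_index = (b2 >> 2) & 0x3  # 0b11 is reserved
    emphasis = b3 & 0x3                 # 0b10 is reserved
    return (version != 0b01 and layer != 0b00
            and bitrate_index != 0xF and samplerate_index != 0b11
            and emphasis != 0b10)


# UTF-16LE BOM followed by "1": passes as an MPEG-1 Layer I header.
print(looks_like_mpeg_audio_header(b"\xff\xfe\x31\x00"))
```

Checking a single header like this is weak evidence on its own; a real typefinder also looks for further valid headers at the computed frame offsets.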
The first part is now fixed in CVS; I'll take care of the other ones now ;)

2008-10-13  Sebastian Dröge  <sebastian.droege@collabora.co.uk>

        * gst/subparse/Makefile.am:
        * gst/subparse/gstsubparse.c: (gst_sub_parse_dispose),
        (gst_sub_parse_class_init), (gst_sub_parse_init),
        (gst_convert_to_utf8), (detect_encoding), (convert_encoding),
        (get_next_line), (gst_sub_parse_data_format_autodetect),
        (feed_textbuf), (handle_buffer), (gst_sub_parse_change_state),
        (gst_subparse_type_find):
        * gst/subparse/gstsubparse.h:
        Add support for UTF16/UTF32 subtitles as long as the first bytes
        of the first buffer contain the BOM. This also adds support for
        other encodings that allow NUL bytes via the encoding property.
        Fixes bugs #552237 and #456788.
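The ChangeLog describes detecting UTF-16/UTF-32 from a BOM at the start of the first buffer and converting the text to UTF-8. A minimal sketch of that idea using the BOM constants from Python's standard `codecs` module follows; `detect_bom` and `to_utf8` are hypothetical names, not the actual `detect_encoding`/`convert_encoding` functions in gstsubparse.c, and the `fallback` parameter only loosely stands in for the element's encoding property.

```python
import codecs

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the longer BOMs must be checked first.
# (A UTF-16LE file whose first character is U+0000 would be misread.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def to_utf8(data: bytes, fallback: str = "utf-8") -> bytes:
    """Strip any BOM and re-encode the text as UTF-8 (hypothetical helper)."""
    encoding, skip = detect_bom(data)
    text = data[skip:].decode(encoding or fallback)
    return text.encode("utf-8")


sample = codecs.BOM_UTF16_LE + "1\r\n".encode("utf-16-le")
print(detect_bom(sample), to_utf8(sample))
```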
2008-10-13  Sebastian Dröge  <sebastian.droege@collabora.co.uk>

        * gst/mpegaudioparse/gstmpegaudioparse.c: (gst_mp3parse_sink_event):
        Post a GST_ELEMENT_ERROR if we get EOS before seeing any valid
        frames. Partially fixes bug #552237.

IMHO the typefinding is a non-issue now, and I don't see how to fix it properly. The mp3 typefinder found at least two consecutive valid frame headers in this file ;)
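The mpegaudioparse change comes down to remembering whether any valid frame has been parsed and treating EOS with zero frames as an error. Below is a hypothetical, GStreamer-free sketch of that bookkeeping; the class and method names are illustrative only, and in the real element the error is posted with GST_ELEMENT_ERROR from gst_mp3parse_sink_event.

```python
class Mp3ParseSketch:
    """Hypothetical stand-in for the EOS bookkeeping in mpegaudioparse."""

    def __init__(self) -> None:
        self.frames_seen = 0

    def handle_frame(self) -> None:
        # Called once per valid frame the parser accepts.
        self.frames_seen += 1

    def handle_eos(self) -> str:
        # EOS before any valid frame means the input was probably not MPEG
        # audio at all, so report an error instead of a normal end of stream.
        if self.frames_seen == 0:
            return "error: no valid frames before EOS"
        return "eos"


parser = Mp3ParseSketch()
print(parser.handle_eos())
```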