GNOME Bugzilla – Bug 555647
subparse doesn't correctly handle 8859-15 encoded .srt-Files
Last modified: 2009-04-20 07:08:59 UTC
Please describe the problem:
When parsing subtitles from an .srt file that has Latin-1 or ISO-8859-15 encoding (German umlaut characters), only lines that contain no non-ASCII characters are sent down the pipeline when no character encoding is specified. When setting the subtitle-encoding property or the GST_SUBTITLE_ENCODING environment variable to "ISO-8859-15", the pipeline stops after the first occurrence of such a character. The same subtitle file works as expected after recoding it to UTF-8.

Steps to reproduce:
1. gst-launch-0.10 filesrc location=subs.srt ! subparse ! fakesink dump=true
2. gst-launch-0.10 filesrc location=subs.srt ! subparse subtitle-encoding="ISO-8859-1" ! fakesink dump=true
3. export GST_SUBTITLE_ENCODING="ISO-8859-15"
4. gst-launch-0.10 filesrc location=subs.srt ! subparse ! fakesink dump=true

Actual results:
1: skips lines containing non-ASCII characters
2: pipeline stops after the second string
4: same as above

Expected results:
Emit correctly encoded UTF-8 strings when specifying the correct input encoding.

Does this happen every time?
Yes

Other information:
Log and .srt file to be uploaded.
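A minimal sketch of why the non-ASCII lines in case 1 can be rejected when no encoding is given, assuming subparse only forwards text that validates as UTF-8 (the "invalid UTF-8!" debug message later in this report suggests a check of this kind, but this is not subparse's actual code). The sample line and the helper is_valid_utf8 are hypothetical, and Python's built-in codecs stand in for the conversion that GStreamer performs in C.

```python
# Hypothetical sample line ("Grüße aus München"); not taken from the attached file.
line = "Gr\u00fc\u00dfe aus M\u00fcnchen"

# The same text as it would be stored in an ISO-8859-15 encoded .srt file.
latin9_bytes = line.encode("iso-8859-15")

# A pure-ASCII line is byte-identical in ISO-8859-15 and UTF-8.
ascii_only = "Alan, Shirley!".encode("iso-8859-15")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Recoding: decode with the real source encoding, then encode as UTF-8.
utf8_bytes = latin9_bytes.decode("iso-8859-15").encode("utf-8")

print(is_valid_utf8(ascii_only))    # ASCII-only line passes the UTF-8 check
print(is_valid_utf8(latin9_bytes))  # umlaut bytes are not valid UTF-8
print(is_valid_utf8(utf8_bytes))    # recoded line passes the UTF-8 check
```

This matches the reported behaviour in outline: ASCII-only lines pass through unchanged, lines with umlauts fail validation until the input encoding is known and a working converter is available.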
Created attachment 120255: here's my console log
Created attachment 120256: subtitle file, 8859-15 encoding with german characters
This works fine for me with CVS - what version of -base are you using?

Also, this:

  $ GST_SUBTITLE_ENCODING='ISO-8859-15' gst-launch-0.10 filesrc location=555647-iso-8859-1.srt ! subparse ! t.text_sink videotestsrc pattern=black ! textoverlay name=t ! ffmpegcolorspace ! ximagesink

renders the text just fine, including the umlauts.
Using 0.10.20, as stated. I work on a 32-bit mipsel architecture:

  Linux dm8000 2.6.12-5.1-brcmstb-dm8000 #1 Fri Sep 12 17:48:45 CEST 2008 7400b0
root@dm8000:/media/hdd/movie/avi# GST_DEBUG=subparse:4 gst-launch-0.10 filesrc location=subs.srt ! subparse subtitle-encoding="ISO-8859-15" ! fakesink dump=true sync=true
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
0:00:00.346522000 1465 0x10024890 DEBUG subparse gstsubparse.c:1155:gst_sub_parse_sink_event: Handling newsegment event
0:00:00.348708000 1465 0x10024890 DEBUG subparse gstsubparse.c:1187:gst_sub_parse_sink_event:<subparse0> newsegment (bytes)
0:00:00.351016000 1465 0x10024890 INFO subparse gstsubparse.c:1026:feed_textbuf: discontinuity
0:00:00.352379000 1465 0x10024890 DEBUG subparse gstsubparse.c:863:parser_state_init: initialising parser
0:00:00.359845000 1465 0x10024890 DEBUG subparse gstsubparse.c:863:parser_state_init: initialising parser
0:00:00.363185000 1465 0x10024890 DEBUG subparse gstsubparse.c:1103:handle_buffer:<subparse0> Sending text 'Alan, Shirley!', 0:00:00.780000000 + 0:00:01.180000000
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
00000000 (0x100599f8): 41 6c 61 6e 2c 20 53 68 69 72 6c 65 79 21 Alan, Shirley!
0:00:01.156569000 1465 0x10024890 DEBUG subparse gstsubparse.c:1103:handle_buffer:<subparse0> Sending text 'Lorraine!', 0:00:01.960000000 + 0:00:00.640000000
00000000 (0x10051558): 4c 6f 72 72 61 69 6e 65 21 Lorraine!
0:00:02.335854000 1465 0x10024890 INFO subparse gstsubparse.c:316:convert_encoding:<subparse0> invalid UTF-8!
0:00:02.337064000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.339545000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.341918000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.345400000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.347515000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.349495000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.351497000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.353462000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.355539000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.357780000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.360701000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.364168000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.366536000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.368923000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.370830000 1465 0x10024890 DEBUG subparse gstsubparse.c:1155:gst_sub_parse_sink_event: Handling eos event
0:00:02.372299000 1465 0x10024890 DEBUG subparse gstsubparse.c:1166:gst_sub_parse_sink_event: EOS. Pushing remaining text (if any)
0:00:02.373546000 1465 0x10024890 WARN subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
Got EOS from element "pipeline0".
Execution ended after 2595366000 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
FREEING pipeline ...
Ok, so it's most likely a system installation / packaging / setup issue. subparse could post a warning message on the bus in this case, apart from that I'm not sure what else it's supposed to do.
Works as expected after installing glib-gconv and glibc-gconv-iso8859.

I am requesting that the subparse element emit a warning or error message in case of conversion failure.
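As a rough illustration of the requested behaviour, here is a sketch of a converter that reports a failure to the caller instead of silently dropping text. It is written in Python as an analogy only and is not GStreamer code: the function name to_utf8 is hypothetical, and Python's warnings module stands in for a warning message posted on the bus. It distinguishes the case seen in this report (the conversion itself is not available) from input that cannot be decoded.

```python
import codecs
import warnings


def to_utf8(raw: bytes, source_encoding: str):
    """Convert raw bytes to UTF-8; warn and return None on failure.

    Hypothetical helper for illustration only.
    """
    try:
        # Fails if no converter for this character set is installed,
        # comparable to the missing gconv modules in this report.
        codecs.lookup(source_encoding)
    except LookupError:
        warnings.warn(
            f"conversion from {source_encoding!r} to UTF-8 is not supported"
        )
        return None
    try:
        return raw.decode(source_encoding).encode("utf-8")
    except UnicodeDecodeError as err:
        # The converter exists, but the input is not valid in that encoding.
        warnings.warn(f"could not convert from {source_encoding!r}: {err}")
        return None


# Usage: a supported conversion returns bytes, an unsupported one warns.
print(to_utf8(b"Gr\xfc\xdfe", "iso-8859-15"))
print(to_utf8(b"abc", "no-such-charset"))
```

Reporting the two failure modes separately would let an application tell a packaging problem apart from a wrongly specified subtitle encoding.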
Hey guys :) This bug was set to NEEDINFO in comment #3. The question has already been answered. As this issue turned out to be NOTABUG, as per comment #7, I am closing this one. Andreas, would you please open a new bug for your request? TIA!
aye aye: http://bugzilla.gnome.org/show_bug.cgi?id=579576