Bug 555647 - subparse doesn't correctly handle 8859-15 encoded .srt-Files
Status: RESOLVED NOTABUG
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: 0.10.20
Hardware: Other All
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2008-10-09 09:56 UTC by Andreas Frisch
Modified: 2009-04-20 07:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
here's my console log (33.09 KB, text/plain)
2008-10-09 09:57 UTC, Andreas Frisch
subtitle file, 8859-15 encoding with german characters (55.49 KB, text/plain)
2008-10-09 09:59 UTC, Andreas Frisch

Description Andreas Frisch 2008-10-09 09:56:43 UTC
Please describe the problem:
When parsing subtitles from an .srt file encoded in Latin-1 or ISO-8859-15 (containing German umlaut characters) with no character encoding specified, only lines without non-ASCII characters are sent down the pipeline. When the subtitle-encoding property or the GST_SUBTITLE_ENCODING environment variable is set to "ISO-8859-15", the pipeline stops after the first occurrence of a non-ASCII character.
The same subtitle file works as expected after recoding it to UTF-8.
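
The behaviour without an encoding set is consistent with the ISO-8859-15 bytes for umlauts not being valid UTF-8 on their own, while pure ASCII lines are. A minimal Python sketch of that check and of the recoding workaround follows. The helper names `is_valid_utf8` and `recode_to_utf8` are hypothetical, and this only illustrates the idea; it is not how subparse is implemented.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def recode_to_utf8(raw: bytes, source_encoding: str = "iso-8859-15") -> bytes:
    """Decode from the (assumed) source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# A subtitle line with umlauts, encoded the way the attached file is assumed to be.
umlaut_line = "Grüße".encode("iso-8859-15")
ascii_line = b"Lorraine!"

print(is_valid_utf8(ascii_line))                    # ASCII is a subset of UTF-8
print(is_valid_utf8(umlaut_line))                   # single bytes >= 0x80 are not valid UTF-8 here
print(is_valid_utf8(recode_to_utf8(umlaut_line)))   # valid once recoded
```

Recoding the whole file once up front, as described above, avoids relying on the conversion support available on the target system.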

Steps to reproduce:
1. gst-launch-0.10 filesrc location=subs.srt ! subparse ! fakesink dump=true
2. gst-launch-0.10 filesrc location=subs.srt ! subparse subtitle-encoding="ISO-8859-1" ! fakesink dump=true
3. export GST_SUBTITLE_ENCODING="ISO-8859-15"
4. gst-launch-0.10 filesrc location=subs.srt ! subparse ! fakesink dump=true


Actual results:
1: skips lines containing non-ASCII characters
2: pipeline stops after the second string
4: same as 2


Expected results:
Correctly encoded UTF-8 strings are emitted when the correct input encoding is specified.

Does this happen every time?
yes

Other information:
log and srt-file to be uploaded
Comment 1 Andreas Frisch 2008-10-09 09:57:59 UTC
Created attachment 120255
here's my console log
Comment 2 Andreas Frisch 2008-10-09 09:59:34 UTC
Created attachment 120256
subtitle file, 8859-15 encoding with german characters
Comment 3 Tim-Philipp Müller 2008-10-09 10:12:00 UTC
This works fine for me with CVS - what version of -base are you using?

Also this:

$ GST_SUBTITLE_ENCODING='ISO-8859-15' gst-launch-0.10 filesrc location=555647-iso-8859-1.srt ! subparse ! t.text_sink  videotestsrc pattern=black ! textoverlay name=t ! ffmpegcolorspace ! ximagesink

renders the text just fine, including the Umlauts.
Comment 4 Andreas Frisch 2008-10-09 10:19:27 UTC
Using 0.10.20, as stated.
I am working on a 32-bit mipsel architecture:
Linux dm8000 2.6.12-5.1-brcmstb-dm8000 #1 Fri Sep 12 17:48:45 CEST 2008 7400b0
Comment 5 Andreas Frisch 2008-10-09 10:50:52 UTC
root@dm8000:/media/hdd/movie/avi# GST_DEBUG=subparse:4 gst-launch-0.10 filesrc location=subs.srt ! subparse subtitle-encoding="ISO-8859-15" ! fakesink dump=true sync=true
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...    
0:00:00.346522000  1465 0x10024890 DEBUG             subparse gstsubparse.c:1155:gst_sub_parse_sink_event: Handling newsegment event
0:00:00.348708000  1465 0x10024890 DEBUG             subparse gstsubparse.c:1187:gst_sub_parse_sink_event:<subparse0> newsegment (bytes)
0:00:00.351016000  1465 0x10024890 INFO              subparse gstsubparse.c:1026:feed_textbuf: discontinuity                            
0:00:00.352379000  1465 0x10024890 DEBUG             subparse gstsubparse.c:863:parser_state_init: initialising parser                  
0:00:00.359845000  1465 0x10024890 DEBUG             subparse gstsubparse.c:863:parser_state_init: initialising parser                  
0:00:00.363185000  1465 0x10024890 DEBUG             subparse gstsubparse.c:1103:handle_buffer:<subparse0> Sending text 'Alan, Shirley!', 0:00:00.780000000 + 0:00:01.180000000                                                                                                                 
Pipeline is PREROLLED ...                                                                                                                       
Setting pipeline to PLAYING ...                                                                                                                 
New clock: GstSystemClock                                                                                                                       
00000000 (0x100599f8): 41 6c 61 6e 2c 20 53 68 69 72 6c 65 79 21        Alan, Shirley!                                                          
0:00:01.156569000  1465 0x10024890 DEBUG             subparse gstsubparse.c:1103:handle_buffer:<subparse0> Sending text 'Lorraine!', 0:00:01.960000000 + 0:00:00.640000000                                                                                                                      
00000000 (0x10051558): 4c 6f 72 72 61 69 6e 65 21                       Lorraine!                                                               
0:00:02.335854000  1465 0x10024890 INFO              subparse gstsubparse.c:316:convert_encoding:<subparse0> invalid UTF-8!                     
0:00:02.337064000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported                                                      
0:00:02.339545000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported                                                      
0:00:02.341918000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported                                                      
0:00:02.345400000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported                                                      
0:00:02.347515000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.349495000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.351497000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.353462000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.355539000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.357780000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.360701000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.364168000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.366536000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.368923000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
0:00:02.370830000  1465 0x10024890 DEBUG             subparse gstsubparse.c:1155:gst_sub_parse_sink_event: Handling eos event
0:00:02.372299000  1465 0x10024890 DEBUG             subparse gstsubparse.c:1166:gst_sub_parse_sink_event: EOS. Pushing remaining text (if any)
0:00:02.373546000  1465 0x10024890 WARN              subparse gstsubparse.c:337:convert_encoding:<subparse0> could not convert string from 'ISO-8859-15' to UTF-8: Conversion from character set 'ISO-8859-15' to 'UTF-8' is not supported
Got EOS from element "pipeline0".
Execution ended after 2595366000 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
FREEING pipeline ...
Comment 6 Tim-Philipp Müller 2008-10-09 11:15:25 UTC
Ok, so it's most likely a system installation / packaging / setup issue.

subparse could post a warning message on the bus in this case; apart from that, I'm not sure what else it's supposed to do.


Comment 7 Andreas Frisch 2008-10-09 12:10:03 UTC
Works as expected after installing glib-gconv and glibc-gconv-iso8859.
I would like to request that the subparse element emit a warning or error
message when conversion fails.
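
To make the request concrete, here is a rough Python sketch of a conversion step that hands the failure reason back to the caller instead of only writing it to the debug log. The function name `convert_line` and the message wording are hypothetical; this is not GStreamer API, only an illustration of separating "charset not supported on this system" from "bytes not valid in that charset".

```python
from typing import Optional, Tuple


def convert_line(raw: bytes, encoding: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, warning); exactly one of the two is None."""
    try:
        return raw.decode(encoding), None
    except LookupError:
        # The requested charset is unknown to this runtime; this is the
        # analogue of the missing gconv modules described in this bug.
        return None, f"conversion from '{encoding}' to UTF-8 is not supported"
    except UnicodeDecodeError as err:
        # The charset exists, but the input bytes are not valid in it.
        return None, f"could not convert string from '{encoding}' to UTF-8: {err.reason}"


text, warning = convert_line(b"Gr\xfc\xdfe", "iso-8859-15")
if warning is not None:
    # An application could surface this to the user instead of dropping the line.
    print("WARNING:", warning)
else:
    print(text)
```

With a distinct message for the unsupported-charset case, a missing system package would be visible to the application right away.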
Comment 8 Tobias Mueller 2009-04-19 23:12:11 UTC
Hey guys :)

This bug was set to NEEDINFO in comment #3, and the question has since been answered. As the issue turned out to be NOTABUG as per comment #7, I am closing this one.

Andreas, would you please open a new bug for your request? TIA!
Comment 9 Andreas Frisch 2009-04-20 07:08:59 UTC
aye aye: http://bugzilla.gnome.org/show_bug.cgi?id=579576