Bug 451565 - Automatic detection of subtitle encoding
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: git master
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Duplicates: 615211 647140 647141 652455
Depends on:
Blocks:
Reported: 2007-06-27 11:03 UTC by David Jaša
Modified: 2018-11-03 11:13 UTC
See Also:
GNOME target: ---
GNOME version: 2.17/2.18



Description David Jaša 2007-06-27 11:03:27 UTC
If you play a movie with external subtitles in totem, you have to set its encoding by hand in the settings dialog. It would be nice if totem used enca by default to guess the encoding automatically and/or offered some more straightforward way to set it by hand (a sort of submenu like the Opera web browser has).
Comment 1 Bastien Nocera 2007-06-27 13:05:18 UTC
enca should be used in the backends, not in Totem itself (although we'd need to add an "automatic" item to the drop-down menu). Which backend are you using?
Comment 2 David Jaša 2007-06-27 13:30:18 UTC
I used gstreamer some months ago (probably in March) and now I use xine. Neither detects the encoding automatically and neither depends on enca (in Ubuntu Feisty).
Comment 3 Bastien Nocera 2007-06-27 13:37:05 UTC
You should file a bug against xine-lib then (see http://www.gnome.org/projects/totem/#bugs) and I'll reassign this one against GStreamer (I believe it's a duplicate).
Comment 4 Tim-Philipp Müller 2007-06-27 14:31:53 UTC
libenca only provides a partial solution to the overall problem, with support only for a very limited number of languages; it's currently not worth adding it as a dependency IMHO (even if the line-based parsers didn't need quite a bit of refactoring to make this work in a decent way).
Comment 5 Bastien Nocera 2011-04-06 01:48:39 UTC
Having an "auto-detect" option in GStreamer would still be useful.
Comment 6 Sebastian Dröge (slomo) 2011-05-18 20:20:43 UTC
*** Bug 647140 has been marked as a duplicate of this bug. ***
Comment 7 Sebastian Dröge (slomo) 2011-05-18 20:21:00 UTC
*** Bug 647141 has been marked as a duplicate of this bug. ***
Comment 8 Sebastian Dröge (slomo) 2011-05-18 20:21:05 UTC
*** Bug 615211 has been marked as a duplicate of this bug. ***
Comment 9 Sebastian Dröge (slomo) 2011-05-18 20:22:11 UTC
Agreed, we should probably investigate whether the encoding detection code in gedit or mozilla or from somewhere else can be used.
Comment 10 Jonathan Matthew 2011-06-13 17:17:02 UTC
*** Bug 652455 has been marked as a duplicate of this bug. ***
Comment 11 Tim-Philipp Müller 2012-10-04 22:30:59 UTC
For future reference: libguess was also mentioned today, and might be worth a look (even though it's not clear to me yet how we'd integrate it; I'd like to avoid dependencies for basic modules like this and libgsttag).
Comment 12 Bastien Nocera 2012-10-05 12:20:04 UTC
Wouldn't this work as a separate plugin that would get text/x-subtitle in, and spit out the same but in something guaranteed to be utf8? Ship a passthrough plugin with low priority that would use the usual way of determining the subtitle's encoding and convert that naively.
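
A minimal sketch of what the "naive" low-priority conversion described above could look like, written in Python for brevity rather than as GStreamer element code. The function name `subtitle_to_utf8` and the `iso-8859-15` default are assumptions for illustration; the comment does not spell out what "the usual way" is, so the fallback here simply stands in for whatever encoding the user or locale supplies.

```python
import codecs


def subtitle_to_utf8(data: bytes, fallback: str = "iso-8859-15") -> str:
    """Decode subtitle bytes to text, guessing as little as possible.

    `fallback` is a hypothetical stand-in for the manually configured
    or locale-derived encoding.
    """
    # A UTF-8 byte order mark is an unambiguous signal; strip it.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        # Strict decoding doubles as a validity check for UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: convert naively from the fallback encoding,
        # replacing any bytes that encoding cannot map.
        return data.decode(fallback, errors="replace")
```

A higher-priority element doing real auto-detection would only need to replace the last step; the input and output contract (arbitrary bytes in, guaranteed UTF-8 text out) stays the same.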
Comment 13 Jehan 2015-08-04 16:37:55 UTC
Hello,

mpv just added support for "uchardet", which is basically the C binding for the mozilla algorithm: https://github.com/BYVoid/uchardet

It works well and has much broader support than enca. Run `enca --list languages` to get the full list of languages ENCA supports. Basically, enca only supports Latin- and Cyrillic-script languages, with Chinese as the only Asian language.
Also, after some testing and reading the documentation, I realized that ENCA does not work without a language hint for single-byte encodings, which means for nearly all of its list. This is actually documented in its manual:

> The special language none can be shortened to __, it contains no 8bit encodings, so only multibyte encodings are detected.

And actually even with a language hint, it does not seem the most efficient (I had several failures with encoding supposed to work, even when giving the language). That really does not make it very useful as a default.
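
To illustrate why single-byte encodings are hard to detect without a hint, here is a small Python sketch (the function name `decode_all` and the candidate list are chosen purely for illustration). The same bytes decode without error under every single-byte candidate, so only knowledge about the language can tell the readings apart, whereas a multibyte encoding such as UTF-8 can be rejected on structure alone.

```python
def decode_all(raw: bytes, candidates: list) -> dict:
    """Decode the same bytes under each candidate encoding."""
    return {enc: raw.decode(enc) for enc in candidates}


# Russian text stored in a single-byte Cyrillic encoding.
raw = "Привет".encode("cp1251")

# Every candidate accepts the bytes; each yields a different string.
candidates = ["cp1251", "koi8-r", "iso-8859-5", "cp1250", "iso-8859-1"]
readings = decode_all(raw, candidates)

# UTF-8, by contrast, has structural rules that these bytes violate.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```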

> For future reference: libguess was also mentioned today

mpv also has support for libguess, so I tested it. It apparently requires a language hint at all times (at least this is how they implemented it in mpv) and was not able to detect a file in EUC-KR with the hint "korean". This is not a very broad test case, and mpv's implementation may also be broken, but it did not look very promising.

On the other hand, uchardet works very well. It is able to detect the encoding of my files in non-UTF-8 Korean or Japanese.
It is now the default in mpv. (see: https://github.com/mpv-player/mpv/issues/908 and https://github.com/mpv-player/mpv/pull/2193).
Could it be supported, and why not even become the default in gstreamer too?
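
For anyone wanting to repeat this kind of test outside mpv, a rough Python sketch follows. It assumes the `uchardet` command-line tool is installed and on PATH, that it reads from standard input when no file is given, and that it prints the detected charset name on one line; the helper name `detect_with_uchardet` and the handling of an "unknown" result are assumptions of this sketch. It is not how mpv integrates uchardet (mpv links against the library).

```python
import shutil
import subprocess
from typing import Optional


def detect_with_uchardet(data: bytes) -> Optional[str]:
    """Return the charset name reported by the uchardet CLI, or None."""
    exe = shutil.which("uchardet")
    if exe is None:
        # Tool not installed: report "no answer" instead of failing.
        return None
    try:
        result = subprocess.run(
            [exe], input=data, capture_output=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    name = result.stdout.decode("ascii", errors="replace").strip()
    # Treat errors, empty output, and an "unknown" verdict as no answer.
    if result.returncode != 0 or not name or "unknown" in name.lower():
        return None
    return name


# Example input: Korean text in a non-UTF-8 encoding, as in the test above.
guess = detect_with_uchardet("안녕하세요, 자막 테스트입니다.".encode("euc-kr"))
```

Short samples give the detector little to work with, so a real test should feed it a whole subtitle file.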
Comment 14 Tim-Philipp Müller 2015-08-04 16:49:27 UTC
Thanks for that, I'll check it out. I actually started porting the mozilla lib to C for this purpose some time ago, but then got side-tracked.
Comment 15 GStreamer system administrator 2018-11-03 11:13:28 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/9.