GNOME Bugzilla – Bug 451565
Automatic detection of subtitle encoding
Last modified: 2018-11-03 11:13:28 UTC
If you play a movie with external subtitles in totem, you have to set its encoding by hand in the settings dialog. It would be nice if totem used enca by default to guess the encoding automatically and/or offered a more straightforward way to set it by hand (a submenu like the one the Opera web browser has).
enca should be used in the backends, not in Totem itself (although we'd need to add an "automatic" item to the drop-down menu). Which backend are you using?
I used gstreamer some months ago (probably in March) and now I use xine. Neither detects the encoding automatically, and neither depends on enca (in Ubuntu Feisty).
You should file a bug against xine-lib then (see http://www.gnome.org/projects/totem/#bugs) and I'll reassign this one against GStreamer (I believe it's a duplicate).
libenca only provides a partial solution to the overall problem, with support only for a very limited number of languages; it's currently not worth adding it as a dependency IMHO (even if the line-based parsers didn't need quite a bit of refactoring to make this work in a decent way).
Having an "auto-detect" option in GStreamer would still be useful.
*** Bug 647140 has been marked as a duplicate of this bug. ***
*** Bug 647141 has been marked as a duplicate of this bug. ***
*** Bug 615211 has been marked as a duplicate of this bug. ***
Agreed, we should probably investigate whether the encoding detection code in gedit or mozilla or from somewhere else can be used.
*** Bug 652455 has been marked as a duplicate of this bug. ***
For future reference: libguess was also mentioned today, and might be worth a look (even though it's not clear to me yet how we'd integrate it; I'd like to avoid dependencies for basic modules like this and libgsttag).
Wouldn't this work as a separate plugin that would get text/x-subtitle in, and spit out the same but in something guaranteed to be utf8? Ship a passthrough plugin with low priority that would use the usual way of determining the subtitle's encoding and convert that naively.
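A minimal sketch of the naive passthrough conversion suggested above, written in Python only for illustration. The function name `to_utf8` and the `fallback_encoding` parameter are hypothetical, and the ISO-8859-15 default is an assumption made for this example, not a statement about what any backend does. The idea is simply: pass valid UTF-8 through untouched, otherwise convert from a configured encoding.

```python
import codecs


def to_utf8(data: bytes, fallback_encoding: str = "iso-8859-15") -> bytes:
    """Return subtitle bytes that are guaranteed to be valid UTF-8.

    `fallback_encoding` is a hypothetical stand-in for whatever encoding
    the user or environment has configured; the default is an assumption.
    """
    # Drop a leading UTF-8 byte order mark, if there is one.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]

    # Passthrough case: the input already decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass

    # Naive conversion: trust the configured encoding and replace
    # anything it cannot map rather than failing on a bad byte.
    text = data.decode(fallback_encoding, errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = "Où est le café ?".encode("iso-8859-15")
    print(to_utf8(legacy).decode("utf-8"))
```

A smarter detector could later replace only the fallback step, which is what makes the separate low-priority element attractive.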
Hello,

mpv just added support for "uchardet", which is basically a C binding for the mozilla algorithm: https://github.com/BYVoid/uchardet

It works well and has much broader support than enca. Run `enca --list languages` to get the full list of languages ENCA supports. Basically, enca supports only Latin and Cyrillic languages, with Chinese as the only Asian language.

Also, after some testing and doc reading, I realized that ENCA does not work without a language hint for single-byte encodings, which means for nearly everything on its list. This is actually documented in its manual:

> The special language none can be shortened to __, it contains no 8bit encodings, so only multibyte encodings are detected.

And even with a language hint, it does not seem very reliable (I had several failures with encodings that are supposed to work, even when giving the language). That really does not make it very useful as a default.

> For future reference: libguess was also mentioned today

mpv also supports libguess, so I tested it. It apparently requires a language hint at all times (at least this is how they implemented it in mpv), and it was not able to detect a file in EUC-KR with the hint "korean". This is not a very broad test case, and mpv's implementation may also be broken, but it did not look very promising.

On the other hand, uchardet works very well. It is able to detect the encoding of my files in non-UTF-8 Korean or Japanese. It is now the default in mpv (see https://github.com/mpv-player/mpv/issues/908 and https://github.com/mpv-player/mpv/pull/2193).

Could it be supported in gstreamer too, and why not even become the default?
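To illustrate why single-byte ("monobyte") encodings are hard to detect without a language hint, here is a small sketch using only Python's standard codecs. It does not use enca, libguess or uchardet; the helper name `try_decode` and the sample strings are made up for this example. The point is that the same bytes decode without error under several single-byte encodings, so only a statistical or language-aware detector can choose between them, whereas a multibyte encoding such as UTF-8 rejects them on structure alone.

```python
from typing import Optional


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Russian sample text stored in a single-byte Cyrillic code page.
cyrillic = "привет".encode("cp1251")

# Every single-byte candidate "succeeds", but only one result is readable.
for enc in ("cp1251", "iso-8859-1", "koi8_r"):
    print(enc, "->", try_decode(cyrillic, enc))

# UTF-8 has structural rules, so the same bytes are rejected outright.
print("utf-8 ->", try_decode(cyrillic, "utf-8"))

# Korean text in EUC-KR is likewise rejected as UTF-8, which is why a
# "valid UTF-8?" check alone cannot tell which legacy encoding was used.
korean = "한국어".encode("euc_kr")
print("euc_kr bytes as utf-8 ->", try_decode(korean, "utf-8"))
```

Structural validation therefore only separates UTF-8 from everything else; picking among legacy encodings needs the kind of frequency analysis that the detectors discussed in this thread perform.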
Thanks for that, I'll check it out. I actually started porting the mozilla lib to C for this purpose some time ago, but then got side-tracked.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/9.