GNOME Bugzilla – Bug 757743
Speech recognition with GStreamer 1.0 PocketSphinx plugin
Last modified: 2016-05-28 15:50:08 UTC
Are there any plans to bring back speech recognition? Some useful links:

"Using PocketSphinx with GStreamer (1.0) and Python": http://cmusphinx.sourceforge.net/wiki/gstreamer
Correct link to the example in the article: https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/src/gst-plugin/livedemo.py

Another way to use GStreamer 1.0 is pgi (https://pypi.python.org/pypi/pgi/):

    import pgi
    pgi.install_as_gi()
    from gi.repository import Gst
I did recently notice that the Pocketsphinx plugin was updated to GStreamer 1.0.

http://sourceforge.net/p/cmusphinx/discussion/help/thread/6a286ad1/

So, maybe I'll bring it back in, I don't know.

First off, it seems obvious that the Pocketsphinx GStreamer plugin is not well maintained, and depending on something like that is liable to bring a lot of grief one way or another. The developers either don't know how to maintain a library or don't see the same value in GStreamer as the rest of us.

Secondly, when Gaupol did have speech recognition, it seemed to me that everyone who tried it had unrealistically high hopes, was utterly disappointed and gave up. I don't usually make subtitles from scratch myself, and I simply don't know if it's actually useful.
Hi Osmo,

Thank you for your response and for your work too. I am going to write in two parts: first technical, second general.

1. Technical

Yesterday I checked speech-to-text tools and found that SpeechRecognition (https://pypi.python.org/pypi/SpeechRecognition/) uses a SWIG wrapper around pocketsphinx-5prealpha (https://pypi.python.org/pypi/pocketsphinx/), so GStreamer 1.0 can now be omitted. Just a theoretical idea: maybe it's possible to use SpeechRecognition as a high-level API, so you would have no need to deal with the relatively low-level pocketsphinx-python (SWIG). In the worst case, SpeechRecognition can serve as a source of examples.

2. General

From an evolutionary standpoint, it's better to have a bad tool than not to have one at all. For me personally, even the old Gaupol (0.19) with outdated underlying libs (dicts, language models etc.) is a useful tool, because:
a) it can create automatic timings, which reduces boring manual work;
b) for rare videos (when even YouTube can't generate auto-subs), Gaupol can correctly recognize at least part of the speech. I use these words for language study: I analyze the text against frequency lists and learn the words I don't know, and after that I watch the video. This helps me understand more, even without directly using subs on screen.

But I think different people can find their own ways to benefit from it. Gaupol looks like the only option in the open source landscape. Automatic subtitle creation should work better with the new libs (models...), and I hope it will now be simpler to implement and maintain than it was previously.

Kind regards,
Alex Panchuk
(In reply to yasondinalt from comment #2)
> 1. https://pypi.python.org/pypi/SpeechRecognition/

Do you have any experience with the other engines SpeechRecognition uses? Are those web APIs usable in a desktop app (allowed, convenient, fast)?

> Just theoretic idea, maybe it's possible to use SpeechRecognition as
> high-level API, so you will have no need to deal with relatively low-level
> pocketsphinx-python (SWIG). In worst case SpeechRecognition can serve as
> source of examples.

The obvious benefit of GStreamer is that it will extract and decode audio from video files. If some other solution can only handle WAV files, then it's no good. And even if it does decode, it's a problem if Gaupol effectively ships with two different decoders: people will wonder why a video works in the integrated video player but not in speech recognition. This is why, even if it is poorly maintained, I'd prefer to use GStreamer.

> 2. For me personally even old Gaupol (0.19) with outdated underlying libs
> (dicts, lang models etc.) is useful tool.

Thanks, this is good to know. I plan to resume Gaupol development in March and I'll probably restore the existing speech recognition, at least if it seems to just work without much effort.
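To illustrate the point above about GStreamer extracting and decoding audio from video files, here is a minimal sketch, not Gaupol's actual code. It assumes PyGObject and the Pocketsphinx GStreamer 1.0 plugin are installed, and it reads the `hypothesis` and `final` message fields the same way the linked livedemo.py does. The function names and the element name `asr` are hypothetical, chosen only for this example.

```python
from pathlib import Path


def build_pipeline_description(media_path):
    """Return a gst-launch style description for one media file.

    uridecodebin picks the demuxer and decoder itself, so the same
    description should work for any file the video player can open.
    """
    uri = Path(media_path).absolute().as_uri()
    return (
        f'uridecodebin uri="{uri}" ! audioconvert ! audioresample '
        "! pocketsphinx name=asr ! fakesink"
    )


def recognize(media_path):
    """Run the pipeline to the end and collect final hypotheses."""
    # Imported lazily so the description builder works without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(build_pipeline_description(media_path))
    bus = pipeline.get_bus()
    pipeline.set_state(Gst.State.PLAYING)
    wanted = (Gst.MessageType.EOS | Gst.MessageType.ERROR
              | Gst.MessageType.ELEMENT)
    hypotheses = []
    while True:
        msg = bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, wanted)
        if msg.type != Gst.MessageType.ELEMENT:
            break  # end of stream or error: stop listening
        struct = msg.get_structure()
        # Assumption: the plugin posts "pocketsphinx" element messages
        # with "hypothesis" and "final" fields, as used in livedemo.py.
        if struct is not None and struct.get_name() == "pocketsphinx":
            if struct.get_value("final"):
                hypotheses.append(struct.get_value("hypothesis"))
    pipeline.set_state(Gst.State.NULL)
    return hypotheses
```

Because decoding goes through the same GStreamer stack as playback, a file that plays in the integrated video player would be expected to decode here as well, which is the consistency argument made above.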
>Do you have any experience with the other engines SpeechRecognition uses?
>Are those web APIs usable in a desktop app (allowed, convenient, fast)?

I am a complete amateur in this area and I only read about the SpeechRecognition module two days ago, so I may do more harm than good. I just thought that SpeechRecognition has a general code part and a unified, back-end-related "API" (it was a bad idea to use this word earlier), which could perhaps be used directly from other programs. Don't pay much attention to it.

Thanks for the clarification about GStreamer. I recall now that https://github.com/maxrd2/subtitlecomposer uses it to work with media too.
Moving bugs to GitHub. https://github.com/otsaloma/gaupol/issues/17