Bug 757743 – Speech recognition with GStreamer 1.0 PocketSphinx plugin

Summary:	Speech recognition with GStreamer 1.0 PocketSphinx plugin


Status:	RESOLVED OBSOLETE

Product:	gaupol
Classification:	Other
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gaupol-maint@gnome.bugs
QA Contact:	gaupol-maint@gnome.bugs

Reported:	2015-11-07 17:47 UTC by yasondinalt
Modified:	2016-05-28 15:50 UTC

Description yasondinalt 2015-11-07 17:47:39 UTC

Are there any plans to bring back speech recognition?

Some useful links.

"Using PocketSphinx with GStreamer (1.0) and Python":
http://cmusphinx.sourceforge.net/wiki/gstreamer

Correct link to the example in the article:
https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/src/gst-plugin/livedemo.py
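A minimal sketch of the pattern livedemo.py follows, assuming PyGObject, GStreamer 1.0 and the pocketsphinx GStreamer element are installed: the element sits in a capture pipeline and posts element messages named "pocketsphinx" on the bus, with `hypothesis` and `final` fields. The names `interpret_message` and `run_live` are hypothetical; the message handling is kept separate so it can be checked without GStreamer.

```python
from typing import Callable, Optional, Tuple

# Pipeline modelled on livedemo.py: default audio input, converted and
# resampled, fed to pocketsphinx, then discarded.
LIVE_PIPELINE = ("autoaudiosrc ! audioconvert ! audioresample "
                 "! pocketsphinx name=asr ! fakesink")


def interpret_message(name: str, fields: dict) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: classify a bus element message.

    Returns ("final", text), ("partial", text) or None.
    """
    if name != "pocketsphinx":
        return None
    hypothesis = fields.get("hypothesis") or ""
    if fields.get("final"):
        return ("final", hypothesis)
    if hypothesis:
        return ("partial", hypothesis)
    return None


def run_live(on_result: Callable[[str, str], None]) -> None:
    """Hypothetical driver; needs PyGObject, GStreamer 1.0 and pocketsphinx.

    Blocks in a GLib main loop until it is quit or interrupted.
    """
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(LIVE_PIPELINE)
    bus = pipeline.get_bus()
    bus.add_signal_watch()

    def on_element(bus, msg):
        structure = msg.get_structure()
        if structure is None:
            return
        # Copy only the fields pocketsphinx is expected to post.
        fields = {key: structure.get_value(key)
                  for key in ("hypothesis", "final")
                  if structure.has_field(key)}
        result = interpret_message(structure.get_name(), fields)
        if result is not None:
            on_result(*result)

    bus.connect("message::element", on_element)
    pipeline.set_state(Gst.State.PLAYING)
    try:
        GLib.MainLoop().run()
    finally:
        pipeline.set_state(Gst.State.NULL)
```

Usage would be along the lines of `run_live(lambda kind, text: print(kind, text))`.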

Another way to use GStreamer 1.0 from Python is pgi, pure-Python GObject Introspection bindings:
https://pypi.python.org/pypi/pgi/
import pgi
pgi.install_as_gi()
from gi.repository import Gst
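A sketch of using pgi only as a fallback, assuming the application otherwise targets PyGObject: pgi has to be installed as `gi` before anything imports `gi.repository`, so the switch happens in one place. `load_gst` is a hypothetical helper, and whether every GStreamer API works under pgi is not established here.

```python
import importlib


def load_gst():
    """Hypothetical helper: return an initialised Gst module.

    Prefers PyGObject and falls back to pgi. Raises ImportError if neither
    binding is installed, or ValueError if the Gst 1.0 typelib is missing.
    """
    try:
        gi = importlib.import_module("gi")
    except ImportError:
        # pgi must replace "gi" before gi.repository is imported anywhere.
        pgi = importlib.import_module("pgi")
        pgi.install_as_gi()
        gi = importlib.import_module("gi")
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    Gst.init(None)
    return Gst
```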

Comment 1 Osmo Salomaa 2015-11-07 19:01:59 UTC

I did recently notice that the Pocketsphinx plugin was updated to GStreamer 1.0.

http://sourceforge.net/p/cmusphinx/discussion/help/thread/6a286ad1/

So maybe I'll bring it back in; I don't know.

First off, it seems obvious that the Pocketsphinx GStreamer plugin is not well maintained, and depending on something like that is liable to bring a lot of grief one way or another. The developers either don't know how to maintain a library or don't see the same value in GStreamer as the rest of us.

Secondly, when Gaupol did have speech recognition, it seemed to me that everyone who tried it had unrealistically high hopes, was utterly disappointed and gave up. I don't usually make subtitles from scratch myself, so I simply don't know whether it's actually useful.

Comment 2 yasondinalt 2016-02-17 10:49:03 UTC

Hi Osmo,

Thank you for the response, and for your work too.

I'm going to write in two parts: first technical, then general.

1. Technical
Yesterday I looked at speech-to-text tools and found that SpeechRecognition (https://pypi.python.org/pypi/SpeechRecognition/)
uses a SWIG wrapper around pocketsphinx-5prealpha (https://pypi.python.org/pypi/pocketsphinx/), so GStreamer 1.0 could now be left out.

Just a theoretical idea: maybe SpeechRecognition could be used as a high-level API, so you wouldn't need to deal with the relatively low-level pocketsphinx-python (SWIG) bindings. In the worst case, SpeechRecognition can serve as a source of examples.
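For reference, offline recognition through SpeechRecognition's PocketSphinx backend looks roughly like this, assuming the SpeechRecognition and pocketsphinx packages are installed. Its `AudioFile` reader accepts WAV, AIFF/AIFF-C and FLAC only; the extension check and both function names are hypothetical shortcuts, since `AudioFile` does its own format detection.

```python
import os

# Formats SpeechRecognition's AudioFile reader is documented to accept.
SUPPORTED_EXTENSIONS = {".wav", ".aif", ".aiff", ".aifc", ".flac"}


def is_supported_audio(path: str) -> bool:
    """Hypothetical pre-check by file extension only."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS


def transcribe_with_sphinx(path: str) -> str:
    """Hypothetical wrapper: transcribe a whole audio file with PocketSphinx.

    Requires the SpeechRecognition and pocketsphinx packages.
    """
    if not is_supported_audio(path):
        raise ValueError(f"not a WAV/AIFF/FLAC file: {path}")
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    return recognizer.recognize_sphinx(audio)
```

A video file would first have to be decoded into one of those formats by something else.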

2. General

From an evolutionary point of view, it's better to have a bad tool than none at all.
For me personally, even the old Gaupol (0.19) with outdated underlying resources (dictionaries, language models, etc.) is a useful tool, because:
a) it can create timings automatically, which reduces boring manual work;
b) for rare videos (where even YouTube can't generate automatic subtitles), Gaupol can correctly recognize at least part of the speech. I use these words for language study: I check the text against frequency lists, learn the words I don't know, and only then watch the video. This helps me understand more, even without showing the subtitles on screen.
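The language-study workflow in (b) could be approximated like this: count the words of the recognized text and keep those missing from a list of already known words, most frequent first. The function name and the regex tokenizer are illustrative assumptions, not part of Gaupol.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def unknown_words(text: str, known: Iterable[str]) -> List[Tuple[str, int]]:
    """Hypothetical helper: (word, count) pairs not in `known`, most frequent first."""
    known_set = {word.lower() for word in known}
    # Letter runs, optionally with one apostrophe part (e.g. "don't").
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    counts = Counter(word for word in words if word not in known_set)
    # Descending count, then alphabetical for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```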

But I think different people can find their own ways to benefit from it. Gaupol looks like the only option in the open-source landscape.

Automatic subtitle creation should work better with newer libraries (models...), and I hope it will now be simpler to implement and maintain than it was before.
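One hypothetical way to turn recognizer output into subtitle timings, assuming the recognizer reports each word with start and end times in seconds (if it reports frames, divide by its frame rate; PocketSphinx defaults to 100 frames per second): start a new subtitle at a long pause or when the current one would get too long, then write SubRip. The thresholds and names are made up for illustration and are not taken from Gaupol.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]       # (text, start s, end s)
Subtitle = Tuple[float, float, str]   # (start s, end s, text)


def _close(words: List[Word]) -> Subtitle:
    return (words[0][1], words[-1][2], " ".join(w[0] for w in words))


def group_words(words: List[Word], max_gap: float = 0.8,
                max_duration: float = 5.0) -> List[Subtitle]:
    """Hypothetical grouping of timed words into subtitles."""
    subtitles: List[Subtitle] = []
    current: List[Word] = []
    for word in words:
        if current:
            gap = word[1] - current[-1][2]          # pause before this word
            duration = word[2] - current[0][1]      # length if word is added
            if gap > max_gap or duration > max_duration:
                subtitles.append(_close(current))
                current = []
        current.append(word)
    if current:
        subtitles.append(_close(current))
    return subtitles


def srt_time(seconds: float) -> str:
    """Format seconds as a SubRip timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(subtitles: List[Subtitle]) -> str:
    blocks = [f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
              for i, (start, end, text) in enumerate(subtitles, start=1)]
    return "\n".join(blocks)
```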

Kind regards,
Alex Panchuk

Comment 3 Osmo Salomaa 2016-02-17 16:05:06 UTC

(In reply to yasondinalt from comment #2)
> 1. https://pypi.python.org/pypi/SpeechRecognition/

Do you have any experience with the other engines SpeechRecognition uses? Are those web APIs usable in a desktop app (allowed, convenient, fast)?

> Just a theoretical idea: maybe SpeechRecognition could be used as a
> high-level API, so you wouldn't need to deal with the relatively low-level
> pocketsphinx-python (SWIG) bindings. In the worst case, SpeechRecognition
> can serve as a source of examples.

The obvious benefit of GStreamer is that it extracts and decodes audio from video files. If some other solution can only handle WAV files, then it's no good; and even if it does decode, it's a problem if Gaupol effectively ships with two different sets of decoders. People will wonder why a video works in the integrated video player but not in speech recognition. This is why I'd prefer to use GStreamer, even if the plugin is poorly maintained.
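A sketch of that GStreamer route, assuming PyGObject and GStreamer 1.0 with the base and good plugin sets: `decodebin` picks from the installed decoders, the same ones a GStreamer-based player would use, and the audio is converted to 16 kHz mono 16-bit PCM (the sample rate PocketSphinx's default English model expects) and written as WAV by `wavenc`. The helper names are hypothetical; the pipeline string can be checked without running GStreamer.

```python
def gst_quote(value: str) -> str:
    """Hypothetical helper: double-quote a property value for gst-launch syntax."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def decode_pipeline(video_path: str, wav_path: str) -> str:
    """Describe a pipeline that extracts a video's audio as 16 kHz mono WAV."""
    return (f"filesrc location={gst_quote(video_path)} ! decodebin "
            "! audioconvert ! audioresample "
            "! audio/x-raw,format=S16LE,rate=16000,channels=1 "
            f"! wavenc ! filesink location={gst_quote(wav_path)}")


def extract_audio(video_path: str, wav_path: str) -> None:
    """Hypothetical runner; requires PyGObject and GStreamer 1.0."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(decode_pipeline(video_path, wav_path))
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    # Block until the WAV file is complete (EOS) or an error is posted.
    msg = bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                                 Gst.MessageType.EOS | Gst.MessageType.ERROR)
    pipeline.set_state(Gst.State.NULL)
    if msg is not None and msg.type == Gst.MessageType.ERROR:
        err, debug = msg.parse_error()
        raise RuntimeError(f"{err.message} ({debug})")
```

Building the elements with `Gst.ElementFactory.make` instead of a pipeline string would avoid quoting file names altogether.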

> 2. For me personally, even the old Gaupol (0.19) with outdated underlying
> resources (dictionaries, language models, etc.) is a useful tool.

Thanks, this is good to know.

I plan to resume Gaupol development in March and I'll probably restore the existing speech recognition, at least if it seems to just work without big effort.

Comment 4 yasondinalt 2016-02-18 15:43:18 UTC

>Do you have any experience with the other engines SpeechRecognition uses? 
>Are those web APIs usable in a desktop app (allowed, convenient, fast)?

I am a complete amateur in this area and only read about the SpeechRecognition module two days ago, so I may do more harm than good. I just thought that SpeechRecognition has a general code part plus a unified, backend-related "API" (it was a bad idea to use that word earlier), which perhaps could be used directly from other programs. Don't pay it much attention.

Thanks for the clarification about GStreamer. I now recall that https://github.com/maxrd2/subtitlecomposer also uses it to work with media.

Comment 5 Osmo Salomaa 2016-05-28 15:50:08 UTC

Moving bugs to GitHub.

https://github.com/otsaloma/gaupol/issues/17