Bug 141516 – Festival driver needs to detect/support multiple voices and languages



Summary:	Festival driver needs to detect/support multiple voices and languages


Status:	RESOLVED FIXED

Product:	gnome-speech
Classification:	Deprecated
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Willie Walker
QA Contact:	GNOME Speech Maintainer(s)

URL:
Whiteboard:

Duplicates:	319215 321216 342325 (view as bug list)
Depends on:
Blocks:

Reported:	2004-04-30 18:00 UTC by Sergio
Modified:	2006-07-15 06:43 UTC

See Also:
GNOME target:	---
GNOME version:	2.9/2.10

Attachments
Patch using ISO-8859-1 with festival and adding Spanish voice (2.28 KB, patch) 2005-02-07 11:09 UTC, Fernando Herrera	none
patch which fixes the festival encoding only (without hard-wiring in the spanish voice) (943 bytes, patch) 2005-03-03 12:27 UTC, bill.haneman	none
patch from Fernando, for autodetection (sent to mailing list) (7.05 KB, patch) 2005-04-13 13:08 UTC, bill.haneman	committed
Patch to detect character set encoding from festival voice from Enrico Zini (3.80 KB, patch) 2006-07-01 16:12 UTC, Willie Walker	committed

Description Sergio 2004-04-30 18:00:50 UTC

Accented words are not pronounced properly; they are spelled out instead.
For example, the word 'Ratón' (mouse) is spoken as: 'r'+'a'+'t'+'xx'+'n'.
The problem is in Gnopernicus; Festival TTS on its own pronounces accented words correctly.

Comment 1 Dana Ormenisan 2004-05-03 06:56:36 UTC

Transferring to gnome-speech for evaluation.

Comment 2 Juan Ramon 2004-06-23 06:11:01 UTC

I have the same problem, because I'm working in Spanish too. I think the problem
is easy to solve. The problem is that gnopernicus (or gnome-speech?) sends
UTF-8 strings to festival. If gnopernicus sent ASCII (Western ISO 8859-15)
strings to festival, accented words and words with special characters (like 'ñ')
would sound right. Festival (and other synthesis engines) do not read a word
correctly if it is not in that encoding.

You can try writing a text in gedit, saving it both in UTF-8 and in ASCII
(Western ISO 8859-15), and sending each to festival through 'festival --tts file'.

A function 'utf82ascii' or something like that would be enough to solve the
problem (I think).
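
A minimal sketch of the kind of conversion helper suggested here. Strict ASCII cannot represent 'ó' or 'ñ', so the sketch assumes the target is ISO-8859-15, as in the gedit experiment. The function name `utf8_to_festival_bytes` and the '?' fallback are hypothetical and are not taken from gnome-speech.

```python
def utf8_to_festival_bytes(utf8_data, target="iso-8859-15"):
    """Re-encode UTF-8 bytes into a single-byte charset.

    Characters the target charset cannot represent become '?'.
    That fallback is an assumption of this sketch; a real driver
    might skip or transliterate them instead.
    """
    text = utf8_data.decode("utf-8")
    return text.encode(target, errors="replace")


utf8_word = "Ratón".encode("utf-8")
converted = utf8_to_festival_bytes(utf8_word)
print(len(utf8_word), len(converted))  # the accented letter shrinks from two bytes to one
```

The same helper works for other single-byte targets by passing a different `target` name.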

Comment 3 bill.haneman 2004-07-26 10:01:08 UTC

It sounds as though festival is expecting ISO-8859-15 instead of UTF-8; I
thought UTF-8 was compatible with 8859, but maybe it's only compatible with
8859-1, not the extended Latin standards.  It might be worth doing some
command-line experimenting here, and re-reading the festival docs.
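
As a quick way to do that experimenting without festival installed, the sketch below compares the raw bytes. It suggests that UTF-8 is byte-compatible with the ISO 8859 family only for plain ASCII text, and that it differs from ISO-8859-1 as well once an accented letter appears. Whether this byte mismatch is exactly what produces the 'xx' in festival is an assumption; the sketch shows only the encoding side.

```python
ascii_word = "mouse"
accented_word = "Ratón"

# ASCII-only text is byte-for-byte identical in all three encodings.
ascii_is_identical = (
    ascii_word.encode("utf-8")
    == ascii_word.encode("iso-8859-1")
    == ascii_word.encode("iso-8859-15")
)

# An accented letter takes two bytes in UTF-8 but one byte in ISO-8859-1.
utf8_bytes = accented_word.encode("utf-8")
latin1_bytes = accented_word.encode("iso-8859-1")

# What a Latin-1 consumer sees when it is handed the UTF-8 bytes unchanged.
misread = utf8_bytes.decode("iso-8859-1")

print(ascii_is_identical, utf8_bytes, latin1_bytes, misread)
```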

Comment 4 bill.haneman 2004-07-28 13:58:56 UTC

from my reading, it sounds as though this may be voice-dependent in Festival;
i.e. it depends on the encoding of the voice data files.  Ugh.  For latin
languages this _appears_ to usually be ISO-8859-*, so it would probably help the
majority of languages if we did an encoding conversion before sending the
character stream to festival.  However, there may not be a clean way of doing
this in the general case for festival.

I think the flite APIs may be somewhat better in this respect.

Comment 5 Fernando Herrera 2005-02-07 11:09:31 UTC

Created attachment 37094
Patch using ISO-8859-1 with festival and adding Spanish voice

Hello, this patch uses ISO-8859-1 encoding in the io_channels talking to
Festival, so it correctly says Spanish words. I also added the Spanish voice
"el" to the available ones (but you would need to download and install it
first).

What do you think about autodetecting installed voices using the (voice.list )
command?

Comment 6 Javier 2005-02-14 12:17:28 UTC

The patch seems to work: accented characters and n with tilde are now pronounced
correctly by festival. However, because an el-diphone database is missing, there
are many symbols whose pronunciation is unknown, and festival says 'xx' for them.
Another issue is that the iso-8859-1 conversion is currently applied to every
voice speaker. Is it possible to apply the iso-8859-1 conversion only to the
v2-el speaker?


By the way, I think the problem is now in festival's el-diphone voice. Do you
know where to file a bug for festival?

Comment 7 Fernando Herrera 2005-02-14 14:24:07 UTC

Well, festival only expects ISO-8859-1 or ISO-8859-15 (the main difference
is that the latter adds the euro sign).

So writing UTF-8 strings to it via a default GIOChannel is wrong. (Try
re-encoding the spanlex.scm file into UTF-8 and sending festival UTF-8 Spanish
strings, and it will fail.)

The current code works because all the characters used for English are encoded
in UTF-8 as the same bytes they have under ISO-8859-1/15. The kal*.scm files
are valid both as ISO-8859-1/15 and as UTF-8. But this correspondence does not
hold once you use any character code > 127, as in Spanish, French, or any other
non-English language.

So Festival does not handle UTF-8 (the upcoming 2.0 doesn't either) and we have
to live with it. The right solution is to set the IOChannel encoding to
ISO-8859-1/15. Since festival does no charset conversion internally, that would
be okay for every language. I mean, all ISO 8859 encodings use the codes 0-255
and differ only in the visual representation of the characters. If we have a
Greek text with an alpha, we will send the code 0xE1, whether under ISO8859-1 or
ISO8859-7, and if festival is reading greek.scm, it will match against 0xE1. It
doesn't matter that if you display that file with an ISO8859-1 font you see an á.
Comment 8 Fernando Herrera 2005-02-14 14:32:16 UTC

Regarding the other point in this bug, allowing the user to select any installed
voice instead of only the two English defaults, I think we should build the
voice list at runtime instead of hardcoding it. We could add the two English
default voices by default (hardcoded) and then query the festival driver at
init time for available voices with the (voice.list ) command, parse the
response in the read iochannels, and add any new voice found to the previous list.

Please let me know if you are okay with this and I'll attach a patch for it.
Thanks
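
A rough sketch of the parse-and-merge step proposed here. It assumes the reply to (voice.list ) arrives as a single parenthesised list of voice names, and it uses `kal_diphone` and `ked_diphone` as stand-ins for the two hardcoded English defaults. The function names are hypothetical, and the committed patch may do this differently.

```python
import re

# Assumed names for the two hardcoded English default voices.
DEFAULT_VOICES = ["kal_diphone", "ked_diphone"]


def parse_voice_list(response):
    """Extract voice names from a reply such as '(kal_diphone el_diphone)'."""
    return re.findall(r"[^\s()]+", response)


def merge_voices(defaults, detected):
    """Keep the defaults first, then append newly detected voices once each."""
    merged = list(defaults)
    for name in detected:
        if name not in merged:
            merged.append(name)
    return merged


detected = parse_voice_list("(kal_diphone el_diphone)")
voices = merge_voices(DEFAULT_VOICES, detected)
print(voices)
```

Keeping the defaults at the front of the list means the existing English voices stay in place even if the query returns nothing.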

Comment 9 bill.haneman 2005-03-03 12:18:30 UTC

Hi Fernando: 

Please provide a patch for autodetection of voices. Can you file a separate
bug for that, and reference it here?  Thanks.

Comment 10 bill.haneman 2005-03-03 12:20:30 UTC

Comment on attachment 37094
Patch using ISO-8859-1 with festival and adding Spanish voice

the IOChannel part of this patch looks right.  But not everyone will have the
es_ES voice, right?

If not, then the addition of the spanish voice should not go in to cvs...

Comment 11 bill.haneman 2005-03-03 12:27:56 UTC

Created attachment 38200
patch which fixes the festival encoding only (without hard-wiring in the spanish voice)

Comment 12 bill.haneman 2005-04-08 14:50:26 UTC

Fernando: Did you ever get a patch that queried festival for its voices, rather
than hard-wiring them?

Comment 13 bill.haneman 2005-04-13 13:08:36 UTC

Created attachment 45214
patch from Fernando, for autodetection (sent to mailing list)

Comment 14 Willie Walker 2006-07-01 16:12:02 UTC

Created attachment 68244
Patch to detect character set encoding from festival voice from Enrico Zini

Based upon bug 342325 and bug 321216 and discussion on a number of accessibility lists, I'm going to reopen this bug and mark the other two as duplicates.  The summary of the discussion on the lists, which should be available on the July-01 archive for http://mail.gnome.org/mailman/listinfo/gnome-accessibility-devel (it's too soon for the archive to exist), is as follows:

1) IN FESTIVAL: Rely on a convention to optionally extend the programmatic description of Festival voices directly in the Festival voice data itself (i.e., not in gnome-speech). Based upon the precedent set by the festival-freebsoft-utils folks, this extension adds a "coding" attribute to define the character encoding of the voice. The "coding" attribute is a string acceptable for passing directly to g_io_channel_set_encoding. ISO-8859-1 is implied if "coding" is absent. See the "current-voice-coding" description at <http://www.freebsoft.org/doc/festival-freebsoft-utils/festival-freebsoft-utils_13.html> for more information on the "coding" attribute.

2) IN GNOME-SPEECH: Patch the gnome-speech festival synthesis driver to check for the "coding" attribute of a voice description.  If the parameter is defined, call g_io_channel_set_encoding with the value of  the attribute.  If it is not set, default to ISO-8859-1.
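
The selection rule in the two points above can be sketched as follows. The real driver is C and calls g_io_channel_set_encoding; this Python stand-in only illustrates the "use coding if present, else ISO-8859-1" decision. It assumes the voice description is available as text containing a `(coding NAME)` entry, and the function name `voice_coding` is hypothetical.

```python
import re

DEFAULT_CODING = "ISO-8859-1"  # implied when the voice has no "coding" attribute


def voice_coding(description):
    """Return the value of a (coding NAME) entry, or the default if absent."""
    match = re.search(r'\(\s*coding\s+"?([^\s")]+)"?\s*\)', description)
    return match.group(1) if match else DEFAULT_CODING


# Assumed description snippets, modelled on the voice definition format.
with_coding = "((builtwith festvox-1.2) (coding UTF-8))"
without_coding = "((builtwith festvox-1.2))"

# The chosen name decides how outgoing text is encoded for the voice.
utf8_out = "Ratón".encode(voice_coding(with_coding))
latin1_out = "Ratón".encode(voice_coding(without_coding))
print(voice_coding(with_coding), voice_coding(without_coding))
```

Voices that never declare a coding keep the earlier ISO-8859-1 behaviour, so only voices that opt in are affected.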

Here's an example patch for the Telugu Festival voice:

diff -Naur festival-te-0.3.2.orig/telugu_NSK_diphone/festvox/telugu_NSK_diphone.scm festival-te-0.3.2/telugu_NSK_diphone/festvox/telugu_NSK_diphone.scm
--- festival-te-0.3.2.orig/telugu_NSK_diphone/festvox/telugu_NSK_diphone.scm	2006-04-04 05:46:13.000000000 +0100
+++ festival-te-0.3.2/telugu_NSK_diphone/festvox/telugu_NSK_diphone.scm	2006-07-01 15:02:25.000000000 +0100
@@ -225,6 +225,8 @@
    (description
     "COMMENT"
     )
-   (builtwith festvox-1.2)))
+   (builtwith festvox-1.2)
+   (coding UTF-8)
+   ))
 
 (provide 'telugu_NSK_diphone)

Many thanks to Enrico Zini and Milan Zamazal for their knowledge and diligence in investigating this problem.

Comment 15 Willie Walker 2006-07-01 16:17:56 UTC

*** Bug 342325 has been marked as a duplicate of this bug. ***

Comment 16 Willie Walker 2006-07-01 16:25:26 UTC

*** Bug 321216 has been marked as a duplicate of this bug. ***

Comment 17 Willie Walker 2006-07-02 11:19:41 UTC

The patch from Enrico has been committed to head and I'm closing this bug.  If our Telugu friends discover this doesn't work (after making the appropriate change to add the 'coding' attribute to the Telugu Festival voice itself), please reopen this bug.  Thanks everyone!

Comment 18 Willie Walker 2006-07-02 11:38:51 UTC

*** Bug 319215 has been marked as a duplicate of this bug. ***

Comment 19 Sunil Mohan Adapa 2006-07-15 06:43:47 UTC

The patch suggested above works with Festival Telugu (released as 0.3.3).

http://sourceforge.net/project/showfiles.php?group_id=159819