GNOME Bugzilla – Bug 306769
Punctuation is spoken twice with festival for Spanish voice
Last modified: 2006-03-06 16:59:30 UTC
Steps to reproduce:
1. Start Gnopernicus.
2. Go to the Preferences menu.
3. Select voice parameters.
4. Set punctuation to "say all punctuation".
5. Open a text editor (Gedit).
6. Type some punctuation, for example a single dot.
7. Press the up and down arrows to read the line.

Current results: Gnopernicus says "dot" twice. First the string "dot" is sent to the TTS, and then the literal "." is sent as well.

Correct result: The punctuation sign should be spoken once. When punctuation is set to "say all", the signs should not be sent to the TTS as literals; only the strings should be sent. What do you think?
This happens with the Festival TTS; I don't know the behavior with other TTS engines.
The idea is to send the punctuation sign to speech to get the right intonation, and the "translated" sign to have it spoken. This is useful when documents are checked for errors. I tried to reproduce your problem, but without success. I tried with the rab-diphone voice.
Have you tried with kal.diphone? It is the same as with el.diphone: if you send a "." the synthesizer says "dot" and Gnopernicus also says "dot". If I write "..." and then read the line with the up and down arrows, Gnopernicus says "dot" 6 times. Perhaps some voices don't know how to intonate the ".". I don't know if this has to do with Festival's voices.
I tried with "el" and "kal" voices. With both I got only 3 "dot"s.
I have confirmed that with the "kal" voice, for example, the punctuation "." is not said as "dot". If you use the "el" voice and send the string "hola mundo...", that works fine: only "hola mundo" is said. However, Gnopernicus seems to send "hola mundo. . ." and then the voice says "hola mundo punto punto" ("punto" is "dot" in Spanish). And when Gnopernicus reads, for example, "1. Modo de inicio..." with the punctuation mode set to "all", the "el" voice says "1. modo de inicio. punto punto punto punto punto". Is there any possibility to see what string is sent to the TTS?
To see the texts sent to speech you have to recompile gnopernicus with the --enable-debug option:

    ./autogen.sh --enable-debug

or

    ./configure --enable-debug

Before running gnopernicus, set SRCSPEECH_DEBUG to "out":

    export SRCSPEECH_DEBUG=out

The output is not the exact text sent to speech. It is an XML string, but the text is contained in that XML.
Looking at the debug output, for the sentence "hello..." the text sent to the TTS in "say all punctuation" mode is now: "hello. dot . dot . dot". Some synthesizers read this as "hello dot dot dot dot dot", because a "." on its own, without any text, is pronounced "dot". Following your logic of producing correct intonation and also saying the name of the punctuation, the correct output would be: "hello... dot dot dot". The text is sent first, to be intonated, and then the names of the punctuation.
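To make the two strings in this comment concrete, here is a minimal Python sketch that models them. It is not the Gnopernicus code: the PUNCTUATION_NAMES table and both function names are hypothetical, and the first function only reproduces the string seen in the debug output.

```python
# Hypothetical table of spoken names; the real table in Gnopernicus may differ.
PUNCTUATION_NAMES = {".": "dot", ",": "comma", "?": "question mark"}


def observed_output(text):
    """Model the string seen in the debug output: every sign is kept
    and immediately followed by its spoken name."""
    parts = []
    for ch in text:
        if ch in PUNCTUATION_NAMES:
            parts.append(f"{ch} {PUNCTUATION_NAMES[ch]} ")
        else:
            parts.append(ch)
    return "".join(parts).strip()


def text_then_names(text):
    """Model the alternative suggested in this comment: send the text
    unchanged (for intonation), then append the names of its signs."""
    names = [PUNCTUATION_NAMES[ch] for ch in text if ch in PUNCTUATION_NAMES]
    return " ".join([text] + names)


print(observed_output("hello..."))   # hello. dot . dot . dot
print(text_then_names("hello..."))   # hello... dot dot dot
```

In the first form each "." stands alone between words, which is the case where some voices speak the sign itself as "dot".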
I disagree slightly - if punctuation is set to "say all", then it should be STRIPPED from the output and replaced with equivalent vocalization, otherwise you are just asking for trouble because some synths will _already_ default to speaking such punctuation. That's what is happening in this case. So the best output to TTS, for this string "hello...", using "say all", is, IMO: "hello dot dot dot" Bill
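The strip-and-replace approach described in this comment could look roughly like the following Python sketch. The SPOKEN_NAMES table and the function name are assumptions made for illustration; this is not the patch attached to this bug.

```python
# Hypothetical mapping from punctuation signs to their spoken names.
SPOKEN_NAMES = {".": "dot", ",": "comma", "?": "question mark"}


def strip_and_speak(text):
    """Replace every known punctuation sign with its spoken name, so that
    no literal sign reaches the TTS engine."""
    parts = []
    for ch in text:
        if ch in SPOKEN_NAMES:
            # Pad with spaces so the name never fuses with adjacent words.
            parts.append(f" {SPOKEN_NAMES[ch]} ")
        else:
            parts.append(ch)
    # Collapse the runs of whitespace introduced by the padding.
    return " ".join("".join(parts).split())


print(strip_and_speak("hello..."))             # hello dot dot dot
print(strip_and_speak("hello, how are you?"))  # hello comma how are you question mark
```

Because the engine never sees a literal sign, a voice that would speak "." on its own has nothing to repeat. The cost is that the engine also has no punctuation left to derive intonation from.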
Bill, in that case it is true. But if the text is "hello, how are you?" and "hello comma how are you question mark" is sent to speech, then the phrase has no intonation.
I believe that is the cost of attempting to implement "say all punctuation" outside of the TTS engine itself. If the TTS engine supports changing punctuation speech level, then this should be exposed via gnome-speech and gnopernicus should use gnome-speech parameter APIs for setting it. Otherwise, I don't see how inflection can be preserved. The current behavior is wrong, and I think it's necessary to strip punctuation from the speech engine input if gnopernicus is doing the "say all" implementation.
Bill - I am unconvinced this should go into gnome-speech; it is a tricky bit of stuff to do well. Separately, I am also unconvinced that when punctuation pronunciation is turned on, prosody is also needed. I think the right answer is a direct character substitution. If you are speaking commas or question marks, strip them out entirely from what is sent to the TTS engine. And if the user has a TTS engine that can speak such things, then they would use that and NOT have Gnopernicus speak it. However, what I think (and what Bill thinks) is frankly less interesting than the opinions of the users. How important is prosody with spoken punctuation? I've asked some folks for their comments in this bug; hopefully we'll hear from them here soon (assuming they can overcome accessibility problems with the bugzilla web interface, sigh...).
Peter: I agree that stripping all punctuation when in 'say all punctuation' mode is the right thing to do. The questions of whether prosody is needed or whether gnome-speech drivers ought to expose parameterization for speaking punctuation can IMO be considered separately. However, where a TTS engine _does_ provide such functionality (and I believe that some do), it seems appropriate that we should expose it.
Raising severity as this makes 'say all punctuation' a lot less useful.
I have studied Festival in more detail. It has different rules for every language added, so the same string can be spoken differently in two languages. It is impossible for gnopernicus to remove the punctuation signs only for some voices. So, the solutions are:
1. Leave it as it is: intonation is kept, but punctuation is spoken twice in some cases.
2. Remove all punctuation signs: no intonation, and no sign spoken twice.
It seems that Bill and Javier agree on the second solution.
Created attachment 60299: proposed patch
Patch looks good, please apply.
Comment on attachment 60299 (proposed patch): Patch applied to CVS head.