GNOME Bugzilla – Bug 392939
[pending] "-" should be spoken when used as a negative sign at "some" punctuation
Last modified: 2007-03-14 14:41:28 UTC
The "-" character is normally only spoken when the user's punctuation setting is "most" or "all". Under normal circumstances, this is exactly as it should be; however, users who prefer a punctuation level of "some" might conclude that a number is positive when in reality it is not. I tend to think of the "some" level being the functional equivalent of what one would speak if reading the document aloud, and I think that is how many users apply it: to read for content smoothly and naturally rather than proofreading. When reading for content, the negative sign is still of interest (just like the dollar sign, decimal point, percent sign, etc.). As such, I'd like to propose that when a word begins with a "-" followed immediately by a digit or a dollar sign, it be spoken at the "some" level of punctuation. Thanks!!
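A minimal sketch of the proposed rule, for illustration only. This is not Orca code: the function name `should_speak_dash` and the lowercase level strings are hypothetical stand-ins for whatever Orca uses internally.

```python
def should_speak_dash(word, level):
    """Return True if a leading "-" in `word` should be spoken.

    `level` is one of "none", "some", "most", "all" (hypothetical names).
    """
    if not word.startswith("-"):
        return False
    if level in ("most", "all"):
        # Existing behavior: the dash is always spoken at these levels.
        return True
    if level == "some":
        # Proposed behavior: speak it only when it looks like a negative sign.
        return len(word) > 1 and word[1] in "$0123456789"
    return False
```

With this rule, "-5" and "-$5" would be announced at "some", while a word such as "-foo" would still need "most" or "all".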
Thanks Joanie. Mike?
I would agree with this.
Just starting to look at this. Thinking about this a bit more, shouldn't the "-" at the beginning of a number or a dollar value be spoken irrespective of the punctuation level?
For example, with the punctuation level set to PUNCTUATION_STYLE_NONE, the fourth line below in gedit is still spoken incorrectly:
    This line contain 5.0 number
    This line contains -5.0 number
    This line contains $5.0 number
    This line contains -$5.0 number
In my opinion none means none. I really don't know why anyone would want to use none but some do.
I was just going to say what Mike said, but I see he beat me to it. :-)
Created attachment 79778 [details] [review] Patch to implement this enhancement.
Okay, fair enough. Patch attached (and NOT checked into SVN trunk yet).
Created attachment 79779 [details] Test file I've been using.
Here's the test file I've been using. I've been testing with the punctuation level set to SOME and NONE. Couple of notes:
1/ It's not saying "5 dollar characters" for the "$$$$$" string when we have the punctuation level set to NONE. Is that correct? If the punctuation level is set to SOME, MOST or ALL it correctly says "5 dollar characters".
2/ If you down arrow past the last line, it speaks the last line again. At least for me. Are others hearing this too? If so, it's probably a separate newline related problem, and I'll file another bug for it.
Thanks Rich. I tried the patch. I am hearing the dash at SOME punctuation WHEN it precedes the dollar sign. When it immediately precedes the 5, I am still not hearing it. Couple of responses:
1/ NONE is NONE. :-) Like Mike said, I don't know why anyone would go with NONE, but they do. And when they do, they expect.... uh..... NONE. ;-) So, yes, I'd say that's the correct behavior.
2/ Yup. Me too.
> When it immediately precedes the 5, I am still not hearing it.
What I'm getting in this case (punctuation level == SOME and lines 2 and 4 of the sample file) is that it speaks " ... minus 5 ...". So that suggests that something else is affecting this: some other setting that you and I currently have set differently. Note that I also tried a variant of the patch with:
    nextCharMatches = (oldText[i + 1] in "$0123456789")
but then it speaks "dash five" for "-5", whereas I like it speaking "minus five". Maybe "dash five" is fine. Comments?
> NONE is NONE. :-)
Okay. I'll leave that one alone. :-)
> Yup. Me too.
I'll file a separate bug on this. Lynn, as this is your area, could you have a look at it please? Thanks.
The bug filed for the problem reported in note #2 above is bug #394397.
Created attachment 79788 [details] Orca debug output when up arrowing through sample file.
I have the punctuation level set to orca.settings.PUNCTUATION_STYLE_SOME and the Orca debug level set to orca.debug.LEVEL_INFO. I started with the cursor at the bottom of the file (i.e. below the last line with text on).
Created attachment 79796 [details] Orca debug output when up arrowing through sample file (Joanie's)
debug.out back at ya! :-) I might be tired, but they look the same to me.... When I made the change you mentioned, namely:
    nextCharMatches = (oldText[i + 1] in "$0123456789")
Orca started speaking the dash when it was immediately followed by a number. So.... Could it be synthesizer related? Maybe your synthesizer is smart enough to be on the lookout for a dash preceding a number and to call it "minus"?? Mine apparently are not. I tried with Festival and with DecTalk 5. What are you using?
I'm using Cepstral. I looked at it a bit more. It speaks "minus" when the "-" is flush next to the number and "dash" when there is a space in between. I'll adjust the patch accordingly tomorrow. Just so I fully understand, do we want it to speak "dash" or "minus" for each of the following lines?
    This line contains -5 number
    This line contains -5.0 number
    This line contains -$5.0 number
    This line contains -$5 number
    This line contains -$$$$$ number
Yup, at the beginning of a line it does say "minus" for me as well. As for what it should speak, that's a good question.... I'm leaning towards minus because most of the time -- given the specified criteria for speaking it at SOME punctuation -- it's functioning as a minus sign. I'm trying to think of cases where a string begins with the dash character, is immediately followed by a number or a dollar sign, and the dash does NOT represent "minus." I cannot think of any right now -- other than your final example:
    This line contains -$$$$$ number
So.... Maybe the criteria are:
* Begins with the dash character
* Is followed immediately by either:
  * A number, or
  * A single dollar sign *****followed by a number****** :-)
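The refined criteria could be sketched as a single pattern. This is an illustration only, not what the patch does; `MINUS_PATTERN` and `looks_like_minus` are hypothetical names.

```python
import re

# A leading dash, then an optional single dollar sign, then a digit.
MINUS_PATTERN = re.compile(r"^-\$?[0-9]")


def looks_like_minus(word):
    """Return True if the leading "-" in `word` most likely means "minus"."""
    return MINUS_PATTERN.match(word) is not None
```

Under this reading, "-5", "-5.0", "-$5" and "-$5.0" all qualify, while "-$$$$$" does not, because the dollar sign is not followed by a number.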
Created attachment 79804 [details] [review] Revised patch
I've reworked the patch. It'll now say "minus" for the "-" character on each of these lines:
    This line contains -5 number
    This line contains -5.0 number
    This line contains -$5.0 number
    This line contains -$5 number
when the punctuation level is SOME. At that level, the line:
    This line contains -$$$$$ number
is spoken as `This line contains 5 dollar characters number`. If the punctuation level is MOST or ALL, then that line is spoken as `This line contains dash 5 dollar characters number`.
I think this is a reasonable compromise. Anything more means making the code in adjustForRepeats() in util.py really ugly. Is the compromise okay?
It works really well. Thanks!!
> * Begins with the dash character
> * Is followed immediately by either:
>   * A number, or
>   * A single dollar sign *****followed by a number******
In the interest of i18n, should we extend the dollar sign to all currency symbols? http://unicode.org/charts/PDF/U20A0.pdf
Good idea. How do I need to change the
    + nextCharMatches = (oldText[i + 1] in "$0123456789")
line to include these new symbols? Can you point me at a similar example? Thanks.
(In reply to comment #19)
> Good idea.
>
> How do I need to change the
>
> + nextCharMatches = (oldText[i + 1] in "$0123456789")
>
> line to include these new symbols? Can you point me at a similar example?
>
> Thanks.

I don't know of a similar example, but I'd probably just start by making a list based upon the values in the PDF file. You'll need to first handle the symbols that are scattered about...

    unicodeCurrencySymbols = [
        u'\u0024', # dollar
        u'\u00a2', # cent
        ...
        u'\ufdfc'  # rial
    ]

Then, you should hopefully just be able to extend the list for those in the unicode currency block (I may be violating some law of the unicode universe, but I'm not sure):

    # Add EURO-CURRENCY SIGN to CEDI SIGN (range() excludes its end
    # value, hence the + 1)
    #
    for ordChar in range(ord(u'\u20a0'), ord(u'\u20b5') + 1):
        unicodeCurrencySymbols.append(unichr(ordChar))

Then, extend your decision to include a check for "oldText[i+1] in unicodeCurrencySymbols".
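For anyone trying the list-building suggestion in present-day Python 3 (where `chr` replaces `unichr`), a self-contained sketch might look like the following. The names `CURRENCY_SYMBOLS` and `next_char_matches` are hypothetical, and the scattered symbols listed are only a handful of well-known ones, not a complete set.

```python
# A few currency symbols that live outside the Currency Symbols block.
# Deliberately incomplete: an assumption for illustration only.
SCATTERED = [
    "\u0024",  # dollar sign
    "\u00a2",  # cent sign
    "\u00a3",  # pound sign
    "\u00a5",  # yen sign
    "\ufdfc",  # rial sign
]

# EURO-CURRENCY SIGN (U+20A0) through CEDI SIGN (U+20B5), inclusive.
BLOCK = [chr(cp) for cp in range(0x20A0, 0x20B5 + 1)]

CURRENCY_SYMBOLS = frozenset(SCATTERED + BLOCK)


def next_char_matches(text, i):
    """True if the character after index `i` is a digit or currency symbol."""
    if i + 1 >= len(text):
        return False
    nxt = text[i + 1]
    return nxt in "0123456789" or nxt in CURRENCY_SYMBOLS
```

An alternative that avoids hand-maintained lists would be to test `unicodedata.category(ch) == "Sc"` from the standard library, which covers every character Unicode classifies as a currency symbol.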
Created attachment 79870 [details] New version of the test file with some extra currency symbols in it.
Created attachment 79872 [details] [review] Rerevised patch including checks for more currency symbols
Thanks Will. I've attached a new version of this patch. What's clear from this is that the speech output is going to be synthesizer dependent. Using the new sample test file (attached), Cepstral happily spoke the pound and euro symbols, but didn't say anything for the cent or the yen symbols.
> What's clear from this is that the speech output is going to be
> synthesizer dependent. Using the new sample test file (attached),
> Cepstral happily spoke the pound and euro symbols, but didn't
> say anything for the cent or the yen symbols.
Yeah. :-(
If you try reading the new sample text with DecTalk 5, Orca SAYS "minus" when the dash precedes a number or dollar sign. It SPELLS out "minus" with the other symbols.
Eeek! as Will would say. Is there anything we want to do about this? If we are detecting those symbols, there is no reason why we couldn't have a dictionary of pronunciations for them as well, and substitute the word/phrase for the symbol. This might be something that we leave for a future version as it would be foisting quite a few new translations on our translators now that there is a string freeze in place.
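To make the dictionary idea concrete, here is a rough sketch. It is purely hypothetical: the table contents, the spoken words, and the function name `substitute_symbols` are assumptions, and real entries would need to be translatable.

```python
# Hypothetical pronunciation table: symbol -> word to speak instead.
PRONUNCIATIONS = {
    "\u00a2": "cent",
    "\u00a5": "yen",
}


def substitute_symbols(text, table=PRONUNCIATIONS):
    """Replace each symbol found in `table` with its spoken word."""
    pieces = []
    for ch in text:
        if ch in table:
            # Pad with spaces so the word does not fuse with its neighbors.
            pieces.append(" " + table[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())
```

The text handed to the synthesizer would then contain plain words, so the result would no longer depend on whether a given synthesizer knows the symbol.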
Changes (last patch) checked into SVN trunk. We discussed the dictionary of pronunciations idea and have decided to punt on it for now. It's something that's been discussed before. It's debatable whether this task should be:
a) left to the synthesizers,
b) handled at the driver level in gnome-speech, or
c) handled by the AT (Orca in this case).
For now, we are going with a). I'll leave the bug open for now to give others a chance to try it and comment on it before I close it out. Thanks.
Closed as FIXED.
In testing a patch for 413457, I discovered that if a negative number is the first thing on a line (or perhaps just the very first thing in the text), we'll hear "dash" followed by the number instead of "minus" followed by the number. This may be a special case that was accidentally missed. Reopening.
Well this is interesting: I opened gedit, and inserted the line
    -5 number
On my Ubuntu Edgy system, both with Cepstral/Callie and Festival/kal_diphone, Orca speaks this as "minus five number". I tried the same experiment on my Ubuntu Feisty machine with Dectalk/Paul, Festival/kal_diphone and eSpeak/en-rhotic (en-r). For each of these, Orca speaks it as "dash five number". Hmmm. Still looking.
If you have orca.settings.verbalizePunctuationStyle set to orca.settings.PUNCTUATION_STYLE_SOME, Orca will speak "minus". If you have it set to orca.settings.PUNCTUATION_STYLE_MOST, it will speak "dash". The "smarts" are in the __addVerbalizedPunctuation() routine in gnomespeechfactory.py (about line 715). Presumably this was intentional. Can we close this bug now? ;-) Putting it into the "[pending]" state.
Maybe there still is a problem here. The comment in the code says:
    # If this is a dash and the users punctuation level is not
    # NONE and the previous character is a white space character,
    # and the next character is a dollar sign or a digit, then
    # always speak it.
It looks like a "-" character at the beginning of the line is still confusing it. Looking deeper...
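One plausible reading of the problem, sketched below as a guess rather than as the actual Orca code: at index 0 there is no previous character at all, so a check that requires the previous character to be white space can never succeed there. The function name `speak_dash_as_minus` and the lowercase level strings are hypothetical.

```python
def speak_dash_as_minus(text, i, level):
    """Apply the rule from the code comment to the character at index `i`.

    Treats the start of the text as if it were preceded by white space.
    """
    if text[i] != "-" or level == "none":
        return False
    # Start of text counts as a word boundary, same as white space.
    prev_ok = i == 0 or text[i - 1].isspace()
    next_ok = i + 1 < len(text) and text[i + 1] in "$0123456789"
    return prev_ok and next_ok
```

With the `i == 0` case included, "-5 number" is handled the same way as "contains -5 number".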
Created attachment 84259 [details] [review] Patch to handle the situation where a line starts with a minus sign.
I think I've found the fix. I've committed it. Please let me know if this fixes the problem for you. Changing the Summary line to "[pending]".
Much better. Thanks! Feel free to close as FIXED.