Bug 392939 - [pending] "-" should be spoken when used as a negative sign at "some" punctuation
Status: RESOLVED FIXED
Product: orca
Classification: Applications
Component: speech
Version: 2.17.x
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Rich Burridge
QA Contact: Orca Maintainers
Depends on:
Blocks:
 
 
Reported: 2007-01-04 23:28 UTC by Joanmarie Diggs (IRC: joanie)
Modified: 2007-03-14 14:41 UTC
See Also:
GNOME target: ---
GNOME version: 2.17/2.18


Attachments
Patch to implement this enhancement. (1.17 KB, patch)
2007-01-08 19:25 UTC, Rich Burridge
Test file I've been using. (305 bytes, text/plain)
2007-01-08 19:30 UTC, Rich Burridge
Orca debug output when up arrowing through sample file. (3.37 KB, text/plain)
2007-01-08 21:37 UTC, Rich Burridge
Orca debug output when up arrowing through sample file (Joanie's) (2.66 KB, application/octet-stream)
2007-01-08 22:35 UTC, Joanmarie Diggs (IRC: joanie)
Revised patch (1.54 KB, patch)
2007-01-09 01:10 UTC, Rich Burridge
New version of the test file with some extra currency symbols in it. (551 bytes, text/plain)
2007-01-09 17:40 UTC, Rich Burridge
Rerevised patch including checks for more currency symbols (3.80 KB, patch)
2007-01-09 17:52 UTC, Rich Burridge
Patch to handle the situation where a line starts with a minus sign. (1.17 KB, patch)
2007-03-08 19:11 UTC, Rich Burridge
committed

Description Joanmarie Diggs (IRC: joanie) 2007-01-04 23:28:31 UTC
The "-" character is normally only spoken when the user's punctuation setting is "most" or "all".   Under normal circumstances, this is exactly as it should be; however, users who prefer a punctuation level of "some" might conclude that a number is positive when in reality it is not.

I tend to think of the "some" level as being the functional equivalent of what one would speak if reading the document aloud, and I think that is how many users apply it:  to read for content smoothly and naturally rather than proofreading.  When reading for content, the negative sign is still of interest (just like the dollar sign, decimal point, percent sign, etc.).  As such, I'd like to propose that when a word begins with a "-" followed immediately by a digit or a dollar sign, it be spoken at the "some" level of punctuation.

Thanks!!
Comment 1 Rich Burridge 2007-01-05 00:54:24 UTC
Thanks Joanie. Mike?
Comment 2 Mike Pedersen 2007-01-05 01:37:44 UTC
I would agree with this.  
Comment 3 Rich Burridge 2007-01-08 18:18:52 UTC
Just starting to look at this. Thinking about this a bit more, 
shouldn't the "-" at the beginning of a number or a dollar value 
be spoken irrespective of the punctuation level? 

Comment 4 Rich Burridge 2007-01-08 18:31:13 UTC
For example, with the punctuation level set to PUNCTUATION_STYLE_NONE,
the fourth line below is still spoken incorrectly in gedit:

This line contain 5.0 number
This line contains -5.0 number
This line contains $5.0 number
This line contains -$5.0 number
Comment 5 Mike Pedersen 2007-01-08 18:51:33 UTC
In my opinion none means none.  I really don't know why anyone would want to use none but some do.  
Comment 6 Joanmarie Diggs (IRC: joanie) 2007-01-08 18:56:53 UTC
I was just going to say what Mike said, but I see he beat me to it. :-)
Comment 7 Rich Burridge 2007-01-08 19:25:56 UTC
Created attachment 79778 [details] [review]
Patch to implement this enhancement.

Okay, fair enough. Patch attached (and NOT checked into
SVN trunk yet).
Comment 8 Rich Burridge 2007-01-08 19:30:56 UTC
Created attachment 79779 [details]
Test file I've been using.

Here's the test file I've been using. I've been testing
with the punctuation level set to SOME and NONE.

Couple of notes:

1/ It's not saying "5 dollar characters" for the "$$$$$"
   string when we have the punctuation level set to NONE. Is
   that correct? If the punctuation level is set to SOME,
   MOST or ALL it correctly says "5 dollar characters".

2/ If you down arrow past the last line, it speaks the last
   line again. At least for me. Are others hearing this too?
   If so, it's probably a separate newline related problem, and
   I'll file another bug for it.
Comment 9 Joanmarie Diggs (IRC: joanie) 2007-01-08 20:37:57 UTC
Thanks Rich.

I tried the patch.  I am hearing the dash at SOME punctuation WHEN it precedes the dollar sign.  When it immediately precedes the 5, I am still not hearing it.

Couple of responses:

1/ NONE is NONE. :-)  Like Mike said, I don't know why anyone would go with NONE, but they do. And when they do, they expect.... uh..... NONE. ;-) So, yes, I'd say that's the correct behavior.

2/ Yup.  Me too.
Comment 10 Rich Burridge 2007-01-08 21:21:58 UTC
> When it immediately precedes the 5, I am still not hearing it.

What I'm getting in this case (punctuation level == SOME and lines
2 and 4 of the sample file), is that it speaks " ... minus 5 ...". So that
suggests that something else is affecting this. Some other setting that
you and I currently have set differently.

Note that I also tried a variant of the patch with:

              nextCharMatches = (oldText[i + 1] in "$0123456789")

but then it speaks "dash five" for "-5", whereas I like it speaking 
"minus five". Maybe "dash five" is fine. Comments?

> NONE is NONE. :-) 

Okay. I'll leave that one alone. :-)

> Yup.  Me too.

I'll file a separate bug on this. Lynn, as this is your area, could
you have a look at it please?

Thanks.

Comment 11 Rich Burridge 2007-01-08 21:28:17 UTC
The bug filed for the problem reported in note #2 above
is bug #394397.
Comment 12 Rich Burridge 2007-01-08 21:37:54 UTC
Created attachment 79788 [details]
Orca debug output when up arrowing through sample file.

I have the punctuation level set to orca.settings.PUNCTUATION_STYLE_SOME
and the Orca debug level set to orca.debug.LEVEL_INFO. I started with the
cursor at the bottom of the file (i.e. below the last line with text on).
Comment 13 Joanmarie Diggs (IRC: joanie) 2007-01-08 22:35:43 UTC
Created attachment 79796 [details]
Orca debug output when up arrowing through sample file (Joanie's)

debug.out back at ya! :-)  I might be tired, but they look the same to me....

When I made the change you mentioned, namely:
              nextCharMatches = (oldText[i + 1] in "$0123456789")

Orca started speaking the dash when it was immediately followed by a number.  So.... Could it be synthesizer related?  Maybe your synthesizer is smart enough to be on the lookout for a dash preceding a number and to call it "minus"?? Mine apparently are not.  I tried with Festival and with DecTalk 5.  What are you using?
Comment 14 Rich Burridge 2007-01-08 23:40:10 UTC
I'm using Cepstral. 

I looked at it a bit more. It speaks "minus" when the "-" is flush 
next to the number and "dash" when there is a space in between. 

I'll adjust the patch accordingly tomorrow.

Just so I fully understand, do we want it to speak "dash" or
"minus" for each of the following lines?

This line contains -5 number
This line contains -5.0 number
This line contains -$5.0 number
This line contains -$5 number
This line contains -$$$$$ number

Comment 15 Joanmarie Diggs (IRC: joanie) 2007-01-09 00:06:35 UTC
Yup, at the beginning of a line it does say "minus" for me as well.

As for what it should speak, that's a good question.... 

I'm leaning towards minus because most of the time -- given the specified criteria for speaking it at SOME punctuation -- it's functioning as a minus sign.  I'm trying to think of cases where a string begins with the dash character, is immediately followed by a number or a dollar sign, and the dash does NOT represent "minus." I cannot think of any right now -- other than your final example:

This line contains -$$$$$ number

So.... Maybe the criteria are:

* Begins with the dash character
* Is followed immediately by either:
  * A number, or
  * A single dollar sign *****followed by a number******

:-)
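
The criteria above can be sketched as a small word-level check. This is only an illustration written in modern Python 3, not the code from the attached patch; the function name `starts_with_negative_sign` and the `DIGITS` constant are hypothetical.

```python
DIGITS = "0123456789"


def starts_with_negative_sign(word):
    """Return True if `word` looks like a negative number or dollar amount.

    Hypothetical helper illustrating the proposed criteria; it is not
    taken from the Orca source.
    """
    # The word must begin with a dash and have something after it.
    if len(word) < 2 or word[0] != "-":
        return False
    # Case 1: the dash is followed immediately by a digit.
    if word[1] in DIGITS:
        return True
    # Case 2: a single dollar sign that is itself followed by a digit.
    return word[1] == "$" and len(word) > 2 and word[2] in DIGITS
```

Under this sketch, "-5", "-5.0", "-$5" and "-$5.0" qualify, while "-$$$$$" does not, because the character after the first dollar sign is not a digit.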
Comment 16 Rich Burridge 2007-01-09 01:10:49 UTC
Created attachment 79804 [details] [review]
Revised patch

I've reworked the patch. It'll now say "minus" for the "-" character
on each of these lines:

This line contains -5 number
This line contains -5.0 number
This line contains -$5.0 number
This line contains -$5 number

when the punctuation level is SOME. At that level, the line:

This line contains -$$$$$ number

is spoken as `This line contains 5 dollar characters number`

If the punctuation level is MOST or ALL, then that line is
spoken as `This line contains dash 5 dollar characters number`

I think this is a reasonable compromise. Anything more means
making the code in adjustForRepeats() in util.py really ugly.

Is the compromise okay?
Comment 17 Joanmarie Diggs (IRC: joanie) 2007-01-09 02:51:02 UTC
It works really well.  Thanks!!
Comment 18 Willie Walker 2007-01-09 13:35:19 UTC
> * Begins with the dash character
> * Is followed immediately by either:
>   * A number, or
>   * A single dollar sign *****followed by a number******

In the interest of i18n, should we extend the dollar sign to all currency symbols?

http://unicode.org/charts/PDF/U20A0.pdf
Comment 19 Rich Burridge 2007-01-09 15:53:52 UTC
Good idea.

How do I need to change the 

+                    nextCharMatches = (oldText[i + 1] in "$0123456789")

line to include these new symbols? Can you point me at a similar example?


Thanks.
Comment 20 Willie Walker 2007-01-09 16:25:08 UTC
(In reply to comment #19)
> Good idea.
> 
> How do I need to change the 
> 
> +                    nextCharMatches = (oldText[i + 1] in "$0123456789")
> 
> line to include these new symbols? Can you point me at a similar example?
> 
> 
> Thanks.
> 

I don't know of a similar example, but I'd probably just start by making a list based upon the values in the PDF file.  You'll need to first handle the symbols that are scattered about...

unicodeCurrencySymbols = [
u'\u0024', # dollar
u'\u00a2', # cent
...
u'\ufdfc' # rial
]

Then, you should hopefully just be able to extend the list for those in the unicode currency block (I may be violating some law of the unicode universe, but I'm not sure):

# Add EURO-CURRENCY SIGN to CEDI SIGN (the end of range() is
# exclusive, so add 1 to include CEDI SIGN itself)
#
for ordChar in range(ord(u'\u20a0'), ord(u'\u20b5') + 1):
    unicodeCurrencySymbols.append(unichr(ordChar))

Then, extend your decision to include a check for "oldText[i+1] in unicodeCurrencySymbols".
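
As an alternative to a hand-maintained list, the set of currency symbols could be derived from the Unicode character database. The sketch below is an assumption about how that might look in modern Python 3 (the thread itself uses Python 2 and `unichr`); the names `currency_symbols` and `CURRENCY_SYMBOLS` are hypothetical, and the result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def currency_symbols():
    """Collect every character whose Unicode general category is Sc.

    "Sc" is the "Symbol, currency" category, which covers both the
    scattered symbols (dollar, cent, rial, ...) and the currency block.
    """
    return {
        chr(code_point)
        for code_point in range(sys.maxunicode + 1)
        if unicodedata.category(chr(code_point)) == "Sc"
    }


# Build the set once; membership tests are then cheap.
CURRENCY_SYMBOLS = currency_symbols()
```

The decision would then check something like `oldText[i + 1] in CURRENCY_SYMBOLS`, in the same way as the list-based version.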
Comment 21 Rich Burridge 2007-01-09 17:40:38 UTC
Created attachment 79870 [details]
New version of the test file with some extra currency symbols in it.
Comment 22 Rich Burridge 2007-01-09 17:52:28 UTC
Created attachment 79872 [details] [review]
Rerevised patch including checks for more currency symbols

Thanks Will. I've attached a new version of this patch.
What's clear from this is that the speech output is going to be
synthesizer dependent. Using the new sample test file (attached),
Cepstral happily spoke the pound and euro symbols, but didn't
say anything for the cent or the yen symbols.
Comment 23 Willie Walker 2007-01-09 18:17:57 UTC
> What's clear from this is that the speech output is going to be
> synthesizer dependent. Using the new sample test file (attached),
> Cepstral happily spoke the pound and euro symbols, but didn't
> say anything for the cent or the yen symbols.

Yeah.  :-(
Comment 24 Joanmarie Diggs (IRC: joanie) 2007-01-09 18:33:56 UTC
If you try reading the new sample text with DecTalk 5, Orca SAYS "minus" when the dash precedes a number or dollar sign.  It SPELLS out "minus" with the other symbols.
Comment 25 Rich Burridge 2007-01-09 19:25:42 UTC
Eeek! as Will would say. Is there anything we want to do about this?

If we are detecting those symbols, there is no reason why we couldn't
have a dictionary of pronunciations for them as well, and substitute
the symbol for the word/phrase. This might be something that we leave
for a future version as it would be foisting quite a few new translations
on our translators now that there is a string freeze in place.
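
For illustration only, a pronunciation dictionary of the kind described here might look like the following Python 3 sketch. The table contents, the spoken names, and the function name `verbalize_currency` are all hypothetical; in Orca the names would have to be translatable strings, which is exactly the string-freeze concern raised above.

```python
# Hypothetical table; the spoken names are placeholders, not Orca strings.
CURRENCY_NAMES = {
    "$": "dollar",
    "\u00a2": "cent",
    "\u00a3": "pound",
    "\u00a5": "yen",
    "\u20ac": "euro",
}


def verbalize_currency(text):
    """Replace each known currency symbol with its spoken name."""
    pieces = []
    for char in text:
        name = CURRENCY_NAMES.get(char)
        # Pad the name with spaces so it does not fuse with its neighbours.
        pieces.append(" %s " % name if name else char)
    # Collapse the runs of white space introduced by the padding.
    return " ".join("".join(pieces).split())
```

With such a table the synthesizer would receive plain words instead of symbols it may not know how to speak.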
Comment 26 Rich Burridge 2007-01-09 22:10:20 UTC
Changes (last patch) checked into SVN trunk. We discussed
the dictionary of pronunciations idea and have decided to
punt on it for now. It's something that's been discussed
before. It's debatable whether this task should:

a) be left to the synthesizers.
b) be handled at the driver level in gnome-speech
c) be handled by the AT (Orca in this case).

For now, we are going with a).

I'll leave the bug open for now to give others a chance to
try it and comment on it, before I close it out. Thanks.
Comment 27 Rich Burridge 2007-01-25 21:39:17 UTC
Closed as FIXED.
Comment 28 Willie Walker 2007-03-01 17:28:32 UTC
In testing a patch for 413457, I discovered that if a negative number is the first thing on a line (or perhaps just the very first thing in the text), we'll hear "dash" followed by the number instead of "minus" followed by the number.  This may be a special case that was accidentally missed.  Reopening.
Comment 29 Rich Burridge 2007-03-07 21:48:18 UTC
Well this is interesting:

I opened gedit, and inserted the line

-5 number

On my Ubuntu Edgy system, both with Cepstral/Callie and Festival/kal_diphone
Orca speaks this as 

"minus five number".

I tried the same experiment on my Ubuntu Feisty machine with 
Dectalk/Paul, Festival/kal_diphone and eSpeak/en-rhotic (en-r).
For each of these, Orca speaks it as

"dash five number".

Hmmm. Still looking.
Comment 30 Rich Burridge 2007-03-08 18:55:07 UTC
If you have

orca.settings.verbalizePunctuationStyle 

set to

orca.settings.PUNCTUATION_STYLE_SOME

Orca will speak "minus". If you have it set to:

orca.settings.PUNCTUATION_STYLE_MOST

it will speak "dash". The "smarts" are in the
__addVerbalizedPunctuation() routine in gnomespeechfactory.py
(about line 715).

Presumably this was intentional.

Can we close this bug now? ;-)

Putting it into the "[pending]" state.
Comment 31 Rich Burridge 2007-03-08 18:58:14 UTC
Maybe there still is a problem here. The comment in the code says:

                # If this is a dash and the users punctuation level is not
                # NONE and the previous character is a white space character,
                # and the next character is a dollar sign or a digit, then
                # always speak it. 

It looks like a "-" character at the beginning of the line is still
confusing it. Looking deeper...
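
One way to picture the start-of-line case is a character-level check that treats "no previous character" the same as "previous character is white space". This is a minimal Python 3 sketch of the rule in the code comment quoted above, not the committed patch; the function name `is_leading_minus` and its `symbols` parameter are hypothetical, and the punctuation-level test is left out.

```python
def is_leading_minus(text, i, symbols="$"):
    """Return True if text[i] is a dash acting as a negative sign.

    Hypothetical helper: `symbols` holds the currency characters that may
    follow the dash (only the dollar sign by default).
    """
    # It must be a dash with at least one character after it.
    if text[i] != "-" or i + 1 >= len(text):
        return False
    # The very start of the text counts as if preceded by white space.
    previous_ok = i == 0 or text[i - 1].isspace()
    next_char = text[i + 1]
    next_ok = next_char in symbols or next_char.isdigit()
    return previous_ok and next_ok
```

Without the `i == 0` test, a line such as "-5 number" would fail the previous-character condition, which matches the symptom reported in comment 28.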
Comment 32 Rich Burridge 2007-03-08 19:11:40 UTC
Created attachment 84259 [details] [review]
Patch to handle the situation where a line starts with a minus sign.

I think I've found the fix. I've committed it.
Please let me know if this fixes the problem for you.

Changing the Summary line to "[pending]".
Comment 33 Willie Walker 2007-03-14 14:01:16 UTC
Much better.  Thanks!  Feel free to close as FIXED.