Bug 345399 – Key echo missing alpha numeric and punctuation keys.




Summary:	Key echo missing alpha numeric and punctuation keys.


Status:	RESOLVED FIXED

Product:	orca
Classification:	Applications
Component:	general
Version:	0.2.x
Hardware:	Other All

Importance:	High normal
Target Milestone:	2.20.0
Assigned To:	Rich Burridge
QA Contact:	Orca Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-06-20 09:04 UTC by Javier
Modified:	2008-07-22 19:27 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Simple sample text document with three of the missing alpha numeric characters. (144 bytes, text/plain) 2007-05-17 15:23 UTC, Rich Burridge
Patch to add debug messages to orca._isPrintableKey() (956 bytes, patch) 2007-05-17 16:05 UTC, Rich Burridge	needs-work
Revised patch (sans debug messages). (1.06 KB, patch) 2007-05-21 18:29 UTC, Rich Burridge	committed
Patch to extend the chnames dictionary. (31.47 KB, patch) 2007-05-23 21:57 UTC, Rich Burridge	committed

Description Javier 2006-06-20 09:04:53 UTC

Please describe the problem:
When you enable key echo and check the alphanumeric and punctuation keys option, Orca doesn't speak keys such as "ñ" or "ç/Ç" when you press them.

Steps to reproduce:
1. Go to Orca's preferences settings and open the key echo tab.
2. Check the key echo mode, and then alphanumeric and punctuation keys.
3. Save the preferences and open a text editor (Gedit).
4. As you press letters on the keyboard, Orca announces them.
5. Now type the letter "ñ" on a Spanish keyboard, or "ç" on a French one.



Actual results:
Orca doesn't speak these characters as you type.

Expected results:
These letters have to be spoken by the alphanumeric echo. Other languages will have more characters like these, so more testing will be needed to include them.

Does this happen every time?
yes

Other information:
Latest Orca CVS build on Ubuntu Dapper Drake.

Comment 1 Rich Burridge 2006-07-25 19:13:10 UTC

The test to see whether the character that the user typed in is
alphanumeric or a punctuation key is in _isPrintableKey() in
orca.py. Currently it is just doing a check against
"string.printable", which is defined as:

whitespace = ' \t\n\r\v\f'
lowercase = 'abcdefghijklmnopqrstuvwxyz'
uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
letters = lowercase + uppercase
digits = '0123456789'
punctuation = """!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
printable = digits + letters + punctuation + whitespace

We will need to improve on this for characters from other locales.
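
To illustrate the limitation described above, here is a minimal sketch of a membership test against string.printable. It is written for Python 3 (the thread itself predates it), and the helper name in_ascii_printable is made up for this example; it is not Orca's code.

```python
import string


def in_ascii_printable(ch: str) -> bool:
    """Return True if the single character ch is in string.printable.

    string.printable only contains ASCII digits, letters, punctuation
    and whitespace, so accented letters never match.
    """
    return ch in string.printable


if __name__ == "__main__":
    for ch in ("a", "1", "?", "ñ", "ç", "Ç"):
        print(repr(ch), in_ascii_printable(ch))
```

The ASCII characters pass the test while "ñ", "ç" and "Ç" do not, which matches the behaviour reported in this bug.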

Comment 2 Willie Walker 2006-10-15 00:25:29 UTC

Add accessibility keyword.  Apologies for spam.

Comment 3 Willie Walker 2007-05-15 18:14:08 UTC

(In reply to comment #1)
> The test to see whether the character that the user typed in is
> alphanumeric or a punctuation key is in _isPrintableKey() in
> orca.py. Currently it is just doing a check against
> "string.printable", which is defined as:
> 
> whitespace = ' \t\n\r\v\f'
> lowercase = 'abcdefghijklmnopqrstuvwxyz'
> uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> letters = lowercase + uppercase
> digits = '0123456789'
> punctuation = """!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
> printable = digits + letters + punctuation + whitespace
> 
> We will need to improve on this for characters from other locales.

One thing to do here might be to look to the unicode support.

wwalker@wwalker-laptop:~$ python
Python 2.5.1c1 (release25-maint, Apr 12 2007, 21:00:25) 
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = u'\u00F1'
>>> import string
>>> a in string.printable
False
>>> a
u'\xf1'
>>> print a
ñ
>>> a.isalnum()
True

So...when determining printable, one might decode the string from UTF-8 to unicode, and then do a check like the following in orca.py:_isPrintableKey:

    unicodeString = event_string.decode("UTF-8")
    reply = (len(unicodeString) == 1) \
        and (unicodeString.isalnum() or unicodeString.isspace())

Or something like that.  The rest would be left up to chnames.py to have an appropriate entry and/or the speech synthesizer to speak the appropriate word(s) for a character.
  reply = a.isalnum()
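
The check suggested in this comment could be sketched as a standalone function along the following lines. This is a Python 3 adaptation under stated assumptions: the name is_printable_key is hypothetical, the key event string is assumed to arrive as UTF-8 encoded bytes, and this is not the patch that was eventually committed.

```python
def is_printable_key(event_string: bytes) -> bool:
    """Hypothetical stand-in for the test in orca._isPrintableKey().

    Decode the UTF-8 event string and accept it only if it is exactly
    one character that is alphanumeric or whitespace.
    """
    try:
        unicode_string = event_string.decode("UTF-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so it cannot be a single printable character.
        return False
    return len(unicode_string) == 1 and (
        unicode_string.isalnum() or unicode_string.isspace()
    )


if __name__ == "__main__":
    for key in ("a", "ñ", "ç", " ", "?", "Shift_R"):
        print(repr(key), is_printable_key(key.encode("UTF-8")))
```

Note that with only isalnum() and isspace(), ASCII punctuation such as "?" is rejected, so a real implementation would presumably need an additional punctuation test; the thread does not show how the committed patch handled that.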

Comment 4 Rich Burridge 2007-05-17 15:23:22 UTC

Created attachment 88335
Simple sample text document with three of the missing alpha numeric characters.

Comment 5 Rich Burridge 2007-05-17 16:05:30 UTC

Created attachment 88340
Patch to add debug messages to orca._isPrintableKey()

Javier, we could use your help here. 

Could you please try applying the attached patch and then type in
some of the characters like "ñ" or "ç/Ç" and let us know the debug output
you get?

I think this is only half of the patch. I suspect we will also need to
add some entries to the dictionary in chnames.py for those characters
so that they are spoken correctly.

If we are getting _isPrintableKey() returning True for those characters,
then I can work on the other part of the puzzle next.

Thanks.

Comment 6 Willie Walker 2007-05-18 12:50:15 UTC

You can modify your keymap to output these characters.  The following command changes your "\ |" key to "ñ ç" on Ubuntu:

xmodmap -e "keycode 51 = ntilde ccedilla"

The following command resets it:

xmodmap -e "keycode 51 = backslash bar"

Comment 7 Rich Burridge 2007-05-18 16:00:39 UTC

Excellent. Thanks! Okay, applying the previous patch and
then typing "añç" into a gedit window with Orca running, I get:

...
BRAILLE LINE:  'orca Application Orca Screen Reader / Magnifier Frame'
     VISIBLE:  'Orca Screen Reader / Magnifier F', cursor=1
SPEECH OUTPUT: 'Orca Screen Reader / Magnifier frame'
BRAILLE LINE:  'orca Application Orca Screen Reader / Magnifier Frame Preferences Button'
     VISIBLE:  'Preferences Button', cursor=1
SPEECH OUTPUT: ''
SPEECH OUTPUT: 'Preferences button'
BRAILLE LINE:  'gedit Application Unsaved Document 1 - gedit Frame'
     VISIBLE:  'Unsaved Document 1 - gedit Frame', cursor=1
SPEECH OUTPUT: 'Unsaved Document 1 - gedit frame'
BRAILLE LINE:  'gedit Application Unsaved Document 1 - gedit Frame TabList Unsaved Document 1 ScrollPane  $l'
     VISIBLE:  ' $l', cursor=1
SPEECH OUTPUT: 'Unsaved Document 1 page'
SPEECH OUTPUT: 'text '
orca._isPrintableKey: event_string:  a
orca._isPrintableKey: unicodeString:  a
orca._isPrintableKey: returning: True
SPEECH OUTPUT: 'a'
BRAILLE LINE:  'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane a $l'
     VISIBLE:  'a $l', cursor=2
BRAILLE LINE:  'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane a $l'
     VISIBLE:  'a $l', cursor=2
orca._isPrintableKey: event_string:  ñ
orca._isPrintableKey: unicodeString:  ñ
orca._isPrintableKey: returning: True
SPEECH OUTPUT: 'n tilde'
BRAILLE LINE:  'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añ $l'
     VISIBLE:  'añ $l', cursor=3
BRAILLE LINE:  'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añ $l'
     VISIBLE:  'añ $l', cursor=3
orca._isPrintableKey: event_string:  Shift_R
orca._isPrintableKey: unicodeString:  Shift_R
orca._isPrintableKey: returning: False
SPEECH OUTPUT: 'right shift'
orca._isPrintableKey: event_string:  ç
orca._isPrintableKey: unicodeString:  ç
orca._isPrintableKey: returning: True
SPEECH OUTPUT: 'ç'
BRAILLE LINE:  'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añç $l'
     VISIBLE:  'añç $l', cursor=4
BRAILLE LINE:  'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añç $l'
     VISIBLE:  'añç $l', cursor=4
SPEECH OUTPUT: 'Goodbye.'
BRAILLE LINE:  'Goodbye.'
     VISIBLE:  'Goodbye.', cursor=0

So it looks like _isPrintableKey() is doing the right thing
and it's nicely speaking "n tilde" because of:

chnames.py:chnames[u'\u00F1'] = _("n tilde")

but we don't have an entry in chnames for ç

Where can I find a full list of such characters, their unicode
values and how they should be spoken, so that I can update chnames?
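
The difference between the two spoken results in the log above ('n tilde' versus 'ç') can be pictured as a dictionary lookup that falls back to the raw character. The sketch below is a simplified Python 3 illustration: the single-entry dictionary, the omission of the _() translation wrapper, and the function name get_character_name are assumptions for this example, not the actual contents of chnames.py.

```python
# Simplified stand-in for the chnames dictionary; the real one wraps
# each value in _() so that it can be translated.
chnames = {
    "\u00f1": "n tilde",
}


def get_character_name(character: str) -> str:
    """Return the spoken name for character, or the character itself.

    When there is no entry, the character is passed through unchanged
    and the speech synthesizer has to decide how to pronounce it.
    """
    return chnames.get(character, character)


if __name__ == "__main__":
    print(get_character_name("ñ"))
    print(get_character_name("ç"))
```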

Comment 8 Joanmarie Diggs (IRC: joanie) 2007-05-18 16:11:39 UTC

(In reply to comment #7)
> Where can I find a full list of such characters, their unicode
> values and how they should be spoken, so that I can update chnames?

unicode.org has tables galore.  Try: http://www.unicode.org/charts/PDF/U0080.pdf

As for how they should be spoken, you'll see that some adjustment is in order.  For instance:

    00E7   LATIN SMALL LETTER C WITH CEDILLA

probably should just be "c cedilla"
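
As a rough sketch of the kind of adjustment meant here, the official Unicode names can be read from Python's unicodedata module and shortened mechanically. The function short_name and its shortening rule are purely illustrative assumptions (Python 3); the wording that went into chnames.py was chosen by hand, not generated this way.

```python
import re
import unicodedata

# Matches names such as "LATIN SMALL LETTER C WITH CEDILLA".
_LETTER_WITH_MARK = re.compile(r"LATIN (SMALL|CAPITAL) LETTER (\S+) WITH (.+)")


def short_name(ch: str) -> str:
    """Shorten the Unicode name of ch into something easier to speak."""
    name = unicodedata.name(ch, "")
    match = _LETTER_WITH_MARK.fullmatch(name)
    if match is None:
        # No letter-with-mark pattern: just lower-case the full name.
        return name.lower()
    case, letter, mark = match.groups()
    # Keep the letter's case so "N tilde" and "n tilde" stay distinct.
    letter = letter.lower() if case == "SMALL" else letter.upper()
    return f"{letter} {mark.lower()}"


if __name__ == "__main__":
    for ch in ("ç", "Ñ", "¢"):
        print(ch, unicodedata.name(ch), "->", short_name(ch))
```

Any generated wording would still need review, since the Unicode name is not always what a user expects to hear.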

Comment 9 Willie Walker 2007-05-18 16:37:49 UTC

Just a note of caution: I think there are about 50000 unicode characters to consider.  I don't think we want all of those in Orca.  

As a fallback, we should depend upon the synthesizer to do the right thing and speak alphanumerics appropriately.  For now, I think chnames.py should be viewed generally as something to contain the exceptions for what we would not expect a user's TTS engine to speak when the engine is using the locale of the user.  For example, punctuation, bullets, and a few others.  n tilde probably shouldn't be in there.

If one wanted to err on the cautious side a little bit, however, it might be OK to include characters 160-255 of the ISO LATIN-1 character set:

http://htmlhelp.com/reference/charset/latin1.gif

Keep in mind, however, that our translators will need to come up with translations for each of these.  As such, I'd still opt for relying upon the TTS engine to do the right thing with alphanumerics from the user's character set.

Comment 10 Joanmarie Diggs (IRC: joanie) 2007-05-18 16:46:34 UTC

> user.  For example, punctuation, bullets, and a few others.  n tilde probably
> shouldn't be in there.

That's what I thought too.  It's there because of this comment:
(http://bugzilla.gnome.org/show_bug.cgi?id=416971#c7)
> I'm closing bug 418147 with the intent that the patch for this bug will include
> at least the ñ from niño.  :-)  No need to go crazy with all characters,
> though.

Comment 11 Rich Burridge 2007-05-18 17:44:38 UTC

> If one wanted to err on the cautious side a little bit, 
> however, it might be OK to include characters 160-255 
> of the ISO LATIN-1 character set:

Before I spend more time on this than I have to, I want to clarify 
exactly what I should put in chnames. For example, 162 has a 
description of "cent sign" but we've currently just got "cent" 
in chnames for that one.

I'd like Will or Mike to go through the list below and tell me 
exactly what you'd like me to use as the value of the chnames 
dictionary entries. For example, how are we going to tell the 
difference between "A acute" and "a acute" and between "THORN"
and "thorn"?

160 non-breaking space
161 inverted exclamation
162 cent sign
163 pound sign
164 currency sign
165 yen sign
166 broken bar
167 section sign
168 umlaut or diaeresis
169 copyright sign
170 feminine ordinal
171 left angle quotes
172 logical not sign
173 soft hyphen
174 registered trademark
175 spacing macron
176 degree sign
177 plus-minus sign
178 superscript 2
179 superscript 3
180 spacing acute
181 micro sign
182 paragraph sign
183 middle dot
184 spacing cedilla
185 superscript 1
186 masculine ordinal
187 right angle quotes
188 one quarter
189 one half
190 three quarters
191 inverted question mark
192 A grave
193 A acute
194 A circumflex
195 A tilde
196 A umlaut
197 A ring
198 AE ligature
199 C cedilla
200 E grave
201 E acute
202 E circumflex
203 E umlaut
204 I grave
205 I acute
206 I circumflex
207 I umlaut
208 ETH
209 N tilde
210 O grave
211 O acute
212 O circumflex
213 O tilde
214 O umlaut
215 multiplication sign
216 O slash
217 U grave
218 U acute
219 U circumflex
220 U umlaut
221 Y acute
222 THORN
223 sharp s
224 a grave
225 a acute
226 a circumflex
227 a tilde
228 a umlaut
229 a ring
230 ae ligature
231 c cedilla
232 e grave
233 e acute
234 e circumflex
235 e umlaut
236 i grave
237 i acute
238 i circumflex
239 i umlaut
240 eth
241 n tilde
242 o grave
243 o acute
244 o circumflex
245 o tilde
246 o umlaut
247 division sign
248 o slash
249 u grave
250 u acute
251 u circumflex
252 u umlaut
253 y acute
254 thorn
255 y umlaut

> n tilde probably shouldn't be in there.

I'm confused. "n tilde" is code 241, which falls in the range
160-255. Should we still keep it or remove it?

Are there any others that should be removed?

Comment 12 Willie Walker 2007-05-18 17:54:55 UTC

(In reply to comment #11)
> > If one wanted to err on the cautious side a little bit, 
> > however, it might be OK to include characters 160-255 
> > of the ISO LATIN-1 character set:
> 
> Before I spend more time on this than I have to, I want to clarify 
> exactly what I should put in chnames. For example, 162 has a 
> description of "cent sign" but we've currently just got "cent" 
> in chnames for that one.

I think the first question to answer is if we want to err on the cautious side or not.  For now, I lean towards thinking that we do not need to do this.  

Instead, we can err on relying upon the TTS engine to know about the alphanumerics for the user's locale.  This may or may not be the right thing to do, and I think we need input from our users regarding what they get from their speech synthesis engines.  

Javier, with Rich's patch, does your TTS engine now speak characters as you expect?

Comment 13 Rich Burridge 2007-05-21 18:29:20 UTC

Created attachment 88560
Revised patch (sans debug messages).

Patch committed. We might need to add a swag of entries in chnames,
but let's see what the TTS engines do in the given languages first.

Comment 14 Rich Burridge 2007-05-21 18:29:58 UTC

Setting the bug into a [pending] state.

Comment 15 Willie Walker 2007-05-22 20:17:39 UTC

In the team meeting today, we decided to include the extended ISO-LATIN 1 character set and Mike volunteered to get the wording for each character.

Comment 16 Joanmarie Diggs (IRC: joanie) 2007-05-23 18:01:22 UTC

I emailed Rich a list so I'm removing my name from the summary.  Rich if you need anything else on this front, you know where to find me -- or rather where to put my name. ;-)

Comment 17 Rich Burridge 2007-05-23 18:24:16 UTC

Yup. That's fine, and thanks! I plan to add a patch to this bug
with all the new chnames entries (hopefully this afternoon or
tomorrow).

Comment 18 Rich Burridge 2007-05-23 21:57:59 UTC

Created attachment 88705
Patch to extend the chnames dictionary.

I've combined what Joanie created with what was already in
chnames.py. As Joanie mentioned, there were some cases where 
entries were previously present and where there is now a difference 
in what is spoken. Will commented that "either way is fine",
so I've taken what Joanie gave me. Please let me know if 
I need to adjust back to any of the old entries. 

They are:

WAS: chnames["."] = _("dot")
NOW: chnames["."] = _("period")

WAS: chnames["_"] = _("underscore")
NOW: chnames["_"] = _("underline")

WAS: chnames["<"] = _("less than")
NOW: chnames["<"] = _("less")

WAS: chnames[">"] = _("greater than")
NOW: chnames[">"] = _("greater")

WAS: chnames["|"] = _("vertical line")
NOW: chnames["|"] = _("vertical bar")

WAS: chnames["`"] = _("grave accent")
NOW: chnames["`"] = _("grave")

WAS: chnames[u'\u00A2'] = _("cent")
NOW: chnames[u'\u00a2'] = _("cents")

WAS: chnames[u'\u00A3'] = _("pound")
NOW: chnames[u'\u00a3'] = _("pounds")

WAS: chnames[u'\u00AC'] = _("not")
NOW: chnames[u'\u00ac'] = _("logical not")

WAS: chnames[u'\u00B0'] =  _("degree")
NOW: chnames[u'\u00b0'] = _("degrees")

WAS: chnames[u'\u00B1'] = _("plus minus")
NOW: chnames[u'\u00b1'] = _("plus or minus")

WAS: chnames[u'\u00B2'] = _("2 superscript")
NOW: chnames[u'\u00b2'] = _("superscript 2")

WAS: chnames[u'\u00B3'] = _("3 superscript")
NOW: chnames[u'\u00b3'] = _("superscript 3")

WAS: chnames[u'\u00BC'] = _("one quarter")
NOW: chnames[u'\u00bc'] = _("one fourth")

WAS: chnames[u'\u00BE'] = _("three quarters")
NOW: chnames[u'\u00be'] = _("three fourths")

Testing for uppercase in sayCharacter() in default.py
now does "character.decode("UTF-8").isupper()"

The sayCharacter() method now also calls 
chnames.getCharacterName(character) rather than 
just passing "character" to speech.speak(). 
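
A small sketch of why the decode step matters for the uppercase test, written for Python 3 where the undecoded key string would be a bytes object. The helper name is_uppercase is hypothetical and this is not the code from default.py.

```python
def is_uppercase(character: bytes) -> bool:
    """Decode a UTF-8 encoded character before testing its case."""
    return character.decode("UTF-8").isupper()


if __name__ == "__main__":
    raw = "Ñ".encode("UTF-8")
    # The byte-level test only knows about ASCII letters, so it misses Ñ.
    print("bytes.isupper():", raw.isupper())
    # After decoding, the Unicode-aware test recognises it as uppercase.
    print("decoded isupper():", is_uppercase(raw))
```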

Patch not committed yet. Please test.

Comment 19 Mike Pedersen 2007-05-23 23:16:08 UTC

The only changes I'd like to see here are:  I'd like to see < and > put back the way we had them.  This is just a personal preference so if anyone really has a lot of pushback I'm OK with whatever.

Comment 20 Willie Walker 2007-05-25 15:57:59 UTC

(In reply to comment #19)
> The only changes I'd like to see here are:  I'd like to see < and > put back
> the way we had them.  This is just a personal preference so if anyone really
> has a lot of pushback I'm OK with whatever.

Works well for me.  Since Mike is OK with whatever, I say commit the patch as is -- "less" and "greater" seem to be the de facto standard.  We can always deviate if users complain. Thanks Rich!

Comment 21 Rich Burridge 2007-05-25 16:25:47 UTC

Patch committed. Closing as FIXED.

Comment 22 Mike Pedersen 2007-05-26 19:10:57 UTC

One thing that has to change is the period.  The problem here is decimals, URLs and email addresses.   They just do not sound at all right with the new implementation.  In Orca we currently handle this symbol differently depending on whether it marks the end of a sentence or is part of some other type of expression.  
We should either change period back to dot, or speak period in normal text and dot everywhere else.  I just changed chnames.py back to dot and now have the behavior I would expect, but I'll leave the checkin to Rich so he can decide on the proper solution.

Comment 23 Rich Burridge 2007-05-28 14:36:42 UTC

Changing this back to a dot seems fine with me. Will, okay with you?

Comment 24 Willie Walker 2007-05-28 20:49:26 UTC

Fine by me.  Thanks!

Comment 25 Rich Burridge 2007-05-29 15:09:21 UTC

chnames entry for "." changed from "period" back to "dot". 
Checked into SVN trunk/HEAD.