GNOME Bugzilla – Bug 345399
Key echo missing alpha numeric and punctuation keys.
Last modified: 2008-07-22 19:27:07 UTC
Please describe the problem: When you enable key echo and check alphanumeric and punctuation keys, Orca doesn't speak keys such as "ñ" or "ç/Ç" when you press them.
Steps to reproduce:
1. Go to Orca's preferences and open the Key Echo tab.
2. Enable key echo, then check alphanumeric and punctuation keys.
3. Save the preferences and open a text editor (gedit).
4. Press letters on the keyboard; Orca announces each one.
5. Type "ñ" on a Spanish keyboard, or "ç" on a French one.
Actual results: Orca doesn't speak these characters as you type.
Expected results: These letters should be spoken by alphanumeric echo. Other languages have more such characters, so more testing will be needed to cover them.
Does this happen every time? Yes.
Other information: Latest Orca CVS build on Ubuntu Dapper Drake.
The test to see whether the character the user typed is alphanumeric or a punctuation key is in _isPrintableKey() in orca.py. Currently it just checks against "string.printable", which is defined as:

    whitespace = ' \t\n\r\v\f'
    lowercase = 'abcdefghijklmnopqrstuvwxyz'
    uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    letters = lowercase + uppercase
    digits = '0123456789'
    punctuation = """!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
    printable = digits + letters + punctuation + whitespace

We will need to improve this for characters from other locales.
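To see the gap concretely, here is a minimal Python 3 sketch of a check against string.printable. The function name is hypothetical, and the single-character test is an assumption about what _isPrintableKey() does, not a copy of it.

```python
import string


def is_printable_ascii_only(event_string: str) -> bool:
    """Hypothetical reconstruction of the original check.

    Accepts a single character only if it appears in string.printable,
    i.e. ASCII digits, letters, punctuation, or whitespace.
    """
    return len(event_string) == 1 and event_string in string.printable


for ch in ["a", "ñ", "ç", "Ç"]:
    # Only "a" passes; the accented letters are outside string.printable.
    print(repr(ch), is_printable_ascii_only(ch))
```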
Add accessibility keyword. Apologies for spam.
(In reply to comment #1)
> We will need to improve this for characters from other locales.

One thing to do here might be to look at Python's Unicode support:

    wwalker@wwalker-laptop:~$ python
    Python 2.5.1c1 (release25-maint, Apr 12 2007, 21:00:25)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = u'\u00F1'
    >>> import string
    >>> a in string.printable
    False
    >>> a
    u'\xf1'
    >>> print a
    ñ
    >>> a.isalnum()
    True

So, when determining whether a key is printable, one might decode the string from UTF-8 to Unicode and then do a check like the following in orca.py:_isPrintableKey:

    unicodeString = event_string.decode("UTF-8")
    reply = (len(unicodeString) == 1) \
        and (unicodeString.isalnum() or unicodeString.isspace())

Or something like that. The rest would be left up to chnames.py to have an appropriate entry and/or the speech synthesizer to speak the appropriate word(s) for a character.
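The decode-then-test idea can be sketched in Python 3 as follows, assuming the event string arrives as UTF-8 bytes. The check as proposed accepts only letters, digits and whitespace; since the key echo option also covers punctuation keys, this sketch additionally accepts the Unicode punctuation (P*) and symbol (S*) categories via unicodedata. That extra clause, and the function name, are assumptions rather than part of the proposal.

```python
import unicodedata


def is_printable_key(event_string: bytes) -> bool:
    """Return True if a key event produced a single printable character."""
    try:
        unicode_string = event_string.decode("UTF-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so we cannot tell what character it is.
        return False
    if len(unicode_string) != 1:
        # Key names such as "Shift_R" decode to several characters.
        return False
    # Assumption: also accept punctuation and symbols (e.g. "¿", "$").
    category = unicodedata.category(unicode_string)
    return (unicode_string.isalnum()
            or unicode_string.isspace()
            or category[0] in ("P", "S"))


for key in [b"a", "ñ".encode("UTF-8"), "ç".encode("UTF-8"), b"Shift_R"]:
    print(key, is_printable_key(key))
```

Letters such as "ñ" and "ç" now pass because str.isalnum() is Unicode-aware, while multi-character key names are still rejected by the length test.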
Created attachment 88335 [details] Simple sample text document with three of the missing alpha numeric characters.
Created attachment 88340 [details] [review]
Patch to add debug messages to orca._isPrintableKey()

Javier, we could use your help here. Could you please try applying the attached patch, then type some of the characters like "ñ" or "ç/Ç" and let us know the debug output you get? I think this is only half of the patch. I suspect we will also need to add some entries to the dictionary in chnames.py for those characters so that they are spoken correctly. If _isPrintableKey() is returning True for those characters, then I can work on the other part of the puzzle next. Thanks.
You can modify your keymap to output these characters. The following command changes your "\ |" key to "ñ ç" on Ubuntu: xmodmap -e "keycode 51 = ntilde ccedilla" The following command resets it: xmodmap -e "keycode 51 = backslash bar"
Excellent. Thanks!

Okay, applying the previous patch and then typing "añç" into a gedit window with Orca running, I get:

    ...
    BRAILLE LINE: 'orca Application Orca Screen Reader / Magnifier Frame'
    VISIBLE: 'Orca Screen Reader / Magnifier F', cursor=1
    SPEECH OUTPUT: 'Orca Screen Reader / Magnifier frame'
    BRAILLE LINE: 'orca Application Orca Screen Reader / Magnifier Frame Preferences Button'
    VISIBLE: 'Preferences Button', cursor=1
    SPEECH OUTPUT: ''
    SPEECH OUTPUT: 'Preferences button'
    BRAILLE LINE: 'gedit Application Unsaved Document 1 - gedit Frame'
    VISIBLE: 'Unsaved Document 1 - gedit Frame', cursor=1
    SPEECH OUTPUT: 'Unsaved Document 1 - gedit frame'
    BRAILLE LINE: 'gedit Application Unsaved Document 1 - gedit Frame TabList Unsaved Document 1 ScrollPane $l'
    VISIBLE: ' $l', cursor=1
    SPEECH OUTPUT: 'Unsaved Document 1 page'
    SPEECH OUTPUT: 'text '
    orca._isPrintableKey: event_string: a
    orca._isPrintableKey: unicodeString: a
    orca._isPrintableKey: returning: True
    SPEECH OUTPUT: 'a'
    BRAILLE LINE: 'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane a $l'
    VISIBLE: 'a $l', cursor=2
    BRAILLE LINE: 'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane a $l'
    VISIBLE: 'a $l', cursor=2
    orca._isPrintableKey: event_string: ñ
    orca._isPrintableKey: unicodeString: ñ
    orca._isPrintableKey: returning: True
    SPEECH OUTPUT: 'n tilde'
    BRAILLE LINE: 'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añ $l'
    VISIBLE: 'añ $l', cursor=3
    BRAILLE LINE: 'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añ $l'
    VISIBLE: 'añ $l', cursor=3
    orca._isPrintableKey: event_string: Shift_R
    orca._isPrintableKey: unicodeString: Shift_R
    orca._isPrintableKey: returning: False
    SPEECH OUTPUT: 'right shift'
    orca._isPrintableKey: event_string: ç
    orca._isPrintableKey: unicodeString: ç
    orca._isPrintableKey: returning: True
    SPEECH OUTPUT: 'ç'
    BRAILLE LINE: 'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añç $l'
    VISIBLE: 'añç $l', cursor=4
    BRAILLE LINE: 'gedit Application *Unsaved Document 1 - gedit Frame TabList *Unsaved Document 1 ScrollPane añç $l'
    VISIBLE: 'añç $l', cursor=4
    SPEECH OUTPUT: 'Goodbye.'
    BRAILLE LINE: 'Goodbye.'
    VISIBLE: 'Goodbye.', cursor=0

So it looks like _isPrintableKey() is doing the right thing, and it's nicely speaking "n tilde" because of:

    chnames.py:chnames[u'\u00F1'] = _("n tilde")

but we don't have an entry in chnames for ç.

Where can I find a full list of such characters, their Unicode values, and how they should be spoken, so that I can update chnames?
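The log is consistent with a lookup that falls back to the raw character: "ñ" has a chnames entry and is spoken by name, while "ç" has none and is handed to the synthesizer unchanged. A minimal Python 3 sketch under that assumption (the dictionary is a cut-down, hypothetical stand-in, and _ is an identity stand-in for the gettext function):

```python
def _(message: str) -> str:
    # Stand-in for gettext's translation function: returns the text unchanged.
    return message


# Cut-down, hypothetical version of the chnames dictionary.
chnames = {
    u"\u00F1": _("n tilde"),
}


def get_character_name(character: str) -> str:
    """Return the spoken name for character, or the character itself."""
    return chnames.get(character, character)


print(get_character_name("ñ"))  # n tilde
print(get_character_name("ç"))  # ç (no entry, so the raw character is passed on)
```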
(In reply to comment #7) > Where can I find a full list of such characters, their unicode > values and how they should be spoken, so that I can update chnames? unicode.org has tables galore. Try: http://www.unicode.org/charts/PDF/U0080.pdf As for how they should be spoken, you'll see that some adjustment is in order. For instance: 00E7 LATIN SMALL LETTER C WITH CEDILLA probably should just be "c cedilla"
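As a starting point for that adjustment, the Unicode character names that Python exposes through unicodedata.name() can be shortened mechanically. The helper below is hypothetical and only a sketch: it drops the "LATIN SMALL/CAPITAL LETTER" prefix and the word "WITH", keeps the case of the base letter, and lowercases the rest. Its output would still need review (it says "u diaeresis" rather than "u umlaut", for example) and translation.

```python
import unicodedata

# Prefixes of Unicode names for Latin letters, and whether they are capitals.
_LETTER_PREFIXES = (
    ("LATIN CAPITAL LETTER ", True),
    ("LATIN SMALL LETTER ", False),
)


def spoken_name(ch: str) -> str:
    """Derive a short spoken name such as "c cedilla" from the Unicode name."""
    name = unicodedata.name(ch, "")
    for prefix, is_capital in _LETTER_PREFIXES:
        if name.startswith(prefix):
            # "C WITH CEDILLA" -> base "C", mark "CEDILLA"; "THORN" has no mark.
            base, _sep, mark = name[len(prefix):].partition(" WITH ")
            if not is_capital:
                base = base.lower()
            return f"{base} {mark.lower()}".strip()
    # Non-letters (e.g. "CENT SIGN") are lowercased; unnamed characters
    # such as control characters fall back to the character itself.
    return name.lower() or ch


for ch in "çÇñÞ¢":
    print(ch, spoken_name(ch))
```

Keeping the base letter's case is one way to distinguish pairs such as "C cedilla" and "c cedilla", or "THORN" and "thorn".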
Just a note of caution: I think there are about 50000 unicode characters to consider. I don't think we want all of those in Orca. As a fallback, we should depend upon the synthesizer to do the right thing and speak alphanumerics appropriately. For now, I think chnames.py should be viewed generally as something to contain the exceptions for what we would not expect a user's TTS engine to speak when the engine is using the locale of the user. For example, punctuation, bullets, and a few others. n tilde probably shouldn't be in there. If one wanted to err on the cautious side a little bit, however, it might be OK to include characters 160-255 of the ISO LATIN-1 character set: http://htmlhelp.com/reference/charset/latin1.gif Keep in mind, however, that our translators will need to come up with translations for each of these. As such, I'd still opt for relying upon the TTS engine to do the right thing with alphanumerics from the user's character set.
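If the Latin-1 range were included, it helps that ISO Latin-1 codes 160-255 have the same numeric values as the corresponding Unicode code points, so each decimal code maps directly onto a u'\u00XX' key of the kind chnames already uses (241 is U+00F1, the "n tilde" entry). A small Python 3 sketch; the function name is hypothetical:

```python
def latin1_table(start: int = 160, end: int = 255):
    """Return (decimal, escape, character) rows for ISO Latin-1 codes start..end."""
    rows = []
    for code in range(start, end + 1):
        # Decoding a single byte as Latin-1 yields the Unicode character with
        # the same numeric value, so these codes are also Unicode code points.
        ch = bytes([code]).decode("latin-1")
        rows.append((code, "\\u%04X" % code, ch))
    return rows


for code, escape, ch in latin1_table(241, 243):
    print(code, escape, ch)
```

This prints "241 \u00F1 ñ", "242 \u00F2 ò" and "243 \u00F3 ó", which gives the Unicode key for each candidate entry; the spoken wording and its translations would still have to be written by hand.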
> user. For example, punctuation, bullets, and a few others. n tilde probably > shouldn't be in there. That's what I thought too. It's there because of this comment: (http://bugzilla.gnome.org/show_bug.cgi?id=416971#c7) > I'm closing bug 418147 with the intent that the patch for this bug will include > at least the ñ from niño. :-) No need to go crazy with all characters, > though.
> If one wanted to err on the cautious side a little bit,
> however, it might be OK to include characters 160-255
> of the ISO LATIN-1 character set:

Before I spend more time on this than I have to, I want to clarify exactly what I should put in chnames. For example, 162 has a description of "cent sign", but we currently have just "cent" in chnames for that one. I'd like Will or Mike to go through the list below and tell me exactly what you'd like me to use as the values of the chnames dictionary entries. For example, how are we going to tell the difference between "A acute" and "a acute", and between "THORN" and "thorn"?

    160 non-breaking space
    161 inverted exclamation
    162 cent sign
    163 pound sign
    164 currency sign
    165 yen sign
    166 broken bar
    167 section sign
    168 umlaut or diaeresis
    169 copyright sign
    170 feminine ordinal
    171 left angle quotes
    172 logical not sign
    173 soft hyphen
    174 registered trademark
    175 spacing macron
    176 degree sign
    177 plus-minus sign
    178 superscript 2
    179 superscript 3
    180 spacing acute
    181 micro sign
    182 paragraph sign
    183 middle dot
    184 spacing cedilla
    185 superscript 1
    186 masculine ordinal
    187 right angle quotes
    188 one quarter
    189 one half
    190 three quarters
    191 inverted question mark
    192 A grave
    193 A acute
    194 A circumflex
    195 A tilde
    196 A umlaut
    197 A ring
    198 AE ligature
    199 C cedilla
    200 E grave
    201 E acute
    202 E circumflex
    203 E umlaut
    204 I grave
    205 I acute
    206 I circumflex
    207 I umlaut
    208 ETH
    209 N tilde
    210 O grave
    211 O acute
    212 O circumflex
    213 O tilde
    214 O umlaut
    215 multiplication sign
    216 O slash
    217 U grave
    218 U acute
    219 U circumflex
    220 U umlaut
    221 Y acute
    222 THORN
    223 sharp s
    224 a grave
    225 a acute
    226 a circumflex
    227 a tilde
    228 a umlaut
    229 a ring
    230 ae ligature
    231 c cedilla
    232 e grave
    233 e acute
    234 e circumflex
    235 e umlaut
    236 i grave
    237 i acute
    238 i circumflex
    239 i umlaut
    240 eth
    241 n tilde
    242 o grave
    243 o acute
    244 o circumflex
    245 o tilde
    246 o umlaut
    247 division sign
    248 o slash
    249 u grave
    250 u acute
    251 u circumflex
    252 u umlaut
    253 y acute
    254 thorn
    255 y umlaut

> n tilde probably shouldn't be in there.

I'm confused. "n tilde" is code 241, which falls in the range 160-255. Should we keep it or remove it? Are there any others that should be removed?
(In reply to comment #11)
> > If one wanted to err on the cautious side a little bit,
> > however, it might be OK to include characters 160-255
> > of the ISO LATIN-1 character set:
>
> Before I spend more time on this than I have to, I want to clarify
> exactly what I should put in chnames. For example, 162 has a
> description of "cent sign" but we've currently just got "cent"
> in chnames for that one.

I think the first question to answer is whether we want to err on the cautious side or not. For now, I lean towards thinking that we do not need to do this. Instead, we can rely upon the TTS engine to know about the alphanumerics for the user's locale. This may or may not be the right thing to do, and I think we need input from our users regarding what they get from their speech synthesis engines.

Javier, with Rich's patch, does your TTS engine now speak characters as you expect?
Created attachment 88560 [details] [review] Revised patch (sans debug messages). Patch committed. We might need to add a swag of entries in chnames, but let's see what the TTS engines do in the given languages first.
Setting the bug into a [pending] state.
In the team meeting today, we decided to include the extended ISO-LATIN 1 character set and Mike volunteered to get the wording for each character.
I emailed Rich a list so I'm removing my name from the summary. Rich if you need anything else on this front, you know where to find me -- or rather where to put my name. ;-)
Yup. That's fine, and thanks! I plan to add a patch to this bug with all the new chnames entries (hopefully this afternoon or tomorrow).
Created attachment 88705 [details] [review]
Patch to extend the chnames dictionary.

I've combined what Joanie created with what was already in chnames.py. As Joanie mentioned, in some cases an entry was already present and what is spoken now differs. Will commented that "either way is fine", so I've taken what Joanie gave me. Please let me know if I need to revert any of these to the old entries. They are:

    WAS: chnames["."] = _("dot")
    NOW: chnames["."] = _("period")
    WAS: chnames["_"] = _("underscore")
    NOW: chnames["_"] = _("underline")
    WAS: chnames["<"] = _("less than")
    NOW: chnames["<"] = _("less")
    WAS: chnames[">"] = _("greater than")
    NOW: chnames[">"] = _("greater")
    WAS: chnames["|"] = _("vertical line")
    NOW: chnames["|"] = _("vertical bar")
    WAS: chnames["`"] = _("grave accent")
    NOW: chnames["`"] = _("grave")
    WAS: chnames[u'\u00A2'] = _("cent")
    NOW: chnames[u'\u00a2'] = _("cents")
    WAS: chnames[u'\u00A3'] = _("pound")
    NOW: chnames[u'\u00a3'] = _("pounds")
    WAS: chnames[u'\u00AC'] = _("not")
    NOW: chnames[u'\u00ac'] = _("logical not")
    WAS: chnames[u'\u00B0'] = _("degree")
    NOW: chnames[u'\u00b0'] = _("degrees")
    WAS: chnames[u'\u00B1'] = _("plus minus")
    NOW: chnames[u'\u00b1'] = _("plus or minus")
    WAS: chnames[u'\u00B2'] = _("2 superscript")
    NOW: chnames[u'\u00b2'] = _("superscript 2")
    WAS: chnames[u'\u00B3'] = _("3 superscript")
    NOW: chnames[u'\u00b3'] = _("superscript 3")
    WAS: chnames[u'\u00BC'] = _("one quarter")
    NOW: chnames[u'\u00bc'] = _("one fourth")
    WAS: chnames[u'\u00BE'] = _("three quarters")
    NOW: chnames[u'\u00be'] = _("three fourths")

Testing for uppercase in sayCharacter() in default.py now does character.decode("UTF-8").isupper(). The sayCharacter() method now also calls chnames.getCharacterName(character) rather than just passing "character" to speech.speak().

Patch not committed yet. Please test.
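Why the decode matters for the uppercase test: a UTF-8 encoded character is a sequence of bytes, and byte-oriented case tests only recognize ASCII letters. The Python 3 analogue below (the function name is hypothetical) shows the difference for "Ç":

```python
def is_uppercase_character(character: bytes) -> bool:
    """Decode a UTF-8 key event string before testing its case."""
    return character.decode("UTF-8").isupper()


raw = "Ç".encode("UTF-8")  # b'\xc3\x87'
print(raw.isupper())                # False: bytes.isupper() only considers ASCII letters
print(is_uppercase_character(raw))  # True
```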
The only change I'd like to see here is to put < and > back the way we had them. This is just a personal preference, so if anyone has a lot of pushback I'm OK with whatever.
(In reply to comment #19) > The only changes I'd like to see here are: I'd like to see < and > put back > the way we had them. This is just a personal preference so if anyone really > has a lot of pushback I'm OK with whatever. Works well for me. Since Mike is OK with whatever, I say commit the patch as is -- "less" and "greater" seem to be the de facto standard. We can always deviate if users complain. Thanks Rich!
Patch committed. Closing as FIXED.
One thing that has to change is the period. The problem here is decimals, URLs and email addresses: they just do not sound right with the new implementation. In Orca we currently handle this symbol differently depending on whether it marks the end of a sentence or some other type of expression. We should either change "period" back to "dot", or speak "period" in normal text and "dot" everywhere else. I just changed chnames.py back to "dot" and now get the behavior I would expect, but I'll leave the check-in to Rich so he can decide on the proper solution.
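The second option, "period" at the end of a sentence and "dot" elsewhere, could be approximated by looking at what follows the character. The sketch below is a hypothetical heuristic, not what Orca does: it treats a "." as sentence-ending only when it is the last character or is followed by whitespace. That covers decimals, URLs and email addresses, but it would still say "period" for the second dot in an abbreviation such as "e.g. this".

```python
def name_for_period(text: str, index: int) -> str:
    """Choose "period" or "dot" for the '.' at text[index]."""
    if text[index] != ".":
        raise ValueError("text[index] must be '.'")
    following = text[index + 1:index + 2]  # "" when the '.' ends the text
    return "period" if following == "" or following.isspace() else "dot"


for sample in ["Hello.", "3.14", "www.gnome.org", "user@example.com"]:
    index = sample.index(".")
    print(sample, name_for_period(sample, index))
```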
Changing this back to a dot seems fine with me. Will, okay with you?
Fine by me. Thanks!
chnames entry for "." changed from "period" back to "dot". Checked into SVN trunk/HEAD.