GNOME Bugzilla – Bug 464754
Pronunciation dictionary checks should be case insensitive.
Last modified: 2008-07-22 19:32:25 UTC
We need to make the checks for entries in the pronunciation dictionary case insensitive. For example, the "strikethrough" entry (which gets replaced with "strike through") might occur at the beginning of a sentence, in which case the first letter is capitalized. In that case, it currently won't match the "strikethrough" entry in the pronunciation dictionary.
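A minimal sketch of the mismatch described above, written in modern Python rather than the Python 2 in use at the time. The dictionary contents and both function names are illustrative assumptions, not Orca's actual code.

```python
# Illustrative only: a one-entry stand-in for the pronunciation dictionary.
pronunciation_dict = {"strikethrough": "strike through"}


def lookup_exact(word):
    """Case-sensitive lookup; returns the word unchanged when there is no entry."""
    return pronunciation_dict.get(word, word)


def lookup_case_insensitive(word):
    """Lower-case the word before the lookup (assumes keys are stored lower-cased)."""
    return pronunciation_dict.get(word.lower(), word)


print(lookup_exact("Strikethrough"))             # Strikethrough (no match)
print(lookup_case_insensitive("Strikethrough"))  # strike through
```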
Created attachment 93304
Patch to hopefully fix this. Not committed yet. The patch is against SVN HEAD, although if it's what we want, there is no reason why it couldn't be applied to the gnome-2-20 branch as well.
I'm wondering if, instead of altering the entries written by the user, we want to just be sure that we alter the strings prior to doing the comparison. Mostly this is minor: If I typed the actual word in all caps (like I would an acronym) or used an initial cap (like I would a name), I would prefer it remain that way the next time I look at the dictionary.

However, I stumbled across an interesting/more significant case: Enter JOSÉ as the actual word, come up with a replacement string, press OK, then quit and restart Orca.

Results: Actual word becomes: josÉ

In addition, the replacement string is not used for any of the following:

    JOSÉ
    José
    josé
    josÉ <-- surprised me

If you enter the actual string as José, the replacement string works for all four of the above variants.
> I'm wondering if, instead of altering the entries written by the user, we want
> to just be sure that we alter the strings prior to doing the comparison.

I'm not sure I follow you. What do you mean by "just be sure that we alter the strings prior to doing the comparison"? I believe the attached patch does that, but in order to successfully compare against potential pronunciation dictionary entries, the keys for that dictionary need to be in lower case too.

> However, I stumbled across an interesting/more significant case:

Yup. That's a biggie. I've no idea what the best way of solving that problem is.
> I'm wondering if, instead of altering the entries written by the user, we want
> to just be sure that we alter the strings prior to doing the comparison.
> Mostly this is minor: If I typed the actual word in all caps (like I would an
> acronym) or used an initial cap (like I would a name), I would prefer it remain
> that way the next time I look at the dictionary.

This is a good point. The problem we're facing is that we have the string acting as the key for the dictionary. I wonder if we should consider something like the following:

    ...
    pronunciation_dict[_("ASAP").lower()] = _("as soon as possible")
    ...

and then:

    ...
    if pronunciations is not None:
        return pronunciations[word.lower()]
    else:
        return pronunciation_dict[word.lower()]
    ...
Okay, that can be done. Will, how should the second problem in comment #3 be fixed?
Will, if we do your proposal in comment #4, what should be written out to the ~/.orca/user-settings.py and ~/.orca/app-settings/<APPNAME>.py files instead of lines like:

    pronunciation_dict[_("ASAP")] = _("as soon as possible")

If it's:

    pronunciation_dict[_("ASAP").lower()] = _("as soon as possible")

then how do we rebuild the pronunciation list in the Orca preferences dialog correctly with the user's original capitalized Actual words?
(In reply to comment #6)
> Will, if we do you proposal in comment #4, what should be written
> out to the ~/.orca/user-settings.py and ~/.orca/app-settings/<APPNAME>.py
> file instead of lines like:
>
> pronunciation_dict[_("ASAP")] = _("as soon as possible")
>
> If it's:
>
> pronunciation_dict[_("ASAP").lower()] = _("as soon as possible")
>
> then how do we rebuild the pronunciation list in the Orca preferences
> dialog correctly with the user's original capitalized Actual words?

Darn. You're right. Hmmm...this is getting ugly. A dirty hack is to loop through the keys. Something like:

    lowerWord = word.lower()
    if not pronunciations:
        pronunciations = pronunciation_dict
    for key in pronunciations:
        if key.lower() == lowerWord:
            return pronunciations[key]
    return word

Another alternative might be to consider reworking the pronunciations API to expose a method such as addPronunciation(word, pronunciation). Internally, we could then do whatever we want. For example, we could maintain two parallel dictionaries -- one for the word/pronunciation as spelled by the user and another that uses lower case words for the keys.
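The "two parallel dictionaries" alternative could look roughly like the sketch below. It is written in modern Python, and every name in it (add_pronunciation, get_pronunciation, user_entries and the two module-level dictionaries) is hypothetical; the thread only proposes the idea and does not show an implementation.

```python
# Hypothetical sketch of the parallel-dictionary idea; not Orca's actual API.
_entries_as_typed = {}   # word exactly as the user typed it -> replacement
_entries_lowered = {}    # lower-cased word -> replacement (used for matching)


def add_pronunciation(word, pronunciation):
    """Record the entry twice: once as typed, once under a lower-cased key."""
    _entries_as_typed[word] = pronunciation
    _entries_lowered[word.lower()] = pronunciation


def get_pronunciation(word):
    """Case-insensitive lookup; returns the word itself when nothing matches."""
    return _entries_lowered.get(word.lower(), word)


def user_entries():
    """Entries with their original capitalization, e.g. for a preferences list."""
    return sorted(_entries_as_typed.items())


add_pronunciation("ASAP", "as soon as possible")
print(get_pronunciation("asap"))  # as soon as possible
print(user_entries())             # [('ASAP', 'as soon as possible')]
```

The cost of this layout is that the two dictionaries must be kept in sync on every add or removal, which is one reason to hide them behind a single function.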
Or maybe just write out lines like:

    pronunciation_dict[_("ASAP").lower()] = [ _("ASAP"), _("as soon as possible") ]

and adjust the list rebuilding code and the getPronunciation() routine to understand that the values are now a list.
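As a rough illustration of what "understanding that the values are now a list" might involve, the sketch below does a lookup and rebuilds the preferences rows from list-valued entries. It is modern Python, the function names are assumptions, and the gettext _() wrapper is left out to keep the example self-contained.

```python
# Hypothetical layout: lower-cased key -> [word as typed, replacement].
pronunciation_dict = {
    "asap": ["ASAP", "as soon as possible"],
    "mhz": ["MHz", "megahertz"],
}


def get_pronunciation(word, pronunciations=None):
    """Return the replacement (second list element), or the word if unmatched."""
    table = pronunciation_dict if pronunciations is None else pronunciations
    entry = table.get(word.lower())
    return entry[1] if entry is not None else word


def rebuild_preferences_rows(table):
    """Recover (actual word, replacement) pairs with the user's capitalization."""
    return sorted((actual, replacement) for actual, replacement in table.values())


print(get_pronunciation("Asap"))
print(rebuild_preferences_rows(pronunciation_dict))
```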
Will, still need your reply on second problem that Joanie raised in comment #2 (sorry, not comment #3).
(In reply to comment #8)
> Or maybe just write out lines like:
>
> pronunciation_dict[_("ASAP").lower()] = [ _("ASAP"), _("as soon as
> possible") ]
>
> and adjust the list rebuilding code and the getPronunciation()
> routine to understand that the values are now a list.

That seems like a reasonable alternative.

(In reply to comment #9)
> Will, still need your reply on second problem that Joanie raised in
> comment #2 (sorry, not comment #3).

Well, it looks like the lower() of JOSÉ and José are different:

    Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "José".lower()
    'jos\xc3\xa9'
    >>> "JOSÉ".lower()
    'jos\xc3\x89'

As for how to handle this, I'm not sure. Maybe we need to store both the original and lower() forms as keys?
(In reply to comment #10)
> As for how to handle this, I'm not sure. Maybe we need to store both the
> original and lower() forms as keys?

....and upper() form as well, I guess...

I think the route of adding new API (addPronunciation) might be the thing that would provide us with the most flexibility here. We might also consider making the pronunciation thing a full blown class instead of just being a module.
> As for how to handle this, I'm not sure. Maybe we need to store both the
> original and lower() forms as keys?

Ugh! ;-) There are a swag of Unicode characters like this (see work that Joanie did in chnames.py). Are you suggesting that for each character in the potential key for a pronunciation dictionary entry, we are going to have to check to see if it's one of these special characters, and if it is, then we are going to have to generate two dictionary entries? I think that's what you are saying. Just want to make sure.
> There are a swag of Unicode characters like this
> (see work that Joanie did in chnames.py). Are you suggesting
> that for each character in the potential key for a pronunciation
> dictionary entry, we are going to have to check to see if it's
> one of these special characters, and if it is, then we are going to
> have to generate two dictionary entries? I think that's what you are
> saying. Just want to make sure?

I'm really not sure of the right solution here. It seems like the solution should handle the case where the user enters the following word/replacement: "Ééks"/"yikes". With this case, we should get "yikes" for any of the following words if indeed "É".lower() should be "é" and "é".upper() should be "É":

    ééks
    Ééks
    éÉks
    ÉÉks

From the following Unicode descriptions, it seems as though the upper/lower definitions of "é" and "É" are well defined and are what we would expect (see the "Upper case" and "Lower case" portions on each page):

    http://www.fileformat.info/info/unicode/char/00e9/index.htm
    http://www.fileformat.info/info/unicode/char/00c9/index.htm

Given this, it seems as though Python might be broken. I'm not sure we want to incur a whole lot of overhead in a frequently called portion of the code in an attempt to fix an underlying Python bug. Instead, I propose we attempt the string.lower() solution as a minimum, and I'd also propose that we attempt to hide that detail from the user-settings.py file. That is, internally we can go with whatever solution we want, but externally (i.e., in user-settings.py), we save/restore the exact strings that the user typed. With this, we can do our own brute force matching technique if we really need to.
If you convert the string to unicode first and then call lower(), I believe you get the expected results.
> >>> "José".lower()
> 'jos\xc3\xa9'
> >>> "JOSÉ".lower()
> 'jos\xc3\x89'

    >>> "José".lower()
    'jos\xc3\xa9'
    >>> "JOSÉ".decode("UTF-8").lower().encode("UTF-8")
    'jos\xc3\xa9'
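The sessions above are Python 2, where a plain string literal is a byte string, so lower() was operating on UTF-8 bytes rather than on characters. A rough modern-Python equivalent of the same distinction, with bytes standing in for the old str type and illustrative variable names, is sketched here:

```python
# "JOSÉ" as UTF-8 bytes; the escape avoids depending on source-file encoding.
upper_bytes = "JOS\u00c9".encode("utf-8")

# bytes.lower() only maps ASCII letters, so the two bytes of "É" stay as they are.
bytewise = upper_bytes.lower()

# Decoding to text first lets lower() apply Unicode case mapping to "É".
textwise = upper_bytes.decode("utf-8").lower().encode("utf-8")

print(bytewise)  # b'jos\xc3\x89'
print(textwise)  # b'jos\xc3\xa9'
```

So the case mapping of "É" was never wrong in Unicode terms; the lookup simply needs to lower-case decoded text instead of raw bytes.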
Well dang! That was sitting right in my window and I swore I didn't see it. I must be looking at too many different problems at once today.

    >>> originalWord = "Ééks"
    >>> keyWord = originalWord.decode("UTF-8").lower()
    >>> a = {}
    >>> a[keyWord] = [originalWord, "yikes"]
    >>> for word in ["ééks", "Ééks", "éÉks", "ÉÉks"]:
    ...     lowerWord = word.decode("UTF-8").lower()
    ...     print a[lowerWord]
    ...
    ['\xc3\x89\xc3\xa9ks', 'yikes']
    ['\xc3\x89\xc3\xa9ks', 'yikes']
    ['\xc3\x89\xc3\xa9ks', 'yikes']
    ['\xc3\x89\xc3\xa9ks', 'yikes']

Rich, hopefully you have enough information at your fingertips to propose a solution.
> Rich, hopefully you have enough information at your fingertips to propose a
> solution.

Yup. Thank you both. I'll probably leave this to next week now. I've got my head wrapped around OOo sbase bugs at the moment.
Created attachment 93589
Revised patch.

Patch against SVN HEAD. It seems to fix up the recent problems found by Joanie. Tested with the following pronunciation added to the global settings:

    Actual String    Replacement String
    -------------------------------------------
    "josé"           hose eh california

Then with the following three lines in gedit:

    Can you tell me the way to San José
    Can you tell me the way to San josé
    Can you tell me the way to San JOSÉ

This patch also fixes a bug that seems to have gone unnoticed. Namely, if you had set an application specific pronunciation, then did an Insert-Control-Space and tried to change it to something else, the application specific pronunciation list would have been re-filled with the global pronunciations instead.

Patch not committed yet. Please test.
Oh, I forgot to mention. We have a flag day here. The new pronunciation dictionary entries in ~/.orca/user-settings.py and the application specific files now look like:

    # User customized pronunciation dictionary settings
    #
    import orca.pronunciation_dict
    orca.pronunciation_dict.pronunciation_dict={}
    orca.pronunciation_dict.pronunciation_dict["asap"]=[ "ASAP", "as soon as possible" ]
    orca.pronunciation_dict.pronunciation_dict["ghz"]=[ "GHz", "super gigahertz" ]
    orca.pronunciation_dict.pronunciation_dict["imap"]=[ "IMAP", "eye map" ]
    orca.pronunciation_dict.pronunciation_dict["josé"]=[ "josé", "hose eh california" ]
    orca.pronunciation_dict.pronunciation_dict["ldap"]=[ "LDAP", "ell dap" ]
    orca.pronunciation_dict.pronunciation_dict["lol"]=[ "LOL", "laughing out loud" ]
    orca.pronunciation_dict.pronunciation_dict["mhz"]=[ "MHz", "megahertz" ]
    orca.pronunciation_dict.pronunciation_dict["selinux"]=[ "SELinux", "ess ee linux" ]
    orca.pronunciation_dict.pronunciation_dict["strikethrough"]=[ "strikethrough", "strike through" ]

and

    # User customized application specific pronunciation dictionary settings
    #
    import orca.pronunciation_dict

    def overridePronunciations(script, pronunciations):
        pronunciations["asap"]=[ "ASAP", "as soon as possible" ]
        pronunciations["ghz"]=[ "GHz", "ultra mega super gigahertz" ]
        pronunciations["imap"]=[ "IMAP", "eye map" ]
        pronunciations["ldap"]=[ "LDAP", "ell dap" ]
        pronunciations["lol"]=[ "LOL", "laughing out loud" ]
        pronunciations["mhz"]=[ "MHz", "megahertz" ]
        pronunciations["selinux"]=[ "SELinux", "ess ee linux" ]
        pronunciations["strikethrough"]=[ "strikethrough", "strike through" ]
        return pronunciations

    orca.settings.overridePronunciations = overridePronunciations

This means that if you have any of the old-style entries, Orca is going to chuck a wobblie (um, throw a Traceback).

Is just informing the users on the Orca mailing list enough here, or should we be trying to test for such occurrences and programming around them? I'd prefer the former.
What about informing the users and doing *minimal* programming around them?

I just tried this on a clean machine where I don't have any existing entries other than the old-style ones in user-settings.py. The resulting wobblie being chucked (I love that) keeps the Orca Preferences dialog from ever appearing. I assume this will impact any user upgrading to the latest Orca.

It seems that pronunciation_dict.py already wraps its stuff in a try/except so that wouldn't spit up. So doesn't that just leave orca_gui_prefs.py's _createPronunciationTreeView()? Perhaps there, we could test to see if we have old-style entries and, if so, set pronDict to pronunciation_dict.pronunciation_dict? Less work than trying to do a conversion....

Should we go this route, the note to the list would be to inform users that their dictionary's getting blown away -- as opposed to suggesting that they need to blow away their user-settings.py (and corresponding app settings files) or manually edit out the old-style entries from each affected file.
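One way the "test for old-style entries and fall back" idea might be expressed is sketched below in modern Python. The helper names and the two-element-list check are assumptions made for illustration; the thread does not show how _createPronunciationTreeView() actually performs the test.

```python
def is_new_style(entry):
    """New-style values are [word as typed, replacement]; old-style are plain strings."""
    return isinstance(entry, (list, tuple)) and len(entry) == 2


def choose_dictionary(user_dict, default_dict):
    """Use the user's dictionary only if every entry is new-style.

    Otherwise fall back to the defaults, which discards the old-style entries
    instead of raising while the preferences list is being built.
    """
    if all(is_new_style(value) for value in user_dict.values()):
        return user_dict
    return default_dict


defaults = {"asap": ["ASAP", "as soon as possible"]}
old_style = {"ASAP": "as soon as possible"}
new_style = {"lol": ["LOL", "laughing out loud"]}

print(choose_dictionary(old_style, defaults) is defaults)   # True
print(choose_dictionary(new_style, defaults) is new_style)  # True
```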
I'd like it if we didn't have to inflict on our translators the work of translating the same string twice (e.g., 'asap' and 'ASAP'):

    +pronunciation_dict[_("asap")] = [ _("ASAP"), _("as soon as possible") ]

Going down the route of adding new API (addPronunciation or setPronunciation) might be the thing that would provide us with the most flexibility here:

    pronunciation_dict.setPronunciation(_("ASAP"), _("as soon as possible"))
Created attachment 93604
Second revised patch.

Further revised patch based on previous comments from Joanie and Will.

- Adds a setPronunciation() routine in pronunciation_dict.py (no new strings).
- Adds a try/except clause around the setting of model entries when the pronunciation list is created for the Orca preferences, with the except clause hopefully handling old-style ~/.orca/user-settings.py.

I didn't see an easy way to use setPronunciation() for the setting of pronunciation entries for the application specific files. These look different. I.e.:

    # User customized application specific pronunciation dictionary settings
    #
    import orca.pronunciation_dict

    def overridePronunciations(script, pronunciations):
        pronunciations["asap"]=[ "ASAP", "as soon as possible" ]
        pronunciations["ghz"]=[ "GHz", "ultra mega super gigahertz" ]
        pronunciations["imap"]=[ "IMAP", "eye map" ]
        pronunciations["ldap"]=[ "LDAP", "ell dap" ]
        pronunciations["lol"]=[ "LOL", "laughing out loud" ]
        pronunciations["mhz"]=[ "MHz", "megahertz" ]
        pronunciations["selinux"]=[ "SELinux", "ess ee linux" ]
        pronunciations["strikethrough"]=[ "strikethrough", "strike through" ]
        return pronunciations

    orca.settings.overridePronunciations = overridePronunciations

Will/Joanie, if you have any ideas on how to rewrite this to use setPronunciation(), please let me know. Otherwise I think we can leave it the way it is (compatible with the way that application specific key bindings are done).

Thoughts?
I wrote:
> Will/Joanie if you have any ideas on how to rewrite this to use
> setPronunciation(), please let me know.

Actually, I think I can see how this can be done. I'll work on a further revised patch tomorrow morning.
Created attachment 93649
Fourth version. Adjusts _writePronunciation() in app_prefs.py to use orca.pronunciation_dict.setPronunciation().
I noticed something that seems to be a (positive) side effect of this change: Before we were initially populating the app-specific dictionaries with the existing entries; now we're not. I really like seeing just those things I've added within an app appear in the tree for that app.

So far this patch seems to be working *most of the time*. Every once in a while, however, it seems that one dictionary is stomping on the other. But it's rare and I haven't yet worked out why. My test case has been two entries for my first name:

user-settings.py:

    orca.pronunciation_dict.setPronunciation("Joanmarie", "Joan Marie")

app-settings/gedit.py:

    orca.pronunciation_dict.setPronunciation("Joanmarie", "JD", pronunciations)

I then created a single line document in Gedit and another in Writer, each containing just my first name. I'm Alt Tabbing between these, arrowing off, and then back on my name. I'd say 8 or 9 times out of 10, Orca gets it right; the other time or two it doesn't.

There was also one time when I went back into the Gedit preferences and didn't have my entry listed. That I cannot reproduce.
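For readers without the patch at hand, the two setPronunciation() calls above suggest a setter that takes an optional target dictionary. The sketch below shows one plausible shape in modern Python; the function bodies, the snake_case names, and in particular the fallback from an application dictionary to the global one are assumptions, not the committed Orca code.

```python
pronunciation_dict = {}  # stand-in for the global dictionary


def set_pronunciation(word, replacement, pronunciations=None):
    """Store [word as typed, replacement] under a lower-cased key.

    Writes to the given app-specific dictionary, or to the global one if none
    is passed.
    """
    target = pronunciation_dict if pronunciations is None else pronunciations
    target[word.lower()] = [word, replacement]


def get_pronunciation(word, pronunciations=None):
    """Prefer the app-specific entry; fall back to the global one (assumed)."""
    key = word.lower()
    if pronunciations is not None and key in pronunciations:
        return pronunciations[key][1]
    entry = pronunciation_dict.get(key)
    return entry[1] if entry is not None else word


# Mirrors the two test entries: one global, one for a single application.
set_pronunciation("Joanmarie", "Joan Marie")
gedit_pronunciations = {}
set_pronunciation("Joanmarie", "JD", gedit_pronunciations)

print(get_pronunciation("joanmarie"))                        # Joan Marie
print(get_pronunciation("JOANMARIE", gedit_pronunciations))  # JD
```

Keeping the app-specific entries in their own dictionary, rather than copying the global entries into it, matches the side effect noted above: each application's list shows only what was added for that application.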
Thanks for testing this. It's loadAppSettings() in focus_tracking_presenter.py that will load an application specific pronunciation dictionary (see about line 419). That method is called from _processObjectEvent() in focus_tracking_presenter.py at about line 619, if it's seen a "window:activate" event or a "focus:" event on a FRAME object. I'd guess that the times that it's not working correctly for you, you aren't getting either of those trigger events.
The other possibility is that something else is coming along (say like that network applet) that's running in the "background", and that's calling loadAppSettings() for itself and there goes the gedit settings. Dunno. Just thinking out loud. A line or two of debug just before the call to loadAppSettings() would confirm or deny that theory.
It's still pretty iffy in the reproducibility department, but I think that your first theory (comment 26) is the correct one. I added some debug lines to spit out orca_state.activeScript after it was set and just before loadAppSettings() was called. When Orca fails to use the correct dictionary, there's no output from these lines. I didn't see any unexpected lines from other app(let)s. More noteworthy is that I cannot reproduce this issue to save my life when compiz is disabled. That's not this patch.... I suspect that we're going to start seeing these sorts of issues when Gutsy is released. (Another issue: We don't speak the selected item during the Alt+Tab when compiz is enabled). It would seem that if compiz is enable-able, it now is by default in Gutsy. Guess it's time to start paying attention to it....
I tested this, modifying both the global user-settings.py and app-specific settings for gnome-terminal.py. Seems to work like a charm. I'm not sure I beat on it long enough, but I was unable to reproduce the problem Joanie saw. Then again, I don't use Compiz. Given that the problem Joanie saw seems like it might be related to Compiz use, and since Compiz seems to have its own bugs, I think it might be safe to commit this patch. I'll let Joanie poke more, though, if she wants to. BTW, at some point, I think we need to consider turning settings.py into a class and giving each script instance its own settings class instance. I'm not sure if that will be better/worse than what we're doing now, but it might be worth a thought experiment at some time for GNOME 2.22.
I've only committed it to SVN HEAD. As I tried to patch it into the gnome-2-20 branch, I realized that I'd totally refactored orca_prefs.py in SVN HEAD, and this patch isn't going to easily fit into the old code. Will, if you think this patch is essential for Orca for gnome-2-20, let me know and I'll try to work out the equivalent for the old code.
Will and I chatted this morning about this, and I said it should be possible to rework the patch to fit the gnome-2-20 code and we probably should do this because Halim found the problem (i.e. a RealWorld(TM) user). Having now looked into it in more detail, it's going to be quite disruptive to the existing code to wedge this into the old (non-refactored) way of doing things. So I take it back. I suggest we don't try to fix this for Orca 2.19.91, but instead fix it for the next major release. Thoughts? ...
I think this is OK as most of our users seem to build from trunk anyway. This problem doesn't really impact the core stability of Orca.
> So I take it back. I suggest we don't try to fix this for Orca 2.19.91,
> but instead such fix it for the next major release.

Given your analysis, I agree. Thanks!
Putting the bug into a "[pending]" state.
Closing as FIXED.