GNOME Bugzilla – Bug 570658
Refactor the speech and braille generators
Last modified: 2009-11-09 21:35:20 UTC
The speech and braille generators currently do very specific things per role. If something new comes along, such as DESCRIBED_BY in bug #568611, we can end up needing to make a lot of changes to the generators. I'm opening this bug as a means to brainstorm how we might refactor the generators. For example, one idea might be to base the output on the interfaces, relations, states, and attributes of an object rather than on the object's role.

As a means to experiment, we might try playing in the speechgenerator.py:_getDefaultSpeech and braillegenerator.py:_getDefaultBrailleRegions methods. That is, experiment with growing these methods out to look more at the interfaces/relations/states/attributes/etc. of an object and try to build the presentation up based upon those. Note that this was one of the original intents of the AT-SPI, but we ended up going down the role-specific path because we had difficulty figuring out how to make the more generic approach work.

Furthermore, in the event that we do want to provide exceptions for the presentation order of some things (e.g., states and labels of check boxes and combo boxes, just to make the speech "flow" better), we might want to consider something different from using a method. For example, perhaps a formatting string might be something to consider. A simple example might be "label text state role", but I suspect the string might need to be more expressive. This kind of string could also be used to express the default format. These strings should also be easily customized so that their values can be overridden by user preferences. An acid test use case might be the ability to customize the strings so that we can better support learning disabilities.

Finally, the generators should also be able to provide complete information to the output devices about presentation. For example, it should be possible to determine which ACSSs to use for sections of a string to speak and which dots should be used to represent additional attribute information in braille. Note that the exact ACSSs and dots to use need not be returned by the generator -- the main goal is that these things can be determined. See also bug #412656 for a discussion of the finer granularity desired for ACSS usage.
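As a purely illustrative sketch of the formatting-string idea (all names here are hypothetical, not existing Orca API):

# Toy sketch: presentation driven by a per-role formatting string
# rather than by role-specific methods. Everything here is made up.
FORMATS = {
    'default':   'label text state role',
    'check box': 'label state role',   # an ordering exception for "flow"
}

def generate(obj, role):
    """Build output by walking the role's formatting string."""
    fmt = FORMATS.get(role, FORMATS['default'])
    parts = []
    for token in fmt.split():
        # each token names a small getter on the object, e.g. getLabel
        getter = getattr(obj, 'get' + token.capitalize(), None)
        if getter:
            parts.append(getter())
    return ' '.join(p for p in parts if p)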
Apologies for the long post, but please stay with me to the end. :)

### generatorSettings.py contains: ###

A formatting dictionary with the following layout. Note that these are formatting strings with logic, so please be careful when modifying them. For each non-default role:

# The unfocused formatting string for the attributes to speak
# (already_focused: False).
roleDict['unfocusedSpeech'] = 'label + name + roleName'

# The formatting string for the attributes to speak when the object
# already has the focus (already_focused: True).
roleDict['focusedSpeech'] = 'checkState'

# The ACSS to be used for speaking the text.
roleDict['acss'] = acss

# The unfocused braille formatting string.
roleDict['unfocusedBraille'] = 'labelOrName + checkState'

# The region to get the braille focus; has to be one of the labels in the
# roleDict['unfocusedBraille'] string.
roleDict['unfocusedBrailleCursorRegion'] = 'checkState'

# The focused braille formatting string. This is usually, but not
# necessarily, the same as the above. If it is not defined, then we use
# the unfocused version.
roleDict['focusedBraille'] = 'labelOrName'

# The region to get the braille focus; has to be one of the labels in the
# roleDict['focusedBraille'] string.
roleDict['focusedBrailleCursorRegion'] = 'checkState'

# Braille attributes to send along for this role.
roleDict['brailleAttributes'] = None

# Associate the particular role dictionary into the general lookup
# dictionary.
formattingDict[role] = roleDict

### iterate until all roles ###
### end of generatorSettings.py ###

Documentation for altspeechgenerator.py:

Currently functions as a drop-in replacement for speechgenerator.py -- everything except _getSpeechForTableCell and _getSpeechForTableCellRow. Given an object, this class uses the specified strings in the settings file to return information about the object.

While there are more labels in the formatting string and the string hasn't evaluated to true: get the next label, check if it has an associated function, and if so call it with the object and its already_focused flag, then associate the returned value from the function with the label. If the current format string evaluates to true with all the values returned by the functions, then we quit; otherwise we look for the next label.

WARNING: [mmh] need to do some checking here -- what happens if all labels are undefined, or they all return an empty list?

The formatting string allows for things to be optional. The user can choose to have labels, values, and text read, but no role info, no selection info, no state info. Things are very much customizable. :)

The reason why we evaluate the string is to give us logic powers in the formatting string. For example, "include label and name if they are different" can be written as:

'(label != name and label + name) or label'

We use the fact that Python returns the last statement that it evaluates to give us the data that we need.

### Ideally: ###

We have a generator class, with the small functions defined. This class has two main functions, getSpeech and getBraille. They both use the same internal functions, since they have a lot in common. Where they differ is that each one constructs the right type of return value. I.e., getBrailleRegions returns something like [regions, componentRegion, nonDefaultBrailleAttributes] and getSpeech returns [textToBeSpoken, acssToBeUsed].

### Issues: ###

It is currently difficult to express recursive constructs such as _getSpeechForTableCell without having to rethink and restructure the function, because we don't currently keep any depth info. A possible way of getting around this is not to pass explicit variables, but to pass the functions a dictionary in which the variables that a particular function needs can be looked up. [mmh] think about this more carefully.

-------------------------

Apologies for the long post; looking forward to having some feedback on this.
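To spell out the eval trick above (a minimal, self-contained sketch; in the real generator the values would be lists of utterances rather than plain strings):

# Python's 'and'/'or' return the last operand they evaluate, so the
# formatting expression itself yields the data we want.
label, name = 'Save', 'Save'
fmt = '(label != name and label + name) or label'
result = eval(fmt, {}, {'label': label, 'name': name})
print(result)   # prints 'Save': label equals name, so it is said once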
Created attachment 129280 [details] [review] rev 0.1

Almost a drop-in replacement for speechgenerator.py, to illustrate the idea described in the previous comment. Should be run from gnome-terminal, so that we can see the difference in output.

r(length)'text'   # the text from the current getSpeech
s(length)'text'   # the text returned by the alternative speech generator
r==s              # just a quick test to compare the two strings
maybe we should also include a whereAmI formatting string for roles, to provide customization of the ordering.
(In reply to comment #3)
> maybe we should also include a whereAmI formatting string for roles, to
> provide customization of the ordering.

I think we should.
Sorry to be spammy, but what about for flat review too?
(In reply to comment #5)
> Sorry to be spammy, but what about for flat review too?

Anywhere we do presentation, we should provide the ability to customize the output. :-) So, this also includes SayAll and the results that are output when structural navigation is used. As a result, we may also need to include some consideration for *why* the presentation is being done.

In any case, Jon - your patch is a great first pass. I haven't had a chance to fully review/digest it yet since we are in the last throes of the GNOME 2.26 release and need to get some other bugs fixed first. I'll get to it, though. I promise, especially because this is something I've wanted to get done for a long, long time.
See also bug #520595, which is something that this refactor should hopefully permit.
Hi Will,

If you could have a think about this one at some point during this week, that would be great. I am busy all week, but should be able to dedicate the weekend and the start of next week to working on this if you can see this moving forward.

Thank you.

P.S. Yes, this design should allow us to close off bug #520595.
Thanks much for this work, Jon! I like what you have done so far. In particular, I like that you are breaking methods out based upon the features of the object and then piecing these things together based upon the role. That's almost exactly what I had in mind. Yeah! These can also form the building blocks for a refactor of the Where Am I code and the SayAll code that presents object information.

Having said that, we need to think about how this new stuff will be used from those areas of the code. I'd really love, for example, to get rid of all the duplicate code in WhereAmI. Perhaps the code might be restructured so that an optional formatting string can be passed into the getSpeech method. Alternatively, maybe it would be time to pull the formatting strings into the spaces where they are used. So, the focus tracking code might have its set of formatting strings, the Where Am I code might have its set, etc.

I'm curious about the syntax of the expressions used for formatting. In particular, it seems to allow only keywords. Some of the extra things I think might be good to have would be the ability to add phrases to be spoken verbatim, meaningful punctuation to effect prosody, and voice names that encompass subphrases. The voice names stuff might help with bug #412656, for example, and could give people fine control over mixing voices in the output. With bug #400729, we also want to support audio at some point. It would be nice to try to think about how audio might be incorporated into the formatting string. I also wonder if whitespace should be meaningful in some way, but I'm not sure how or why (e.g., should whitespace end up being sent to the TTS engine?).

Finally, we might need some uber function to say something like "screenOrderedLabelsAndText" (see bug #568611 and bug #516120). Right now, we place all labels ahead of the name of the object, but GUIs often put the labels to the right of the object. For example, gnome-terminal's Edit/Profile Preferences/Scrolling tab has a spin button that is labeled both to the left and right. Assuming the accessible relations were set up correctly (they aren't), I believe we'd end up speaking all the labels first and then the spin button. I assume some users will probably want these things in screen order, so we should work to support that.

From a structural and coding point of view, I just have a couple of comments. These are far less important than getting the general ideas down, so I'm not going to labor over them.

1) I think the components of generator_settings.py might either belong in the speechgenerator.py and braillegenerator.py files, or they should be split into separate speechgenerator_settings.py and braillegenerator_settings.py modules. Or they might be stuffed into the areas that call getSpeech (e.g., WhereAmI, default.py, structural_navigation.py, mouse_review.py, etc.).

2) Just a minor nit that people have minor-nitted me about in the past. The Pythonic way of doing things is "Just Do It" instead of "Look Before You Leap".
So, this code:

if generator_settings.formattingDict.has_key(role):
    roleDict = generator_settings.formattingDict[role]
else:
    roleDict = generator_settings.formattingDict['default']

is more Pythonic if written as:

try:
    roleDict = generator_settings.formattingDict[role]
except KeyError:
    roleDict = generator_settings.formattingDict['default']

There are still plenty of examples of "Look Before You Leap" style in the code, which is mostly my fault because the Pythonic nitpickers of the world didn't nitpick me until I had already written a ton of code. :-)

3) Some clarification of getLabelOrName and getLabelOrName2 would be helpful.

In any case, this is a very exciting thing for Orca. If this were the one thing you focused on for the whole of the 2.27.x release series, it would be very awesome and a great improvement overall. Many thanks for your work and contributions!
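For what it's worth, the same fallback can also be written in one expression with dict.get, which sidesteps the Look-Before-You-Leap question entirely:

roleDict = generator_settings.formattingDict.get(
    role, generator_settings.formattingDict['default'])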
Jon - how is this going? Would you welcome some collaboration on this work?
Created attachment 134254 [details] [review] rev 1, should work for getSpeech

Hi Will,

Collaboration would be great; it was tablecell and tablerow that got me stuck.

With this new getSpeech, we use the eval to give us the next function name to be called; this ensures we only call functions in short-circuit logic order. This new getSpeech takes obj and any number of args. The dict of args gets passed around, and the functions that need something from it will grab the variables they require. Sometimes we overwrite things in the args dict, but we have to remember to reset the values when we have done our calls.
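Roughly, the args-dict pattern looks like this (a minimal sketch with made-up method names; not the attached patch):

# One dict of args travels through every generator method; each method
# pulls out what it needs and may add to it, remembering to reset
# values afterwards. All names below are illustrative.
def _getName(obj, **args):
    return [getattr(obj, 'name', '')]

def _getRoleName(obj, **args):
    if args.get('already_focused'):
        return []            # don't repeat the role on focused objects
    return [getattr(obj, 'role', '')]

METHODS = [_getName, _getRoleName]

def getSpeech(obj, **args):
    result = []
    for method in METHODS:
        result.extend(method(obj, **args))
    return [s for s in result if s]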
Created attachment 134432 [details] [review] Modified patch

(In reply to comment #11)
> Created an attachment (id=134254) [edit]
> rev 1, should work for getSpeech

Looks great so far. This attachment reorganizes things in the files and kind of makes them match the coding style of the rest of the Orca source files. I also renamed the _getLabelName and _getLabelName2 methods to _getLabelAndName and _getLabelOrName. All in all, it may seem like I rewrote everything, but the basic ideas are identical to what you did.

I also made a couple 'larger' changes:

1) Rather than use a static dictionary, I used introspection in the altgenerator.py:__init__ method to build up the dictionary of generator methods. The introspection finds any method that begins with "_get" and makes an entry in the dictionary for it by stripping off the "_get" prefix and lowering the case of the first letter after that.

2) Added a verification check to make sure the speech-related strings in generator_settings.formattingDict are valid, including making sure the method names exist in the speech generator.

From here, I think we need to play more and see how this works. I'd like to see if we can do the following:

1) Make generator_settings a per-script thing to allow people to make things speak differently depending upon the app.

2) Allow one to enable/disable the speaking of the speech context. I wonder if _getContext might be something we could do, though it would require two objects (e.g., old and new locus of focus) to be passed in.

3) Try to incorporate ACSS and other things in there in some fashion. I see that ACSS is loosely in generator_settings.py at the formatting string level, but it might be nice to make the granularity even finer (e.g., at the word level). I wonder if we can play with the formatting strings so that the eval returns a heterogeneous array of objects where the objects can be strings (as they are now) or something else, such as an ACSS, a reference to an entry in orca.settings.voices, a reference to an audio file, etc.

4) Look at how this can be used for the where am I, tutorial, and say all functionality. I can envision turning the formattingDict into a dictionary of two dictionaries, formattingDict["speech"] and formattingDict["braille"]. The formattingDict["speech"] dictionary could have "unfocused", "focused", "whereAmI", "tutorial", "sayAll", etc., sub-dictionaries that contain the specific formatting strings. The call to getSpeech would take as a parameter which sub-dictionary to use.

> it was tablecell and tablerow that got me stuck.

Yeah, it's a difficult one.

> With this new getSpeech, we use the eval to give us the next function name to
> be called; this ensures we only call functions in short-circuit logic order.

The use of 'eval' is wonderful! It lets one use Python logic in the formatting strings. This allows one, for example, to embed arbitrary text in an utterance. Nice job! What do you think about making the script and accessible available in the globals for the eval? This could potentially allow us to do more sophisticated decision making in the formatting string.

> Sometimes we overwrite things in the args dict, but we have to remember to
> reset the values when we have done our calls.

I saw one spot where it was changed, but not set back. This was in _getTableCell2ChildLabel. Was this an intentional thing?

In any case, nice work. Let's keep hammering on this. I'd like to make 2.27.4 (July 13) a target for rolling this in if possible.
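For reference, the introspection in (1) boils down to something like this (an illustrative sketch; details differ in the actual patch):

import inspect

class AltGenerator:
    def __init__(self):
        # Find every method named _getSomething and file it under
        # 'something' (strip "_get", lower the first letter).
        self._methodsDict = {}
        for name, method in inspect.getmembers(self, inspect.ismethod):
            if name.startswith('_get') and len(name) > 4:
                key = name[4].lower() + name[5:]
                self._methodsDict[key] = method

    def _getLabelAndName(self, obj, **args):
        return []   # stub; registered under the key 'labelAndName'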
Created attachment 134441 [details] [review] Updated patch that makes Formatting a script field

This patch builds on the previous patch to make the formatting stuff a field in the script. It renames generator_settings.py to formatting.py and creates a Formatting class inside formatting.py. Note also that this makes a formatting.py:defaultFormatting dictionary that one can override from user-settings.py. This helps get us towards allowing the user to override the formatting.
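In rough shape, this looks something like the following (a simplified sketch, not the patch itself; the dictionary contents are placeholders):

# A module-level default that user-settings.py can override, copied
# into a per-script Formatting instance at script-creation time.
import copy

defaultFormatting = {
    'speech': {
        'default': {'unfocused': 'labelAndName + roleName'},
    },
}

class Formatting(dict):
    def __init__(self, script):
        dict.__init__(self)
        self._script = script
        self.update(copy.deepcopy(defaultFormatting))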
Created attachment 134455 [details] [review] Complete patch Sorry - I'm still getting used to GIT. Here's a complete patch with all the files.
Created attachment 134461 [details] [review] One more try Ahh..git rhymes with sit and other s-words. I think I'm getting it figured out.
Hi Will,

Maybe we can work on a branch, and once we are happy we create a final diff and attach it here, as well as committing to master.

# for my own benefit

# create a local branch to track the remote:
git branch --track 570658 origin/570658

# check out the 570658 branch:
git checkout 570658; git pull

### modify modify modify ###

# push local commits:
git push

# switch back to master:
git checkout master
(In reply to comment #12)
> Looks great so far. This attachment reorganizes things in the files and kind
> of makes them match the coding style of the rest of the Orca source files. I
> also renamed the _getLabelName and _getLabelName2 methods to _getLabelAndName
> and _getLabelOrName. All in all, it may seem like I rewrote everything, but
> the basic ideas are identical to what you did.
> I also made a couple 'larger' changes:
> 1) Rather than use a static dictionary, I used introspection in the
> altgenerator.py:__init__ method to build up the dictionary of generator
> methods.
> 2) Added a verification check to make sure the speech-related strings in
> generator_settings.formattingDict are valid, including making sure the method
> names exist in the speech generator.

Thanks Will, looks neater, and your additions are excellent.

> From here, I think we need to play more and see how this works. I'd like to
> see if we can do the following:
> ...
> 2) Allow one to enable/disable the speaking of the speech context. I wonder if
> _getContext might be something we could do, though it would require two
> objects (e.g., old and new locus of focus) to be passed in.

Do we only want the context, or do we also say information about the new object (label, state, etc.)? I.e., should 'context' be in the formatting string? If so:

speechgenerator.getSpeech(newLocus, stopAncestor=oldLocus)

The stopAncestor will be ignored by all functions in altspeechgenerator, except _getContext.

> 3) Try to incorporate ACSS and other things in there in some fashion. I see
> that ACSS is loosely in generator_settings.py at the formatting string level,
> but it might be nice to make the granularity even finer (e.g., at the word
> level).

Word level?

> I wonder if we can play with the formatting strings so that the eval
> returns a heterogeneous array of objects where the objects can be strings (as
> they are now) or something else, such as an ACSS, a reference to an entry in
> orca.settings.voices, a reference to an audio file, etc.

1. Maybe another settings file/dict, where we have:

voiceDict = {
    'label': acss,
    'context': acss,
}

etc. Then _getLabel, as an example, returns another dict:

result['string'] = 'long label'
result['acss'] = voiceDict['label']
result['audiofile'] = ''

> 4) Look at how this can be used for the where am I, tutorial, and say all
> functionality. I can envision turning the formattingDict into a dictionary of
> two dictionaries, formattingDict["speech"] and formattingDict["braille"].
> The call to getSpeech would take as a parameter which sub-dictionary to use.

Yes, that's the hopeful plan, but I was thinking something like generator.getSpeech, generator.getWhereAmI, and generator.getBraille. I haven't looked at the details, but we should be able to use a lot of our _get methods for braille. If we get this one up and working as we want it, then whereAmI and the rest should fall into place.

> The use of 'eval' is wonderful! It lets one use Python logic in the formatting
> strings. This allows one, for example, to embed arbitrary text in an
> utterance.

With the requirement that we can potentially attribute an ACSS to any bit of text, it might be safer not to put text in the formatting string. Instead we create a new function that doesn't act on its arguments, and simply returns the dict to be used (string, acss, soundfile).

> Nice job! What do you think about making the script and accessible
> available in the globals for the eval? This could potentially allow us to do
> more sophisticated decision making in the formatting string.

I don't know; the thought I had was that the functions do all the decisions and the formatting string is only that, a formatting string. If we added more complexity to the formatting string, it might not be as straightforward for the end user to modify the strings and still get something sensible out of them.

> In any case, nice work. Let's keep hammering on this. I'd like to make 2.27.4
> (July 13) a target for rolling this in if possible.

If we could both work on it please, and get it in for 2.27.4, that would be an excellent birthday gift. :)
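To illustrate the dict-returning idea from the comment above (a minimal sketch; all names are placeholders, and voiceDict is the hypothetical per-element ACSS lookup proposed there):

# A generator method returns a dict carrying the text plus its
# presentation hints, instead of a bare string.
voiceDict = {'label': None, 'context': None}   # placeholder ACSS values

def _getLabel(obj, **args):
    return {
        'string': getattr(obj, 'label', ''),
        'acss': voiceDict['label'],
        'audiofile': '',
    }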
(In reply to comment #17)
> > 2) Allow one to enable/disable the speaking of the speech context. I wonder
> > if _getContext might be something we could do, though it would require two
> > objects (e.g., old and new locus of focus) to be passed in.
>
> Do we only want the context, or do we also say information about the new
> object (label, state, etc.)? I.e., should 'context' be in the formatting
> string? If so:
>
> speechgenerator.getSpeech(newLocus, stopAncestor=oldLocus)
>
> The stopAncestor will be ignored by all functions in altspeechgenerator,
> except _getContext.

What you wrote is pretty much what I was thinking. I'm not sure if we want to define exactly what is in the context (i.e., row/col header, tree node level, radio button group name, panel names, etc.), or if we just leave it as is and use 'context' as something to put in the formatting string. I'm thinking maybe we should allow at least the contents and ordering of the context to be defined. So, maybe a "context" entry where the default value is something like "namedHierarchy(True) radioGroup(True) tableHeaders(True) nodeLevel(True)" -- the boolean value would state whether to only present what changed (True) or always present it (False). What do you think?

> > 3) Try to incorporate ACSS and other things in there in some fashion. I see
> > that ACSS is loosely in generator_settings.py at the formatting string
> > level, but it might be nice to make the granularity even finer (e.g., at
> > the word level).
>
> Word level?

I'm not a huge fan of cacophony, but some people are, and it would be nice to try to allow this. For example, instead of the entire result being spoken with a given voice, maybe we want to speak just the rolename with a different voice. Similar things for headers might apply. Also, we may want the decision for when to switch to uppercase and hyperlink voices to be put in the generator itself, because it is dealing directly with the object and has the opportunity to understand what should be done. For example, when speaking a line of text, we might need uppercase and hyperlink voice switching to happen across the line.

To support this, I think we can just embed object instances in the results array. So, we might have ACSS instances in the results, which would mean switch to the given ACSS until we run into another ACSS. The decision to put these objects in the results could be done implicitly by the generator itself (e.g., switch to the uppercase voice) or could be done via tags in the formatting string if the person creating the formatting strings wants something specific.

> 1. Maybe another settings file/dict, where we have:
>
> voiceDict = {
>     'label': acss,
>     'context': acss,
> }

We have this now as settings.voices for default, uppercase, and hyperlink, and I've always intended for it to grow.

> Then _getLabel, as an example, returns another dict:
>
> result['string'] = 'long label'
> result['acss'] = voiceDict['label']
> result['audiofile'] = ''

Last night, I played with putting 'settings' in the globals for the eval and then just referred to settings.voices[settings.UPPERCASE_VOICE]. Seemed to work nicely, and I was able to get back a result that was an array of strings and ACSSs. I'm thinking, however, maybe a _getVoice method in the generator might be nice -- the default would get the ACSS voice for the role of the object. If we can figure out how to pass parameters to the generator methods, then we could express voice switching via 'voice(SYSTEM_VOICE)' and such. The _getVoice function could then handle error/fallback cases where a voice doesn't exist.

Similarly, I could imagine returning an instance of some class that represents an audio file to play in parallel (at the same time) or in sequence (audio plays before moving on) with the speech. So, we might have _getPlay and _getPlaySequential that return instances of some new classes, perhaps named AudioCue and SequentialAudioCue. We might have the generator implicitly/automatically insert audio cues for punctuation, capitals, line boundaries, etc. But we might also want to allow explicit "play" tags in the formatting string. For example, "play('somefile')" or "playSequential('somefile')".

> I haven't looked at the details, but we should be able to use a lot of our
> _get methods for braille.

Agreed - eliminating duplicate code would be wonderful. We might make a general Generator class that both the speech and braille generator classes extend. This class would hold the code that is common to both the generators (e.g., getting the labels, names, etc.). Being able to temporarily cache the results might be nice, too, so that the speech and braille generators don't need to always do all the work.

> With the requirement that we can potentially attribute an ACSS to any bit of
> text, it might be safer not to put text in the formatting string. Instead we
> create a new function that doesn't act on its arguments, and simply returns
> the dict to be used (string, acss, soundfile).

Can you give an example of this? If it creates a simpler model than what I describe above, it would be great.

> I don't know; the thought I had was that the functions do all the decisions
> and the formatting string is only that, a formatting string.

We definitely want to encourage keeping the formatting strings simple. There are already some examples of simple "or" logic in the strings, though. I think that's fine.

Thanks again!
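To illustrate, a heterogeneous result might look something like this (a sketch assuming an Orca environment; AudioCue is the hypothetical class named above, while settings.voices and its UPPERCASE_VOICE entry are real Orca settings):

# Plain strings mixed with in-band objects that change presentation:
# an ACSS switches the voice until the next ACSS appears, and an
# AudioCue asks for a sound to be played.
from orca import settings

class AudioCue(object):
    """Hypothetical marker: play an audio file alongside the speech."""
    def __init__(self, filename, sequential=False):
        self.filename = filename
        self.sequential = sequential

result = [
    'My Documents',
    settings.voices[settings.UPPERCASE_VOICE],  # switch voice from here on
    'D',
    AudioCue('link.wav'),                       # hypothetical audio cue
    'link',
]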
This bug currently has the following status:

* All the speech generators have been migrated to the new form.
* All the where am I functionality has been migrated to the new form.
* No braille work has been done yet.
* Gecko's code uses very little of the speech generator -- we should investigate this further and open a separate bug if we decide it should be changed.
*** Bug 569928 has been marked as a duplicate of this bug. ***
Is it ok to close this as fixed? Both speech and braille are now in master, thanks to Will's hard work.
(In reply to comment #21)
> Is it ok to close this as fixed? Both speech and braille are now in master,
> thanks to Will's hard work.

Sure thing. I've opened bug #589415, bug #589416, and bug #589417 as feature enhancements that can follow on from this.