GNOME Bugzilla – Bug 354469
[requirement] Repeated character count
Last modified: 2008-07-22 19:24:09 UTC
Orca must be able to optionally compress the repetition of character by saying something such as "25 dashes" instead of "dash dash dash dash dash...". For more information, see: http://cvs.gnome.org/viewcvs/*checkout*/orca/docs/doc-set/orca.html#URSPEECH
Tracking.
*** Bug 348061 has been marked as a duplicate of this bug. ***
The example you give is "dash dash ... dash" becomes "25 dashes". Does that mean we need a plural equivalent of each kind of character? Wouldn't it be better/easier to say "25 dash characters" or something like that? If we do take this approach, then we also need to think about where we start doing this optimisation, i.e. replacing "dash dash" with "2 dash characters" is inefficient. Three dashes speaks the same number of words. Four and beyond is a reduction. Mike, what would you like done here?
The reason we want "25 dashes" is just to minimize the amount of extra speech while still conveying the needed information. 4 dashes or fewer should just be spoken as "dash dash dash dash".
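The word-count argument above can be checked with a few lines of Python. This is only an illustrative sketch: `words_spoken` is a hypothetical helper, not part of Orca, and it assumes the character name is a single word such as "dash".

```python
def words_spoken(count, compressed):
    """Return how many words are spoken for a run of `count` identical characters."""
    if compressed:
        # "<count> dash characters" is three words, whatever the count is.
        return 3
    # Uncompressed, the character name is spoken once per occurrence.
    return count


if __name__ == "__main__":
    for count in range(2, 6):
        plain = words_spoken(count, compressed=False)
        short = words_spoken(count, compressed=True)
        verdict = "saves words" if short < plain else "no saving"
        print(count, plain, short, verdict)
```

Under these assumptions the compressed form only pays off from four repeated characters upwards.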
I understand that Mike, but if it's implemented this way, what's the magic formula for turning the name of a character into the plural name of a character?
Playing devil's advocate: Most of the time, do you really need to know how many dashes there are? If there are 25 dashes on a line, most of the time what matters is that there are a bunch of dashes there. The fact that there are 20 or 25 or 30 is *usually* extraneous info. Under those circumstances, maybe "dash repeated" would suffice? On the rare circumstances where you must know the absolute number: Based on Rich's comment, knowing how many dashes we have is not the issue, it's coming up with a plural -- and presumably localizing it. So maybe you could give the command to spell the current word and Orca would respond with "4 dash characters". Just a thought....
I'm really reluctant to lose the actual number of characters because I think it helps understand layout but I do understand the problems this will pose for translation. After thinking about it I think it would be OK to go with "25 dash characters" for example and lose the plural for the actual character. What do you all think?
For whatever it's worth, I think "25 dash characters" is great. My suggestion was merely a response to the plural/localization issue....
This character counting should be treated similarly to the speaking of a word. If, when reading by line, by word, or in say all, for example, a large group of the same character is encountered, the repeat filtering should kick in.
So here's the spec I'm working off:
The "repeat character count" potentially can happen for the following scenarios:
1/ Arrow up and arrow down. This is in default.sayLine(). If the shift key is down, then default.sayPhrase() is called.
2/ Control-left and Control-right. Also Control-Shift-Left and Control-Shift-Right. These are covered by default.sayWord() and default.sayPhrase().
3/ Selection keys such as Shift+Home and Shift+End and Control+Home and Control+End. These are covered by default.sayPhrase() and default.sayLine().
4/ If echo by word is turned on (after each completed word). Handled by default.echoPreviousWord().
5/ The following flat review commands:
   Numpad+- - default.toggleFlatReviewMode
   Numpad++ - default.sayAll
   Numpad+Enter - default.whereAmI
   Numpad+7 - default.reviewPreviousLine
   Numpad+8 - default.reviewCurrentLine
   Numpad+9 - default.reviewNextLine
   Numpad+4 - default.reviewPreviousItem
   Numpad+5 - default.reviewCurrentItem
   Numpad+6 - default.reviewNextItem
As some people will want this and others might not, there will be a setting in settings.py that determines whether this functionality is enabled:
   repeatCharacterLimit = <n>
If <n> is 0, then there would be no repeat characters. Otherwise <n> would be the number of same characters (or more) in a row that cause the repeat character count output. If the value is set to 1, 2 or 3 then it's treated as if it was zero. In other words, no repeat character count is given.
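A minimal sketch of the setting described in the spec, not the code from the attached patch. The function name `adjust_for_repeats`, the `CHAR_NAMES` table, and the choice to leave runs of whitespace alone are assumptions made for illustration; only the `repeatCharacterLimit` semantics (0 to 3 disables the feature, otherwise runs of <n> or more are counted) come from the comment.

```python
from itertools import groupby

# Hypothetical table of spoken names; anything missing is spoken as itself.
CHAR_NAMES = {"-": "dash", ".": "period", ">": "greater than"}


def adjust_for_repeats(line, repeat_character_limit=4):
    """Replace long runs of one character with '<count> <name> characters'."""
    # Per the spec, values 0, 1, 2 and 3 all mean "no repeat character count".
    if repeat_character_limit < 4:
        return line

    segments = []
    for char, group in groupby(line):
        count = sum(1 for _ in group)
        # Assumption: whitespace runs are passed through untouched.
        if count >= repeat_character_limit and not char.isspace():
            name = CHAR_NAMES.get(char, char)
            segments.append(" %d %s characters " % (count, name))
        else:
            segments.append(char * count)
    return "".join(segments)


if __name__ == "__main__":
    print(adjust_for_repeats("Title\n" + "-" * 25))
    print(adjust_for_repeats("-" * 25, repeat_character_limit=0))
```

The compressed segment is padded with spaces so that it stays separate from neighbouring words when the string is handed to speech.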
Created attachment 73097
Patch to mostly implement this.
Here's the current status of this. The attached patch has been checked into CVS HEAD. The "repeat character count" option should be available for all the scenarios above except:
   Numpad++ - default.sayAll
   Numpad+Enter - default.whereAmI
Both of these routines call speech.speakUtterances(utterances) to speak the line(s) and I haven't yet determined how to easily hook the new functionality into this routine.
By default, I've set the repeat character count to zero, i.e.:
   settings.repeatCharacterLimit = 0
This is until Mike (and others) have given the go-ahead that it's doing what is required. To enable it, just add the following line to your ~/.orca/user-settings.py file:
   orca.settings.repeatCharacterLimit = 4
If you try this out, please give feedback via comments to this bug. Thanks.
I did an initial cursory test drive; will try it more thoroughly later today. One observation/question: Do we want it to apply to *all* characters or just characters that are non-alphanumeric? An example for your consideration: "Today my odometer read '111111'. Isn't that waaaaaaaaaaaay cooooooool?" ;-)
Okay, I didn't want to be working on what I was working on anyway. This is more interesting. ;-) If your punctuation level is set to some or none, the dash character is not normally spoken. Under these conditions, when you arrow to a line with a bunch of dashes, Orca says "X characters" rather than "X dash characters". So how should this be handled? What is currently going on isn't ideal, but should it say "X dash characters" or nothing at all? Or.... perhaps a combination of the two? I'm leaning towards a combination. Here's my thinking: If I turn my punctuation level to some or none it is because I want to be able to focus on the content without being distracted by its markup. If I then do a Say All, it's because I want to take all of the content in without really looking at it -- in other words hear the information as someone would read it to me. In this case, I don't care about any decorative punctuation. However, if I'm reading line by line or word by word, then I am looking more closely at what's there. I still don't want to be distracted by normal punctuation (if I did, I'd turn punctuation back to most or all), but if there's a bunch of decorative punctuation there, I'd like to know about it.
I'll let Mike answer the "Do we want it to apply to *all* characters or just characters that are non-alphanumeric?" question. As for your odometer, it suggests you need a new car. 8-) It is cool though. Unless it's in binary. Ack, I forgot to factor in the punctuation level. My feeling with multiple characters in a row that are also of a punctuation type is that they are not punctuation in this case. But that's just me. I'll leave it to the experts. Mike, how do you want that handled?
Hi Joanie, my preference here would just be to honor the punctuation setting. For example, if you had punctuation set to some and hit a line of periods, you would hear nothing, as period is not spoken at the some level. My reason for saying this is that if you are, for example, reading an email which has been quoted and replied to many times, you might hear a group of > symbols at the beginning of a line that you didn't want to.
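One way to picture "honor the punctuation setting" is the sketch below. It is a rough illustration only: `speak_run`, `CHAR_NAMES` and the per-level character sets in `SPOKEN_AT_LEVEL` are invented for the example and are not Orca's actual punctuation tables. The only facts taken from this thread are that dash and period are not spoken at the "some" level.

```python
import string

# Hypothetical spoken names for a few characters.
CHAR_NAMES = {"-": "dash", ".": "period", ">": "greater than"}

# Made-up sets of characters spoken at each punctuation level.
SPOKEN_AT_LEVEL = {
    "none": set(),
    "some": set("@#$%&+="),
    "most": set(string.punctuation) - set(".,?!"),
    "all": set(string.punctuation),
}


def speak_run(char, count, level, limit=4):
    """Return the text to speak for `count` copies of a punctuation character."""
    # If the character is silent at this level, the whole run is silent too.
    if char not in SPOKEN_AT_LEVEL[level]:
        return ""
    name = CHAR_NAMES.get(char, char)
    if count >= limit:
        return "%d %s characters" % (count, name)
    # Short runs are spoken one character name at a time.
    return " ".join([name] * count)


if __name__ == "__main__":
    print(repr(speak_run(".", 30, "some")))
    print(repr(speak_run(".", 30, "all")))
```

With this shape, a line of quoted-reply > symbols stays quiet at the "some" level, which is the case described above.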
Another idea/question: What, if anything, do we want to do about decorative punctuation that is composed of more than one punctuation mark and/or includes spaces? I see this sort of thing in email signatures and in email-based newsletters. For instance:
   :.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:
For all intents and purposes, the above is 60 repeated characters. Of course, since it's really 30 repeated pairs of characters, Orca announces them all. I'm not saying that Orca *needs* to be handling the above differently; I'm simply tossing it out for your consideration, since you're working on this feature currently.
In that same vein, what if the repeated characters are separated by a space, such as:
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Again, functionally speaking it's a bunch of repeated dashes. Should they be treated as such by Orca?
And yes, I know what you're thinking Rich: #%^&*(@~! My answer is, of course, that those characters are not repeating. ;-)
In all seriousness, I do realize that there are a huge number of possible punctuation combinations, repeating and otherwise, and it doesn't make sense to try to handle them all. I'm just wondering about the simpler cases. Thanks! And thanks for implementing this. As it is currently, it is very cool.
Mike, I see your point about the greater thans in email. Those are just plain evil. So how 'bout this: If there are a bunch of repeated characters (like the greater thans) on a line that also has text, do as Mike suggests and respect the punctuation settings in Orca. But, if the repeated characters are on a line by themselves (i.e. 30 greater thans, nothing else), indicate their presence ("30 greater than characters") regardless of the level of punctuation. Just a thought. Rich, I drive a Volvo. I'll still be chugging along when it IS in binary. (W 4 repeated o characters H 4 repeated o characters). ;-)
Hi Joanie, this is a good idea. I'll let Rich speak to how practical it would be to implement.
Created attachment 73110
Another patch to hopefully now respect the punctuation level.
I've added in the code in the second patch to CVS HEAD. Hopefully this now respects the various punctuation levels. I'll respond to all the other various suggestions tomorrow after you've had a chance to test this new version. Note that if you think the adjustForRepeats() code is doing the wrong thing with a specific test line, try setting orca.settings.repeatCharacterLimit = 0 in your ~/.orca/user-settings.py file and repeating the test. This is the equivalent of not adjusting the text at all.
The other thing I should mention is I've only been testing this against gedit. If you try it with Evolution or StarOffice or OpenOffice and you find problems that aren't present with gedit, please file separate bugs. I don't want this to be a catchall for all repeated character count problems. I'll never get to close it. ;-)
Regarding #21: It does indeed seem to respect the various punctuation levels. Regarding #22: <backspace, backspace, backspace, backspace> ;-) See bug #356970. I've been testing in OOo 2.0.4 and it's good there. It also works in the comment edit boxes here. Very cool!!!
Here are my (hopefully) final comments on this.
1/ Didn't see anything back from Mike on whether we should just do the repeat stuff for punctuation characters, so I'm leaving it as it is (i.e. does it for both punctuation and non-punctuation). It's a trivial change to adjust this if needed.
2/ As for looking and handling things like:
   :.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:
That's more than I want to do right now. There's probably a post-doc dissertation in there if somebody were interested. I'm not. The existing functionality is sufficient.
3/ I've filed a separate enhancement request on the "special line" request. See bug #357063. Just so I can close this bug.
So I've just checked in the change to settings.py:
   settings.repeatCharacterLimit = 4
and I'm now closing out this bug/feature request. Thanks for all the testing and feedback.
Boing! said Zebadee. I totally forgot that I still have to handle:
   Numpad++ - default.sayAll
   Numpad+Enter - default.whereAmI
Both of these routines call speech.speakUtterances(utterances) to speak the line(s) and I haven't yet determined how to easily hook the new functionality into this routine. I'll need to work with Will on this, so reopening for now so we don't forget. Thanks for the GentleHint(TM) Mike.
When you closed this, I was going to move on. But since it's reopened.... I'd like to enter one last plea for the repeat characters count being applied only to non-alphanumeric characters. Perhaps it is the company that I keep, but in casual environments (email from friends, IM) I see things like:
   * Waaaaay coooool
   * Wooooo hooooo
   * Grrrrrrrrrrr
In fact, the other day in #orca of all places, someone said "niiiiiice". I also think that, while it is rare to come across a number that is unpunctuated and has at least 4 of the same digit in a row, it does happen on occasion. I think that numbers should be treated as numbers. I promise this is the last I'll say/beg/plead on the subject. That said, Mike, what do you think? Thanks for your consideration guys! (And sorry, Rich)
After thinking more about this I have to agree with Joanie. Rich, since you said this is a simple change, let's go for it.
No problem. It's a trivial change. I'll check it in after lunch. Thanks.
Created attachment 73164
Patch to not give the repeated character count for non-punctuation characters.
Patch checked into CVS HEAD.
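The restriction agreed on in the preceding comments can be sketched as a small change to the run test. This is not the contents of the attachment; `is_countable`, `CHAR_NAMES` and this version of `adjust_for_repeats` are illustrative names, and "non-punctuation" is approximated here as "alphanumeric or whitespace" using Python's `str.isalnum()` and `str.isspace()`.

```python
from itertools import groupby

# Hypothetical spoken names; characters not listed are spoken as themselves.
CHAR_NAMES = {"-": "dash", "@": "at", ">": "greater than"}


def is_countable(char):
    """Only characters that are neither alphanumeric nor whitespace get a count."""
    return not (char.isalnum() or char.isspace())


def adjust_for_repeats(line, limit=4):
    """Compress runs of punctuation; leave letters and digits exactly as typed."""
    segments = []
    for char, group in groupby(line):
        count = sum(1 for _ in group)
        # A limit below 4 disables the feature, as in the original spec.
        if limit >= 4 and count >= limit and is_countable(char):
            name = CHAR_NAMES.get(char, char)
            segments.append(" %d %s characters " % (count, name))
        else:
            segments.append(char * count)
    return "".join(segments)


if __name__ == "__main__":
    print(adjust_for_repeats("Waaaaay coooool"))
    print(adjust_for_repeats("111111"))
    print(adjust_for_repeats("----------"))
```

Words such as "Waaaaay" and numbers such as "111111" pass through unchanged, while a row of dashes is still counted.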
Works great! Thanks so much!!
Created attachment 73182
Patch to do repeated character count for the whereAmI() function.
Checked into CVS HEAD. Test with Numpad-Enter. Just the sayAll() functionality to do.
Created attachment 73188
Patch to add in repeated character count for sayAll()
I've added in support for the repeated character count for the sayAll() function now. I had to move _addRepeatSegment() and adjustForRepeats() from default.py to util.py, so this is a fairly substantial change. I think everything is okay. Could you give it a little test before I close out the bug please. Thanks.
sayAll and whereAmI seem to be working correctly. I did notice one significant thing and one minor thing.
The minor thing: If you give your document a file name with repeated characters (e.g. "---------- lame file name @@@@@@@@@@@@@"), the repeated character feature doesn't kick in for the window title when you use whereAmI. Personally, I am of the opinion that anyone who names their documents in this fashion deserves to hear all of the characters. I was merely trying to be thorough in my testing. ;-)
The significant thing:
1. Launch Gedit
2. Type the following lines: I think that I am testing. O my! Very odd. i am testing.
3. Use NumPad 4, 5, and 6 to flat review through the text word by word.
When I do this, the "I"s and the "O" are not spoken; everything else, including the "i" in the fourth line, is. It seems that when a word consists of a single capital letter, it is no longer spoken when flat reviewing by word.
Hi Joanie,
Re: the minor thing. The repeated character count functionality only kicks in for the various scenarios in comment #10. I thought about putting it lower down so that *every* text string went through it, but decided against it. I think, for consistency, we should leave this the way it is, although it would be a fairly simple change to adjust the whereAmI() routine to handle this. But I'm sure we'd then find a load of other cases.
Re: the major thing. Flat-reviewing the initial "I" was causing:
Traceback (most recent call last):
consumed = self._function(script, inputEvent)
self._reviewCurrentItem(inputEvent, targetCursorCell, clickCount)
elif string.isupper() and not spellWord:
I've just checked in a patch for that one, so that should be fixed. Thanks for testing it.
Patch/fix confirmed. Thanks for implementing this feature!!
Thanks Joanie. Looks like this one is done. Closing as FIXED.