Bug 354469 – [requirement] Repeated character count




Summary:	[requirement] Repeated character count


Status:	RESOLVED FIXED

Product:	orca
Classification:	Applications
Component:	general
Version:	1.0.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	2.18.0
Assigned To:	Rich Burridge
QA Contact:	Orca Maintainers

URL:
Whiteboard:

Duplicates:	348061
Depends on:
Blocks:

Reported:	2006-09-05 16:21 UTC by Willie Walker
Modified:	2008-07-22 19:24 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch to mostly implement this. (4.71 KB, patch) 2006-09-20 17:24 UTC, Rich Burridge
Another patch to hopefully now respect the punctuation level. (2.98 KB, patch) 2006-09-20 20:59 UTC, Rich Burridge
Patch to not give the repeated character count for non-punctuation characters. (663 bytes, patch) 2006-09-21 19:57 UTC, Rich Burridge
Patch to do repeated character count for the whereAmI() function. (681 bytes, patch) 2006-09-22 00:46 UTC, Rich Burridge
Patch to add in repeated character count for sayAll() (9.33 KB, patch) 2006-09-22 01:22 UTC, Rich Burridge

Description Willie Walker 2006-09-05 16:21:17 UTC

Orca must be able to optionally compress the repetition of characters by saying something such as "25 dashes" instead of "dash dash dash dash dash...".  For more information, see: http://cvs.gnome.org/viewcvs/*checkout*/orca/docs/doc-set/orca.html#URSPEECH

Comment 1 Rich Burridge 2006-09-05 16:44:59 UTC

Tracking.

Comment 2 Rich Burridge 2006-09-06 22:21:38 UTC

*** Bug 348061 has been marked as a duplicate of this bug. ***

Comment 3 Rich Burridge 2006-09-08 16:29:54 UTC

The example you give is "dash dash ... dash" becomes
"25 dashes". Does that mean we need a plural equivalent
of each kind of character? Wouldn't it be better/easier to
say "25 dash characters" or something like that?
If we do take this approach, then we also need to think about
where we start doing this optimisation. i.e. replacing "dash dash"
with "2 dash characters" is inefficient. Three dashes speaks the same
number of words. Four and beyond is a reduction.
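
The break-even point described above can be checked with a small sketch. This is only an illustration of the word-count argument; the function name and the "N dash characters" wording are assumptions for the example, not Orca code.

```python
def spoken_word_counts(run_length, name="dash"):
    """Return (verbatim_words, compressed_words) for a run of one character.

    Hypothetical helper: compares "dash dash dash ..." with
    "<n> dash characters" by counting spoken words.
    """
    verbatim = " ".join([name] * run_length)
    compressed = f"{run_length} {name} characters"
    return len(verbatim.split()), len(compressed.split())


# Print the comparison for runs of 2 to 5 characters.
for n in range(2, 6):
    verbatim_words, compressed_words = spoken_word_counts(n)
    print(n, verbatim_words, compressed_words)
```

A run of two costs more words when compressed, a run of three costs the same, and a run of four or more is shorter.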

Mike, what would you like done here?

Comment 4 Mike Pedersen 2006-09-08 16:57:41 UTC

The reason we want "25 dashes" is just to minimize the amount of extra speech while still conveying the needed information.  4 dashes or fewer should just be spoken as "dash dash dash dash".

Comment 5 Rich Burridge 2006-09-08 17:07:03 UTC

I understand that Mike, but if it's implemented this
way, what's the magic formula for turning the name of 
a character into the plural name of a character?

Comment 6 Joanmarie Diggs (IRC: joanie) 2006-09-08 19:19:29 UTC

Playing devil's advocate: 

Most of the time, do you really need to know how many dashes there are?  If there are 25 dashes on a line, most of the time what matters is that there are a bunch of dashes there.  The fact that there are 20 or 25 or 30 is *usually* extraneous info.  Under those circumstances, maybe "dash repeated" would suffice?  

In the rare circumstances where you must know the absolute number:  Based on Rich's comment, knowing how many dashes we have is not the issue, it's coming up with a plural -- and presumably localizing it.  So maybe you could give the command to spell the current word and Orca would respond with "4 dash characters".

Just a thought....

Comment 7 Mike Pedersen 2006-09-08 19:45:19 UTC

I'm really reluctant to lose the actual number of characters because I think it helps in understanding layout, but I do understand the problems this will pose for translation.  After thinking about it, I think it would be OK to go with "25 dash characters", for example, and lose the plural for the actual character.
What do you all think?

Comment 8 Joanmarie Diggs (IRC: joanie) 2006-09-08 20:34:11 UTC

For whatever it's worth, I think "25 dash characters" is great.  My suggestion was merely a response to the plural/localization issue....

Comment 9 Mike Pedersen 2006-09-12 22:11:47 UTC

This character counting should be treated similarly to the speaking of a word.  If, when reading by line, by word, or in say all, for example, a large group of the same character is encountered, the repeat filtering should kick in.

Comment 10 Rich Burridge 2006-09-20 15:15:37 UTC

So here's the spec I'm working off:

The "repeat character count" potentially can happen
for the following scenarios:

1/ Arrow up and arrow down.  This is in default.sayLine().
   If the shift key is down, then default.sayPhrase() is
   called.

2/ Control-left and Control-right. Also Control-Shift-Left
   and Control-Shift-Right. These are covered by default.sayWord()
   and default.sayPhrase().

3/ Selection keys such as Shift+Home and Shift+End and
   Control+Home and Control+End. These are covered by
   default.sayPhrase() and default.sayLine().

4/ If echo by word is turned on (after each completed word).
   Handled by default.echoPreviousWord().

5/ The following flat review commands:

      Numpad+-     - default.toggleFlatReviewMode
      Numpad++     - default.sayAll
      Numpad+Enter - default.whereAmI

      Numpad+7     - default.reviewPreviousLine
      Numpad+8     - default.reviewCurrentLine
      Numpad+9     - default.reviewNextLine

      Numpad+4     - default.reviewPreviousItem
      Numpad+5     - default.reviewCurrentItem
      Numpad+6     - default.reviewNextItem

As some people will want this and others might not,
there will be a setting in settings.py that determines
whether this functionality is enabled.

repeatCharacterLimit = <n>

If <n> is 0, then there would be no repeat character count.
Otherwise <n> would be the number of same characters (or more)
in a row that cause the repeat character count output.
If the value is set to 1, 2 or 3 then it's treated as if it was
zero. In other words, no repeat character count is given.
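
To make the setting's semantics concrete, here is a minimal sketch of how a limit like this could drive the compression. It is not the code from the attached patches: `effective_limit`, `compress_repeats` and the `names` table are hypothetical names made up for this example, and at this stage of the spec the count applies to any repeated character.

```python
from itertools import groupby


def effective_limit(limit):
    """Values of 1, 2 or 3 are treated as 0, i.e. the feature is off."""
    return limit if limit >= 4 else 0


def compress_repeats(text, limit, names=None):
    """Replace runs of `limit` or more identical characters with a count.

    `names` maps a character to its spoken name (hypothetical table);
    characters without an entry are spoken as themselves.
    """
    names = names or {"-": "dash"}
    limit = effective_limit(limit)
    if limit == 0:
        return text  # feature disabled: leave the text untouched
    pieces = []
    for char, group in groupby(text):
        count = len(list(group))
        if count >= limit:
            pieces.append(f" {count} {names.get(char, char)} characters ")
        else:
            pieces.append(char * count)
    return "".join(pieces)


print(compress_repeats("a" + "-" * 25 + "b", 4))
```

With a limit of 0 (or 1, 2 or 3) the text comes back unchanged; with a limit of 4 a run of four or more is replaced by the count phrase.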

Comment 11 Rich Burridge 2006-09-20 17:24:02 UTC

Created attachment 73097
Patch to mostly implement this.

Comment 12 Rich Burridge 2006-09-20 17:26:39 UTC

Here's the current status of this. The attached patch has been
checked into CVS HEAD. The "repeat character count" option should
be available for all the scenarios above except:

      Numpad++     - default.sayAll
      Numpad+Enter - default.whereAmI

Both of these routines call speech.speakUtterances(utterances)
to speak the line(s) and I haven't yet determined how to easily
hook the new functionality into this routine.

By default, I've set the repeat character count to zero. I.e:

settings.repeatCharacterLimit = 0

This is until Mike (and others) have given the go-ahead that
it's doing what is required.

To enable it, just add the following line to your
~/.orca/user-settings.py file:

orca.settings.repeatCharacterLimit = 4

If you try this out, please give feedback via comments to this bug.

Thanks.

Comment 13 Joanmarie Diggs (IRC: joanie) 2006-09-20 17:57:31 UTC

I did an initial cursory test drive; will try it more thoroughly later today.

One observation/question:  Do we want it to apply to *all* characters or just characters that are non-alphanumeric?  An example for your consideration:

"Today my odometer read '111111'.  Isn't that waaaaaaaaaaaay cooooooool?" ;-)

Comment 14 Joanmarie Diggs (IRC: joanie) 2006-09-20 18:21:34 UTC

Okay, I didn't want to be working on what I was working on anyway. This is more interesting. ;-)

If your punctuation level is set to some or none, the dash character is not normally spoken.  Under these conditions, when you arrow to a line with a bunch of dashes, Orca says "X characters" rather than "X dash characters".  

So how should this be handled?  What is currently going on isn't ideal, but should it say "X dash characters" or nothing at all?  Or.... perhaps a combination of the two?  I'm leaning towards a combination.  Here's my thinking:

If I turn my punctuation level to some or none it is because I want to be able to focus on the content without being distracted by its markup.  If I then do a Say All, it's because I want to take all of the content in without really looking at it -- in other words hear the information as someone would read it to me.  In this case, I don't care about any decorative punctuation.

However, if I'm reading line by line or word by word, then I am looking more closely at what's there.  I still don't want to be distracted by normal punctuation (if I did, I'd turn punctuation back to most or all), but if there's a bunch of decorative punctuation there, I'd like to know about it.

Comment 15 Rich Burridge 2006-09-20 18:27:22 UTC

I'll let Mike answer the "Do we want it to apply to *all* characters or just
characters that are non-alphanumeric?" question.

As for your odometer; it suggests you need a new car. 8-) It is
cool though. Unless it's in binary.

Ack, I forgot to factor in the punctuation level. My feeling with multiple
characters in a row that are also of a punctuation type is
that they are not punctuation in this case. But that's just me.
I'll leave it to the experts. Mike, how do you want that handled?

Comment 16 Mike Pedersen 2006-09-20 18:30:41 UTC

Hi Joanie, my preference here would just be to honor the punctuation setting.  For example, if you had punctuation set to some and hit a line of periods, you would hear nothing, as period is not spoken at the some level.  My reason for saying this is that if you are, for example, reading an email which has been quoted and replied to many times, you might hear a group of > symbols at the beginning of a line that you didn't want to.
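
A rough sketch of what "honor the punctuation setting" could mean for a repeated run follows. The level table is made up for illustration: the thread only establishes that dash, period and > are silent at the some level, so the level at which each one starts being spoken is an assumption, as are all the names used here.

```python
LEVELS = ["none", "some", "most", "all"]

# Hypothetical table: lowest level at which a character is spoken.
# The thread only says these three are NOT spoken at "some" or below.
MINIMUM_LEVEL = {"-": "most", ".": "most", ">": "most"}

# Hypothetical spoken names.
NAMES = {"-": "dash", ".": "period", ">": "greater than"}


def announce_run(char, count, level):
    """Return the phrase for a run of `char`, or None if the current
    punctuation level would not speak that character at all."""
    required = MINIMUM_LEVEL.get(char, "all")
    if LEVELS.index(level) < LEVELS.index(required):
        return None
    return f"{count} {NAMES.get(char, char)} characters"


print(announce_run(">", 6, "some"))
print(announce_run("-", 25, "all"))
```

The run is only counted when the character would have been spoken anyway, so quoted-email markers stay silent at the lower levels.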

Comment 17 Joanmarie Diggs (IRC: joanie) 2006-09-20 18:46:08 UTC

Another idea/question:

What, if anything, do we want to do about decorative punctuation that is composed of more than one punctuation mark and/or includes spaces?  I see this sort of thing in email signatures and in email-based newsletters.  For instance:

:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:

For all intents and purposes, the above is 60 repeated characters.  Of course, since it's really 30 repeated pairs of characters, Orca announces them all.  I'm not saying that Orca *needs* to be handling the above differently; I'm simply tossing it out for your consideration, since you're working on this feature currently.

In that same vein, what if the repeated characters are separated by a space, such as:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Again, functionally speaking it's a bunch of repeated dashes.   Should they be treated as such by Orca?

And yes, I know what you're thinking Rich:  #%^&*(@~!

My answer is, of course, that those characters are not repeating. ;-) 

In all seriousness, I do realize that there are a huge number of possible punctuation combinations, repeating and otherwise, and it doesn't make sense to try to handle them all.  I'm just wondering about the simpler cases.
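
For the simpler cases only (a repeated pair, or a character separated by spaces), detection could look something like the sketch below. This is purely illustrative and not something Orca does; `repeating_unit` and its parameters are hypothetical.

```python
def repeating_unit(line, max_unit=2, min_repeats=4):
    """Return (unit, count) if `line` is one short unit repeated, else None.

    A trailing partial unit is tolerated, so ":.:.:" and "- - -"
    (which end halfway through a pair) still match.
    """
    for size in range(1, max_unit + 1):
        unit = line[:size]
        count = len(line) // size
        # Rebuild the line from the candidate unit and compare.
        rebuilt = (unit * (count + 1))[:len(line)]
        if rebuilt == line and count >= min_repeats:
            return unit, count
    return None


print(repeating_unit(":.:.:.:.:"))
print(repeating_unit("- - - - -"))
print(repeating_unit("#%^&*(@~!"))
```

Anything beyond short fixed units (mixed lengths, nested patterns) is deliberately out of scope here.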

Thanks!  And thanks for implementing this.  As it is currently, it is very cool.

Comment 18 Joanmarie Diggs (IRC: joanie) 2006-09-20 18:51:16 UTC

Mike, I see your point about the greater thans in email.  Those are just plain evil.  So how 'bout this:  If there are a bunch of repeated characters (like the greater thans) on a line that also has text, do as Mike suggests and respect the punctuation settings in Orca.  But, if the repeated characters are on a line by themselves (i.e. 30 greater thans, nothing else), indicate their presence ("30 greater than characters") regardless of the level of punctuation.  Just a thought.
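
The rule proposed here could be sketched as a single predicate. The function name and arguments are hypothetical; `level_speaks_char` stands in for whatever the punctuation setting decides for that character.

```python
def should_announce_run(line, char, level_speaks_char, limit=4):
    """Decide whether a run of `char` on `line` gets a repeat count.

    Announce if the punctuation level already speaks the character,
    or if the line consists of nothing but that character.
    """
    stripped = line.strip()
    whole_line_is_run = len(stripped) >= limit and set(stripped) == {char}
    return level_speaks_char or whole_line_is_run


print(should_announce_run(">>>> quoted text", ">", False))
print(should_announce_run(">" * 30, ">", False))
```

Quoted-email markers in front of text stay governed by the punctuation level, while a line made up only of the repeated character is always announced.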

Rich, I drive a Volvo.  I'll still be chugging along when it IS in binary. (W 4 repeated o characters H 4 repeated o characters). ;-)

Comment 19 Mike Pedersen 2006-09-20 19:02:55 UTC

Hi Joanie, this is a good idea.  I'll let Rich speak to how practical it would be to implement.

Comment 20 Rich Burridge 2006-09-20 20:59:22 UTC

Created attachment 73110
Another patch to hopefully now respect the punctuation level.

Comment 21 Rich Burridge 2006-09-20 21:04:46 UTC

I've added in the code in the second patch to CVS HEAD.
Hopefully this now respects the various punctuation levels.

I'll respond to all the other various suggestions tomorrow
after you've had a chance to test this new version.

Note that if you think the adjustForRepeats() code is doing
the wrong thing with a specific test line, try setting

orca.settings.repeatCharacterLimit = 0

in your ~/.orca/user-settings.py file and repeating the test.
This is the equivalent of not adjusting the text at all.

Comment 22 Rich Burridge 2006-09-20 21:23:45 UTC

The other thing I should mention is I've only been testing this
against gedit. If you try it with Evolution or StarOffice or
OpenOffice and you find problems that aren't present with gedit, 
please file separate bugs. I don't want this to be a catchall 
for all repeated character count problems. I'll never get to 
close it. ;-)

Comment 23 Joanmarie Diggs (IRC: joanie) 2006-09-20 21:44:56 UTC

Regarding #21:  It does indeed seem to respect the various punctuation levels.

Regarding #22:  <backspace, backspace, backspace, backspace> ;-) See bug #356970.  I've been testing in OOo 2.0.4 and it's good there.  It also works in the comment edit boxes here.  Very cool!!!

Comment 24 Rich Burridge 2006-09-21 15:11:50 UTC

Here are my (hopefully) final comments on this.

1/ Didn't see anything back from Mike on whether we should just
   do the repeat stuff for punctuation characters, so I'm leaving
   it as it is (i.e. does it for both punctuation and non-punctuation).
   It's a trivial change to adjust this if needed.

2/ As for looking and handling things like:

   :.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:

   That's more than I want to do right now. There's probably a
   post-doc dissertation in there if somebody were interested.
   I'm not. The existing functionality is sufficient.

3/ I've filed a separate enhancement request on the "special line"
   request. See bug #357063. Just so I can close this bug.

So I've just checked in the change to settings.py:

settings.repeatCharacterLimit = 4

and I'm now closing out this bug/feature request. Thanks for all the 
testing and feedback.

Comment 25 Rich Burridge 2006-09-21 16:05:25 UTC

Boing! said Zebedee.

I totally forgot that I still have to handle:

      Numpad++     - default.sayAll
      Numpad+Enter - default.whereAmI

Both of these routines call speech.speakUtterances(utterances)
to speak the line(s) and I haven't yet determined how to easily
hook the new functionality into this routine.

I'll need to work with Will on this, so reopening for now
so we don't forget. Thanks for the GentleHint(TM) Mike.

Comment 26 Joanmarie Diggs (IRC: joanie) 2006-09-21 18:20:27 UTC

When you closed this, I was going to move on. But since it's reopened....

I'd like to enter one last plea for the repeat characters count being applied only to non-alphanumeric characters.  Perhaps it is the company that I keep, but in casual environments (email from friends, IM) I see things like

* Waaaaay coooool
* Wooooo hooooo
* Grrrrrrrrrrr

In fact, the other day in #orca of all places, someone said "niiiiiice".  

I also think that, while it is rare to come across a number that is unpunctuated and has at least 4 of the same digit in a row, it does happen on occasion.  I think that numbers should be treated as numbers.

I promise this is the last I'll say/beg/plead on the subject.  That said, Mike, what do you think?

Thanks for your consideration guys!  (And sorry, Rich)

Comment 27 Mike Pedersen 2006-09-21 18:25:12 UTC

After thinking more about this, I have to agree with Joanie.  Rich, since you said this is a simple change, let's go for it.

Comment 28 Rich Burridge 2006-09-21 18:37:22 UTC

No problem. It's a trivial change. I'll check it in after lunch.
Thanks.

Comment 29 Rich Burridge 2006-09-21 19:57:21 UTC

Created attachment 73164
Patch to not give the repeated character count for non-punctuation characters.

Patch checked into CVS HEAD.
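
For readers without the attachment, a restriction of this kind could look roughly like the sketch below. It is not the patch itself: the helper name is hypothetical, and "punctuation" is approximated here as "neither alphanumeric nor whitespace", which is an assumption.

```python
from itertools import groupby


def runs_to_compress(text, limit=4):
    """Return (char, count) for each run that would get a repeat count.

    Letters, digits and whitespace are skipped, so "waaaaay" and
    "111111" are left alone.
    """
    runs = []
    for char, group in groupby(text):
        count = len(list(group))
        if count >= limit and not char.isalnum() and not char.isspace():
            runs.append((char, count))
    return runs


print(runs_to_compress("Waaaaay coooool"))
print(runs_to_compress("---- niiiiiice ----"))
```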

Comment 30 Joanmarie Diggs (IRC: joanie) 2006-09-21 20:39:52 UTC

Works great!  Thanks so much!!

Comment 31 Rich Burridge 2006-09-22 00:46:32 UTC

Created attachment 73182
Patch to do repeated character count for the whereAmI() function.

Checked into CVS HEAD. Test with Numpad-Enter.

Just the sayAll() functionality to do.

Comment 32 Rich Burridge 2006-09-22 01:22:34 UTC

Created attachment 73188
Patch to add in repeated character count for sayAll()

I've added in support for the repeated character count for the
sayAll() function now.
I had to move _addRepeatSegment() and adjustForRepeats() from default.py
to util.py, so this is a fairly substantial change.
I think everything is okay. Could you give it a little test before I
close out the bug please.

Thanks.

Comment 33 Joanmarie Diggs (IRC: joanie) 2006-09-22 02:25:37 UTC

sayAll and whereAmI seem to be working correctly. I did notice one significant thing and one minor thing.

The minor thing:  If you give your document a file name with repeated characters (e.g. "---------- lame file name @@@@@@@@@@@@@"), the repeated character feature doesn't kick in for the window title when you use whereAmI.  Personally, I am of the opinion that anyone who names their documents in this fashion deserves to hear all of the characters. I was merely trying to be thorough in my testing. ;-)

The significant thing: 

1. Launch Gedit
2. Type the following lines:

I think that I am testing.
O my!
Very odd.
i am testing.

3. Use NumPad 4, 5, and 6 to flat review through the text word by word.

When I do this, the "I"s and the "O" are not spoken; everything else, including the "i" in the fourth line, is. It seems that when a word consists of a single capital letter, it is no longer spoken when flat reviewing by word.

Comment 34 Rich Burridge 2006-09-22 02:45:43 UTC

Hi Joanie,

Re: the minor thing. The repeated character count functionality
only kicks in for the various scenarios in comment #10. I thought
about putting it lower down so that *every* text string went 
through it, but decided against it. I think, for consistency we
should leave this the way it is, although it would be a fairly simple
change to adjust the whereAmI() routine to handle this. But I'm sure
we'd then find a load of other cases.

Re: the major thing. Flat-reviewing the initial "I" was causing:

```
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/orca/input_event.py", line 178, in processInputEvent
    consumed = self._function(script, inputEvent)
  File "/usr/lib/python2.5/site-packages/orca/default.py", line 2781, in reviewCurrentItem
    self._reviewCurrentItem(inputEvent, targetCursorCell, clickCount)
  File "/usr/lib/python2.5/site-packages/orca/default.py", line 2846, in _reviewCurrentItem
    elif string.isupper() and not spellWord:
NameError: global name 'spellWord' is not defined
```


I've just checked in a patch for that one, so that should be fixed.
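
For anyone wondering why only the capital "I" and "O" were affected: `and` short-circuits, so the undefined name is only looked up when `string.isupper()` is true. The sketch below reproduces that behaviour in isolation; the surrounding branches and the names `review_item`, `outcome` and `spell_word` are invented for the example and are not the Orca code.

```python
def review_item(string):
    """Mimic the failing branch; `spell_word` is deliberately undefined."""
    if len(string) == 0:
        return "blank"
    # The right-hand side is evaluated only when isupper() is True,
    # so lowercase and mixed-case items never reach the undefined name.
    elif string.isupper() and not spell_word:
        return "capital " + string
    return string


def outcome(string):
    """Return what would be spoken, or the error text if it blows up."""
    try:
        return review_item(string)
    except NameError as error:
        return f"NameError: {error}"


for item in ["I", "think", "O", "i"]:
    print(repr(item), "->", outcome(item))
```

The lowercase "i" and ordinary words pass straight through, while "I" and "O" hit the NameError, which matches what was observed in flat review.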

Thanks for testing it.

Comment 35 Joanmarie Diggs (IRC: joanie) 2006-09-22 02:54:25 UTC

Patch/fix confirmed.  

Thanks for implementing this feature!!

Comment 36 Rich Burridge 2006-09-22 15:40:25 UTC

Thanks Joanie. Looks like this one is done. Closing as FIXED.