Bug 356425 – [blocked] Selecting certain text by word in OOo causes Orca to restart


Summary:	[blocked] Selecting certain text by word in OOo causes Orca to restart


Status:	RESOLVED FIXED

Product:	orca
Classification:	Applications
Component:	general
Version:	1.0.x
Hardware:	Other All

Importance:	Normal major
Target Milestone:	---
Assigned To:	Willie Walker
QA Contact:	Orca Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-09-17 18:06 UTC by Joanmarie Diggs (IRC: joanie)
Modified:	2006-10-17 21:46 UTC

See Also:
GNOME target:	---
GNOME version:	2.15/2.16

Attachments
Test case (8.45 KB, application/vnd.oasis.opendocument.text) 2006-09-17 18:07 UTC, Joanmarie Diggs (IRC: joanie)		Details
Orca debug output on testing this. (52.83 KB, text/plain) 2006-09-21 17:03 UTC, Rich Burridge		Details
Standalone app to show OOo has implemented the text specialization incorrectly (15.44 KB, text/plain) 2006-09-28 14:00 UTC, Willie Walker		Details
Ugly patch to workaround the problem (758 bytes, patch) 2006-09-28 14:57 UTC, Willie Walker	none	Details \| Review
Continuation of ugly patch (2.81 KB, patch) 2006-09-28 17:29 UTC, Willie Walker	none	Details \| Review
Isolation of ugliness to StarOffice via addition of getText method to default.py (4.65 KB, patch) 2006-09-28 20:54 UTC, Willie Walker	committed	Details \| Review
More complete use of self.getText in all scripts (8.32 KB, patch) 2006-09-29 17:04 UTC, Willie Walker	committed	Details \| Review

Description Joanmarie Diggs (IRC: joanie) 2006-09-17 18:06:10 UTC

Please describe the problem:
Periodically, selecting text in OpenOffice.org causes Orca to restart itself.  In other words, rather than telling you what you just selected, it says "Welcome to Orca."  I had not been able to reproduce it reliably until now, and I have yet to figure out what is special about the test case I'm presenting, hence the vague summary.  I will attach the relevant file after filing this bug.

Steps to reproduce:
1. Launch Orca
2. Open the file font-test.odt (to be attached)
3. Up arrow to move the cursor to the line that reads: "(3)This text will be in a different font - Helvetica Narrow".  Be sure that you are at the end of the line.
4. Use Control Shift Left Arrow to select the period, then "Narrow", and finally "Helvetica"


Actual results:
Upon selecting "Helvetica", Orca restarts.  As an aside, Orca also does not correctly report what is being selected.  For instance it says "ow" instead of "dot" when the period gets selected.  This problem seems to be unique to this particular text in this particular document.

Expected results:
Orca would correctly indicate what is being selected and would not restart itself.

Does this happen every time?
Yes.

Other information:
I can reliably reproduce this problem using this particular document both in OOo 2.0.4 RC1 and in 2.0.3 that comes with Edgy.  I am running the latest Edgy and Orca from CVS HEAD on all three machines.

Comment 1 Joanmarie Diggs (IRC: joanie) 2006-09-17 18:07:35 UTC

Created attachment 72941 [details]
Test case

This is the file mentioned in the bug report.

Comment 2 Mike Pedersen 2006-09-17 18:35:44 UTC

Hey Joanie, does the formatting change anywhere in this text?  For example, font size or type.  Also, which synth are you using?

Comment 3 Joanmarie Diggs (IRC: joanie) 2006-09-17 18:44:52 UTC

Hey Mike.  Yes, it all changes from line to line: font size, type, color, alignment, effects such as relief, etc.  *Most* of the other attributes are described in the text itself.  

I am using Fonix DECTalk.

Comment 4 Rich Burridge 2006-09-21 17:03:11 UTC

Created attachment 73152 [details]
Orca debug output on testing this.

I've just reproduced this with latest Orca on my Solaris box
with latest OpenOffice v2.0.4 release candidate.

Comment 5 Rich Burridge 2006-09-21 17:06:47 UTC

Here's the interesting part of the debug output. This occurred
for me as I was typing Shift-Left Arrow, and I'd just selected
the "H" in Helvetica.

----
vvvvv PROCESS OBJECT EVENT object:text-caret-moved vvvvv
OBJECT EVENT: object:text-caret-moved                  detail=(44,0)
KEYEVENT: type=1
          hw_code=100
          modifiers=1
          event_string=(Left)
          is_text=True
          time=1158857941.392078
    app.name='soffice.bin'        name=None role='paragraph' state='EDITABLE ENABLED FOCUSABLE FOCUSED MULTI_LINE MULTISELECTABLE SHOWING VISIBLE'
BRAILLE LINE:  '(3) This text will be in a different font â Helvetica Narrow. $l'
     VISIBLE:  'ferent font â Helvetica Narrow', cursor=15
/usr/lib/python2.5/site-packages/orca/brlmon.py:145: GtkWarning: Failed to set label from markup due to error parsing markup: Error on line 1 char 9: Invalid UTF-8 encoded text
  % char)
SPEECH OUTPUT: ''

Traceback (most recent call last):

File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 698 in __speak
```
return speaker.say(text)
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 95 in say
```
return self.gnome_speaker.say(text) COMM_FAILURE
```
File "/usr/lib/python2.5/site-packages/orca/focus_tracking_presenter.py", line 410 in _processObjectEvent
```
s.processObjectEvent(event)
```
File "/usr/lib/python2.5/site-packages/orca/script.py", line 160 in processObjectEvent
```
self.listeners[key](event)
```
File "/usr/lib/python2.5/site-packages/orca/default.py", line 1838 in onCaretMoved
```
self._presentTextAtNewCaretPosition(event)
```
File "/usr/lib/python2.5/site-packages/orca/default.py", line 1752 in _presentTextAtNewCaretPosition
```
self.sayCharacter(event.source)
```
File "/usr/lib/python2.5/site-packages/orca/default.py", line 1143 in sayCharacter
```
speech.speak(character, voice)
```
File "/usr/lib/python2.5/site-packages/orca/speech.py", line 150 in speak
```
__speechserver.speak(text, __resolveACSS(acss), interrupt)
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 725 in speak
```
self.__speak(text, acss, interrupt)
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 705 in __speak
```
self.reset()
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 905 in reset
```
self.shutdown()
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 880 in shutdown
```
speaker.stop()
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 98 in stop
```
self.gnome_speaker.stop() COMM_FAILURE
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 698 in __speak
```
return speaker.say(text)
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 95 in say
```
return self.gnome_speaker.say(text) COMM_FAILURE
```
File "<string>", line 1 in <module>
File "/usr/lib/python2.5/site-packages/orca/orca.py", line 1223 in main
```
start(registry) # waits until we stop the registry
```
File "/usr/lib/python2.5/site-packages/orca/orca.py", line 983 in start
```
registry.start()
```
File "/usr/lib/python2.5/site-packages/orca/atspi.py", line 168 in start
```
bonobo.main()
```
File "/usr/lib/python2.5/site-packages/orca/focus_tracking_presenter.py", line 617 in _dequeueEvent
```
self._processObjectEvent(event)
```
File "/usr/lib/python2.5/site-packages/orca/focus_tracking_presenter.py", line 410 in _processObjectEvent
```
s.processObjectEvent(event)
```
File "/usr/lib/python2.5/site-packages/orca/script.py", line 160 in processObjectEvent
```
self.listeners[key](event)
```
File "/usr/lib/python2.5/site-packages/orca/default.py", line 1838 in onCaretMoved
```
self._presentTextAtNewCaretPosition(event)
```
File "/usr/lib/python2.5/site-packages/orca/default.py", line 1752 in _presentTextAtNewCaretPosition
```
self.sayCharacter(event.source)
```
File "/usr/lib/python2.5/site-packages/orca/default.py", line 1143 in sayCharacter
```
speech.speak(character, voice)
```
File "/usr/lib/python2.5/site-packages/orca/speech.py", line 150 in speak
```
__speechserver.speak(text, __resolveACSS(acss), interrupt)
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 725 in speak
```
self.__speak(text, acss, interrupt)
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 705 in __speak
```
self.reset()
```
File "/usr/lib/python2.5/site-packages/orca/gnomespeechfactory.py", line 899 in reset
```
debug.printStack(debug.LEVEL_ALL)
```
File "/usr/lib/python2.5/site-packages/orca/debug.py", line 135 in printStack
```
traceback.print_stack(None, 100, debugFile)
```
----

Comment 6 Rich Burridge 2006-09-21 17:11:27 UTC

Another data point. This fails for both DECtalk and FreeTTS.

Comment 7 Willie Walker 2006-09-25 18:57:40 UTC

I'm going to guess this has to do with unicode characters and UTF-8 encoding.  Do you see similar problems when dealing with bullets?

Comment 8 Joanmarie Diggs (IRC: joanie) 2006-09-28 05:31:23 UTC

I hadn't noticed one, but I just gave it a try.  Yes, I do indeed see a similar problem.  Selection isn't even necessary.

I created a test document in OOo with the following structure:

<top of document>
* item 1
* item 2
* item 3
<end of document>

where the asterisks above represent OOo's default bullet (first choice in the Bullets and Numbering dialog)

If I arrow from the end of any given item towards the left, Orca does the following:

1.  Speaks two characters to the left of the current character e.g. "m" when the current character is "3", "e" when the current character is the space, "t" when the current character is the "m" and so on.

2. Restarts itself when the current character is the "t" in "item".

From debug.out:


Traceback (most recent call last):

File "/usr/lib/python2.4/site-packages/orca/gnomespeechfactory.py", line 698 in __speak
```
return speaker.say(text)
```
File "/usr/lib/python2.4/site-packages/orca/gnomespeechfactory.py", line 95 in say
```
return self.gnome_speaker.say(text) COMM_FAILURE
```
File "/usr/lib/python2.4/site-packages/orca/gnomespeechfactory.py", line 698 in __speak
```
return speaker.say(text)
```
File "/usr/lib/python2.4/site-packages/orca/gnomespeechfactory.py", line 95 in say
```
return self.gnome_speaker.say(text) COMM_FAILURE
```


Restarting speech...
Something looks wrong with speech.  Aborting.

Comment 9 Willie Walker 2006-09-28 14:00:46 UTC

Created attachment 73560 [details]
Standalone app to show OOo has implemented the text specialization incorrectly

Well...I was hoping for a smoking unicode gun in Orca, but I believe it turns out that this is a smoking unicode gun in OOo.  The attached standalone app seems to show that OOo is making assumptions that bullets are single-byte UTF-8 strings.  Either that, or OOo is assuming the offsets in the text specialization are byte offsets and not character offsets.  In any case, this appears to be a bug in OOo.
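
To make the byte-versus-character distinction concrete, here is a minimal Python sketch. It is not the attached standalone app. It assumes, purely for illustration, that the bullet is U+2022 (three bytes in UTF-8) and that it is immediately followed by the item text; `char_slice` and `byte_slice` are hypothetical helper names. Under those assumptions, applying character offsets to the UTF-8 bytes lands two positions to the left, and can land in the middle of the bullet's bytes, which is not valid UTF-8 on its own.

```python
# Illustration only: the bullet character and the line layout are assumptions.
BULLET = "\u2022"  # assumed bullet; encodes to three bytes in UTF-8


def char_slice(text, start, end):
    """Substring by character offsets (what a caller of the text interface expects)."""
    return text[start:end]


def byte_slice(text, start, end):
    """Substring obtained by applying the same offsets to the UTF-8 bytes.

    Invalid byte sequences are shown as U+FFFD instead of raising.
    """
    raw = text.encode("utf-8")[start:end]
    return raw.decode("utf-8", errors="replace")


line = BULLET + "item 3"

for offset in range(len(line) - 1, 1, -1):
    expected = char_slice(line, offset, offset + 1)
    got = byte_slice(line, offset, offset + 1)
    print(offset, repr(expected), repr(got))
```

With these assumed inputs, the character at offset 6 is "3" but the byte-based slice yields "m", and at offset 2 the byte-based slice is a lone continuation byte from the bullet.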

Comment 10 Willie Walker 2006-09-28 14:23:26 UTC

Opened issue in OOo: http://www.openoffice.org/issues/show_bug.cgi?id=69945

Comment 11 Willie Walker 2006-09-28 14:57:13 UTC

Created attachment 73562 [details] [review]
Ugly patch to workaround the problem

Here's an ugly patch to work around part of the problem.  It seems as though getText(0,-1) on an OOo text object will give us back the entire string as a UTF-8 string.  In addition, the caret offset from an OOo text object is generally correct in that it is a character offset and not a byte offset.  So, we can: get all the text, turn it into a unicode type to get the unicode character at the given offset, and then turn the character back into a UTF-8 string for speaking.  The problem with this patch is that it gets the entire text of the object on every call, which can be very inefficient for very large documents.  As such, I'm not sure we really should commit this.  In addition, I'm also not sure it really prevents the crash (can someone please try this?).  Finally, the OOo caret offset value is still a bit 'jumpy' (we never get an offset of 1) when navigating around bullets.  This is an OOo bug that I'm not sure we can work around.
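
The idea can be sketched roughly as follows. This is not the attached patch: `character_at` is a hypothetical helper, and the sketch is written for Python 3, where `bytes` stands in for the UTF-8 string that getText(0,-1) is described as returning.

```python
def character_at(utf8_text, caret_offset):
    """Return the character at caret_offset as UTF-8 bytes.

    utf8_text is the whole text as UTF-8 bytes (assumed to be what
    getText(0, -1) hands back); caret_offset is a character offset.
    Returns b"" when the offset is out of range.
    """
    decoded = utf8_text.decode("utf-8")  # bytes -> unicode characters
    if 0 <= caret_offset < len(decoded):
        return decoded[caret_offset].encode("utf-8")  # back to UTF-8 for speech
    return b""


# Hypothetical sample text: an assumed three-byte bullet followed by "item 3".
whole = "\u2022item 3".encode("utf-8")
print(character_at(whole, 6))
```

The cost noted in the comment is visible here: the whole text is decoded just to pick out one character.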

Comment 12 Joanmarie Diggs (IRC: joanie) 2006-09-28 15:37:23 UTC

Just tried the patch.  

It does NOT prevent the crash with the original file (font-test.odt).  

It DOES prevent the crash with the bulleted items.  HOWEVER, Orca is not always reading the correct characters, though it is much improved.  When left arrowing from the end of "item 3" I get:

(current character, spoken character)
3, 3
space, space
m, m
e, e
t, t
i, <nothing>
bullet, <nothing>
carriage return at the end of "2", 3
2, 2
space, space
m, m
e, e
t, t
i, <nothing>
bullet, <nothing>
carriage return at the end of "1", 2

Rinse and repeat. :-)

Hope this helps!

Comment 13 Willie Walker 2006-09-28 16:18:04 UTC

Argh.  Bummer about not solving the original problem.  I'll look into that some more.  With respect to the caret position, I think that's an OOo problem where OOo is not presenting correct caret offsets when the caret moves.

Comment 14 Willie Walker 2006-09-28 17:29:39 UTC

Created attachment 73573 [details] [review]
Continuation of ugly patch

Here's a more thorough ugly patch.  The basic idea is that any call to getText to get a substring is not guaranteed to deliver the actual substring when we work with OOo.  So, we get the entire string, convert it to the unicode type, get the substring from that, and then convert back to UTF-8.  It's very ugly and I'd much rather see the OOo problem be fixed at the source before we have to hack this bad.

Comment 15 Joanmarie Diggs (IRC: joanie) 2006-09-28 18:21:19 UTC

This patch does solve the original problem:  Selected text is correctly spoken; Orca doesn't restart itself.

Results with the sample bulleted list are the same as reported in comment #12.

Comment 16 Rich Burridge 2006-09-28 18:43:19 UTC

Would it make sense to make these changes just in the StarOffice.py script?

Comment 17 Willie Walker 2006-09-28 19:50:55 UTC

(In reply to comment #16)
> Would it make sense to make these changes just in the StarOffice.py script?
> 

I was actually thinking about this on my bike ride today: "wow...how would I go about dealing with atspi implementations whose accessible text implementations partially work?"  We've done some of this work already by restricting what we do.  For example, we only do line by line versus sentence by sentence in SayAll because the sentence offset handling in atspi implementations can be unpredictable and unreliable.  In addition, we also have a lot of defensive code in Orca to detect and handle nonsensical values we get back from calls to getTextAtOffset.
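
As a rough illustration of that kind of defensive check (not Orca's actual code), a caller might validate the range returned by getTextAtOffset before using it. The function name `sane_text_range` and the exact rules it applies are assumptions made for this sketch.

```python
def sane_text_range(result, offset, length):
    """Return (text, start, end) if the range looks plausible, else None.

    result is whatever a getTextAtOffset-style call returned, offset is the
    character offset that was asked about, and length is the character count
    of the whole text. The rules below are illustrative assumptions.
    """
    try:
        text, start, end = result
    except (TypeError, ValueError):
        return None  # not a 3-tuple at all
    if not isinstance(start, int) or not isinstance(end, int):
        return None
    # The range must be ordered, contain the requested offset, and stay in bounds.
    if not (0 <= start <= offset <= end <= length):
        return None
    return text, start, end


print(sane_text_range(("item", 1, 5), 2, 7))
print(sane_text_range(("", -1, -1), 2, 7))
```

A caller that gets None back can fall back to a simpler strategy, such as working line by line.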

For this particular case, I'm wondering if the graceful way to handle things would be to have a getText(self, obj, startOffset, endOffset) method as part of the default Script.  Instead of calling obj.text.getText directly, all scripts would call their self.getText method.  The default.py implementation would merely call obj.text.getText(startOffset, endOffset).  The StarOffice.py implementation, however, would do the ugly hack.  What do you think?
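
A minimal sketch of that proposal, written for Python 3 with `bytes` standing in for UTF-8 strings. Only the getText(self, obj, startOffset, endOffset) signature comes from the proposal; the class names `DefaultScript` and `StarOfficeScript`, and the `FakeText` and `FakeAccessible` stand-ins that imitate a text object with byte-based substring offsets, are hypothetical and exist only to make the example runnable.

```python
class FakeText:
    """Hypothetical text interface that slices by bytes instead of characters."""

    def __init__(self, content):
        self._raw = content.encode("utf-8")

    def getText(self, startOffset, endOffset):
        if endOffset == -1:
            return self._raw[startOffset:]
        return self._raw[startOffset:endOffset]  # wrong for multi-byte text


class FakeAccessible:
    """Hypothetical accessible object exposing a .text interface."""

    def __init__(self, content):
        self.text = FakeText(content)


class DefaultScript:
    def getText(self, obj, startOffset, endOffset):
        # Default behaviour: trust the object's own substring handling.
        return obj.text.getText(startOffset, endOffset)


class StarOfficeScript(DefaultScript):
    def getText(self, obj, startOffset, endOffset):
        # Workaround: fetch everything, slice by character, re-encode.
        whole = obj.text.getText(0, -1).decode("utf-8")
        if endOffset == -1:
            endOffset = len(whole)
        return whole[startOffset:endOffset].encode("utf-8")


obj = FakeAccessible("\u2022item 3")
print(DefaultScript().getText(obj, 1, 5))
print(StarOfficeScript().getText(obj, 1, 5))
```

Keeping the override in one subclass means other scripts pay no extra cost, and the workaround can be deleted in one place once the underlying text implementation is fixed.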

Comment 18 Rich Burridge 2006-09-28 20:18:32 UTC

That sounds like it should work nicely.

Comment 19 Willie Walker 2006-09-28 20:54:03 UTC

Created attachment 73593 [details] [review]
Isolation of ugliness to StarOffice via addition of getText method to default.py

Comment 20 Willie Walker 2006-09-29 17:04:57 UTC

Created attachment 73653 [details] [review]
More complete use of self.getText in all scripts

Comment 21 Willie Walker 2006-10-15 00:25:25 UTC

Add accessibility keyword.  Apologies for spam.

Comment 22 Willie Walker 2006-10-17 21:45:38 UTC

Closing this one out.  See also http://www.openoffice.org/issues/show_bug.cgi?id=69945 to follow the fix being done in OOo for this problem.  Will open a separate bug for the indexing problem around bullets.