GNOME Bugzilla – Bug 356425
[blocked] Selecting certain text by word in OOo causes Orca to restart
Last modified: 2006-10-17 21:46:00 UTC
Please describe the problem:
Periodically, selecting text in OpenOffice.org causes Orca to restart itself. In other words, rather than telling you what you just selected, it says "Welcome to Orca." Until now I had not been able to reproduce it reliably, and I have yet to figure out what is special about the test case I'm presenting, hence the vague summary. I will attach the relevant file after filing this bug.

Steps to reproduce:
1. Launch Orca.
2. Open the file font-test.odt (to be attached).
3. Up arrow to move the cursor to the line that reads: "(3)This text will be in a different font - Helvetica Narrow". Be sure that you are at the end of the line.
4. Use Control+Shift+Left Arrow to select the period, then "Narrow", and finally "Helvetica".

Actual results:
Upon selecting "Helvetica", Orca restarts. As an aside, Orca also does not correctly report what is being selected. For instance, it says "ow" instead of "dot" when the period gets selected. This problem seems to be unique to this particular text in this particular document.

Expected results:
Orca would correctly indicate what is being selected and would not restart itself.

Does this happen every time?
Yes.

Other information:
I can reliably reproduce this problem with this particular document in both OOo 2.0.4 RC1 and the 2.0.3 that comes with Edgy. I am running the latest Edgy and Orca from CVS HEAD on all three machines.
Created attachment 72941: Test case
This is the file mentioned in the bug report.
Hey Joannie, does any part of this text change? For example, font size or type. Also, which synth are you using?
Hey Mike. Yes, it all changes from line to line: font size, type, color, alignment, effects such as relief, etc. *Most* of the other attributes are described in the text itself. I am using Fonix DECtalk.
Created attachment 73152: Orca debug output on testing this
I've just reproduced this with the latest Orca on my Solaris box, using the latest OpenOffice 2.0.4 release candidate.
Here's the interesting part of the debug output. This occurred for me as I was typing Shift+Left Arrow, and I'd just selected the "H" in Helvetica.

----
vvvvv PROCESS OBJECT EVENT object:text-caret-moved vvvvv
OBJECT EVENT: object:text-caret-moved detail=(44,0)
KEYEVENT: type=1 hw_code=100 modifiers=1 event_string=(Left) is_text=True time=1158857941.392078
app.name='soffice.bin' name=None role='paragraph' state='EDITABLE ENABLED FOCUSABLE FOCUSED MULTI_LINE MULTISELECTABLE SHOWING VISIBLE'
BRAILLE LINE: '(3) This text will be in a different font – Helvetica Narrow. $l'
VISIBLE: 'ferent font – Helvetica Narrow', cursor=15
/usr/lib/python2.5/site-packages/orca/brlmon.py:145: GtkWarning: Failed to set label from markup due to error parsing markup: Error on line 1 char 9: Invalid UTF-8 encoded text % char)
SPEECH OUTPUT: ''
Traceback (most recent call last):
return speaker.say(text)
return self.gnome_speaker.say(text) COMM_FAILURE
s.processObjectEvent(event)
self.listeners[key](event)
self._presentTextAtNewCaretPosition(event)
self.sayCharacter(event.source)
speech.speak(character, voice)
__speechserver.speak(text, __resolveACSS(acss), interrupt)
self.__speak(text, acss, interrupt)
self.reset()
self.shutdown()
speaker.stop()
self.gnome_speaker.stop() COMM_FAILURE
start(registry) # waits until we stop the registry
registry.start()
bonobo.main()
self._processObjectEvent(event)
debug.printStack(debug.LEVEL_ALL)
traceback.print_stack(None, 100, debugFile) ----
Another data point. This fails for both DECtalk and FreeTTS.
I'm going to guess this has to do with unicode characters and UTF-8 encoding. Do you see similar problems when dealing with bullets?
I hadn't noticed one, but I just gave it a try. Yes, I do indeed see a similar problem. Selection isn't even necessary. I created a test document in OOo with the following structure:

<top of document>
* item 1
* item 2
* item 3
<end of document>

where the asterisks above represent OOo's default bullet (the first choice in the Bullets and Numbering dialog).

If I arrow from the end of any given item towards the left, Orca does the following:
1. Speaks the character two positions to the left of the current character, e.g. "m" when the current character is "3", "e" when the current character is the space, "t" when the current character is the "m", and so on.
2. Restarts itself when the current character is the "t" in "item".

From debug.out:
Traceback (most recent call last):
Restarting speech... Something looks wrong with speech. Aborting.
Created attachment 73560: Standalone app to show OOo has implemented the text specialization incorrectly
Well... I was hoping for a smoking unicode gun in Orca, but I believe it turns out to be a smoking unicode gun in OOo. The attached standalone app seems to show that OOo assumes bullets take up a single byte in UTF-8 strings. Either that, or OOo is treating the offsets in the text specialization as byte offsets rather than character offsets. In any case, this appears to be a bug in OOo.
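A minimal sketch of the suspected mismatch, in Python 3 (Orca itself was Python 2.5 at the time). This is a hypothetical model, not OOo's code: it assumes the faulty side applies character offsets to the UTF-8 bytes. The en dash comes from line (3) of the test document. Every character after a multibyte character is shifted, and a slice can even cut a character in half, which would produce the kind of invalid UTF-8 the GtkWarning in the debug output complains about.

```python
# Hypothetical model of the suspected bug: character offsets used as byte offsets.
TEXT = "font \u2013 Helvetica Narrow."  # U+2013 (en dash) is 3 bytes in UTF-8
ENCODED = TEXT.encode("utf-8")


def char_slice(start, end):
    """Correct behaviour: offsets count characters."""
    return TEXT[start:end]


def byte_slice(start, end):
    """Suspected behaviour: the same offsets applied to the UTF-8 bytes."""
    return ENCODED[start:end]


if __name__ == "__main__":
    last = len(TEXT) - 1                       # character offset of the final period
    print(repr(char_slice(last, last + 1)))    # '.'
    print(repr(byte_slice(last, last + 1)))    # b'o' (two characters too far left)
    try:
        byte_slice(5, 6).decode("utf-8")       # only the first byte of the dash
    except UnicodeDecodeError as err:
        print("invalid UTF-8:", err.reason)
```

In this model a 3-byte character shifts every later offset by two, which is at least consistent with the "two positions to the left" symptom reported for bullets; it does not prove that OOo works this way internally.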
Opened issue in OOo: http://www.openoffice.org/issues/show_bug.cgi?id=69945
Created attachment 73562: Ugly patch to work around the problem
Here's an ugly patch to work around part of the problem. It seems that getText(0,-1) on an OOo text object gives us back the entire string as a UTF-8 string. In addition, the caret offset from an OOo text object is generally correct in that it is a character offset and not a byte offset. So, we can get all the text, turn it into a unicode type to get the unicode character at the given offset, and then turn that character back into a UTF-8 string for speaking. The problem with this patch is that it gets the entire document, which can be very inefficient for very large documents. As such, I'm not sure we should commit this. I'm also not sure it really prevents the crash (can someone please try this?). Finally, the OOo caret offset value is still a bit 'jumpy' when navigating around bullets (we never get an offset of 1). This is an OOo bug that I'm not sure we can work around.
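The workaround described in this comment can be sketched in Python 3 as follows (the 2006 code was Python 2, where getText returned a UTF-8 str and the conversion went through the unicode type). FakeOOoText is a hypothetical stand-in, not an AT-SPI class: it returns the whole text intact for getText(0, -1), as observed above, but slices any other range by byte offsets. The U+2022 bullet is also an assumption about which character OOo exposes.

```python
class FakeOOoText:
    """Hypothetical stand-in for an OOo text object (not a real AT-SPI class).

    getText(0, -1) returns the whole text as UTF-8 bytes; any other range is
    sliced by *byte* offsets, mimicking the suspected OOo bug.
    """

    def __init__(self, text):
        self._data = text.encode("utf-8")

    def getText(self, start, end):
        if end == -1:
            end = len(self._data)
        return self._data[start:end]


def character_at_offset(text_obj, offset):
    """Workaround: fetch everything, decode, index by character, re-encode.

    Note the cost: the whole document is fetched for every character.
    """
    whole = text_obj.getText(0, -1).decode("utf-8")
    return whole[offset].encode("utf-8")


if __name__ == "__main__":
    item = FakeOOoText("\u2022 item 3")        # assumed bullet character U+2022
    caret = 7                                  # character offset of "3"
    print(item.getText(caret, caret + 1))      # b'm'  (naive call)
    print(character_at_offset(item, caret))    # b'3'  (workaround)
```

In this model the naive call yields "m" while the caret is on "3", the same symptom reported for the bulleted list; that agreement is suggestive rather than proof of what OOo does.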
Just tried the patch. It does NOT prevent the crash with the original file (font-test.odt). It DOES prevent the crash with the bulleted items. HOWEVER, Orca is not always reading the correct characters, though it is much improved. When left arrowing from the end of "item 3" I get:

(current character, spoken character)
3, 3
space, space
m, m
e, e
t, t
i, <nothing>
bullet, <nothing>
carriage return at the end of "2", 3
2, 2
space, space
m, m
e, e
t, t
i, <nothing>
bullet, <nothing>
carriage return at the end of "1", 2

Rinse and repeat. :-) Hope this helps!
Argh. Bummer about not solving the original problem. I'll look into that some more. With respect to the caret position, I think that's an OOo problem where OOo is not presenting correct caret offsets when the caret moves.
Created attachment 73573: Continuation of ugly patch
Here's a more thorough ugly patch. The basic idea is that, when we work with OOo, any call to getText for a substring is not guaranteed to deliver the actual substring. So, we get the entire string, convert it to the unicode type, get the substring from that, and then convert it back to UTF-8. It's very ugly, and I'd much rather see the OOo problem fixed at the source than have to hack around it this badly.
This patch does solve the original problem: selected text is correctly spoken, and Orca doesn't restart itself. Results with the sample bulleted list are the same as reported in comment #12.
Would it make sense to make these changes just in the StarOffice.py script?
(In reply to comment #16)
> Would it make sense to make these changes just in the StarOffice.py script?

I was actually thinking about this on my bike ride today: "wow... how would I go about dealing with AT-SPI implementations whose accessible text implementations only partially work?" We've done some of this work already by restricting what we do. For example, SayAll goes line by line rather than sentence by sentence because the sentence offset handling in AT-SPI implementations can be unpredictable and unreliable. We also have a lot of defensive code in Orca to detect and handle nonsensical values we get back from calls to getTextAtOffset.

For this particular case, I'm wondering if the graceful way to handle things would be to have a getText(self, obj, startOffset, endOffset) method as part of the default Script. Instead of calling obj.text.getText directly, all scripts would call their self.getText method. The default.py implementation would merely call obj.text.getText(startOffset, endOffset). The StarOffice.py implementation, however, would do the ugly hack. What do you think?
That sounds like it should work nicely.
Created attachment 73593: Isolation of ugliness to StarOffice via addition of getText method to default.py
Created attachment 73653: More complete use of self.getText in all scripts
Add accessibility keyword. Apologies for spam.
Closing this one out. See also http://www.openoffice.org/issues/show_bug.cgi?id=69945 to follow the fix being done in OOo for this problem. Will open a separate bug for the indexing problem around bullets.