GNOME Bugzilla – Bug 355525
Orca skips HTML document content in flatreview mode
Last modified: 2012-04-08 17:07:33 UTC
Please describe the problem:
When browsing web content in Orca's flat review mode, Orca skips some lines.

Steps to reproduce:
1. Invoke Orca.
2. Invoke Firefox.
3. Open the file attached below.
4. Press 9 to read the web content line by line.

Actual results:
The line after the first separator, "32 bugs found.", is skipped; Orca does not report it.

Expected results:
Orca should report all the text lines in flat review mode.

Does this happen every time?
Yes.

Other information:
This bug can be found with Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.9a1) Gecko/20060911 Minefield/3.0a1 and Orca 1.0.0 on Solaris Nevada build 46.
Created attachment 72579 [details] test page
Created attachment 72581 [details] Test page
We've not started looking at Firefox 3 yet, but we'll definitely start looking at Firefox 3 builds very soon. Thanks for reporting this!
Add accessibility keyword. Apologies for spam.
Hi Tim: When running at-poke, I cannot seem to find the "32 bugs found" line in the hierarchy. Can you see if you can find it in there? If not, this seems like a Minefield bug. Will
Actually, I'm now convinced this is a Minefield bug. Looks like it doesn't like HTML where paragraphs aren't marked by paragraph tags. For example, take the following snippet and try to find Waldo in at-poke: <p>Here is a line.</p> Where is Waldo? <p>Here is another.</p>
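For building further test pages, it may help to list the text that sits directly in the document rather than inside a paragraph tag. The following is only a rough sketch using Python's standard html.parser module; the class and function names (LooseTextFinder, find_loose_text) are made up for illustration and are not part of Orca or Firefox.

```python
from html.parser import HTMLParser


class LooseTextFinder(HTMLParser):
    """Collects text that is not enclosed in any <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraph_depth = 0
        self.loose = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraph_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.paragraph_depth > 0:
            self.paragraph_depth -= 1

    def handle_data(self, data):
        # Only keep non-blank text seen while no <p> is open.
        if self.paragraph_depth == 0 and data.strip():
            self.loose.append(data.strip())


def find_loose_text(html):
    finder = LooseTextFinder()
    finder.feed(html)
    finder.close()
    return finder.loose


snippet = "<p>Here is a line.</p> Where is Waldo? <p>Here is another.</p>"
print(find_loose_text(snippet))
```

Run against the snippet above, this reports only the Waldo line, which is the kind of text to look for in at-poke.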
This is an AT-SPI implementation bug in Minefield: https://bugzilla.mozilla.org/show_bug.cgi?id=357204
Unblocking this - the Firefox guys showed me where Waldo hides - he's hiding as text on the document frame.
Created attachment 75135 [details] [review] Patch to honor accessible text on non-leaf nodes and to also handle unicode embedded object characters Fixes the "32 bugs found" problem, but I still need to work on the other test case - flat review is missing some of its text (e.g., "Text Formats", "This sentence is bold", etc.).
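To illustrate the idea of honoring text on non-leaf nodes, here is a minimal sketch rather than the actual patch. It assumes a simplified model in which the nth embedded object character (U+FFFC) in a node's text stands for that node's nth child; a real AT-SPI client would resolve this through the hypertext interface. The Node class and collect_text function are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List

EMBEDDED_OBJECT_CHARACTER = "\ufffc"


@dataclass
class Node:
    """Hypothetical stand-in for an accessible that exposes text."""
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def collect_text(node):
    """Return text runs in document order.

    Assumption: each embedded object character in node.text marks the
    position of the next child, in order.
    """
    runs = []
    pending = []
    remaining_children = iter(node.children)
    for character in node.text:
        if character == EMBEDDED_OBJECT_CHARACTER:
            # Flush the parent's own text before descending into the child.
            if pending:
                runs.append("".join(pending))
                pending = []
            child = next(remaining_children, None)
            if child is not None:
                runs.extend(collect_text(child))
        else:
            pending.append(character)
    if pending:
        runs.append("".join(pending))
    # Children with no embedded object character are visited last.
    for child in remaining_children:
        runs.extend(collect_text(child))
    return runs


frame = Node(
    text="\ufffcWhere is Waldo?\ufffc",
    children=[Node("Here is a line."), Node("Here is another.")],
)
print(collect_text(frame))
```

With this model, text that lives directly on the document frame is picked up between the two paragraphs instead of being skipped.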
Created attachment 75217 [details] [review] Patch to refactor flat review to give it knowledge of script and to also allow scripts to more easily extend/override the implementation
I dug into this quite a bit deeper today. I think Gecko is giving us bad information for text.getTextAtOffset(offset, atspi.Accessibility.TEXT_BOUNDARY_LINE_START).
I think I fixed the original bug, but flat review in Firefox is still broken with other areas of the test case as described in comment 9. In addition, I notice that flat review is also picking up text from the content in other tabs. I think there may be several problems lurking in the Firefox AT-SPI implementation with respect to both VISIBLE/SHOWING and text extents (as of the latest Firefox 3.0 Gecko/20070212 Minefield/3.0a3pre from this morning). So...I'm going to retitle this bug for now and will dig into the other problems more. If there are Firefox bugs, I'll file new ones with them and will open separate tracking bugs here.
The problem regarding reading hidden content in tabs is a separate problem: see bug 408071. I'm retitling this bug as such.
Created attachment 83911 [details] Simple test case In this test, Orca skips over "Text Formats" and "This sentence is bold." When looking at the content in at-poke, one sees that they are nestled in with the text on the document frame. This might be a problem with how embedded object characters are handled.
Created attachment 83913 [details] Stripped debug output showing flat review in progress

Here's some debug output. The document frame text looks like:

The problem seems to be that the call to getTextAtOffset to get the line of text at the current offset is returning garbage:

flat_review:getZonesFromText detected garbage from getTextAtOffset for accessible name='HTML test page' role'='document frame': offset used=2, start/end offset returned=(0,1), string='$-3õ¼'

I was going to cut/paste the full text from the document frame here, but it looks like Firefox has decided to not expose that right now. I'll need to restart Firefox and rerun at-poke to see if I can get the text. There might be some defensive coding we could do -- I think I'm seeing some \n characters in there.
> The document frame text looks like:

Had to restart FF to get this. Here's what it looks like, where \n is the newline character and \e is the embedded object character:

\e\n\n\e\eText Formats\n\nThis sentence is bold.\e

So...from this debug output:

flat_review:getZonesFromText detected garbage from getTextAtOffset for accessible name='HTML test page' role'='document frame': offset used=2, start/end offset returned=(0,1), string='$-3õ¼'

I think we can see that the second \n (offset=2) is causing us issues. IMO, Firefox is just not doing the right thing here. Instead, it probably should be returning (startoffset=2, endoffset=3). Oh well. Nobody ever seems to get getTextAtOffset right. I'm not even sure I understand the spec myself. :-)

One thing we might try doing inside the "the AT-SPI implementation blew it" block in flat_review.py:getZonesFromText is detecting if the current character is a \n. If it is, just increment offset by one and check again. This might work. It's in a sensitive enough part of the code, however, that such a change is not worth the risk for GNOME 2.18.
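The newline workaround described above could look roughly like the sketch below. This is not the code in flat_review.py; iter_lines and make_buggy_provider are hypothetical names, and the provider only imitates the reported behavior (a bogus range of (0, 1) at offset 2) so the loop can be exercised without Firefox.

```python
def iter_lines(full_text, get_text_at_offset):
    """Walk the text line by line, stepping over a newline when the
    reported line range does not contain the offset that was asked for."""
    lines = []
    offset = 0
    while offset < len(full_text):
        string, start, end = get_text_at_offset(offset)
        if start <= offset < end:
            lines.append((start, end, string))
            offset = end
        elif full_text[offset] == "\n":
            # Garbage came back for a newline: skip it and check again.
            offset += 1
        else:
            # Garbage for a non-newline character: give up, as before.
            break
    return lines


def make_buggy_provider(full_text, bad_offset):
    """Hypothetical getTextAtOffset stand-in that misbehaves at one offset."""
    def get_text_at_offset(offset):
        if offset == bad_offset:
            return ("<garbage>", 0, 1)
        start = full_text.rfind("\n", 0, offset) + 1
        newline = full_text.find("\n", offset)
        end = len(full_text) if newline == -1 else newline + 1
        return (full_text[start:end], start, end)
    return get_text_at_offset


frame_text = "\ufffc\n\n\ufffc\ufffcText Formats\n\nThis sentence is bold.\ufffc"
provider = make_buggy_provider(frame_text, bad_offset=2)
for start, end, string in iter_lines(frame_text, provider):
    print(start, end, repr(string))
```

In this imitation the loop steps past offset 2 and still reaches the lines holding "Text Formats" and "This sentence is bold." Whether that holds against real Gecko output is untested.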
I added the following code to flat_review.py:getZonesFromText, right after the "length = text.characterCount" line:

    if accessible.role == rolenames.ROLE_DOCUMENT_FRAME:
        for i in range(0, length):
            character = self.script.getText(accessible, i, i + 1)
            if character == self.script.EMBEDDED_OBJECT_CHARACTER:
                character = "EMBEDDED_OBJECT_CHARACTER"
            elif character == "\n":
                character = "\\n"
            print "%d. '%s'" % (i, character)
            [string, startOffset, endOffset] = text.getTextAtOffset(
                i, atspi.Accessibility.TEXT_BOUNDARY_LINE_START)
            print "  line(%d, %d) = '%s'" \
                  % (startOffset, endOffset, string)

For each character in the text for the document frame, the output tells us what the index of the character is, what the character itself is, and what Gecko thinks the line is for that character, including the start and end offset for the line.

The main idea was to see what Gecko thought a line was for the troublesome page at http://bugzilla.gnome.org/attachment.cgi?id=83911. Things start falling apart somewhere around character 19, and I think it is fair to declare this a Gecko implementation problem on the Mozilla side. I'm going to give this a little rest before I file a Mozilla bug. We *might* be able to work around this, but I'd rather they fix it.

    0. 'EMBEDDED_OBJECT_CHARACTER' line(0, 2) = ' '
    1. '\n' line(0, 2) = ' '
    2. '\n' line(2, 3) = ' '
    3. 'EMBEDDED_OBJECT_CHARACTER' line(3, 4) = ''
    4. 'EMBEDDED_OBJECT_CHARACTER' line(5, 18) = 'Text Formats '
    5. 'T' line(5, 18) = 'Text Formats '
    6. 'e' line(5, 18) = 'Text Formats '
    7. 'x' line(5, 18) = 'Text Formats '
    8. 't' line(5, 18) = 'Text Formats '
    9. ' ' line(5, 18) = 'Text Formats '
    10. 'F' line(5, 18) = 'Text Formats '
    11. 'o' line(5, 18) = 'Text Formats '
    12. 'r' line(5, 18) = 'Text Formats '
    13. 'm' line(5, 18) = 'Text Formats '
    14. 'a' line(5, 18) = 'Text Formats '
    15. 't' line(5, 18) = 'Text Formats '
    16. 's' line(5, 18) = 'Text Formats '
    17. '\n' line(5, 18) = 'Text Formats '
    18. '\n' line(18, 19) = ' '
    19. 'T' line(18, 20) = ' T'
    20. 'h' line(18, 19) = ' '
    21. 'i' line(18, 19) = ' '
    22. 's' line(18, 19) = ' '
    23. ' ' line(18, 19) = ' '
    24. 's' line(18, 19) = ' '
    25. 'e' line(18, 19) = ' '
    26. 'n' line(18, 19) = ' '
    27. 't' line(18, 19) = ' '
    28. 'e' line(18, 19) = ' '
    29. 'n' line(18, 19) = ' '
    30. 'c' line(18, 19) = ' '
    31. 'e' line(18, 19) = ' '
    32. ' ' line(18, 19) = ' '
    33. 'i' line(18, 19) = ' '
    34. 's' line(18, 19) = ' '
    35. ' ' line(18, 19) = ' '
    36. 'b' line(18, 19) = ' '
    37. 'o' line(18, 19) = ' '
    38. 'l' line(18, 19) = ' '
    39. 'd' line(18, 19) = ' '
    40. '.' line(18, 19) = ' '
    41. 'EMBEDDED_OBJECT_CHARACTER' line(41, 42) = ''
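One way to summarize the output above is to check, for each offset, whether the reported line range actually contains that offset. The sketch below does this over the (start, end) pairs transcribed from the output; the containment test is my reading of what TEXT_BOUNDARY_LINE_START should guarantee, and REPORTED and inconsistent_offsets are hypothetical names.

```python
# (start, end) line ranges reported for each offset, transcribed from the
# debug output above.
REPORTED = {0: (0, 2), 1: (0, 2), 2: (2, 3), 3: (3, 4), 4: (5, 18)}
REPORTED.update({i: (5, 18) for i in range(5, 18)})
REPORTED[18] = (18, 19)
REPORTED[19] = (18, 20)
REPORTED.update({i: (18, 19) for i in range(20, 41)})
REPORTED[41] = (41, 42)


def inconsistent_offsets(reported):
    """Return offsets whose reported line range does not contain them.

    Assumption: the line returned for an offset should satisfy
    start <= offset < end.
    """
    return [
        offset
        for offset, (start, end) in sorted(reported.items())
        if not (start <= offset < end)
    ]


print(inconsistent_offsets(REPORTED))
```

By this test offset 4 and offsets 20 through 40 are inconsistent, which matches the observation that things fall apart around character 19.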
Marking this as blocked due to Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=384101
Reducing to normal/minor since flat review in HTML content just isn't used all that much.
Bulk reassigning Will's bugs to the default assignee. (Sorry for the spam!)
I just tried this using Firefox 11. Orca presented "32 bugs found" to me when I used flat review.