Bug 355525 – Orca skips HTML document content in flatreview mode



Summary:	Orca skips HTML document content in flatreview mode


Status:	RESOLVED NOTGNOME

Product:	orca
Classification:	Applications
Component:	speech
Version:	1.0.x
Hardware:	Other All

Importance:	Normal minor
Target Milestone:	---
Assigned To:	Orca Maintainers
QA Contact:	Orca Maintainers

URL:
Whiteboard:	https://bugzilla.mozilla.org/show_bug...

Depends on:
Blocks:	404403

Reported:	2006-09-12 02:23 UTC by Tim Miao
Modified:	2012-04-08 17:07 UTC

See Also:
GNOME target:	---
GNOME version:	2.13/2.14

Attachments
test page (43.63 KB, text/html) 2006-09-12 02:23 UTC, Tim Miao		Details
Test page (72.36 KB, text/html) 2006-09-12 03:15 UTC, Tim Miao		Details
Patch to honor accessible text on non-leaf nodes and to also handle unicode embedded object characters (18.15 KB, patch) 2006-10-21 16:06 UTC, Willie Walker	committed	Details | Review
Patch to refactor flat review to give it knowledge of script and to also allow scripts to more easily extend/override the implementation (62.23 KB, patch) 2006-10-22 21:25 UTC, Willie Walker	committed	Details | Review
Simple test case (368 bytes, text/html) 2007-03-04 18:21 UTC, Willie Walker		Details
Stripped debug output showing flat review in progress (3.25 KB, text/plain) 2007-03-04 18:34 UTC, Willie Walker		Details

Description Tim Miao 2006-09-12 02:23:04 UTC

Please describe the problem:
When browsing web content in Orca's flat review mode, Orca skips some lines.

Steps to reproduce:
1. Invoke orca.
2. Invoke firefox.
3. Open the file attached below.
4. Press 9 to read the web content line by line.


Actual results:
The line after the first separator, "32 bugs found.", is skipped; Orca does not report it.

Expected results:
Orca should report all the text lines in flatreview mode.

Does this happen every time?
Yes.

Other information:
This bug can be found with Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.9a1) Gecko/20060911 Minefield/3.0a1 and Orca 1.0.0 on Solaris Nevada build 46.

Comment 1 Tim Miao 2006-09-12 02:23:49 UTC

Created attachment 72579 [details]
test page

Comment 2 Tim Miao 2006-09-12 03:15:59 UTC

Created attachment 72581 [details]
Test page

Comment 3 Willie Walker 2006-09-14 13:28:54 UTC

We've not started looking at Firefox 3 yet, but we'll definitely start looking at Firefox 3 builds very soon.  Thanks for reporting this!

Comment 4 Willie Walker 2006-10-15 00:25:25 UTC

Add accessibility keyword.  Apologies for spam.

Comment 5 Willie Walker 2006-10-18 22:38:27 UTC

Hi Tim:

When running at-poke, I cannot seem to find the "32 bugs found" line in the hierarchy.  Can you see if you can find it in there?  If not, this seems like a Minefield bug.

Will

Comment 6 Willie Walker 2006-10-18 22:44:19 UTC

Actually, I'm now convinced this is a Minefield bug.  Looks like it doesn't like HTML where paragraphs aren't marked by paragraph tags.  For example, take the following snippet and try to find Waldo in at-poke:

<p>Here is a line.</p>

Where is Waldo?

<p>Here is another.</p>

Comment 7 Willie Walker 2006-10-18 22:48:34 UTC

This is an AT-SPI implementation bug in Minefield: https://bugzilla.mozilla.org/show_bug.cgi?id=357204

Comment 8 Willie Walker 2006-10-21 15:52:39 UTC

Unblocking this - the Firefox guys showed me where Waldo hides - he's hiding as text on the document frame.

Comment 9 Willie Walker 2006-10-21 16:06:16 UTC

Created attachment 75135 [details] [review]
Patch to honor accessible text on non-leaf nodes and to also handle unicode embedded object characters

Fixes the "32 bugs found" problem, but I still need to work on the other test case - flat review is missing some of its text (e.g., "Text Formats", "This sentence is bold", etc.).

Comment 10 Willie Walker 2006-10-22 21:25:58 UTC

Created attachment 75217 [details] [review]
Patch to refactor flat review to give it knowledge of script and to also allow scripts to more easily extend/override the implementation

Comment 11 Willie Walker 2007-01-04 22:58:50 UTC

I dug into this quite a bit deeper today.  I think Gecko is giving us bad information for text.getTextAtOffset(offset, atspi.Accessibility.TEXT_BOUNDARY_LINE_START).

Comment 12 Willie Walker 2007-02-13 22:56:28 UTC

I think I fixed the original bug, but flat review in Firefox is still broken with other areas of the test case as described in comment 9.  In addition, I notice that flat review is also picking up text from the content in other tabs.  I think there may be several problems lurking in the Firefox AT-SPI implementation with respect to both VISIBLE/SHOWING and text extents (as of the latest Firefox 3.0 Gecko/20070212 Minefield/3.0a3pre from this morning). 

So...I'm going to retitle this bug for now and will dig into the other problems more.  If there are Firefox bugs, I'll file new ones with them and will open separate tracking bugs here.

Comment 13 Willie Walker 2007-03-04 18:19:37 UTC

The problem regarding reading hidden content in tabs is a separate problem: see bug 408071.  I'm retitling this bug as such.

Comment 14 Willie Walker 2007-03-04 18:21:37 UTC

Created attachment 83911 [details]
Simple test case

In this test, Orca skips over "Text Formats" and "This sentence is bold."  When looking at the content in at-poke, one sees that they are nestled in with the text on the document frame.  This might be a problem with how embedded object characters are handled.

Comment 15 Willie Walker 2007-03-04 18:34:03 UTC

Created attachment 83913 [details]
Stripped debug output showing flat review in progress

Here's some debug logic.  The document frame text looks like:

The problem seems to be that the call to getTextAtOffset to get the line of text at the current offset is returning garbage:

flat_review:getZonesFromText detected garbage from getTextAtOffset for accessible name='HTML test page' role'='document frame': offset used=2, start/end offset returned=(0,1), string='$-3õ¼'

I was going to cut/paste the full text from the document frame here, but it looks like Firefox has decided to not expose that right now.  I'll need to restart Firefox and rerun at-poke to see if I can get the text.  There might be some defensiveness we could do -- I think I'm seeing some \n characters in there.

Comment 16 Willie Walker 2007-03-04 18:46:20 UTC

>  The document frame text looks like:

Had to restart FF to get this.  Here's what it looks like, where \n is the newline character and \e is the embedded object character:

\e\n\n\e\eText Formats\n\nThis sentence is bold.\e

So...from this debug output:

flat_review:getZonesFromText detected garbage from getTextAtOffset for
accessible name='HTML test page' role'='document frame': offset used=2,
start/end offset returned=(0,1), string='$-3õ¼'

I think we can see that the second \n (offset=2) is causing us issues.  IMO, Firefox is just not doing the right thing here.  Instead, it probably should be returning (startoffset=2, endoffset=3).  Oh well. Nobody ever seems to get getTextAtOffset right.  I'm not even sure I understand the spec myself.  :-)

One thing we might try doing inside the "the AT-SPI implementation blew it" block in flat_review.py:getZonesFromText is detecting if the current character is a \n.  If it is, just increment offset by one and check again.  This might work.  It's in a sensitive enough part of the code, however, that such a change is not worth the risk for GNOME 2.18.
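
A minimal sketch of the "skip the newline and check again" idea, written in Python 3 rather than the Python 2 of the Orca sources of that time. Everything in it is hypothetical and for illustration only: lines_with_newline_skip, fake_get_line_at, and DOC are not Orca or AT-SPI names. The fake stands in for getTextAtOffset and returns the bad (0,1) range for offset 2, as in the debug output above; stopping on any other bad answer is an assumption about what the surrounding error handling would do.

```python
# Hypothetical illustration only; none of these names come from Orca or AT-SPI.
EOC = "\ufffc"  # Unicode object replacement character (embedded object)
DOC = EOC + "\n\n" + EOC + EOC + "Text Formats\n\nThis sentence is bold." + EOC


def fake_get_line_at(offset):
    """Stand-in for getTextAtOffset with a line-start boundary.

    Returns (string, start, end). It deliberately misbehaves at offset 2,
    mirroring the (0, 1) range seen in the debug output.
    """
    if offset == 2:
        return ("?", 0, 1)
    start = DOC.rfind("\n", 0, offset) + 1
    end = DOC.find("\n", offset)
    end = len(DOC) if end == -1 else end + 1
    return (DOC[start:end], start, end)


def lines_with_newline_skip(full_text, get_line_at):
    """Collect (start, end, string) lines, stepping over a newline whenever
    the reported range does not contain the offset that was asked about."""
    lines = []
    offset = 0
    while offset < len(full_text):
        string, start, end = get_line_at(offset)
        if start <= offset < end:
            lines.append((start, end, string))
            offset = end
        elif full_text[offset] == "\n":
            # The proposed workaround: move past the newline and try again.
            offset += 1
        else:
            # Assumption: any other bad answer still ends the scan.
            break
    return lines


if __name__ == "__main__":
    for start, end, string in lines_with_newline_skip(DOC, fake_get_line_at):
        print(start, end, repr(string))
```

With this fake, the scan steps over offset 2 and still reaches the lines containing "Text Formats" and "This sentence is bold."; whether the real Gecko answers would allow the same recovery is not something this sketch can show.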

Comment 17 Willie Walker 2007-06-08 19:22:40 UTC

I added the following code to flat_review.py:getZonesFromText, right after the  "length = text.characterCount" line:

        # Debug only: dump every character of the document frame's text
        # along with the line Gecko reports for that character's offset.
        if accessible.role == rolenames.ROLE_DOCUMENT_FRAME:
            for i in range(0, length):
                character = self.script.getText(accessible, i, i + 1)
                # Make invisible characters readable in the output.
                if character == self.script.EMBEDDED_OBJECT_CHARACTER:
                    character = "EMBEDDED_OBJECT_CHARACTER"
                elif character == "\n":
                    character = "\\n"
                print "%d. '%s'" % (i, character)
                [string, startOffset, endOffset] = text.getTextAtOffset(
                    i,
                    atspi.Accessibility.TEXT_BOUNDARY_LINE_START)
                print "  line(%d, %d) = '%s'" \
                      % (startOffset, endOffset, string)

For each character in the text for the document frame, the output tells us what the index of the character is, what the character itself is, and what Gecko thinks the line is for that character, including the start and end offset for the line.  

The main idea was to see what Gecko thought a line was for the troublesome page at http://bugzilla.gnome.org/attachment.cgi?id=83911.  Things start falling apart somewhere around character 19, and I think it is fair to declare this a Gecko implementation problem on the Mozilla side.  I'm going to give this a little rest before I file a Mozilla bug.  We *might* be able to work around this, but I'd rather they fix it.

0. 'EMBEDDED_OBJECT_CHARACTER'
  line(0, 2) = '
'
1. '\n'
  line(0, 2) = '
'
2. '\n'
  line(2, 3) = '
'
3. 'EMBEDDED_OBJECT_CHARACTER'
  line(3, 4) = ''
4. 'EMBEDDED_OBJECT_CHARACTER'
  line(5, 18) = 'Text Formats
'
5. 'T'
  line(5, 18) = 'Text Formats
'
6. 'e'
  line(5, 18) = 'Text Formats
'
7. 'x'
  line(5, 18) = 'Text Formats
'
8. 't'
  line(5, 18) = 'Text Formats
'
9. ' '
  line(5, 18) = 'Text Formats
'
10. 'F'
  line(5, 18) = 'Text Formats
'
11. 'o'
  line(5, 18) = 'Text Formats
'
12. 'r'
  line(5, 18) = 'Text Formats
'
13. 'm'
  line(5, 18) = 'Text Formats
'
14. 'a'
  line(5, 18) = 'Text Formats
'
15. 't'
  line(5, 18) = 'Text Formats
'
16. 's'
  line(5, 18) = 'Text Formats
'
17. '\n'
  line(5, 18) = 'Text Formats
'
18. '\n'
  line(18, 19) = '
'
19. 'T'
  line(18, 20) = '
T'
20. 'h'
  line(18, 19) = '
'
21. 'i'
  line(18, 19) = '
'
22. 's'
  line(18, 19) = '
'
23. ' '
  line(18, 19) = '
'
24. 's'
  line(18, 19) = '
'
25. 'e'
  line(18, 19) = '
'
26. 'n'
  line(18, 19) = '
'
27. 't'
  line(18, 19) = '
'
28. 'e'
  line(18, 19) = '
'
29. 'n'
  line(18, 19) = '
'
30. 'c'
  line(18, 19) = '
'
31. 'e'
  line(18, 19) = '
'
32. ' '
  line(18, 19) = '
'
33. 'i'
  line(18, 19) = '
'
34. 's'
  line(18, 19) = '
'
35. ' '
  line(18, 19) = '
'
36. 'b'
  line(18, 19) = '
'
37. 'o'
  line(18, 19) = '
'
38. 'l'
  line(18, 19) = '
'
39. 'd'
  line(18, 19) = '
'
40. '.'
  line(18, 19) = '
'
41. 'EMBEDDED_OBJECT_CHARACTER'
  line(41, 42) = ''
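
One rough way to read the listing above is to ask, for each offset, whether the reported (start, end) range even contains that offset. The Python 3 sketch below does only that; inconsistent_offsets and REPORTED are hypothetical names, and the ranges are transcribed by hand from the output above. It is a narrower test than "is this the right line": offset 19 passes it because (18, 20) contains 19, even though the returned string is clearly wrong.

```python
# Hypothetical helper; the (start, end) pairs are copied from the listing above.
REPORTED = (
    [(0, 2)] * 2        # offsets 0-1
    + [(2, 3)]          # offset 2
    + [(3, 4)]          # offset 3
    + [(5, 18)] * 14    # offsets 4-17
    + [(18, 19)]        # offset 18
    + [(18, 20)]        # offset 19
    + [(18, 19)] * 21   # offsets 20-40
    + [(41, 42)]        # offset 41
)


def inconsistent_offsets(reported):
    """Return the offsets whose reported line range does not contain them."""
    return [
        offset
        for offset, (start, end) in enumerate(reported)
        if not (start <= offset < end)
    ]


if __name__ == "__main__":
    print(inconsistent_offsets(REPORTED))
```

By this containment test alone, offset 4 and offsets 20 through 40 are flagged, which covers all of "This sentence is bold." after its first character.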

Comment 18 Willie Walker 2007-06-12 01:46:24 UTC

Marking this as blocked due to Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=384101

Comment 19 Willie Walker 2008-01-05 23:37:52 UTC

Reducing to normal/minor since flat review in HTML content just isn't used all that much.

Comment 20 Joanmarie Diggs (IRC: joanie) 2010-04-03 20:23:32 UTC

Bulk reassigning Will's bugs to the default assignee. (Sorry for the spam!)

Comment 21 Joanmarie Diggs (IRC: joanie) 2012-04-08 17:07:33 UTC

I just tried this using Firefox 11. Orca presented "32 bugs found" to me when I used flat review.