Bug 355525 - Orca skips HTML document content in flatreview mode
Status: RESOLVED NOTGNOME
Product: orca
Classification: Applications
Component: speech
Version: 1.0.x
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: Orca Maintainers
QA Contact: Orca Maintainers
URL: https://bugzilla.mozilla.org/show_bug...
Depends on:
Blocks: 404403
 
 
Reported: 2006-09-12 02:23 UTC by Tim Miao
Modified: 2012-04-08 17:07 UTC
See Also:
GNOME target: ---
GNOME version: 2.13/2.14


Attachments
test page (43.63 KB, text/html)
2006-09-12 02:23 UTC, Tim Miao
Test page (72.36 KB, text/html)
2006-09-12 03:15 UTC, Tim Miao
Patch to honor accessible text on non-leaf nodes and to also handle unicode embedded object characters (18.15 KB, patch)
2006-10-21 16:06 UTC, Willie Walker
committed
Patch to refactor flat review to give it knowledge of script and to also allow scripts to more easily extend/override the implementation (62.23 KB, patch)
2006-10-22 21:25 UTC, Willie Walker
committed
Simple test case (368 bytes, text/html)
2007-03-04 18:21 UTC, Willie Walker
Stripped debug output showing flat review in progress (3.25 KB, text/plain)
2007-03-04 18:34 UTC, Willie Walker

Description Tim Miao 2006-09-12 02:23:04 UTC
Please describe the problem:
When browsing web content in Orca's flat review mode, Orca skips some lines.

Steps to reproduce:
1. Invoke orca.
2. Invoke firefox.
3. Open the file attached below.
4. Press 9 to read the web content line by line.


Actual results:
The line after the first separator, "32 bugs found.", is skipped; Orca does not report this line.

Expected results:
Orca should report all the text lines in flat review mode.

Does this happen every time?
Yes.

Other information:
This bug occurs with Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.9a1) Gecko/20060911 Minefield/3.0a1 and Orca 1.0.0 on Solaris Nevada build 46.
Comment 1 Tim Miao 2006-09-12 02:23:49 UTC
Created attachment 72579
test page
Comment 2 Tim Miao 2006-09-12 03:15:59 UTC
Created attachment 72581
Test page
Comment 3 Willie Walker 2006-09-14 13:28:54 UTC
We've not started looking at Firefox 3 yet, but we'll definitely start looking at Firefox 3 builds very soon.  Thanks for reporting this!
Comment 4 Willie Walker 2006-10-15 00:25:25 UTC
Added the accessibility keyword.  Apologies for the spam.
Comment 5 Willie Walker 2006-10-18 22:38:27 UTC
Hi Tim:

When running at-poke, I cannot find the "32 bugs found" line anywhere in the hierarchy.  Can you check whether you can find it there?  If not, this looks like a Minefield bug.

Will
Comment 6 Willie Walker 2006-10-18 22:44:19 UTC
Actually, I'm now convinced this is a Minefield bug.  It looks like Minefield doesn't handle HTML in which paragraphs aren't marked with paragraph tags.  For example, take the following snippet and try to find Waldo in at-poke:

<p>Here is a line.</p>

Where is Waldo?

<p>Here is another.</p>
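The snippet's structure can be checked mechanically: "Where is Waldo?" is a bare run of text that is not inside any <p> element. The Python 3 sketch below uses the standard library's html.parser to collect text sitting outside every <p>. It only inspects the markup; it does not query Gecko's accessibility tree, so it shows which text has no paragraph of its own, not how Minefield exposes it.

```python
from html.parser import HTMLParser


class BareTextFinder(HTMLParser):
    """Collects non-whitespace text that is not inside any <p> element."""

    def __init__(self):
        super().__init__()
        self.p_depth = 0      # number of <p> elements currently open
        self.bare_text = []   # text runs found outside every <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1

    def handle_data(self, data):
        if self.p_depth == 0 and data.strip():
            self.bare_text.append(data.strip())


snippet = """<p>Here is a line.</p>

Where is Waldo?

<p>Here is another.</p>"""

finder = BareTextFinder()
finder.feed(snippet)
finder.close()
print(finder.bare_text)  # ['Where is Waldo?']
```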
Comment 7 Willie Walker 2006-10-18 22:48:34 UTC
This is an AT-SPI implementation bug in Minefield: https://bugzilla.mozilla.org/show_bug.cgi?id=357204
Comment 8 Willie Walker 2006-10-21 15:52:39 UTC
Unblocking this.  The Firefox guys showed me where Waldo hides: he's hiding as text on the document frame.
Comment 9 Willie Walker 2006-10-21 16:06:16 UTC
Created attachment 75135
Patch to honor accessible text on non-leaf nodes and to also handle unicode embedded object characters

Fixes the "32 bugs found" problem, but I still need to work on the other test case - flat review is missing some of its text (e.g., "Text Formats", "This sentence is bold", etc.).
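The patch title says flat review should honor text on non-leaf nodes and handle Unicode embedded object characters. The idea can be sketched as follows: a parent's text contains U+FFFC wherever a child object sits, so reading the parent means substituting each such character with the child's own (recursively flattened) text. This is not the committed patch. Node and flatten_text are hypothetical, and the sketch assumes the nth embedded object character corresponds to the nth entry in children; a real AT-SPI client would get that mapping from the accessible's hypertext/hyperlink information.

```python
from dataclasses import dataclass, field
from typing import List

# U+FFFC OBJECT REPLACEMENT CHARACTER: marks where an embedded child
# object sits inside its parent's text.
EMBEDDED_OBJECT_CHARACTER = "\ufffc"


@dataclass
class Node:
    """Hypothetical stand-in for an accessible that exposes text.

    children holds only the embedded (non-text) children, in the same
    order as their EMBEDDED_OBJECT_CHARACTERs appear in text.
    """
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def flatten_text(node: Node) -> str:
    """Return node's own text with every embedded object character
    replaced by the flattened text of the child it stands for."""
    pieces = []
    child_index = 0
    for ch in node.text:
        if ch == EMBEDDED_OBJECT_CHARACTER and child_index < len(node.children):
            pieces.append(flatten_text(node.children[child_index]))
            child_index += 1
        else:
            pieces.append(ch)
    return "".join(pieces)


# Made-up tree loosely modelled on the Waldo snippet: the two paragraphs
# are embedded children, while "Where is Waldo?" lives directly in the
# document frame's own text.
document = Node(
    text="\ufffc\nWhere is Waldo?\n\ufffc",
    children=[Node("Here is a line."), Node("Here is another.")],
)
print(flatten_text(document))
```

Reading only the leaf nodes of this tree would yield the two paragraphs and drop "Where is Waldo?", which is the shape of the "32 bugs found" problem.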
Comment 10 Willie Walker 2006-10-22 21:25:58 UTC
Created attachment 75217
Patch to refactor flat review to give it knowledge of script and to also allow scripts to more easily extend/override the implementation
Comment 11 Willie Walker 2007-01-04 22:58:50 UTC
I dug into this quite a bit deeper today.  I think Gecko is giving us bad information for text.getTextAtOffset(offset, atspi.Accessibility.TEXT_BOUNDARY_LINE_START).
Comment 12 Willie Walker 2007-02-13 22:56:28 UTC
I think I fixed the original bug, but flat review in Firefox is still broken for other parts of the test case, as described in comment 9.  In addition, I notice that flat review is also picking up text from the content in other tabs.  I think there may be several problems lurking in the Firefox AT-SPI implementation with respect to both VISIBLE/SHOWING and text extents (as of the latest Firefox 3.0 Gecko/20070212 Minefield/3.0a3pre from this morning).

So... I'm going to retitle this bug for now and will dig into the other problems more.  If there are Firefox bugs, I'll file them with Mozilla and will open separate tracking bugs here.
Comment 13 Willie Walker 2007-03-04 18:19:37 UTC
Reading hidden content in other tabs is a separate problem: see bug 408071.  I'm retitling this bug accordingly.
Comment 14 Willie Walker 2007-03-04 18:21:37 UTC
Created attachment 83911
Simple test case

In this test, Orca skips over "Text Formats" and "This sentence is bold."  When looking at the content in at-poke, one sees that they are nestled in with the text on the document frame.  This might be a problem with how embedded object characters are handled.
Comment 15 Willie Walker 2007-03-04 18:34:03 UTC
Created attachment 83913
Stripped debug output showing flat review in progress

Here's some debug output.  The document frame text looks like:

The problem seems to be that the call to getTextAtOffset to get the line of text at the current offset is returning garbage:

flat_review:getZonesFromText detected garbage from getTextAtOffset for accessible name='HTML test page' role'='document frame': offset used=2, start/end offset returned=(0,1), string='$-3õ¼'

I was going to cut/paste the full text from the document frame here, but it looks like Firefox has decided to not expose that right now.  I'll need to restart Firefox and rerun at-poke to see if I can get the text.  There might be some defensiveness we could do -- I think I'm seeing some \n characters in there.
Comment 16 Willie Walker 2007-03-04 18:46:20 UTC
>  The document frame text looks like:

Had to restart FF to get this.  Here's what it looks like, where \n is the newline character and \e is the embedded object character:

\e\n\n\e\eText Formats\n\nThis sentence is bold.\e

So...from this debug output:

flat_review:getZonesFromText detected garbage from getTextAtOffset for
accessible name='HTML test page' role'='document frame': offset used=2,
start/end offset returned=(0,1), string='$-3õ¼'

I think we can see that the second \n (offset=2) is causing us issues.  IMO, Firefox is just not doing the right thing here.  Instead, it probably should be returning (startoffset=2, endoffset=3).  Oh well. Nobody ever seems to get getTextAtOffset right.  I'm not even sure I understand the spec myself.  :-)
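To make the expected answer concrete, here is a simplified model of a LINE_START lookup that treats "\n" as the only line break: a line runs from just after the previous newline up to and including the next one. That reading is an assumption, not the AT-SPI spec text, and it ignores wrapping and layout, so it will not match Gecko around the embedded objects. On the document frame text above it gives (2, 3) for offset 2, the answer suggested here, and (0, 2) for offset 1.

```python
# Document frame text from comment 16, with U+FFFC standing in for each
# embedded object character (\e above).
DOC_TEXT = "\ufffc\n\n\ufffc\ufffcText Formats\n\nThis sentence is bold.\ufffc"


def line_at_offset(text, offset):
    """Simplified LINE_START lookup: a line runs from just after the
    previous newline up to and including the next newline (or the end
    of the text). Returns (string, start, end) like getTextAtOffset."""
    start = text.rfind("\n", 0, offset) + 1
    newline = text.find("\n", offset)
    end = len(text) if newline == -1 else newline + 1
    return text[start:end], start, end


print(line_at_offset(DOC_TEXT, 2))  # ('\n', 2, 3)
```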

One thing we might try doing inside the "the AT-SPI implementation blew it" block in flat_review.py:getZonesFromText is detecting if the current character is a \n.  If it is, just increment offset by one and check again.  This might work.  It's in a sensitive enough part of the code, however, that such a change is not worth the risk for GNOME 2.18.
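The suggested workaround can be sketched like this. Everything here is hypothetical: looks_like_garbage stands in for whatever check getZonesFromText actually performs (here, simply "the returned line does not contain the requested offset"), and fake_get_text_at_offset stands in for text.getTextAtOffset(offset, LINE_START). Its answer for offset 2 is the bogus one from comment 15; its answer for offset 3 is invented for illustration.

```python
DOC_TEXT = "\ufffc\n\n\ufffc\ufffcText Formats\n\nThis sentence is bold.\ufffc"


def looks_like_garbage(offset, start, end):
    """Hypothetical sanity check: a line that does not contain the
    offset it was requested for cannot be right."""
    return not start <= offset < end


def get_line_defensively(text, offset, get_text_at_offset):
    """Ask for the line at offset. If the answer is garbage and the
    character at offset is a newline, step forward one character and
    ask again. Returns (string, start, end), or None if no usable
    answer was found."""
    while True:
        string, start, end = get_text_at_offset(offset)
        if not looks_like_garbage(offset, start, end):
            return string, start, end
        if text[offset] == "\n" and offset + 1 < len(text):
            offset += 1
        else:
            return None


def fake_get_text_at_offset(offset):
    """Hypothetical stand-in for getTextAtOffset(offset, LINE_START)."""
    answers = {
        2: ("$-3õ¼", 0, 1),                          # from comment 15
        3: ("\ufffc\ufffcText Formats\n", 3, 18),     # invented, sane line
    }
    return answers.get(offset, ("", 0, 0))           # anything else: garbage


print(get_line_defensively(DOC_TEXT, 2, fake_get_text_at_offset))
```

The loop only skips newlines, as proposed; any other garbage still comes back as None so the caller can decide what to do.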

Comment 17 Willie Walker 2007-06-08 19:22:40 UTC
I added the following code to flat_review.py:getZonesFromText, right after the "length = text.characterCount" line:

        if accessible.role == rolenames.ROLE_DOCUMENT_FRAME:
            for i in range(0, length):
                character = self.script.getText(accessible, i, i + 1)
                if character == self.script.EMBEDDED_OBJECT_CHARACTER:
                    character = "EMBEDDED_OBJECT_CHARACTER"
                elif character == "\n":
                    character = "\\n"
                print "%d. '%s'" % (i, character)
                [string, startOffset, endOffset] = text.getTextAtOffset(
                    i,
                    atspi.Accessibility.TEXT_BOUNDARY_LINE_START)
                print "  line(%d, %d) = '%s'" \
                      % (startOffset, endOffset, string)

For each character in the text for the document frame, the output tells us what the index of the character is, what the character itself is, and what Gecko thinks the line is for that character, including the start and end offset for the line.  

The main idea was to see what Gecko thought a line was for the troublesome page at http://bugzilla.gnome.org/attachment.cgi?id=83911.  Things start falling apart somewhere around character 19, and I think it is fair to declare this a Gecko implementation problem on the Mozilla side.  I'm going to give this a little rest before I file a Mozilla bug.  We *might* be able to work around this, but I'd rather they fix it.

0. 'EMBEDDED_OBJECT_CHARACTER'
  line(0, 2) = '
'
1. '\n'
  line(0, 2) = '
'
2. '\n'
  line(2, 3) = '
'
3. 'EMBEDDED_OBJECT_CHARACTER'
  line(3, 4) = ''
4. 'EMBEDDED_OBJECT_CHARACTER'
  line(5, 18) = 'Text Formats
'
5. 'T'
  line(5, 18) = 'Text Formats
'
6. 'e'
  line(5, 18) = 'Text Formats
'
7. 'x'
  line(5, 18) = 'Text Formats
'
8. 't'
  line(5, 18) = 'Text Formats
'
9. ' '
  line(5, 18) = 'Text Formats
'
10. 'F'
  line(5, 18) = 'Text Formats
'
11. 'o'
  line(5, 18) = 'Text Formats
'
12. 'r'
  line(5, 18) = 'Text Formats
'
13. 'm'
  line(5, 18) = 'Text Formats
'
14. 'a'
  line(5, 18) = 'Text Formats
'
15. 't'
  line(5, 18) = 'Text Formats
'
16. 's'
  line(5, 18) = 'Text Formats
'
17. '\n'
  line(5, 18) = 'Text Formats
'
18. '\n'
  line(18, 19) = '
'
19. 'T'
  line(18, 20) = '
T'
20. 'h'
  line(18, 19) = '
'
21. 'i'
  line(18, 19) = '
'
22. 's'
  line(18, 19) = '
'
23. ' '
  line(18, 19) = '
'
24. 's'
  line(18, 19) = '
'
25. 'e'
  line(18, 19) = '
'
26. 'n'
  line(18, 19) = '
'
27. 't'
  line(18, 19) = '
'
28. 'e'
  line(18, 19) = '
'
29. 'n'
  line(18, 19) = '
'
30. 'c'
  line(18, 19) = '
'
31. 'e'
  line(18, 19) = '
'
32. ' '
  line(18, 19) = '
'
33. 'i'
  line(18, 19) = '
'
34. 's'
  line(18, 19) = '
'
35. ' '
  line(18, 19) = '
'
36. 'b'
  line(18, 19) = '
'
37. 'o'
  line(18, 19) = '
'
38. 'l'
  line(18, 19) = '
'
39. 'd'
  line(18, 19) = '
'
40. '.'
  line(18, 19) = '
'
41. 'EMBEDDED_OBJECT_CHARACTER'
  line(41, 42) = ''
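The dump can also be checked mechanically. The sketch below transcribes the (start, end) pairs reported above for offsets 0-41 and flags any offset whose reported line either does not contain that offset or runs past a newline that should have ended it. It flags offset 4 (its line (5, 18) starts after it), offset 19 (its line (18, 20) spans the blank line plus the "T"), and offsets 20-40 (each reported as (18, 19)), consistent with things falling apart around character 19. The two checks are my own criteria, not Orca's.

```python
# Document frame text from comment 16, with U+FFFC for each \e.
DOC_TEXT = "\ufffc\n\n\ufffc\ufffcText Formats\n\nThis sentence is bold.\ufffc"

# (start, end) reported for each offset in the dump above.
REPORTED = (
    [(0, 2), (0, 2), (2, 3), (3, 4), (5, 18)]  # offsets 0-4
    + [(5, 18)] * 13                           # offsets 5-17
    + [(18, 19), (18, 20)]                     # offsets 18-19
    + [(18, 19)] * 21                          # offsets 20-40
    + [(41, 42)]                               # offset 41
)


def suspicious_offsets(text, ranges):
    """Offsets whose reported line does not contain the offset, or runs
    past a newline that should have ended it."""
    bad = []
    for offset, (start, end) in enumerate(ranges):
        contains_offset = start <= offset < end
        # A line may end with a newline but should not contain one earlier.
        crosses_line_break = "\n" in text[start:end - 1]
        if not contains_offset or crosses_line_break:
            bad.append(offset)
    return bad


print(suspicious_offsets(DOC_TEXT, REPORTED))
```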

Comment 18 Willie Walker 2007-06-12 01:46:24 UTC
Marking this as blocked due to Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=384101
Comment 19 Willie Walker 2008-01-05 23:37:52 UTC
Reducing to normal/minor since flat review in HTML content just isn't used all that much.
Comment 20 Joanmarie Diggs (IRC: joanie) 2010-04-03 20:23:32 UTC
Bulk reassigning Will's bugs to the default assignee. (Sorry for the spam!)
Comment 21 Joanmarie Diggs (IRC: joanie) 2012-04-08 17:07:33 UTC
I just tried this using Firefox 11. Orca presented "32 bugs found." to me when I used flat review.