GNOME Bugzilla – Bug 407941
[verified] Infer labels for objects in HTML content
Last modified: 2008-07-22 19:27:02 UTC
There are web accessibility guidelines for setting the relationship between labels and the things they are labelling in things such as HTML forms. We're discovering that web content developers do not follow these guidelines. For example, see the username/password areas on http://www.charter.net and https://www.google.com/accounts/Login. As a result of this, it's highly desirable for Orca to attempt to infer the label for things that seem like they should have a label, but the labelled_by property is not set.
I messed around with this today, and I think it is going to require some refactoring of how we get the displayed label for an object. Currently, scripts call util.getDisplayedLabel directly, and this depends upon application developers setting up accessible relations correctly. For Gecko, I think we want to do something a little different. I tried looking at things such as the accessible hierarchy of the pages in question to determine if the ancestral relationship yielded any common design pattern. They are unfortunately different. As a result, I think we may need to look at geographic proximity of objects if util.getDisplayedLabel fails to find anything. To do so, we could determine the set of objects on the same line as the object in question using getLineContentsAtOffset. We could then look at the objects immediately prior to or following the object in question and determine if they are possible labels. To accomplish this, I think we need to do the refactoring described in bug 395548. In particular, create default.py:getDisplayedLabel, which delegates to util.py. Change all scripts and {speech,braille}generators to call script.getDisplayedLabel instead of util.getDisplayedLabel. I started mucking with this, and the impact is rather large and risky for GNOME 2.18 (it touches 7 modules in not-so-subtle ways). In addition, I think I may have uncovered YAFFB where getting the same child via the same index from the same object in Firefox doesn't yield the same object for us to do equality comparisons on. The impact of that problem is that it hinders our ability to solve this bug.
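A rough sketch of the delegation described above, for discussion only. The class names, the dict-based stand-in for an accessible, and the "line"/"position" keys are all hypothetical; only getDisplayedLabel and the idea of looking at the neighbours on the same line come from the comment.

```python
class Util:
    """Stand-in for util.py: trusts only an explicit labelled-by relation."""

    @staticmethod
    def getDisplayedLabel(obj):
        # Hypothetical representation: obj is a dict, not a real accessible.
        return obj.get("labelled_by")


class DefaultScript:
    def getDisplayedLabel(self, obj):
        # The default script simply delegates to util.
        return Util.getDisplayedLabel(obj)


class GeckoScript(DefaultScript):
    def getDisplayedLabel(self, obj):
        # Prefer the real relation; guess only when it is missing.
        label = super().getDisplayedLabel(obj)
        if label:
            return label
        return self.guessTheLabel(obj)

    def guessTheLabel(self, obj):
        # Assumption: "line" holds the text of everything on the object's
        # line and "position" is the object's own index within that list.
        line = obj.get("line", [])
        pos = obj.get("position", 0)
        if pos > 0:
            return line[pos - 1]      # text immediately before the object
        if pos + 1 < len(line):
            return line[pos + 1]      # otherwise text immediately after it
        return None


unlabelled = {"labelled_by": None,
              "line": ["Username:", "<entry>"],
              "position": 1}
print(DefaultScript().getDisplayedLabel(unlabelled))
print(GeckoScript().getDisplayedLabel(unlabelled))
```

With this shape, speech and braille generators would only ever call script.getDisplayedLabel, so the guessing stays confined to the Gecko script.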
The search area on the left hand side of http://sfbay.craigslist.org/ is another example of an unlabelled text entry. It has the label above the text area.
At the Mozilla meetings at CSUN, we discussed the idea that Gecko should support the heuristics for this and that it should offer up a label-guess attribute of some form. Maybe the value could be a nested numeric path similar to what at-poke shows us (e.g., 0.5.4.2 means "starting at the html document frame, get child 0 and then child 5 of that child and child 4 of that child...").
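To make the nested numeric path idea concrete, here is a minimal sketch of how an AT might resolve a value such as 0.5.4.2. The attribute does not exist; the dict-based tree, node() and resolve_path() are invented purely for illustration.

```python
def node(name, *children):
    """Build a hypothetical tree node; not a real accessible."""
    return {"name": name, "children": list(children)}


def resolve_path(root, path):
    """Follow a dotted child-index path starting at the document frame.

    Returns None if any index along the way does not exist, which is what
    would happen if the page changed after the path was computed.
    """
    current = root
    for index in path.split("."):
        try:
            current = current["children"][int(index)]
        except IndexError:
            return None
    return current


frame = node("document frame",
             node("form",
                  node("text: Username"),
                  node("entry")))
print(resolve_path(frame, "0.0")["name"])
```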
See also Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=376481
If we are going to have to do some inference based on spatial relationships, it would be helpful to know the direction of the text in question. I haven't figured out where this gets exposed by Firefox. I believe other apps expose it as one of their text attributes. I'm also not convinced that Firefox is failing to expose it *somehow*: If you look at a page with a mixture of Arabic and English in Accerciser, Accerciser displays the Arabic text on the right-hand side and the caretOffset reflects a rtl order as you arrow within it. Any ideas how Accerciser knows? :-) Thanks!
Tom Brunet's comment on the Mozilla bug/RFE:

<start>
Here's a general idea of what Home Page Reader used as a heuristic. First, from the documentation that comes with the product:

HPR renders a label for each control by looking for one of the following pieces of information:
1. Surrounding LABEL element. LABEL element with for attribute value matching control element ID.
2. Text immediately following or preceding the control element in the same, previous, or next item.
3. Text in the previous, next, or above table cell.
4. title attribute.
---
That's a little vague for this purpose, so here's a little more detail.

For text inputs:
1. Text/img that precedes control in same item.
2. Text/img that follows control in same item (nothing between end of text and item)
3. Text/img that precedes control in previous item/cell (not another control in that item)
4. Text/img in cell above without other controls in cell above
Throw out #4 if other controls in this cell above

For checkboxes/radio buttons:
1. Text/img that follows control in same item.
2. Text/img that precedes control in same item (nothing between start of text and item)
3. Text/img that follows control in next item/cell (not another control in that item)
4. Text/img that precedes control in previous item/cell (not another control in that item)
5. Text/img in cell above without other controls in cell above
Throw out #5 if other controls in this cell above
---
An item boundary usually created a visual line break, but I don't think I can easily expand more on that. Also, you might want to consider labels for radio groups? This heuristic wasn't perfect, but worked quite well for us - at least for most situations that didn't completely ignore UI guidelines. Eyeballing it, you might run into trouble if a survey-like set of radio buttons had labels below each radio button. I hope this helps a little.
<end>

Thoughts?
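The two priority orders above could be expressed as data, roughly as sketched below. This is one reading of the HPR description and not HPR's code: the position names, guess_label(), and the controls_in_cell_above flag (an interpretation of the "throw out" rule) are all assumptions.

```python
# Candidate positions, in the order the heuristic tries them.
TEXT_INPUT_ORDER = [
    "preceding_same_item",
    "following_same_item",
    "preceding_previous_item",
    "cell_above",
]
TOGGLE_ORDER = [
    "following_same_item",
    "preceding_same_item",
    "following_next_item",
    "preceding_previous_item",
    "cell_above",
]
TOGGLE_ROLES = ("check box", "radio button")


def guess_label(role, candidates, controls_in_cell_above=False):
    """Return the first non-empty candidate text in priority order.

    candidates maps a position name to the text found there (if any).
    The cell-above candidate is skipped when other controls are involved,
    which is how the "throw out" rule is interpreted here.
    """
    order = TOGGLE_ORDER if role in TOGGLE_ROLES else TEXT_INPUT_ORDER
    for position in order:
        if position == "cell_above" and controls_in_cell_above:
            continue
        text = candidates.get(position)
        if text:
            return text.strip()
    return None


found = {"preceding_same_item": "Name:", "following_same_item": "(required)"}
print(guess_label("entry", found))
print(guess_label("check box", found))
```

The same surrounding text yields different guesses for an entry and a checkbox, which is the point of keeping two separate orders.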
I figured I'd start with entries. Given that we cannot count on the position in the hierarchy and instead have to look at things spatially, my plan was to use getLineContentsAtOffset() to get all the objects on the line. Then I would figure out which object was the entry. Once I knew where the entry was with respect to the line, I could use getExtents() to figure out where things were spatially in relation to the entry and then do the guessing based on that. The trick seems to be figuring out where the entry is with respect to the line.

Test case: In simpleform.html from orca/test/html, I typed "hello world" in the entry that immediately follows "Type something in here:". Then I did the following three times in a row:
1. Clicked on "Type something in here"
2. Pressed Tab

Below are the results. It seems that in any given cycle we have three accessibles representing the entry:
1. An accessible representing the object as our locusOfFocus (i.e. the object for which we're trying to guess a label)
2. An accessible from the perspective of getLineContentsAtOffset()
3. An accessible from the perspective of the paragraph that contains the entry

----------------
locusOfFocus: <orca.atspi.Accessible instance at 0x87a03ec>
entry, text: hello world
line: [[<orca.atspi.Accessible instance at 0x8630e4c>, 0, 28], [<orca.atspi.Accessible instance at 0x87a064c>, 0, 11]]
Item 1: <orca.atspi.Accessible instance at 0x8630e4c>
paragraph, text: Type something here:
child: <orca.atspi.Accessible instance at 0x8630f4c>
entry, text: hello world
Item 2: <orca.atspi.Accessible instance at 0x87a064c>
entry, text: hello world

locusOfFocus: <orca.atspi.Accessible instance at 0x8630eac>
entry, text: hello world
line: [[<orca.atspi.Accessible instance at 0x8630aec>, 0, 28], [<orca.atspi.Accessible instance at 0x8630d8c>, 0, 11]]
Item 1: <orca.atspi.Accessible instance at 0x8630aec>
paragraph, text: Type something here:
child: <orca.atspi.Accessible instance at 0x87a012c>
entry, text: hello world
Item 2: <orca.atspi.Accessible instance at 0x8630d8c>
entry, text: hello world

locusOfFocus: <orca.atspi.Accessible instance at 0x87a03cc>
entry, text: hello world
line: [[<orca.atspi.Accessible instance at 0x87a03ec>, 0, 28], [<orca.atspi.Accessible instance at 0x87a016c>, 0, 11]]
Item 1: <orca.atspi.Accessible instance at 0x87a03ec>
paragraph, text: Type something here:
child: <orca.atspi.Accessible instance at 0x8630a8c>
entry text: hello world
Item 2: <orca.atspi.Accessible instance at 0x87a016c>
entry, text: hello world
----------------

Am I missing something? I would have thought that we'd find the accessible that represents the entry in what is returned by getLineContentsAtOffset() -- either directly or as the child of the paragraph that contains the entry....

Insight and/or suggestions on how to proceed and/or snarky comments welcome. :-) In the meantime, I'm going to walk away from this for a bit in the hopes that a new, more reliable plan reveals itself. Thanks!
It appears that, despite a bit of multiple personality disorder, the entries are sufficiently in touch with reality to know that they occupy the same physical space. Therefore I can use extents to compare objects. I still want to know why they are claiming to be three different accessibles, however. Will, any guesses?
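For the record, the extents comparison amounts to something like the sketch below. FakeAccessible and Extents are stand-ins invented for the example (real accessibles come from AT-SPI); the [object, startOffset, endOffset] triples mirror the line contents shown in the dump above.

```python
from collections import namedtuple

# Hypothetical stand-ins; real extents would come from the accessible itself.
Extents = namedtuple("Extents", "x y width height")


class FakeAccessible:
    def __init__(self, role, extents):
        self.role = role
        self.extents = extents


def same_object(a, b):
    """Treat two accessibles as the same widget if role and extents match."""
    return a.role == b.role and a.extents == b.extents


def find_in_line(target, line_contents):
    """Return the index of target within [obj, start, end] triples, or -1."""
    for index, (obj, _start, _end) in enumerate(line_contents):
        if same_object(obj, target):
            return index
    return -1


box = Extents(120, 40, 200, 24)
focus = FakeAccessible("entry", box)            # the locusOfFocus copy
on_line = FakeAccessible("entry", box)          # the copy from the line
paragraph = FakeAccessible("paragraph", Extents(0, 40, 330, 24))
line = [[paragraph, 0, 28], [on_line, 0, 11]]

print(focus is on_line)             # distinct Python objects
print(find_in_line(focus, line))    # yet found by position on screen
```

Comparing on role as well as extents is a precaution I added for the sketch, in case a container happens to report the same rectangle as its only child.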
Created attachment 87675 [details] test case with various and sundry entries whose labels need to be guessed This is a sample with entries with text on the left, right, above, beneath, in table cells, etc. etc. Some of the functional labels include links in order to verify that we are handling EMBEDDED_OBJECT_CHARACTERS correctly.
Created attachment 87676 [details] [review]
first crack at guessing labels for entries - Not fit for human consumption :-)

As I stated in the description, this patch is not quite fit for human consumption. :-) It's more of a proof-of-concept.

Will: As I had emailed you yesterday, it seems like being able to use getDisplayedText() with text that is made up of EMBEDDED_OBJECT_CHARACTERs would be handy. And you had a comment in that method indicating it needed doing, so I did it. :-) The good news is that it seems to work reasonably well. The exception is that Gecko.getSpeechContext() now chats up a storm. :-( My best guess is that we were avoiding some chattiness not by explicit design but rather as a side effect of not handling text that contained EOCs in getDisplayedText(). But I could easily be missing something. If so, please let me know what that something is. :-)

In the meantime, in order to facilitate your testing the label guessing functionality, I solved this problem temporarily by having getSpeechContext() always return an empty list until we can discuss what really should be done. As an aside, having getSpeechContext() return an empty list doesn't seem to have any noticeable impact on what Orca says or doesn't say.

Anyhoo, this patch was tested successfully with the following:
1. The test case I just attached
2. Gmail - compose new message (basic HTML version)
3. Google's advanced search page (with one exception)
4. Bugzilla's enter a new bug form

Known issues:
1. On Google's advanced search page, we're not picking up the "label" on the very first entry; the rest of the entries seem fine. I suspect when I look at this after a break, the reason why will be obvious to me.
2. On bugzilla, with existing bugs, tabbing to the Additional Comments entry causes a traceback. We're getting a negative Y extent, plus there are several hidden objects and a bunch of EOCs that aren't even showing up in Accerciser.... I'm not convinced the solution here will be as obvious, but I'll look at it as well after a break.

Please let me know what you think. Thanks!
TODO: It looks like the guesswork I did to identify a grid of fields should not just take into account the field immediately above the current field, but instead consider all of the fields up to the top of the column. For example, see the "re-enter email address" entry on the monster.com create account page (http://my.monster.com/Account/Account.aspx). We're speaking "Use 4 to 20 letters and/or numbers" because the entry immediately above us is the same size.
> I still want to know why they are claiming to be three different accessibles,
> however. Will, any guesses?

This is usually the result of one of the following:

1) Something somewhere on the AT side did not ref() the CORBA accessible. When this happens, the thing on the other side (e.g., Gecko) can feel free to garbage collect and then give us different objects for the same object over different calls. This doesn't seem to be the case.

2) The thing on the other side has a bug where it doesn't keep track of the object it gave us, and thus chooses to just keep giving us a different object for the same child. You might be able to verify this by repeatedly calling getChildAtIndex on the thing holding the text entry to see if it keeps giving us a different child.
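The check suggested in 2) could look roughly like this. StableParent and LeakyParent are fakes written for the example, and Python object identity stands in for whatever equality test the AT actually applies to remote accessibles; only the getChildAtIndex name comes from the comment above.

```python
class StableParent:
    """Fake container that hands back the same child object every time."""

    def __init__(self):
        self._child = object()

    def getChildAtIndex(self, index):
        return self._child


class LeakyParent:
    """Fake container that creates a fresh object on every request."""

    def getChildAtIndex(self, index):
        return object()


def child_is_stable(parent, index, attempts=5):
    """Ask for the same child several times and compare the results."""
    first = parent.getChildAtIndex(index)
    return all(parent.getChildAtIndex(index) is first
               for _ in range(attempts - 1))


print(child_is_stable(StableParent(), 0))
print(child_is_stable(LeakyParent(), 0))
```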
> Will: As I had emailed you yesterday, it seems like being able to use
> getDisplayedText() with text that is made up of EMBEDDED_OBJECT_CHARACTERs
> would be handy.

Ha! Rather than abusing my mind yesterday, I was abusing my body on tough hills all over Hollis, Brookline, Mason, Greenfield, Temple, Wilton, etc. 100K+ miles of tough riding with a couple guys in my racing category and a couple more in the next racing category above me. They == strong. Me == not as strong.

> And you had a comment in that method indicating it needed
> doing, so I did it. :-) The good news is that it seems to work reasonably
> well. The exception is that Gecko.getSpeechContext() now chats up a storm.
> :-( My best guess is that we were avoiding some chattiness not by explicit
> design but rather as a side effect of not handling text that contained EOCs in
> getDisplayedText(). But I could easily be missing something. If so, please
> let me know what that something is. :-)

The main problem is that the speech context code does not expect a parent to defer to its children for more text. The presence of EMBEDDED_OBJECT_CHARACTERs (EOCs) throws a monkey wrench into this. A thing to think about is whether we want to just toss out speech context in specific situations (e.g., places where we are obviously not in a typical WIMP hierarchy).

> Anyhoo, this patch was tested successfully with the following:

Cool!
Created attachment 87768 [details] slightly revised test case
Created attachment 87769 [details] [review]
cleaner, better, and now with combo box support! :-)

This version is actually worth testing I think. There are still known issues, plenty of work to do, etc., but the functionality of our heuristic approach to label guessing is starting to shape up -- at least with respect to entries and combo boxes. :-)

Improvements over the last version:
* All of the entry known issues I mentioned before seem to be resolved
* Added a check for text below the line (but that is not in a table cell)
* Added basic support for guessing combo box labels
* Separated things out into smaller methods so that guessTheLabel() is no longer a single, monstrous blob of code. :-)
* No longer completely nuking the functionality of getSpeechContext() -- only nuking it if in the document frame. :-) (Will gave me lots of good things to think about -- I need to think about them.)

New Known Issues:
* If you use form field structural navigation to move to a multi-line entry, we're not guessing the labels. This is due to one of my go{Next, Previous}FormField() hacks. Once Aaron's relevant patches are in the build, that should be a non-issue. Hopefully that will be fixable tomorrow. In the meantime, to test guessing functionality w.r.t. multi-line entries, use Tab/Shift+Tab to move focus there.
* Every once in a while there is a combo box that we're not guessing the label for; instead we're guessing an item inside the combo box, but not the currently selected one. Examples: On the Google advanced search page, the "10 results" combo box; in bugzilla, the "resolution" combo box. I suspect this is a no-brainer, but it's after 2AM and *I'm* now a no-brainer. :-) I'll hopefully fix this issue tomorrow as well.

Question for Mike: If you look at the attachment table in bugzilla, there is a combo box for the status of each attachment. Currently, we're grabbing just the cell on its immediate left and not the whole line. What should we be saying there? Keep in mind that this is all heuristics, so it's not just "it should read the whole line" but "it should read the whole line because the text is all squished together" (in which case, we need to define what constitutes "squished" <grin>).

Nap time. :-)
(In reply to comment #15)
> Created an attachment (id=87769)
> cleaner, better, and now with combo box support! :-)

Way cool. Definitely looks cleaner and better. Mike, please test this!
This seems to work well on many combo boxes; however, the first problem I've noticed is the following: 1. Open this bug to comment on it. 2. Tab or Insert+Tab to the first two combo boxes after the save button. You'll notice that the first two combo boxes after the save button have the size of the patch for their label. I'll keep testing of course, but this is just my first finding.
Thanks for testing, Mike. There are definitely kinks that need to be worked out, so please keep reporting them as you find them.

> You'll notice that the first two combo boxes after the save button have the
> size of the patch for their label.

Yup, this is what I was referring to in comment #15 when I said:

> Question for Mike:
> If you look at the attachment table in bugzilla, there is a combo box for the
> status of each attachment. Currently, we're grabbing just the cell on its
> immediate left and not the whole line. What should we be saying there?
Mike: In addition to the above combo box situation, I need guidance on what you would like me to do about checkboxes and radio buttons. We, of course, want to guess their labels. But beyond that, what context, if any, should we attempt to provide as part of the guess? For instance, if the user presses Tab or Orca_Modifier+Tab and lands on a checkbox called "Red", we'll say "Red" as the label. But if there's a group of checkboxes (like the addition of a "Green" checkbox and "Blue" checkbox), there's probably some contextual text somewhere like "What is your favorite color?" or "What color should your towel NOT be if you want to avoid being impaled by a bull?" Do we go looking for that text as well as part of the label? If so, where do we go looking for it? These sorts of things are not addressed in the approach described by Tom Brunet; what he describes is strictly for identifying the "Red" bit. Thanks!
I really don't think we will be able to do anything extra here without getting a whole lot of false positives. I think at this point you should just stick with the plan of speaking the label for each button or check box. If you come up with any really clever way in the future of inferring group information, we can revisit this at that time. After our phone call earlier, I also don't think you should try to get any more information for the combo boxes I talked about in comment 17. I think your current solution will get the labels for the vast majority of combo boxes that users will encounter without extra verbosity.
> I really don't think we will be able to do anything extra here without getting
> a whole lot of false positives.

I tend to agree. The solution here is working around a problem that should be addressed at the content provider level. I hate giving them a free ride when there are specifications in place that say how to create accessible content. I do realize there will always be lunkhead content generators that create inaccessible content, though, so some level of guessing is good. One question on the table is whether what Joanie currently has done is at a sufficient level. Mike, Joanie, if you think it is, is this bug ready for closure?
I still have to do lists, radio buttons, and checkboxes as well as the little check you and I talked about to see if we're in an entry or password_text in getUtterancesFromContents(). Shouldn't take very long to do these things, but they're not done yet. I'll look at them a bit later today. I think we can move the target for this to 2.19.3.
Created attachment 89166 [details] [review] latest version: adds lists, radio buttons, checkboxes, plus other cases/conditions This adds lists, radio buttons, and checkboxes into the mix; cleans up some bugs; addresses more cases for entries and combo boxes; and probably does some other stuff that I will remember after the nap I'm about to take. :-) This is not the final version; what remains is dealing with getSpeechContext() which I am (still) currently bypassing for document content in order to isolate the guessing logic and properly expand EMBEDDED_OBJECT_CHARACTERS. Will, I think I'm going to need your help with that -- at least to brainstorm about what exactly we want to do. In the meantime, Mike please test this patch for its guessing accuracy. Thanks guys!!
(In reply to comment #23)
> Created an attachment (id=89166)
> latest version: adds lists, radio buttons, checkboxes, plus other
> cases/conditions

Wow! It'll take some time to review this, but a quick scan indicates it's pretty cool. :-)

> This is not the final version; what remains is dealing with getSpeechContext()
> which I am (still) currently bypassing for document content in order to isolate
> the guessing logic and properly expand EMBEDDED_OBJECT_CHARACTERS. Will, I
> think I'm going to need your help with that -- at least to brainstorm about
> what exactly we want to do.

You bet. Bypassing things with EOC's is still probably a fair thing to do -- one problem with including them is that you may end up digging back into the object you came from, so you'll duplicate text. Another problem is that you might end up bubbling up to something like the document frame and you could end up speaking the entire document (I think). Give me a ping on IRC/AOL or call me, though, and we can brainstorm.
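Both pitfalls mentioned here (digging back into the object you came from, and expanding far more than intended) can be illustrated with a small sketch. This is not the patch: Node, expand_eocs(), the skip parameter and the depth limit are invented for the example, and it assumes the Nth embedded object character in a text corresponds to the Nth child, which is a simplification.

```python
EMBEDDED_OBJECT_CHARACTER = "\ufffc"


class Node:
    """Hypothetical stand-in for an accessible with text and children."""

    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)


def expand_eocs(node, skip=None, max_depth=5):
    """Replace each EOC in node.text with the expanded text of its child.

    skip is an object to leave out (for example, the object we came from),
    and max_depth bounds how far down the expansion may go.
    """
    if max_depth < 0:
        return ""
    pieces = []
    children = iter(node.children)
    for char in node.text:
        if char != EMBEDDED_OBJECT_CHARACTER:
            pieces.append(char)
            continue
        child = next(children, None)
        if child is None or child is skip:
            continue  # nothing to expand, or deliberately left out
        pieces.append(expand_eocs(child, skip, max_depth - 1))
    return "".join(pieces)


link = Node("Orca")
entry = Node("hello world")
paragraph = Node("See \ufffc: \ufffc", [link, entry])

print(expand_eocs(paragraph))
print(expand_eocs(paragraph, skip=entry))
```

Passing the focused entry as skip keeps its contents from being repeated when the surrounding paragraph is expanded.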
Created attachment 89190 [details] [review] very minor change; some typo corrections
I'm currently on the phone with Joanie and I've pointed out some small edit field problems with the following pages: live.gnome.org and a personalized www.google.com page. Braille is also quite messed up, and Joanie is aware of this.
Something dawned on me around the same time it seemed to be witnessed by Mike with whom I just spoke: Expanding EOC's in getDisplayedText() f's mightily with braille. :-( So the question is, do we adjust all of our calls in Gecko.py -- I think we can do so by passing in the endOffset of the current EOC to getDisplayedText(). -- or do we not expand the text there and instead do something clever somewhere else?
Created attachment 89199 [details] [review] Be more conservative when guessing If it's not in the form, don't guess it as a label. The flip side of being conservative in this way is that there will be occasions in which the "label" is contained in some object (section, paragraph, perhaps a table cell) and that object in turn contains the form. Handling that scenario *accurately* will need some thought. In the meantime Mike, please give this a spin -- from a speech perspective. :-) Thanks!!
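The "if it's not in the form, don't guess it" rule might be sketched as an ancestor check like the one below. Node, enclosing_form() and is_plausible_label() are hypothetical names for illustration; the real patch may well do this differently.

```python
class Node:
    """Hypothetical accessible with a role and a parent pointer."""

    def __init__(self, role, parent=None):
        self.role = role
        self.parent = parent


def enclosing_form(obj):
    """Walk up the ancestry and return the nearest form, or None."""
    while obj is not None:
        if obj.role == "form":
            return obj
        obj = obj.parent
    return None


def is_plausible_label(candidate, field):
    """Accept a candidate only if it lives in the same form as the field."""
    form = enclosing_form(field)
    return form is not None and enclosing_form(candidate) is form


document = Node("document frame")
form = Node("form", document)
field = Node("entry", form)
inside = Node("text", form)
outside = Node("paragraph", document)

print(is_plausible_label(inside, field))
print(is_plausible_label(outside, field))
```

The flip side described above shows up directly: text that sits in a container wrapping the form is rejected, even when it is the intended label.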
Mike, don't bother testing the most recent patch. I chatted with Will and a new solution that should make everything (including braille) good will be forthcoming. Stay tuned!
Created attachment 89210 [details] [review] latest version: Don't mess with getDisplayedText() I am now leaving getDisplayedText() alone and expanding EOCs in Gecko.py. getSpeechContext() is back to doing its thing, braille is (theoretically) untouched by this patch. Mike please test. Thanks!
> Created an attachment (id=89210)
> latest version: Don't mess with getDisplayedText()

Cool. Looking good! I'm just marking this as 'reviewed' mostly because I want to wait to see what Mike's testing shows. If his testing is positive, assume I said "accepted commit now!" :-)
This is really looking good. I'm no longer getting the false positives I was. I'm for checking it in and letting the community work with it.
oops, the patch is trying to guess labels when we are moving by words within an entry. :-( If you are writing a comment and move around with Left, Right, Up, or Down, we're good; try Control Left or Right and we guess the label.... There's your answer to what's up with word navigation Mike. Let me see what I can figure out....
Created attachment 89219 [details] [review] modify sayWord() to handle entries itself As part of the new label guess functionality, getUtterancesFromContents() relies more on the speechgenerators rather than on getText(). sayWord() was calling getUtterancesFromContents() so we were doing all sorts of things that we shouldn't be doing during a sayWord() such as guessing the label, speaking the line at the caret, etc. I tried having sayWord() call default.sayWord() when in an entry as we are doing with sayLine(). Worked going forward; not backward. Now sayWord() handles entries itself. So much for having a hack-free patch. ;-) Mike, please test. (label guess logic, speech, and braille should be unchanged; navigation by word is all that is different) Thanks!
I've just re-tested this patch and all now seems well with navigating by control+left and right. Label guess still also seems to be working fine.
Thanks Mike. Will, do you have any issues with my revised sayWord()? Ways I could have done it better, etc.?
> Will, do you have any issues with my revised sayWord()? Ways I could have done
> it better, etc.?

Looks good to me! :-) Go ahead and commit! :-)
> Looks good to me! :-) Go ahead and commit! :-)

Woo hoo!! I mean, "thanks". ;-) Patch committed. Very gratefully moving this to [pending]. :-)
From the usability side of things this is going to be our nicest release in the 2.20 cycle.
> From the usability side of things this is going to be our nicest release in the
> 2.20 cycle.

Thanks Mike!! From the development side, all I can say is that I'm fixin' to get me a bottle of champagne. ;-) Although I must say that Rich's addition of GUI support for app-specific and app-unique settings is just as nice. Anyhoo, since Mike has verified this one, I'm closing it as FIXED.