GNOME Bugzilla – Bug 520656
The regression test harness should be capable of handling alternative expected results
Last modified: 2008-03-24 16:47:01 UTC
One of the difficulties with automated regression tests is that different platforms, and different versions of software within each platform, can each result in different (but equally valid/correct) output from Orca. We should try to minimize these differences by creating tests that are not platform and/or environment dependent, as well as by maintaining test environments that are as close as possible to a specified configuration. Sadly, these measures still won't eliminate all differences. :-( See, for example, http://bugzilla.gnome.org/show_bug.cgi?id=519271#c6 as well as the following comment. If the regression test harness were capable of handling alternative expected results, we might be able to better address these remaining differences.
We could have output along these lines:

Test 1 of 7 FAILED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:Layered pane focus
EXPECTED:
     "BUG? - should something be presented here?",
ACTUAL:
     "",
[FAILURE WAS EXPECTED - LOOK FOR BUG? IN EXPECTED RESULTS]

Test 2 of 7 FAILED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:Layered pane Where Am I
EXPECTED:
     "BUG? - should we present the number of items in the layered pane?",
     "BRAILLE LINE: 'gtk-demo Application GtkIconView demo Frame ScrollPane LayeredPane'",
     " VISIBLE: 'LayeredPane', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'layered pane'",
ACTUAL:
     "BRAILLE LINE: 'gtk-demo Application GtkIconView demo Frame ScrollPane LayeredPane'",
     " VISIBLE: 'LayeredPane', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'layered pane'",
[FAILURE WAS EXPECTED - LOOK FOR BUG? IN EXPECTED RESULTS]

Test 4 of 7 FAILED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:bin icon Where Am I
EXPECTED:
     "BRAILLE LINE: 'gtk-demo Application GtkIconView demo Frame ScrollPane LayeredPane bin Icon'",
     " VISIBLE: 'bin Icon', cursor=1",
     "SPEECH OUTPUT: 'Icon panel'",
     "SPEECH OUTPUT: 'foobar'",
     "SPEECH OUTPUT: '1 of 24 items selected'",
     "SPEECH OUTPUT: 'on item 1 of 24'",
ALTERNATIVELY:
     "BRAILLE LINE: 'gtk-demo Application GtkIconView demo Frame ScrollPane LayeredPane bin Icon'",
     " VISIBLE: 'bin Icon', cursor=1",
     "SPEECH OUTPUT: 'Icon panel'",
     "SPEECH OUTPUT: 'bin'",
     "SPEECH OUTPUT: '1 of 23 items selected'",
     "SPEECH OUTPUT: 'on item 1 of 23'",
ACTUAL:
     "BRAILLE LINE: 'gtk-demo Application GtkIconView demo Frame ScrollPane LayeredPane bin Icon'",
     " VISIBLE: 'bin Icon', cursor=1",
     "SPEECH OUTPUT: 'Icon panel'",
     "SPEECH OUTPUT: 'bin'",
     "SPEECH OUTPUT: '1 of 24 items selected'",
     "SPEECH OUTPUT: 'on item 1 of 24'",
[FAILURE WAS UNEXPECTED]

Test 3 of 7 SUCCEEDED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:bin icon
Test 5 of 7 SUCCEEDED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:boot icon
Test 6 of 7 SUCCEEDED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:icon selection
Test 7 of 7 SUCCEEDED: /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py:icon selection Where Am I

SUMMARY: 4 SUCCEEDED and 3 FAILED (1 UNEXPECTED) of 7 for /home/jd/orca/test/keystrokes/gtk-demo/role_icon.py

:-)
Created attachment 106657 [details] [review]
revision 1 - probably needs work

This patch consists of some minor changes to utils.py to support alternative expected results. The change should work with our existing tests because it takes the expected results and turns them into a list of lists (if it's not already a list of lists). As part of the "proof of concept", I modified the role_icon.py test as follows:

* Test 4 of 7 has two possible alternative results, both of which were designed to fail on my machine. As a result, you get the EXPECTED: "foo" ALTERNATIVELY: "bar" ACTUAL: "oh crap" output seen in my previous comment.
* Test 7 of 7 has two alternatives, the first of which succeeds, hence: Test 7 of 7 SUCCEEDED: /home/jd/orca/test/yadda/yadda/yadda.py
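The "list of lists" idea can be sketched roughly like this. Note this is a minimal illustration, not the actual utils.py code; the function names (normalize_expected, results_match) are hypothetical:

```python
# Sketch of the patch's idea: normalize the expected results so a plain
# list of lines becomes a single alternative, then accept the actual
# output if it matches ANY of the alternatives.

def normalize_expected(expected):
    """Wrap a plain list of lines into a list containing one alternative."""
    if expected and isinstance(expected[0], list):
        return expected          # already a list of alternatives
    return [expected]            # single alternative

def results_match(expected, actual):
    """Return True if the actual lines match any expected alternative."""
    return any(alt == actual for alt in normalize_expected(expected))

# Usage: two acceptable outputs for the same test step.
expected = [
    ["SPEECH OUTPUT: '1 of 24 items selected'"],
    ["SPEECH OUTPUT: '1 of 23 items selected'"],
]
actual = ["SPEECH OUTPUT: '1 of 23 items selected'"]
print(results_match(expected, actual))  # → True
```

Existing tests that pass a flat list of lines keep working because the flat list is wrapped into a one-element list of alternatives.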
So guys, whatchya think?
I think we need something like this. There will always be differences between Solaris and "Linux", and if we want a reliable set of regression tests (instead of just punting with a "KNOWN ISSUE" solution), then we have to devise an approach that handles that. My only other thought was yet another way of doing this: a dictionary where the results are stored under keys derived from running "uname". Something like:

{'solaris': ['Line 1', 'Line 2', 'Line 3'],
 'Linux':   ['Line A', 'Line B', 'Line C']}

That gives us a tighter matchup of what's expected on each platform.
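That dictionary idea might look something like the sketch below. The key names and helper function are purely illustrative (Python's platform.system() reports what uname does, e.g. 'Linux' or 'SunOS' on Solaris); this is not Orca's actual test API:

```python
import platform

# Hypothetical sketch of the uname-keyed dictionary idea: keep one
# expected-results list per platform and select the right one at runtime.
EXPECTED = {
    'SunOS': ['Line 1', 'Line 2', 'Line 3'],   # what uname reports on Solaris
    'Linux': ['Line A', 'Line B', 'Line C'],
}

def expected_for_platform(results, system=None):
    """Pick the expected results for the given (or current) OS."""
    if system is None:
        system = platform.system()   # e.g. 'Linux' or 'SunOS'
    try:
        return results[system]
    except KeyError:
        raise KeyError("no expected results recorded for %r" % system)
```

A downside of this scheme is that every platform-dependent test needs an entry per platform, whereas the alternatives approach lets one list serve all platforms that happen to agree.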
IMO, this is an interesting idea, but it should be used only as a last resort. Ideally, GNOME should behave like GNOME no matter where it runs. If the tests encounter differences that are the result of the underlying platform being exposed, we should try to avoid those issues. In most cases, I think we can. For example, depending upon the contents of "/" remaining constant is just a really bad idea. My apologies for creating it. We need a better/different test, not a crutch. In addition, if there are differences that are just plain unexplainable (e.g., extra spaces in some tests on Solaris), we really should try to dig to the bottom of those issues. They could be symptoms of underlying toolkit/AT-SPI bugs and not something we should hide. Before applying this practice, I think we should try to better evaluate and fix our current issues via other means if we can.
Created attachment 107620 [details] [review]
Patch to treat expected results as regular expressions

Here's a patch to treat the expected results as a list of regular expressions. Also included is a gtk-demo/debug_commands.py modification to show how it might be used. I still think this should only be used as a last resort. That is, we should only use it for cases where we understand why and accept that there will be differences. It should not be used to hide unexplained problems.
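The regular-expression approach can be sketched like this (a hypothetical helper, not the committed utils.py code): each expected line is a pattern that must match the corresponding actual line in full, so platform-dependent fragments can be loosened with patterns while everything else stays exact.

```python
import re

# Hypothetical sketch of the regular-expression approach: each expected
# line is a regex that must fully match the corresponding actual line.

def output_matches(expected_patterns, actual_lines):
    """True if every actual line fully matches its expected pattern."""
    if len(expected_patterns) != len(actual_lines):
        return False
    return all(re.fullmatch(pattern, line)
               for pattern, line in zip(expected_patterns, actual_lines))

# Usage: accept either '23' or '24' items, e.g. across platforms.
expected = [r"SPEECH OUTPUT: '1 of 2[34] items selected'"]
print(output_matches(expected, ["SPEECH OUTPUT: '1 of 23 items selected'"]))  # → True
```

One caveat with this scheme is that regex metacharacters occurring literally in expected output (parentheses, brackets, dots) must be escaped, which is part of why it's best reserved for the cases where the looseness is actually wanted.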
Committed the regular expression patch, with some additional documentation. Closing as FIXED.