After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 725003 - Accessible text at offset for sentence boundaries is incorrect for documents with wrapped text
Accessible text at offset for sentence boundaries is incorrect for documents ...
Status: RESOLVED FIXED
Product: evince
Classification: Core
Component: PDF
3.10.x
Other Linux
: Normal normal
: ---
Assigned To: Joanmarie Diggs (IRC: joanie)
Evince Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-02-23 15:07 UTC by Joanmarie Diggs (IRC: joanie)
Modified: 2014-03-17 12:33 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test case (13.83 KB, application/pdf)
2014-02-23 15:07 UTC, Joanmarie Diggs (IRC: joanie)
  Details
accessible-event listener (523 bytes, text/x-python)
2014-02-23 15:08 UTC, Joanmarie Diggs (IRC: joanie)
  Details
Stop using GailTextUtil (3.71 KB, patch)
2014-02-25 18:40 UTC, Joanmarie Diggs (IRC: joanie)
accepted-commit_now Details | Review
Shorter debug.out file (128.06 KB, application/octet-stream)
2014-02-27 06:23 UTC, Hammer Attila
  Details
Solution - Take 1 (9.99 KB, patch)
2014-03-03 18:28 UTC, Joanmarie Diggs (IRC: joanie)
reviewed Details | Review
Stop using GtkTextBuffer (14.52 KB, patch)
2014-03-14 08:03 UTC, Joanmarie Diggs (IRC: joanie)
none Details | Review
WIP - Not yet ready for formal review (15.28 KB, patch)
2014-03-14 15:07 UTC, Joanmarie Diggs (IRC: joanie)
none Details | Review
proposed patch (18.80 KB, patch)
2014-03-14 21:31 UTC, Joanmarie Diggs (IRC: joanie)
committed Details | Review

Description Joanmarie Diggs (IRC: joanie) 2014-02-23 15:07:48 UTC
Created attachment 270054 [details]
test case

Steps to reproduce:
1. Load the attached test case in Evince and enable caret navigation (F7)
2. Load the attached accessible-event listener in a terminal
3. Place the caret in the text of the document

Expected results: The sentences would not be split up by newline characters.

Actual results: The sentences are split up by newline characters as shown below.

Sentence at offset 0	<This is a document which was created in LibreOffice\n>
Sentence at offset 52	<Writer. >
Sentence at offset 60	<The text is in a single paragraph with no newline\n>
Sentence at offset 110	<characters added to it. >
Sentence at offset 134	<It is also in large print so that I do\n>
Sentence at offset 173	<not have to type a lot to get text to wrap.>
Comment 1 Joanmarie Diggs (IRC: joanie) 2014-02-23 15:08:41 UTC
Created attachment 270055 [details]
accessible-event listener
Comment 2 Joanmarie Diggs (IRC: joanie) 2014-02-25 18:40:51 UTC
Created attachment 270303 [details] [review]
Stop using GailTextUtil

This doesn't fix the bug. But fixing the bug requires we stop using GailTextUtil so we can handle the Evince-specific case of there being newline chars inserted in the text where we (accessibility) would not normally expect to find newline chars.
Comment 3 Hammer Attila 2014-02-27 06:13:08 UTC
This is interesting with you wrote the first issue related.
If I launch Pidgin my Ubuntu 14.04 system (I using GNOME Shell classic session 3.10 version), I hear only the add button related informations.
I using at-spi2-core 2.10.2 version.
What other components version number is important this problem related?
I generated a shorter debug.out file with concentrate only the first issue, now I used hungarian locale.
I will attaching the short debug.out file.

Attila
Comment 4 Hammer Attila 2014-02-27 06:23:59 UTC
Created attachment 270447 [details]
Shorter debug.out file

What happening my debug.out file when I launched Pidgin?
A possible interesting part:
DEQUEUED OBJECT:PROPERTY-CHANGE:ACCESSIBLE-NAME  <----------

vvvvv PROCESS OBJECT EVENT object:property-change:accessible-name vvvvv
OBJECT EVENT: object:property-change:accessible-name   detail=(0,0,)
    app.name='gnome-shell' name='None' role='label' state='' relations=''
Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script)
TOTAL PROCESSING TIME: 0.0020
^^^^^ PROCESS OBJECT EVENT object:property-change:accessible-name ^^^^^

DEQUEUED OBJECT:PROPERTY-CHANGE:ACCESSIBLE-NAME  <----------

vvvvv PROCESS OBJECT EVENT object:property-change:accessible-name vvvvv
OBJECT EVENT: object:property-change:accessible-name   detail=(0,0,)
    app.name='gnome-shell' name='None' role='label' state='' relations=''
Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script)
TOTAL PROCESSING TIME: 0.0021
^^^^^ PROCESS OBJECT EVENT object:property-change:accessible-name ^^^^^

DEQUEUED OBJECT:PROPERTY-CHANGE:ACCESSIBLE-NAME  <----------

vvvvv PROCESS OBJECT EVENT object:property-change:accessible-name vvvvv
OBJECT EVENT: object:property-change:accessible-name   detail=(0,0,)
    app.name='gnome-shell' name='None' role='label' state='' relations=''
Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script)
TOTAL PROCESSING TIME: 0.0021
^^^^^ PROCESS OBJECT EVENT object:property-change:accessible-name ^^^^^

DEQUEUED OBJECT:STATE-CHANGED:SHOWING  <----------

vvvvv PROCESS OBJECT EVENT object:state-changed:showing vvvvv
OBJECT EVENT: object:state-changed:showing             detail=(1,0,0)
    app.name='gnome-shell' name='None' role='panel' state='enabled focusable sensitive showing visible' relations=''
Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script)
TOTAL PROCESSING TIME: 0.0017
^^^^^ PROCESS OBJECT EVENT object:state-changed:showing ^^^^^

DEQUEUED OBJECT:STATE-CHANGED:ACTIVE  <----------

vvvvv PROCESS OBJECT EVENT object:state-changed:active vvvvv
OBJECT EVENT: object:state-changed:active              detail=(0,0,0)
    app.name='Pidgin' name='Partnerlista' role='frame' state='enabled resizable sensitive showing visible' relations=''
Script for event: Pidgin (module=orca.scripts.apps.pidgin.script)
TOTAL PROCESSING TIME: 0.0016
^^^^^ PROCESS OBJECT EVENT object:state-changed:active ^^^^^

DEQUEUED OBJECT:STATE-CHANGED:FOCUSED  <----------

vvvvv PROCESS OBJECT EVENT object:state-changed:focused vvvvv
OBJECT EVENT: object:state-changed:focused             detail=(1,0,0)
    app.name='Pidgin' name='Hozzáadás…' role='push button' state='enabled focusable focused sensitive showing visible' relations=''
Script for event: Pidgin (module=orca.scripts.apps.pidgin.script)
mapped Pidgin to pidgin
Looking for app-settings.pidgin.py
Could not import app-settings.pidgin.py
ACTIVE SCRIPT: Pidgin (module=orca.scripts.apps.pidgin.script) (reason=Event source claimed focus.)
LOCUS OF FOCUS: app='Pidgin' name='Hozzáadás…' role='push button'
                event='object:state-changed:focused'

PREPARATION TIME: 0.0010
generate braille for focused app.name='Pidgin' name='Hozzáadás…' role='push button' state='enabled focusable focused sensitive showing visible' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_PUSH_BUTTON of type AtspiRole>, 'formatType': 'focused', 'mode': 'braille'}) using '(includeContext                           and (ancestors                                + (rowHeader and [Region(" " + asString(rowHeader))])                                + (columnHeader and [Region(" " + asString(columnHeader))])                                + (radioButtonGroup and [Region(" " + asString(radioButtonGroup))])                                + [Region(" ")])                           or []) + [Component(obj,                                     asString((labelAndName or description) + expandableState + roleName))] + (nodeLevel and [Region(" " + asString(nodeLevel))])'
GENERATION  TIME: 0.0006  ---->  includeContext=True

PREPARATION TIME: 0.0009
generate braille for focused app.name='Pidgin' name='Fiókok' role='dialog' state='active enabled resizable sensitive showing visible' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_DIALOG of type AtspiRole>, 'formatType': 'focused', 'mode': 'braille', 'includeContext': False}) using '[Component(obj,                                     asString(label + displayedText + value + roleName + required))]'
GENERATION  TIME: 0.0006  ---->  label=[]
GENERATION  TIME: 0.0005  ---->  displayedText=['Fiókok']
GENERATION  TIME: 0.0006  ---->  value=['']
GENERATION  TIME: 0.0007  ---->  roleName=['párbeszédablak']
GENERATION  TIME: 0.0003  ---->  required=[]
COMPLETION  TIME: 0.0044
generate braille results:
  Component: '$fi9kok p"rbe51dablak', 1

PREPARATION TIME: 0.0008
generate braille for focused app.name='Pidgin' name='Pidgin' role='application' state='' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_APPLICATION of type AtspiRole>, 'formatType': 'focused', 'mode': 'braille', 'includeContext': False}) using '[Component(obj,                                     asString(label + displayedText + value + roleName + required))]'
GENERATION  TIME: 0.0006  ---->  label=[]
GENERATION  TIME: 0.0003  ---->  displayedText=['Pidgin']
GENERATION  TIME: 0.0005  ---->  value=['']
GENERATION  TIME: 0.0006  ---->  roleName=['alkalmazás']
GENERATION  TIME: 0.0003  ---->  required=[]
COMPLETION  TIME: 0.0038
generate braille results:
  Component: '$pidgin alkalma2"s', 1
GENERATION  TIME: 0.0108  ---->  ancestors=[<orca.braille.Component object at 0xb2f4182c>, <orca.braille.Region object at 0xb2f418ec>, <orca.braille.Component object at 0xb2ed684c>]
GENERATION  TIME: 0.0007  ---->  rowHeader=[]
GENERATION  TIME: 0.0005  ---->  columnHeader=[]
GENERATION  TIME: 0.0004  ---->  radioButtonGroup=[]
GENERATION  TIME: 0.0018  ---->  labelAndName=['Hozzáadás…']
GENERATION  TIME: 0.0007  ---->  expandableState=[]
GENERATION  TIME: 0.0009  ---->  roleName=['gomb']
GENERATION  TIME: 0.0014  ---->  nodeLevel=[]
COMPLETION  TIME: 0.0197
generate braille results:
  Component: '$pidgin alkalma2"s', 1
  Region: ' ', 0
  Component: '$fi9kok p"rbe51dablak', 1
  Region: ' ', 0
  Component: '$ho22"ad"s... gomb', 1
BRAILLE LINE:  '$pidgin alkalma2"s $fi9kok p"rbe51dablak $ho22"ad"s... gomb'
     VISIBLE:  '$pidgin alkalma2"s $fi9kok p"rbe51dablak $ho22"ad"s... gomb', cursor=43

PREPARATION TIME: 0.0012
generate speech for unfocused app.name='Pidgin' name='Hozzáadás…' role='push button' state='enabled focusable focused sensitive showing visible' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_PUSH_BUTTON of type AtspiRole>, 'formatType': 'unfocused', 'priorObj': None, 'mode': 'speech'}) using 'oldAncestors + newAncestors + labelAndName + expandableState + roleName + availability + (mnemonic and (pause + mnemonic + lineBreak) or []) + accelerator + newNodeLevel + unselectedCell + (tutorial and (pause + tutorial) or [])'
GENERATION  TIME: 0.0003  ---->  oldAncestors=[]
GENERATION  TIME: 0.0003  ---->  newAncestors=[]
GENERATION  TIME: 0.0003  ---->  labelAndName=['Hozzáadás…', {'rate': 90.0, 'family': {'locale': 'hu', 'name': 'hungarian'}, 'average-pitch': 3.0}]
GENERATION  TIME: 0.0005  ---->  expandableState=[]
GENERATION  TIME: 0.0006  ---->  roleName=['gomb', {}]
GENERATION  TIME: 0.0004  ---->  availability=[]
script_utilities.getKeyBinding: ['', 'Alt+A', '']
GENERATION  TIME: 0.0010  ---->  mnemonic=['Alt+A', {}]
GENERATION  TIME: 0.0003  ---->  pause=[<orca.speech_generator.Pause object at 0xb3cab92c>]
GENERATION  TIME: 0.0002  ---->  lineBreak=[<orca.speech_generator.LineBreak object at 0xb3c9b94c>]
GENERATION  TIME: 0.0003  ---->  accelerator=[]
GENERATION  TIME: 0.0005  ---->  newNodeLevel=[]
GENERATION  TIME: 0.0004  ---->  unselectedCell=[]
GENERATION  TIME: 0.0003  ---->  tutorial=[]
COMPLETION  TIME: 0.0073
generate speech results:
  Hozzáadás…
  {'rate': 90.0, 'family': {'locale': 'hu', 'name': 'hungarian'}, 'average-pitch': 3.0}
  gomb
  {}
  <orca.speech_generator.Pause object at 0xb3cab92c>
  Alt+A
  {}
  <orca.speech_generator.LineBreak object at 0xb3c9b94c>
SPEECH OUTPUT: 'Hozzáadás… gomb.'
SPEECH OUTPUT: 'Alt+A'
TOTAL PROCESSING TIME: 0.0915

I hear only entire dialog label related informations when I perss ALT+TAB key combination until I not jump again the accounts related dialog after Orca spokened me the "Hozzáadás" button related informations.

Attila
Comment 5 Alejandro Piñeiro Iglesias (IRC: infapi00) 2014-02-27 11:11:56 UTC
Hi Attila,

(In reply to comment #3)
> This is interesting with you wrote the first issue related.
> If I launch Pidgin my Ubuntu 14.04 system (I using GNOME Shell classic session
> 3.10 version), I hear only the add button related informations.

I have the feeling that you posted this information in the wrong the bug. This bug is about evince, but here you talk about pidgin, and looking at the debug.out, it confirms it.
Comment 6 Hammer Attila 2014-02-27 11:16:31 UTC
Oops Joanie, very sorry.
I want morning add comment the previous day Pidgin related bug.
Comment 7 Joanmarie Diggs (IRC: joanie) 2014-03-03 18:28:58 UTC
Created attachment 270829 [details] [review]
Solution - Take 1

* Stop using GailTextUtil
* Borrow logic from Gtk+'s support for identifying text by boundary type
* Create a heuristic to distinguish soft returns from hard returns based
  on the rendered text in the EvView
* Strip out "\n" from the text returned for sentences

Carlos, thanks for the review and ACN on the previous patch. I didn't commit it because while I knew we would need to stop using GailTextUtil and that we'd want to rely largely upon the tried-and-tested logic found in Gtk+, I wasn't sure what we'd need to change. Now I think I have a better handle on it.

My assumptions:

1. For non-tagged PDFs, Poppler does not know if a line ends because it wrapped or because there is a hard return. Thus each line of text we get from Poppler will end with one -- and only one -- "\n"

2. The lack of an additional "\n" or other separator between blocks of text retrieved from Poppler is by design.

3. Because of 1 & 2, we have to handle this within Evince. And we don't want to change the non-a11y text in Evince any more than we want to in Poppler.

If those assumptions are indeed correct, the only safe way I can think of to solve this problem is to heuristically try to figure out in the accessibility code what is likely a soft return, and strip those "\n" out.

Long way of saying, I hate my patch but I don't have any better ideas. :( And this solution has been tested with quite a few test cases and seems to get the job done for the bulk of them.
Comment 8 Carlos Garcia Campos 2014-03-12 17:53:20 UTC
Review of attachment 270829 [details] [review]:

I wonder if we could use ev_page_cache_get_text_log_attrs like we currently do for caret navigation, and we avoid using a GtkTextBuffer as well since we were using it only to take advantage of gail utils.

::: libview/ev-view-accessible.c
@@ +258,3 @@
+	next_iter = gtk_text_iter_copy (iter);
+	if (!gtk_text_iter_forward_char (next_iter) || !gtk_text_iter_backward_char (prev_iter))
+		return FALSE;

You are leaking both iters here.

@@ +266,3 @@
+	if (!gtk_text_iter_starts_word (next_iter) &&
+	    (!gtk_text_iter_forward_char (next_iter) || !gtk_text_iter_starts_word (next_iter)))
+		return FALSE;

Ditto, and I think for every early return in this function.
Comment 9 Joanmarie Diggs (IRC: joanie) 2014-03-14 08:03:44 UTC
Created attachment 271831 [details] [review]
Stop using GtkTextBuffer

Note: This patch does not in any way fix the reported bug. It is to address the following:

(In reply to comment #8)
 
> I wonder if we could use ev_page_cache_get_text_log_attrs like we currently do
> for caret navigation, and we avoid using a GtkTextBuffer as well since we were
> using it only to take advantage of gail utils.

Seems we're using the GtkTextBuffer for that and a few other accessible text things.

But in answer to your question, based on the attached: Yes I think we can. Thanks for that tip! :)

While this patch doesn't fix the reported bug, it seems (fingers crossed) to not break any existing functionality and fixes the add_selection, set_selection, and remove_selection implementations. And it gets rid of the GtkTextBuffer which never seemed quite right. That's the good news. My concern is that it's a non-trivial change fairly close to the stable release. So.... Please let me know what you think about this. And in the meantime, I'll do a new fix for the original bug based on this GtkTextBuffer- and GtkTextIter-free world.
Comment 10 Joanmarie Diggs (IRC: joanie) 2014-03-14 15:07:23 UTC
Created attachment 271908 [details] [review]
WIP - Not yet ready for formal review

This fixes a stupid c&p error I made in the previous version and strips newlines out of the sentence presentation (for TTS prosody). It does not yet fix the main/reported bug from the OR. But I'm getting there.... :)
Comment 11 Joanmarie Diggs (IRC: joanie) 2014-03-14 21:31:02 UTC
Created attachment 271947 [details] [review]
proposed patch

* Stop using GtkTextBuffer: It was a hack more than a real solution
* Fix setting and clearing of selection via AtkText for the current page
* Strip newline chars out of the sentence strings: Newlines break TTS prosody
* Add some logic to heuristically distinguish soft and hard returns

Please review. Thanks!!
Comment 12 Carlos Garcia Campos 2014-03-17 11:12:29 UTC
Review of attachment 271947 [details] [review]:

I love to see GtkTextBuffer is gone. This LGTM in general, I trust you the heuristics are correct :-)

::: libview/ev-view-accessible.c
@@ +299,3 @@
 	widget = gtk_accessible_get_widget (GTK_ACCESSIBLE (text));
 	if (widget == NULL)
+		return g_strdup ("");

I prefer to return NULL in these cases

@@ +303,3 @@
+	view = EV_VIEW (widget);
+	if (!view->page_cache)
+		return g_strdup ("");

Ditto.

@@ +362,3 @@
+			if (*c == '\n')
+				*c = ' ';
+	        }

You can probably use g_strdelimit here.
Comment 13 Joanmarie Diggs (IRC: joanie) 2014-03-17 12:33:15 UTC
Comment on attachment 271947 [details] [review]
proposed patch

(In reply to comment #12)
> Review of attachment 271947 [details] [review]:
> 
> I love to see GtkTextBuffer is gone. This LGTM in general, I trust you the
> heuristics are correct :-)

I didn't enjoy removing it ;), but now that it's gone I will agree that I'm very glad it is. :)
 
> ::: libview/ev-view-accessible.c
> @@ +299,3 @@
>      widget = gtk_accessible_get_widget (GTK_ACCESSIBLE (text));
>      if (widget == NULL)
> +        return g_strdup ("");
> 
> I prefer to return NULL in these cases

Done.
 
> @@ +303,3 @@
> +    view = EV_VIEW (widget);
> +    if (!view->page_cache)
> +        return g_strdup ("");
> 
> Ditto.

Done.

> @@ +362,3 @@
> +            if (*c == '\n')
> +                *c = ' ';
> +            }
> 
> You can probably use g_strdelimit here.

Aha! I knew there had to be some sort of string replace functionality. Didn't see that one when I went looking. Sorry, thanks much for pointing it out, and done.

https://git.gnome.org/browse/evince/commit/?id=03afe275fabbb235f9697ad3274cde0c9e2b077b