GNOME Bugzilla – Bug 725003
Accessible text at offset for sentence boundaries is incorrect for documents with wrapped text
Last modified: 2014-03-17 12:33:39 UTC
Created attachment 270054 [details] test case Steps to reproduce: 1. Load the attached test case in Evince and enable caret navigation (F7) 2. Load the attached accessible-event listener in a terminal 3. Place the caret in the text of the document Expected results: The sentences would not be split up by newline characters. Actual results: The sentences are split up by newline characters as shown below. Sentence at offset 0 <This is a document which was created in LibreOffice\n> Sentence at offset 52 <Writer. > Sentence at offset 60 <The text is in a single paragraph with no newline\n> Sentence at offset 110 <characters added to it. > Sentence at offset 134 <It is also in large print so that I do\n> Sentence at offset 173 <not have to type a lot to get text to wrap.>
Created attachment 270055 [details] accessible-event listener
Created attachment 270303 [details] [review] Stop using GailTextUtil This doesn't fix the bug. But fixing the bug requires we stop using GailTextUtil so we can handle the Evince-specific case of there being newline chars inserted in the text where we (accessibility) would not normally expect to find newline chars.
This is interesting with you wrote the first issue related. If I launch Pidgin my Ubuntu 14.04 system (I using GNOME Shell classic session 3.10 version), I hear only the add button related informations. I using at-spi2-core 2.10.2 version. What other components version number is important this problem related? I generated a shorter debug.out file with concentrate only the first issue, now I used hungarian locale. I will attaching the short debug.out file. Attila
Created attachment 270447 [details] Shorter debug.out file What happening my debug.out file when I launched Pidgin? A possible interesting part: DEQUEUED OBJECT:PROPERTY-CHANGE:ACCESSIBLE-NAME <---------- vvvvv PROCESS OBJECT EVENT object:property-change:accessible-name vvvvv OBJECT EVENT: object:property-change:accessible-name detail=(0,0,) app.name='gnome-shell' name='None' role='label' state='' relations='' Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script) TOTAL PROCESSING TIME: 0.0020 ^^^^^ PROCESS OBJECT EVENT object:property-change:accessible-name ^^^^^ DEQUEUED OBJECT:PROPERTY-CHANGE:ACCESSIBLE-NAME <---------- vvvvv PROCESS OBJECT EVENT object:property-change:accessible-name vvvvv OBJECT EVENT: object:property-change:accessible-name detail=(0,0,) app.name='gnome-shell' name='None' role='label' state='' relations='' Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script) TOTAL PROCESSING TIME: 0.0021 ^^^^^ PROCESS OBJECT EVENT object:property-change:accessible-name ^^^^^ DEQUEUED OBJECT:PROPERTY-CHANGE:ACCESSIBLE-NAME <---------- vvvvv PROCESS OBJECT EVENT object:property-change:accessible-name vvvvv OBJECT EVENT: object:property-change:accessible-name detail=(0,0,) app.name='gnome-shell' name='None' role='label' state='' relations='' Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script) TOTAL PROCESSING TIME: 0.0021 ^^^^^ PROCESS OBJECT EVENT object:property-change:accessible-name ^^^^^ DEQUEUED OBJECT:STATE-CHANGED:SHOWING <---------- vvvvv PROCESS OBJECT EVENT object:state-changed:showing vvvvv OBJECT EVENT: object:state-changed:showing detail=(1,0,0) app.name='gnome-shell' name='None' role='panel' state='enabled focusable sensitive showing visible' relations='' Script for event: gnome-shell (module=orca.scripts.apps.gnome-shell.script) TOTAL PROCESSING TIME: 0.0017 ^^^^^ PROCESS OBJECT EVENT object:state-changed:showing ^^^^^ DEQUEUED OBJECT:STATE-CHANGED:ACTIVE <---------- vvvvv PROCESS OBJECT EVENT object:state-changed:active vvvvv OBJECT EVENT: object:state-changed:active detail=(0,0,0) app.name='Pidgin' name='Partnerlista' role='frame' state='enabled resizable sensitive showing visible' relations='' Script for event: Pidgin (module=orca.scripts.apps.pidgin.script) TOTAL PROCESSING TIME: 0.0016 ^^^^^ PROCESS OBJECT EVENT object:state-changed:active ^^^^^ DEQUEUED OBJECT:STATE-CHANGED:FOCUSED <---------- vvvvv PROCESS OBJECT EVENT object:state-changed:focused vvvvv OBJECT EVENT: object:state-changed:focused detail=(1,0,0) app.name='Pidgin' name='Hozzáadás…' role='push button' state='enabled focusable focused sensitive showing visible' relations='' Script for event: Pidgin (module=orca.scripts.apps.pidgin.script) mapped Pidgin to pidgin Looking for app-settings.pidgin.py Could not import app-settings.pidgin.py ACTIVE SCRIPT: Pidgin (module=orca.scripts.apps.pidgin.script) (reason=Event source claimed focus.) LOCUS OF FOCUS: app='Pidgin' name='Hozzáadás…' role='push button' event='object:state-changed:focused' PREPARATION TIME: 0.0010 generate braille for focused app.name='Pidgin' name='Hozzáadás…' role='push button' state='enabled focusable focused sensitive showing visible' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_PUSH_BUTTON of type AtspiRole>, 'formatType': 'focused', 'mode': 'braille'}) using '(includeContext and (ancestors + (rowHeader and [Region(" " + asString(rowHeader))]) + (columnHeader and [Region(" " + asString(columnHeader))]) + (radioButtonGroup and [Region(" " + asString(radioButtonGroup))]) + [Region(" ")]) or []) + [Component(obj, asString((labelAndName or description) + expandableState + roleName))] + (nodeLevel and [Region(" " + asString(nodeLevel))])' GENERATION TIME: 0.0006 ----> includeContext=True PREPARATION TIME: 0.0009 generate braille for focused app.name='Pidgin' name='Fiókok' role='dialog' state='active enabled resizable sensitive showing visible' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_DIALOG of type AtspiRole>, 'formatType': 'focused', 'mode': 'braille', 'includeContext': False}) using '[Component(obj, asString(label + displayedText + value + roleName + required))]' GENERATION TIME: 0.0006 ----> label=[] GENERATION TIME: 0.0005 ----> displayedText=['Fiókok'] GENERATION TIME: 0.0006 ----> value=[''] GENERATION TIME: 0.0007 ----> roleName=['párbeszédablak'] GENERATION TIME: 0.0003 ----> required=[] COMPLETION TIME: 0.0044 generate braille results: Component: '$fi9kok p"rbe51dablak', 1 PREPARATION TIME: 0.0008 generate braille for focused app.name='Pidgin' name='Pidgin' role='application' state='' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_APPLICATION of type AtspiRole>, 'formatType': 'focused', 'mode': 'braille', 'includeContext': False}) using '[Component(obj, asString(label + displayedText + value + roleName + required))]' GENERATION TIME: 0.0006 ----> label=[] GENERATION TIME: 0.0003 ----> displayedText=['Pidgin'] GENERATION TIME: 0.0005 ----> value=[''] GENERATION TIME: 0.0006 ----> roleName=['alkalmazás'] GENERATION TIME: 0.0003 ----> required=[] COMPLETION TIME: 0.0038 generate braille results: Component: '$pidgin alkalma2"s', 1 GENERATION TIME: 0.0108 ----> ancestors=[<orca.braille.Component object at 0xb2f4182c>, <orca.braille.Region object at 0xb2f418ec>, <orca.braille.Component object at 0xb2ed684c>] GENERATION TIME: 0.0007 ----> rowHeader=[] GENERATION TIME: 0.0005 ----> columnHeader=[] GENERATION TIME: 0.0004 ----> radioButtonGroup=[] GENERATION TIME: 0.0018 ----> labelAndName=['Hozzáadás…'] GENERATION TIME: 0.0007 ----> expandableState=[] GENERATION TIME: 0.0009 ----> roleName=['gomb'] GENERATION TIME: 0.0014 ----> nodeLevel=[] COMPLETION TIME: 0.0197 generate braille results: Component: '$pidgin alkalma2"s', 1 Region: ' ', 0 Component: '$fi9kok p"rbe51dablak', 1 Region: ' ', 0 Component: '$ho22"ad"s... gomb', 1 BRAILLE LINE: '$pidgin alkalma2"s $fi9kok p"rbe51dablak $ho22"ad"s... gomb' VISIBLE: '$pidgin alkalma2"s $fi9kok p"rbe51dablak $ho22"ad"s... gomb', cursor=43 PREPARATION TIME: 0.0012 generate speech for unfocused app.name='Pidgin' name='Hozzáadás…' role='push button' state='enabled focusable focused sensitive showing visible' relations='' (args={'recursing': True, 'role': <enum ATSPI_ROLE_PUSH_BUTTON of type AtspiRole>, 'formatType': 'unfocused', 'priorObj': None, 'mode': 'speech'}) using 'oldAncestors + newAncestors + labelAndName + expandableState + roleName + availability + (mnemonic and (pause + mnemonic + lineBreak) or []) + accelerator + newNodeLevel + unselectedCell + (tutorial and (pause + tutorial) or [])' GENERATION TIME: 0.0003 ----> oldAncestors=[] GENERATION TIME: 0.0003 ----> newAncestors=[] GENERATION TIME: 0.0003 ----> labelAndName=['Hozzáadás…', {'rate': 90.0, 'family': {'locale': 'hu', 'name': 'hungarian'}, 'average-pitch': 3.0}] GENERATION TIME: 0.0005 ----> expandableState=[] GENERATION TIME: 0.0006 ----> roleName=['gomb', {}] GENERATION TIME: 0.0004 ----> availability=[] script_utilities.getKeyBinding: ['', 'Alt+A', ''] GENERATION TIME: 0.0010 ----> mnemonic=['Alt+A', {}] GENERATION TIME: 0.0003 ----> pause=[<orca.speech_generator.Pause object at 0xb3cab92c>] GENERATION TIME: 0.0002 ----> lineBreak=[<orca.speech_generator.LineBreak object at 0xb3c9b94c>] GENERATION TIME: 0.0003 ----> accelerator=[] GENERATION TIME: 0.0005 ----> newNodeLevel=[] GENERATION TIME: 0.0004 ----> unselectedCell=[] GENERATION TIME: 0.0003 ----> tutorial=[] COMPLETION TIME: 0.0073 generate speech results: Hozzáadás… {'rate': 90.0, 'family': {'locale': 'hu', 'name': 'hungarian'}, 'average-pitch': 3.0} gomb {} <orca.speech_generator.Pause object at 0xb3cab92c> Alt+A {} <orca.speech_generator.LineBreak object at 0xb3c9b94c> SPEECH OUTPUT: 'Hozzáadás… gomb.' SPEECH OUTPUT: 'Alt+A' TOTAL PROCESSING TIME: 0.0915 I hear only entire dialog label related informations when I perss ALT+TAB key combination until I not jump again the accounts related dialog after Orca spokened me the "Hozzáadás" button related informations. Attila
Hi Attila, (In reply to comment #3) > This is interesting with you wrote the first issue related. > If I launch Pidgin my Ubuntu 14.04 system (I using GNOME Shell classic session > 3.10 version), I hear only the add button related informations. I have the feeling that you posted this information in the wrong the bug. This bug is about evince, but here you talk about pidgin, and looking at the debug.out, it confirms it.
Oops Joanie, very sorry. I want morning add comment the previous day Pidgin related bug.
Created attachment 270829 [details] [review] Solution - Take 1 * Stop using GailTextUtil * Borrow logic from Gtk+'s support for identifying text by boundary type * Create a heuristic to distinguish soft returns from hard returns based on the rendered text in the EvView * Strip out "\n" from the text returned for sentences Carlos, thanks for the review and ACN on the previous patch. I didn't commit it because while I knew we would need to stop using GailTextUtil and that we'd want to rely largely upon the tried-and-tested logic found in Gtk+, I wasn't sure what we'd need to change. Now I think I have a better handle on it. My assumptions: 1. For non-tagged PDFs, Poppler does not know if a line ends because it wrapped or because there is a hard return. Thus each line of text we get from Poppler will end with one -- and only one -- "\n" 2. The lack of an additional "\n" or other separator between blocks of text retrieved from Poppler is by design. 3. Because of 1 & 2, we have to handle this within Evince. And we don't want to change the non-a11y text in Evince any more than we want to in Poppler. If those assumptions are indeed correct, the only safe way I can think of to solve this problem is to heuristically try to figure out in the accessibility code what is likely a soft return, and strip those "\n" out. Long way of saying, I hate my patch but I don't have any better ideas. :( And this solution has been tested with quite a few test cases and seems to get the job done for the bulk of them.
Review of attachment 270829 [details] [review]: I wonder if we could use ev_page_cache_get_text_log_attrs like we currently do for caret navigation, and we avoid using a GtkTextBuffer as well since we were using it only to take advantage of gail utils. ::: libview/ev-view-accessible.c @@ +258,3 @@ + next_iter = gtk_text_iter_copy (iter); + if (!gtk_text_iter_forward_char (next_iter) || !gtk_text_iter_backward_char (prev_iter)) + return FALSE; You are leaking both iters here. @@ +266,3 @@ + if (!gtk_text_iter_starts_word (next_iter) && + (!gtk_text_iter_forward_char (next_iter) || !gtk_text_iter_starts_word (next_iter))) + return FALSE; Ditto, and I think for every early return in this function.
Created attachment 271831 [details] [review] Stop using GtkTextBuffer Note: This patch does not in any way fix the reported bug. It is to address the following: (In reply to comment #8) > I wonder if we could use ev_page_cache_get_text_log_attrs like we currently do > for caret navigation, and we avoid using a GtkTextBuffer as well since we were > using it only to take advantage of gail utils. Seems we're using the GtkTextBuffer for that and a few other accessible text things. But in answer to your question, based on the attached: Yes I think we can. Thanks for that tip! :) While this patch doesn't fix the reported bug, it seems (fingers crossed) to not break any existing functionality and fixes the add_selection, set_selection, and remove_selection implementations. And it gets rid of the GtkTextBuffer which never seemed quite right. That's the good news. My concern is that it's a non-trivial change fairly close to the stable release. So.... Please let me know what you think about this. And in the meantime, I'll do a new fix for the original bug based on this GtkTextBuffer- and GtkTextIter-free world.
Created attachment 271908 [details] [review] WIP - Not yet ready for formal review This fixes a stupid c&p error I made in the previous version and strips newlines out of the sentence presentation (for TTS prosody). It does not yet fix the main/reported bug from the OR. But I'm getting there.... :)
Created attachment 271947 [details] [review] proposed patch * Stop using GtkTextBuffer: It was a hack more than a real solution * Fix setting and clearing of selection via AtkText for the current page * Strip newline chars out of the sentence strings: Newlines break TTS prosody * Add some logic to heuristically distinguish soft and hard returns Please review. Thanks!!
Review of attachment 271947 [details] [review]: I love to see GtkTextBuffer is gone. This LGTM in general, I trust you the heuristics are correct :-) ::: libview/ev-view-accessible.c @@ +299,3 @@ widget = gtk_accessible_get_widget (GTK_ACCESSIBLE (text)); if (widget == NULL) + return g_strdup (""); I prefer to return NULL in these cases @@ +303,3 @@ + view = EV_VIEW (widget); + if (!view->page_cache) + return g_strdup (""); Ditto. @@ +362,3 @@ + if (*c == '\n') + *c = ' '; + } You can probably use g_strdelimit here.
Comment on attachment 271947 [details] [review] proposed patch (In reply to comment #12) > Review of attachment 271947 [details] [review]: > > I love to see GtkTextBuffer is gone. This LGTM in general, I trust you the > heuristics are correct :-) I didn't enjoy removing it ;), but now that it's gone I will agree that I'm very glad it is. :) > ::: libview/ev-view-accessible.c > @@ +299,3 @@ > widget = gtk_accessible_get_widget (GTK_ACCESSIBLE (text)); > if (widget == NULL) > + return g_strdup (""); > > I prefer to return NULL in these cases Done. > @@ +303,3 @@ > + view = EV_VIEW (widget); > + if (!view->page_cache) > + return g_strdup (""); > > Ditto. Done. > @@ +362,3 @@ > + if (*c == '\n') > + *c = ' '; > + } > > You can probably use g_strdelimit here. Aha! I knew there had to be some sort of string replace functionality. Didn't see that one when I went looking. Sorry, thanks much for pointing it out, and done. https://git.gnome.org/browse/evince/commit/?id=03afe275fabbb235f9697ad3274cde0c9e2b077b