Bug 551489 – API doc for atk_text_get_text_before/at/after_offset() are not consistent


Summary:	API doc for atk_text_get_text_before/at/after_offset() are not consistent


Status:	RESOLVED FIXED

Product:	atk
Classification:	Platform
Component:	atk
Version:	unspecified
Hardware:	Other All

Importance:	High major
Target Milestone:	---
Assigned To:	Li Yuan
QA Contact:	Li Yuan

URL:
Whiteboard:

Depends on:
Blocks:	638537

Reported:	2008-09-09 11:06 UTC by Evan Yan
Modified:	2011-05-30 06:16 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch updating the docs according to last comments (1.47 KB, patch) 2010-11-02 12:38 UTC, Fernando Herrera	none
Patch updating the docs according to last comments (1.47 KB, patch) 2010-11-04 16:39 UTC, Fernando Herrera	committed

Description Evan Yan 2008-09-09 11:06:18 UTC

Please describe the problem:
The API docs are at
http://library.gnome.org/devel/atk/stable/AtkText.html

The API specifications use the terms "word start", "word end", and "inside a word". I think we first need to clarify these terms (sorry for my ignorance if the definitions are obvious).

Take this string as an example:
"see a dog"  // the string
 0123456789   // the offset
As I understand it, offsets 0, 4, and 6 are the "word start"s and 3, 5, and 9 are the "word end"s.
For offset 4, is it "inside a word" or "outside a word"?
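
The offsets listed above can be reproduced with a short sketch. It assumes that a word is a maximal run of non-whitespace characters and that a word end is the offset just past the last character; the helper name word_boundaries is hypothetical, and real ATK implementations may use different word-break rules.

```python
import re

def word_boundaries(text):
    """Return (starts, ends) for each run of non-whitespace characters.

    Assumption: a word end is the offset just past the word's last character.
    """
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    starts = [s for s, _ in spans]
    ends = [e for _, e in spans]
    return starts, ends

starts, ends = word_boundaries("see a dog")
print(starts)  # [0, 4, 6]
print(ends)    # [3, 5, 9]
```

Under this reading, offset 4 is both a word start and the first character of "a", which is why the question of whether it counts as "inside a word" matters.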

Let's take offset 4 as "inside a word" for now, so that the specification below for atk_text_get_text_at_offset() makes sense.
"
If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is from the word start at or before the offset to the word start after the offset.

The returned string will contain the word at the offset if the offset is inside a word and will contain the word before the offset if the offset is not inside a word. 
"

But that makes the specification of atk_text_get_text_before_offset() self-contradictory.
"
If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is from the word start before the word start before the offset to the word start before the offset.

The returned string will contain the word before the offset if the offset is inside a word and will contain the word before the word before the offset if the offset is not inside a word. 
"

Either way, some parts of the specifications contradict themselves.


Comment 1 Alexander Surkov 2008-09-16 03:45:38 UTC

We are creating automated tests for Mozilla to test the methods of text accessibles, so it's very important to get your opinion on Evan's issue. Any feedback? Thank you.

Comment 2 Pete Brunet 2008-09-16 15:42:28 UTC

IA2 doesn't have WORD_START/END, just WORD.  Using IBM Lotus Symphony as a data point, if you take "see a dog" and call IAText::textAt/After/BeforeOffset with boundary type WORD, you get

for offset 4
At: "a", 4, 5
After: "dog", 6, 9
Before: "see", 0, 3

for offset 5
At: "dog", 6, 9 <-- this is a bug, "" should be returned.
After: "dog", 6, 9
Before: "a", 4, 5

BTW, the IA2 documentation for IAText::textAt/After/BeforeOffset has some typos.  The corrections are listed below, and I will fix them in the IA2 IDL.

textAtOffset
The following sentence should be deleted:
For example, if text type is IA2_TEXT_BOUNDARY_WORD, then the complete word that is closest to and located before offset is returned.

textAfterOffset
The word "before" should be changed to "after" in this sentence:
For example, if text type is IA2_TEXT_BOUNDARY_WORD, then the complete word that is closest to and located before offset is returned.

Comment 3 Pete Brunet 2008-09-16 16:02:39 UTC

I see there is another bug in the IA2 IDL comments.  All three of IAText::textAt/After/BeforeOffset have this sentence:

"If the index is valid, but no suitable word (or other text type) is found, an empty text segment is returned."

That sentence should be removed.  The return value information is correct:

S_FALSE ...if there is nothing to return; [out] values are 0s and NULL respectively

Comment 4 Pete Brunet 2008-09-16 16:03:54 UTC

BTW, The IA2 docs are at:
http://www.linuxfoundation.org/~ptbrunet/ia2/docs/html/

Comment 5 Alexander Surkov 2008-09-17 07:24:31 UTC

IAccessible2 is simpler here (that's good :)); ATK is more complicated. Since the Gecko accessibility API is similar to ATK in these methods, we first need to clarify Evan's question (he is the reviewer of my patch in https://bugzilla.mozilla.org/show_bug.cgi?id=452769). Once this is clarified, I will make sure we are correct on the IA2 side.

Comment 6 Li Yuan 2008-09-17 10:27:03 UTC

Currently I don't have time to look into this. Please refer to gail's code, there is implementation of text interface.

Comment 7 Alexander Surkov 2008-09-17 12:04:50 UTC

(In reply to comment #6)
> Currently I don't have time to look into this. Please refer to gail's code,
> there is implementation of text interface.
> 

Li, is gail's code 100% tested, or something close to that? Firefox also has an implementation of the text interface, but I'm not sure it's 100% correct.

Comment 8 Pete Brunet 2008-09-17 16:22:28 UTC

In comment 2 above, when IA2::textAtOffset is requested for an offset on whitespace I indicated the return should be "".  That is wrong.  The return string should be a NULL pointer.  The returned offsets should be 0 and the HRESULT should be S_FALSE.  BTW, NVDA, which can't check for S_FALSE because of its Python infrastructure, will check for the NULL.
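
The WORD results from comment 2, together with the correction above, can be modelled with a small sketch. This is not the IA2 API: text_at, text_after and text_before are hypothetical helpers, words are assumed to be whitespace-delimited, and None with offsets 0, 0 stands in for the NULL / S_FALSE case.

```python
import re

def words(text):
    """Spans (start, end) of whitespace-delimited words (an assumption)."""
    return [m.span() for m in re.finditer(r"\S+", text)]

def text_at(text, offset):
    # The word containing the offset; nothing if the offset is on whitespace.
    for s, e in words(text):
        if s <= offset < e:
            return text[s:e], s, e
    return None, 0, 0

def text_after(text, offset):
    # The first word that starts strictly after the offset.
    for s, e in words(text):
        if s > offset:
            return text[s:e], s, e
    return None, 0, 0

def text_before(text, offset):
    # The last word that ends at or before the offset.
    result = (None, 0, 0)
    for s, e in words(text):
        if e <= offset:
            result = (text[s:e], s, e)
    return result

for offset in (4, 5):
    print(offset, text_at("see a dog", offset),
          text_after("see a dog", offset), text_before("see a dog", offset))
```

For offset 4 this gives "a", "dog" and "see"; for offset 5 the "at" query returns nothing, while "after" and "before" give "dog" and "a".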

Comment 9 Li Yuan 2008-09-18 06:13:42 UTC

Adding Brian to cc, since he may know more about the history of gail's text interface.

Comment 10 Alexander Surkov 2008-09-18 06:20:40 UTC

Here is the Mozilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=452769 where we are adding automated tests for the text interface. It would also be great if you could find time to look at it and check whether our assumptions about the text interface methods are correct.

Comment 11 yue.wang 2008-12-23 03:18:32 UTC

Evan, as you mentioned, the specification appears self-contradictory in these two passages:

>"
>If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is
>from the word start at or before the offset to the word start after the offset.

>The returned string will contain the word at the offset if the offset is inside
>a word and will contain the word before the offset if the offset is not inside
>a word. 
>"

>"
>If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is
>from the word start before the word start before the offset to the word start
>before the offset.

>The returned string will contain the word before the offset if the offset is
>inside a word and will contain the word before the word before the offset if
>the offset is not inside a word. 
>"


Reading them word by word, I found a small problem, which I will explain with an example. If I have made a mistake, please point it out.

Take the text "many kids here" as an example:


1. for "atk_text_get_text_at_offset()" 

1.1 first half sentence:
>doc said "If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is
>from the word start at or before the offset to the word start after the offset."

for example, offset at 'k' in "kids". ("many kids here")
Situation: "from the word start at the offset to the word start after the offset"
"word start at the offset" is 'k'
"word start after the offset" is 'h'
Returned : "kids "

for example, offset at 'd' in "kids". ("many kids here")
Situation: "from the word start before the offset to the word start after the offset"
"word start before the offset" is 'k'
"word start after the offset" is 'h'
Returned : "kids "

for example, offset at '_'(blank) between "kids" and "here".("many kids here")
Situation: "from the word start before the offset to the word start after the offset"
"word start before the offset" is 'k'
"word start after the offset" is 'h'
Returned : "kids "

1.2 second half sentence:
>doc said "The returned string will contain the word at the offset if the offset is inside
>a word and will contain the word before the offset if the offset is not inside
>a word. "

for example, offset at 'k' in "kids",or offset at 'd' in "kids".("many kids here")
Situation: "contain the word at the offset if the offset is inside a word"
Returned : "kids " 

for example, offset at '_'(blank) between "kids" and "here".("many kids here")
Situation: "contain the word before the offset if the offset is not inside a word"
Returned : "kids "

2. for "atk_text_get_text_before_offset()" 

2.1 first half sentence:
>doc said "If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is
>from the word start before the word start before the offset to the word start
>before the offset."

for example, offset at 'd' in "kids". ("many kids here")
Situation: "FROM the word start before the word start before the offset TO the word start before the offset"
"the word start before the word start before the offset" is 'm'
"the word start before the offset" is 'k'
Returned : "many "

for example, offset at '_'(blank) between "kids" and "here". ("many kids here")
Situation: "FROM the word start before the word start before the offset TO the word start before the offset"
"the word start before the word start before the offset" is 'm'
"the word start before the offset" is 'k'
Returned : "many "

2.2 second half sentence:
>doc said "The returned string will contain the word before the offset if the offset is
>inside a word and will contain the word before the word before the offset if
>the offset is not inside a word. "

for example, offset at 'k' in "kids". ("many kids here")
Situation: "contain the word before the offset if the offset is inside a word"
Returned : "many "

for example, offset at 'd' in "kids". ("many kids here")
Situation: "contain the word before the offset if the offset is inside a word"
Returned : "many "

for example, offset at '_'(blank) between "kids" and "here".
Situation: "contain the word before the word before the offset if the offset is not inside a word."
Returned : "many"

2.3 problems
If there is a problem, I think it is in 2.1 (the first half sentence for atk_text_get_text_before_offset()).
It doesn't specify what happens when the offset is at a "word start", for example, the offset at 'k' in "kids".
According to the first half sentence, it returns the word before "many", but according to the second half sentence,
it returns the word "many".

So, in my opinion, the first half sentence should be "If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is from the word start before the word start before or at the offset to the word start before or at the offset." (adding "or at", as in the other two functions, "atk_text_get_text_at_offset ()" and "atk_text_get_text_after_offset ()").

In short, there are no problems in the doc of atk_text_get_text_at_offset(); the doc of atk_text_get_text_before_offset() is missing "or at".
Am I right, Evan?
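
The WORD_START walk-through above can be checked with a sketch that follows the quoted doc sentences literally. It is not gail's or ATK's code: the function names are hypothetical, words are assumed to be whitespace-delimited, and returning an empty string when no earlier word start exists is an assumption.

```python
import re

def word_starts(text):
    """Offsets at which whitespace-delimited words begin (an assumption)."""
    return [m.start() for m in re.finditer(r"\S+", text)]

def at_offset_word_start(text, offset):
    # "from the word start at or before the offset to the word start after"
    starts = word_starts(text)
    begin = max((s for s in starts if s <= offset), default=0)
    end = min((s for s in starts if s > offset), default=len(text))
    return text[begin:end]

def before_offset_word_start(text, offset, inclusive):
    # inclusive=False: "the word start before the offset" (current doc text)
    # inclusive=True:  "the word start before or at the offset" (proposed)
    starts = word_starts(text)
    stops = [s for s in starts if (s <= offset if inclusive else s < offset)]
    if not stops:
        return ""
    end = max(stops)
    begins = [s for s in starts if s < end]
    if not begins:
        return ""  # assumption: nothing to return without an earlier start
    return text[max(begins):end]

text = "many kids here"
print(repr(at_offset_word_start(text, 5)))             # 'kids '
print(repr(before_offset_word_start(text, 7, False)))  # 'many '
print(repr(before_offset_word_start(text, 5, False)))  # ''
print(repr(before_offset_word_start(text, 5, True)))   # 'many '
```

Only the offset at 'k' (5) distinguishes the two readings: the current wording yields nothing, while the wording with "or at" yields "many ", in line with the second half sentence.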

Comment 12 Evan Yan 2009-01-05 14:45:31 UTC

Thanks for looking into this.

You're right. However, the problem is not so simple.

Your fix is applicable to this case, but there are other cases. There are three interfaces, atk_text_get_text_before/at/after_offset(), and each can take different boundary types as its argument. Please go over the API docs of all the interfaces and examine the different cases carefully, to see whether we can make all the docs consistent so that no two interfaces have conflicting definitions.

Comment 13 Alexander Surkov 2009-02-24 09:15:23 UTC

It would be really nice to define start word, end word, inside word and outside word offsets. I'll try, please fix me if I'm wrong.

start word offset - the offset where word starts, i.e. offset of its first letter, in 'hello world' example start word offsets are 0 (letter 'h' of 'hello' word) and 6 (letter 'w' of 'world' word)

end word offset - the offset where word was ended, i.e. offset immediately after of its last letter, in 'hello word' example end word offsets are 5 (' ' blank after "hello" word) and 10 (the end of 'hello word' string).

inside word offset - the offset equals or bigger than start offset but strictly lesser than end offset of the same word. In the case of 'hello world' examples inside word offsets are [0, 4] ("hello" word), and [6, 9] ("world" word).

outside word offset - the offset equals or bigger than end offset of one word and strictly lesser than the start offset of next word. In the case of "hello world" outside word offsets are 5 (' ' (blank) symbol after 'hello' word) and 10 (the end of "hello word" string).
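
The four definitions above can be written down as a sketch. It uses the 10-character string "hello word", the spelling for which the quoted numbers (end offsets 5 and 10, inside offsets [0, 4] and [6, 9]) hold; the classify helper is hypothetical and assumes whitespace-delimited words.

```python
import re

def classify(text):
    """Classify offsets 0..len(text) using the definitions proposed above."""
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    starts = [s for s, _ in spans]   # offset of each word's first letter
    ends = [e for _, e in spans]     # offset just after each word's last letter
    # Inside: at or after a word's start and strictly before its end.
    inside = [o for o in range(len(text) + 1)
              if any(s <= o < e for s, e in spans)]
    # Outside: every remaining offset, including the end of the string.
    outside = [o for o in range(len(text) + 1) if o not in inside]
    return starts, ends, inside, outside

starts, ends, inside, outside = classify("hello word")
print(starts)   # [0, 6]
print(ends)     # [5, 10]
print(inside)   # [0, 1, 2, 3, 4, 6, 7, 8, 9]
print(outside)  # [5, 10]
```

With these definitions every word end offset is also an outside offset, which is the case the following example turns on.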

If this sounds correct then I can see one doc error of atk_text_get_text_after_offset () function. Let's consider example "hello my friend", offset is 5 (' ' (blank) symbol after "hello" word).

"If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from the word end at or after the offset to the next work end."

the result is " my" because the given offset at the word end offset.

"The returned string will contain the word after the offset if the offset is inside a word and will contain the word after the word after the offset if the offset is not inside a word."

the result is " friend" because the offset is not inside a word.

Comment 14 yue.wang 2009-02-25 02:17:23 UTC

Thx for your comments:

> end word offset - the offset where word was ended, i.e. offset immediately
> after of its last letter, in 'hello word' example end word offsets are 5 (' '
> blank after "hello" word) and 10 (the end of 'hello word' string).
 
In my opinion, end word offset in example 'hello word' are 4-'o',9-'d'.

>"If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from
>the word end at or after the offset to the next work end."

>the result is " my" because the given offset at the word end offset.

So, here, the result is "friend", because "from the word end after the offset to the next work end".

Please correct me if I'm wrong.  :-)

Comment 15 Alexander Surkov 2009-02-25 02:45:31 UTC

Thank you for quick reply.

(In reply to comment #14)

> In my opinion, end word offset in example 'hello word' are 4-'o',9-'d'.

OK, I thought about that, but then the following sentence sounds strange to me:

"If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from
the word end at or after the offset to the next work end."

If word end offset is 'o' in 'hello world' then "string from the word end" should start from 'o' because "string from the word start" includes 'h' in 'hello world'. So we should get 'o worl' for string 'hello world' at 4-'o' offset.

Comment 16 Alexander Surkov 2009-02-25 02:49:04 UTC

(In reply to comment #15)

> If word end offset is 'o' in 'hello world' then "string from the word end"
> should start from 'o' because "string from the word start" includes 'h' in
> 'hello world'. So we should get 'o worl' for string 'hello world' at 4-'o'
> offset.
> 

I guess 'o world' if "to" in "to the next word end" is inclusive.

Comment 17 yue.wang 2009-02-25 03:18:06 UTC

Yes, Alexander, you are right. I thought about it again and I had made a mistake.

Comment 18 Alexander Surkov 2009-02-25 03:32:20 UTC

So I think atk_text_get_text_after_offset () with ATK_TEXT_BOUNDARY_WORD_END is inconsistent.

"If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from the word end at or after the offset to the next work end."

Here the word "at" should be removed to make it consistent with this sentence:

"The returned string will contain the word after the offset if the offset is inside a word and will contain the word after the word after the offset if the offset is not inside a word. "

so that we get the word "world" in the string "hello my world" for offset 5 (' ', the blank after the 'hello' word) in both cases. Sounds right?
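
The effect of dropping "at" can be seen in a sketch of the WORD_END rule for atk_text_get_text_after_offset(). It is a literal reading of the quoted sentence, not ATK's implementation: after_offset_word_end is a hypothetical helper, word ends are taken as the offset just past each whitespace-delimited word, and an empty string is returned when no boundary pair exists.

```python
import re

def word_ends(text):
    """Offsets just past each whitespace-delimited word (an assumption)."""
    return [m.end() for m in re.finditer(r"\S+", text)]

def after_offset_word_end(text, offset, include_at):
    # include_at=True:  "from the word end at or after the offset"
    # include_at=False: "from the word end after the offset"
    ends = word_ends(text)
    candidates = [e for e in ends if (e >= offset if include_at else e > offset)]
    if not candidates:
        return ""
    begin = candidates[0]
    later = [e for e in ends if e > begin]
    if not later:
        return ""
    return text[begin:later[0]]

text = "hello my world"
print(repr(after_offset_word_end(text, 5, True)))   # ' my'
print(repr(after_offset_word_end(text, 5, False)))  # ' world'
```

With "at or after", offset 5 yields " my"; with "after" only, it yields " world", which contains the word the second sentence describes.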

Comment 19 Li Yuan 2010-03-11 08:55:14 UTC

(In reply to comment #13)
> It would be really nice to define start word, end word, inside word and outside
> word offsets. I'll try, please fix me if I'm wrong.
> 
> start word offset - the offset where word starts, i.e. offset of its first
> letter, in 'hello world' example start word offsets are 0 (letter 'h' of
> 'hello' word) and 6 (letter 'w' of 'world' word)
> 
> end word offset - the offset where word was ended, i.e. offset immediately
> after of its last letter, in 'hello word' example end word offsets are 5 (' '
> blank after "hello" word) and 10 (the end of 'hello word' string).
> 
> inside word offset - the offset equals or bigger than start offset but strictly
> lesser than end offset of the same word. In the case of 'hello world' examples
> inside word offsets are [0, 4] ("hello" word), and [6, 9] ("world" word).
> 
> outside word offset - the offset equals or bigger than end offset of one word
> and strictly lesser than the start offset of next word. In the case of "hello
> world" outside word offsets are 5 (' ' (blank) symbol after 'hello' word) and
> 10 (the end of "hello word" string).
> 
> If this sounds correct then I can see one doc error of
> atk_text_get_text_after_offset () function. Let's consider example "hello my
> friend", offset is 5 (' ' (blank) symbol after "hello" word).

This would make the doc of atk_text_get_text_at_offset wrong too.

It seems that both the doc of atk_text_get_text_at_offset and the doc of atk_text_get_text_after_offset assume offset 5 of "hello world" is inside the word "hello", but the doc of atk_text_get_text_before_offset treats offset 5 as outside the word. So my suggestion is to change the doc of atk_text_get_text_before_offset.

Change "If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from the word end before the word end at or before the offset to the word end at or before the offset." to "If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from the word end before the word end before the offset to the word end before the offset."

Comment 20 Alexander Surkov 2010-03-11 09:47:57 UTC

Li, I'm still not sure about the definitions of the terms, which makes me read the doc as, for example, "if the start offset is one thing then ..., or if the start offset is another thing then ...". Could you please give definitions of the terms? It would help a lot in reading the documentation.

Comment 21 Li Yuan 2010-03-12 03:47:32 UTC

Let's take "see a dog" as an example.

Offsets 0, 4, and 6 are word starts. Offsets 3, 5, and 9 are word ends. Both word starts and word ends are "inside a word".

Comment 22 Alexander Surkov 2010-03-12 03:56:40 UTC

So every offset in this example is inside a word. An offset can be outside any word iff a sequence of more than one whitespace (non-word) character is encountered. This should also mean that the statement "if the offset is inside a word", widely used in the documentation, is always true except in the case of whitespace sequences. Sounds right?
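
This reading can be checked with a sketch in which both the word start and the word end offset count as "inside a word". The helper outside_offsets is hypothetical and assumes whitespace-delimited words.

```python
import re

def outside_offsets(text):
    """Offsets 0..len(text) that are not inside any word.

    Assumption: an offset is inside a word if start <= offset <= end,
    where end is the offset just past the word's last character.
    """
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    return [o for o in range(len(text) + 1)
            if not any(s <= o <= e for s, e in spans)]

print(outside_offsets("see a dog"))   # []
print(outside_offsets("see  a dog"))  # [4]
```

With single spaces no offset is outside a word; with two spaces between "see" and "a", only the second space (offset 4) is.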

Comment 23 Li Yuan 2010-03-12 04:07:26 UTC

Yes. Both the doc of atk_text_get_text_at_offset and the doc of atk_text_get_text_after_offset assume so.

We need to change the doc of atk_text_get_text_before_offset from
"If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is from the word start before the word start before the offset to the word start before the offset.

The returned string will contain the word before the offset if the offset is inside a word and will contain the word before the word before the offset if the offset is not inside a word.

If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from the word end before the word end at or before the offset to the word end at or before the offset.

The returned string will contain the word before the offset if the offset is inside a word or if the offset is not inside a word. "

to

"If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string is from the word start before the word start before or at the offset to the word start before or at the offset.

The returned string will contain the word before the offset if the offset is inside a word and will contain the word before the word before the offset if the offset is not inside a word.

If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string is from the word end before the word end before the offset to the word end before the offset.

The returned string will contain the word before the offset if the offset is inside a word or if the offset is not inside a word. "
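
The WORD_END change proposed above for atk_text_get_text_before_offset() can be compared with the old wording in a sketch. It is a literal reading of the two sentences, not the committed implementation: before_offset_word_end is a hypothetical helper, words are whitespace-delimited, and falling back to offset 0 when there is no earlier word end is an assumption.

```python
import re

def word_ends(text):
    """Offsets just past each whitespace-delimited word (an assumption)."""
    return [m.end() for m in re.finditer(r"\S+", text)]

def before_offset_word_end(text, offset, include_at):
    # include_at=True:  old wording, "the word end at or before the offset"
    # include_at=False: new wording, "the word end before the offset"
    ends = word_ends(text)
    stops = [e for e in ends if (e <= offset if include_at else e < offset)]
    if not stops:
        return ""
    stop = stops[-1]
    earlier = [e for e in ends if e < stop]
    begin = earlier[-1] if earlier else 0  # assumption: fall back to text start
    return text[begin:stop]

text = "see a dog"
print(repr(before_offset_word_end(text, 5, True)))   # ' a'
print(repr(before_offset_word_end(text, 5, False)))  # 'see'
print(repr(before_offset_word_end(text, 7, False)))  # ' a'
```

At offset 5, a word end that counts as inside the word "a", the old wording returns " a" and the new wording returns "see", the word before the offset. For an offset in the middle of a word, such as 7, the two wordings agree.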

Comment 24 Fernando Herrera 2010-11-02 12:38:02 UTC

Created attachment 173693 [details] [review]
Patch updating the docs according to last comments

Comment 25 Fernando Herrera 2010-11-04 16:39:26 UTC

Created attachment 173833 [details] [review]
Patch updating the docs according to last comments

Fix the commit message from the previous patch

Comment 26 Li Yuan 2010-11-18 08:50:34 UTC

Review of attachment 173833 [details] [review]:

It would be great if you can change the doc for atk_text_get_text_after_offset at the same time.

Comment 27 Fernando Herrera 2010-12-09 00:07:06 UTC

What changes are needed for atk_text_get_text_after_offset?

Comment 28 André Klapper 2011-01-22 14:16:23 UTC

Li: Ping - can you please answer comment 27?

Comment 29 Li Yuan 2011-01-24 05:53:06 UTC

Sorry, I mean before_offset. The changes are similar to at_offset:

* If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string
* is from the word start before the word start before the offset to 
* the word start before the offset.

to

* If the boundary_type is ATK_TEXT_BOUNDARY_WORD_START the returned string
* is from the word start before the word start at or before the offset to 
* the word start at or before the offset.

and

* If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string
* is from the word end before the word end at or before the offset to the 
* word end at or before the offset.

to

* If the boundary_type is ATK_TEXT_BOUNDARY_WORD_END the returned string
* is from the word end before the word end before the offset to the 
* word end before the offset.

Comment 30 André Klapper 2011-03-04 12:08:45 UTC

Fernando: Time to update the patch according to Li's last comment?

Comment 31 Joanmarie Diggs (IRC: joanie) 2011-04-29 18:42:39 UTC

(In reply to comment #30)
> Fernando: Time to update the patch according to Li's last comment?

Fer, ping?

Comment 32 Li Yuan 2011-05-30 06:15:52 UTC

Review of attachment 173833 [details] [review]:

Committed.

Comment 33 Li Yuan 2011-05-30 06:15:58 UTC

Review of attachment 173833 [details] [review]:

Committed.