After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 783245 - Invalid unref at atk_gobject_accessible_object_gone_cb()
Invalid unref at atk_gobject_accessible_object_gone_cb()
Status: RESOLVED NOTABUG
Product: atk
Classification: Platform
Component: atk
2.22.x
Other Linux
: Normal critical
: ---
Assigned To: ATK maintainer(s)
ATK maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2017-05-30 17:21 UTC by Milan Crha
Modified: 2019-09-18 13:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
life-cycle of one such object (26.62 KB, text/plain)
2017-05-30 17:21 UTC, Milan Crha
  Details
proposed patch (551 bytes, patch)
2017-05-30 17:25 UTC, Milan Crha
none Details | Review

Description Milan Crha 2017-05-30 17:21:11 UTC
Created attachment 352894 [details]
life-cycle of one such object

There seems to be an invalid g_object_unref() call in atk_gobject_accessible_object_gone_cb(), causing crash in at-spi2-atk's expiry_func(), as shown in the below backtrace, due to dereferencing something already freed. The attached is a log showing life cycle with all ref/unref calls on it of one such object, down the the crash point (the line numbers do not match, due to added debug prints all around the code).

I think all the ref/unref calls are correct except the one in the atk_gobject_accessible_object_gone_cb().

The at-spi2-atk backtrace is (unfortunately no line numbers):

Thread 1 (Thread 0x7ff91b3dca80 (LWP 27590))

  • #0 g_type_check_instance_is_fundamentally_a
    from /lib64/libgobject-2.0.so.0
  • #1 g_object_weak_unref
    from /lib64/libgobject-2.0.so.0
  • #2 expiry_func
    from /lib64/libatk-bridge-2.0.so.0
  • #3 g_timeout_dispatch
    from /lib64/libglib-2.0.so.0
  • #4 g_main_context_dispatch
    from /lib64/libglib-2.0.so.0
  • #5 g_main_context_iterate.isra.21
    from /lib64/libglib-2.0.so.0
  • #6 g_main_loop_run
    from /lib64/libglib-2.0.so.0
  • #7 gtk_main
    from /lib64/libgtk-3.so.0
  • #8 main

Comment 1 Milan Crha 2017-05-30 17:25:03 UTC
Created attachment 352895 [details] [review]
proposed patch

This change seems to fix the crash, caused by ref/unref imbalance.
Comment 2 Rui Matos 2017-05-31 12:44:29 UTC
(In reply to Milan Crha from comment #0)
> I think all the ref/unref calls are correct except the one in the
> atk_gobject_accessible_object_gone_cb().

I'm not sure. Looking at the initial version of this code (from commit 273d1a64b904bc019c3c8c5cd4e7ad44210f795d) it seems to me that the original life cycle of AtkGObjectAccessible instances was to be completely tied to the main GObject, i.e. instances are created on demand in atk_gobject_accessible_for_object() and the unref() you're removing here was supposed to drop the last reference on the accessible when the main gobject was finalized.

Note how atk_gobject_accessible_for_object() is annotated with Returns: (transfer none), i.e. the accessible instance does *not* belong to the caller. IMO there is a layer above this code that is assuming it owns this reference.
Comment 3 Milan Crha 2017-05-31 14:02:34 UTC
That's very old commit, being done long before at-spi2-atk existed. When at-spi2-atk had been added, especially with its expiry_func(), the rules around this could change. As far as I can tell, this change corrects the crash and the last g_object_unref() in the expiry_func() is done with GObject::ref_count equal to 1, thus the object is properly freed as well (see the first attachment here).
Comment 4 Rui Matos 2017-05-31 16:13:24 UTC
(In reply to Milan Crha from comment #3)
> That's very old commit, being done long before at-spi2-atk existed. When
> at-spi2-atk had been added, especially with its expiry_func(), the rules
> around this could change.

Yes, maybe they did, but where's the evidence?

> As far as I can tell, this change corrects the
> crash and the last g_object_unref() in the expiry_func() is done with
> GObject::ref_count equal to 1, thus the object is properly freed as well
> (see the first attachment here).

Does this happen with any other AtkObject inherited types in evolution or just GalA11yETableColumnHeader ?

I noticed that other AtkObject inherited types in evolution provide a type specific AtkObjectFactory implementation but GalA11yETableColumnHeader doesn't. Could that be related?
Comment 5 Alejandro Piñeiro Iglesias (IRC: infapi00) 2017-06-01 06:46:23 UTC
(In reply to Rui Matos from comment #2)
> (In reply to Milan Crha from comment #0)
> > I think all the ref/unref calls are correct except the one in the
> > atk_gobject_accessible_object_gone_cb().
> 
> I'm not sure. Looking at the initial version of this code (from commit
> 273d1a64b904bc019c3c8c5cd4e7ad44210f795d) it seems to me that the original
> life cycle of AtkGObjectAccessible instances was to be completely tied to
> the main GObject, i.e. instances are created on demand in
> atk_gobject_accessible_for_object()

AFAIU, yes that was the idea.

> and the unref() you're removing here was
> supposed to drop the last reference on the accessible when the main gobject
> was finalized.



> Note how atk_gobject_accessible_for_object() is annotated with Returns:
> (transfer none), i.e. the accessible instance does *not* belong to the
> caller. IMO there is a layer above this code that is assuming it owns this
> reference.

Well, the accessible instance belongs to the base gobject. AFAIU, if at-spi2-atk ask for the accessible object, and store it somehow, it should increase the ref count.

We could try to ask Mike Gorse about that though.


(In reply to Rui Matos from comment #4)
> (In reply to Milan Crha from comment #3)
> > That's very old commit, being done long before at-spi2-atk existed. When
> > at-spi2-atk had been added, especially with its expiry_func(), the rules
> > around this could change.
> 
> Yes, maybe they did, but where's the evidence?

I doubt that the behaviour changed so much. CCing Mike Gorse just in case.

> > As far as I can tell, this change corrects the
> > crash and the last g_object_unref() in the expiry_func() is done with
> > GObject::ref_count equal to 1, thus the object is properly freed as well
> > (see the first attachment here).
> 
> Does this happen with any other AtkObject inherited types in evolution or
> just GalA11yETableColumnHeader ?
> 
> I noticed that other AtkObject inherited types in evolution provide a type
> specific AtkObjectFactory implementation but GalA11yETableColumnHeader
> doesn't. Could that be related?

I didn't follow the a11y implementation for Evolution at all, so I could be wrong, but one possible reason to some code using Atk factories and other not is that AtkObjectFactory usage has been discouraged. Initially it was created because at the time all the ATK implementations were modules (plugin), so it was handy. Now that it is not the case (on most cases), it is not needed burden. The only reason I didn't mark them as deprecated is that there are a lot of implementations using them, with no real plan to stop to use them.

It was even discussed if was needed or not before we stopped to use modules:
https://bugzilla.gnome.org/show_bug.cgi?id=647486

Note that on that old bug, we also discuss some complexities when using AtkGObjectAccessible.
Comment 6 Milan Crha 2017-06-01 07:26:26 UTC
I cannot speak about the factory, that code predates me and had not been seriously touched for years.

I see I forgot to give further explanation for the life-cycle attachment in comment #0. It shows where the offending object had been created and where each ref/unref happened to it. The "refs:X" means how many references the object had when the function had been called, that means "ref:1 with followed g_object_ref()" resulted in GObject::ref_count increased to 2 when the function finished. Similarly "refs:2 with followed g_object_unref()" resulted in GObject::ref_count decreased to 1 when the function finished.

For me, the first spi_leasing_take() is fine and does what you said it should, at-spi2-atk added its own reference to the object (thus it has ref_count=2 after the call):
obj:0x493a5a0 refs:1 bt:
	   at at g_object_ref() at gobject.c:3069
	   by spi_leasing_take() at accessible-leasing.c:250

That's confirmed with:
   spi_leasing_take: going to free object:0x493a5a0 (GalA11yETableColumnHeader) refs:2 with counter:1

The "counter" here means how many times the object is in the leasing queue.

After this impl_GetChildAtIndex() unrefs the reference.
obj:0x493a5a0 refs:2 bt:
	   at at g_object_unref() at gobject.c:3111
	   by impl_GetChildAtIndex() at accessible-adaptor.c:188

This is correct, because this is the place where the object had been created (see the very first backtrace). After this the object is kept alive in the spi leasing queue only, with only one reference - as expected.

After this begins the interesting part. Evolution has opened a composer window, while the main window is closed, thus freed. That makes evolution alive, but this header object is tight to the main window, thus it is also freed. It leads to call of
	   by atk_object_notify_state_change() at atkobject.c:1221
	   by atk_gobject_accessible_object_gone_cb() at atkgobjectaccessible.c:166
with only single reference, that one for spi_leasing queue.
This atk_object_notify_state_change() causes the leasing queue being invoked again, with:
   spi_leasing_take: adding an object 0x493a5a0 (GalA11yETableColumnHeader) which is already in the leasing queue; refs:2
obj:0x493a5a0 refs:2 bt:
	   at at g_object_ref() at gobject.c:3069
	   by spi_leasing_take() at accessible-leasing.c:250

and it adds the object to the leasing queue once again, even it is already there. Thus the counter also increased to two:
   spi_leasing_take: going to free object:0x493a5a0 (GalA11yETableColumnHeader) refs:3 with counter:2

When I skip all the boring ref/unref calls related to GObject processing in signals and so on, I get down to the end of the current atk_gobject_accessible_object_gone_cb() call:
obj:0x493a5a0 refs:2 bt:
	   at at g_object_unref() at gobject.c:3111
	   by atk_gobject_accessible_object_gone_cb() at atkgobjectaccessible.c:167

The current state of the spi leasing queue is that the object is there twice and the leasing queue added two references to it on its own. The above call shows that the unref() had been called with object's ref_count being 2, thus after the call the ref_count will be 1. That's wrong, that reference belongs to the leasing queue.

When the expiry timeout comes, the spi leasing queue makes the first unref of the object:
   expiry_func: going to unref:0x493a5a0 with counter in leasing hash:1
obj:0x493a5a0 refs:1 bt:
	   at at g_object_unref() at gobject.c:3111
	   by expiry_func() at accessible-leasing.c:143

which also frees the object:

gal_a11y_e_table_column_header_dispose: 0x493a5a0
      atk_gobject_accessible_dispose: atk-obj:0x493a5a0 acc-obj:(nil)
acc_leasing_object_freed_cb: obj:0x493a5a0
gal_a11y_e_table_column_header_finalize: 0x493a5a0

but the spi leasing queue still has one "instance" (now freed) of the object in itself, thus it results in the use-after-free issue:

   expiry_func: going to unref:0x493a5a0 with counter in leasing hash:0

(evolution:31409): GLib-GObject-CRITICAL **: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

----------------------------------------------------------------------------

The theory is great, but the life-cycle log shows what the code does in reality. And it shows it in detail. I would not be able to describe what I see in any good way, thus I decided to share the log with you. My fault I didn't describe it this way earlier, I'm sorry for that.

I hope you better understand my point of view and the reasoning for the patch now.
Comment 7 Milan Crha 2019-09-17 16:32:22 UTC
I do not know what yet, but there's something fishy, most likely in Evolution (to be re-investigated). The proposed patch leaks the AtkObject in other applications (for instance in gnome-shell), thus it's not correct, which means my initial conclusion was wrong and Rui was right (comment #2, about the unref being there to actually free the AtkObject).
Comment 8 Milan Crha 2019-09-17 16:53:31 UTC
Maybe the implementation of eti_ref_child() at gal-a11y-e-table-item.c:332 is wrong. This function implements AtkObjectClass::ref_child, which should return a new reference to an existing object, but it returns a newly created object. That might be the missing reference in the log.

I would move this to evolution, but it's not possible, thus I'm at least closing it here, because it's not a bug on the atk side. I'm sorry for the false conclusions I made.
Comment 9 Milan Crha 2019-09-17 17:07:32 UTC
For the reference, the Evolution bug is:
https://gitlab.gnome.org/GNOME/evolution/issues/624
Comment 10 Alejandro Piñeiro Iglesias (IRC: infapi00) 2019-09-18 08:38:25 UTC
(In reply to Milan Crha from comment #7)
> I do not know what yet, but there's something fishy, most likely in
> Evolution (to be re-investigated). The proposed patch leaks the AtkObject in
> other applications (for instance in gnome-shell), thus it's not correct,
> which means my initial conclusion was wrong and Rui was right (comment #2,
> about the unref being there to actually free the AtkObject).

Yes, although I didn't check it with Evolution, we need to take into account that that code is used on other applications, so I find strange that no other is needing that change.

Having said so, now that you mentions gnome-shell, I made some git research as I remembered some gobjectaccessible specific handling, and I found this:

https://bugzilla.gnome.org/show_bug.cgi?id=738147

that was finally pushed as this commit:
https://gitlab.gnome.org/GNOME/gnome-shell/commit/5e5035a0f70e75e54d74dee346f877a65f2713d3

I didn't take a deep look on it, so I'm not saying that your problem can be fixed doing something like that, but perhaps that commit gives an idea here?

> I would move this to evolution, but it's not possible, thus I'm at least closing it here, because it's not a bug on the atk side. I'm sorry for the false conclusions I made.

No problem, atkgobjectaccessible ref handling is somewhat complex, and too much a black box unless you take a look to the code. In fact I have around a TODO about that class needing a update, not done to not break API or behaviour for existing implementation.

Thanks to being taking a look to that Evolution bug.

> For the reference, the Evolution bug is:
https://gitlab.gnome.org/GNOME/evolution/issues/624

FWIW, if in the end you find that the bug is on ATK. Would you mind to create a gitlab issue instead of re-opening this bug? These days I'm more on top of gitlab that bugzilla.
Comment 11 Milan Crha 2019-09-18 13:40:47 UTC
(In reply to Alejandro Piñeiro Iglesias (IRC: infapi00) from comment #10)
> FWIW, if in the end you find that the bug is on ATK. Would you mind to
> create a gitlab issue instead of re-opening this bug? These days I'm more on
> top of gitlab that bugzilla.

I already fixed it on the Evolution side. My initial misunderstanding caused the confusion. It was not obvious (to me) that AtkObjectClass::ref_child() really should return a new reference to an existing object, not a new object on its own, especially when it's a descendant of the AtkGObjectAccessible object. With that fixed (on the Evolution side) things work as expected. I even found some other issues when doing proper fix for this, which I fixed as well.

I'm also more on the gitlab, keeping bugzilla bugs for reference and as a longstanding ToDo pile.