GNOME Bugzilla – Bug 561885
Glib::Source - Want simpler ref counting and deletion logic, at least at ABI break
Last modified: 2016-12-03 09:33:44 UTC
Please describe the problem: I posted first bug 561884, but digging more I got into what seems to me, bigger other problems. Looking at the code of Glib::RefPtr and Glib::Source::~Source I think I saw something problematic so I ran the same simple C++ program under 'valgrind'. As expected, memory access problems (accessing the same C++ object which was already freed) Checking into the code of both Glib::Source and C GSource I think there is a misunderstanding over the 'destroy' notification from C GSource. It seems that the 'destroy' notification has nothing to do w/ reference counting of the GSource itself but with the fact that the 'attachment' to a context is actually destroyed. So, in Glib::Source there shouldn't be a correlation when the notify/destroy callback is called and the destroying of the C++ Glib::Source object itself. I think that Glib::Source object destroying should be done completely different (seems that it might not be a derivation of g_object (why!?) but you still have a 'finalize' callback to use when ref counting goes to 0. Steps to reproduce: run under valgrind, the following c++ program (see in it how to compile it), once as is, and once after changing the "#if 0": #include <glibmm.h> // g++ -g `pkg-config --cflags glibmm-2.4` `pkg-config --libs glibmm-2.4` t.cpp // bug 1: 561884 static bool on_t1(Glib::RefPtr<Glib::MainLoop> l) { printf("timer 1\n"); l->quit(); return FALSE; } int main(int argc, char *argv[]) { Glib::init(); Glib::RefPtr<Glib::MainLoop> loop(Glib::MainLoop::create()); Glib::RefPtr<Glib::TimeoutSource> t1; t1 = Glib::TimeoutSource::create(1000); t1->connect(sigc::bind(sigc::ptr_fun(on_t1), loop)); t1->attach(); #if 0 t1->reference();t1->reference();t1->reference(); #endif loop->run(); printf("ending loop (timer 1=%p)\n", t1->gobj()); t1.clear(); return 0; } Actual results: valgrind: memory access to already freed data Expected results: no memory access problems. no immediate destroy of the Glib::TimeoutSource in case it has enough of a big ref count. Does this happen every time? yes Other information: no patch this time :-/
Oops, correction to indroduction part: I came to see these problems after reading source of Glib::RefPtr and Glib::Source::destroy_notify_callback - instead of destructor of Source.
> Checking into the code of both Glib::Source and C GSource I think there is a > misunderstanding over the 'destroy' notification from C GSource. Do you mean the GDestroyNotify notify callback given to the g_source_set_callback() function? http://library.gnome.org/devel/glib/unstable/glib-The-Main-Event-Loop.html#g-source-set-callback
Yes. Other comments: - why the C GSource isn't derived also from GObject? (I'm not waiting an answer to this though :-/ ) - since it looks like GSource doesn't have a QData data member, I'm wondering how/if you can wrap the same C GSource with the same C++ GSource object instance... (there's no such code around the wrapping C++ GSource constructor to do similar things as in Glib::ObjectBase) If you can't do that, that you won't be able to use Glib::RefPtr<GSource> comparisons - unless you change Glib::RefPtr to compare the actual C GObject (pCppObject_->gobj()) instead of the C++ instances.
For what it's worth, I spent most of the evening looking at this bug and I'm not really much closer to a solution. Ideally, we would simply be able to use the 'finalize' vfunc, but GSource doesnt' allow us to store arbitrary data (i.e. a pointer to the wrapper) in it, so we are not able to determine the proper wrapper object at that point. So I guess the destroy_notify callback was used as a workaround for these deficiencies, but it does certainly seem to have problems. One potential idea I thought about was passing a slightly larger struct size to g_source_new() and then storing a pointer to the wrapper in the larger struct, e.g.: struct GSourceWithWrapper { GSource source; Glib::Source* wrapper; }; however, this only helps when we're creating a new Glib::Source from glibmm. it doesn't help us at all if we're creating a wrapper object for an existing GSource* object. So I'm out of ideas for now.
OK, so I do have one other idea -- keep our own map of GSource*-to-wrapper so that we can use the finalize vfunc and look up a C object's wrapper object when finalize gets called. This wouldn't be great for performance since we'd probably have to protect the map with a mutex to make it thread-safe, but I don't imagine that creating and destroying GSource objects is going to be on the critical path for performance in any applications...
*** Bug 539186 has been marked as a duplicate of this bug. ***
Add danielk to cc on this bug since we had a discussion about this on IRC a little while back.
Created attachment 149450 [details] [review] fix lifetime mgmt of Glib::Source This patch is a proposed fix, based on IRC conversations with Daniel & Owen. It makes it impossible to wrap an existing GSource as a Glib::Source, but the gain is correct lifetime management of Glib::Source at very little cost. I suspect that it may not be complete, but any missing pieces will hopefully be trivial. I will attach the header file patch and a small test source file immediately.
Created attachment 149451 [details] [review] header file patch
Created attachment 149458 [details] slightly smaller test case the patch was tested on the source code included in the existing bug report, and on this small test case, which has the interesting wrinkle that we quit a MainLoop from within the handler for an IOSource.
Could you just provide one patch, please. It could add the test case to the tests in tests/. Please use the @deprecated doxygen keyword and write a ChangeLog entry. It looks like we have missed the opportunity to get this in this glibmm version. Sorry.
Paul?
Please?
Murray, for some reason, I haven't seen any emails arising from your requests in this report, so apologies. I'll try to follow through on the request ASAP.
Created attachment 196372 [details] [review] 0001-Glib-Source-Correct-the-memory-mangement.patch Here is an actual git patch with Paul's changes, with some cosmetic changes by me. I have attempted to explain the changes in the ChangeLog/commit-message, but I could not explain all of them. It would be great if someone else reviewed this patch. It does not add the test case, because I don't know what the test case should do, or not do. It does not stop either with or without this patch, so we can't just add it like the existing tests.
I noticed that this patch triggers the abort in Glom, because Glib::IOSource's constructor uses that Glib::Source constructor, in glib/glibmm/main.c: IOSource::IOSource(const Glib::RefPtr<IOChannel>& channel, IOCondition condition) : Source(g_io_create_watch(channel->gobj(), (GIOCondition) condition), (GSourceFunc) &glibmm_iosource_callback) {} 0 0x00110832 in ?? () from /lib/ld-linux.so.2
+ Trace 228469
I've not looked at the patches in this bug. I've only been recently considering the possibility of wrapping an existing C object and making sure that the wrapper is freed when the C object is finalized. One way is to add weak referencing to GSource as GStreamer did recently with their GstMiniObject type: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/gstreamer-GstMiniObject.html#gst-mini-object-weak-ref I've been looking into this and it would not be possible to add any fields in any of the existing GSource structs to store the weak reference callbacks with the given data because that would break ABI (I think). GSource already has a _GSourcePrivate struct so what I was thinking (I might be able to provide a patch) is to create a new "_GSourceBiggerPrivate" struct that can store what the original private struct stores and the weak references (either by having a member _GSourcePrivate struct as it first member so it can be cast or by including the two fields of the original private struct as the first two members in the new struct). The code in GSource that creates, accesses and frees the private struct would have to be changed so that it uses the new private struct. Also weak referencing code would have to be added. If this solution is feasible it may be a good idea to consider removing the recently added Pollable[Input|Output]Stream and Converter[Input|Output]Stream interfaces and classes because the interfaces have unwrapped virtual functions that return GSources and the classes derive from the interfaces. Adding the virtual functions would break ABI later. Maybe they can be added for 2.36.
*** Bug 694778 has been marked as a duplicate of this bug. ***
How about you just have the C++ object keep it's own reference counter and have the destroy function g_object_unref() the C object that many times before setting the pointer to NULL, then have the C++ object's unreference() method skip calling g_object_unref() when it is NULL?
Actually, there's no reason to pass the references straight through to the C object in the first place. The C++ object just needs its own reference count, period, and it can just maintain a single reference to the C object that can be released in the destroy method.
(In reply to comments 20 and 21) Adding a ref counter in Glib::Source would break ABI. That's allowed only in a new major release (glibmm 3.0). The present situation: The C++ object, Glib::Source, is deleted in Source:: destroy_notify_callback(). That's too early. The deleted object is accessed in Source::unreference(). The C object, GSource is not deleted. We have a memory leak. Solutions in comments 20 and 21: With some care it would be possible to have a solution that avoids both memory leaks and accesses to deallocated memory. But I'm not fond of double reference counters. And as I said, it can't be implemented until glibmm 3.0, whenever that will be. Bug 539186 comment 3 summarizes the problem with the destroy functions. The GSource object is _not_ being destroyed when Source:: destroy_notify_callback() is called, it's being detached from its GMainContext. (In reply to comment 18) Has anyone filed a glib bug, requesting some addition to GSource? I can't find any such bug. I suppose GSource is a problematic class for all language bindings. There's no way to store the address of a wrapper object such that it survives until the GSource object is finalized. I can see only two ABI/API-preserving solutions. 1. Introduce our own GSource*-to-wrapper* map, as proposed on comment 5. 2. Ask for added functionality in GSource. A weak reference, as proposed in comment 18 is one possibility. The patch in comment 16 avoids breaking ABI, but doesn't it break API? One of the constructors does not construct sensible objects any more. There's an alternative to a weak reference in GSource that would be very easy to implement, if the glib developers accept it: Add an element, user_data, in struct _GSourcePrivate: struct _GSourcePrivate { ... void* user_data; }; This struct is declared in gmain.c. It would not be an API/ABI break to change it. Also add void* g_source_get_user_data(const GSource* source) { return source->priv->user_data; } void g_source_set_user_data(GSource* source, void* user_data) { source->priv->user_data = user_data; } With this change of GSource glibmm_source_get_callback_data() can be simplified.
What? It wouldn't break ABI since it already gives the appearance of having a refcount, it just implements it by piggy backing on to of the C object refcount. All that is needed is to add a member variable to the class and change the reference/unreference implementation and that doesn't break ABI. Then you change Source::destroy_notify_callback to call unreference() instead of delete self. That way if the caller is still holding the RefPtr, the delete will be delayed until the RefPtr is released, the way it is supposed to.
Have I misunderstood? Don't you want to add a data member to Glib::Source (int ref_count or something like that)? Adding a data member or a virtual function breaks the ABI. (ABI = application binary interface, not API) Even if it would be possible to add another ref counter without breaking the rules, I don't like the idea. At least we should first ask to have GSource changed in such a way that it's possible to make better language bindings. I hope for a reply from someone who has commented on this bug before I became interested: Has anyone filed a glib bug, requesting a change in GSource? I find it surprising if no one has, but I can't find such a bug, neither open nor closed.
You can add a new public data member at the end without breaking the ABI. Heck, if your private data members are all after the public ones, you can add or remove private members without breaking ABI. It seems to me there's nothing wrong with GSource, but the problem is that the C++ wrapper tried to take a shortcut in its implementation based on a misunderstanding of the way GSource works. Specifically it assumed that it did not need to keep its own reference counter because it thought that the destroy notify callback would only be called after the GSource ref count dropped to zero, and this is not the case.
In reviewing this bug and the related C and C++ API (not thoroughly though), another practical solution that could be used to address this bug presently is to have the C API call the GSource's finalize function (when it is time for its finalization) with an additional 'gpointer user_data' parameter passing in the data stored in the GSource's callback data field (that was set with a call to g_source_set_callback()). Glib::Source could then have a finalize_vfunc() function accepting the data, extracting the wrapper from it and then correctly freeing it. The following two patches (one for glib and the other for glibmm) show how it would work. The weak referencing idea was a quick idea that came to me as I quickly thought about how to ensure that wrappers are correctly deleted, but in reviewing the bug and the C/C++ API a little more, it seems like it would be supplying functionality that can already be achieved with the finalization notification. I admit, in reviewing the patches they may look a little hackish but they do work. I don't really want to spend more time on this bug, though, unless we really need a fix and we have to do it right away. That's why I've been slow to respond. However, if these patches seem good and they need to be made better for filing a bug against glib, I can fix them accordingly.
Created attachment 238273 [details] [review] glib patch calling the GSource's finalize vfunc with the additional user_data parameter
Created attachment 238274 [details] [review] glibmm patch implementing the finalize vfunc to correctly free the wrapper
(In reply to comment #26) Thanks for the patches, but I didn't really ask for patches. I just wanted to know if my suspicion is right that no one has asked for a change in glib. This bug was filed more than 4 years ago, and several persons have commented on it. > I admit, in reviewing the patches they may look a little hackish but they do > work. I agree they look hackish. Glib::Source::destroy_notify_callback() writes to GSource::callback_data, which is marked <private>. It does not write the value that was stored there before destroy_notify_callback() was called. g_source_set_callback() stores a GSourceCallback* in GSource::callback_data, and the pointer to the Glib::SourceCallbackData is stored in GSourceCallback::data. If I were a glib developer, I'd be very reluctant to accept your glib patch. From the glib point of view it looks as if the finalize function shall include a new parameter which is either 0 (if the destroy notify function has been called) or a pointer to a GSourceCallback instance, a class defined in gmain.c (if the destroy notify function has not been called before the call to finalize). Looking at g_source_unref_internal() I get the impression that SourceCallbackData::destroy_notify_callback() can be called either before or _after_ the finalize function (the presently nonexistent Source::finalize_vfunc()). If I'm right, your patch will not work in all situations. I also wonder if it's an API break to add a new parameter to the finalize function. I don't really know, I'm just wondering. I will probably file a glib bug suggesting a new user_data element, as described in comment 22. (In reply to comment #25) > You can add a new public data member at the end without breaking the ABI. > Heck, if your private data members are all after the public ones, you can add > or remove private members without breaking ABI. I'm confused and sceptical. If you add a data member to a class, even it is added after all other data members, won't the offsets of all data members declared in subclasses be changed? You might reply that in this case it doesn't matter, because the only subclasses of Source with data members are TimeoutSource and IOSource. Those data members are private, and the classes contain no inline methods. All accesses to data members with changed offsets will be from the glibmm library, and they will be updated when the library is rebuilt. But can you be sure that no application program defines its own subclass with data members? That's possible, and (I suppose) permitted.
I was just writing the following comment to the solution I was proposing when there was a collision with the most recent comment (above I guess). Here's what I was writing: Actually, I think that having the C API provide functions such as g_source_set_bindings_data() and g_source_get_bindings_data() (as was sort of described in comment 22 by Kjell) would probably make the most easy and clear solution. In other words (similar to what was described and assuming, of course, that there's no ABI break by the code): struct _GSourcePrivate { ... void* bindings_data; }; void* g_source_get_bindings_data(const GSource* source) { return source->priv->bindings_data; } void g_source_set_bindings_data(GSource* source, void* data) { source->priv->bindings_data = data; } The GSource internals make it difficult to keep the user data set with a call to g_source_set_callback() accessible until the finalization process takes place. The g_source_destroy_internal() function (which is called by g_source_destroy()) NULLs out the user data before finalization takes place, thus making difficult for the data to be accessed then: static void g_source_destroy_internal (GSource *source, GMainContext *context, gboolean have_lock) { ... if (!SOURCE_DESTROYED (source)) { ... source->callback_data = NULL; source->callback_funcs = NULL; ... } ... } I thought that removing the NULLing lines would make it possible to keep the data until finalization but that causes unpredictable memory access problems causing the most simple programs (that somehow rely on glib?) to crash. Even a program such as kdbg (which I wouldn't think uses glib) crashes. If the data were accessible at finalization, it would not be necessary to use a finalize_vfunc() with an additional 'gpointer user_data' parameter because the data would be accessible through the GSource. Thus the C API's method of invoking the finalize function would not have to change for the accommodation of bindings. But that does not look easily achievable. The weak referencing is still a possibility but easier than that, I think, are the bindings data functions already described. As far as having the wrapper keep its own reference count, I still remain dubious that adding a new 'ref_count' member to Glib::Source would not be an ABI break because other experienced glibmm developers have never mentioned such a possibility as adding new data members to classes under any circumstances without breaking ABI when trying to fix things that could easily be fixed that way.
*kicks gnome bugzilla for refusing to accept replies via email* On 3/8/2013 10:49 AM, glibmm (bugzilla.gnome.org) wrote: > I also wonder if it's an API break to add a new parameter to the finalize > function. I don't really know, I'm just wondering. Yes, that's definitely an API break, unless you can stash a flag so you know whether you should call the finalize function with, or without an argument. > I'm confused and sceptical. If you add a data member to a class, even it is > added after all other data members, won't the offsets of all data members > declared in subclasses be changed? Ugh, yea... any classes derived from this one would break. I'm really starting to get annoyed at C++. The only other way I can see to fix this is a little hack I recall being used frequently in Windows 3.x. You allocate a few bytes of writable, executable memory, and create a binder there. You write some code there that pushes the address of the glibmm::Source onto the stack ( or shoves it into the correct register ) then jumps to Source::finalize_func, and you pass the address of the binder to GSource to invoke on finalization.
(In reply to comment #31) > *kicks gnome bugzilla for refusing to accept replies via email* > > On 3/8/2013 10:49 AM, glibmm (bugzilla.gnome.org) wrote: > > I also wonder if it's an API break to add a new parameter to the finalize > > function. I don't really know, I'm just wondering. > > Yes, that's definitely an API break, unless you can stash a flag so you > know whether you should call the finalize function with, or without an > argument. As commented, if it were possible to keep the callback user data until finalization the call to the finalize function would not have to be changed. It would be an API break if the call required that the second argument actually be declared by the callee. But I think it's possible to not have the callees declare the new argument and still call the function that way. I tested that and there were no problems. (I think it has to do with how the call stack, heap, etc. are configured when calling C functions.) Suffice it to say though that I still think that the bindings data solution seems the easiest and quickest to solve this issue.
For the sake of completeness, I wanted to see if the finalization method would be a feasible solution and I think I found that it could be. The g_source_destroy_internal() function not only NULLs out the callback data and the callback functions, but it also unreferences the callback data: static void g_source_destroy_internal (GSource *source, GMainContext *context, gboolean have_lock) { ... if (!SOURCE_DESTROYED (source)) { ... source->callback_data = NULL; source->callback_funcs = NULL; if (old_cb_funcs) { UNLOCK_CONTEXT (context); old_cb_funcs->unref (old_cb_data); LOCK_CONTEXT (context); } ... } ... } Since this process already takes place in the g_source_unref_internal() function which g_source_destroy_internal() calls, the NULLing out and unreffing can (and should) be removed from g_source_destroy_internal(). The same NULLing and unreffing lines in the g_source_unref_internal() function should be moved after finalization so that the data can be accessed during finalization. Once that's done, glibmm's Source can include a finalize_vfunc() that correctly frees the wrappers. The following two updated patches show what the initial changes would be like and verifies that the concept works. I know that I said that adding setter/getter functions for bindings data would easier and clearer, but I wanted to fully explore the finalization idea.
Created attachment 238554 [details] [review] glib: Allow bindings to correctly free their GSource wrappers.
Created attachment 238555 [details] [review] Glib::Source: Partially correct the deletion of the wrappers.
I came to the opposite conclusion when I studied Glib::Source once more: We must not depend on a finalize callback function in Source::vfunc_table_. If the Glib::Source instance has been constructed by the default constructor, it would be possible to depend on such a finalize callback, after some change of GSource in glib. The patch in comment 16 shows that it's possible even without changing GSource. Unfortunately there is also the constructor Source::Source(GSource* cast_item, GSourceFunc callback_func). When that constructor is used, we can't set our own GSourceFuncs callbacks (prepare, check, dispatch, finalize). Comment 17 shows that at least one application uses this constructor. I now intend to propose an added qdata API in GSource, similar to g_object_get_qdata(), g_object_set_qdata(), g_object_set_qdata_full(), g_object_steal_qdata(). That's what glibmm uses for classes derived from GObject. Weak references would also be acceptable, but a qdata API is better.
Created attachment 238581 [details] [review] patch: Glib::Source: Fix the destruction and deletion. I have filed glib bug 695631. This is a glibmm patch that can be committed if the glib patch is accepted. In the glib bug I suggest a qdata API, like g_object_set_qdata_full() and friends. I also suggest a different qdata API and weak references as alternatives. I prefer one of the qdata APIs.
So what was wrong with using a weak reference again? That should not interfere with the other constructor.
There's nothing wrong with weak references. A weak reference can fix the problem discussed in this bug. That's why I've mentioned it in the glib bug as an alternative to the qdata API, in case the glib developers prefer it. Copied from the glib bug 695631: Weak references, like g_object_weak_ref() and g_object_weak_unref(), would be acceptable as an alternative, but a qdata API is better. Data stored with g_object_set_qdata() or g_object_set_qdata_full() is available in all callback functions, not only when the object is being finalized. In the patch in comment 37, I've used g_source_get_qdata() in Source:: get_wrapper(). It's not necessary, but it's a tiny simplification. Do weak references have an advantage over the qdata API, an advantage that I'm not aware of?
(In reply to comment #39) > Do weak references have an advantage over the qdata API, an advantage that I'm > not aware of? Probably none. Just two ways of addressing this issue, I think.
(In reply to comment #36) > I came to the opposite conclusion when I studied Glib::Source once more: > We must not depend on a finalize callback function in Source::vfunc_table_. > If the Glib::Source instance has been constructed by the default constructor, > it would be possible to depend on such a finalize callback, after some change > of GSource in glib. The patch in comment 16 shows that it's possible even > without changing GSource. > > Unfortunately there is also the constructor > Source::Source(GSource* cast_item, GSourceFunc callback_func). > When that constructor is used, we can't set our own GSourceFuncs callbacks > (prepare, check, dispatch, finalize). Comment 17 shows that at least one > application uses this constructor. One way to work around this is that if a finalize function of the cast_item's GSourceFuncs exists, store it in a new member variable of the SourceCallbackData struct replacing the function with our custom finalize_vfunc(). Our finalize_vfunc(), once called, can check for the existence of the replaced finalize vfunc, call it, complete the destruction process and return. This would require almost no change to the C API except the correction of the deletion of the callback data from when destruction of the source takes place to when the last reference of it is lost (which I think would be the proper place do that). Of course, it requires more work on our part and it's probably not as easy to understand as using the additions requested in the glib bug, but I think it would work.
Let's wait for a while and see if we get a reaction to bug 695631. There are alternatives that don't require a qdata API or weak references in GSource, but they all seem far-fetched and ugly. One of the goals of glib and its relatives (gtk+, etc.) is that it shall be easy to make language bindings. That goal has been met for most of glib, but there are a few difficult classes. GSource is one difficult-to-wrap class. With just a little bit of help from glib, it will be _easy_ to wrap GSource without causing memory leaks or illegal memory accesses.
Created attachment 239348 [details] [review] patch: Glib::Source: Fix the destruction and deletion. The glib bug 695631 has been closed with WONTFIX. We must find a solution without modification of glib. The patch shows my suggested solution. It's a variation of Phillip's suggestion in comment 21. Glibmm keeps its own reference counter, and unreferences the GSource instance just once, when glibmm's ref count has reached 0. Glib::IOChannel uses this strategy in some situations. As discussed previously we can't add a ref_count member to Glib::Source now, because it would break ABI. It complicates the solution. I've added a std::map<const Glib::Source*, ExtraSourceData> that stores the ref_count. If you think that the mutex, protecting the map, slows down execution too much, you should know that g_source_ref() and g_source_unref() also use a mutex. If you think that searching the map slows down execution too much, you may be right if there are 1000s of sources.
I'd prefer to avoid using a map all the time if it isn't too difficult. Is there a reason you don't think the weak ref idea is better?
I think both weak references and the qdata API I proposed are better than a ref_count in a map. But both solutions require changes in gmain.c, and the glib developers don't want it changed. It's true I only mentioned weak references in passing in bug 695631. I'm not sure the glib developers would oppose it as much as they oppose the patches I and José proposed. Both José and I have spent much more time on this bug than it's worth. I'm not going to push more for a change in gmain.c. You're welcome to continue the discussion yourself, for example by reopening bug 695631 and discussing the matter there. I'd be glad if you manage to get either weak references or a qdata API into GSource.
Huh? Neither the ref_count in a map, nor weak references require any changes to glib.
Have I overlooked something? The only glib functions I have found, dealing with weak references, are g_object_weak_* and g_weak_ref_*. They all work only with a GObject. A GSource is not a GObject. g_weak_ref_* don't even offer a chance to register a callback function, which is what we want. What else is there? How can we set a weak reference on a GSource instance, and have a callback function called when the instance is finalize? Or do you have anything else in mind?
What the hell? I thought everything in glib was supposed to be derived from GObject?! Well, the refcount in a map at least won't require any changes to glib.
José, do you plan an addition to this bug? You say in bug 695631 comment 12 that you see an easy way to fix this bug. Do you see a better way than my patch in comment 43? If not, I suggest I push that patch. Why don't we use the pimpl idiom for private data in glibmm classes? (pimpl = pointer to implementation, http://en.wikipedia.org/wiki/Pimpl) Then we could add private data without breaking ABI. (Unfortunately it's not possible to misuse Glib::Source::gobject_ as a general pimpl pointer without breaking ABI, because Glib::Source::gobj() is inlined.)
Created attachment 240459 [details] [review] Glib::Source: Preliminary patch to correct the wrapper destruction. I did have a slightly different idea. It too has to do with referencing/unreferencing but I did not think of keeping a separate reference count. What I was thinking is to sort of take note of when the GSource is about to lose its last references and then delete the wrapper then. This patch exemplifies the idea more or less. The patch is not complete because the case for when an existing GSource is wrapped is not included but I believe that the case that is included (the one when the Glib::Source is created) does work. The places that I modified to take note of when the GSource is about to lose its last references and then free the wrapper were in the Glib::Source::unreference() method and in the Glib::Source::dispatch_vfunc(). In the Glib::Source::unreference() method, if there is a final reference, the wrapper is destroyed. As far as the Glib::Source::dispatch_vfunc(), it is called from g_main_dispatch(): static void g_main_dispatch (GMainContext *context) { ... for (i = 0; i < context->pending_dispatches->len; i++) { GSource *source = context->pending_dispatches->pdata[i]; ... if (!SOURCE_DESTROYED (source)) { ... gboolean (*dispatch) (GSource *, GSourceFunc, gpointer); ... dispatch = source->source_funcs->dispatch; ... need_destroy = ! dispatch (source, callback, user_data); ... if (need_destroy && !SOURCE_DESTROYED (source)) { g_assert (source->context == context); g_source_destroy_internal (source, context, TRUE); } } SOURCE_UNREF (source, context); } ... } When the Glib::Source::dispatch_vfunc() returns false, this signals that the GSource should be destroyed. If that is the case it will lose one reference count from the call to g_source_destroy_internal() and then one other reference from the SOURCE_UNREF() macro call in g_main_dispatch(). Thus before returning, the Glib::Source::dispatch_vfunc() checks for these conditions (whether the GSource will be destroyed and if so whether it has two final references). If that is the case the wrapper is also freed. As far as dealing with the wrapping of an existing GSource, that would be a little more complicated. I think it would be necessary to do something like what I described in comment 41 with respect to the finalize_vfunc(). I've not explored this completely. I stopped because I thought it was more complicated than what an easy solution might be. However, of course, if it's necessary and it works I can complete the patch. As far as pushing your patch, I'm not sure that would be a good idea. I modified the timeout example in gtkmm-documentation with the following patch and it crashed. I'm not sure if I did something wrong.
Created attachment 240460 [details] [review] gtkmm-documentation: timeout: Modified to test Gllib::Source.
Created attachment 240493 [details] [review] patch: Glib::Source: Fix the destruction and deletion. Many thanks for detecting the bug in my patch in comment 43. I had not realized that SourceCallbackData::destroy_notify_callback() can be called _after_ all the RefPtr<Source> refs are gone. The C++ wrapper must be kept until both all RefPtr<Source> have been deleted _and_ SourceCallbackData::destroy_notify_callback() has been called. This new patch fixes that problem. Your patch in comment 50 does not cause crashes, as far as I can see, but it can cause memory leaks. There are situations when the C++ wrapper is not deleted. Run your modified gtkmm-documentation/book/timeout program. Manually remove a running timer and/or quit the program while a timer is running. The timer's wrapper is not deleted. The timer is not stopped by result == 0 in Source:: dispatch_vfunc() but by the call to g_source_destroy() in SourceConnectionNode::notify() when the slot is disconnected. (In reply to comment #50) > As far as dealing with the wrapping of an existing GSource, that would be a > little more complicated. Your fix works also in this case, provided there is only one reference to the existing GSource when it's wrapped, and no extra refs are created later except from Source::reference(). E.g. the call to Source::Source(GSource*, GSourceFunc) from IOSource::IOSource(const Glib::RefPtr<IOChannel>&, IOCondition) would be ok. I don't think we need to bother with other cases. Which solution shall we choose? The main drawback with my solution is the std::map that stores the extra ref counter. It can easily be removed when we can break ABI. If we could do that, I would recommend my solution. Now it's more or less a tie. Your solution needs some more fixing to avoid the memory leak. It's a drawback that you must know how many refs GSource holds internally when Source:: dispatch_vfunc() is called. I suppose the glib developers can change that without notice. Possible, but not probable.
Of course, I posted the patch to exemplify what I was thinking of. It was an initial proof of concept. Yes, there would be other things that would have to be fixed. I pretty much knew that; I just wanted to have what I was thinking of posted so it could be seen more or less how it would work. When I noticed the crash with your patch I thought there could be a problem with keeping a separate reference count (I really never liked that idea). I also don't like the addition of the std::map<> because if there were many sources (though I couldn't conceive why right now) it could cause execution delays when the sources are referenced/unreferenced. From the discussion in the glib bug, there may in fact be a much simpler solution than both of the ones we're discussing. That's why I had not posted my idea. I still think we should wait a little and see if that other solution mentioned in the glib bug exists and would in fact fix the problem.
Ok, I'll wait until you present a better solution, or say you give up. I doubt that there is a good solution that does not require changes in GSource. (In reply to comment #52) >> As far as dealing with the wrapping of an existing GSource, that would be a >> little more complicated. > > Your fix works also in this case, ... I was wrong again. I forgot that Source::dispatch_vfunc() is not called for those GSource instances.
If it is I that has to provide a better solution, I doubt that I will be able to come up with something better than either the most recent solution I have proposed or the ones that have already been proposed before. I don't vehemently object to the separate reference count idea, I just didn't like the idea much from the beginning, but that's just preference, nothing more. However, even if I liked the idea and thought that the patch could be pushed, it is not up to me to decide that, the glibmm maintainers decide that, in which case I think you would still have to wait for their approval. Personally, I don't mind which solution is used. I myself have just provided possible solutions as well which the glibmm maintainers can approve or not themselves also. I'm not bothered either way. Again, I think waiting for the glibmm maintainers' decision would be best. If you feel you don't need their approval and want to push the patch yourself then you feel free to do that if you like. It won't bother me at all.
> the glibmm maintainers decide that, I trust your and Kjell's judgement, particularly on this. I have no plan to wade through this and choose between competing solutions. Thanks for all the hard work as usual.
In that case then let's wait a little, consider this a little more and see what would best before rushing to commit anything. We may be able to improve on the ideas presented before a final commit.
Created attachment 240893 [details] [review] patch: Glib::Source: Fix the destruction and deletion. My patch in comment 52 has a flaw. I assumed that the destroy notify function is unconditionally called when g_source_destroy() is called on a source that is not already destroyed. That is not true. If the source has not been attached to a main context, the destroy notify function is not called until the source is finalized. This new patch does not make such an assumption. My patch is ugly, but it's the best I can do. It's ridiculously difficult to avoid both memory leaks and accesses to deallocated memory without modifying GSource. With a suitable modification of GSource, it would be easy. I gave up too easily when bug 695631 was closed. (In reply to comment #55) > If it is I that has to provide a better solution, I doubt that I will be able > to come up with something better than ... I misunderstood bug 695631 comment 12. I thought it meant that you had a great idea how to easily fix this bug without modifying GSource. Hope you don't feel forced to do more on this. We've both done more than what's reasonable.
The change in glib was not accepted because they were told that it could be done in glibmm instead. If that's no longer true, you can try asking for the change in glib again, explaining the new situation.
It's not impossible to do it in glibmm alone, but it would be easier and cleaner if glibmm could register a callback function with user_data (a pointer to the C++ wrapper), a function that is called when the GSource instance is finalized. Just like it's done with all classes derived from GObject. Right now I'm convinced that my patch in comment 58 works. But it's the third version. I can't be sure I still have confidence in it in a week. I think it's wise to wait for a while, like José has suggested.
FYI, this has been causing gparted to crash randomly due to heap corruption. This basically makes the glibmm GSource wrappers unusable. Please change the priority of this to high/critical, since it causes crashes and makes the component seriously broken.
The segfaults reported in gparted bug 697727 have been tracked down to this bug in Glib::Source. My patch in comment 58 is not pretty, but I still think it works. It can be improved when we can break ABI. Then we can get rid of the std::map. How about this way forward? I push the patch very soon. If anyone finds a better non-ABI-breaking solution later, that solution can replace my present solution in a later version of glibmm. But first: Phillip, is it possible for you (or someone else) to check if my patch fixes the segfault problems in gparted? If you try to apply my patch to the latest version of glibmm from the git repository, you will find that the changes to ChangeLog don't apply. That's unimportant. You can skip those changes (git apply --exclude=ChangeLog ...).
(In reply to comment #62) > The segfaults reported in gparted bug 697727 have been tracked down to this bug > in Glib::Source. > > My patch in comment 58 is not pretty, but I still think it works. It can be > improved when we can break ABI. Then we can get rid of the std::map. > > How about this way forward? > I push the patch very soon. If anyone finds a better non-ABI-breaking solution > later, that solution can replace my present solution in a later version of > glibmm. That's fine with me if your patch works. > > But first: Phillip, is it possible for you (or someone else) to check if my > patch fixes the segfault problems in gparted? > If you try to apply my patch to the latest version of glibmm from the git > repository, you will find that the changes to ChangeLog don't apply. That's > unimportant. You can skip those changes (git apply --exclude=ChangeLog ...).
Due to the nature of race conditions and non deterministic effects of writing to freed memory, it isn't very reproducible. I myself haven't been effected by it.
The problem with the segfault identified with GParted 0.15.0 in bug #697727 seems to occur most frequently with Debian distributions. If you can instruct me on how to compile and test gparted with this patch, then I would be happy to help out. Please note that I have never compiled any of the glibmm libraries from scratch. Also as Phillip points out correctly, it is difficult to know for sure if the patch actually fixes the problem, mainly because there is no sequence of steps that is guaranteed to reproduce the problem.
Provided that the proposed patch can be applied to glibmm 2.24.2 in Debian 6; it would arguably be better from the point of view of verifying that the patch fixes bug #697727 with GParted 0.15.0, to just rebuild Debian's libglibmm package with the patch applied. That might be an easier task with the Debian package maintainers having already taken care of how to build glibmm from source. (Though I don't know how to rebuild dpkg's myself as I'm an rpm boy).
I have pushed the patch in comment 58. https://git.gnome.org/browse/glibmm/commit/?id=746f8ba0259cb88c0c6b667f4a663766fcfba359 It's not meaningful to test it with gparted. The crashes in gparted are not caused by the problem discussed in this bug. They are caused by Glib::signal_idle().connect() and Glib::signal_child_watch().connect(), which are not thread-safe. See bug 396958 and 697727. My patch fixes the problem in this bug, but it's quite ugly. If no one finds a better solution, it shall at least be improved when we can break ABI. I keep this bug open, but I change its title and lower its severity.
If the reference counting is fixed, then it should solve the gparted crash because the object won't be deleted until the reference is dropped, even if the GSource is destroyed.
The patch that I pushed changes the methods SourceCallbackData::destroy_notify_callback() Glib::Source::reference() Glib::Source::unreference() The code causing crashes in gparted is, according to your patch Glib::signal_idle().connect( sigc::mem_fun(*this, ©_blocks::set_progress_info) ); Glib::signal_child_watch().connect( sigc::mem_fun( status, &utils_execute_command_status::store_exit_status ), pid ); These calls create neither a SourceCallbackData nor a Glib::Source. None of the changed functions are called. Take Glib::signal_idle().connect() as an example. It creates a Glib::SignalIdle (temporarily) and a SourceConnectionNode. It is possible to replace Glib::signal_idle().connect( sigc::mem_fun(*this, ©_blocks::set_progress_info) ); by Glib::RefPtr<Glib::IdleSource> idle_source = Glib::IdleSource::create(); idle_source->connect(sigc::mem_fun(*this, ©_blocks::set_progress_info)); idle_source->attach(Glib::MainContext::get_default()); With such replacements, I believe the code would actually become thread-safe, provided 1. The thread-unsafe sigc::connection, returned by Glib::IdleSource::connect() is discarded, and hopefully deleted, before Glib::IdleSource::attach() is called, attaching the source to a MainContext running in another thread. (As in the code above.) 2. copy_blocks is not derived from the thread-unsafe sigc::trackable. I suppose this is not an option for gparted now, but probably in the future. You want to be able to use gparted together with any version of glibmm that has been released in the last 3 years or so, if I understand the situation correctly. I very much doubt that the comment 58 patch will ever be applied to all those versions.
If Glib::RefPtr is replaced by std::shared_ptr (bug 755037), it may be possible to simplify Glib::Source. At least I hope so.
In the ABI-breaking master branch: I have removed the extra_source_data map and added instance data in Glib::Source. I don't think it's possible to make Source simpler without modification of glib. If std::shared_ptr replaces Glib::RefPtr, Source::ref_count_ will be replaced by shared_ptr's use_count. But I think Source will still need the keep_wrapper_ counter and a specialized deleter. Closing this lengthy bug. Future problems connected to shared_ptr can be discussed in bug 755037.