GNOME Bugzilla – Bug 697727
Segfault in Gparted 0.15.0 when copying partition
Last modified: 2013-04-30 16:31:24 UTC
Just downloaded the latest liveCD today to clone my ubuntu installation onto a larger hard drive Old drive, connected to USB-SATA cable: sdc1 - 13.97GB partition, EXT3, bootable sdc2 - 13.97GB partition, EXT3 sdc3 - 13.97GB partition, EXT3 sdc4 - extended partition sdc5 - 7.something GB partition, swap sdc6 - 200something GB partition, EXT3 New drive (desired changes to sda): copy sdc1 to new partition 1 resize new partition 1 to 32GB copy sdc2 to new partition 2 copy sdc3 to new partition 3 create extended partition (new partition 4) copy sdc5 to new partition 5 copy sdc6 to new partition 6 resize new partition 6 to [remainder of available space] All new parititions aligned to MiB, not cylinder. Hit start. Expand the "show details" arrows as they become available. New partitions 1 and 2 are copied successfully. New partition 3 successfully e2fsck's, but the copy operation segfaults. Re-open gparted, and re-create all operations from "copy sdc3" onward. Again, segfault on copy, within the first 60 seconds. (Tried to run the same from the terminal, but no stack trace was printed... just "Segmentation Fault"). Re-open gparted, and just copy sdc3. Hold my breath. No segfault (didn't open details). Re-create all operations from "create extended paritition" onwards. Watch details. Segfault within the first 60 seconds of copy operation. Re-create all operations from "create extended paritition" onwards. Don't watch details. 15 minutes of a 3 hour copy operation done with no issues. Will update bug if it completes successfully.
Thank you Ian for reporting this problem. Is the problem still reproducible for you? If so, then we might work together to troubleshoot and find the exact problem.
The problem happened again during the "big" 3-hour copy, so I went back and completed it with the gParted version that's on the latest Ubuntu LiveCD. I'd be happy to help troubleshoot this; how can I get more debug info (and crash info) from the running program? As I said, running it from the terminal was uninformative :(
And I should add, the problem happened without me expanding the "details". So, unfortunately it seems to be something intermittent.
Can you run "ulimit -c unlimited" to enable core dumps, and attach it to this report?
If I recall correctly, the default location for the core dump is the working directory, and the default name of the core dump is "core". http://linux.die.net/man/5/core
I have added a warning to avoid using GParted 0.15.0 for partition moves, resizes, copies, and checks to the GParted web site download page and news page while we investigate this problem. In the meantime both Mike Fleetwood and I are running copy operations trying to reproduce the bug to acquire a core dump.
There have been at least 3 reports of crashes with GParted 0.15.0. The other two are: Bug #696810 - segfault resizing ext4 partition with gparted-0.15.0-1 iso "Applying pending operations" grey screen http://gparted-forum.surf4.info/viewtopic.php?id=16863 I suggest that we track our troubleshooting in this report.
After many attempts I have finally been able to get GParted to crash. Since the core dump file is too large to upload to bugzilla (the file is ~3.5MB), you can find the core dump temporarily hosted at: http://gparted.org/curtis/core-dump-move-10238MiB-ext4-1MiB-to-right.gz The core dump file has been renamed from core to core-dump-move-10238MiB-ext4-1MiB-to-right and has been compressed with gzip. The crash occurred while moving a 10238 MiB ext4 partition by 1 MiB to the right. This was done within a virtual machine running the gparted-live-0.15.0-3-i486.iso image. Phillip, your help analyzing this core dump would be greatly appreciated.
GParted Live is using e2fsck from e2fsprogs-1.42.5 This is in the changelog for the 1.42.6 release: Fixed a potential seg fault in e2fsck when there is an I/O error while reading the superblock. I don't know if this is the cause, but it sure could be.
Hi Patrick, Thanks for the heads up about e2fsck. We certainly want to get e2fsprogs upgraded in the live image to avoid this problem. :-) My thoughts are that a crash in e2fsck should not result in a crash of GParted because the e2fsck command is spawned from the gpartedbin process. The other item is that in creating the crash dump in comment #8, e2fsck had finished successfully, and GParted was in the process of actually moving the partition to the right. The progress indicators were updating properly when all of a sudden *CRASH* and the GParted GUI disappeared. Hence e2fsck should not have been executing at the time of the crash. In case it is relevant, before running gparted from a terminal with "sudo gparted", I ran the command "ulimit -c unlimited" as suggested by Phillip in comment #4.
The original crashes that I experienced happened after the e2fsck commands had completed successfully. Or rather, whether or not e2fsck completed successfully, the progress bar had switched over to show the progress of the copy operation.
Created attachment 241488 [details] GDB backtrace etc Yesterday I copied 3 partitions: 20 G, 50 G and 130 G on my Fedora 14 desktop using GParted master (git 76a9cd3). All successfully. Might be a distribution / library version specific issue. Today I have reproduced the crash on Debian 6 in Virtual Box by copying a 4 G partition into unallocated space. Had expanded the details in the operation dialog to see the amount copied updating. After installing all the debug symbol packages I have useful information from gdb. Full details from below commands attached. # gdb gpartedbin core backtrace backtrace full thread apply all backtrace info registers x/16i $pc quit The top of the stack looks like this:
+ Trace 231783
Initial reaction is it looks like a crash freeing memory from Gtk.
Created attachment 241508 [details] gdb backtrace for core file in comment 8 Thank you Mike for sharing the commands you used in your investigation. I used similar commands to get details on the core dump from comment #8. Since I was running gpartedbin as compiled in the GParted Live, the backtrace does not have the additional details as if gparted were compiled with the "-g" option. After installing all the debug symbol packages I too have the useful information from gdb. Full details from below commands attached. $ gdb /usr/sbin/gpartedbin ./core | tee -a gdb-log.txt backtrace backtrace full thread apply all backtrace info registers x/16i $pc quit The top of the stack looks like this: (gdb) backtrace
+ Trace 231786
My initial reaction is that this crash is due to a failure to allocate memory. This is different from the crash you observed in comment #12. This makes me think we might have some sort of race condition occurring. This is just a guess at this time.
Created attachment 241521 [details] Valgrind log Hi Curtis, I now have a core dump from malloc() too. The top of the stack looks like:
+ Trace 231787
I have also run valgrind malloc debugger. Like this: # valgrind --trace-origins=yes --log-file=gparted-valgrind.log gpartedbin GParted doesn't crash under valgrind but I see this error reported which is possibly relevant: ==4340== Invalid write of size 4 ==4340== at 0x4975DD3: Glib::SignalIdle::connect(sigc::slot<bool, sigc::nil, sigc::nil, sigc::nil, sigc::nil, sigc::nil, sigc::nil, sigc::nil> const&, int) (main.cc:92) ==4340== by 0x810427D: GParted::copy_blocks::copy_thread() (Copy_Blocks.cc:137) ==4340== by 0x49700C1: call_thread_entry_slot (slot.h:440) ==4340== by 0x4D6C6CE: g_thread_create_proxy (gthread.c:1893) ==4340== by 0x4F17954: start_thread (pthread_create.c:300) ==4340== by 0x4FF81DD: clone (clone.S:130) ==4340== Address 0x61615e8 is 8 bytes inside a block of size 12 free'd ==4340== at 0x4023881: operator delete(void*) (vg_replace_malloc.c:387) ==4340== by 0x4974814: (anonymous namespace)::SourceConnectionNode::destroy_notify_callback(void*) (main.cc:85) ==4340== by 0x4D41913: g_source_callback_unref (gmain.c:1077) ==4340== by 0x4D41F26: g_source_destroy_internal (gmain.c:856) ==4340== by 0x4D42355: g_main_context_dispatch (gmain.c:1985) ==4340== by 0x4D45FE7: g_main_context_iterate (gmain.c:2591) ==4340== by 0x4D46526: g_main_loop_run (gmain.c:2799) ==4340== by 0x469EE18: gtk_main (gtkmain.c:1219) ==4340== by 0x428FE86: Gtk::Main::run_impl() (main.cc:536) ==4340== by 0x428FC81: Gtk::Main::run() (main.cc:483) ==4340== by 0x8104590: GParted::copy_blocks::copy() (Copy_Blocks.cc:187) ==4340== by 0x80A87AF: GParted::GParted_Core::copy_filesystem(Glib::ustring const&, Glib::ustring const&, long long, long long, long long, long long, long long, GParted::OperationDetail&, bool, long long&, bool) (GParted_Core.cc:2755) I have a second "Invalid write of size 4" too. Full log attached. It looks like the heap is getting corrupted leading to nondeterministic crashes when calling malloc() and free(). Valgrind is saying it's all happening in glib(mm)/libstdc++/gtk(mm). Could be a bug in that code, or a race between multiple threads performing malloc() and free() in an unsafe way. Not sure really. Mike
Hi Curtis, Should we remove GParted 0.15.0 application and Live CD from the download pages too? Both gparted.org and sf.net. http://www.gparted.org/download.php http://sourceforge.net/projects/gparted/files/ Thanks, Mike
(In reply to comment #15) > Should we remove GParted 0.15.0 application and Live CD from the download > pages too? Both gparted.org and sf.net. > http://www.gparted.org/download.php > http://sourceforge.net/projects/gparted/files/ Thanks Mike for pointing this out. Since many other sites pick up and distribute these versions, I thought it best not to have these versions disappear mysteriously into the ether. Instead I have added a warning README to each directory containing GParted 0.15.0 download files. I have also set the default SF download link to the gparted-live-0.14.1-6-i486.iso file. On the GParted web site front page I have updated the direct links to the respective GParted 0.14.1 versions. On the topic of the segmentation fault, I wonder if we need semaphores or something so that our Copy_Blocks thread and main Dialog_Progress thread play nice together. My guess is that perhaps the updates from Copy_Blocks are perhaps getting out of sync with the Dialog_Progress thread updating the GUI. It is also possible, as you point out, that we have found a bug glib(mm)/libstdc++/gtk(mm) but I always suspect our own code first. Phillip, if you are out there we would certainly appreciate your thoughts on this segmentation fault matter.
Sorry, I was a little busy doing my taxes over the weekend. Good work so far, while catching up valgrind was going to be my first recommendation. At first glance, it looks like this is that bug I found in gtkmm before that I thought was benign, but it appears now it may not be. That bug was being tracked in #561885.
I've git bisected it to: bd9e16f22fe3dfae73064369aadeda5de10b61be Thread the internal copy algorithm (#685740) Probably as expected. Also [re]discovered that several of Phillip's patches don't even compile on Debian 6. This makes git bisecting harder! As per comments at: https://bugzilla.gnome.org/show_bug.cgi?id=685740#c8 https://bugzilla.gnome.org/show_bug.cgi?id=685740#c9 several patches used Glib::Threads::Mutex, rather than Glib::Mutex, which is too new for Debian 6. In future we need to ensure all patches compile for git bisectability.
Yep, that's what it is. There is a race condition between SignalIdle::connect and the main thread completing the idle event, and destroying the idle object before connect finishes with it. The bug will ultimately need fixed in glibmm, but as a work around, we can change gparted to avoid using the buggy glibmm GSource wrappers and instead use glib directly.
Created attachment 241568 [details] [review] bad-glibmm.patch I hacked this up real quick. It should do the trick, but I won't have time to test it until tonight. I'm not even sure it will compile, but if you guys want to give it a try go for it.
Thank you Mike for your work so far to identify the root cause of this problem. It looks like valgrind is a very powerful tool. Do you have a good link to a tutorial so that I could learn valgrind? Regarding catching individual commits that do not compile on all distributions, I agree that this is a good goal. However it might prove to be difficult to achieve. In order to do so we would have to test each commit on each distribution. We can try our best, but with the labour intensive nature of this goal I do not think we will be able to catch all occurrences. Having said that, it is a worthy goal to strive for. Phillip, I'm just beginning to test your patch in comment #20. When trying to compile I get the following message: Copy_Blocks.cc: In function ‘gboolean GParted::_set_progress_info(gpointer)’: Copy_Blocks.cc:60:6: error: ‘bool GParted::copy_blocks::set_progress_info()’ is private Copy_Blocks.cc:100:31: error: within this context make[2]: *** [Copy_Blocks.o] Error 1 make[2]: Leaving directory `/home/gedakc/workspace/gparted.dev2/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/gedakc/workspace/gparted.dev2' make: *** [all] Error 2 To work around this I made copy_blocks::set_progress_info() into a public method in include/Copy_Blocks.h.
Valgrind: Nice introduction article "Using Valgrind to Find Memory Leaks and Invalid Memory Use" http://www.cprogramming.com/debugging/valgrind.html On Valgrind's home page they have Quick Start and User Manual documents http://valgrind.org/
My testing of the patch in comment #20 has resulted in a crash. Unfortunately I had not run "ulimit -c unlimited" beforehand so there is no core dump file in the working directory. I will rerun and attempt to get a stack trace.
Ignore comment #23. I think I might have been testing the wrong gpartedbin.
I have now successfully run 20 tests of moving a 10238 MiB ext4 partition by 1 MiB to the left/right using code based on the patch in comment #20. As such the patch shows great promise. I have also run the code through valgrind and did not discover any references to copy_blocks in the output. The valgrind command I used was: sudo valgrind --track-origins=yes --log-file=gparted-valgrind.log src/gpartedbin Mike, would you be able to test with valgrind to confirm if the problem you discovered is now gone? Phillip, would you be able to enhance the patch to be a git patch including comments?
Curtis and Phillip, If one of you can upload a working patch I'll repeat my testing on Debian 6 with and without valgrind.
Created attachment 241648 [details] [review] Patch v2 to fix segfault / bad-glibmm Attached is the git diff of the fix I have been testing for GParted 0.15.0. It should also work with the git head too. The diff file was generated with: git diff > bug697727-segfault-fix.diff To apply the patch, change into the top level of the gparted directory where the README file is and use the command: patch -p1 < bug697727-segfault-fix.diff
Created attachment 241652 [details] [review] 0001-Avoid-glibmm-GSource-bug-crash-697727.patch I converted it to a proper git patch with log.
A further 10 tests of moving a 10238 MiB ext4 partition right then left using the patch in comment #27 (which is the same as the git patch in comment #28) has been successful. No crashes occurred. To check if the GParted 0.15.0 still crashed I ran the same tests in the same environment using the gpartedbin executable from GParted Live 0.15.0-3. This time gparted 0.15.0 crashed on the first test. Often it takes several tries. Mike, If your testing and check with valgrind looks good, then we can consider committing this patch and preparing for a GParted 0.16.0 release. One minor update for the patch is to add the bug number and one line description to the patch comment similar to our example on the GParted git page. http://gparted.org/git.php My thoughts are that the stuff already committed to the git master branch is all good. We could also consider including some of your other patches if you would like these in the release as well. I'm specifically thinking of this report which I can quickly test and commit: Bug #697946 - Absolutely full NTFS reported as partition error
Tested Phillip's patch from comment #28. Valgrind shows the invalid writes have gone. Also run my test case of coping 4 GiB partition on Debian 6 successfully 16 times, where as before it would crash on 1st or 2nd try. Patch passes testing by me. Can the patch summary also mention the number of the bug being worked around in glibmm, formatted as "bug #561885"? (Will allow easier linkage especially with https://git.gnome.org/ converting it to a hyperlink). Curtis, Feel free to include Bug #697946 in GParted 0.16.0.
Created attachment 241745 [details] [review] 0001-Avoid-glibmm-GSource-bug-crash-697727.patch How's this?
I've been working with the glibmm bug 561885 and other glibmm bugs. I can't draw any conclusion from the gdb backtraces in this bug. The crashes may or may not be caused by the fault described in bug 561885. The valgrind log in comment 14, on the other hand, is very informative. The illegal memory accesses are definitely caused by the fault described in glibmm bug 396958. That bug has been partly fixed in glibmm 2.35.8 and later. Even after that fix, Glib::SignalIdle::connect() is not completely thread-safe. This paragraph was added to the reference documentation: * This method is not thread-safe. You should call it, or manipulate the * returned sigc::connection object, only from the thread where the SignalIdle * object's MainContext runs. This remaining thread-unsafety is very difficult to remove. Don't expect it to be done in the foreseeable future. Glib::SignalIdle::connect_once() is more thread-safe in glibmm 2.32.0 and later, where bug 396963 has been completely fixed. The remaining thread- unsafety is described like so: * Because sigc::trackable is not thread-safe, if the slot represents a * non-static method of a class deriving from sigc::trackable, and the slot is * created by sigc::mem_fun(), connect_once() should only be called from * the thread where the SignalIdle object's MainContext runs. You can use, * say, boost::bind() or, in C++11, std::bind() or a C++11 lambda expression * instead of sigc::mem_fun(). Phillip's patch in comment 31 shows that the fault is expected to be in Glib::signal_idle().connect() and Glib::signal_child_watch().connect(). Those functions don't use a C++ wrapper around the created GSource object. They are affected by bug 396958, but not by bug 561885.
This crash is caused by a corrupted heap. The heap is corrupted by glibmm writing to the ConnectionNode after it has already been freed by the destroy notify callback, as shown by the valgrind trace. More specifically, Glib::SignalIdle::connect() calls conn_node->install(source) after having called g_source_attach(). The main thread can dispatch the idle event, and call the destroy_notify_callback, which deletes the SourceConnectionNode, leaving the background thread to call conn_node->install on a now invalid pointer. It appears that the connect_once variants do the reverse order, so should be safe, aside from trying to unref an already freed source object, which I suspect may be capable of causing the same kind of heap corruption, though it is probably far less likely.
Thank you Kjell for commenting on this problem, and Phillip for working towards a resolution. If I understand comment #32 and comment #33 correctly, these indicate that even with the patch from comment #31, the code is still not 100% thread safe. Is my understanding correct? If so, then is there a way that we can make the code 100% thread safe using Glib, or some other library?
No Curtis, he is saying he made a change to glibmm to improve the situation on that end, but it doesn't fix it completely. By avoiding the use of glibmm in this area gparted should be fully safe on all versions of glibmm, so we should be good to go with that patch.
Thank you Phillip for the clarification. This afternoon I will test the patch in comment #31 along with the git repository head, before committing the patch. NOTE: I am moving this report from component livecd to application since it affects more than just the Live CD.
*** Bug 698010 has been marked as a duplicate of this bug. ***
(In reply to comment #33) > This crash is caused by a corrupted heap. The heap is corrupted by glibmm > writing to the ConnectionNode after it has already been freed by the destroy > notify callback, as shown by the valgrind trace. > > More specifically, Glib::SignalIdle::connect() calls conn_node-> > install(source) after having called g_source_attach(). The main thread can > dispatch the idle event, and call the destroy_notify_callback, which deletes > the SourceConnectionNode, leaving the background thread to call conn_node-> > install on a now invalid pointer. Correct! This is what has been fixed with bug 396958, but it was fixed only recently (2013-02-21 in the git repository). Now conn_node->install(source) is called before g_source_attach(). That fix makes Glib::SignalIdle::connect() much more thread-safe, but not completely so, as explained by the added documentation that I cited in comment 32. > It appears that the connect_once variants do the reverse order, so should be > safe, aside from trying to unref an already freed source object, which I > suspect may be capable of causing the same kind of heap corruption, though it > is probably far less likely. The connect_once() methods were fixed almost a year before the connect() methods (2012-04-04). Where do the connect_once() methods unref an already freed source object? Note that the connect() and connect_once() methods create a GSource object, but no Glib::Source wrapper. Neither Glib::Source::unreference() nor SourceCallbackData::destroy_notify_callback() is called. SourceConnectionNode::destroy_notify_callback() is called. The call to g_source_unref() in Glib::SignalIdle::connect() is needed because g_idle_source_new() adds one ref count, and g_source_attach() adds a second ref count. g_idle_add() also calls g_source_unref() after g_source_attach(). (In reply to comment #34) > If I understand comment #32 and comment #33 correctly, these indicate that > even with the patch from comment #31, the code is still not 100% thread safe. Even with the latest fixes in glibmm, the Glib::Signal*::connect() methods are not completely thread-safe. It's not safe to call them in one thread and add a source to a main loop that runs in another thread. With Phillip's patch in comment 31, Glib::Signal*::connect() are not used (unless there are other uses of them in gparted). This is probably the safest way out of your problem, especially since you want to use gparted also with fairly old versions of glibmm. (In bug 561885 comment 66, Mike Fleetwood mentions glibmm 2.24.2 from 2010-05-03.) As I said in comment 32, don't expect the connect() methods to be fully thread-safe in the foreseeable future. Of course I don't know if there are other problems in gparted, apart from those that the patch in comment 31 fixes.
Sorry, I happened to reset component and status by mistake. Changing back to application and NEW.
I was just lamenting to Curtis yesterday that the damn bugzilla "midair collision" nonsense is such a confusing mess. The connect_once() methods call g_source_unref() right before they return. I was thinking that the source may already have been destroyed by the main thread at this point, but now that I look again, I guess not. It looks like the once versions are already safe, but you said that was fixed a year ago, and gparted wants to support older versions of gtkmm, so I suppose we'll just stick with using glib directly.
Thank you Kjell for your explanations in comment #38. We appreciate your assistance with this problem, and with maintenance of Free Software in general. :-) Phillip's patch will work well for GParted because we try to maintain backward compatibility at least for the currently supported major GNU/Linux distributions. This means that for now it will have to work with some distros that are more than a year old (e.g., ubuntu 10.04).
I just successfully ran 10 more moves of a 10238 MiB ext4 partition with no problems using the git head plus the patch in comment #31. This patch passes my testing. The patch in comment #31 has been committed to the Gnome repository for inclusion in the next release of GParted on April 24th. The relevant git commit can be viewed at the following link: Avoid glibmm GSource bug/crash (#697727) https://git.gnome.org/browse/gparted/commit/?id=c36934aca5d0691953fc0eed15ee61bc027ca086
Updating the title of this report because the problem is not limited to the Live CD image. Changing from: Segfault in livecd Gparted v 0.15.0-3 when copying partition To: Segfault in Gparted 0.15.0 when copying partition
The code change to address this report has been included in GParted 0.16.0 released on April 24, 2013.
Created attachment 242493 [details] [review] 0001-Avoid-glibmm-GSource-bug-crash-again-697727.patch Missed one.. this patch gets it.
Tested the patch on CentOS 5.9 and it stopped these assertion errors found on CentOS 5.9: (gpartedbin:9670): GLib-CRITICAL **: g_source_unref: assertion `source != NULL' failed Reviewing the code I though that the PipeCapture constructor needed to initialise sourceid = 0 to ensure that the destructor never depended on assing an uninitialised variable. But doing that gave this error 6 time: (gpartedbin:5836): GLib-CRITICAL **: g_source_remove: assertion `tag > 0' failed Planning to use valgrind to check if there is any uninitialised memory access going on when PipeCapture::PipeCapture doesn't initialise sourceid = 0.
Created attachment 242564 [details] [review] 0001-Avoid-glibmm-GSource-bug-crash-again-697727.patch Woops, good catch. Yes, it should be initialized to zero, and tested for >0 before trying to remove the source in OnReadable. Updated patch.
Thanks Mike for re-opening this bug report and reviewing the patch so quickly. Thanks also Phillip for coming up with another patch. My initial testing on OpenSUSE 11.2 shows that this patch does address the problem with g_source_unref messages. Next I will test this patch in a variety of different VMs and report back here.
To test the patch in comment #48 I performed a half dozen operations with various file systems on various disk devices in various GNU/Linux distributions. VIRTUAL MACHINES TESTED ----------------------- OpenSUSE 11.2, 12.2 Fedora 16, 17, 18 Ubuntu 10.04, 12.04, 12.10, 13.04 Debian 6 The testing was successful in all of the above VMs with no crashes and no g_source_unref messages. PHYSICAL MACHINE TESTED ----------------------- Kubuntu 12.04 All tests on my physical development machine, which includes an Intel Fake RAID, completed successfully.
Hi Phillip, I'm still seeing a race between threads but I don't know if it's safe or not? Sometimes the main thread is running PipeCapture::OnReadable() and getting status==Glib::IO_STATUS_EOF before the set_devices_thread doing the file system specific querying commands has run PipeCapture::connect_signal(). At the moment I think this is working because sourceid is initialised and being accessed by different threads. I don't know if this is safe though. Hopefully self explanatory debug output: D: [15742] main() D: [15746] GParted_Core::set_devices_thread() ... blikd example with normal expected sequence of events: D: [15746] Utils::execute_command(command="blkid", &output, &error, use_C_locale=1) D: [15746] PipeCapture::PipeCapture(fd=10, &string) sourceid=0 D: [15746] PipeCapture::PipeCapture(fd=12, &string) sourceid=0 D: [15746] PipeCapture::connect_signal() sourceid=g_io_add_watch(); sourceid=30 D: [15746] PipeCapture::connect_signal() sourceid=g_io_add_watch(); sourceid=31 D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_NORMAL) return true D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_NORMAL) return true D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_EOF) sourceid=30 D: [15742] PipeCapture::OnReadable() return false D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_EOF) sourceid=31 D: [15742] PipeCapture::OnReadable() return false D: [15746] PipeCapture::~PipeCapture() sourceid=0 D: [15746] PipeCapture::~PipeCapture() sourceid=0 e2label example with questionable sequence of events: D: [15746] Utils::execute_command(command="e2label /dev/sda1", &output, &error, use_C_locale=1) D: [15746] PipeCapture::PipeCapture(fd=10, &string) sourceid=0 D: [15746] PipeCapture::PipeCapture(fd=12, &string) sourceid=0 D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_NORMAL) return true D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_EOF) sourceid=0 D: [15742] PipeCapture::OnReadable() return false D: [15746] PipeCapture::connect_signal() sourceid=g_io_add_watch(); sourceid=51 D: [15742] PipeCapture::OnReadable() (status==Glib::IO_STATUS_EOF) sourceid=0 D: [15746] PipeCapture::connect_signal() sourceid=g_io_add_watch(); sourceid=52 D: [15742] PipeCapture::OnReadable() return false D: [15746] PipeCapture::~PipeCapture() sourceid=52 D: [15746] PipeCapture::~PipeCapture() sourceid=51
That should be fine. It just so happens that the foreground thread finishes before the background has had a chance to store the sourceid. It gets stored after, then the destructor cleans up the source instead of OnReadable(). As long as it gets cleaned up at some point it should be fine. Come to think of it, it's cleaned up automatically as soon as OnReadable() returns false, whether or not sourceid is still 0.
Hi Phillip, Do you see any problem if this change was made? @@ -80,9 +80,6 @@ bool PipeCapture::OnReadable( Glib::IOCondition condition ) if (status != Glib::IO_STATUS_EOF) std::cerr << "Pipe IOChannel read failed" << std::endl; // signal completion - if( sourceid > 0 ) - g_source_remove( sourceid ); - sourceid = 0; eof(); return false; } It means that the sourceid memory location is only ever accessed from a single thread / CPU core. If there are any extended storing of values in registers rather than memory by compiler optimisations; out of order execution by the CPU; or lack of memory barriers between CPUs accessing memory there won't be any unexpected behaviour. This does of cause continue to assume that g_io_add_watch() and g_source_remove() are thread safe against the glib main loop polling the list of events. But that seems to be a given. Thanks, Mike
That should be fine since the return false implicitly removes the source.
Hi Phillip, A couple of questions I'm not sure about: 1) Is it correct that 'Glib::RefPtr<Glib::IOChannel> channel' gets destroyed when ~PipeCapture() runs? 2) As used in OnReadable() what functions or object methods are update() and eof()? I wasn't able to find out. Thanks,
(In reply to comment #55) > 1) Is it correct that 'Glib::RefPtr<Glib::IOChannel> channel' gets > destroyed when ~PipeCapture() runs? Why wouldn't it be? > 2) As used in OnReadable() what functions or object methods are update() > and eof()? I wasn't able to find out. I don't understand.
(In reply to comment #56) > (In reply to comment #55) > > 1) Is it correct that 'Glib::RefPtr<Glib::IOChannel> channel' gets > > destroyed when ~PipeCapture() runs? > > Why wouldn't it be? I was just checking because I don't know glib(mm) very well at all and destructors don't get called when an there is only a pointer to it in the object currently getting destroyed. > > 2) As used in OnReadable() what functions or object methods are update() > > and eof()? I wasn't able to find out. > > I don't understand. Do you have a link for documentation on the update() and eof() functions because I don't know what they are.
(In reply to comment #57) > I was just checking because I don't know glib(mm) very well at all and > destructors don't get called when an there is only a pointer to it in > the object currently getting destroyed. That's why it's a RefPtr and not a plain pointer; RefPtr releases the reference count on the object when it is destroyed. > Do you have a link for documentation on the update() and eof() functions > because I don't know what they are. They are sigc signals ( declared in PipeCapture.h ): they call any connected handlers. I'm using them to forward information on when the pipe hits eof or has just read data ( so we can parse the progress output and update the progress bar ) to the callers of PipeCapture. If they are interested in these events, they connect a slot to the relevant signal.
Hi Curtis, I've been testing the patch from comment #48 with the change I mentioned in comment #53. Tested various create, delete and copy FS and partition operations on Fedora 14, CentOS 5.9 and Debian 6. Unless you have any reason not to I'll apply this upstream shortly. Thanks, Mike
Hi Mike, As a double check, I would like to test the same version that you tested. Namely the patch from comment #48 with the change mentioned in comment #53. I will report back here with the test results. Curtis
*** Bug 696810 has been marked as a duplicate of this bug. ***
Hi Mike, My testing of the patch from comment #48 with the changes you mentioned in comment #53 has gone well. I tested with OpenSUSE 11.2 and Debian 6. I plan to continue testing up until we release 0.16.1 Please feel free to commit the modified patch. I will tentatively plan for a 0.16.1 release after this modified patch is committed, and one is available for the recent problem with Bug #695279 - GParted doesn't compile on RHEL / CentOS 5.9 Thanks, Curtis
Thanks Phillip, I've pushed the commit for inclusion in the next release of GParted: Avoid glibmm GSource bug/crash (again) (#697727) https://git.gnome.org/browse/gparted/commit/?id=9475731ac82e2e29bd604bf73cf2282763c58aeb
The fix to address this bug report has been included in GParted 0.16.1 released on April 30, 2013.