GNOME Bugzilla – Bug 764369
Use realpath() safely
Last modified: 2016-04-26 15:56:49 UTC
Quoting one of my commit messages:

realpath(3) manual page says:

BUGS
The POSIX.1-2001 standard version of this function is broken by design, since it is impossible to determine a suitable size for the output buffer, resolved_path. According to POSIX.1-2001 a buffer of size PATH_MAX suffices, but PATH_MAX need not be a defined constant, and may have to be obtained using pathconf(3). And asking pathconf(3) does not really help, since, on the one hand POSIX warns that the result of pathconf(3) may be huge and unsuitable for mallocing memory, and on the other hand pathconf(3) may return -1 to signify that PATH_MAX is not bounded. The resolved_path == NULL feature, not standardized in POSIX.1-2001, but standardized in POSIX.1-2008, allows this design problem to be avoided.

The resolved_path == NULL feature of realpath() has existed as a Glibc extension since realpath() was first added to Glibc 1.90, released in June 1996. Therefore it can be used unconditionally.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fa0bc87c32d02cd81ec4d0ae00e0d943c683e6e1
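For illustration only, here is a minimal sketch of the resolved_path == NULL usage the commit message describes. The helper name resolve_path is hypothetical and is not taken from the GParted patch; the sketch assumes a POSIX.1-2008 or glibc system where realpath() allocates the result with malloc().

```cpp
#include <cassert>
#include <stdlib.h>   // realpath(), free()
#include <string>

// Hypothetical helper (not GParted's actual code): resolve `path` to a
// canonical absolute path, letting realpath() size and allocate the buffer.
// Returns an empty string on failure; realpath() sets errno in that case.
std::string resolve_path(const std::string& path)
{
    assert(!path.empty());

    // Passing NULL as the second argument asks realpath() to malloc() a
    // buffer of the right size, so no PATH_MAX guess is needed.
    char* resolved = realpath(path.c_str(), nullptr);
    if (resolved == nullptr)
        return std::string();

    std::string result(resolved);
    free(resolved);   // the caller owns the malloc()ed buffer
    return result;
}
```

A caller would treat an empty return value as "could not resolve" and fall back to the original path or report the error, depending on context.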
Created attachment 325033 [details] [review] Use realpath() safely (v1) Hi Curtis, Here is the patchset to fix this. Thanks, Mike
Hi Mike,

Thank you for the patch. I just started testing and have run into a problem. When I compile and run gpartedbin on kubuntu 12.04 LTS (my development box) it crashes with the following message:

$ sudo src/gpartedbin
======================
libparted : 2.3
======================

(gpartedbin:21963): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

I can run additional tests if needed. Just let me know.

Curtis
I have narrowed the crash down further. It occurs with the two disk devices in my Intel Software RAID aka FAKE RAID (/dev/sdb and /dev/sdd). It does not occur with the actual RAID device (/dev/mapper/isw_efjbbijhh_Vol0). This is a RAID1 mirror configuration.

The related device path entries are as follows:

$ ls -l /dev/sd*
brw-rw---- 1 root disk 8, 0 Mar 31 10:44 /dev/sda
brw-rw---- 1 root disk 8, 1 Mar 31 10:44 /dev/sda1
brw-rw---- 1 root disk 8, 10 Mar 31 09:14 /dev/sda10
brw-rw---- 1 root disk 8, 11 Mar 31 09:14 /dev/sda11
brw-rw---- 1 root disk 8, 12 Mar 31 09:14 /dev/sda12
brw-rw---- 1 root disk 8, 13 Mar 31 10:44 /dev/sda13
brw-rw---- 1 root disk 8, 2 Mar 31 10:44 /dev/sda2
brw-rw---- 1 root disk 8, 3 Mar 31 09:14 /dev/sda3
brw-rw---- 1 root disk 8, 4 Mar 31 10:44 /dev/sda4
brw-rw---- 1 root disk 8, 5 Mar 31 09:14 /dev/sda5
brw-rw---- 1 root disk 8, 6 Mar 31 10:44 /dev/sda6
brw-rw---- 1 root disk 8, 7 Mar 31 10:44 /dev/sda7
brw-rw---- 1 root disk 8, 8 Mar 31 09:14 /dev/sda8
brw-rw---- 1 root disk 8, 9 Mar 31 09:14 /dev/sda9
brw-rw---- 1 root disk 8, 16 Mar 31 09:14 /dev/sdb
brw-rw---- 1 root disk 8, 17 Mar 31 09:14 /dev/sdb1
brw-rw---- 1 root disk 8, 18 Mar 31 09:14 /dev/sdb2
brw-rw---- 1 root disk 8, 21 Mar 31 09:14 /dev/sdb5
brw-rw---- 1 root disk 8, 32 Mar 31 10:45 /dev/sdc
brw-rw---- 1 root disk 8, 48 Mar 31 10:45 /dev/sdd
brw-rw---- 1 root disk 8, 64 Mar 31 10:45 /dev/sde
brw-rw---- 1 root disk 8, 65 Mar 31 09:14 /dev/sde1
brw-rw---- 1 root disk 8, 66 Mar 31 09:14 /dev/sde2

$ ls -l /dev/mapper
total 0
crw------- 1 root root 10, 236 Mar 31 09:14 control
lrwxrwxrwx 1 root root 7 Mar 31 10:44 isw_efjbbijhh_Vol0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Mar 31 10:31 isw_efjbbijhh_Vol01 -> ../dm-3
lrwxrwxrwx 1 root root 7 Mar 31 10:31 isw_efjbbijhh_Vol02 -> ../dm-4
lrwxrwxrwx 1 root root 7 Mar 31 10:44 isw_efjbbijhh_Vol0p1 -> ../dm-1
lrwxrwxrwx 1 root root 7 Mar 31 10:44 isw_efjbbijhh_Vol0p2 -> ../dm-2

Of note is that because the drives are in a mirror, they have a visible partition table and partitions (at least to fdisk and parted), but do not have the corresponding partition path entries. For example there is no /dev/sdc1 or /dev/sdc2.

Curtis
Typo in comment #3. The RAID devices are *sdc* and *sdd*.
Hi Curtis,

Sorry about the crash. Can you please bisect it to the causing commit? (I assume it will be patch number 1, but I need to be sure it's not in HEAD already.) Can you also generate a backtrace please?

Thanks,
Mike

How to capture a backtrace from a coredump
------------------------------------------

1) Turn off any OS core dump capturing, ensuring:
cat /proc/sys/kernel/core_pattern
reports just "core". Some methods to do this, depending on distro version, are:
* service abrt-ccpp stop
* systemctl stop abrtd
* sudo service apport stop

2) Increase core dump limit and run GParted as root.
Either:
ulimit -c unlimited
sudo gparted
or:
su - root
ulimit -c unlimited
gparted

3) Perform crashing action

4) Capture backtrace
ls -lrt core*
which gpartedbin
gdb `which gpartedbin` {COREFILE} --batch --quiet \
    -ex backtrace -ex quit > backtrace.log

Please paste the terminal output when running gparted and the contents of the backtrace.log file.
Hi Curtis, I tried creating a Fake RAID array with Intel Software RAID format in both a CentOS 7 and Kubuntu 12.04 LTS VMs, but was not able to produce a crash in GParted. # dmraid -f isw -C MyRaid --type 1 --disk /dev/sdc,/dev/sdd # dmraid -ay # .../gpartedbin Going to need the backtrace for guidance about where it is going wrong and possibly going to have to debug via you and your machine. Mike
Hi Mike, Thanks for the detailed instructions. No worries on the crash. It's great to have you on the team. :-) I've introduced more than my share of problems and several made it through to a production release. Regarding git bisect I seem to be challenged with the process. I've had it land on different commits as the problem. In an attempt to get past this I am running the bisect starting with GParted 0.25.0 *AND* running make clean before each and every build. Hopefully this will help me pinpoint the exact commit. $ git bisect start $ git bisect good 976eea771c5374eca2ae845066577d222715de8e $ git bisect bad 3d1262b8bc658e1167eea8dc8eeecb8daa260b61 Bisecting: 48 revisions left to test after this (roughly 6 steps) [ad4191475a228a5db2d150597bb15a1f59d9284c] Rename file system from "crypt-luks" to "luks" (#760080) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc $ git bisect good Bisecting: 24 revisions left to test after this (roughly 5 steps) [27e30a570ff50509d710ee8dccdad848f01c0e4d] Remove unused OperationDetail members (#760709) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc ====================== libparted : 2.3 ====================== (gpartedbin:24088): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ git bisect bad Bisecting: 11 revisions left to test after this (roughly 4 steps) [608060f82dae66cca0a8be2590bdd12ddcdf8be7] Update ext2 resize progress tracker to use the new ProgressBar (#760709) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc $ git bisect good Bisecting: 5 revisions left to test after this (roughly 3 steps) [b0bd4650982f5c804eae24eb59ec8ac015be5ba1] Fix formatting of negative time values (#760709) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc $ git bisect good Bisecting: 2 revisions left to test after this (roughly 2 steps) [965d88d197c4c932bcd5cf4654e4cd44ca997377] Call any FS specific 
progress trackers for stderr updates too (#760709) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc ====================== libparted : 2.3 ====================== (gpartedbin:4966): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ git bisect bad Bisecting: 0 revisions left to test after this (roughly 1 step) [324d99a172848e4ff3fb7eb189f490bb4e6c53e5] Record file system block size where known (#760709) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc ====================== libparted : 2.3 ====================== (gpartedbin:9334): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ git bisect bad Bisecting: 0 revisions left to test after this (roughly 0 steps) [809a7e095444f881256871d915b460af5cb4bab2] Display progress of XFS file system specific copy operation (#760709) $ make clean && ./autogen.sh && make -j 8 $ sudo src/gpartedbin /dev/sdc $ git bisect good 324d99a172848e4ff3fb7eb189f490bb4e6c53e5 is the first bad commit commit 324d99a172848e4ff3fb7eb189f490bb4e6c53e5 Author: Mike Fleetwood <mike.fleetwood@googlemail.com> Date: Sat Jan 16 10:40:58 2016 +0000 Record file system block size where known (#760709) Record the file system block size in the Partition object. Only implemented for file systems when set_used_sectors() method has already parsed the value or can easily parse the value from the existing executed command(s). Needed for ext2/3/4 copies and moves performed using e2image so that they can be tracked in bytes by the ProgressBar class as e2image reports progress in file system block size units. 
Bug 760709 - Add progress bars to XFS and EXT2/3/4 file system specific copy methods :040000 040000 5a1e6ad5d1ba82192ba2ed5e414c643157bf9c5f 738c343fca494dc238c8e6e32106ec57a8271aba M include :040000 040000 c8314c2e67d9647ffa592de9feb5d718c26bc71f c767a08f5f3e98cf49c692835a4b7400a66b5241 M src $ This result has me really puzzled. I know that I had compiled and run GParted when testing the following bug report: Bug #760709 - Add progress bars to XFS and EXT2/3/4 file system specific copy methods My guess is that I only tested on a single drive -- I often do this to speed up testing. For example I have an old 160 GiB IDE (sde) drive I often use. sudo src/gpartedbin /dev/sde I will work towards capturing the backtrace. However, I have a number of other commitments so as a heads-up it might delay this effort for a few days. Regards, Curtis
If it helps I only have two file systems on the FAKE RAID mirror. $ sudo parted /dev/mapper/isw_efjbbijhh_Vol0 unit s print Model: Linux device-mapper (mirror) (dm) Disk /dev/mapper/isw_efjbbijhh_Vol0: 312494080s Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 2048s 83888127s 83886080s primary ext4 2 83888128s 104859647s 20971520s primary ntfs $ sudo blkid | grep isw /dev/sdc: TYPE="isw_raid_member" /dev/sdd: TYPE="isw_raid_member" /dev/mapper/isw_efjbbijhh_Vol0p1: LABEL="RAID-Drive" UUID="9d690ab7-bf36-4d63-99f2-96b6627c6c8e" TYPE="ext4" /dev/mapper/isw_efjbbijhh_Vol0p2: LABEL="ntfs-testing" UUID="064FFA5E61837C91" TYPE="ntfs" /dev/mapper/isw_efjbbijhh_Vol01: LABEL="RAID-Drive" UUID="9d690ab7-bf36-4d63-99f2-96b6627c6c8e" TYPE="ext4" /dev/mapper/isw_efjbbijhh_Vol02: LABEL="ntfs-testing" UUID="064FFA5E61837C91" TYPE="ntfs"
Hi Mike, Well, capturing a backtrace didn't take as long as I thought it would. Then again maybe I did it wrong... $ cat /proc/sys/kernel/core_pattern |/usr/share/apport/apport %p %s %c $ sudo service apport stop apport stop/waiting gedakc@octo:~/workspace/gparted$ cat /proc/sys/kernel/core_pattern core $ ulimit -c unlimited $ sudo src/gpartedbin ====================== libparted : 2.3 ====================== (gpartedbin:22268): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ ls -lrt core* -rw------- 1 root root 31916032 Apr 1 10:38 core $ which gpartedbin /usr/local/sbin/gpartedbin $ # Note that I am running the local src/gpartedbin directly, not from PATH $ sudo gdb src/gpartedbin core --batch --quiet \ -ex backtrace -ex quit > backtrace.log warning: Can't read pathname for load map: Input/output error. $ ls -l backtrace.log -rw-rw-r-- 1 gedakc gedakc 992 Apr 1 10:44 backtrace.log $ cat backtrace.log [New LWP 22280] [New LWP 22268] [New LWP 22272] [New LWP 22279] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `src/gpartedbin'. Program terminated with signal 5, Trace/breakpoint trap.
+ Trace 236136
I noticed the warning about "Can't read pathname..." so perhaps I didn't do something correctly. If needed I can recompile with debug options such as with: export CXXFLAGS="-g -O0" Curtis
Hi Curtis,

The crash
---------

At the moment this crash is hard to figure out.

1) Terminated by signal 5, Trace/breakpoint trap

Normally GDB reports either of these as the cause of core dumps:
Program terminated with signal 6, Aborted.
Program terminated with signal 11, Segmentation fault.
Abort is raised when an assert() test fails; Segmentation fault when out of range memory is accessed, such as by a NULL or otherwise invalid pointer.

Your crash is from signal 5, Trace/breakpoint trap. This is normally part of the mechanism a debugger uses to allow a program to run until it hits a breakpoint. The kernel delivers signal 5 (SIGTRAP) when a breakpoint is hit to inform the debugger.

Subject: Re: When is SIGTRAP raised?
http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2008-10/msg00107.html

2) Backtrace

The backtrace shows a new thread being created by glib and running some glibmm function before an exception handler function is called. It doesn't get as far as running any GParted functions in that new thread.

clone at close.S line 112
start_thread at pthread_create.c line 308
?? from libglib-2.0.so.0
?? from libglibmm-2.4.so.1
Glib::exception_handlers_invoke() from glibmm-2.4.so.1
g_log from libglib-2.0.so.0
g_logv from libglib-2.0.so.0
g_logv from libglib-2.0.so.0

This is consistent with the crashing error.

(gpartedbin:22268): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

There appear to be no debugging symbols available for the glib and glibmm libraries, hence the "from LIBRARY" instead of "at FILE line NUM".

3) GDB reporting "Can't read pathname for load map: Input/output error"

I think this is GDB reporting that it can't find the symbolic debugging information in the core file. I have seen one post hinting that this can happen if the stack of the crashing program was corrupted and overwritten.

Generating core files on Ubuntu
http://askubuntu.com/questions/414429/generating-core-files-on-ubuntu

More thoughts
-------------

Git bisect can also give an inconsistent problem commit when searching for a bug if the bug doesn't depend on the code being bisected. For example if the bug is related to configuration, is in a third party library or is related to something else in the OS setup. However crashes can also be a result of undefined behaviour, which may not be repeatable.

It is possible for OS updates to cause weird failures if, for example, they update shared libraries which are in use by some long running processes but new processes get the new copies.

So far the crash looks like it could be caused by memory corruption, possibly overwriting the stack. However commit:

commit 324d99a172848e4ff3fb7eb189f490bb4e6c53e5
Record file system block size where known (#760709)

which only adds a numeric member to a class and assigns to it, in one case using sscanf(), doesn't seem like it would be the root cause.

Next steps
----------

1) Reboot if in any doubt about whether any updates have been installed since the last reboot.

2) Install debugging symbol packages for libraries glib and gtkmm. This Ubuntu wiki article describes how.
Debug Symbol Packages
https://wiki.ubuntu.com/Debug%20Symbol%20Packages

3) Build GParted from master without optimisation.
git checkout -f master
./autogen.sh CXXFLAGS='-g -O0' && make clean && cd src && make

4) Run under valgrind.
valgrind --track-origins=yes --leak-check=full ./gpartedbin 2> valgrind.log
If memory corruption is happening I would expect the log to contain:
Invalid write of size

Thanks,
Mike

P.S. Tell me to stop teaching you to suck eggs with all the specific commands if you want.
Created attachment 325230 [details] File valgrind.log.tar.xz

Hi Mike,

Thanks again for the detailed instructions. Please do continue with the specific steps. I appreciate these as I tend to be more of a generalist -- I work in multiple computer languages, but not at extreme depth in any single one.

I installed the following debug packages:

$ sudo apt-get install libglibmm-2.4-dbg libgtkmm-2.4-dbg \
    libpangomm-1.4-dbg

Then I ran the commands you indicated:

$ git checkout -f master
$ ./autogen.sh CXXFLAGS='-g -O0' && make clean && cd src && make

Then I ran valgrind:

$ sudo valgrind --track-origins=yes --leak-check=full ./gpartedbin \
    2> valgrind.log

GParted ran much slower under valgrind and I was able to see it scan through various devices (/dev/mapper/isw_efjbbijhh_Vol0, /dev/sda, /dev/sdb) before it crashed on /dev/sdc.

I searched the valgrind.log but did not observe any "Invalid write of size" statements.

I will try running a backtrace again.

Curtis
Hi Mike, Creation of another backtrace follows: $ cat /proc/sys/kernel/core_pattern |/usr/share/apport/apport %p %s %c $ sudo service apport stop apport stop/waiting gedakc@octo:~/workspace/gparted$ cat /proc/sys/kernel/core_pattern core $ ulimit -c unlimited $ sudo src/gpartedbin ====================== libparted : 2.3 ====================== (gpartedbin:13729): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ ls -lrt core* -rw------- 1 root root 31911936 Apr 2 11:55 core $ sudo gdb src/gpartedbin core --batch --quiet \ -ex backtrace -ex quit > backtrace.log warning: Can't read pathname for load map: Input/output error. $ ls -l backtrace.log -rw-rw-r-- 1 gedakc gedakc 1051 Apr 2 11:56 backtrace.log $ cat backtrace.log [New LWP 13741] [New LWP 13733] [New LWP 13740] [New LWP 13729] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `src/gpartedbin'. Program terminated with signal 5, Trace/breakpoint trap.
+ Trace 236137
Some other things I might try are removing each of the partitions from the FAKE RAID. That might help us pinpoint if the problem is related to a specific file system (e.g., ext4 or ntfs). Would you like me to try that? Curtis
Hi Mike, From the previous backtrace it looks like I missed one of the debug packages. $ sudo apt-get install libglib2.0-0-refdbg I rebooted before recompiling. Create another backtrace using the above listed steps (I didn't list all of the steps this time): $ sudo src/gpartedbin ====================== libparted : 2.3 ====================== (gpartedbin:7538): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ ls -lrt core* -rw------- 1 root root 31911936 Apr 2 12:27 core $ sudo gdb src/gpartedbin core --batch --quiet \ -ex backtrace -ex quit > backtrace.log warning: Can't read pathname for load map: Input/output error. $ ls -l backtrace.log -rw-rw-r-- 1 gedakc gedakc 1047 Apr 2 12:28 backtrace.log $ cat backtrace.log [New LWP 7574] [New LWP 7542] [New LWP 7538] [New LWP 7573] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `src/gpartedbin'. Program terminated with signal 5, Trace/breakpoint trap.
+ Trace 236138
Curtis
Hi Mike, Doh! Still missed a debug package... $ sudo apt-get install libglib2.0-0-dbg Also missed a development package: $ sudo apt-get install libglib2.0-dev Here's the backtrace yet again (after recompile, etc.). $ sudo src/gpartedbin ====================== libparted : 2.3 ====================== (gpartedbin:11480): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler: what: basic_string::assign $ ls -lrt core* -rw------- 1 root root 31911936 Apr 2 12:40 core $ sudo gdb src/gpartedbin core --batch --quiet \ -ex backtrace -ex quit > backtrace.log warning: Can't read pathname for load map: Input/output error. 765 /build/buildd/glib2.0-2.32.4/./glib/gmessages.c: No such file or directory. $ ls -l backtrace.log -rw-rw-r-- 1 gedakc gedakc 1516 Apr 2 12:40 backtrace.log $ cat backtrace.log [New LWP 11492] [New LWP 11480] [New LWP 11484] [New LWP 11491] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `src/gpartedbin'. Program terminated with signal 5, Trace/breakpoint trap.
+ Trace 236139
From the output it looks like I might be missing something related to gmessages. Curtis
Hi Curtis,

Running under valgrind is expected to be slow. It works by running the client program on a virtual CPU using JIT so that it can track all memory allocations and reads and writes to check for out of bounds reads and writes.

Invalid read of size
--------------------

From your valgrind.log from comment #11 I see lots of "Invalid read of size"

==13127== Invalid read of size 8
==13127== at 0x7D40A2F: wcslen (wcslen.S:48)
==13127== by 0x7D4ADB9: wcscoll_l (strcoll_l.c:509)
==13127== by 0x6B696C9: g_utf8_collate (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4)

which are coming from comparison of Glib::ustring types via these function calls:

==13127== by 0x41FD6E: Glib::operator==(Glib::ustring const&, Glib::ustring const&) (ustring.h:1478)
==13127== by 0x41FDC2: Glib::operator!=(Glib::ustring const&, char const*) (ustring.h:1495)
==13127== by 0x421545: Glib::operator<(Glib::ustring const&, Glib::ustring const&) (ustring.h:1504)

And these too, which come from something in the X11 client font system.

==13127== Invalid read of size 4
==13127== at 0x9BC70D3: ??? (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
==13127== by 0x9BC9464: FcConfigFilename (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
==13127== by 0x9BDCA15: FcConfigParseAndLoad (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)

I also get similar Invalid read of size 8 and 4 on my Kubuntu 12.04 LTS virtual machine, however it doesn't crash GParted. I don't get these on any other distro I have tried, not CentOS 5 and not CentOS 6.

Invalid reads don't lead to wholesale overwriting of the stack, which is possibly what GDB is indicating with the warning "Can't read pathname for load map: Input/output error".

Next steps
----------

It seems valgrind is not helping! This is not going to be easy to find. At the moment I am doubting that this bug is recent. There are hints it is related to overrunning a string somewhere. It is specific to something in the configuration on your machine.

1) Device details

It looks odd that there are both /dev/mapper/isw_efjbbijhh_Vol0[12] and /dev/mapper/isw_efjbbijhh_Vol0p[12] entries pointing at different device mapper devices. Can you run the following commands so I can get a better understanding of the device configuration on your machine:

# cat /proc/partitions
# ls -l /dev/dm*
# dmraid -s
# dmraid -r
# dmsetup ls
# dmsetup table

2) Test older GParted releases

Can you try manually bisecting GParted releases? That is, build GParted from git release tags, test, and approximately binary chop.

git checkout -f GPARTED_0_10_0
./autogen.sh && make clean && cd src && make
cp gpartedbin ~/bin/gpartedbin-0.10.0
cd ..

(Add "-j 8" as desired to the make command)

Test:
~/bin/gpartedbin-0.10.0 /dev/sdc

(I find it useful to keep old gpartedbin exes lying about just for this type of investigation.)

Look at git tag -l output and pick the next release to test based on whether 0.10.0 crashes or not. If it works OK try 0.16.0, and if it crashes try 0.6.4. Continue binary chopping through the releases like this. Probably not worth going older than about 0.5.0.

Mike
Hi Mike,

Thank you for your continued help to troubleshoot the crash problem on my development computer. This does look like a tough bug to track down.

Following are answers to your questions, and some additional information from my own investigation.

Re: 1) Device Details

Note that when Kubuntu 12.04 LTS boots, only the three /dev/mapper/isw_efjbbijhh_Vol0p[12] and /dev/dm-? devices are created. When GParted is started the DMRaid class creates the other two. I did this way back because some distros use isw_efjbbijhh_Vol0[12] and others use isw_efjbbijhh_Vol0p[12].

root@octo:~# cat /proc/partitions
major minor #blocks name

8 0 1953514584 sda
8 1 262144 sda1
8 2 83886080 sda2
8 3 125829120 sda3
8 4 1 sda4
8 5 33554432 sda5
8 6 41944064 sda6
8 7 41943040 sda7
8 8 125829120 sda8
8 9 167772160 sda9
8 10 335544320 sda10
8 11 41943040 sda11
8 12 629145600 sda12
8 13 200022016 sda13
8 16 117220824 sdb
8 17 104857600 sdb1
8 18 1 sdb2
8 21 4789248 sdb5
8 32 156250000 sdc
8 48 156250000 sdd
11 0 1048575 sr0
8 64 156290904 sde
8 65 20971520 sde1
8 66 10485760 sde2
252 0 156247040 dm-0
252 1 41943040 dm-1
252 2 10485760 dm-2
252 3 41943040 dm-3
252 4 10485760 dm-4

root@octo:~# ls -l /dev/dm*
brw-rw---- 1 root disk 252, 0 Apr 3 10:09 /dev/dm-0
brw-rw---- 1 root disk 252, 1 Apr 3 10:09 /dev/dm-1
brw-rw---- 1 root disk 252, 2 Apr 3 10:09 /dev/dm-2
brw-rw---- 1 root disk 252, 3 Apr 3 10:09 /dev/dm-3
brw-rw---- 1 root disk 252, 4 Apr 3 10:09 /dev/dm-4

root@octo:~# dmraid -s
*** Group superset isw_efjbbijhh
--> Active Subset
name : isw_efjbbijhh_Vol0
size : 312494080
stride : 128
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0

root@octo:~# dmraid -r
/dev/sdd: isw, "isw_efjbbijhh", GROUP, ok, 312499998 sectors, data@ 0
/dev/sdc: isw, "isw_efjbbijhh", GROUP, ok, 312499998 sectors, data@ 0

root@octo:~# dmsetup ls
isw_efjbbijhh_Vol0p1 (252, 1)
isw_efjbbijhh_Vol0 (252, 0)
isw_efjbbijhh_Vol02 (252, 4)
isw_efjbbijhh_Vol01 (252, 3)
isw_efjbbijhh_Vol0p2 (252, 2)

root@octo:~# dmsetup table
isw_efjbbijhh_Vol0p1: 0 83886080 linear 252:0 2048
isw_efjbbijhh_Vol0: 0 312494080 mirror core 2 131072 nosync 2 8:48 0 8:32 0 1 handle_errors
isw_efjbbijhh_Vol02: 0 20971520 linear 252:0 83888128
isw_efjbbijhh_Vol01: 0 83886080 linear 252:0 2048
isw_efjbbijhh_Vol0p2: 0 20971520 linear 252:0 83888128
root@octo:~#

Re: 2) Test older GParted releases

I haven't gone back to test old releases like GPARTED_0_10_0 yet because I do think the problem is recently introduced.

Two things that I tested that lead me to believe the problem is invoked (not necessarily caused) by recent changes to GParted are:

A) If I checkout and compile GPARTED_0_25_0 on my kubuntu 12.04 LTS, then the resulting gpartedbin runs with no problems.

B) In order to see if the problem is related to recent package upgrades on my kubuntu 12.04 LTS, I booted in my older kubuntu *11.04* partition (yes, I'm a pack-rat and still have my old dev env from years ago :-). From 11.04 I checked out the recent gparted master, compiled and ran gpartedbin and this crashed.

From this I infer that it is the gparted code that is causing the crash.

Next I will try the old fashioned style of debugging by adding print statements in the code to see if I can narrow down where the problem originates. My initial guess is that it is something to do with string handling when either ext4 or ntfs commands are invoked to retrieve partition and file system information.

Curtis
Created attachment 325271 [details] git diff of DEBUG print statements added to src/ntfs.cc

Hi Mike,

I've made some good progress. The crash occurs within ntfs::set_used_sectors, and does appear to be caused by the code additions in:

commit 324d99a172848e4ff3fb7eb189f490bb4e6c53e5
Record file system block size where known (#760709)

More specifically, when the code is dealing with the "Cluster size" it looks like we are trying to access beyond the length of the output string.

I've attached a "git diff" of the print statements I used for debugging. When the code with the debug statements is compiled and run I see the following output:

$ sudo src/gpartedbin /dev/sdc
======================
libparted : 2.3
======================
DEBUG: Begin ntfs::set_used_sectors
DEBUG: Check for Current volume size:
DEBUG: Check for resize at
DEBUG: Check for ERROR: Volume is full
DEBUG: Check for Cluster size
DEBUG: index = 4294967295
DEBUG: output.length() = 214
DEBUG: output.npos = 18446744073709551615
DEBUG: begin \noutput\n
ntfsresize v2013.1.13AA.0 (libntfs-3g)
ERROR(2): Failed to check '/dev/sdc2' mount state: No such file or directory
Probably /etc/mtab is missing. It's too risky to continue. You might try
an another Linux distro.
DEBUG: end \noutput\n

(gpartedbin:4000): glibmm-ERROR **: unhandled exception (type std::exception) in signal handler:
what: basic_string::assign
$

I think we need a check to ensure that index is not larger than output.length() prior to using index to access the output string. For example:

if ( index >= output.length() || index == output.npos || sscanf( output.substr( index ).c_str(), "Cluster size : %Ld", &S ) != 1 ) S = -1;

If you create a patch for this in the original bug 760709 report then I can test it on my system to ensure that all is well.

Curtis
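A minimal sketch of the kind of guarded parse suggested in the comment above, offered as an illustration rather than the patch that was eventually written. Assumptions: std::string stands in for Glib::ustring, parse_cluster_size is a hypothetical helper name, and the index is held in std::string::size_type so that the comparison against npos is exact.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not GParted's ntfs::set_used_sectors): extract the
// number following "Cluster size" from command output, or return -1 when
// the text is absent or malformed.
long long parse_cluster_size(const std::string& output)
{
    // size_type is wide enough to hold npos, so nothing is truncated.
    const std::string::size_type index = output.find("Cluster size");
    assert(index == std::string::npos || index < output.length());
    if (index == std::string::npos)
        return -1;

    long long size = -1;
    // Whitespace in the format matches any run of whitespace in the input,
    // so padded output such as "Cluster size       : 4096" is accepted.
    if (std::sscanf(output.c_str() + index, "Cluster size : %lld", &size) != 1)
        return -1;
    return size;
}
```

With the error output shown in the debug log (no "Cluster size" line at all) this returns -1 instead of indexing past the end of the string.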
Hi Mike,

POST CRASH ANALYSIS

Please accept my apologies for providing some incorrect information at the start of the crash reports.

In comment #7 I stated that git bisect was landing on different commits. I see now that this happened because I did not start with an actual known good commit. I assumed that the latest git master was good because I had not encountered any problems with testing and committing other patches. In hindsight I should have started with GPARTED_0_25_0.

This misleading fact led us to do more investigation than should have been required. On the plus side I did learn some more in-depth debugging in the process.

Thanks for the debugging assistance,
Curtis
Hi Curtis,

Excellent detective and debugging work. (Printf is still the king of debugging :-).

The reason it works for me on my Kubuntu 12.04 LTS is that it's a 32-bit install whereas yours is a 64-bit install. This is my equivalent debugging for the same case:

D: ntfs::set_used_sectors(partition) partition.get_path()="/dev/sdc2"
D: output="ntfsresize v2012.1.15AR.1 (libntfs-3g)
ERROR(2): Failed to check '/dev/sdc2' mount state: No such file or directory
Probably /etc/mtab is missing. It's too risky to continue. You might try
an another Linux distro.
"
D: output.find("Cluster size")=4294967295
D: index=output.find("Cluster size")
D: index=4294967295

find() is returning string::npos when no match is found. However on a 32-bit machine this is only 4294967295, but on a 64-bit machine it is 18446744073709551615. The 64-bit value is being truncated when stored in index, which is only a 32-bit unsigned int type.

So index shouldn't be unsigned int, it should be ustring::size_type (== size_t) to make it big enough to store npos. I thought I was being "more" correct comparing the result of find() to npos, but that turned out to be wrong given that the value was being truncated first!

I'll write a suitable patch or two for this.

Thanks,
Mike

P.S. The canonical way to write a new line in C++ is with std::endl like this:
std::cout << "My text" << std::endl;
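The truncation described above can be reproduced in a few lines. This is a sketch under stated assumptions: std::string stands in for Glib::ustring, and the two function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <string>

// Buggy pattern: the result of find() is narrowed to unsigned int before
// being compared with npos. Where size_type is 64 bits wide, npos becomes
// 4294967295 after narrowing and no longer compares equal to npos.
bool not_found_truncated(const std::string& output)
{
    unsigned int index = static_cast<unsigned int>(output.find("Cluster size"));
    return index == std::string::npos;
}

// Corrected pattern: keep the index in size_type so npos survives intact.
bool not_found_exact(const std::string& output)
{
    const std::string::size_type index = output.find("Cluster size");
    // find() yields either npos or a valid position inside the string.
    assert(index == std::string::npos || index < output.size());
    return index == std::string::npos;
}
```

On a build where unsigned int and size_type have the same width the two functions agree, which matches the report that the 32-bit install did not crash. Where size_type is wider, not_found_truncated() returns false for output that lacks the text, so the caller goes on to use the bogus index.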
Hi Curtis,

No need to apologise. We all make mistakes. Like I said above, I find it useful to have some old GParted executables lying about to quickly test and find when bugs are first introduced.

Thanks,
Mike

By the way on my Kubuntu 12.04 LTS VM it only has /dev/mapper/fakeraid? links and no /dev/mapper/fakeraidp? links. This is when the Fake RAID was initially created and after rebooting too. Could it be related to you using the RAID name of "Vol0" (ending in a digit) and me using "MyRaid" (not ending in a digit)?

$ sudo dmraid -r
[sudo] password for kubuntu:
/dev/sdc: isw, "isw_bfaiacejhb", GROUP, ok, 16777214 sectors, data@ 0
/dev/sdb: isw, "isw_bfaiacejhb", GROUP, ok, 16777214 sectors, data@ 0

$ sudo parted /dev/mapper/isw_bfaiacejhb_MyRaid print
Model: Linux device-mapper (mirror) (dm)
Disk /dev/mapper/isw_bfaiacejhb_MyRaid: 8585MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
1 1049kB 1075MB 1074MB primary ext4
2 1075MB 2149MB 1074MB primary ntfs

$ ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Apr 3 19:32 control
lrwxrwxrwx 1 root root 7 Apr 3 20:12 isw_bfaiacejhb_MyRaid -> ../dm-0
lrwxrwxrwx 1 root root 7 Apr 3 20:12 isw_bfaiacejhb_MyRaid1 -> ../dm-1
lrwxrwxrwx 1 root root 7 Apr 3 20:12 isw_bfaiacejhb_MyRaid2 -> ../dm-2

$ sudo dmsetup ls
isw_bfaiacejhb_MyRaid (252, 0)
isw_bfaiacejhb_MyRaid2 (252, 2)
isw_bfaiacejhb_MyRaid1 (252, 1)
Hi Mike,

> By the way on my Kubuntu 12.04 LTS VM it only has /dev/mapper/fakeraid?
> links and no /dev/mapper/fakeraidp? links. This is when the Fake RAID
> was initially created and after rebooting too. Could it be related to
> you using the RAID name of "Vol0" (ending in a digit) and me using
> "MyRaid" (not ending in a digit)?

Yes. The last character of the device name is important.

The generally accepted way of naming partitions in Linux is as follows:

If the device name ends in a number, then append a "p" followed by the partition number. E.g.
Device name: isw_efjbbijhh_Vol0
yields
Partition name: isw_efjbbijhh_Vol0p1

Else (the device name does not end in a number), then append only the partition number. E.g.
Device name: isw_efjbbijhh_Vol
yields
Partition name: isw_efjbbijhh_Vol1

I have never found this partition naming rule explicitly stated before, but it is common practice.

The one glaring exception I encountered was with DMRAID. Many versions of DMRAID append only the partition number to the device name, regardless of whether the device name ends in a number. The problem with such a naming scheme is that it is impossible to determine if a path is a device name or a partition name from just the name. For example:

isw_efjbbijhh_Vol12

Is this:
a) The twelfth disk device
b) The second partition on the first disk device isw_efjbbijhh_Vol1
c) The twelfth partition on disk device isw_efjbbijhh_Vol

This confusion is avoided if the above listed generally accepted partition naming rule is used.

Newer versions of DMRAID in Ubuntu use the generally accepted partition naming rule because Phillip Susi patched this in Ubuntu. Unfortunately the patches never made it into the upstream DMRAID because it appears the project is no longer maintained.

Curtis
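The naming rule described in this comment can be sketched in C++ as follows. Both function names are hypothetical and exist only for this illustration; neither is taken from GParted, DMRAID or the kernel.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Generally accepted rule: insert "p" before the partition number only
// when the device name itself ends in a digit.
std::string partition_name(const std::string& device, int number)
{
    assert(!device.empty() && number > 0);
    const bool ends_in_digit =
        std::isdigit(static_cast<unsigned char>(device.back())) != 0;
    return device + (ends_in_digit ? "p" : "") + std::to_string(number);
}

// Older DMRAID behaviour as described above: always append just the number.
std::string legacy_dmraid_partition_name(const std::string& device, int number)
{
    assert(!device.empty() && number > 0);
    return device + std::to_string(number);
}
```

Under the legacy scheme, partition 2 of isw_efjbbijhh_Vol1 and partition 12 of isw_efjbbijhh_Vol both come out as isw_efjbbijhh_Vol12, which is the ambiguity listed above. The "p" rule yields isw_efjbbijhh_Vol1p2 and isw_efjbbijhh_Vol12 respectively, so the two stay distinct.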
Bug 764658 "GParted crashes when reading NTFS usage when there is no /dev/PTN entry" has been raised to record the code changes to fix the core dump because it is separate to the code fixes in this report.
Hi Mike,

Now that we have the NTFS crash out of the way (aka fixed), I have resumed testing of patch set (v1) in comment #1.

I have successfully compiled and run GParted on the following distros:
debian 7
debian 8
fedora 23
kubuntu 12.04
openSUSE 13.2
ubuntu 14.04
ubuntu 15.10

Patch set (v1) from comment #1 has been committed to the git repository.

The relevant git commits can be viewed at the following links:

Use realpath() safely (#764369)
https://git.gnome.org/browse/gparted/commit/?id=d04826cc27462431b5f43e132c0382e6b6debd6d

Minor tidyup in load_proc_partitions_info_cache()
https://git.gnome.org/browse/gparted/commit/?id=e0a208576d00bcdf175fc401653a53483b41a87a

Curtis
This enhancement was included in the GParted 0.26.0 release on April 26, 2016.