Bug 764369 – Use realpath() safely

Summary:	Use realpath() safely


Status:	RESOLVED FIXED

Product:	gparted
Classification:	Other
Component:	application
Version:	GIT HEAD
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Mike Fleetwood
QA Contact:	gparted maintainers alias

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-03-30 14:49 UTC by Mike Fleetwood
Modified:	2016-04-26 15:56 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Use realpath() safely (v1) (7.92 KB, patch) 2016-03-30 16:50 UTC, Mike Fleetwood
File valgrind.log.tar.xz (57.09 KB, application/x-compressed-tar) 2016-04-02 17:52 UTC, Curtis Gedak
git diff of DEBUG print statements added to src/ntfs.cc (2.44 KB, text/plain) 2016-04-03 17:17 UTC, Curtis Gedak

Description Mike Fleetwood 2016-03-30 14:49:36 UTC

Quoting one of my commit messages:

    realpath(3) manual page says:
    
        BUGS
            The POSIX.1-2001 standard version of this function is broken by
            design, since it is impossible to determine a suitable size for
            the output buffer, resolved_path.  According to POSIX.1-2001 a
            buffer of size PATH_MAX suffices, but PATH_MAX need not be a
            defined constant, and may have to be obtained using pathconf(3).
            And asking pathconf(3) does not really help, since, on the one
            hand POSIX warns that the result of pathconf(3) may be huge and
            unsuitable for mallocing memory, and on the other hand
            pathconf(3) may return -1 to signify that PATH_MAX is not
            bounded.  The resolved_path == NULL feature, not standardized in
            POSIX.1-2001, but standardized in POSIX.1-2008, allows this
            design problem to be avoided.
    
    The resolved_path == NULL feature of realpath() has existed as a Glibc
    extension since realpath() was first added to Glibc 1.90, released in
    June 1996.  Therefore it can be used unconditionally.
    
        https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fa0bc87c32d02cd81ec4d0ae00e0d943c683e6e1
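
A minimal sketch of the resolved_path == NULL form of realpath()
described in the commit message above.  The helper name canonical_path()
is hypothetical and is not taken from the attached patch; it only
illustrates letting realpath() allocate the buffer instead of guessing
at PATH_MAX.

```cpp
#include <stdlib.h>   // realpath(), free()
#include <string>

// Hypothetical helper (not from the GParted patch): return the canonical
// form of `path`, or an empty string if it cannot be resolved.
std::string canonical_path(const std::string& path)
{
    // Passing NULL as the second argument asks realpath() to malloc() a
    // buffer of the required size, so no PATH_MAX sized buffer is needed.
    char* resolved = realpath(path.c_str(), NULL);
    if (resolved == NULL)
        return std::string();   // errno holds the reason, e.g. ENOENT

    std::string result(resolved);
    free(resolved);             // the buffer came from malloc()
    return result;
}
```

The caller owns the returned buffer, so the sketch copies it into a
std::string and frees it straight away to avoid a leak.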

Comment 1 Mike Fleetwood 2016-03-30 16:50:07 UTC

Created attachment 325033
Use realpath() safely (v1)

Hi Curtis,

Here is the patchset to fix this.

Thanks,
Mike

Comment 2 Curtis Gedak 2016-03-31 16:39:38 UTC

Hi Mike,

Thank you for the patch.

I just started testing and have run into a problem.

When I compile and run gpartedbin on kubuntu 12.04 LTS (my development box) it crashes with the following message:

$ sudo src/gpartedbin 
======================
libparted : 2.3
======================

(gpartedbin:21963): glibmm-ERROR **: 
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign


I can run additional tests if needed.  Just let me know.

Curtis

Comment 3 Curtis Gedak 2016-03-31 16:55:08 UTC

I have narrowed the crash down further.  It occurs with the two disk devices in my Intel Software RAID aka FAKE RAID (/dev/sdb and /dev/sdd).  It does not occur with the actual RAID device (/dev/mapper/isw_efjbbijhh_Vol0).  This is a RAID1 mirror configuration.

The related device path entries are as follows:

$ ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Mar 31 10:44 /dev/sda
brw-rw---- 1 root disk 8,  1 Mar 31 10:44 /dev/sda1
brw-rw---- 1 root disk 8, 10 Mar 31 09:14 /dev/sda10
brw-rw---- 1 root disk 8, 11 Mar 31 09:14 /dev/sda11
brw-rw---- 1 root disk 8, 12 Mar 31 09:14 /dev/sda12
brw-rw---- 1 root disk 8, 13 Mar 31 10:44 /dev/sda13
brw-rw---- 1 root disk 8,  2 Mar 31 10:44 /dev/sda2
brw-rw---- 1 root disk 8,  3 Mar 31 09:14 /dev/sda3
brw-rw---- 1 root disk 8,  4 Mar 31 10:44 /dev/sda4
brw-rw---- 1 root disk 8,  5 Mar 31 09:14 /dev/sda5
brw-rw---- 1 root disk 8,  6 Mar 31 10:44 /dev/sda6
brw-rw---- 1 root disk 8,  7 Mar 31 10:44 /dev/sda7
brw-rw---- 1 root disk 8,  8 Mar 31 09:14 /dev/sda8
brw-rw---- 1 root disk 8,  9 Mar 31 09:14 /dev/sda9
brw-rw---- 1 root disk 8, 16 Mar 31 09:14 /dev/sdb
brw-rw---- 1 root disk 8, 17 Mar 31 09:14 /dev/sdb1
brw-rw---- 1 root disk 8, 18 Mar 31 09:14 /dev/sdb2
brw-rw---- 1 root disk 8, 21 Mar 31 09:14 /dev/sdb5
brw-rw---- 1 root disk 8, 32 Mar 31 10:45 /dev/sdc
brw-rw---- 1 root disk 8, 48 Mar 31 10:45 /dev/sdd
brw-rw---- 1 root disk 8, 64 Mar 31 10:45 /dev/sde
brw-rw---- 1 root disk 8, 65 Mar 31 09:14 /dev/sde1
brw-rw---- 1 root disk 8, 66 Mar 31 09:14 /dev/sde2

$ ls -l /dev/mapper
total 0
crw------- 1 root root 10, 236 Mar 31 09:14 control
lrwxrwxrwx 1 root root       7 Mar 31 10:44 isw_efjbbijhh_Vol0 -> ../dm-0
lrwxrwxrwx 1 root root       7 Mar 31 10:31 isw_efjbbijhh_Vol01 -> ../dm-3
lrwxrwxrwx 1 root root       7 Mar 31 10:31 isw_efjbbijhh_Vol02 -> ../dm-4
lrwxrwxrwx 1 root root       7 Mar 31 10:44 isw_efjbbijhh_Vol0p1 -> ../dm-1
lrwxrwxrwx 1 root root       7 Mar 31 10:44 isw_efjbbijhh_Vol0p2 -> ../dm-2


Of note is that because the drives are in a mirror, they have a visible partition table and partitions (at least to fdisk and parted), but do not have the corresponding partition path entries.  For example there is no /dev/sdc1 or /dev/sdc2.

Curtis

Comment 4 Curtis Gedak 2016-03-31 16:59:11 UTC

Typo in comment #3.  The RAID devices are *sdc* and *sdd*.

Comment 5 Mike Fleetwood 2016-03-31 19:19:37 UTC

Hi Curtis,

Sorry about the crash.  Can you please bisect it to the causing commit?
(I assume it will be patch number 1, but need to be sure it's not in HEAD
already).

Can you also generate a backtrace please?

Thanks,
Mike


How to capture a backtrace from a coredump
------------------------------------------

1)  Turn off any OS core dump capturing, ensuring:
        cat /proc/sys/kernel/core_pattern
    reports just "core"

    Some methods to do this, depending on distro version, are:
    *   service abrt-ccpp stop
    *   systemctl stop abrtd
    *   sudo service apport stop

2)  Increase core dump limit and run GParted as root
    Either:
        ulimit -c unlimited
        sudo gparted
    or:
        su - root
        ulimit -c unlimited
        gparted

3)  Perform crashing action

4)  Capture backtrace

        ls -lrt core*
        which gpartedbin
        gdb `which gpartedbin` {COREFILE} --batch --quiet \
            -ex backtrace -ex quit > backtrace.log

Please paste the terminal output when running gparted and the contents
of the backtrace.log file.

Comment 6 Mike Fleetwood 2016-04-01 12:51:14 UTC

Hi Curtis,

I tried creating a Fake RAID array with Intel Software RAID format in
both CentOS 7 and Kubuntu 12.04 LTS VMs, but was not able to produce a
crash in GParted.

# dmraid -f isw -C MyRaid --type 1 --disk /dev/sdc,/dev/sdd
# dmraid -ay
# .../gpartedbin

Going to need the backtrace for guidance about where it is going wrong
and possibly going to have to debug via you and your machine.

Mike

Comment 7 Curtis Gedak 2016-04-01 16:21:05 UTC

Hi Mike,

Thanks for the detailed instructions.

No worries on the crash.  It's great to have you on the team.  :-)
I've introduced more than my share of problems and several made it
through to a production release.

Regarding git bisect I seem to be challenged with the process.  I've
had it land on different commits as the problem.  In an attempt to get
past this I am running the bisect starting with GParted 0.25.0 *AND*
running make clean before each and every build.  Hopefully this will
help me pinpoint the exact commit.


$ git bisect start
$ git bisect good 976eea771c5374eca2ae845066577d222715de8e
$ git bisect bad 3d1262b8bc658e1167eea8dc8eeecb8daa260b61
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[ad4191475a228a5db2d150597bb15a1f59d9284c] Rename file system from "crypt-luks" to "luks" (#760080)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
$ git bisect good
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[27e30a570ff50509d710ee8dccdad848f01c0e4d] Remove unused OperationDetail members (#760709)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
======================
libparted : 2.3
======================

(gpartedbin:24088): glibmm-ERROR **:
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ git bisect bad
Bisecting: 11 revisions left to test after this (roughly 4 steps)
[608060f82dae66cca0a8be2590bdd12ddcdf8be7] Update ext2 resize progress tracker to use the new ProgressBar (#760709)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
$ git bisect good
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[b0bd4650982f5c804eae24eb59ec8ac015be5ba1] Fix formatting of negative time values (#760709)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
$ git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[965d88d197c4c932bcd5cf4654e4cd44ca997377] Call any FS specific progress trackers for stderr updates too (#760709)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
======================
libparted : 2.3
======================

(gpartedbin:4966): glibmm-ERROR **:
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 1 step)
[324d99a172848e4ff3fb7eb189f490bb4e6c53e5] Record file system block size where known (#760709)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
======================
libparted : 2.3
======================

(gpartedbin:9334): glibmm-ERROR **: 
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[809a7e095444f881256871d915b460af5cb4bab2] Display progress of XFS file system specific copy operation (#760709)
$ make clean && ./autogen.sh && make -j 8
$ sudo src/gpartedbin /dev/sdc
$ git bisect good
324d99a172848e4ff3fb7eb189f490bb4e6c53e5 is the first bad commit
commit 324d99a172848e4ff3fb7eb189f490bb4e6c53e5
Author: Mike Fleetwood <mike.fleetwood@googlemail.com>
Date:   Sat Jan 16 10:40:58 2016 +0000

    Record file system block size where known (#760709)

    Record the file system block size in the Partition object.  Only
    implemented for file systems when set_used_sectors() method has already
    parsed the value or can easily parse the value from the existing
    executed command(s).

    Needed for ext2/3/4 copies and moves performed using e2image so that
    they can be tracked in bytes by the ProgressBar class as e2image reports
    progress in file system block size units.

    Bug 760709 - Add progress bars to XFS and EXT2/3/4 file system specific
                 copy methods

:040000 040000 5a1e6ad5d1ba82192ba2ed5e414c643157bf9c5f 738c343fca494dc238c8e6e32106ec57a8271aba M       include
:040000 040000 c8314c2e67d9647ffa592de9feb5d718c26bc71f c767a08f5f3e98cf49c692835a4b7400a66b5241 M       src
$


This result has me really puzzled.  I know that I had compiled and run
GParted when testing the following bug report:

   Bug #760709 - Add progress bars to XFS and EXT2/3/4 file system
                 specific copy methods

My guess is that I only tested on a single drive -- I often do this to
speed up testing.  For example I have an old 160 GiB IDE (sde) drive I
often use.

   sudo src/gpartedbin /dev/sde

I will work towards capturing the backtrace.  However, I have a number
of other commitments so as a heads-up it might delay this effort for a
few days.

Regards,
Curtis

Comment 8 Curtis Gedak 2016-04-01 16:30:07 UTC

If it helps I only have two file systems on the FAKE RAID mirror.

$ sudo parted /dev/mapper/isw_efjbbijhh_Vol0 unit s print
Model: Linux device-mapper (mirror) (dm)
Disk /dev/mapper/isw_efjbbijhh_Vol0: 312494080s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start      End         Size       Type     File system  Flags
 1      2048s      83888127s   83886080s  primary  ext4
 2      83888128s  104859647s  20971520s  primary  ntfs

$ sudo blkid | grep isw
/dev/sdc: TYPE="isw_raid_member" 
/dev/sdd: TYPE="isw_raid_member" 
/dev/mapper/isw_efjbbijhh_Vol0p1: LABEL="RAID-Drive" UUID="9d690ab7-bf36-4d63-99f2-96b6627c6c8e" TYPE="ext4" 
/dev/mapper/isw_efjbbijhh_Vol0p2: LABEL="ntfs-testing" UUID="064FFA5E61837C91" TYPE="ntfs" 
/dev/mapper/isw_efjbbijhh_Vol01: LABEL="RAID-Drive" UUID="9d690ab7-bf36-4d63-99f2-96b6627c6c8e" TYPE="ext4" 
/dev/mapper/isw_efjbbijhh_Vol02: LABEL="ntfs-testing" UUID="064FFA5E61837C91" TYPE="ntfs"

Comment 9 Curtis Gedak 2016-04-01 16:49:08 UTC

Hi Mike,

Well, capturing a backtrace didn't take as long as I thought it would.
Then again maybe I did it wrong...


$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c
$ sudo service apport stop
apport stop/waiting
gedakc@octo:~/workspace/gparted$ cat /proc/sys/kernel/core_pattern
core
$ ulimit -c unlimited
$ sudo src/gpartedbin
======================
libparted : 2.3
======================

(gpartedbin:22268): glibmm-ERROR **:
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ ls -lrt core*
-rw------- 1 root root 31916032 Apr  1 10:38 core
$ which gpartedbin
/usr/local/sbin/gpartedbin
$ # Note that I am running the local src/gpartedbin directly, not from PATH
$ sudo gdb src/gpartedbin core --batch --quiet \
      -ex backtrace -ex quit > backtrace.log

warning: Can't read pathname for load map: Input/output error.
$ ls -l backtrace.log
-rw-rw-r-- 1 gedakc gedakc 992 Apr  1 10:44 backtrace.log
$  cat backtrace.log
[New LWP 22280]
[New LWP 22268]
[New LWP 22272]
[New LWP 22279]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `src/gpartedbin'.
Program terminated with signal 5, Trace/breakpoint trap.

#0 g_logv
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#0 g_logv
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 g_log
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 Glib::exception_handlers_invoke()
from /usr/lib/x86_64-linux-gnu/libglibmm-2.4.so.1
#3 ??
from /usr/lib/x86_64-linux-gnu/libglibmm-2.4.so.1
#4 ??
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 start_thread
at pthread_create.c line 308
#6 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
#7 ??



I noticed the warning about "Can't read pathname..." so perhaps I
didn't do something correctly.

If needed I can recompile with debug options such as with:

  export CXXFLAGS="-g -O0"


Curtis

Comment 10 Mike Fleetwood 2016-04-02 13:29:14 UTC

Hi Curtis,


The crash
---------

At the moment this crash is hard to figure out.


1) Terminated by signal 5, Trace/breakpoint trap

Normally GDB reports either of these as the cause of core dumps:
    Program terminated with signal 6, Aborted.
    Program terminated with signal 11, Segmentation fault.
An Abort occurs when an assert() test fails; a Segmentation fault occurs
when out of range memory is accessed, such as via a NULL or otherwise
invalid pointer.

Your crash is from signal 5, Trace/breakpoint trap.  This is normally
part of the mechanism a debugger uses to allow a program to run until
it hits a breakpoint.  The kernel delivers signal 5 (SIGTRAP) when a
breakpoint is hit to inform the debugger.

Subject: Re: When is SIGTRAP raised?
http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2008-10/msg00107.html
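
As a small illustration of the signal itself, the sketch below installs
a handler, raises SIGTRAP from within the process and records the signal
number delivered.  The function names are hypothetical and this is not
how GParted or glib raise the signal; it only shows that a program can
receive SIGTRAP without any debugger being attached.  Without a handler
the default action terminates the process and dumps core, which matches
the "Program terminated with signal 5" line reported by GDB.

```cpp
#include <csignal>

// Written by the handler.  volatile sig_atomic_t is the type that is
// safe to assign from inside a signal handler.
static volatile std::sig_atomic_t last_signal = 0;

// Handler: just remember which signal arrived.
extern "C" void record_signal(int signum)
{
    last_signal = signum;
}

// Hypothetical demo: raise SIGTRAP in this process and return the
// signal number that the handler observed.
int raise_and_catch_sigtrap()
{
    last_signal = 0;
    std::signal(SIGTRAP, record_signal);   // replace the default action
    std::raise(SIGTRAP);                   // handler runs before raise() returns
    std::signal(SIGTRAP, SIG_DFL);         // restore the default action
    return last_signal;
}
```

On Linux the value returned is 5, the same number shown in the GDB
output above.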


2) Backtrace

The backtrace shows a new thread being created by glib and running some
glibmm function before an exception handler function is called.  It
doesn't get as far as running any GParted functions in that new thread.
    clone           at clone.S line 112
    start_thread    at pthread_create.c line 308
    ??              from libglib-2.0.so.0
    ??              from libglibmm-2.4.so.1
    Glib::exception_handlers_invoke()  from libglibmm-2.4.so.1
    g_log           from libglib-2.0.so.0
    g_logv          from libglib-2.0.so.0
    g_logv          from libglib-2.0.so.0
This is consistent with the crashing error.
    (gpartedbin:22268): glibmm-ERROR **:
    unhandled exception (type std::exception) in signal handler:
    what: basic_string::assign
There appear to be no debugging symbols available for the glib and
glibmm libraries, hence the "from LIBRARY" instead of
"at FILE line NUM".


3) GDB reporting "Can't read pathname for load map: Input/output error"

I think this is GDB reporting that it can't find the symbolic debugging
information in the core file.  I have seen one post hinting that this
can happen if the stack of the crashing program was corrupted and
overwritten.

Generating core files on Ubuntu
http://askubuntu.com/questions/414429/generating-core-files-on-ubuntu


More thoughts
-------------

Git bisect can also identify inconsistent problem commits when the bug
doesn't depend on the code being bisected.  For example, if the bug is
related to configuration, is in a third party library or is related to
something else in the OS setup.  However, crashes can also be the
result of undefined behaviour, which may not be repeatable.

It is possible for OS updates to cause weird failures if for example
they update shared libraries which are in use by some long running
processes but new processes get the new copies.

So far the crash looks like it could be caused by memory corruption,
possibly overwriting the stack.  However commit:
    commit 324d99a172848e4ff3fb7eb189f490bb4e6c53e5
    Record file system block size where known (#760709)
which only adds a numeric member to a class and assigns to it, in one
case using sscanf(), doesn't seem like it would be the root cause.


Next steps
----------

1) Reboot if in any doubt about whether any updates have been installed
   since the last reboot.

2) Install debugging symbol packages for libraries glib and gtkmm.  This
   Ubuntu wiki article describes how.

    Debug Symbol Packages
    https://wiki.ubuntu.com/Debug%20Symbol%20Packages

3) Build GParted from master without optimisation.

    git checkout -f master
    ./autogen.sh CXXFLAGS='-g -O0' && make clean && cd src && make

4) Run under valgrind.

    valgrind --track-origins=yes --leak-check=full ./gpartedbin 2> valgrind.log

   If memory corruption is happening I would expect the log to contain:
    Invalid write of size


Thanks,
Mike

P.S. Tell me to stop teaching you to suck eggs with all the specific
     commands if you want.

Comment 11 Curtis Gedak 2016-04-02 17:52:38 UTC

Created attachment 325230
File valgrind.log.tar.xz

Hi Mike,

Thanks again for the detailed instructions.  Please do continue with
the specific steps.  I appreciate these as I tend to be more of a
generalist -- I work in multiple computer languages, but not at
extreme depth in any single one.

I installed the following debug packages:

$ sudo apt-get install libglibmm-2.4-dbg libgtkmm-2.4-dbg \
                       libpangomm-1.4-dbg

Then I ran the commands you indicated:

$ git checkout -f master
$ ./autogen.sh CXXFLAGS='-g -O0' && make clean && cd src && make

Then I ran valgrind:

$ sudo valgrind --track-origins=yes --leak-check=full ./gpartedbin \
       2> valgrind.log

GParted ran much slower under valgrind and I was able to see it scan
through various devices (/dev/mapper/isw_efjbbijhh_Vol0, /dev/sda,
/dev/sdb) before it crashed on /dev/sdc.

I searched the valgrind.log but did not observe any "Invalid write of
size" statements.

I will try running a backtrace again.

Curtis

Comment 12 Curtis Gedak 2016-04-02 18:00:36 UTC

Hi Mike,

Creation of another backtrace follows:

$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c
$ sudo service apport stop
apport stop/waiting
gedakc@octo:~/workspace/gparted$ cat /proc/sys/kernel/core_pattern
core
$ ulimit -c unlimited
$ sudo src/gpartedbin
======================
libparted : 2.3
======================

(gpartedbin:13729): glibmm-ERROR **:
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ ls -lrt core*
-rw------- 1 root root 31911936 Apr  2 11:55 core
$ sudo gdb src/gpartedbin core --batch --quiet \
      -ex backtrace -ex quit > backtrace.log

warning: Can't read pathname for load map: Input/output error.
$ ls -l backtrace.log
-rw-rw-r-- 1 gedakc gedakc 1051 Apr  2 11:56 backtrace.log
$ cat backtrace.log
[New LWP 13741]
[New LWP 13733]
[New LWP 13740]
[New LWP 13729]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `src/gpartedbin'.
Program terminated with signal 5, Trace/breakpoint trap.

#0 g_logv
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#0 g_logv
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 g_log
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 glibmm_unexpected_exception
at exceptionhandler.cc line 71
#3 Glib::exception_handlers_invoke
at exceptionhandler.cc line 150
#4 (anonymous namespace)::call_thread_entry_slot
at thread.cc line 61
#5 ??
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 start_thread
at pthread_create.c line 308
#7 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
#8 ??



Some other things I might try are removing each of the partitions from
the FAKE RAID.  That might help us pinpoint if the problem is related
to a specific file system (e.g., ext4 or ntfs).

Would you like me to try that?

Curtis

Comment 13 Curtis Gedak 2016-04-02 18:29:46 UTC

Hi Mike,

From the previous backtrace it looks like I missed one of the debug
packages.

$ sudo apt-get install libglib2.0-0-refdbg

I rebooted before recompiling.

I created another backtrace using the above listed steps (I didn't list
all of the steps this time):

$ sudo src/gpartedbin
======================
libparted : 2.3
======================

(gpartedbin:7538): glibmm-ERROR **: 
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ ls -lrt core*
-rw------- 1 root root 31911936 Apr  2 12:27 core
$ sudo gdb src/gpartedbin core --batch --quiet \
      -ex backtrace -ex quit > backtrace.log

warning: Can't read pathname for load map: Input/output error.
$ ls -l backtrace.log
-rw-rw-r-- 1 gedakc gedakc 1047 Apr  2 12:28 backtrace.log
$ cat backtrace.log
[New LWP 7574]
[New LWP 7542]
[New LWP 7538]
[New LWP 7573]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `src/gpartedbin'.
Program terminated with signal 5, Trace/breakpoint trap.

#0 g_logv
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#0 g_logv
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 g_log
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 glibmm_unexpected_exception
at exceptionhandler.cc line 71
#3 Glib::exception_handlers_invoke
at exceptionhandler.cc line 150
#4 (anonymous namespace)::call_thread_entry_slot
at thread.cc line 61
#5 ??
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 start_thread
at pthread_create.c line 308
#7 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
#8 ??



Curtis

Comment 14 Curtis Gedak 2016-04-02 18:45:55 UTC

Hi Mike,

Doh!  Still missed a debug package...

$ sudo apt-get install libglib2.0-0-dbg

Also missed a development package:

$ sudo apt-get install libglib2.0-dev


Here's the backtrace yet again (after recompile, etc.).

$ sudo src/gpartedbin
======================
libparted : 2.3
======================

(gpartedbin:11480): glibmm-ERROR **:
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$ ls -lrt core*
-rw------- 1 root root 31911936 Apr  2 12:40 core
$ sudo gdb src/gpartedbin core --batch --quiet \
      -ex backtrace -ex quit > backtrace.log

warning: Can't read pathname for load map: Input/output error.
765     /build/buildd/glib2.0-2.32.4/./glib/gmessages.c: No such file or directory.
$ ls -l backtrace.log
-rw-rw-r-- 1 gedakc gedakc 1516 Apr  2 12:40 backtrace.log
$ cat backtrace.log
[New LWP 11492]
[New LWP 11480]
[New LWP 11484]
[New LWP 11491]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `src/gpartedbin'.
Program terminated with signal 5, Trace/breakpoint trap.

#0 g_logv
#0 g_logv
#1 g_log
at /build/buildd/glib2.0-2.32.4/./glib/gmessages.c line 792
#2 glibmm_unexpected_exception
at exceptionhandler.cc line 71
#3 Glib::exception_handlers_invoke
at exceptionhandler.cc line 150
#4 (anonymous namespace)::call_thread_entry_slot
at thread.cc line 61
#5 g_thread_proxy
at /build/buildd/glib2.0-2.32.4/./glib/gthread.c line 801
#6 start_thread
at pthread_create.c line 308
#7 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
#8 ??



From the output it looks like I might be missing something related to
gmessages.

Curtis

Comment 15 Mike Fleetwood 2016-04-03 10:08:29 UTC

Hi Curtis,


Running under valgrind is expected to be slow.  It works by running the
client program on a virtual CPU using JIT so that it can track all
memory allocations and reads and writes to check for out of bounds reads
and writes.


Invalid read of size
--------------------

From your valgrind.log from comment #11 I see lots of "Invalid read of
size"
    ==13127== Invalid read of size 8
    ==13127==    at 0x7D40A2F: wcslen (wcslen.S:48)
    ==13127==    by 0x7D4ADB9: wcscoll_l (strcoll_l.c:509)
    ==13127==    by 0x6B696C9: g_utf8_collate
                      (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4)
which are coming from comparison of Glib::ustring types via these
function calls:
    ==13127==    by 0x41FD6E: Glib::operator==(Glib::ustring const&,
                                  Glib::ustring const&) (ustring.h:1478)
    ==13127==    by 0x41FDC2: Glib::operator!=(Glib::ustring const&,
                                           char const*) (ustring.h:1495)
    ==13127==    by 0x421545: Glib::operator<(Glib::ustring const&,
                                  Glib::ustring const&) (ustring.h:1504)

And these too, which come from something in the X11 client font system.
    ==13127== Invalid read of size 4
    ==13127==    at 0x9BC70D3: ???
                   (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
    ==13127==    by 0x9BC9464: FcConfigFilename
                   (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
    ==13127==    by 0x9BDCA15: FcConfigParseAndLoad
                   (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)

I also get similar Invalid read of size 8 and 4 on my Kubuntu 12.04 LTS
virtual machine however it doesn't crash GParted.  I don't get these on
any other distro I have tried, not CentOS 5 and not CentOS 6.

Invalid reads don't lead to wholesale overwriting of the stack, which
is possibly what GDB is indicating with the warning "Can't read
pathname for load map: Input/output error".


Next steps
----------

It seems valgrind is not helping!  This is not going to be easy to find.

At the moment I doubt that this bug is recent.  There are hints it
is related to overrunning a string somewhere.  It is specific to
something in the configuration on your machine.


1) Device details

   It looks odd that there are both /dev/mapper/isw_efjbbijhh_Vol0[12]
   and /dev/mapper/isw_efjbbijhh_Vol0p[12] entries pointing at different
   device mapper devices.  Can you run the following commands so I can
   get a better understanding of the device configuration on your machine?

      # cat /proc/partitions
      # ls -l /dev/dm*
      # dmraid -s
      # dmraid -r
      # dmsetup ls
      # dmsetup table


2) Test older GParted releases

   Can you try manually bisecting GParted releases?  That is, build
   GParted from git release tags, test and approximately binary chop.

      git checkout -f GPARTED_0_10_0
      ./autogen.sh && make clean && cd src && make
      cp gpartedbin ~/bin/gpartedbin-0.10.0
      cd ..

   (Add "-j 8" as desired to the make command)

   Test ~/bin/gpartedbin-0.10.0 /dev/sdc
   (I find it useful to keep old gpartedbin exes laying about just for
   this type of investigation).

   Look at git tag -l output and pick the next release to test based on
   whether 0.10.0 crashes or not.  If it works OK try 0.16.0, and if it
   crashes try 0.6.4.  Continue binary chopping through the releases
   like this.  Probably not worth going older than about 0.5.0.
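
The binary chop through releases can be sketched as below.  This is only
an illustration of the search strategy, not a tool that exists in
GParted: first_bad_release() and the crashes callback are hypothetical
names, and the sketch assumes the releases are ordered oldest to newest
and that every release from the first crashing one onwards also crashes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: return the index of the first release for which
// crashes(tag) is true, or releases.size() if none of them crash.
// Assumes `releases` is ordered oldest to newest and that the results
// are all "good" followed by all "bad".
std::size_t first_bad_release(
    const std::vector<std::string>& releases,
    const std::function<bool(const std::string&)>& crashes)
{
    std::size_t lo = 0;                 // everything before lo is known good
    std::size_t hi = releases.size();   // everything from hi on is known bad
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (crashes(releases[mid]))
            hi = mid;                   // first bad release is mid or older
        else
            lo = mid + 1;               // first bad release is newer than mid
    }
    return lo;
}
```

In practice the callback stands for a manual step: build that tag, run
the resulting gpartedbin against /dev/sdc and note whether it crashes.
Each answer halves the range of releases left to test.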


Mike

Comment 16 Curtis Gedak 2016-04-03 16:20:25 UTC

Hi Mike,

Thank you for your continued help to troubleshoot the crash problem on
my development computer.  This does look like a tough bug to track
down.

Following are answers to your questions, and some additional
information from my own investigation.

Re: 1)  Device Details

Note that when Kubuntu 12.04 LTS boots, only the three
/dev/mapper/isw_efjbbijhh_Vol0p[12] and /dev/dm-? devices are created.
When GParted is started the DMRaid class creates the other two.  I did
this way back because some distros use isw_efjbbijhh_Vol0[12] and
others use isw_efjbbijhh_Vol0p[12].

root@octo:~# cat /proc/partitions
major minor  #blocks  name

   8        0 1953514584 sda
   8        1     262144 sda1
   8        2   83886080 sda2
   8        3  125829120 sda3
   8        4          1 sda4
   8        5   33554432 sda5
   8        6   41944064 sda6
   8        7   41943040 sda7
   8        8  125829120 sda8
   8        9  167772160 sda9
   8       10  335544320 sda10
   8       11   41943040 sda11
   8       12  629145600 sda12
   8       13  200022016 sda13
   8       16  117220824 sdb
   8       17  104857600 sdb1
   8       18          1 sdb2
   8       21    4789248 sdb5
   8       32  156250000 sdc
   8       48  156250000 sdd
  11        0    1048575 sr0
   8       64  156290904 sde
   8       65   20971520 sde1
   8       66   10485760 sde2
 252        0  156247040 dm-0
 252        1   41943040 dm-1
 252        2   10485760 dm-2
 252        3   41943040 dm-3
 252        4   10485760 dm-4
root@octo:~# ls -l /dev/dm*
brw-rw---- 1 root disk 252, 0 Apr  3 10:09 /dev/dm-0
brw-rw---- 1 root disk 252, 1 Apr  3 10:09 /dev/dm-1
brw-rw---- 1 root disk 252, 2 Apr  3 10:09 /dev/dm-2
brw-rw---- 1 root disk 252, 3 Apr  3 10:09 /dev/dm-3
brw-rw---- 1 root disk 252, 4 Apr  3 10:09 /dev/dm-4
root@octo:~# dmraid -s
*** Group superset isw_efjbbijhh
--> Active Subset
name   : isw_efjbbijhh_Vol0
size   : 312494080
stride : 128
type   : mirror
status : ok
subsets: 0
devs   : 2
spares : 0
root@octo:~# dmraid -r
/dev/sdd: isw, "isw_efjbbijhh", GROUP, ok, 312499998 sectors, data@ 0
/dev/sdc: isw, "isw_efjbbijhh", GROUP, ok, 312499998 sectors, data@ 0
root@octo:~# dmsetup ls
isw_efjbbijhh_Vol0p1    (252, 1)
isw_efjbbijhh_Vol0      (252, 0)
isw_efjbbijhh_Vol02     (252, 4)
isw_efjbbijhh_Vol01     (252, 3)
isw_efjbbijhh_Vol0p2    (252, 2)
root@octo:~# dmsetup table
isw_efjbbijhh_Vol0p1: 0 83886080 linear 252:0 2048
isw_efjbbijhh_Vol0: 0 312494080 mirror core 2 131072 nosync 2 8:48 0 8:32 0 1 handle_errors
isw_efjbbijhh_Vol02: 0 20971520 linear 252:0 83888128
isw_efjbbijhh_Vol01: 0 83886080 linear 252:0 2048
isw_efjbbijhh_Vol0p2: 0 20971520 linear 252:0 83888128
root@octo:~#


Re:  2) Test older GParted releases

I haven't gone back to test old releases like GPARTED_0_10_0 yet
because I do think the problem is recently introduced.

Two things that I tested that lead me to believe the problem is
invoked (not necessarily caused) by recent changes to GParted are:

A) If I checkout and compile GPARTED_0_25_0 on my kubuntu 12.04 LTS,
   then the resulting gpartedbin runs with no problems.

B) In order to see if the problem is related to recent package
   upgrades on my kubuntu 12.04 LTS, I booted in my older kubuntu
   *11.04* partition (yes, I'm a pack-rat and still have my old dev
   env from years ago :-).

   From 11.04 I checked out the recent gparted master, compiled and
   ran gpartedbin and this crashed.

   From this I infer that it is the gparted code that is causing the
   crash.


Next I will try the old fashioned style of debugging by adding print
statements in the code to see if I can narrow down where the problem
originates.  My initial guess is that it is something to do with
string handling when either ext4 or ntfs commands are invoked to
retrieve partition and file system information.

Curtis

Comment 17 Curtis Gedak 2016-04-03 17:17:32 UTC

Created attachment 325271 [details]
git diff of DEBUG print statements added to src/ntfs.cc

Hi Mike,

I've made some good progress.  The crash occurs within
ntfs::set_used_sectors, and does appear to be caused by the code
additions in:

    commit 324d99a172848e4ff3fb7eb189f490bb4e6c53e5
    Record file system block size where known (#760709)

More specifically when the code is dealing with the "Cluster size" it
looks like we are trying to access beyond the length of the output
string.

I've attached a "git diff" of the print statements I used for
debugging.

When the code with the debug statements is compiled and run I see the
following output:


$ sudo src/gpartedbin /dev/sdc
======================
libparted : 2.3
======================
DEBUG:  Begin ntfs::set_used_sectors
DEBUG:  Check for Current volume size:
DEBUG:  Check for resize at
DEBUG:  Check for ERROR: Volume is full
DEBUG:  Check for Cluster size
DEBUG:  index = 4294967295
DEBUG:  output.length() = 214
DEBUG:  output.npos = 18446744073709551615
DEBUG:  begin \noutput\n
ntfsresize v2013.1.13AA.0 (libntfs-3g)
ERROR(2): Failed to check '/dev/sdc2' mount state: No such file or directory
Probably /etc/mtab is missing. It's too risky to continue. You might try
an another Linux distro.

DEBUG:  end   \noutput\n

(gpartedbin:4000): glibmm-ERROR **:
unhandled exception (type std::exception) in signal handler:
what: basic_string::assign

$


I think we need a check to ensure that index is not larger than
output.length() prior to using index to access the output string.  For
example:


  if ( index >= output.length() ||
       index == output.npos ||
       sscanf( output.substr( index ).c_str(), "Cluster size       : %Ld", &S ) != 1 )
      S = -1;
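
A minimal sketch of the guarded parse suggested above, written against
std::string rather than Glib::ustring so that it stands alone.  The helper
name parse_cluster_size and the sample "Cluster size" line used to exercise
it are assumptions for illustration, not code or output taken from GParted:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not GParted code): returns the cluster size found in
// ntfsresize-style output, or -1 when the label is missing or unparsable.
long long parse_cluster_size(const std::string& output)
{
    // Keep the result of find() in size_type so npos is stored unchanged.
    const std::string::size_type index = output.find("Cluster size");
    if (index == std::string::npos)
        return -1;  // Label not present; never index the string with npos.

    long long size = -1;
    // index is a valid offset here, so reading from c_str() + index is safe.
    if (std::sscanf(output.c_str() + index, "Cluster size : %lld", &size) != 1)
        return -1;
    return size;
}
```

With the error output shown above, which contains no "Cluster size" line,
this sketch would return -1 instead of throwing.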


If you create a patch for this in the original bug 760709 report then
I can test it on my system to ensure that all is well.

Curtis

Comment 18 Curtis Gedak 2016-04-03 18:48:13 UTC

Hi Mike,

POST CRASH ANALYSIS

Please accept my apologies for providing some incorrect information
at the start of the crash reports.

In comment #7 I stated that git bisect was landing on different
commits.  I see now that the cause was because I did not start with an
actual known good commit.  I assumed that the latest git master was
good because I had not encountered any problems with testing and
committing other patches.  In hindsight I should have started with
GPARTED_0_25_0.

This misleading fact led us to do more investigation than should have
been required.

On the plus side I did learn some more in-depth debugging in the process.

Thanks for the debugging assistance,
Curtis

Comment 19 Mike Fleetwood 2016-04-03 19:30:39 UTC

Hi Curtis,


Excellent detective and debugging work.
(Printf is still the king of debugging :-).


The reason it works for me on my Kubuntu 12.04 LTS is because it's a
32-bit install whereas yours is a 64-bit install.


This is my equivalent debugging for the same case:

D: ntfs::set_used_sectors(partition) partition.get_path()="/dev/sdc2"
D:  output="ntfsresize v2012.1.15AR.1 (libntfs-3g)
ERROR(2): Failed to check '/dev/sdc2' mount state: No such file or directory
Probably /etc/mtab is missing. It's too risky to continue. You might try
an another Linux distro.
"
D:  output.find("Cluster size")=4294967295
D:  index=output.find("Cluster size")
D:  index=4294967295


find() is returning string::npos when no match is found.  However on a
32-bit machine this is only 4294967295, but on a 64-bit machine it is
18446744073709551615.  The 64-bit value is being truncated when stored
in index which is only a 32-bit unsigned int type.

So index shouldn't be unsigned int it should be ustring::size_type
(== size_t) to make it big enough to store npos.

I thought I was being "more" correct comparing the result of find() to
npos, but that turned out to be wrong given that the value was being
truncated first!
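
The truncation can be reproduced in isolation.  This is only a sketch using
std::string (the GParted code uses Glib::ustring), and both function names
are made up for illustration:

```cpp
#include <cstdint>
#include <string>

// Mimics the bug: the result of find() is narrowed to 32 bits before it is
// compared with npos.  Where size_type is 64 bits wide, npos becomes
// 4294967295 and the comparison wrongly reports a match.
bool found_with_32bit_index(const std::string& text, const std::string& needle)
{
    const std::uint32_t index = static_cast<std::uint32_t>(text.find(needle));
    return index != std::string::npos;
}

// Mimics the fix: index has the same type as npos, so nothing is lost.
bool found_with_size_type(const std::string& text, const std::string& needle)
{
    const std::string::size_type index = text.find(needle);
    return index != std::string::npos;
}
```

On a build where size_type is 64 bits wide, the first function reports a
match for a needle that is absent; where size_type is 32 bits wide, both
functions agree, which is consistent with the crash only showing up on the
64-bit install.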

I'll write a suitable patch or two for this.


Thanks,
Mike


P.S.  The canonical way to write a new line in C++ is with std::endl
      like this:
        std::cout << "My text" << std::endl;

Comment 20 Mike Fleetwood 2016-04-03 19:49:33 UTC

Hi Curtis,


No need to apologise.  We all make mistakes.

Like I said above, I find it useful to have some old GParted executables
lying about to quickly test and find when bugs are first introduced.


Thanks,
Mike


By the way on my Kubuntu 12.04 LTS VM it only has /dev/mapper/fakeraid?
links and no /dev/mapper/fakeraidp? links.  This is when the Fake RAID
was initially created and after rebooting too.  Could it be related to
you using the RAID name of "Vol0" (ending in a digit) and me using
"MyRaid" (not ending in a digit)?

$ sudo dmraid -r
[sudo] password for kubuntu: 
/dev/sdc: isw, "isw_bfaiacejhb", GROUP, ok, 16777214 sectors, data@ 0
/dev/sdb: isw, "isw_bfaiacejhb", GROUP, ok, 16777214 sectors, data@ 0

$ sudo parted /dev/mapper/isw_bfaiacejhb_MyRaid print
Model: Linux device-mapper (mirror) (dm)
Disk /dev/mapper/isw_bfaiacejhb_MyRaid: 8585MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1075MB  1074MB  primary  ext4
 2      1075MB  2149MB  1074MB  primary  ntfs

$ ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Apr  3 19:32 control
lrwxrwxrwx 1 root root       7 Apr  3 20:12 isw_bfaiacejhb_MyRaid -> ../dm-0
lrwxrwxrwx 1 root root       7 Apr  3 20:12 isw_bfaiacejhb_MyRaid1 -> ../dm-1
lrwxrwxrwx 1 root root       7 Apr  3 20:12 isw_bfaiacejhb_MyRaid2 -> ../dm-2

$ sudo dmsetup ls
isw_bfaiacejhb_MyRaid   (252, 0)
isw_bfaiacejhb_MyRaid2  (252, 2)
isw_bfaiacejhb_MyRaid1  (252, 1)

Comment 21 Curtis Gedak 2016-04-04 16:20:34 UTC

Hi Mike,

> By the way on my Kubuntu 12.04 LTS VM it only has /dev/mapper/fakeraid?
> links and no /dev/mapper/fakeraidp? links.  This is when the Fake RAID
> was initially created and after rebooting too.  Could it be related to
> you using the RAID name of "Vol0" (ending in a digit) and me using
> "MyRaid" (not ending in a digit)?

Yes.  The last character of the device name is important.

The generally accepted way of naming partitions in Linux is as
follows:

  If the device name ends in a number, then append a "p" followed by
  the partition number.

      E.g.      Device name:  isw_efjbbijhh_Vol0
      yields Partition name:  isw_efjbbijhh_Vol0p1

  Else (the device name does not end in a number), then append only
  the partition number.

      E.g.      Device name:  isw_efjbbijhh_Vol
      yields Partition name:  isw_efjbbijhh_Vol1

I have never found this partition naming rule explicitly stated
before, but it is common practice.  The one glaring exception I
encountered was with DMRAID.

Many versions of DMRAID append only the partition number to the device
name, regardless of whether the device name ends in a number.  The
problem with such a naming scheme is that it is impossible to
determine if a path is a device name or a partition name from just the
name.

  For example:  isw_efjbbijhh_Vol12

  Is this?
    a)  the twelfth disk device
    b)  the second partition on the first disk device isw_efjbbijhh_Vol1
    c)  the twelfth partition on disk device isw_efjbbijhh_Vol

This confusion is avoided if the above listed generally accepted
partition naming rule is used.
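
The generally accepted rule described above can be sketched as a small
function.  The name make_partition_name is hypothetical and this is not
how GParted or DMRAID implement it, just an illustration of the rule:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds a partition name from a device name.  A "p"
// separator is inserted only when the device name ends in a digit.
std::string make_partition_name(const std::string& device, unsigned partition_number)
{
    std::string name = device;
    if (!device.empty() && std::isdigit(static_cast<unsigned char>(device.back())))
        name += 'p';
    name += std::to_string(partition_number);
    return name;
}
```

Under this rule isw_efjbbijhh_Vol12 can only be a device name or partition 12
of isw_efjbbijhh_Vol, because partition 2 of isw_efjbbijhh_Vol1 would be
named isw_efjbbijhh_Vol1p2.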

Newer versions of DMRAID in Ubuntu use the generally accepted
partition naming rule because Phillip Susi patched this in Ubuntu.
Unfortunately the patches never made it into the upstream DMRAID because
it appears the project is no longer maintained.

Curtis

Comment 22 Mike Fleetwood 2016-04-06 17:54:47 UTC

Bug 764658 "GParted crashes when reading NTFS usage when there is no
/dev/PTN entry" has been raised to record the code changes to fix the
core dump because it is separate to the code fixes in this report.

Comment 23 Curtis Gedak 2016-04-07 16:42:00 UTC

Hi Mike,

Now that we have the NTFS crash out of the way (aka fixed), I have
resumed testing of patch set (v1) from comment #1.

I have successfully compiled and run GParted on the following distros:

    debian  7
    debian  8
    fedora 23
   kubuntu 12.04
  openSUSE 13.2
    ubuntu 14.04
    ubuntu 15.10

Patch set (v1) from comment #1 has been committed to the git
repository.

The relevant git commits can be viewed at the following links:

Use realpath() safely (#764369)
https://git.gnome.org/browse/gparted/commit/?id=d04826cc27462431b5f43e132c0382e6b6debd6d

Minor tidyup in load_proc_partitions_info_cache()
https://git.gnome.org/browse/gparted/commit/?id=e0a208576d00bcdf175fc401653a53483b41a87a

Curtis

Comment 24 Curtis Gedak 2016-04-26 15:56:49 UTC

This enhancement was included in the GParted 0.26.0 release on April 26, 2016.