GNOME Bugzilla – Bug 777973
Segmentation fault on bad disk
Last modified: 2017-08-07 17:20:22 UTC
Created attachment 344634 [details] strace Hi I'm using gparted-0.27.0-1.fc25.x86_64 on a mounted fat16 partition via USB (phone microsd) This leads to a crash: /usr/sbin/gparted: line 193: 19962 Segmentation fault (core dumped) $BASE_CMD I put the strace below. Both gparted and gparted /dev/sde crash gparted /dev/sda is fine. parted /dev/sde works fine and reports: Model: Nokia Nokia 301 (scsi) Disk /dev/sde: 7961MB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 1560MB 4140MB 2579MB primary fat16 gparted output: Created symlink /run/systemd/system/-.mount → /dev/null. Created symlink /run/systemd/system/boot.mount → /dev/null. Created symlink /run/systemd/system/home.mount → /dev/null. Created symlink /run/systemd/system/proc-fs-nfsd.mount → /dev/null. Created symlink /run/systemd/system/run-user-1000.mount → /dev/null. Created symlink /run/systemd/system/tmp.mount → /dev/null. Created symlink /run/systemd/system/var-lib-machines.mount → /dev/null. Created symlink /run/systemd/system/var-lib-nfs-rpc_pipefs.mount → /dev/null. ====================== libparted : 3.2 ====================== /usr/sbin/gparted: line 193: 19962 Segmentation fault (core dumped) $BASE_CMD Removed /run/systemd/system/-.mount. Removed /run/systemd/system/boot.mount. Removed /run/systemd/system/home.mount. Removed /run/systemd/system/proc-fs-nfsd.mount. Removed /run/systemd/system/run-user-1000.mount. Removed /run/systemd/system/tmp.mount. Removed /run/systemd/system/var-lib-machines.mount. Removed /run/systemd/system/var-lib-nfs-rpc_pipefs.mount.
/usr/sbin/gparted is actually a shell script which runs /usr/sbin/gpartedbin, the binary. I assume that it is gpartedbin which is core dumping. Please follow the instructions below to produce a core dump and back trace, running 'gparted /dev/sde'. Thanks, Mike How to capture a backtrace from a core dump ------------------------------------------- 1) Turn off any OS core dump capturing, ensuring: cat /proc/sys/kernel/core_pattern reports just "core" Some methods to do this, depending on distro version, are: * service abrt-ccpp stop (RedHat/Fedora pre systemd) * systemctl stop abrtd (RedHat/Fedora with systemd) * sudo service apport stop (Ubuntu) 2) Increase core dump limit and run GParted as root Either: ulimit -c unlimited sudo gparted or: su - root ulimit -c unlimited gparted 3) Perform crashing action 4) Capture backtrace ls -lrt core* which gpartedbin gdb `which gpartedbin` {COREFILE} --batch --quiet \ -ex 'info threads' -ex 'thread apply all backtrace' \ -ex quit > backtrace.log Please paste the terminal output when running gparted and the contents of the backtrace.log file.
Here are the backtraces both before and after debuginfo-install gparted
Created attachment 344692 [details] backtrace wo. debuginfo
Created attachment 344693 [details] backtrace w debuginfo
PS Small addition to your otherwise excellent instructions. In Fedora core dumps are saved by systemd in /var/lib/systemd/coredump/ even if abrtd is disabled. See man systemd-coredump
GParted is core dumping below GParted::PipeCapture::OnReadable(Glib::IOCondition) when it is running the command: fsck.fat -n -v /dev/sde1 to read the space usage of the file system.
+ Trace 237103
Thread 1 (Thread 0x7f6abfd59f40 (LWP 26715))
Hi, You say the disk is bad. Do you just mean GParted crashes when trying to read it, or is there some way you know it is bad? If so what is wrong with it? Please run the following commands as root and copy the terminal output: blkid /dev/sde1 fsck.fat -n -v /dev/sde1 Thanks, Mike
blkid /dev/sde1 /dev/sde1: SEC_TYPE="msdos" UUID="0000-8E65" TYPE="vfat" PARTUUID="7f7c1da5-01" fsck.fat -n -v /dev/sde1 fsck.fat 4.1 (2017-01-24) Checking we can access the last sector of the filesystem 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt. Automatically removing dirty bit. Boot sector contents: System ID "mkfs.fat" Media byte 0xf8 (hard disk) 512 bytes per logical sector 65536 bytes per cluster 1 reserved sector First FAT starts at byte 512 (sector 1) 2 FATs, 16 bit entries 131072 bytes per FAT (= 256 sectors) Root directory starts at byte 262656 (sector 513) 2048 root directory entries Data area starts at byte 328192 (sector 641) 39354 data clusters (2579103744 bytes) 16 sectors/track, 4 heads 3047424 hidden sectors 5038080 sectors total FATs differ but appear to be intact. Using first FAT. /LOST.DIR/D3" Bad short file name (D3"). Auto-renaming it. Renamed to FSCK0000.000 which goes on and on, printing many non ASCII characters in the bad names.
My theory is that GParted is choking on the output from fsck.fat. Please provide that output so I can inject it into GParted to try to reproduce the fault. fsck.fat -n -v /dev/sde1 1> fsck.out 2> fsck.err echo $? tar czf fsck.tgz fsck.out fsck.err Please report exit status of fsck.fat and upload fsck.tgz. Thanks, Mike
Seems like a reasonable hyypothesis. See attached. Note I had to interrupt fsck as it keeps going forever. Therefor fsck.err is empty.
Created attachment 344696 [details] fsck.out
I have reproduced the core dump in GParted by making fsck.fat output the same captured output. @mvanross The file names of all the files appears to be random binary data and fsck.fat is reporting every file as corrupt. Do you know why that might be?
@manross 1) Please provide complete output from fsck, and the exist status, as described in comment #9 above. This is so that I can be confident that any fix I produce will work. It may take a while to run as it is reporting errors for every file on the file system. However it will get to the end. 2) The directory portion of all file names are ASCII text while the file name portion appears to be binary data. Do you know why that might be? Known that might help guide the direction of the fix. Thanks, Mike
Hi I now accessed this microSD card via my card reader. I ran fsck.fat -n -v /dev/mmcblk0p1 1> fsck.out 2> fsck.err It took about 5hrs, but now gparted does not crash on it anymore. echo $? 130 zip log files attached. I don't know why the file names are binary. I would not be surprised if the card is basically completely rubish. Note, I'm not necessarily looking into retrieving the data of card, but my bug report is that gparted should not dump core....
Oops, too large to attach. It is at homepages.inf.ed.ac.uk/mvanross/fsck.tgz
Created attachment 347262 [details] [review] Prevent core dump reading non UTF-8 data from external command (draft 1) Hi Curtis, Here's a draft patchset for this. The patch stops GParted crashing and is good enough for committing; however it has revealed another issue. To read all 122 MiB of the above fsck output via PipeCapture takes > 2 days of 100% CPU time with the GParted UI completely unresponsive. Time to read above fsck output (comment 15) from corrupted FAT file system: * First 1 MiB 6.2 sec * First 10 MiB 278 sec * All 122 MiB > 2 days The problem is that as the capture buffer gets bigger Glib::ustring::replace() gets exponentially slower. I haven't succeeded in writing a working replacement yet. Mike
*** Bug 780104 has been marked as a duplicate of this bug. ***
*** Bug 780354 has been marked as a duplicate of this bug. ***
Created attachment 350154 [details] [review] Prevent core dump reading non UTF-8 data from external command (draft 2) Hi Curtis, Here's draft patchset 2 for this. Additionally to patchset draft 1 from comment 16, it makes reading the fsck output really fast. For all 122 MiB of fsck output: * Before: > 2 days of 100% CPU time * After: 6.5 seeconds What I learned was that Glib::ustring is not capable of storing large amounts of data and being edited in reasonable amounts of time. Anyway as Mark pointed out in comment 14 above, fsck.fat took 5 hours to produce all the output so GParted would be reading and waiting that long during the initial startup. Except now with the UI being drawn and the status bar pulsing normally. However 5 hours is too long for a user to be expected to wait. This can't be solved without replacing the use of fsck.fat with something else. I'm NOT intending to do this. Given that PipeCapture class and OnReadable() method have become more complicated I am looking into writing some unit tests for the class. This is why this patchset is considered draft. I want to add a unit test framework first, then add unit tests driving these changes. Thanks, Mike
Hi Mike, My initial peek at patch set (draft 2) in comment #19 looks good. If I understand the changes correctly, there should be no change in the captured output that is shown in the GUI. However, the new code should not cause a segmentation fault when it encounters invalid UTF-8 characters. Further the new code should be much faster in operation, especially for long drawn out operations. I plan to test the patch set with a number of different operations to ensure that the captured output remains correct. Thanks, Curtis
Hi Curtis, Yes, PipeCapture still has exactly the same interface it had before, including updating the Glib::ustring passed to the constructor and firing all callbacks registered against the signal_update slot. As an aside if you want to see how slow the Gtk widgets are at displaying a lot of data, try this: 1) Create a FAT partition. 2) Replace /sbin/fsck.fat with a shell script which outputs 1 MiB of ascii text. mv /sbin/fsck.fat /sbin/fsck.fat.real dd if=/dev/urandom bs=1M of=/tmp/random-1M.bin count=1 base64 < /tmp/random-1M.bin > /tmp/random-1M.base64 truncate -s 1M /tmp/random-1M.base64 cat << EOF > /sbin/fsck.fat #!/bin/sh cat /tmp/random-1M.base64 EOF chmod a+x /sbin/fsck.fat 3) Load GParted. No additional waiting as the 1 MiB of output is read to get the file system usage. (Ignore the warning of no actual usage figures). 4) Run a file system check on the FAT partition. Expand the operation details to see the output of the fsck command. Try resizing the window and moving the scroll bars about. For me the UI is taking about 1 or 2 seconds to update display for this "small" 1 MiB amount of output. Thanks, Mike
Created attachment 352673 [details] [review] Prevent core dump reading non UTF-8 data from external command (v1) Hi Curtis, Here's patchset v1 ready for reviewing and committing. Compared to draft 2 it adds 11 new patches developing unit tests for PipeCapture. Patches are: NEW [01/18] Add initial unit tests for PipeCapture NEW [02/18] Initialise Glib threading system in test_PipeCapture NEW [03/18] Add 1 MiB ASCII PipeCapture test NEW [04/18] Add binary data string difference reporting to Pipe... NEW [05/18] Add check of PipeCapture update callback NEW [06/18] Add test for bug with PipeCapture crashing reading ... EXISTING [07/18] Prevent core dump reading none UTF-8 data from exte... EXISTING [08/18] Move initial clearing of output capture buffers int... NEW [09/18] Test that PipeCapture clears capture buffer before ... EXISTING [10/18] Improve the performance of PipeCapture for large ou... NEW [11/18] Update PipeCaptureTest.MinimalBinaryCrash777973 exp... NEW [12/18] Add test that PipeCapture can read embedded NUL cha... EXISTING [13/18] Workaround g_utf8_find_next_char() not incrementing... NEW [14/18] Add test that PipeCapture can read NUL byte in midd... EXISTING [15/18] Workaround g_utf8_get_char_validate() bug with embe... EXISTING [16/18] Increase PipeCapture maximum read size to 64K EXISTING [17/18] Further improve speed of PipeCapture for non-watched NEW [18/18] Add PipeCapture line discipline tests Generally I'm adding the tests first to highlight the fault and then fixing the code to make the test past. The only changes to the existing patches is a couple of minor comment updates, otherwise they are the same as before. Thanks, Mike
Hi Mike, Thank you for your in-depth work to resolve this segmentation fault, and further to significantly improve performance and also add testing of the PipeCapture class. From a visual review, patch set v1 from comment #22 looks good to me. Using the patch set in a distribution tarball, I have successfully run: ./configure --disable-scrollkeeper make make check make distcheck on the following distros: debian 8 fedora 24 kubuntu 16.04 opensuse 42.2 My regression testing on kubuntu 16.04 of actions such as: create, grow, shrink, move, copy, check, label, set UUID, and delete, has also gone well. If there are no objections then I plan to commit this patch set in the next day or so. Thanks, Curtis
Commit when ready. Mike
Hi Mike, Thank you again for this patch set. Before committing I noticed and fixed one minor typo in the last commit message: Changed FROM: Test that backspace and carriage return are processes correctly and ^ TO: Test that backspace and carriage return are processed correctly and The patch set v1 from comment #22 has been committed to the git repository master branch. The relevant git commits can be viewed at the following links: Add initial unit tests for PipeCapture (#777973) https://git.gnome.org/browse/gparted/commit/?id=b21ee06230adf8b9f4090eef7b46c179f8e26d42 Initialise Glib threading system in test_PipeCapture (#777973) https://git.gnome.org/browse/gparted/commit/?id=d90702d52608d84ef5c68d66d0c92a3438253ab7 Add 1 MiB ASCII PipeCapture test (#777973) https://git.gnome.org/browse/gparted/commit/?id=eb2d571a6c5c8d5108e5dd2a7e7bf3f1c913cb35 Add binary data string difference reporting to PipeCapture tests (#777973) https://git.gnome.org/browse/gparted/commit/?id=dc1c49ba6257045beab18400d6f1c841918614cc Add check of PipeCapture update callback (#777973) https://git.gnome.org/browse/gparted/commit/?id=8b47de88728419daae4f249ec0f1e2f1c516d2ba Add test for bug with PipeCapture crashing reading binary data (#777973) https://git.gnome.org/browse/gparted/commit/?id=1e3324e16b5e63d81289c8d44ce9f166faf9ed47 Prevent core dump reading none UTF-8 data from external command (#777973) https://git.gnome.org/browse/gparted/commit/?id=a03735ac6f5e7e4d7282f7d0b06a97f337830858 Move initial clearing of output capture buffers into PipeCapture class (#777973) https://git.gnome.org/browse/gparted/commit/?id=a233e30efeb80c72b7e9943ee7f615aa006c012f Test that PipeCapture clears capture buffer before it starts (#777973) https://git.gnome.org/browse/gparted/commit/?id=03c2be9b90e996efaab9663a1e6aa0c7df30be42 Improve the performance of PipeCapture for large output (#777973) https://git.gnome.org/browse/gparted/commit/?id=b5b3d199f8e82b7185ec2112f46149c397965117 Update PipeCaptureTest.MinimalBinaryCrash777973 expected string (#777973) https://git.gnome.org/browse/gparted/commit/?id=0a3e8487a07243528219dc8b79be561cc7f845c7 Add test that PipeCapture can read embedded NUL character (#777973) https://git.gnome.org/browse/gparted/commit/?id=22573b4eede9dab13b6a26d39ebc0f363f567e0e Workaround g_utf8_find_next_char() not incrementing past NUL char (#777973) https://git.gnome.org/browse/gparted/commit/?id=3a6a304c64b669f71c530a07d64caa93dc2f23ca Add test that PipeCapture can read NUL byte in middle of UTF-8 char (#777973) https://git.gnome.org/browse/gparted/commit/?id=6b82616d2e5dfddea8b73cc65d64bfc7e08d8388 Workaround g_utf8_get_char_validate() bug with embedded NUL bytes (#777973) https://git.gnome.org/browse/gparted/commit/?id=8dbbb47ce2db0ee733ff909c1ead2f4de9475596 Increase PipeCapture maximum read size to 64K (#777973) https://git.gnome.org/browse/gparted/commit/?id=6f49f3049dc8fb72ef771074db73b60a79925962 Further improve speed of PipeCapture for non-watched output (#777973) https://git.gnome.org/browse/gparted/commit/?id=25780c611bf99ce3f819d00c58616fa517a38b0f Add PipeCapture line discipline tests https://git.gnome.org/browse/gparted/commit/?id=f8783d69bc848866265910bf1a2de814a4cba821 Curtis
This enhancement was included in the GParted 0.29.0 release on August 7, 2017.