GNOME Bugzilla – Bug 786502
Support reading Unicode labels when file system specific tools aren't available
Last modified: 2017-10-10 17:24:10 UTC
As stated in https://bugzilla.gnome.org/show_bug.cgi?id=662537#c22 it is needed to call blkid with -d parameter to tell blkid to output text in UTF-8. Because by default output is in ^ and M- encoding. So GParted should either call blkid with -d parameter or implements ^ and M- decoding to support unicode labels.
Created attachment 357961 [details] ^ and M- notation decoder Here is blkid ^ and M- encoder: https://github.com/karelzak/util-linux/blob/master/misc-utils/blkid.c#L117 I wrote simple decoder for it, see attachment. It could be useful for GParted support.
Created attachment 357965 [details] [review] [PATCH] Fix parsing Unicode labels I tried to include ^ and M- notation decoder into GParted. See attached patch.
Note that the scope of this bug is actually limited to not being able to read unicode labels for file systems which don't provide their own label reading tool. Namely: exfat, f2fs, hfs, hfsplus, udf, ufs. Reading unicode labels from btrfs, ext2/3/4, jfs, linux_swap, nilfs2, ntfs, reiser4, reiserfs and xfs works correctly. (fat16/32 doesn't support unicode characters in the label anyway). I have found 3 problems with blkid and it's implemented encoding scheme which means it's not reliable in all cases: 1) Blkid on current but older distributions don't encode double quotes, where as latest distributions do. 2) The blkid encoding doesn't encode "M-x" in the input stream so when decoding it is not possible to distinguish between encoding and a genuine "M-x" from the original input. 3) The blkid encoding doesn't encode "^" in the input stream so when decoding it is not possible to distinguish between encoding and a genuine "^X" from the original input. [1] CentOS 7 # blkid -V blkid from util-linux 2.23.2 (libblkid 2.23.0, 25-Apr-2013) # mkfs.btrfs -L 'my " drive' /dev/sdb1 # blkid /dev/sdb1 /dev/sdb1: LABEL="my " drive" UUID=... UUID_SUB=... TYPE="btrfs" [2] Fedora 26 # blkid -V blkid from util-linux 2.30.1 (libblkid 2.30.1, 20-Jul-2017) # mkfs.btrfs -L 'M-P д' /dev/sdb1 # blkid /dev/sdb1 /dev/sdb1: LABEL="M-P M-PM-4" UUID=... UUID_SUB=... TYPE="btrfs" [3] Fedora 26 # label="^B" # echo -n "$label" | hexdump -C 00000000 5e 42 02 "^B." 00000003 # cat /dev/zero > /dev/sdb1 # mkfs.btrfs -L "$label" /dev/sdb1 # blkid /dev/sdb1 /dev/sdb1: LABEL="^B^B" UUID=... UUID_SUB=... TYPE="btrfs"
(In reply to Mike Fleetwood from comment #3) > Note that the scope of this bug is actually limited to not being able to > read unicode labels for file systems which don't provide their own label > reading tool. Namely: exfat, f2fs, hfs, hfsplus, udf, ufs. I saw this problem when tested UDF labels. > Reading unicode labels from btrfs, ext2/3/4, jfs, linux_swap, nilfs2, > ntfs, reiser4, reiserfs and xfs works correctly. (fat16/32 doesn't > support unicode characters in the label anyway). Or in case those reading tools are not installed. > I have found 3 problems with blkid and it's implemented encoding scheme > which means it's not reliable in all cases: > 1) Blkid on current but older distributions don't encode double quotes, > where as latest distributions do. Patch in this bug does not address this problem, nor it does not change behaviour how GParted parse output from old blkid. > 2) The blkid encoding doesn't encode "M-x" in the input stream so when > decoding it is not possible to distinguish between encoding and a > genuine "M-x" from the original input. > 3) The blkid encoding doesn't encode "^" in the input stream so when > decoding it is not possible to distinguish between encoding and a > genuine "^X" from the original input. But this is a problem... Currently GParted does not show Unicode label correctly and I see there 3 options: 1) Stay this bug as is, without changing code 2) Call blkid with -d option which outputs in 8bit UTF-8 3) Try to decide ^ and M- like in my patch Or probably 2) with fallback to 1) or 3) when -d is not supported.
Option 4: Use 'blkid -o value -s LABEL /dev/$partition'. In detail, when querying the label in the FS_Info module run above blkid command to get the label for just the partition of interest. Output format value only (-o value) prints only the queried value (-s LABEL) without any encoding, only adding a trailing new line character. This works at least as far back as blkid 1.0.0 (12-Feb-2003). Testing on Fedora 26 ... No label cases: # cat /dev/zero > /dev/sdb1 # mkfs.btrfs /dev/sdb1 # blkid -o value -s LABEL /dev/sdb1 | hexdump -C (No output, not even a new line, because no label) # cat /dev/zero > /dev/sdb1 # mkfs.btrfs -L '' /dev/sdb1 # blkid -o value -s LABEL /dev/sdb1 | hexdump -C (Again no output becuase no label) Label cases: # cat /dev/zero > /dev/sdb1 # mkfs.btrfs -L 'M-P д' /dev/sdb1 # blkid -o value -S LABEL /dev/sdb1 M-P д # blkid -o value -S LABEL /dev/sdb1 | hexdump -C 00000000 4d 2d 50 20 d0 b4 0a |M-P ...| 00000007 # cat /dev/zero > /dev/sdb1 # label="^B" # echo -n "$label" | hexdump -C 00000000 5e 42 02 "^B." 00000003 # mkfs.btrfs -L "$label" /dev/sdb1 # blkid -o value -S LABEL /dev/sdb1 ^B # blkid -o value -S LABEL /dev/sdb1 | hexdump -C 00000000 5e 42 02 0a |^B..| 00000004 I have also tested this works correctly of CentOS 5 mkfs.ext4 and blkid 1.0.0. This way appears to get the label correctly for all versions of blkid. Just remember to cut off the final character if it is a new line character.
(In reply to Mike Fleetwood from comment #5) > Option 4: Use 'blkid -o value -s LABEL /dev/$partition'. Seems like a good idea. And call it only when file system specific tool is not available...
(In reply to Pali from comment #6) > And call it only when file system specific tool is not available... GParted already only calls FS_Info::get_label() after any file system specific tool has been tried. https://git.gnome.org/browse/gparted/tree/src/GParted_Core.cc?id=GPARTED_0_29_0#n1287
Update bug summary to qualify the scope of the issue.
Created attachment 358674 [details] [review] Support blkid reading Unicode labels (v1) Hi Curtis, Here's patchset v1 to fix this. It uses the method outlined in comment #5 above. It supports Unicode labels when read via blkid so in cases when the FS specific tool isn't install or doesn't exist. It then extends that to support GParted recognising the use of Unicode LABEL=<label> in /etc/fstab to specify block devices by FS label. Successfully tested on Fedora 26 and CentOS 6 with various FSs such as hfs+ (and udf on Fedora 26) which don't provide their own tool for reading Unicode labels. Thanks, Mike
Hi Mike & Pali, Thank you for your work on Unicode labels. My review of patch set v1 from comment #9 has gone well. I encountered only one minor typo in the P3/3 commit message: get_labe() returns the cached label, loading it into the cache first ^^ get_label() returns the cached label, loading it into the cache first I have made this change in my copy of the code in anticipation of pushing these commit to our git master. Regression testing has been successful on the following distros: Fedora 25 Debian 9 Kubuntu 16.04 openSUSE 42.3 If there are no objections then I will commit patch set v1 in the next day or so. Curtis
Looks good, thanks Mike! If it is working fine for reading Unicode UDF labels, I'm happy with it. Anyway, in this patch (but also in whole GParted code) I see one incorrect pattern in usage of Utils::execute_command. Caller does not escape correctly user input argument which could lead to execution of arbitrary code. I can imagine that somebody could try to set filesystem label to: "; sh /tmp/my_script & true " It could lead to execution of the my_script.
(In reply to Pali from comment #11) > Anyway, in this patch (but also in whole GParted code) I see one incorrect > pattern in usage of Utils::execute_command. Caller does not escape correctly > user input argument which could lead to execution of arbitrary code. > > I can imagine that somebody could try to set filesystem label to: > "; sh /tmp/my_script & true " > > It could lead to execution of the my_script. Was there a *specific* scenario you are concerned about? Remember that because of the very nature of what the GParted application is, it allows the user destroy their entire hard drive by formatting a new file system over the top or copying a second hard drive over the first.
I know that GParted under root can do anything with disk. But still it is a good idea to start external process correctly. Probably somebody could have good reason to include either " or ; or \ character into filesystem label. And currently it is a problem. Some filesystems (e.g. ntfs or udf) support storing arbitrary Unicode sequence so such shell special characters are also fully valid for label.
Hi Pali, As a proof of concept, can you test to demonstrate the label handling issue you raised in comment #11? I think the label handling issue is prevented by the call to Glib::shell_parse_argv() [1] method within Utils::execute_command() [2] method. [1] https://people.gnome.org/~gcampagna/docs/GLib-2.0/GLib.shell_parse_argv.html [2] https://git.gnome.org/browse/gparted/tree/src/Utils.cc?h=GPARTED_0_29_0#n595 Curtis
Hi Pali & Mike, As further clarification, when a user has root access, they have the complete ability to destroy the disk. If they do this to themselves by creating a label such as in comment #11, then they deserve what they get. An issue only arises if the the person didn't create the label but suffered disk alteration as a consequence of some other action. A sample scenario would be if the person inserted a USB Flash Drive with a specially crafted label made by someone else. If the label was executed, then that would be an issue. Curtis
I did quick check and problem happen when user try to change label to: " -A " For FAT32 system it throws error message in GParted log window that mlabel was called with invalid option -A. Thanks to that shell escaping code execution of script seems not to be possible. But it is possible to inject custom parameters to mkfs/label tools.
Pali, Would you like to develop a patch for the test issue you identified? Curtis
Created attachment 359003 [details] Pics of Two GParted Versions with Unicode and Mischievous Labels Hi Mike & Pali, I've done some more testing with Debian 9 with Unicode characters and the label issue. For the label issue I created a file and mischievous label on an ext4 file system in partition /dev/sdb1 as follows: # Create 1 letter shell script at top of root file system with: $ cat << _EOF_ > z > #!/bin/sh > touch /myfile.txt > _EOF_ $ chmod 755 z $ sudo mv z / $ ls -l /z -rwxr-xr-x 1 user user 28 Sep 2 12:58 /z $ # Create mischievous label $ sudo e2label /dev/sdb1 '"; /z & true "' Run both GParted 0.28.1 and latest git HEAD with patch set v1 from comment #9. Both versions only display the mischievous label, and do not execute the label. See attached picture. After running both versions, there is no /myfile.txt file at the top of the root file system. $ ls -l /my* ls: cannot access '/my*': No such file or directory Curtis
Now I'm looking at that Utils::execute_command. It converts ustring command to array of argv via Glib::shell_parse_argv. Glib::shell_parse_argv is just wrapper around g_shell_parse_argv and that uses tokenize_command_line for splitting string into argv array, which is later unescaped (via g_shell_unquote). And tokenize_command_line is really funny function. Looking at whole implementation I think that any user input should be first passed into g_shell_quote function (or via Glib:: equivalent if exists). https://fossies.org/dox/glibmm-2.55.1/shell_8cc_source.html#l00035 https://sourcecodebrowser.com/glib2.0/2.24.0/gshell_8c.html#a31cb4a74ef238d5488b7d14c96cdbee9 https://sourcecodebrowser.com/glib2.0/2.24.0/gshell_8c.html#a105ba01fb62d0415457eaa58f5f5021a https://sourcecodebrowser.com/glib2.0/2.24.0/gshell_8c.html#aab6d2157aba1c972155f21a3136bb8ed
Thank you Mike and Pali for your work to improve Unicode support for volume labels. Because patch set v1 from comment #9 addresses the issue raised in this bug report, I am committing the code. The issue regarding creating labels such as: "; sh /tmp/my_script & true " is a separate issue and should be tracked in a separate bug report. Patch set v1 has been committed to the git repository for inclusion in the next release of GParted. The relevant git commits can be viewed at the following links: Support reading Unicode character labels via blkid (#786502) https://git.gnome.org/browse/gparted/commit/?id=1c984ae06d9fd1f4d28ae960c1cdb0a3442ccf3f Update FS_Info cache with Unicode safe labels read from blkid (#786502) https://git.gnome.org/browse/gparted/commit/?id=b022c1e3a949640214d8b6e4a1aa395a53858bb4 Support /etc/fstab using Unicode labels as mount points (#786502) https://git.gnome.org/browse/gparted/commit/?id=73fe1dbf5cc95eff057bcd7f56ff2b1a710169ec Curtis
I opened https://bugzilla.gnome.org/show_bug.cgi?id=787203 bug.
This enhancement was included in the GParted 0.30.0 release on October 10, 2017.