After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 662537 - Ext4 unicode labels not shown correctly
Ext4 unicode labels not shown correctly
Status: RESOLVED FIXED
Product: gparted
Classification: Other
Component: application
0.9.1
Other Linux
: Normal minor
: ---
Assigned To: gparted maintainers alias
gparted maintainers alias
Depends on:
Blocks:
 
 
Reported: 2011-10-23 17:07 UTC by Jonas Kulla
Modified: 2011-12-13 17:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Diff file with update to change volume label tool read order (1.18 KB, patch)
2011-10-25 15:54 UTC, Curtis Gedak
none Details | Review
Diff file with update to change volume label tool read order attempt 2 (2.03 KB, patch)
2011-10-25 18:06 UTC, Curtis Gedak
none Details | Review
Diff file with update to change volume label tool read order attempt 3 (2.02 KB, patch)
2011-10-25 19:28 UTC, Curtis Gedak
none Details | Review
screenshot (32.03 KB, image/png)
2011-10-25 20:32 UTC, Jonas Kulla
  Details
Diff file with update to change volume label tool read order attempt 4 (2.46 KB, patch)
2011-10-25 22:49 UTC, Curtis Gedak
none Details | Review
screenshot (31.69 KB, image/png)
2011-10-27 01:07 UTC, Jonas Kulla
  Details
Diff file with update to change volume label tool read order attempt 5 (2.06 KB, patch)
2011-10-27 19:04 UTC, Curtis Gedak
none Details | Review
Diff file with update to change volume label tool read order attempt 6 (1.96 KB, patch)
2011-10-27 19:09 UTC, Curtis Gedak
none Details | Review
Diff file with update to change volume label tool read order attempt 7 (6.84 KB, patch)
2011-10-28 22:53 UTC, Curtis Gedak
none Details | Review

Description Jonas Kulla 2011-10-23 17:07:54 UTC
If the label of an ext4 partition contains any unicode characters,
it is not displayed correctly in gprated.

For example, the label "あかさ" is displayed as:
M-cM-^AM-^BM-cM-^AM-^KM-cM-^AM

Also, the text entry for setting a partition label doesn't let me type unicode through ibus.
If I write it somewhere else, then copy paste it in and set the label,
it is set correctly, albeit being displayed as garbage (example above).
Comment 1 Curtis Gedak 2011-10-24 14:45:50 UTC
Would you be able to provide the output from the following two commands (with root privilege)?

   e2label /path-to-your-labeled-partition

Where /path-to-your-labeled-partition is something like /dev/sda6

   blkid /path-to-your-labeled-partition

Where /path-to-your-labeled-partition is something like /dev/sda6
Comment 2 Jonas Kulla 2011-10-25 13:43:11 UTC
e2label:
あかさ

blkid:
/dev/sda6: LABEL="M-cM-^AM-^BM-cM-^AM-^KM-cM-^AM-^U" UUID="2550ae15-4193-4d66-9a83-bf4f17960294" TYPE="ext4"

(Does this mean libparted is not at fault?)
Comment 3 Curtis Gedak 2011-10-25 15:54:49 UTC
Created attachment 199941 [details] [review]
Diff file with update to change volume label tool read order

Thanks for the output.  The problem is not with libparted.  It appears that the blkid command is not displaying the volume label in the way we would expect.

I have attached a diff file that reverses the order of which tool to use to first try to read the volume label.  To apply the diff to a git repository, use:

     git apply bug662537-unicode-labels-incorrect.diff

If you are unfamiliar with git, you can learn the basics at:
http://gparted.org/git.php

Would you be able to test this on your computer?
Comment 4 Jonas Kulla 2011-10-25 17:52:45 UTC
Thank you for your work!
I recently learned how to use git so this wasn't a problem for me.
With your patch, the label is now (almost) displayed correctly.

Another (minor) weirdness that appeared now is that the line of /dev/sda6 is
wider than the others.
"あかさ" would be "E3 81 82 E3 81 8B E3 81 95" in hex,
but when I had e2label write its output to a file and opened that with a hex editor,
what appeared was "E3 81 82 E3 81 8B E3 81 95 0A".
So, it seems that the "0A" at the end which doesn't belong to the characters is wrongly interpreted as an ascii "new line" I think?
It's just weird, because when I set the label anew to "あかさ",
the "0A" isn't there yet because the partitions line is as wide as the others.
But as soon as I apply the changes (and the label is actually set), the "0A" appears.
Comment 5 Curtis Gedak 2011-10-25 18:06:47 UTC
Created attachment 199953 [details] [review]
Diff file with update to change volume label tool read order attempt 2

Thank you for testing.  :-)

Attached is a new diff file that should address the extra 0x0A at the end of the label.

You will need to undo the previous patch before applying this patch.  You can get rid of all changes to git with the following command:
   git reset --hard

Please let me know if this fixes the problem, and also if you experience any other problems with this diff file.
Comment 6 Jonas Kulla 2011-10-25 19:14:48 UTC
I did a "git checkout HEAD~1", it should work the same right?

Anyway, sadly nothing changed.. the label shows fine
but the "new line" is still there..

I double checked the source and the new regexps are all in place =/
Comment 7 Curtis Gedak 2011-10-25 19:28:33 UTC
Created attachment 199961 [details] [review]
Diff file with update to change volume label tool read order attempt 3

I think that "git checkout HEAD~1" goes back to the parent of HEAD and is not the same as "get reset --hard" which will remove all changes and revert back to the head.

Attached is a third attempt, though I am at a loss as to why the second diff file did not address the problem.

Please let me know if this fixes the problem, and also if you experience any
other problems with this diff file.
Comment 8 Curtis Gedak 2011-10-25 19:35:08 UTC
(In reply to comment #4)
> Another (minor) weirdness that appeared now is that the line of /dev/sda6 is
> wider than the others.
> "あかさ" would be "E3 81 82 E3 81 8B E3 81 95" in hex,
> but when I had e2label write its output to a file and opened that with a hex
> editor,
> what appeared was "E3 81 82 E3 81 8B E3 81 95 0A".

The newline from the e2label command is appended to the output of the label, but is not part of the label.  Otherwise the output would be something like:


user@ubuntu:~$ sudo e2label /dev/sda2
myvolumelabeluser@ubuntu:~$


Instead of what does currently occur:


user@ubuntu:~$ sudo e2label /dev/sda2
myvolumelabel
user@ubuntu:~$


Are you seeing the newline in the GParted display of the label?
Comment 9 Jonas Kulla 2011-10-25 20:32:45 UTC
Created attachment 199966 [details]
screenshot

Sadly, the problem presists =/

I have cut together 2 screenshots to somehow demonstrate how it looks
Comment 10 Curtis Gedak 2011-10-25 22:49:03 UTC
Created attachment 199986 [details] [review]
Diff file with update to change volume label tool read order attempt 4

Thank you for the screen shot.  This certainly displays the problem quite clearly.

With this next diff file I have changed the execute_command to not use the "C" locale since this might be part of the problem.

Please try this new patch to see if this addresses the problem.
Comment 11 Jonas Kulla 2011-10-25 23:51:06 UTC
I'm sorry, this didn't fix it either.

I experimented around a little bit, and the regexp_label function sure does
behave a little bit weirdly.
For example, I changed the expression to "^([^\n])".
Normally, this should match the first character when it isn't a LF, right?
Well, with non-unicode labels, it does exactly that, so "Fedora15" becomes "F",
but with unicode characters, it treats three of them as one character!
"あかさたなはま" becomes "あかさ".
Also, I tried this with and without C-locale enabled, and the results are the same.

Maybe the regexp function wasn't even written with Unicode/UTF-8 in mind?
Comment 12 Curtis Gedak 2011-10-26 18:47:19 UTC
Great investigative work Jonas!

It would appear that we definitely have a problem with unicode characters and the current regexp method.

After looking at the screen shot in comment #10, I noticed that the upper label takes up less space and perhaps does not have the extra newline.

Did you set this label from the command line?

Thinking of methods that do not properly support unicode, I believe the Utils::trim method is one of these.  I mention this because Utils::trim is invoked on the user typed in label before it is assigned.  Perhaps this method might be the cause of the extra newline.

I am investigating how to make the regexp and trim methods into unicode aware methods.

In the meantime would you be able to test removing the Utils::trim method from the Dialog_Partition_Label::get_new_label() method in src/Dialog_Partition_Label.cc?
Comment 13 Jonas Kulla 2011-10-27 01:07:34 UTC
Created attachment 200072 [details]
screenshot

Thanks for still sticking with me (=

About the previous screenshot:
First, I normally set the label to the unicode string (not applied yet),
so it would show up as a preview, which is visibly not flawed.
I also opened the "set label" dialog again to verify there are no strange
characters there (yet), and made this into the first part of the screenshot.
Then I applied the changes, and after GParted rescanned the devices,
LF at the end shows up, visible (1) through the wider row you mentioned (that's
what I previously meant with "wider line", sorry), and (2) through the
"set label" dialog I opened up again, where you can actually "see" the LF
through the placeholder character, and that's what in the second part of the shot.

About the trim function, I'm not completely sure, but I think it actually
works fine with unicode. Why? Because, when I open up the "set label" dialog
and actually input the label + a LF at the end, when I click "OK",
the LF is already removed from the label in the preview.
Also, the LF always first appears after applying, and is clearly not a part
of the label as you yourself said (it'd mess up my thunar view else),
so the error clearly only happens when it is read out,
which basically makes this bug just a cosmetic annoyance more than a real problem..

But it is still pretty annoying to not have a program work 100% the way you
intend it to, right?
So I quickly taught myself some basic C++, and added this dirty hack to ext4.cc:

if ( partition.label[ partition.label.length()-1 ] == 0x0A )
{
	size_t sz = partition.label.length();
	partition.label.resize( sz-1 );
}

I guess I'll happily be using this until the real cause of all this is
investigated.. BUT!
While testing this with every possible combination, I found yet another
weirdness. It only happens in special cases:
The end of the label gets replaced by some weird placeholder which I've never
seen before (described further below), and the console actually keeps spewing out:

(gpartedbin:28813): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

After some experimenting, I have (somehow) determined the pattern of it:
Every sequence of roughly 3 ascii characters is treated as one unicode character;
if the amount of unicode characters exceeds the number of 5,
the rest of the label will be, after applying, cut off and replaced with
a placeholder character which indicates some sort of error (see new screenshot).
If however, the 5th (or 6th) character is a sequence of ascii characters,
no pango errors occur, but this last sequence is still halfway cut off
in a (to me still) random fashion. Also, this error works "best"
if you use three byte unicode symbols (e.g. '€'),
with two bytes like the '°' the pattern is a little bit more random.

I switched back to the master branch to make sure my changes didn't
cause any of this, and there it doesn't happen because the label doesn't
get read correctly in the first place. So I then applied your very
first patch which only made the unicode visible, and here,
this problem already happens. This time the effects reach outside of
GParted, as the label is also displayed in the cut off pattern as described
above everywhere else (filemanager etc.), although I think the pango errors
only happen withing GParted.
Check the new screenshot for this because it is really hard to explain..

PS. I checked this without the Util::trim function in the new label dialog,
and it didn't seem to affect this particular problem.
Comment 14 Jonas Kulla 2011-10-27 01:09:05 UTC
Oh, and by the way, I never set the label from the command line
as I do everything through GParted
Comment 15 Curtis Gedak 2011-10-27 19:04:32 UTC
Created attachment 200130 [details] [review]
Diff file with update to change volume label tool read order attempt 5

Based upon your research it would appear that the Utils::trim method works properly.  As such I have created another patch that wraps the Utils::trim method around the Utils::regexp_label method.

This is likely a temporary solution until I can figure out how to get regular expressions working correctly for unicode characters.

Currently I am looking at the following glibmm library:
   http://developer.gnome.org/glibmm/unstable/classGlib_1_1Regex.html

Would you be able to test this new patch?
Comment 16 Curtis Gedak 2011-10-27 19:09:23 UTC
Created attachment 200131 [details] [review]
Diff file with update to change volume label tool read order attempt 6

Silly me.  ;-)

In this code I don't even need the regular expression.  Hence this new diff file replaces the Utils::regexp_label with Utils::trim method.

Please test and let me know how this new diff file works.
Comment 17 Jonas Kulla 2011-10-27 22:23:42 UTC
Thanks! It seems like trim does the job well,
no more LF at the end (=

The invalid ustrings however have me worrying a lot;
right now I am trying everything to get C++ to print out
the bytes of the (malformed?) strings,
so I can compare them and locate what kind of manipulation/damage
has been done to them in the process.
What I tried is to insert something along the lines of

Glib::ustring::size_type sz = partition_temp.label.size();
std::cout << "**************\nRead partition: "
<< partition_temp.label << "\nValid? "
<< partition_temp.label.validate()
<< "\nLabel length: " << sz << "\nIn hex:";
Glib::ustring::size_type i;
for (i=0;i<sz;i++) {std::cout << " 0x" << std::hex
<< partition_temp.label.at(i) << std::dec;}

But for unicode strings, it just keeps saying their length is 0!

Clearly the regexp_label function doesn't use the Glib regexp, does it?
Because I couldn't find the sruct "regex_t" anywhere in the documentation
of said Glib class, I'm assuming so. But isn't this flawing the whole
program to begin with? I mean, Glib::ustring is being used everywhere,
but I don't think there are any classes outside Glibmm that can
deal properly with the multibyte ustrings in Glib::ustring.
(or maybe I have no idea what I'm talking about? sorry, I have
had absolutely zero experience with C++/Glibmm til this bug report..)
Comment 18 Curtis Gedak 2011-10-28 22:53:47 UTC
Created attachment 200222 [details] [review]
Diff file with update to change volume label tool read order attempt 7

(In reply to comment #17)
> Thanks! It seems like trim does the job well,
> no more LF at the end (=

Glad to hear it.  :)

Does this also work for unicode strings that are not just 3 characters long?
For example, 5 characters?


> Clearly the regexp_label function doesn't use the Glib regexp, does it?

Yes, you are correct.  The existing code did not use the Glib::Regex class.

The attached diff file has a replacement that does use Glib::Regex, and also includes some fixes for regexp_label calls in ntfs and btrfs file systems too.  The Glib::Regex class is supposed to handle unicode characters.

Would you be able to test this new diff file and provide your comments?
Comment 19 Jonas Kulla 2011-10-29 14:29:48 UTC
Hey, good news!
I finally found the source of all the malformed strings.
(I wish I had known that GParted is just an interface to external progs for almost everything earlier..)

Anyway, so when a label is to be written, GParted calls e2label, right?
Here's an excerpt of e2label.c:

#define VOLNAMSZ 16
[...]
strncpy(sb.s_volume_name, label, VOLNAMSZ);
if (strlen(label) > VOLNAMSZ)
	fprintf(stderr, _("Warning: label too long, truncating.\n"));

So labels have a max size of 16 bytes, eh? Didn't know that :P
This would normally work for ascii (and UTF-8 containing it),
but with multibyte chars the possibility is there that they get
chopped in half, leaving garbage bytes.

As I see it the partition label dialog is fs agnostic,
so the option to pop up a MsgDialog along the lines of "Label too long!"
or "Label was too long and has been trimmed" would be rather cumbersome I guess.

I added this to ext4.cc to at least avoid bad strings:

bool ext4::write_label([...])
{
	// trim label to 16 bytes
	Glib::ustring temp_label(partition.label);
	while ( temp_label.bytes() > 16)
		{ temp_label.resize( temp_label.length()-1 ); }
	[then use temp_label instead of partition.label]
}

which seems to work ok.
Anyway, there's the remaining problem that users aren't notified of this trim
and could be confused to after the write.
Maybe add a message to the error log shown after each operation?
It's up to you GParted devs I guess (;

Oh, and good job on the regex change, I think that was a good thing to do nonetheless ^^

Btw: Do you know where I should ask about ibus not working in the label dialog?
I tested the entry widget with both 2.4 and 3.0, and they both worked
Comment 20 Curtis Gedak 2011-10-29 16:53:56 UTC
Thanks again Jonas for all your investigative work and testing.  :-)

You've been a great help with addressing the problem(s) in this bug report.

(In reply to comment #19)
> (I wish I had known that GParted is just an interface to external progs for
> almost everything earlier..)

Yes, as you discovered GParted is graphical front end.  GParted uses the libparted library to edit partition tables, and other free software file system tools to manipulate the file systems.


> Anyway, so when a label is to be written, GParted calls e2label, right?

Yes, for ext2, ext3, and ext4 file systems.


> As I see it the partition label dialog is fs agnostic,
> so the option to pop up a MsgDialog along the lines of "Label too long!"
> or "Label was too long and has been trimmed" would be rather cumbersome I
> guess.

Yes, it could be cumbersome.  I'll have to think on this a while to see if I can come up with an elegant solution.


> I added this to ext4.cc to at least avoid bad strings:
> 
> bool ext4::write_label([...])
> {
>     // trim label to 16 bytes
>     Glib::ustring temp_label(partition.label);
>     while ( temp_label.bytes() > 16)
>         { temp_label.resize( temp_label.length()-1 ); }
>     [then use temp_label instead of partition.label]
> }
> 
> which seems to work ok.
> Anyway, there's the remaining problem that users aren't notified of this trim
> and could be confused to after the write.
> Maybe add a message to the error log shown after each operation?
> It's up to you GParted devs I guess (;

Good suggestion.

GParted contains similar code for FAT16 and FAT32 file systems as can be seen in the following method:
http://git.gnome.org/browse/gparted/tree/src/Utils.cc#n349

I would prefer a solution that kept the file system logic contained within the specific file system source code files.  The current solution does not expand well to new file systems.  ;-(

> 
> Oh, and good job on the regex change, I think that was a good thing to do
> nonetheless ^^

Agreed. :-)

 
> Btw: Do you know where I should ask about ibus not working in the label dialog?
> I tested the entry widget with both 2.4 and 3.0, and they both worked

That's a good question.  I knew nothing about ibus until you mentioned it.  I did a quick search for "linux ibus" and this is the first result:
http://code.google.com/p/ibus/

Is this the ibus code you are referring too?
Comment 21 Jonas Kulla 2011-10-29 19:01:50 UTC
(In reply to comment #20)
> Thanks again Jonas for all your investigative work and testing.  :-)

Well thank you for sticking through with me ^^ I just wanted to fix this asap as it directly affects me (I use lots of unicode labels)


> > Anyway, so when a label is to be written, GParted calls e2label, right?
> 
> Yes, for ext2, ext3, and ext4 file systems.

Yeah, that's what I meant :P Sorry, I usually mean "ext* label" when I say "label"

> GParted contains similar code for FAT16 and FAT32 file systems as can be seen
> in the following method:
> http://git.gnome.org/browse/gparted/tree/src/Utils.cc#n349
> 
> I would prefer a solution that kept the file system logic contained within the
> specific file system source code files.  The current solution does not expand
> well to new file systems.  ;-(

If I understand you correctly, then there would be a necessity for an abstract "label_maxsize" or something value in Filesystem
(which I think is the parent class of ext*, btrfs etc.?),
and then the call to a "check_label_length" function,
which in turn could use a "Utils::trim_to_n_bytes", would have to be made mandatory.
(As it seems NTFS does have a bytecap of 32 as well, which isn't reflected in the code I think)
This should make the integration with the 'set label' dialog pretty easy,
if you choose to do so.
 
> Is this the ibus code you are referring too?

Yep. I think I'll just start at Fedora Forums and ask if anybody experiences
the same problem, then most likely I'll be led on the right path.
Comment 22 Curtis Gedak 2011-10-31 16:51:09 UTC
While thinking about this problem, I thought it would also be worth following this up from the perspective of the util-linux blkid command.

Some Internet searching brought me to the following email thread:

   Re: Problem with blkid
   http://www.spinics.net/lists/util-linux-ng/msg04242.html

From reading this email thread it appears that newer versions of blkid will have a "-d" option to indicate that the label output should not be encoded.

According to the util-linux ChangeLog, this new "-d" option should be available with util-linux version 2.20.

   util-linux v2.20
   http://lwn.net/Articles/456688/
Comment 23 Curtis Gedak 2011-11-01 19:18:13 UTC
An enhancement to permit GParted to properly display ext2/3/4 unicode volume labels has been committed to the Gnome git repository for inclusion in the next release of GParted.

The relevant git commit can be viewed at the following link:
http://git.gnome.org/browse/gparted/commit/?id=cf8293cf78efe2e2e8b9b74279083c1f986d236e
Comment 24 Jonas Kulla 2011-11-02 13:21:51 UTC
>Some Internet searching brought me to the following email thread

Interesting. So the label has never been read incorrectly to begin with,
but displayed in such a way as to not break the shell when unprintable chars
appear (like "echo あかさ | cat -v" does). You never stop learning..

Now that you filed the patch, is it appropriate to mark this bug as "RESOLVED"?
Comment 25 Curtis Gedak 2011-11-02 19:31:57 UTC
Yes, we could mark this bug as "RESOLVED".

Lately I prefer to keep the bug report open until a new GParted release is created that contains the associated changes.  That way if someone runs into the same problem they are less likely to create another new bug report.

If you wish to close the report now, then I am okay with that.

Otherwise I plan to close it when the next version of GParted is released.  :-)
Comment 26 Jonas Kulla 2011-11-04 15:54:12 UTC
(In reply to comment #25)

> If you wish to close the report now, then I am okay with that.
> 
> Otherwise I plan to close it when the next version of GParted is released.  :-)

Sounds like the best solution. I wouldn't mess with the bug status myself anyway
as I'm not a developer :P
Comment 27 Curtis Gedak 2011-12-13 17:36:52 UTC
The enhancement to address this bug report has been included in GParted 0.11.0 released on Dec. 13, 2011.