GNOME Bugzilla – Bug 594918
g_format_size_for_display should output TB/PB/EB instead of GB for sizes that big and output more decimal places
Last modified: 2010-09-01 13:54:49 UTC
Created attachment 143004 [details] [review]
2 decimal file size and bigger file sizes
bug 594886 is what sparked the idea to "tweak" the way the function
g_format_size_for_display calculates the file size.
To start with the bug 594886 talks about 1 decimal not being precise enough. I
think that's true as well so i changed that to 2 decimals.
Furthermore i took the kibibyte systems (because we divide in 1024 not 1000)
for the file tags: MiB, GiB, TiB etc... and put them in a gchar array. Then a
simple for loop calculates the lowest 1024 range and the for index is the tag
for the file size.
The effective difference this patch will show is 1.1MB will now be (after
applied) 1.11 MiB which is more precise and has the right tag behind it.
Note: What i actually want is g_format_size_for_display that takes 2 arguments:
1. size in bytes
2. decimal precision
But i'm not quite sure how to implement that (i know it in c++)... i mean, the
decimal precision is now done with those float precisions like .2f for 2
decimal precision but how would you make that 2 value dynamic so that it can be
.1f or .3f (or anything really).. In c++ there is just setprecision(2) and
Anyway, can this patch be included in glib?
Created attachment 143007 [details] [review]
same patch just without the kibibyte in it.
Created attachment 143021 [details] [review]
new patch with various fixes and a flexible decimal precision
This patch fixes:
- Language translations
- Overflow when a smart guy gets a 1024 YB file
- Flexible decimal precision. meaning a 1 MB file gets 2 decimals. 10MB get 1 decimal and 100 MB gets no decimals.
You can't ever get ZB or YB here, since goffset is only 64 bits...
I happen to have a (different) patch adding TB/PB/EB in my tree (no changes in decimal places shown); attaching.
Created attachment 143022 [details] [review]
(Just for reference, binary SI units were rejected already in bug 512443; see also bug 554172.)
This patch was created on IRC with my help and the idea to display numbers that always contain 3 digits, no matter if and where a thousands separator gets placed. This seems to be what Windows Vista does for drive sizes, too.
Of course, this patch would look a lot better, if we didn't use 1024 as a thousands separator, because now we have sizes like "1012 bytes" - which should really be "1.01 KB", but I'm not the one to decide that.
Oh, and I'm still finding issues:
The bytes case needs to be special cased, so it properly says "1 byte" and not "1 bytes". And of course it should say "30 bytes" and not "30.0 bytes".
Otherwise that patch looks fine to me and I'd commit it like that after string freeze.
Created attachment 143030 [details] [review]
*** Bug 594886 has been marked as a duplicate of this bug. ***
Can this patch (the "Some fixes" one) be pushed in glib?
I must say I'd prefer a patch that extends the if-else cascade instead of introducing a pointless array. But I have no opposition to the functional aspects of the patch.
Review of attachment 143030 [details] [review]:
@@ +1734,3 @@
+ guint i, decimals;
+ gdouble displayed_size;
+ const gchar* sizes =
To save relocations, use
static const char sizes = ...
@@ +1743,3 @@
+ N_("%.*f EB"),
+ N_("%.*f ZB"),
+ N_("%.*f YB")
ZB and YB can't occur, as already pointed out in comment 3.
@@ +1751,3 @@
+ displayed_size = (gdouble) size / 1024;
+ for (i = 0; displayed_size >= 1024 && i < G_N_ELEMENTS(sizes); i++)
The | && i < G_N_ELEMENTS(sizes)| condition is unnecessary (and if it wasn't, the index in |sizes[i]| below would be out of bounds.)
@@ +1756,2 @@
+ decimals = (displayed_size <= 9) ? 2 : (displayed_size <= 99) ? 1 : 0;
Why <= 9 and <= 99, instead of < 10.0 and < 100.0 ?
(In reply to comment #10)
> I must say I'd prefer a patch that extends the if-else cascade instead of
> introducing a pointless array. But I have no opposition to the functional
> aspects of the patch.
The patch in attachment 143022 [details] [review] does this (although it doesn't change the decimals displayed like the other patch).
Created attachment 145690 [details] [review]
for comment 11,
i tried to make in all your suggestions but have an issue with the array thing:
static const char sizes = ...
when i do that i get compiler warnings.. so i did it slightly different which seems to work fine as well. (unless i miss something)
the: "The | && i < G_N_ELEMENTS(sizes)| condition is unnecessary" is not unnecessary as it prevents out of bounds for people that go beyond 1024 EB.
for the decimals of 10.0 and 100.0, your right. it's better readable that way so adjusted it as well.
Some other things i fixed (amazing how much can be fixed in such a small patch)..
- I tried to get that second return (actually the first return) out of the code since i found that one a little odd. There now is only one return which i like best ^_^
- g_format_size_for_display returned a char while the return function inside g_format_size_for_display returns gchar (g_strdup_printf) so fixed that in the source and header file.
- This might be a bit ugly hence the note in comment code. Because the byte entry isn't in the array every for iteration is in fact ending in a i number of 1 to much. so i just do "i - 1" in the return line. I hope this is allowed for glib standards (whatever those are)?
I'm currently running a glib with this patch and it seems to run fine everywhere so far.
(In reply to comment #13)
> i tried to make in all your suggestions but have an issue with the array thing:
> static const char sizes = ...
> when i do that i get compiler warnings.. so i did it slightly different which
> seems to work fine as well. (unless i miss something)
What is the exact compiler warning? I don't get any, here.
> the: "The | && i < G_N_ELEMENTS(sizes)| condition is unnecessary" is not
> unnecessary as it prevents out of bounds for people that go beyond 1024 EB.
This *can't* go up to 1024 EB since goffset is gint64.
> for the decimals of 10.0 and 100.0, your right. it's better readable that way
> so adjusted it as well.
(The point was that these conditions aren't equivalent at all, since displayed_size is double, not integer.)
oh, oops. missed the EB one.
about the 10.0 and 100.0
yes, that's true as well.
btw. can't we use a unsigned long long for filesizes since they will be able to handle EB, ZB and YB ... or is that expensive somewhere else?
Created attachment 145782 [details] [review]
removed EB from patch
Same patch as before but without EB.
Can this be accepted for glib?
You misunderstood. EB itself can be reached, just not 1024 EB.
Then it's imho better to leave EB out.
Review of attachment 145782 [details] [review]:
@@ +1771,3 @@
+ displayed_size /= 1024;
wrong coding style: G_N_ELEMENTS needs a space before the parenthesis and the curly braces need to be removed
@@ +1774,3 @@
+ decimals = (displayed_size < 10.0) ? 2 : (displayed_size < 100.0) ? 1 : 0;
I'd use an if() here instead of being overly clever with the ternary ?:.
if we want to keep the ternary conditional it should be re-indented:
(cond1) ? res1
: (cond2) ? res2 : res3
@@ +1778,3 @@
+ /* note: the i - 1 is needed because the byte entry is missing from the array which is due to the 2 possible byte values "byte" and "bytes" */
+ return (i == 0)
+ ? g_strdup_printf (g_dngettext(GETTEXT_PACKAGE, "%u byte", "%u bytes",(guint) size), (guint) size)
missing space between g_dngettext and parenthesis, and the indentation of the statements in the ternary conditional is wrong. again, maybe an if() would be a better option for maintainability and readability
Created attachment 145953 [details] [review]
coding style update
Fixed all critics from comment #19
Sorry, I still prefer to keep the if-else style of the original code over the array.
I agree, its not obviously more efficient (it has a bunch of divisions) and its way harder to read. Just keep the chained if-then-else.
I really don't get what you 2 persist on having visual code paths. The best way to go from a 1234567890 size to a human readable size is in a loop. That is common knowledge. KDE does that as well: http://websvn.kde.org/trunk/KDE/kdelibs/kdecore/localization/klocale.cpp?revision=1047927&view=markup from line 1429.
I dislike the if-else style in this case thus will simply not make it. This loop way is way more extensible then those if-else statements.. You now only need to add one more element to the array and a new size format is added.
Anyway. My patch is in my style and is the right one for the current case (again, look at how KDE did it). Do with it what you want. If you insist on having the if-else style then you have to ask someone else to make this.
> I dislike the if-else style in this case thus will simply not make it
Thats fine, your choice. Eventually somebody else will produce an acceptable patch
(In reply to comment #21)
> Sorry, I still prefer to keep the if-else style of the original code over the
So then why not commit chpe's patch and close this as FIXED?
chpe's patch doesn't contain the always-show-3-digits refinement. But I guess we can leave that for another day. So go ahead and commit it, please.
Actually, I'm not sure the always-show-3-digits thing is something we want.
It makes reading lots of numbers, say in a list column, harder.
Its much harder to scan the numbers in the always-three-digits case and get a feeling for their size. And in many cases this is more important than a more exact byte-count, like when looking for large files. (Of course, using prefixes on units makes size estimation at-a-glance harder too, but we shouldn't make it even worse).
If you need the exact byte count then it is availible in the file properties dialog, with 100% accuracy.
Now, i can see using the always-three-digits method for drive sizes, as there you generally just want a nice looking label, but g_format_size_for_display is mostly used for file sizes, and there i think glancing the file size easy is more important.
*** Bug 628044 has been marked as a duplicate of this bug. ***