After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 594918 - g_format_size_for_display should output TB/PB/EB instead of GB for sizes that big and output more decimal places
g_format_size_for_display should output TB/PB/EB instead of GB for sizes that...
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 594886 628044 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2009-09-11 18:03 UTC by Mark
Modified: 2010-09-01 13:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
2 decimal file size and bigger file sizes (1.61 KB, patch)
2009-09-11 18:03 UTC, Mark
none Details | Review
same patch just without the kibibyte in it. (1.60 KB, patch)
2009-09-11 18:19 UTC, Mark
none Details | Review
new patch with various fixes and a flexible decimal precision (1.84 KB, patch)
2009-09-11 20:44 UTC, Mark
none Details | Review
alternative patch (2.36 KB, patch)
2009-09-11 20:51 UTC, Christian Persch
committed Details | Review
Some fixes (1.75 KB, patch)
2009-09-11 21:37 UTC, Mark
needs-work Details | Review
updated patch (2.77 KB, patch)
2009-10-17 19:20 UTC, Mark
none Details | Review
removed EB from patch (2.75 KB, patch)
2009-10-19 12:44 UTC, Mark
needs-work Details | Review
coding style update (2.70 KB, patch)
2009-10-21 13:48 UTC, Mark
none Details | Review

Description Mark 2009-09-11 18:03:15 UTC
Created attachment 143004 [details] [review]
2 decimal file size and bigger file sizes

Hey,

bug 594886 is what sparked the idea to "tweak" the way the function
g_format_size_for_display calculates the file size.

To start with the bug 594886 talks about 1 decimal not being precise enough. I
think that's true as well so i changed that to 2 decimals.

Furthermore i took the kibibyte systems (because we divide in 1024 not 1000)
for the file tags: MiB, GiB, TiB etc... and put them in a gchar array. Then a
simple for loop calculates the lowest 1024 range and the for index is the tag
for the file size.

The effective difference this patch will show is 1.1MB will now be (after
applied) 1.11 MiB which is more precise and has the right tag behind it.

Note: What i actually want is g_format_size_for_display that takes 2 arguments:
1. size in bytes
2. decimal precision

But i'm not quite sure how to implement that (i know it in c++)... i mean, the
decimal precision is now done with those float precisions like .2f for 2
decimal precision but how would you make that 2 value dynamic so that it can be
.1f or .3f (or anything really).. In c++ there is just setprecision(2) and
done..

Anyway, can this patch be included in glib?

Thanx,
Mark.
Comment 1 Mark 2009-09-11 18:19:09 UTC
Created attachment 143007 [details] [review]
same patch just without the kibibyte in it.
Comment 2 Mark 2009-09-11 20:44:59 UTC
Created attachment 143021 [details] [review]
new patch with various fixes and a flexible decimal precision

This patch fixes:
- Language translations
- Overflow when a smart guy gets a 1024 YB file
- Flexible decimal precision. meaning a 1 MB file gets 2 decimals. 10MB get 1 decimal and 100 MB gets no decimals.
Comment 3 Christian Persch 2009-09-11 20:50:55 UTC
You can't ever get ZB or YB here, since goffset is only 64 bits...

I happen to have a (different) patch adding TB/PB/EB in my tree (no changes in decimal places shown); attaching.
Comment 4 Christian Persch 2009-09-11 20:51:54 UTC
Created attachment 143022 [details] [review]
alternative patch
Comment 5 Christian Persch 2009-09-11 20:54:24 UTC
(Just for reference, binary SI units were rejected already in bug 512443; see also bug 554172.)
Comment 6 Benjamin Otte (Company) 2009-09-11 21:30:18 UTC
This patch was created on IRC with my help and the idea to display numbers that always contain 3 digits, no matter if and where a thousands separator gets placed. This seems to be what Windows Vista does for drive sizes, too.

Of course, this patch would look a lot better, if we didn't use 1024 as a thousands separator, because now we have sizes like "1012 bytes" - which should really be "1.01 KB", but I'm not the one to decide that.

Oh, and I'm still finding issues:
The bytes case needs to be special cased, so it properly says "1 byte" and not "1 bytes". And of course it should say "30 bytes" and not "30.0 bytes".

Otherwise that patch looks fine to me and I'd commit it like that after string freeze.
Comment 7 Mark 2009-09-11 21:37:53 UTC
Created attachment 143030 [details] [review]
Some fixes
Comment 8 Cosimo Cecchi 2009-10-07 15:52:57 UTC
*** Bug 594886 has been marked as a duplicate of this bug. ***
Comment 9 Mark 2009-10-16 16:39:47 UTC
Can this patch (the "Some fixes" one) be pushed in glib?
Comment 10 Matthias Clasen 2009-10-16 17:23:17 UTC
I must say I'd prefer a patch that extends the if-else cascade instead of introducing a pointless array. But I have no opposition to the functional aspects of the patch.
Comment 11 Christian Persch 2009-10-16 17:30:58 UTC
Review of attachment 143030 [details] [review]:

::: glib/gfileutils.c
@@ +1734,3 @@
+  guint i, decimals;
+  gdouble displayed_size;
+  const gchar* sizes[] = 

To save relocations, use 
static const char sizes[][8] = ...

@@ +1743,3 @@
+    N_("%.*f EB"),
+    N_("%.*f ZB"),
+    N_("%.*f YB")

ZB and YB can't occur, as already pointed out in comment 3.

@@ +1751,3 @@
+  displayed_size = (gdouble) size / 1024;
+  
+  for (i = 0; displayed_size >= 1024 && i < G_N_ELEMENTS(sizes); i++)

The | && i < G_N_ELEMENTS(sizes)| condition is unnecessary (and if it wasn't, the index in |sizes[i]| below would be out of bounds.)

@@ +1756,2 @@
     }
+  decimals = (displayed_size <= 9) ? 2 : (displayed_size <= 99) ? 1 : 0;

Why <= 9 and <= 99, instead of < 10.0 and < 100.0 ?
Comment 12 Christian Persch 2009-10-16 17:32:47 UTC
(In reply to comment #10)
> I must say I'd prefer a patch that extends the if-else cascade instead of
> introducing a pointless array. But I have no opposition to the functional
> aspects of the patch.

The patch in attachment 143022 [details] [review] does this (although it doesn't change the decimals displayed like the other patch).
Comment 13 Mark 2009-10-17 19:20:59 UTC
Created attachment 145690 [details] [review]
updated patch

for comment 11,
i tried to make in all your suggestions but have an issue with the array thing:
static const char sizes[][8] = ...

when i do that i get compiler warnings.. so i did it slightly different which seems to work fine as well. (unless i miss something)

the: "The | && i < G_N_ELEMENTS(sizes)| condition is unnecessary" is not unnecessary as it prevents out of bounds for people that go beyond 1024 EB.

for the decimals of 10.0 and 100.0, your right. it's better readable that way so adjusted it as well.

Some other things i fixed (amazing how much can be fixed in such a small patch)..
- I tried to get that second return (actually the first return) out of the code since i found that one a little odd. There now is only one return which i like best ^_^

- g_format_size_for_display returned a char while the return function inside g_format_size_for_display returns gchar (g_strdup_printf) so fixed that in the source and header file.

- This might be a bit ugly hence the note in comment code. Because the byte entry isn't in the array every for iteration is in fact ending in a i number of 1 to much. so i just do "i - 1" in the return line. I hope this is allowed for glib standards (whatever those are)?

I'm currently running a glib with this patch and it seems to run fine everywhere so far.
Comment 14 Christian Persch 2009-10-17 20:13:58 UTC
(In reply to comment #13)
> i tried to make in all your suggestions but have an issue with the array thing:
> static const char sizes[][8] = ...
> 
> when i do that i get compiler warnings.. so i did it slightly different which
> seems to work fine as well. (unless i miss something)

What is the exact compiler warning? I don't get any, here.

> the: "The | && i < G_N_ELEMENTS(sizes)| condition is unnecessary" is not
> unnecessary as it prevents out of bounds for people that go beyond 1024 EB.

This *can't* go up to 1024 EB since goffset is gint64.
 
> for the decimals of 10.0 and 100.0, your right. it's better readable that way
> so adjusted it as well.

(The point was that these conditions aren't equivalent at all, since displayed_size is double, not integer.)
Comment 15 Mark 2009-10-17 20:21:49 UTC
comment 14

oh, oops. missed the EB one.

about the 10.0 and 100.0
yes, that's true as well.

btw. can't we use a unsigned long long for filesizes since they will be able to handle EB, ZB and YB ... or is that expensive somewhere else?
Comment 16 Mark 2009-10-19 12:44:40 UTC
Created attachment 145782 [details] [review]
removed EB from patch

Same patch as before but without EB.
Can this be accepted for glib?
Comment 17 Christian Persch 2009-10-19 16:28:50 UTC
You misunderstood. EB itself can be reached, just not 1024 EB.
Comment 18 Mark 2009-10-19 19:30:06 UTC
Then it's imho better to leave EB out.
Comment 19 Emmanuele Bassi (:ebassi) 2009-10-21 13:06:16 UTC
Review of attachment 145782 [details] [review]:

::: glib-2.22.2-orig/glib/gfileutils.c
@@ +1771,3 @@
+  {
+    displayed_size /= 1024;
+  }

wrong coding style: G_N_ELEMENTS needs a space before the parenthesis and the curly braces need to be removed

@@ +1774,3 @@
+  
+  decimals = (displayed_size < 10.0) ? 2 : (displayed_size < 100.0) ? 1 : 0;
+  

I'd use an if() here instead of being overly clever with the ternary ?:.

if we want to keep the ternary conditional it should be re-indented:

  (cond1) ? res1
          : (cond2) ? res2 : res3

@@ +1778,3 @@
+  /* note: the i - 1 is needed because the byte entry is missing from the array which is due to the 2 possible byte values "byte" and "bytes" */
+  return (i == 0) 
+            ? g_strdup_printf (g_dngettext(GETTEXT_PACKAGE, "%u byte", "%u bytes",(guint) size), (guint) size)

missing space between g_dngettext and parenthesis, and the indentation of the statements in the ternary conditional is wrong. again, maybe an if() would be a better option for maintainability and readability
Comment 20 Mark 2009-10-21 13:48:19 UTC
Created attachment 145953 [details] [review]
coding style update

Fixed all critics from comment #19
Comment 21 Matthias Clasen 2009-10-25 05:56:27 UTC
Sorry, I still prefer to keep the if-else style of the original code over the array.
Comment 22 Alexander Larsson 2009-11-25 14:08:11 UTC
I agree, its not obviously more efficient (it has a bunch of divisions) and its way harder to read. Just keep the chained if-then-else.
Comment 23 Mark 2009-11-25 17:13:00 UTC
I really don't get what you 2 persist on having visual code paths. The best way to go from a 1234567890 size to a human readable size is in a loop. That is common knowledge. KDE does that as well: http://websvn.kde.org/trunk/KDE/kdelibs/kdecore/localization/klocale.cpp?revision=1047927&view=markup from line 1429.

I dislike the if-else style in this case thus will simply not make it. This loop way is way more extensible then those if-else statements.. You now only need to add one more element to the array and a new size format is added.

Anyway. My patch is in my style and is the right one for the current case (again, look at how KDE did it). Do with it what you want. If you insist on having the if-else style then you have to ask someone else to make this.
Comment 24 Matthias Clasen 2009-11-25 17:51:41 UTC
> I dislike the if-else style in this case thus will simply not make it

Thats fine, your choice. Eventually somebody else will produce an acceptable patch
Comment 25 A. Walton 2009-11-26 02:25:17 UTC
(In reply to comment #21)
> Sorry, I still prefer to keep the if-else style of the original code over the
> array.

So then why not commit chpe's patch and close this as FIXED?
Comment 26 Matthias Clasen 2009-11-26 02:45:39 UTC
chpe's patch doesn't contain the always-show-3-digits refinement. But I guess we can leave that for another day. So go ahead and commit it, please.
Comment 27 Alexander Larsson 2009-11-26 09:18:01 UTC
Actually, I'm not sure the always-show-3-digits thing is something we want.
It makes reading lots of numbers, say in a list column, harder.

100.0
 10.2
344.4
  1.1
 42.0

vs

 100
10.2
 344
1.12
42.0

Its much harder to scan the numbers in the always-three-digits case and get a feeling for their size. And in many cases this is more important than a more exact byte-count, like when looking for large files. (Of course, using prefixes on units makes size estimation at-a-glance harder too, but we shouldn't make it even worse).

If you need the exact byte count then it is availible in the file properties dialog, with 100% accuracy.

Now, i can see using the always-three-digits method for drive sizes, as there you generally just want a nice looking label, but g_format_size_for_display is mostly used for file sizes, and there i think glancing the file size easy is more important.
Comment 28 Christian Dywan 2010-09-01 13:54:49 UTC
*** Bug 628044 has been marked as a duplicate of this bug. ***