Bug 554172 - g_format_size_for_display() should use correct IEC units
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: 2.18.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 640432
Depends on:
Blocks:

Reported: 2008-09-28 12:57 UTC by Christian Neumair
Modified: 2011-07-21 04:17 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
g_format_size_for_display(): Use correct base-10 units (3.24 KB, patch), 2010-03-10 07:58 UTC, Martin Pitt; status: rejected
Screenshot illustrating a size related problem (35.62 KB, image/png), 2010-03-11 14:57 UTC, simon80
units_draft1.patch (9.50 KB, patch), 2010-03-15 17:01 UTC, Benjamin Drung
units_draft2.patch (10.68 KB, patch), 2010-03-16 11:25 UTC, Benjamin Drung

Description Christian Neumair 2008-09-28 12:57:38 UTC
g_format_size_for_display() should clearly use IEC units.

Throughout various bug reports and mailing lists, there have been many (emotional) discussions. I am trying to give a short, unbiased, and historically and technically correct summary of the situation, ignoring previous discussions.

Please try to fully understand all conventions and their implications before commenting on the bug report.

1. historical perspective

a) ancient usage of k* & K*

Historically in physics, for hundreds of years, a k prefix of a physical quantity referred to a thousand units of that quantity. Consequently, kB refers to 1000 bytes.

In the 1970s, KB (with an uppercase K!) was introduced to refer to 1024 bytes.

b) shift in language usage

When the mass storage industry emerged, that convention shifted towards using KB for 1000 bytes, possibly because it looks nicer on retail boxes than gB. A formal SI unit standard was put forward that exclusively allowed kB, mB, gB (base 1000), but it was ignored.

Later, the IEC recognized that the storage industry and conventions leaned towards KB, referring to the base 1000 quantity. Thus, the IEC defined 1 KB (= 1 kB, SI) = 1000 bytes, and introduced the new units KiB, MiB, GiB, where 1 KiB = 1024 bytes.

This somewhat breaks with the historical tradition mentioned under a), but it does not burden users with the visual and intellectual challenge of distinguishing uppercase from lowercase characters.

2. current situation

* nobody uses SI conventions (k*, m*, g*)
* storage manufacturers correctly use the IEC base-ten convention: storage size is specified in KB, MB, GB
* memory manufacturers wrongly use a mish-mash of both IEC conventions: the actual memory size is base-two, but the unit names are the base-1000 ones. Thus, you get 24 memory cells per kilo unit “for free” (1024 instead of 1000), compared to what you would expect with the actual definitions in mind. The inherent “base-two” property of memory cells stems from the binary logic of the address lines.

* Windows (XP) used the same memory manufacturer mish-mash! Thus, people who plug in their X GB disk suddenly see a disk of X*(1000/1024)^3 GB and believe it is a hardware fault.

* many unix tools use the 1024 convention. For instance ls -h, unless told otherwise with --si. Note that ls -h uses the ancient usage described under a), which implies K=1024 and can be considered wrong nowadays in the light of the IEC convention. It should probably be modified as well.

* g_format_size_for_display() (and thus Nautilus) also uses the wrong memory manufacturer / Windows XP convention, which does not conform to either the IEC or the SI standard.

* gio special-cases the calculation for disk storage sizes, cf. bug 550100.

3. suggestion

Use one of the two correct IEC conventions. Since we are on a mass market nowadays, and everybody has ten digits, I propose using the 1000 convention.

Granted, people will be surprised when the sizes shown for their memory sticks and main memory differ from what was advertised, but once the migration has taken place, there will be no confusion whatsoever; it's just that Windows probably still uses an odd convention.

Note that consistency is also required, because at the moment a user will not be able to tell whether a 1 GB file fits on a 1 GB storage unit, due to the file size being base 1024, and the unit being base 1000.

A remarkable quote from [1] also suggests using base-1000 units:

“Almost all computer user tasks (and many high-level programming tasks) have no natural affinity or need for explicit powers of two. The consumer confusion between powers of 1000 and powers of 1024 may derive largely from some operating systems and applications that were originally written by and for programmers, and which thus reported quantities such as file sizes in familiar (to programmers) powers of 1024 while using SI (powers of 1000) abbreviations.”

Note that the article is a bit wrong in that SI abbreviations are used. As mentioned above, K* is not an SI abbreviation, but one of the two possible IEC abbreviations.

Additional reading:
[1] http://en.wikipedia.org/wiki/Binary_prefix#Prefixes
[2] http://en.wikipedia.org/wiki/Gigabyte
Comment 1 Christian Neumair 2008-09-28 13:12:00 UTC
Bug 427807, which suggested adding the correct Ki prefix to the base-1024 units used in g_format_size_for_display(), has bug 427807 comment 2 pointing to an old mailing list discussion:

http://mail.gnome.org/archives/gtk-devel-list/2007-December/msg00240.html

It turns out that someone decided to use base 1024.

It was also pointed out that sometimes you want memory sizes, and I agree that hardware inspection GUIs should tell you that 1024 * 1024 * 1024 * 8 memory cells are 1 GB.

Maybe adding g_format_memory_size_for_display() makes sense. It would clearly use the IEC KiB convention, and be base-1024.
Comment 2 Christian Neumair 2008-09-28 13:13:55 UTC
> I agree that hardware inspection GUIs should tell you that 1024 * 1024 * 1024 * 8 memory cells are 1 GB.

Sorry, that should have been 1 GiB!
Comment 3 Patryk Zawadzki 2008-09-28 13:39:23 UTC
(In reply to comment #0)
> 2. current situation
> 
> * nobody uses SI conventions (k*, m*, g*)

Actually there is no "g" prefix and "m" stands for "milli", I doubt you want to divide a single bit by 125 (a millibyte is a millioctet) ;)
Comment 4 Jürg Billeter 2008-09-28 14:33:03 UTC
I really don't think that the new IEC prefixes help in any way. People usually know about Megabyte and Gigabyte but they have no idea what Mebibyte and Gibibyte are. And if they don't know that, how is it going to help avoid confusion? The common way as I see it is to use powers of 1000 for hard disk sizes and powers of 1024 for everything else that is measured in bytes (file and memory sizes) and always use KB, MB, or GB. Bitrates are measured in powers of 1000 using kbps, Mbps, or Gbps.

I recognize that it may be an issue, however, neither switching everything to power of 1000 (inconsistent with common use and most other software) nor switching to Kibibyte (almost completely unknown) makes much sense to me. I'd rather consistently use power of 1024 with KB/MB/GB except possibly when used as label where it could make more sense to be consistent with what the hard disk manufacturers put on the package.
Comment 5 Christian Neumair 2008-09-28 15:23:03 UTC
> neither switching everything to power of 1000 (inconsistent with common use and most other software)

Please always consider that we are software developers, and our users are not. It doesn't help anybody if we implement a "this has always been a mess, but K probably means 1000 in the context of disk sizes and 1024 otherwise" policy. Using K to mean *both* is a no-go, at least in the context of files and disks.

For everybody who does not share my conclusion, maybe we could at least agree that we need a policy that allows people to distinguish the base?

That said, it should be one of the following:

a) historical: kB => 1000, KB => 1024
b) SI: kB => 1000, nothing else [which I proposed for the SAKE of simplicity, logic and world-wide decimal traditions]
c) KB => 1000, KiB => 1024
d) KB => 1024, nothing else

The current policy of

e) KB for file or memory size => 1024
   KB for storage size => 1000

is totally awkward. The basic canon should be: if you are able to write end-user documentation that explains why this policy is used, based on the concept's inner logic, then write it. If, however, you have to realize that the conventions are due to historic usages and origins [which I extensively elaborated on in comment 0], it is simply inconsistency!

And inconsistency is totally against all UI principles we have followed since the HIG was written. Call me an interface Nazi, but if you take the HIG seriously, you simply cannot support file sizes and disk sizes using a different base. Users won't understand why a 1 GB file (if it's actually 1 GiB) cannot be copied to a 1 GB medium. They also won't understand why the 34 GB disk has a different capacity than advertised [if we went with policy d)].

Granted, memory sticks would be displayed with a different capacity as well, but they show up with *more* capacity rather than less which is a positive experience.
Comment 6 Christian Neumair 2008-09-28 15:24:49 UTC
> b) SI: kB => 1000, nothing else [which I proposed for the SAKE of simplicity, logic and world-wide decimal traditions]

Actually I proposed the IEC flavor, where k is also uppercase. But that doesn't change the idea of using 1000 rather than 1024.
Comment 7 Patryk Zawadzki 2008-09-28 15:34:21 UTC
(In reply to comment #4)
> The common way as I see it is to use power of 1000 for hard
> disk sizes and power of 1024 for everything else that is measured in bytes
> (file and memory sizes) and always use KB, MB, or GB. Bitrates are measured in
> power of 1000 using kbps, Mbps, or Gbps.

We're performing stunts on thin ice here. If my disk size is 10 MiB and my file size is 5 MB then how much space is left and in what units?

It's like measuring container volume in gallons and liquid volume in liters in the very same user interface.
Comment 8 Behdad Esfahbod 2008-09-28 17:49:43 UTC
It's not like a 1GB file fits into a 1GB disk anyway. Almost all file systems take some of your disk away.
Comment 9 Patryk Zawadzki 2008-09-28 18:27:02 UTC
Behdad:

If you are referring to my previous comment - true, but I still believe these are apples vs. oranges and shouldn't be mixed (if you think my gallons vs. liters argument is silly then consider land vs. nautical miles).

To me it makes little sense to change the base so that reported disk sizes match claims of their producers as they should be using the proper unit in the first place (1000 GiB/1024 GB). The question should be which unit to use globally so file sizes can be compared to the storage device capacities.
Comment 10 Behdad Esfahbod 2008-09-28 18:30:39 UTC
(In reply to comment #9)
> Behdad:
> 
> If you are referring to my previous comment - true, but I still believe these
> are apples vs. oranges and shouldn't be mixed (if you think my gallons vs.
> liters argument is silly then consider land vs. nautical miles).

I was mostly referring to Christian's argument that a 1GB file should fit on a 1GB drive...

> To me it makes little sense to change the base so that reported disk sizes
> match claims of their producers as they should be using the proper unit in the
> first place (1000 GiB/1024 GB). The question should be which unit to use
> globally so file sizes can be compared to the storage device capacities.
> 

Comment 11 Filippo Argiolas 2008-09-28 19:05:59 UTC
Just to add some bit: a good reference for correct binary prefixes can also be found at http://physics.nist.gov/cuu/Units/binary.html and in the units(7) man page.

(In reply to comment #6)
> Actually I proposed the IEC flavor, where k is also uppercase. But that doesn't
> change the idea of using 1000 rather than 1024.

I looked for a reference about this case ambiguity but didn't find any... have a link?

Anyway, I don't think we should care too much about common sense... we should just use what current standards prescribe.
That's the only way to avoid each kind of ambiguity and be sure that a unit is understood correctly everywhere.
That's the reason standards exist, and we should do our little part here to push their adoption and solve this messy situation.

So.. +1 for SI (10 powers) prefixes.
Comment 12 Dan Winship 2008-09-28 19:35:13 UTC
(In reply to comment #11)
> Anyway, I don't think we should care too much about common sense... we should
> just use what current standards prescribe.

Unfortunately, just because something is standard doesn't mean it's good/right. (See also: OOXML). So -1 for standards and +1 for common sense. (It's entirely possible that common sense will lead us to adopt the standard, but the common sense has to come first.)
Comment 13 Patryk Zawadzki 2008-09-28 19:40:29 UTC
Common sense varies from country to country and changes over time. Also, common sense is something we can shape by promoting one way or another.
Comment 14 Filippo Argiolas 2008-09-28 19:57:35 UTC
(In reply to comment #12)
> Unfortunately, just because something is standard doesn't mean it's good/right.
> (See also: OOXML). So -1 for standards and +1 for common sense.

Common sense has the little drawback that it can be different amongst different
people.
So if you want to be sure that a measured quantity means the same thing for
everyone you have to agree over a common standard. It's not a matter of it
being good or bad it's just mandatory to give a meaning to that quantity.

Note that standards can fail (ooxml case) when each party wants to push its own
unit/format/etc so there is no way to have a common one. But that's clearly not
the case here, we have only one widely shared, trusted and adopted standard: SI.
Comment 15 Alexander Larsson 2008-09-30 12:33:00 UTC
I'm strongly against this, and always have been. For almost the entire lifetime of computers a kilobyte has been 1024, with no problems. Then the IEC standards showed up to "clear up the confusion" and people started using 1000 as the base, causing massive confusion amongst users and leading to class action suits.

The fact that 1024 is not the ISO prefix used for physical units has basically no relation to the memory/data size case. "Normal" people don't actually convert the prefixed sizes to exact byte counts, and if you actually do so, it's pretty likely that you are aware of the 1024 issues.

Anyway, I've argued this over the lifetime of gnome multiple times. I don't think that we'll get a better consensus this time, so I'm not sure why this was brought up again.
Comment 16 Jean-Christophe Dubacq 2008-09-30 18:39:20 UTC
For network speeds, the power of ten scale is always used. Consistency pleads for powers of 10. BTW, I am teaching CS to students, and always teach them to use the kiB (actually kio) abbreviation if they use the binary scale.
Comment 17 Tor Lillqvist 2008-09-30 21:47:26 UTC
"actually kio"? I am speechless.

(Yes, of course I know that in a strict sense, "octet" is more correct than "byte", and that it is the term preferred in many (all?) ITU and ISO standards. But we are not talking about terminology used to write standards here.)

My opinion is that for end-user interfaces, this whole issue is moot. There should almost never be any reason to show exact quantitative information expressed as megabytes or gigabytes, either the power-of-ten or power-of-two variants, to real end-users. Instead just tell them how close to being full the media in question is. That is what they want to know anyway. (Compare to cameras: At least my camera doesn't show how many gigabytes are free on the CF card on its LCD. It shows an estimate of how many more pictures will fit.)

For media identification purposes, just display what is most likely to be actually used to identify the media, i.e. what is printed on it or how it was advertised at the shop. For USB sticks etc., binary gigabytes: "the 16 GB CompactFlash card". For hard drives, decimal gigabytes: "the 500 GB hard disk".
Comment 18 Martin Ejdestig 2008-09-30 22:01:33 UTC
FWIW, I'm in complete agreement with what Tor said in comment #17.

This whole discussion is a bit... on the nerdy side. Most people just don't care. Show them what they want to know instead. (How many pictures more can fit on this camera, how far away is the disk from being full. Etc.)
Comment 19 Patryk Zawadzki 2008-09-30 22:04:55 UTC
You both seem to ignore the "will this fit" or "how much time would it take to compress and send this directory over the net" scenarios.
Comment 20 Martin Ejdestig 2008-09-30 22:25:52 UTC
Huh?

Perhaps you could show the user somehow if it will fit or not instead of letting them figure it out by themselves? (When e.g. hovering over a folder, put some mark somewhere or use some color or both, I don't know, if the file they are dnd'ing won't fit.)

And why not show how long it would take to send whatever they want to send (before they actually send it... If it is critical, that is...) in whatever app they use to send stuff?

And I'm also not saying that exact size shouldn't be available somewhere. It's just that most people have a hard time grasping it so they probably don't care about it.

I bet your <insert non-tech-savvy relative or friend> can't figure out how long it takes to send 234MB of data over an Internet connection with 1Mbit/s upstream, nor should she. Hence, this discussion is irrelevant for most people.

But by all means, you're free to discuss irrelevant issues. I won't stop you. ;)
Comment 21 Patryk Zawadzki 2008-10-01 07:33:15 UTC
There are people who don't care about the file sizes and there are people who do. For the first group _this_ bug report is totally irrelevant so I don't see your point.

I am also not a fan of dumbing things down too much. Eventually we'd get to the point where your ATM/credit/debit card has a green light to indicate you have enough money to buy a computer and a red light to say otherwise. "People just don't care about Euros and cents."

I also think the whole thread is not quite right about 1024 multipliers confusing people. An average Jane user is actually convinced a megabyte is a unit on its own. You can claim that people know SI to try and win the argument but when was the last time any of us heard of a gigameter or a teragram?
Comment 22 Martin Ejdestig 2008-10-01 11:36:31 UTC
My main point is that however this is resolved, we are going to have the same discussion in a year or so again (see the end of Alexander's last comment). And it is not that important to most people.

Anyways, EOB (end of bug ;) for my part.
Comment 23 Tobias Domhan 2008-10-12 13:11:18 UTC
that others do it wrong is not quite an argument.

all summarized, there are people who care about standards and there are standard users on the other side. though the average user is confused anyway, as there are many applications that use the standard, why isn't this solved for GNOME generally? (e.g. gparted, and even the GNOME System Monitor use the standard)
and even if this isn't changed by default why can't it be changed by more advanced users through gconf or something...if there is more than one opinion you normally try to find compromises.
I thought this was the advantage of OSS, that the user can have an influence and can decide himself how he wants the piece of software to work, not being dependent on stupid firms making their decisions.
just my 2 cents. 

p.s. concerning the teragram -> kilogram is the si unit
Comment 24 Peter Bloomfield 2008-10-21 23:12:28 UTC
I am just embarrassed to have contributed a patch to an otherwise standards-compliant app that replaces home-brewed code with g_format_size_for_display, only to find that it's *not* standards-compliant.  Standards are our friends!

I propose that we deprecate g_format_size_for_display and replace it with a decimal version and a binary version, both using ISO/IEC-compliant suffixes. I leave naming to the gurus.
Comment 25 Ondra Pelech 2009-04-25 17:01:03 UTC
I've filed a bug on the HIG regarding this stuff.

http://bugzilla.gnome.org/show_bug.cgi?id=580232
Comment 26 Benjamin Otte (Company) 2009-09-11 21:45:01 UTC
I'm in the 1000 camp, too.

It even has a practical advantage: it makes it easy to display file sizes with 3 significant digits (see patches and discussion in bug 594918). With 1024 we're left with files listed as 1003MB, which is 4 digits; it should be 1.05GB instead.
Comment 27 Martin Pitt 2010-02-11 09:06:34 UTC
(In reply to comment #12)
> Unfortunately, just because something is standard doesn't mean it's good/right.
> (See also: OOXML). So -1 for standards and +1 for common sense.

While I heavily disagree on the "-1 for standards" (the SI system has been a huge success in the world, after all), "common sense" also seems to dictate to use the k/M/G prefixes in their proper (SI, decimal) form. Ask any non-computer geek about their idea of a "kilo" or "mega", and I bet they'll answer "1000"/"1,000,000". 

Ask a computer geek, and they'll probably say "Depends...". These days it's even harder to justify the base-2 prefixes. Network devices have always used base-10, hard disks have used them for many years now. The only thing that's still base-2 is RAM.

This also matches the expectations of "will this 9.8 GB file fit on a 10 GB disk?" If the 9.8 GB file is actually 9.8 GiB (and thus much bigger than 9.8 GB), it will very clearly not. With a real 9.8 GB file it might not fit either because of the large overhead of some file systems, but with an efficient file system it at least has a chance to.

So a heavy +1 on sanity and standardization (i. e. keep a kilo a kilo and mean "1000") from me as well.
Comment 28 Ondra Pelech 2010-02-11 16:03:10 UTC
i think that there are 2 separate major problems which should be decided... (guess what? :) ) *separately*.

problem #1: which units do we want to use for what?

problem #2: do we want to use proper names for the units that we choose?

answer #2: definitely yes! if we use binary units, make sure that the user sees KiB, MiB, ...; if decimal, let it be kB, MB, ...

answer #1: I don't care much, as long as proper names are used for the units. however, decimal units have a significant downside (along with some benefits): inconsistency with other operating systems. common users won't notice the unit name change, but will notice the "number change": "how is it possible that my GNOME machine shows a different size than my Windows/MacOS machine does?"

anyway, this is a matter of user interface, so it would be very good, if people from GNOME Human Interface Guidelines made a stand. current situation is a mess.
thanks ;)
Comment 29 simon80 2010-03-01 04:01:01 UTC
The reason I found this bug is that Nautilus has switched to showing "80GB Filesystem" in lieu of a label, while GB represents 1024-based units everywhere else in the same UI. This is so absurd that even laypeople will notice the inconsistency, despite no prior awareness of the distinction.

I strongly disagree that the units should be SI units simply to be consistent with the marketing tactics of OEMs (which would simply reduce awareness of this, to their favour), but it's clear from the comments here that there is no consensus on that issue. However, if both unit types are used, then it's at least necessary to avoid using the same suffixes on both types.

I support the use of SI/IEC suffixes, if only for the sake of consistency. "Common sense" tells me that it's better to have an unfamiliar, but distinct pair of suffixes than one ambiguous suffix.
Comment 30 Tor Lillqvist 2010-03-01 09:22:06 UTC
> This is so absurd that even laypeople will notice the inconsistency

You must be joking, surely?
Comment 31 simon80 2010-03-01 14:27:54 UTC
Heh - well, those of them that go to the properties dialog of their "80GB Filesystem" or whatever will notice. To say that none of them at all would see it is quite a stretch.
Comment 32 Martin Pitt 2010-03-10 07:58:23 UTC
Created attachment 155704 [details] [review]
g_format_size_for_display(): Use correct base-10 units

Since this violates the recently instated units policy in Ubuntu (https://wiki.ubuntu.com/UnitsPolicy) we applied this patch in Ubuntu to make g_format_size_for_display() standards conformant (original patch by Benjamin Drung, I adapted the test cases accordingly).

The obvious other alternative would be to keep the 1024 multiplier, and fix the units to be KiB, MiB, etc., but I'm strongly convinced that this would just lead to more confusion (everything except RAM size is commonly written in standard base-10 prefixes these days, after all).

Please fix it one way or the other, and stop claiming that 1 MB == 2^20 Bytes.

Thank you for considering!
Comment 33 David Zeuthen (not reading bugmail) 2010-03-10 17:07:43 UTC
I like the patch in comment 32 but it breaks the ABI and, as such, can't be accepted. Here's a longer explanation

<davidz> pitti: wow, heh: https://bugzilla.gnome.org/show_bug.cgi?id=554172#c32
 pitti: that's a very old wound you are opening there ;-)
 but, hey, I fully support that
 I doubt the patch will go into glib though
 and I guess it's sorta-kinda an ABI break too
 I think if we decided to adopt base 10 units in GNOME (as we should) we would do it by adding new API
 and porting users over
 and not touch g_format_size_for_display() (we'd deprecate it though)
 while I support the change, it's a little unnerving that one distro (and a large one at that) changes the ABI like that
 anyway, just some thoughts - hope it helps
 and remember to don your asbestos suit ;-)
Comment 34 David Zeuthen (not reading bugmail) 2010-03-10 17:28:09 UTC
Moreover, I would actually suggest some new API à la

 gchar *g_format_size (goffset size, GFormatSizeFlags flags);

where

 typedef enum {
   G_FORMAT_SIZE_FLAGS_NONE = 0,
   G_FORMAT_SIZE_FLAGS_FORCE_BASE_2_UNIT = (1<<0),
   G_FORMAT_SIZE_FLAGS_FORCE_BASE_10_UNIT = (1<<1),
 } GFormatSizeFlags;

where we don't guarantee whether a base 2 or base 10 unit is used (unless the base has been requested through @flags). 

(So, yes, this allows enterprising desktop environments (through some extension point magic </handwaving>) to use GConf/DConf/whatever to read a user setting. We could also add FORCE_BASE_2_UNIT_WITH_BASE_10_DESIGNATIONS to emulate g_format_size_for_display(), but I honestly think that is super misleading. Or if you don't want to throw user settings at the problem, just hardcode base 10 (with MB) or base 2 (with MiB) or base 2 with wrong units (with MB)). The point is that we don't _guarantee_ anything about it. Which makes it possible for some distros to actually make a decision that works for them.)

Btw, here are two examples of where it is useful to use both units

 http://people.freedesktop.org/~david/nautilus-lvm2-b.png
 ("Extent Size")

 http://people.freedesktop.org/~david/md-raid-create.png
 ("Stripe Size")

though note that this is from an app targeted for system admins (e.g. people expected to know the difference between MB and MiB) and, for good measure, the app always show the size in bytes too.
Comment 35 Martin Pitt 2010-03-10 17:55:38 UTC
Thanks for the followup. I also got pinged by mclasen and desrt, I actually hadn't expected that much pushback on it.

I could also work on a patch which leaves g_format_size_for_display() alone and broken, and call it deprecated, and add this new function instead. (Or, if that can't land in 2.30 any more, just locally patch nautilus, since that's really where people see it most).
Comment 36 simon80 2010-03-10 20:04:44 UTC
Are there any apps that actually depend on the broken output to the point where the unexpected change outweighs the benefit of fixing this?

Nautilus, for example, is currently using the same unit suffix for two different meanings, even within one dialog. I don't have time to verify the sequence of events, but I'm guessing someone already "changed the ABI" to produce the current result that I'm seeing in nautilus 2.28. In the worst case, applying Martin's patch will move from incorrect units to units that are correct, but possibly different from other sizes shown in the same UI. This seems to be solely an improvement to me.
Comment 37 Olav Vitters 2010-03-10 20:26:13 UTC
(In reply to comment #36)
> Are there any apps that actually depend on the broken output to the point where
> the unexpected change outweighs the benefit of fixing this?

Please be a bit more respectful towards others who do not share your opinion.
Comment 38 Martin Pitt 2010-03-10 22:51:34 UTC
It seems that there are two different issues here:

 (1) It's arguable whether the patch is a bug fix or an ABI break, but giving it the benefit of the doubt, let's assume it is an ABI break. So one concern is that the patch should be redesigned to introduce a new function and obsolete the current one.

 (2) There still doesn't seem to be a consensus that the current behaviour is undesirable in the first place. While many people speak in favor of using standard base-10 by default, there are others who want to keep base-2. This is a much harder question to answer, and I'm not quite clear who can/will make that decision. It should be in glib, though. If we offer two equivalent functions, then we'll end up with half of the applications using either, and the situation will be even worse than we have now.
Comment 39 Benjamin Otte (Company) 2010-03-11 08:05:00 UTC
I have a problem with marking this as an API break, even if the documentation clearly states that it uses base-2. The function is supposed to format the size for displaying it and people have used it assuming that it will result in the best way to represent a size in a concise human-readable way. So arguably the docs give too many guarantees and we should fix that to instead say something like "The function gives no guarantees on the precise format, so if you rely on a specific representation of the number, do not use this function."

That said, I'd say that introducing a new function and only converting some functions is an even worse break, though it's a UI break. But if Nautilus talks about a 10GB file and file-roller says it's 11GB, just because one application uses the new function and one the old one is not a good idea.
I also think it's a lot easier to find the few cases that rely on the format used and fix just those than to introduce a new formatting function and get fixes for all apps using the current one upstream.

Side note: The docs also say that the function rounds to the nearest tenth, so if we ever wanted to make it say things like "1012kB" instead of "1012.4kB", would that be an ABI break, too?
Comment 40 Dan Winship 2010-03-11 14:39:03 UTC
(In reply to comment #39)
> But if Nautilus talks
> about a 10GB file and file-roller says it's 11GB, just because one application
> uses the new function and one the old one is not a good idea.

I thought this was already the case (maybe not with nautilus vs file-roller, but definitely with some apps), because some people are ignoring g_format_size_for_display() because it's "broken" and are generating "correct" labels on their own.
Comment 41 simon80 2010-03-11 14:50:06 UTC
(In reply to comment #37)
> (In reply to comment #36)
> > Are there any apps that actually depend on the broken output to the point where
> > the unexpected change outweighs the benefit of fixing this?
> 
> Please be a bit more respectful towards others who do not share your opinion.

Sorry, I didn't mean to marginalize anyone's opinions.

Given that Nautilus, and possibly others have started using base-10 quantities with the same unit suffixes (e.g. MB) that g_format_size_for_display is using for base-2 quantities, I thought it would be reasonable for me to describe things that way. Since there are no alternate suffixes for base-10 quantities to use, it seems that the only alternatives are to give the base-2 quantities a distinct suffix (MiB, etc.), to avoid ambiguities, or only display sizes in base-2.
Comment 42 simon80 2010-03-11 14:57:46 UTC
Created attachment 155865 [details]
Screenshot illustrating a size related problem

(In reply to comment #40)
> (In reply to comment #39)
> > But if Nautilus talks
> > about a 10GB file and file-roller says it's 11GB, just because one application
> > uses the new function and one the old one is not a good idea.
> 
> I thought this was already the case (maybe not with nautilus vs file-roller,
> but definitely with some apps), because some people are ignoring
> g_format_size_for_display() because it's "broken" and are generating "correct"
> labels on their own.

It's actually the case with Nautilus vs Nautilus. This screenshot was taken with Nautilus 2.28.4 on Debian testing.
Comment 43 David Zeuthen (not reading bugmail) 2010-03-11 16:21:34 UTC
(In reply to comment #40)
> (In reply to comment #39)
> > But if Nautilus talks
> > about a 10GB file and file-roller says it's 11GB, just because one application
> > uses the new function and one the old one is not a good idea.
> 
> I thought this was already the case (maybe not with nautilus vs file-roller,
> but definitely with some apps), because some people are ignoring
> g_format_size_for_display() because it's "broken" and are generating "correct"
> labels on their own.

Yes, the names for GDrive, GVolume and GMount objects (typically) stem from the gnome-disk-utility GVfs backend. That backend gets the names from a library provided by gnome-disk-utility. And, yes, gnome-disk-utility is not using g_format_size_for_display() because, yes, the author (me) thinks that it is broken and, yes, we strive to be correct in gnome-disk-utility.

And, yes, this is what is causing the discrepancy seen in the screenshot from comment 42. The label "250 GB Filesystem" stems from gnome-disk-utility; the rest is from Nautilus itself.
Comment 44 Charles Kerr 2010-03-14 18:29:41 UTC
Changing this downstream, as Ubuntu has done, is a *terrible* mistake.

The problem occurs when applications using g_format_size_for_display()
also allow user input of file sizes. For example, let's say an application
allows users to set the size of a disk cache. What units should the GUI
use when accepting user input?

Transmission's preferences dialog lets users specify speed limits in KiB/s.
If Transmission were to display speeds with g_format_size_for_display() + "/s"
as recommended by Benjamin Drung (the author of the g_format_size_for_display()
patch), Transmission will appear to end users to be exceeding the limit.

Conversely, if Transmission fully adopts Ubuntu's Unit Policy, its speeds
will be off in the other direction (never reaching the speed limit) on distros
using the upstream implementation of g_format_size_for_display().

In *both* cases, Transmission's input and display units will be inconsistent
with one another on at least one platform. In order to display units
consistently AND portably, an application would need to pass test input
to g_format_size_for_display() in order to see if it's the base 2 or base 10
version, and then tailor its input dialogs accordingly, as well as massaging sizes and speeds entered by the user.
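The size of the discrepancy is easy to work out. A 100 KiB/s limit is 102,400 bytes per second, which a base-10 display renders as 102.4 kB/s, so a transfer running exactly at the limit looks like it exceeds the "100" the user typed. In the other direction, a 100 kB/s limit is 100,000 bytes per second, shown by a base-2 display as about 97.7 KiB/s. A minimal arithmetic sketch (the helper names are hypothetical; results are in tenths of a unit so everything stays in integers):

```c
#include <stdint.h>

/* A limit typed into a "KB/s" field that the application treats as KiB/s. */
static uint64_t kib_to_bytes (uint64_t kib)
{
  return kib * 1024u;
}

/* How a base-10 display would show a byte rate, in tenths of a kB/s
 * (1 kB = 1000 bytes), rounded to the nearest tenth. */
static uint64_t bytes_to_tenths_of_kb (uint64_t bytes)
{
  return (bytes + 50u) / 100u;
}

/* How a base-2 display would show a byte rate, in tenths of a KiB/s
 * (1 KiB = 1024 bytes), rounded to the nearest tenth. */
static uint64_t bytes_to_tenths_of_kib (uint64_t bytes)
{
  return (bytes * 10u + 512u) / 1024u;
}
```

For example, `bytes_to_tenths_of_kb (kib_to_bytes (100))` is 1024 (i.e. "102.4 kB/s"), and `bytes_to_tenths_of_kib (100000)` is 977 (i.e. "97.7 KiB/s").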
Comment 45 Benjamin Otte (Company) 2010-03-14 18:46:15 UTC
Or you could just have a g_size_from_format() function that does the reverse mapping. That would not just be awesome because glib would make sure it's converted according to its own specs; it'd also ensure that different apps take the same input format.
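A reverse mapping along these lines could look roughly like the sketch below. `parse_size()` is a hypothetical name, not an existing glib function, and the unit table is an assumption: it reads kB/MB/GB as powers of 1000 and KiB/MiB/GiB as powers of 1024, which is only one of the policies debated in this bug. `strtod()` also follows the current C locale's decimal point.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reverse of a size formatter: parses "1.5 MB", "200 KiB" or
 * "42" into a byte count. Returns 1 on success and 0 on malformed input.
 * Assumption: kB/MB/GB are powers of 1000, KiB/MiB/GiB powers of 1024. */
static int parse_size (const char *str, uint64_t *out_bytes)
{
  static const struct { const char *unit; double factor; } units[] = {
    { "", 1.0 }, { "B", 1.0 }, { "bytes", 1.0 },
    { "kB", 1e3 }, { "MB", 1e6 }, { "GB", 1e9 },
    { "KiB", 1024.0 }, { "MiB", 1048576.0 }, { "GiB", 1073741824.0 },
  };
  char *end;
  double value = strtod (str, &end);

  if (end == str || !(value >= 0.0))
    return 0;                         /* no number, negative, or NaN */
  while (*end == ' ')
    end++;                            /* accept "1.5MB" and "1.5 MB" */

  for (size_t i = 0; i < sizeof units / sizeof units[0]; i++)
    {
      if (strcmp (end, units[i].unit) != 0)
        continue;
      double bytes = value * units[i].factor;
      if (bytes >= 1.8e19)            /* would not fit in uint64_t */
        return 0;
      *out_bytes = (uint64_t) (bytes + 0.5);
      return 1;
    }
  return 0;                           /* unknown unit suffix */
}
```

Because every application would parse through the same table, "1.5 MB" typed into one program would mean the same byte count as in any other, which is the consistency argument made above.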
Comment 46 Benjamin Drung 2010-03-15 01:53:00 UTC
g_format_size_for_display() should be deprecated and a set of new functions should be created.

* One for formatting a size for displaying it to the user. It should take an additional parameter describing the type of size: file size, RAM size, network bandwidth, etc.? It should have an option for showing the value in both base-2 and base-10 (like the Linux kernel). Example interface: g_size_from_format2(int size, int type, bool both_bases)
* One for the other way around. Should it take the value and unit separately?
* One for displaying the correct unit? foo(1) = KiB or kB, foo(2) = MiB or MB, and so on.
Comment 47 Martin Pitt 2010-03-15 11:34:18 UTC
For the record, we'll back out the g_format_size_for_display() patch in Ubuntu and instead will deal with applications individually (First and foremost we'll fix nautilus). 

In our current state of the release cycle we don't want to spend large amounts of time with fallout bugs like transmission (see Charles' post above). While transmission should be fixed to use proper units, it should be done in transmission consistently, and not one part in glib.

So, I withdraw my patch proposal, and instead propose to generally deprecate g_format_size_for_display(). If we can't fix it to do the right thing due to backwards compatibility, we have to declare it broken and get rid of it.

Thanks, and sorry for the hassle.
Comment 48 Benjamin Otte (Company) 2010-03-15 11:48:28 UTC
So you'll introduce a new function in Ubuntu to make sure that Nautilus formats sizes differently from all other (GNOME) applications?
Comment 49 Martin Pitt 2010-03-15 12:07:56 UTC
(In reply to comment #48)
> So you'll introduce a new function in Ubuntu

Yes, for the time being.

> to make sure that Nautilus formats sizes differently from all other (GNOME) applications?

Haha. Since that's already the case even within nautilus itself (see http://launchpadlibrarian.net/40938824/karmic-disk-properties.png), and things like gvfs or gnome-system-monitor already use correct base-10 units/prefixes, it can hardly get any worse.

At least nautilus will agree to itself then.
Comment 50 Colin Watson 2010-03-15 12:41:12 UTC
I didn't see it mentioned so far on this bug, but note that CDs are another medium normally labelled in base-2 units written as "MB" - "700 MB" -> 700 * 1024 * 1024 bytes.  This can lead to confusion when downloading an ISO9660 image and looking at it in a file manager to see whether it will fit on your medium, which is something people do still sometimes need to do since apparently 650 MB CDs are still available in some regions.
Comment 51 Benjamin Drung 2010-03-15 12:46:10 UTC
In contrast DVDs and hard drives are normally labelled in base-10. I see no solution that would fix all devices (CDs, DVDs, hard drives).
Comment 52 Colin Watson 2010-03-15 12:52:10 UTC
N.B. that I am not saying that there exists a solution that would "fix" all devices.  When designing UI such as this, though, it is important to have all the facts and not be defensive about items that don't fit.
Comment 53 David Zeuthen (not reading bugmail) 2010-03-15 15:40:48 UTC
(In reply to comment #50)
> I didn't see it mentioned so far on this bug, but note that CDs are another
> medium normally labelled in base-2 units written as "MB" - "700 MB" -> 700 *
> 1024 * 1024 bytes.  This can lead to confusion when downloading an ISO9660
> image and looking at it in a file manager to see whether it will fit on your
> medium, which is something people do still sometimes need to do since
> apparently 650 MB CDs are still available in some regions.

Right, even if we switch all of GNOME to proper base 10 units, we're still going to run into a couple of problems like that. My experience is that it helps to use formatting such as "700 MB (700,000,042 bytes)" when permitted. Looks like Nautilus isn't currently doing this.

(Fortunately this particular problem wasn't carried over to DVD or Blu-Ray discs - the packaging typically uses "4.7 GB", "25 GB" etc. with proper base 10 units. See http://en.wikipedia.org/wiki/DVD#DVD_capacity and http://en.wikipedia.org/wiki/Blu-ray#Technical_specifications for details.)
Comment 54 Charles Kerr 2010-03-15 16:26:19 UTC
(In reply to comment #47)
> For the record, we'll back out the g_format_size_for_display() patch in Ubuntu
> and instead will deal with applications individually (First and foremost we'll
> fix nautilus). 
> 
> In our current state of the release cycle we don't want to spend large amounts
> of time with fallout bugs like transmission (see Charles' post above). While
> transmission should be fixed to use proper units, it should be done in
> transmission consistently, and not one part in glib.

Also for the record, I agree that this is something that does need fixing in Transmission.  I'm not clear on what the Right Thing is, but the current state ain't it. :)

Reading over the last years' worth of mailing list flamewars on this topic, I see there are many people who feel strongly on all sides of this issue. I agree with David and the Benjamins that a programmatic solution may be the way to handle this. However, just adding a GFormatSizeFlags argument to g_format_size_for_display() is not enough, because then how will applications know which flag to use...
Comment 55 Benjamin Drung 2010-03-15 17:01:21 UTC
Created attachment 156197 [details] [review]
units_draft1.patch

Yes, having a GFormatSizeFlags argument is not the solution, because this will work contrary to consistency.

Attached the first draft for the new g_format_value_for_display function. Better names for this function are welcome. The functions for input conversion are not yet implemented.

There are three helper functions:
* g_format_size_in_base2 - formats size in base-2 using IEC prefixes
* g_format_size_in_base10 - formats size in base-10 using SI prefixes
* g_format_size_historical - formats size in base-2 using SI prefixes (except KB)

g_format_value_for_display maps to these helper functions depending on configuration and value type. Currently the default is base10, but you can set it with BASE2_DEFAULT to base2. If there is a demand for it, historical can be an option, too.
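The three helpers differ only in their divisor and unit names, so one plausible structure is a single loop driven by a unit table. This is a sketch, not the code in units_draft1.patch: the function names mirror the comment, but the implementation, the one-decimal output, and the plain "bytes" wording for small sizes are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Shared helper: divide by `base` until the value is below it (or the
 * largest unit is reached), then print one decimal and the unit.
 * Rounding across a unit boundary (e.g. 999999 bytes) is not handled. */
static void format_with_units (uint64_t size, double base,
                               const char *const units[], int n_units,
                               char *buf, size_t len)
{
  double value = (double) size;
  int i = 0;

  if (size < (uint64_t) base)
    {
      snprintf (buf, len, "%d bytes", (int) size);
      return;
    }
  while (value >= base && i < n_units - 1)
    {
      value /= base;
      i++;
    }
  snprintf (buf, len, "%.1f %s", value, units[i]);
}

static const char *const iec_units[]  = { "bytes", "KiB", "MiB", "GiB", "TiB" };
static const char *const si_units[]   = { "bytes", "kB",  "MB",  "GB",  "TB"  };
static const char *const hist_units[] = { "bytes", "KB",  "MB",  "GB",  "TB"  };

/* base-2 with IEC prefixes */
static void format_size_in_base2 (uint64_t size, char *buf, size_t len)
{
  format_with_units (size, 1024.0, iec_units, 5, buf, len);
}

/* base-10 with SI prefixes */
static void format_size_in_base10 (uint64_t size, char *buf, size_t len)
{
  format_with_units (size, 1000.0, si_units, 5, buf, len);
}

/* base-2 with SI-style letters, except the non-SI "KB" */
static void format_size_historical (uint64_t size, char *buf, size_t len)
{
  format_with_units (size, 1024.0, hist_units, 5, buf, len);
}
```

With this layout, 1,500,000 bytes comes out as "1.5 MB" in base-10, while 1,048,576 bytes is "1.0 MB" in the historical style, which is exactly the ambiguity this bug is about.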

What do you think about it? Do you like it? Do you hate it? What should I improve? I'd like to hear (constructive) criticism.

Should I remove long_description parameter and put it into a separate function? Should formatting bandwidth be in a separate function?

Should the new function support bits, too (kbit, Mbit, ...)?

What do you think about extending LC_MEASUREMENT for overriding the used basis (similar to [1])?

[1] http://mail.gnome.org/archives/gtk-devel-list/2007-December/msg00250.html
Comment 56 Benjamin Otte (Company) 2010-03-15 18:16:30 UTC
You don't fix the problem of people using inconsistent representations by giving them APIs to express all of them easily.
I still think there should be one function in glib for formatting a size to display to the user. Anything else should require the hard work of applications implementing it on their own.

It's not the job of a general purpose library to provide support for bugs in the CD-ROM specification.
Comment 57 Matthias Clasen 2010-03-15 20:34:02 UTC
> It's not the job of a general purpose library to provide support for bugs in
> the CD-ROM specification.

The one reason why this function ended up in glib is that it is needed by some gio implementations. The fact that those implementations live in gvfs has caused this function to be public. In hindsight, I should never have accepted it as public API. I'm very tempted to just deprecate it, considering the amount of heat that it causes.
Comment 58 Benjamin Drung 2010-03-15 20:42:22 UTC
I would be fine with the function deprecated in glib, when there is another library providing a similar function. Which library should provide the function instead?
Comment 59 Benjamin Drung 2010-03-15 22:58:18 UTC
(In reply to comment #56)
> You don't fix the problem of people using inconsistent representations by
> giving them APIs to express all of them easily.

The three helper functions are internal. They are not exported. Or are you complaining about the "type" parameter?

Should I rewrite the patch to provide these two functions?
g_format_file_size_for_display(size)
g_format_transfer_rate_for_display(transfer_rate)
Comment 60 Benjamin Otte (Company) 2010-03-15 23:06:38 UTC
I would consider that way superior to having an enum, because most users of that API would not know which value to use when. Having a clear statement in the function name helps a lot.
Comment 61 Christian Dywan 2010-03-16 09:28:46 UTC
I like the suggestion of functions that depend on the context, rather than specifying the unit. If there is g_format_transfer_rate_for_display, it is very clear where to use it, nobody has to bother reading up on specs. g_format_size_for_display could remain the default choice in case there is no contextual variant.
Comment 62 Benjamin Drung 2010-03-16 09:35:27 UTC
Keeping g_format_size_for_display will lead to problems if applications use input values. Transmission uses a hardcoded "KB/s" (which is actually "KiB/s") for input values. When we change g_format_size_for_display to use either the SI or IEC standard, Transmission will be inconsistent. Therefore we should deprecate g_format_size_for_display and introduce a new one. I called the new one g_format_file_size_for_display. Better names are welcome.
Comment 63 Christian Dywan 2010-03-16 10:33:35 UTC
My idea here is, the documentation could explain specific functions for an appropriate context. But if the developer encounters a case that is missing, he would need something else. And a possibly incorrect display is still better than using the wrong one only to 'make it compile with deprecation flags'.

I do like the name g_format_file_size_for_display very much personally.
Comment 64 Benjamin Drung 2010-03-16 11:25:42 UTC
Created attachment 156260 [details] [review]
units_draft2.patch

If developers encounter a missing case, they should use g_format_file_size_for_display. It is designed as a replacement for g_format_size_for_display. A more general name for g_format_file_size_for_display is appreciated.

Attached the second draft. Your opinions?
Comment 65 Patryk Zawadzki 2010-03-16 11:39:45 UTC
How about *_disk_size_* and *_memory_size_*? It would clearly differentiate between the two main use cases.
Comment 66 Christian Dywan 2010-03-16 11:44:16 UTC
Review of attachment 156260 [details] [review]:

I think the functions should include one example of what the result can look like, since that is not obvious without seeing the source code.

You should leave one space between a function name and brackets.
Comment 67 Benjamin Drung 2010-03-16 11:47:19 UTC
(In reply to comment #65)
> How about *_disk_size_* and *_memory_size_*? It would clearly differentiate
> between the two main use cases.

Then we have three categories: *_file_size_*, *_disk_size_*, and *_memory_size_*. *_disk_size_* would use base-10 then.
Comment 68 Patryk Zawadzki 2010-03-16 12:33:55 UTC
(In reply to comment #67)
> Then we have three categories: *_file_size_*, *_disk_size_*, and
> *_memory_size_*. *_disk_size_* would use base-10 then.

Has it been decided that files should use base-2 units? If not, I'd suggest dropping *_file_size_* and recommend using *_disk_size_* for objects that live on disks.

Ideally gio or gvfs could provide foo_format_size_for_uri that would automatically detect the media type and use correct units for CD drives and whatnot.
Comment 69 Claude Paroz 2010-03-16 12:36:13 UTC
Review of attachment 156260 [details] [review]:

From an i18n point of view, generally avoid g_strconcat. E.g. g_strconcat (size, "/s", NULL) would be better written as something like g_strdup_printf (_("%s/s"), size).
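The point of the review comment is that the whole pattern "%s/s" becomes one translatable string, so a translator can reorder it or swap "/s" for a localized suffix; concatenating the pieces hard-wires the English order. A standalone sketch of the same idea: `translate()` is a hypothetical stand-in for the `_()` macro from `<glib/gi18n.h>` (it just returns its argument), and `snprintf` replaces `g_strdup_printf` so the example needs no GLib.

```c
#include <stdio.h>

/* Hypothetical stand-in for _() so the sketch compiles without GLib;
 * real code would look the msgid up in a translation catalogue. */
static const char *translate (const char *msgid)
{
  return msgid;
}

/* Builds a transfer-rate label such as "1.5 MB/s" from an already
 * formatted size. The complete pattern is translated, not just "/s". */
static void format_rate_label (const char *size, char *buf, size_t len)
{
  snprintf (buf, len, translate ("%s/s"), size);
}
```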
Comment 70 Benjamin Drung 2010-03-16 13:27:30 UTC
Do we all agree on these three statements:

* base-2 units must use the IEC standard (Ki, Mi, etc.)
* base-10 units must use the SI standard (k, M, etc.)
* it's not allowed to use SI standard for base-2

The question is which base to use for which context. Some decisions are easy:

* RAM sizes are base-2 (2 GiB RAM)
* disk sizes are base-10 (500 GB hard drive, 4 GB flash disk)

File sizes can be in base-2 or base-10. Therefore there should be at least a configure option to set the preferred base (I called it BASE2_DEFAULT in the draft).

(In reply to comment #68)
> Has it been decided that files should use base-2 units? If not, I'd suggest
> dropping *_file_size_* and recommend using *_disk_size_* for objects that live
> on disks.

*_disk_size_* was designed for presenting the size of a disk, but not for objects that live on them.
Comment 71 Martin Pitt 2010-03-16 13:45:29 UTC
(In reply to comment #70)

> * disk sizes are base-10 (500 GB hard drive, 4 GB flash disk)
> 
> File sizes can be in base-2 or base-10. 
> [...]
> *_disk_size_* was designed for presenting the size of a disk, but not for
> objects that live on them.

Erm, this bug was originally intended to _simplify_ the units chaos. Why on earth would file sizes be displayed in base-2 while disk sizes are in base-10?

I do appreciate that we need to live with RAM size being in base-2 (and system-monitor does that well), but everything else should just use what humans with ten fingers have used for five centuries, please.
Comment 72 Matthias Clasen 2010-03-16 18:10:35 UTC
> Do we all agree on these three statements:


No, we don't.
Comment 73 Benjamin Drung 2010-03-16 18:18:35 UTC
So you want KB, MB, GB, and so on for base-2 units? Should everything be in base-2 for you?
Comment 74 Charles Kerr 2010-03-16 18:57:36 UTC
(In reply to comment #62)
> Keeping g_format_size_for_display will lead to problems if applications uses
> input values. Transmission uses hardcoded "KB/s" (which is actually "KiB/s")
> for input values. When we change g_format_size_for_display to use either the SI
> or IEC standard, transmission will be inconsistent. Therefore we should
> deprecate g_format_size_for_display and introduce a new one. I called the new
> one g_format_file_size_for_display. Better names are welcome.

I like this suggestion, as well as the variants for memory and bandwidth suggested downthread.

We still need a mechanism to handle user input.  In the case of Transmission that you cited, the preferences dialog says something along the lines of 

    Upload speed limit (KB/s):     [ 100    ]
    Download speed limit (KB/s):   [ 100    ]

How should this be revised to play nicely with *both* Ubuntu's Units Policy and distributions using other units?  Obviously the spinbox could be set with the value of g_format_bandwidth_size_for_display()... but how does Transmission know what to put in "Upload speed limit %s/s"?
Comment 75 Benjamin Otte (Company) 2010-03-16 19:28:58 UTC
Just some notes (all IMNSHO):
1) Using g_format_size_for_display() for transfer rates is most likely a bug.
2) glib is supposed to provide APIs for common things. The need to display RAM sizes is not common and therefore should be done by the apps that need them instead of in glib.
3) file sizes and disk sizes are actually the same thing. Calling them "sizes" and using a function named something like g_format_size_for_display() makes a lot of sense to me.
4) It's easier to fix the apps that assume base-2 than to port all current apps to a new function because the old one was deprecated.

With that reasoning I arrive at something that is very close to Martin's patch.
Comment 76 Charles Kerr 2010-03-16 19:32:27 UTC
(In reply to comment #70)
> Do we all agree on these three statements:
> 
> * base-2 units must use the IEC standard (Ki, Mi, etc.)
> * base-10 units must use the SI standard (k, M, etc.)
> * it's not allowed to use SI standard for base-2

You know full well there's no agreement. ;)

You said last week in your blog that there's no agreement.
<http://overbenny.wordpress.com/2010/03/10/ubuntu-units-policy/>

There was no agreement when you asked the ubuntu desktop devel mailing list in May 2009, spawning a 50+ post flame thread.
<https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2009-May/008277.html>

There was no agreement when you asked the gtk+ devel mailing list in June 2009, which got you accused of wanting to "have the same bikeshed argument that you've had over on the Ubuntu list here on GNOME's list".
<http://mail.gnome.org/archives/gtk-devel-list/2009-June/msg00021.html>

What is the harm in letting distros and users use a GConf setting to define what units their system uses?
Comment 77 Ondra Pelech 2010-03-16 21:37:52 UTC
I agree that moving things to base-10 (although it might cause a little inconsistency with other OSes: people would see different numbers in GNOME than in Windows) is a very smart idea.

But why not convert even RAM (although it always comes in powers of 2) to base-10? It would be consistent with everything else seen in GNOME and would make life easier for common non-technical users who are not used to 1024.
Comment 78 Ondra Pelech 2010-03-16 21:45:03 UTC
(In reply to comment #76)
> What is the harm in letting distros and users use a GConf setting to define
> what units their system uses?

The current situation is just very messy and confusing. Any other possible units policy would be better than the one we have today. If you don't feel the need for change, please at least don't discourage other people who are willing to spare their free time to make this very needed change. thanks.
Comment 79 simon80 2010-03-16 22:15:52 UTC
Given the lack of consensus on this, perhaps the only solution suitable for everyone would be to have input and output formatting functions for such categories as disk, memory, and network sizes, and then allow gconf to control whether base-10 or base-2 is used for each.
Comment 80 Benjamin Drung 2010-03-16 22:36:05 UTC
Would everybody be happy with one of these three options:

(1) everything in base-10 with SI prefixes, except RAM sizes in base-2 with IEC prefixes (following Ubuntu's units policy)
(2) everything in base-2 with IEC prefixes, except disk labels in base-10 with SI prefixes
(3) everything in base-2 with K, M, G prefixes (current behavior of g_format_size_for_display)

Is there anybody who wants an implementation other than one of these?

We can create a configure flag to select one behavior. Having a user setting would be even better.

@Charles: Yes, the input/output functions are not yet implemented. I am thinking about something like this:

char *get_file_size_unit(int index);
char *get_tranfer_rate_unit(int index);
int *input_file_size_to_bytes(int index, double input_value);
int *input_tranfer_rate_to_bytes(int index, double input_value);

If case (1) is selected, the output of these functions will be:
get_file_size_unit(0) = "byte(s)"
get_file_size_unit(1) = "kB"
get_file_size_unit(2) = "MB"
get_tranfer_rate_unit(1) = "kB/s"
input_file_size_to_bytes(0, 50) = 50
input_file_size_to_bytes(1, 50) = 50000
input_file_size_to_bytes(2, 42) = 42,000,000
input_tranfer_rate_to_bytes(1, 10) = 10000
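A runnable sketch of how case (1) could behave, reproducing the outputs listed above. It deviates from the draft signatures in three places, all assumptions: the unit getters return `const char *` (the strings are static), the bare `index` is replaced by a hypothetical `SizeUnit` enum, and the input converters return a plain integer byte count instead of `int *`. The "transfer" spelling is also corrected.

```c
#include <stdint.h>

/* Hypothetical enum replacing the bare "index" argument. */
typedef enum {
  SIZE_UNIT_BYTES,
  SIZE_UNIT_KILO,
  SIZE_UNIT_MEGA,
  SIZE_UNIT_GIGA
} SizeUnit;

/* Case (1): base-10 units with SI prefixes. The strings are static,
 * so callers must not free them. */
static const char *get_file_size_unit (SizeUnit unit)
{
  static const char *const names[] = { "byte(s)", "kB", "MB", "GB" };
  return names[unit];
}

static const char *get_transfer_rate_unit (SizeUnit unit)
{
  static const char *const names[] = { "byte(s)/s", "kB/s", "MB/s", "GB/s" };
  return names[unit];
}

/* Converts a value typed next to one of the labels above into bytes,
 * rounded to the nearest byte (no overflow check in this sketch). */
static uint64_t input_file_size_to_bytes (SizeUnit unit, double input_value)
{
  static const double factors[] = { 1.0, 1e3, 1e6, 1e9 };
  if (!(input_value > 0.0))
    return 0;                 /* reject zero, negative and NaN input */
  return (uint64_t) (input_value * factors[unit] + 0.5);
}

/* Transfer rates use the same factors; the result is bytes per second. */
static uint64_t input_transfer_rate_to_bytes (SizeUnit unit, double input_value)
{
  return input_file_size_to_bytes (unit, input_value);
}
```

Under case (2) or (3) only the two string tables and the `factors` array would change, so the same calls keep a dialog's label and its parsed value in agreement.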

Is glib the right place for these functions, or should the whole unit conversion be put into a separate library? There were two posts indicating that glib is the wrong place. Are there more people voting for having it in glib?
Comment 81 Charles Kerr 2010-03-16 22:52:55 UTC
(In reply to comment #78)
> (In reply to comment #76)
> > What is the harm in letting distros and users use a GConf setting to define
> > what units their system uses?
> 
> The current situation is just very messy and confusing. Any other possible
> units policy would be better than the one we have today. If you don't feel the
> need for change, please at least don't discourage other people who are willing
> to spare their free time to make this very needed change. thanks.

You didn't read what I wrote.

I'm discouraging the proposal to swap one hard-coded policy for another.

I'm encouraging a glib mechanism that lets applications work well with distros/users who choose a policy.
Comment 82 Charles Kerr 2010-03-16 23:02:09 UTC
(In reply to comment #80)
> Would everybody be happy with one of these three options:
> 
> (1) everything in base-10 with SI prefixes, except RAM sizes in base-2 with IEC
> prefixes (following Ubuntu's units policy)
> (2) everything in base-2 with IEC prefixes, except disk labels in base-10 with
> SI prefixes
> (3) everything in base-2 with K, M, G prefixes (current behavior of
> g_format_size_for_display)

I think that would cover all the arguments that I've seen.  Those are the three cases I coded in my first draft Transmission patch ("iec", "si", "traditional").

Possibly get_file_size_unit() and get_transfer_rate_unit() should return const char*.

Possibly the "index" arguments should take an enum for clarity.

As for whether glib is the right place for this... that's what my vote would be for, but it's a call for the glib maintainers.
Comment 83 Benjamin Drung 2010-03-17 00:00:46 UTC
(In reply to comment #82)
> Possibly get_file_size_unit() and get_transfer_rate_unit() should return const
> char*.

Yes, const char* is the correct return type.

> Possibly the "index" arguments should take an enum for clarity.

An enum would be a good idea. Do you have a good name for it? "Thousand" does not fit very well.

> As for whether glib is the right place for this... that's what my vote would be
> for, but it's a call for the glib maintainers.

There are pros and cons. Arguments for having it in glib:

* it's a central place for a GTK+ application
* no extra library is required

Arguments for putting it in a separate library:

* it can/will be used by non-glib applications (goal: one library for all desktop environments)
* it can be made user configurable (glib won't add a dependency on gconf for only one variable)

Glib developers, what's your opinion? Do you want these functions (currently 8) in glib or should it be put into a separate library?
Comment 84 Charles Kerr 2010-03-17 14:32:17 UTC
http://bugs.kde.org/show_bug.cgi?id=57240 has some interesting discussion on how this issue was handled in KDE.  The conclusion they came to has some parallels with this discussion:

  > Revision 996857 adds support for JEDEC units and metric units in
  > addition to the IEC units already supported.  (JEDEC is the standard
  > for the "traditional" units used in KDE 3.5.  I was as surprised to
  > find out they actually were standardized as anyone else...).
  >
  > Add the BinaryUnitDialect=0 option to the [Locale] group of your
  > $KDEHOME/share/config/kdeglobals file.
  >
  > setting to 0 (or leaving unset, of course) gives you the current
  > default of IEC units since they are unambiguous and in accordance
  > with the current revision of the relevant standards documents. 
  > Setting to 1 gives you JEDEC units.  Setting to 2 gives you metric
  > (yes, real powers-of-10, lowercase k instead of K, metric units).
  >
  > Everyone has what they want, unless what you want is to keep someone
  > else from having what they want.  There will be no GUI.
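Each of the three KDE dialects is just a divisor plus a table of unit names. A hypothetical sketch of that lookup (the enum and function names are invented here and are not KDE's API; the 0/1/2 values follow the quoted description):

```c
#include <stddef.h>

/* Mirrors the quoted BinaryUnitDialect values: 0 = IEC (default),
 * 1 = JEDEC, 2 = metric. */
typedef enum {
  DIALECT_IEC = 0,
  DIALECT_JEDEC = 1,
  DIALECT_METRIC = 2
} UnitDialect;

/* Returns the unit name for the given power (1 = kilo, 2 = mega,
 * 3 = giga) and stores the number of bytes it stands for in *divisor.
 * Returns NULL for an unknown dialect or power. */
static const char *dialect_unit (UnitDialect d, int power, double *divisor)
{
  static const char *const names[3][3] = {
    { "KiB", "MiB", "GiB" },  /* IEC: powers of 1024, "i" prefixes */
    { "KB",  "MB",  "GB"  },  /* JEDEC: powers of 1024, uppercase K */
    { "kB",  "MB",  "GB"  },  /* metric: powers of 1000, lowercase k */
  };

  if ((int) d < 0 || (int) d > 2 || power < 1 || power > 3)
    return NULL;

  double base = (d == DIALECT_METRIC) ? 1000.0 : 1024.0;
  *divisor = 1.0;
  for (int i = 0; i < power; i++)
    *divisor *= base;
  return names[d][power - 1];
}
```

Note that "MB" appears in both the JEDEC and metric rows with different divisors, which is why the setting has to be read from one shared place.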
Comment 85 Benjamin Drung 2010-03-26 13:45:15 UTC
I wrote a blog post [1]. I want to hear your opinions: Do you want a new set of functions for size formatting in glib or not?

[1] http://overbenny.wordpress.com/2010/03/26/how-to-get-units-consistent-across-all-applications/
Comment 86 David Zeuthen (not reading bugmail) 2010-03-26 17:23:01 UTC
I think that two things are needed

 - a section in the GNOME HIG for guidance of what units to use.

   Maybe the HIG should say

   - that SI units should be preferred but when appropriate, the
     developer should use whatever makes sense. E.g. for a Disk Utility
     application, the only meaningful thing is to use IEC when displaying
     the stripe size since it's always a power of two. (See e.g. screenshots
     in comment 34)

   - if you have room, spell out the size every time you are using
     units - e.g. prefer "250 GB (250,000,123,456 bytes)" to "250 GB"
     and "4 MiB (4194304 bytes)" to "4 MiB".

   with emphasis on "maybe". E.g. if alexl were to write this part of
   the HIG instead of me, maybe he would choose to default to JEDEC
   instead of SI. Etc. etc.

 - Deprecate g_format_size_for_display()

 - New API in GLib

   typedef enum {
     G_FORMAT_FLAGS_SI = (1<<0),
     G_FORMAT_FLAGS_IEC = (1<<1),
     G_FORMAT_FLAGS_JEDEC = (1<<2),
     G_FORMAT_FLAGS_LONG_OUTPUT = (1<<3),
   } GFormatFlags;

   gchar *g_format_size (gsize size, GFormatFlags flags);
   gboolean g_parse_size (const gchar  *str,
                          gsize        *out_size,
                          GFormatFlags *out_unit);

   I hope g_parse_size() is self-explanatory. g_format_size() can take
   LONG_OUTPUT to return e.g. "250 GB (250,000,123,456 bytes)".
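Most of the LONG_OUTPUT behaviour is printing the exact byte count with digit grouping next to the rounded value. A minimal sketch, assuming base-10 SI units for the short part and a hard-coded comma separator (a real implementation would take the separator from the locale); the function names are hypothetical and this is not the proposed g_format_size(). It prints one decimal, so the example above comes out as "250.0 GB (...)" rather than "250 GB (...)".

```c
#include <stdint.h>
#include <stdio.h>

/* Writes n with a comma every three digits, e.g. 250000123456 ->
 * "250,000,123,456". Truncates instead of overflowing buf (len > 0). */
static void group_digits (uint64_t n, char *buf, size_t len)
{
  char digits[21];            /* UINT64_MAX has 20 digits */
  int nd = snprintf (digits, sizeof digits, "%llu", (unsigned long long) n);
  size_t out = 0;

  for (int i = 0; i < nd; i++)
    {
      int need_comma = i > 0 && (nd - i) % 3 == 0;
      if (out + need_comma + 1 >= len)
        break;
      if (need_comma)
        buf[out++] = ',';
      buf[out++] = digits[i];
    }
  buf[out] = '\0';
}

/* Long form: rounded SI value followed by the exact grouped byte count. */
static void format_size_long (uint64_t size, char *buf, size_t len)
{
  static const char *const units[] = { "kB", "MB", "GB", "TB" };
  char grouped[32];
  double value = (double) size;
  int i = -1;

  while (value >= 1000.0 && i < 3)
    {
      value /= 1000.0;
      i++;
    }
  group_digits (size, grouped, sizeof grouped);
  if (i < 0)
    snprintf (buf, len, "%s bytes", grouped);
  else
    snprintf (buf, len, "%.1f %s (%s bytes)", value, units[i], grouped);
}
```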

Anyway, the details here are not too important. The main points here are that

 - GLib becomes policy-free; and 

 - Developers get guidance
Comment 87 Patryk Zawadzki 2010-03-26 18:08:13 UTC
David:

I'd say glib should provide the generic function as proposed but applications should not call it directly. GNOME should implement (or accept as a blessed dep) a library that provides:

1) foo_format_file_size, foo_format_partition_size etc.
2) a policy that maps the above to the appropriate glib calls (could be chosen at compile time or determined basing on ENV or derived from the current moon phase, you get the idea)

This way each distro or even each admin can override all the units in a consistent way without hacking all the applications.
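One way such a policy layer could resolve its setting: a compile-time distribution default that an environment variable may override. Everything here is hypothetical: the `SIZE_UNITS` variable, the `DEFAULT_UNIT_POLICY` macro and the policy names are illustrative only, and a system-wide or per-user setting would slot in between the two in the same way.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { POLICY_SI, POLICY_IEC, POLICY_TRADITIONAL } UnitPolicy;

/* Distribution default, chosen at build time (hypothetical macro). */
#ifndef DEFAULT_UNIT_POLICY
#define DEFAULT_UNIT_POLICY POLICY_SI
#endif

/* Maps a policy name to the enum; returns -1 for unknown names. */
static int policy_from_name (const char *name)
{
  if (strcmp (name, "si") == 0)
    return POLICY_SI;
  if (strcmp (name, "iec") == 0)
    return POLICY_IEC;
  if (strcmp (name, "traditional") == 0)
    return POLICY_TRADITIONAL;
  return -1;
}

/* The environment variable wins if it names a known policy; anything
 * else falls back to the build-time default. */
static UnitPolicy current_unit_policy (void)
{
  const char *env = getenv ("SIZE_UNITS");
  int p = env ? policy_from_name (env) : -1;
  return p >= 0 ? (UnitPolicy) p : DEFAULT_UNIT_POLICY;
}

/* Example of a policy-driven decision that every wrapper such as a
 * foo_format_file_size() would share. */
static double file_size_base (void)
{
  return current_unit_policy () == POLICY_SI ? 1000.0 : 1024.0;
}
```

Because applications only call the wrappers, switching the policy changes every application at once, which is the consistency Patryk is after.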
Comment 88 Benjamin Otte (Company) 2010-03-26 19:10:35 UTC
(In reply to comment #87)
> This way each distro or even each admin can override all the units in a
> consistent way without hacking all the applications.
> 
I still fail to understand why people think that it's a good idea that my work computer, my home computer and my friend's computer all report different values for the same USB stick.
Comment 89 Patryk Zawadzki 2010-03-26 19:44:32 UTC
Benjamin:

Me neither but it seems it's the only way we can resolve this bug without some major distro flipping us a finger.
Comment 90 Krzysztof Klimonda 2010-03-26 19:53:53 UTC
But is it really going to be resolved? The idea behind the spec was to reduce confusion. If the displayed size is going to vary from distribution to distribution (and from installation to installation) all we achieve is changing what people are confused about. I can already imagine people complain "but this file was only 100M on my friend's computer and now it's over 110M on mine - is it infected by some virus?".
Yes, if we force one definition (after consensus on what the "right" definition is has been reached), some exotic distribution may disagree and patch it, but should we care about a small minority if we can make it standard across the bigger players?
Comment 91 simon80 2010-03-26 21:45:52 UTC
Yes, it is clearly better for the chosen units to be unambiguously suffixed, but potentially inconsistent between systems, than for them to be ambiguously suffixed and definitely inconsistent within the same dialog.

Understanding the difference between MiBs and MBs is pretty much like understanding the difference between Imperial and SI units, and yet it would be absurd to try to get all users and distributions to accept the use of only SI or only Imperial for displaying arbitrary physical quantities. I think the suggestions from comments 86 and 87 are the way to go here.
Comment 92 Benjamin Drung 2010-03-27 01:21:33 UTC
(In reply to comment #87)
> I'd say glib should provide the generic function as proposed but applications
> should not call it directly. GNOME should implement (or accept as a blessed
> dep) a library that provides:

Why not having everything in this library?

Given all the comments in forums, blog posts, and bug reports, the library must be configurable to the user's preferences. Some prefer base-10, others base-2 (similar to Imperial vs. SI).
Comment 93 Patryk Zawadzki 2010-03-27 09:02:13 UTC
(In reply to comment #92)
> Why not having everything in this library?

I was under the impression that glib itself (or gio) required the basic implementation. I don't think we should make glib depend on some libfilesize.
Comment 94 Philip Withnall 2010-03-27 09:28:51 UTC
(In reply to comment #89)
> Benjamin:
> 
> Me neither but it seems it's the only way we can resolve this bug without some
> major distro flipping us a finger.

Then they can apply their own patches when packaging things, same as it's always worked. As long as GLib ships a sensible, reasoned default, I see no problems.
Comment 95 Benjamin Drung 2010-03-28 00:04:22 UTC
(In reply to comment #93)
> I was under the impression that glib itself (or gio) required the basic
> implementation. I don't think we should make glib depend on some libfilesize.

I was thinking of a standalone library. glib should not depend on this library; the applications that present file sizes should.
Comment 96 Christian Dywan 2010-03-28 02:27:09 UTC
(In reply to comment #95)
> (In reply to comment #93)
> > I was under the impression that glib itself (or gio) required the basic
> > implementation. I don't think we should make glib depend on some libfilesize.
> 
> I was thinking of a standalone library. glib should not depend on this
> library; the applications that present file sizes should.

So every second or third app will depend on that library, which does little more than format sizes? That seems questionable to me. I'd rather suggest that GLib provide a way to format sizes in both base 2 and base 10, so that translations are done once and shared, and applications can then decide how to use the results correctly.
Comment 98 Benjamin Drung 2010-04-01 00:54:12 UTC
I have been following the responses to Ubuntu's units policy (which defaults to base 10). There are many people who prefer base-2 over base-10. Therefore we need a way to change the behavior of the functions. There should be a distribution default (a compile-time configure flag), a system default, and a user setting.

Because we need a user configuration, I prefer a separate library over glib. Or do you want that in glib too?
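The three-level scheme proposed here (distribution default, system default, user setting) amounts to a simple precedence rule. A minimal C sketch, assuming hypothetical names (DISTRO_DEFAULT_BASE, unit_prefs, effective_base()) that do not exist in GLib or in any library discussed in this thread:

```c
/* Hypothetical compile-time distribution default. A build could override it
 * with -DDISTRO_DEFAULT_BASE=1024; this is NOT a real GLib configure option. */
#ifndef DISTRO_DEFAULT_BASE
#define DISTRO_DEFAULT_BASE 1000
#endif

/* Hypothetical preference levels; any value other than 1000 or 1024
 * is treated as "not set" at that level. */
typedef struct {
    unsigned system_base; /* system-wide default */
    unsigned user_base;   /* per-user setting */
} unit_prefs;

static int is_valid_base(unsigned base)
{
    return base == 1000 || base == 1024;
}

/* Resolve the effective base: the user setting wins over the system
 * default, which wins over the compile-time distribution default. */
unsigned effective_base(const unit_prefs *prefs)
{
    if (is_valid_base(prefs->user_base))
        return prefs->user_base;
    if (is_valid_base(prefs->system_base))
        return prefs->system_base;
    return DISTRO_DEFAULT_BASE;
}
```

With this rule a user setting of 1000 overrides a system default of 1024, and with nothing configured the compile-time default applies. Whether such a preference should exist at all is exactly what the next comment disputes.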
Comment 99 Benjamin Otte (Company) 2010-04-01 07:15:58 UTC
(In reply to comment #98)
> There are many people, who prefer base-2 over base-10. Therefore we need a way
> to change the behavior of the functions.
>
That reasoning is wrong, especially because such a preference meets all of Havoc's arguments against adding preferences (http://ometer.com/free-software-ui.html), and the "users" arguing about it fit the description at http://www.freebsd.org/cgi/getmsg.cgi?fetch=506636+517178+/usr/local/www/db/text/1999/freebsd-hackers/19991003.freebsd-hackers
Comment 100 Colin Walters 2011-06-02 19:50:58 UTC
*** Bug 640432 has been marked as a duplicate of this bug. ***
Comment 101 Allison Karlitskaya (desrt) 2011-07-20 08:37:41 UTC
Over the past while I've found myself coming around to the idea of base-10 units.  It's just _correct_.  We can't really argue against the absolutely pervasive use of the metric system, and we further cannot argue against the adopted conventions of every single storage manufacturer on the planet.

My only concern is the one that David raises about how things like stripe sizes are better displayed in KiB units.  We should clearly use those units (properly) for those cases.  Perhaps that involves the creation of another function or perhaps those programs should just do it for themselves.
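For sizes that are inherently powers of two, such as the stripe sizes mentioned here, IEC units give exact, readable numbers. A minimal C sketch, assuming a hypothetical helper format_stripe_kib() that is not part of GLib:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a power-of-two-ish size (e.g. a RAID stripe
 * size) in KiB with the correct IEC suffix. Writes into buf and returns
 * the snprintf() result. */
int format_stripe_kib(uint64_t bytes, char *buf, size_t len)
{
    if (bytes % 1024 == 0)
        /* Exact multiple of 1024: show an integer, e.g. "64 KiB". */
        return snprintf(buf, len, "%llu KiB",
                        (unsigned long long)(bytes / 1024));
    /* Otherwise fall back to one decimal place. */
    return snprintf(buf, len, "%.1f KiB", (double)bytes / 1024.0);
}
```

A 65,536-byte stripe is shown as "64 KiB", whereas SI formatting of the same value would give "65.5 kB", which obscures the power-of-two structure.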
Comment 102 Allison Karlitskaya (desrt) 2011-07-20 18:11:16 UTC
David and I came to a consensus on IRC.  Benjamin agreed and Matthias indicated that he was happy to accept whatever conclusion David and I came up with.

I just pushed 3 patches to GLib master which implement the decision.  The important one is this:

commit afd1e3697065c1bd23fe9a1cacf43d8744d0bc9b
Author: Ryan Lortie <desrt@desrt.ca>
Date:   Wed Jul 20 19:44:39 2011 +0200

    Change GLib size units policy
    
    This commit changes GLib size units policy.  We now prefer SI units and
    allow for use of proper IEC units where desired.
    
    g_format_size_for_display() which incorrectly mixed IEC units with SI
    suffixes is left unmodified, but has been deprecated.
    
    g_format_size() has been introduced which uses SI units and suffixes.
    
    g_format_size_full() has also been added which takes a flags argument to
    allow for use of IEC units (with correct suffixes).  It also allows for
    a "long format" output which includes the total number of bytes.  For
    example: "238.5 MB (238,472,938 bytes)".
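The commit message describes the new policy in prose. Below is a minimal, self-contained C sketch of that behaviour, written for illustration only: SI units by default, IEC units behind a flag, and an optional long format that appends the exact byte count. The function format_size_sketch() and the SKETCH_SIZE_* flags are hypothetical. In GLib itself (2.30 and later) the corresponding calls are g_format_size() and g_format_size_full() with the G_FORMAT_SIZE_IEC_UNITS and G_FORMAT_SIZE_LONG_FORMAT flags, and their exact output (for example, digit grouping) may differ from this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flags mirroring the commit message; NOT GLib's GFormatSizeFlags. */
enum {
    SKETCH_SIZE_DEFAULT     = 0,      /* SI units: kB, MB, ... (powers of 1000) */
    SKETCH_SIZE_IEC_UNITS   = 1 << 0, /* IEC units: KiB, MiB, ... (powers of 1024) */
    SKETCH_SIZE_LONG_FORMAT = 1 << 1  /* append the exact byte count */
};

/* Write n with comma thousands separators, e.g. 238472938 -> "238,472,938". */
static void group_digits(uint64_t n, char *out, size_t len)
{
    char digits[32];
    char tmp[48];
    int nd = snprintf(digits, sizeof digits, "%llu", (unsigned long long)n);
    int j = 0;
    for (int i = 0; i < nd; i++) {
        if (i > 0 && (nd - i) % 3 == 0)
            tmp[j++] = ',';
        tmp[j++] = digits[i];
    }
    tmp[j] = '\0';
    snprintf(out, len, "%s", tmp);
}

void format_size_sketch(uint64_t size, unsigned flags, char *buf, size_t len)
{
    static const char *const si[]  = { "kB", "MB", "GB", "TB", "PB", "EB" };
    static const char *const iec[] = { "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
    int use_iec = (flags & SKETCH_SIZE_IEC_UNITS) != 0;
    const char *const *units = use_iec ? iec : si;
    double base = use_iec ? 1024.0 : 1000.0;

    /* Below one unit: print the plain byte count. */
    if ((double)size < base) {
        snprintf(buf, len, "%llu %s", (unsigned long long)size,
                 size == 1 ? "byte" : "bytes");
        return;
    }

    /* Divide until the value fits the current unit (up to EB/EiB). */
    double value = (double)size / base;
    int i = 0;
    while (value >= base && i < 5) {
        value /= base;
        i++;
    }

    if (flags & SKETCH_SIZE_LONG_FORMAT) {
        char grouped[48];
        group_digits(size, grouped, sizeof grouped);
        snprintf(buf, len, "%.1f %s (%s bytes)", value, units[i], grouped);
    } else {
        snprintf(buf, len, "%.1f %s", value, units[i]);
    }
}
```

With these definitions, 238,472,938 bytes format as "238.5 MB" by default, "227.4 MiB" with SKETCH_SIZE_IEC_UNITS, and "238.5 MB (238,472,938 bytes)" with SKETCH_SIZE_LONG_FORMAT, matching the example in the commit message. The thousands separator is hard-coded to a comma here for simplicity.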
Comment 103 Peter Bloomfield 2011-07-20 20:46:43 UTC
Congratulations on this return to sanity! Long awaited...
Comment 104 Martin Pitt 2011-07-21 04:17:32 UTC
Thank you Ryan! Very nice.