After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 632049 - docs: not immediately clear what g_variant_get_fixed_array expects
docs: not immediately clear what g_variant_get_fixed_array expects
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gvariant
2.26.x
Other Linux
: Normal minor
: ---
Assigned To: Allison Karlitskaya (desrt)
gtkdev
Depends on:
Blocks:
 
 
Reported: 2010-10-13 11:30 UTC by Simon McVittie
Modified: 2011-10-04 12:20 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
GVariant docs: be clear that the serialization format isn't the same as D-Bus (963 bytes, patch)
2010-10-13 12:25 UTC, Simon McVittie
none Details | Review
g_variant_get_fixed_array: document which types are appropriate (2.46 KB, patch)
2010-10-13 12:25 UTC, Simon McVittie
none Details | Review
Fix some dangling cross-references in GVariant (3.00 KB, patch)
2010-10-13 12:25 UTC, Simon McVittie
none Details | Review
[1/7] GVariant docs: be clear that the serialisation format isn't the same as D-Bus (989 bytes, patch)
2011-10-03 14:10 UTC, Simon McVittie
none Details | Review
[2/7] g_variant_get_fixed_array: document which types are appropriate (2.48 KB, patch)
2011-10-03 14:11 UTC, Simon McVittie
none Details | Review
[3/7] Escape percent sign in g_variant_new_parsed documentation (952 bytes, patch)
2011-10-03 14:11 UTC, Simon McVittie
none Details | Review
[4/7] Fix a typo in describing GVariant fixed array functions (1.40 KB, patch)
2011-10-03 14:12 UTC, Simon McVittie
none Details | Review
[5/7] g_variant_new_from_data: mention that the caller might need to byteswap (1.04 KB, patch)
2011-10-03 14:12 UTC, Simon McVittie
none Details | Review
[6/7] g_variant_get_data: mention what you need to know to deserialise (1.82 KB, patch)
2011-10-03 14:12 UTC, Simon McVittie
none Details | Review
[7/7] In the GVariant intro, mention lack of built-in endianness, and rationale (1.46 KB, patch)
2011-10-03 14:13 UTC, Simon McVittie
none Details | Review

Description Simon McVittie 2010-10-13 11:30:05 UTC
g_variant_get_fixed_array doesn't tell you what type you'll get; it's pretty obvious for most types, but not at all obvious for booleans (which are one boolean per byte, i.e. not gboolean).

Also, an insufficiently observant reader (like me, apparently) could easily think that GVariant's serialization format is meant to be the same as D-Bus'; it isn't.

Patches on the way.
Comment 1 Simon McVittie 2010-10-13 12:25:01 UTC
Created attachment 172254 [details] [review]
GVariant docs: be clear that the serialization format isn't the same as D-Bus
Comment 2 Simon McVittie 2010-10-13 12:25:27 UTC
Created attachment 172255 [details] [review]
g_variant_get_fixed_array: document which types are appropriate
Comment 3 Simon McVittie 2010-10-13 12:25:43 UTC
Created attachment 172256 [details] [review]
Fix some dangling cross-references in GVariant
Comment 4 David Zeuthen (not reading bugmail) 2010-10-13 15:00:14 UTC
The fact that booleans are encoded as one byte is going to make it hard to optimize gdbus for arrays of scalar types (right now we're not being very clever). Not that I think it matters much (people are more like to use arrays of byte or int32)... but worth noting anyway.
Comment 5 Simon McVittie 2010-10-14 11:52:41 UTC
You'll have to have a slow path for arrays of boolean, which as you say won't be hit in practice. I think all the other arrays of fixed-length scalar are the same as on D-Bus, though, unless D-Bus has even weirder padding semantics than I remember?

My interest in this is that I've just merged a branch into dbus-glib to convert GVariant into dbus-glib's GValue representation, so we can start migrating telepathy-glib's API to GVariant before we actually break ABI to switch to GDBus (the idea is that the API user passes in a GVariant, telepathy-glib converts it to the types it always used to use in order to do the D-Bus call, and that particular API call can remain the same after moving to GDBus).

In that branch, I've used a fast path for types that are the same size in a dbus-glib GArray as in g_variant_get_fixed_array (effectively just memcpy them into the array as-is), and a very slow path via dbus-glib's specialized-container strangeness for anything that differs in size. Currently that means boolean and [u]int16 go via the slow path; in principle it might also be used for [u]int32 if we're ever ported to a platform where int is 64-bit.
Comment 6 David Zeuthen (not reading bugmail) 2010-10-18 14:58:29 UTC
(In reply to comment #5)
> You'll have to have a slow path for arrays of boolean, which as you say won't
> be hit in practice. I think all the other arrays of fixed-length scalar are the
> same as on D-Bus, though, unless D-Bus has even weirder padding semantics than
> I remember?
> 
> My interest in this is that I've just merged a branch into dbus-glib to convert
> GVariant into dbus-glib's GValue representation, so we can start migrating
> telepathy-glib's API to GVariant before we actually break ABI to switch to
> GDBus (the idea is that the API user passes in a GVariant, telepathy-glib
> converts it to the types it always used to use in order to do the D-Bus call,
> and that particular API call can remain the same after moving to GDBus).
> 
> In that branch, I've used a fast path for types that are the same size in a
> dbus-glib GArray as in g_variant_get_fixed_array (effectively just memcpy them
> into the array as-is), and a very slow path via dbus-glib's
> specialized-container strangeness for anything that differs in size. Currently
> that means boolean and [u]int16 go via the slow path; in principle it might
> also be used for [u]int32 if we're ever ported to a platform where int is
> 64-bit.

Sounds about right - but only if the endianness is the same!
Comment 7 Simon McVittie 2010-10-18 15:10:04 UTC
(In reply to comment #6)
> Sounds about right - but only if the endianness is the same!

That's true, although byteswapping the data is somewhat orthogonal to converting word sizes, and could happen as a second "possibly-slow-path" step.

Am I right in thinking that g_variant_get_fixed_array() always returns data in the current machine's native endianness, but GDBusMessage doesn't necessarily?

Do the patches I've attached here look correct/useful to merge?
Comment 8 David Zeuthen (not reading bugmail) 2010-10-18 15:43:55 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Sounds about right - but only if the endianness is the same!
> 
> That's true, although byteswapping the data is somewhat orthogonal to
> converting word sizes, and could happen as a second "possibly-slow-path" step.

True (and a good point).

> Am I right in thinking that g_variant_get_fixed_array() always returns data in
> the current machine's native endianness, 

Good question. I think so. At least g_variant_store() says that

 The stored data is in machine native byte order

which is a shame as it means that payload differs on little- and big-endian boxes. 

(Which basically means that GVariant payload cannot be installed in /usr/share as it can't be shared between archs.. unless you don't care about multilib.. e.g. consider MyCoolLibrary.srpm that has a MyCoolLibrary.x86_64.rpm, MyCoolLibrary.i686.rpm and MyCoolLibrary-shared.noarch.rpm .. typically for a multilib install the user wants to have both the x86_64 and i686 version installed. Anyway..)

Also g_variant_new_from_data() doesn't say if the data needs to be in host byte order and if not if conversion happens and so on... hmm.. maybe it even affects what g_variant_get_fixed_array() returns... Ryan?

> but GDBusMessage doesn't necessarily?

See

 http://library.gnome.org/devel/gio/unstable/GDBusMessage.html#g-dbus-message-to-blob
 http://library.gnome.org/devel/gio/unstable/GDBusMessage.html#g-dbus-message-new-from-blob

FWIW, you can use g_dbus_message_set_byte_order() to get a blob of the byte order you want.

> Do the patches I've attached here look correct/useful to merge?

They definitely look useful to me.
Comment 9 Allison Karlitskaya (desrt) 2010-10-18 17:03:17 UTC
There is g_variant_byteswap() that byteswaps each numeric value contained inside of a GVariant* to produce an equally-valid GVariant* (but with the numbers changed around).

g_variant_store()ing the result of that gives you something that you can g_variant_new_from_data() on a system of the "other" endian to get the original result.

This works because GVariant itself always uses little endian internally.
Comment 10 Simon McVittie 2010-10-18 18:10:49 UTC
OK, now I'm confused. desrt, could you please answer these, and I'll write patches to add the answers to the docs:

Does g_variant_get_fixed_array give you a pointer to little-endian data, native-endian data (i.e. LE on x86, BE on ppc etc.), or data whose endianness is determined in some other way?

Does g_variant_store() write LE data, native-endian data, or data whose endianness is determined in some other way?

If I call g_variant_new_from_data(), should I give it LE or native-endian input?
Comment 11 Allison Karlitskaya (desrt) 2010-10-18 23:00:54 UTC
I'll give a couple of clarifying examples.

Consider an array of int16 ('an') as the most simple type where this is important.  An array of int16 is represented in GVariant exactly as it is in C: a number of bytes that is a multiple of two.  The length of the array is apparent from its size in bytes.  Those bytes are in "native" order (ie: they appear in memory in the same order as the naive C programmer would expect).

Now consider an array of arrays of int16 ('aan').  We can't just have that as a raw bunch of int16s since we don't know where the boundaries between the sub-arrays are.  Those boundaries ("offsets") are stored in the serialised data as integers, of course.  The byte order of those integers is *always* *always* little-endian.

That means that if you take a GVariant that was created on a big endian system (big endian integers, little endian offsets) and load it onto a little endian system (little endian integers, little endian offsets) the offsets will be interpreted in the same way but the integers of the contents (ie: the actual elements of the array) will be byteswapped.

This means that the structure of a GVariant is always valid regardless of what sort of system it is loaded on, but the multi-byte integer content may be messed up (in exactly the way you'd expect).

g_variant_byteswap() swaps the endianness of the multi-byte integer content without modifying endianness of the offsets (since those are always little endian).  Note that g_variant_byteswap() is actually implemented with a call to g_variant_ref() in the case that the data is known not to contain multi-byte integer data (ie: arrays of strings, etc).
Comment 12 David Zeuthen (not reading bugmail) 2010-10-19 00:01:06 UTC
So, ugh, this means that it's impossible to implement g_variant_get_endianness()? E.g. you need out-of-band data to store such things...
Comment 13 Simon McVittie 2010-10-19 09:51:04 UTC
Presumably, the rationale for not storing the endianness is that GVariant is a recursive type system, and in a sane format, endianness only needs to be stored once per blob of data (once per D-Bus message, once per file on disk, etc.). However, I think it's worth highlighting in the docs that the result of g_variant_store() has a specified endianness, and so can't go in an architecture-independent directory unless its endianness is indicated out-of-band.

I infer that the answers to my questions are:

(In reply to comment #10)
> Does g_variant_get_fixed_array give you a pointer to little-endian data,
> native-endian data (i.e. LE on x86, BE on ppc etc.), or data whose endianness
> is determined in some other way?

Native-endian.

> Does g_variant_store() write LE data, native-endian data, or data whose
> endianness is determined in some other way?

Native-endian, with the endianness not represented in the blob: so if you want to store a blob of GVariants in an architecture-independent way, you must also store the endianness out-of-band.

> If I call g_variant_new_from_data(), should I give it LE or native-endian
> input?

Native-endian. To find out what that is, test whether G_BYTE_ORDER is G_LITTLE_ENDIAN or G_BIG_ENDIAN.

G_PDP_ENDIAN is not supported by GVariant.

and some bonus questions:

Q: If I have a blob of GVariants in memory, how do I save them to disk in a way suitable for /usr/share or a potentially NFS /home?

A:
- Test whether G_BYTE_ORDER is G_LITTLE_ENDIAN or G_BIG_ENDIAN.
- Decide which endianness you want to write in: this is part of your serialization format. Your options are: always use BE, always use LE, or write a marker for the endianness followed by native-endian data.
- If you want to write out "wrong-endian" data for the current CPU (e.g. your convention is "always use LE" but you find that you're running on powerpc, or vice versa), g_variant_byteswap() it. This is insufficient for PDP endianness, but that's not supported anyway.

Q: If I have a blob of GVariants on my disk, how do I load it into memory in a form that makes sense?

A:
- Determine from out-of-band information which endianness it is. This might be a marker in the file, or it might be the knowledge that your app always writes out little-endian data, even on ppc.
- Read it.
- If it's "wrong-endian" for the current CPU, g_variant_byteswap() it, keep the byteswapped version, and throw away the original. Again, this is insufficient for PDP endianness, but that's not supported anyway.
Comment 14 David Zeuthen (not reading bugmail) 2010-10-19 14:54:53 UTC
(In reply to comment #13)
> Presumably, the rationale for not storing the endianness is that GVariant is a
> recursive type system, and in a sane format, endianness only needs to be stored
> once per blob of data (once per D-Bus message, once per file on disk, etc.).
> However, I think it's worth highlighting in the docs that the result of
> g_variant_store() has a specified endianness, and so can't go in an
> architecture-independent directory unless its endianness is indicated
> out-of-band.

Sure, sure, that's fine. What's missing is putting all the information gathered from all this detective work in this bug into the docs...
Comment 15 Simon McVittie 2011-10-03 14:10:46 UTC
Created attachment 198090 [details] [review]
[1/7] GVariant docs: be clear that the serialisation format isn't the same as D-Bus

---

Now with what appears to be GVariant's preferred spelling of "serialise".
Comment 16 Simon McVittie 2011-10-03 14:11:10 UTC
Created attachment 198091 [details] [review]
[2/7] g_variant_get_fixed_array: document which types are appropriate
Comment 17 Simon McVittie 2011-10-03 14:11:46 UTC
Created attachment 198092 [details] [review]
[3/7] Escape percent sign in g_variant_new_parsed documentation

Strings matching /%[a-z]/ are special syntax for gtk-doc.

---

The rest of Attachment #172256 [details] was already fixed.
Comment 18 Simon McVittie 2011-10-03 14:12:03 UTC
Created attachment 198093 [details] [review]
[4/7] Fix a typo in describing GVariant fixed array functions
Comment 19 Simon McVittie 2011-10-03 14:12:18 UTC
Created attachment 198094 [details] [review]
[5/7] g_variant_new_from_data: mention that the caller might need to byteswap
Comment 20 Simon McVittie 2011-10-03 14:12:38 UTC
Created attachment 198095 [details] [review]
[6/7] g_variant_get_data: mention what you need to know to deserialise

Also include a shorter version in the docs for g_variant_store, with a
pointer to g_variant_get_data.
Comment 21 Simon McVittie 2011-10-03 14:13:01 UTC
Created attachment 198096 [details] [review]
[7/7] In the GVariant intro, mention lack of built-in endianness, and rationale

Presumably, the rationale for not storing the endianness is that
GVariant is a recursive type system, and in a sane format, endianness
only needs to be stored once per blob of data (once per D-Bus message,
once per file on disk, etc.).
Comment 22 Simon McVittie 2011-10-03 14:17:24 UTC
(In reply to comment #8)
> Also g_variant_new_from_data() doesn't say if the data needs to be in host byte
> order and if not if conversion happens and so on...

Fixed. I put more details in g_variant_store() and g_variant_get_data() too, because those are the places where you're getting a string suitable for putting in a file, which you might naively assume was in a platform-independent byte order.

(In reply to comment #14)
> Sure, sure, that's fine. What's missing is putting all the information gathered
> from all this detective work in this bug into the docs...

I *think* I've covered everything now...

(In reply to comment #8)
> > Do the patches I've attached here look correct/useful to merge?
> 
> They definitely look useful to me.

Please review / consider merging?
Comment 23 Allison Karlitskaya (desrt) 2011-10-03 16:00:10 UTC
Review of attachment 198092 [details] [review]:

::: glib/gvariant-parser.c
@@ +2488,3 @@
  * You may not use this function to return, unmodified, a single
  * #GVariant pointer from the argument list.  ie: @format may not solely
+ * be anything along the lines of "%*", "%?", "%r", or anything starting

gtk-doc likes \%
Comment 24 Allison Karlitskaya (desrt) 2011-10-03 16:02:15 UTC
Review of attachment 198095 [details] [review]:

::: glib/gvariant-core.c
@@ +828,3 @@
+ * %G_VARIANT_TYPE_VARDICT and it is always in little-endian order") or
+ * explicitly (by storing the type and/or endianness in addition to the
+ * serialised data).

perhaps mention that a reasonable strategy is boxing inside a variant (and then you know that the type of the file is always 'v').
Comment 25 Allison Karlitskaya (desrt) 2011-10-03 16:03:15 UTC
These all look great, Simon.  Just a couple of tweaks, and then feel free to push to both master and glib-2-30.
Comment 26 Simon McVittie 2011-10-04 12:20:39 UTC
Fixed in master for 2.31.0 (commits up to 7973d9f8bac34fa) and in glib-2-30 for 2.30.1 (commits up to 04da052c1e8710).

(In reply to comment #24)
> Review of attachment 198095 [details] [review]:
> 
> ::: glib/gvariant-core.c
> @@ +828,3 @@
> + * %G_VARIANT_TYPE_VARDICT and it is always in little-endian order") or
> + * explicitly (by storing the type and/or endianness in addition to the
> + * serialised data).
> 
> perhaps mention that a reasonable strategy is boxing inside a variant (and then
> you know that the type of the file is always 'v').

I changed the word VARDICT to VARIANT to hint at this strategy.

Perhaps this deserves a more tutorial-style section, "Using GVariant in file formats and network protocols" or something, but I'm not going to write that right now (and you have more experience with that anyway); I'm more interested in documenting hard requirements than best practices at the moment.

(In reply to comment #23)
> gtk-doc likes \%

Fixed.