GNOME Bugzilla – Bug 632049
docs: not immediately clear what g_variant_get_fixed_array expects
Last modified: 2011-10-04 12:20:39 UTC
g_variant_get_fixed_array doesn't tell you what type you'll get; it's pretty obvious for most types, but not at all obvious for booleans (which are one boolean per byte, i.e. not gboolean). Also, an insufficiently observant reader (like me, apparently) could easily think that GVariant's serialization format is meant to be the same as D-Bus'; it isn't. Patches on the way.
Created attachment 172254 [details] [review] GVariant docs: be clear that the serialization format isn't the same as D-Bus
Created attachment 172255 [details] [review] g_variant_get_fixed_array: document which types are appropriate
Created attachment 172256 [details] [review] Fix some dangling cross-references in GVariant
The fact that booleans are encoded as one byte is going to make it hard to optimize gdbus for arrays of scalar types (right now we're not being very clever). Not that I think it matters much (people are more like to use arrays of byte or int32)... but worth noting anyway.
You'll have to have a slow path for arrays of boolean, which as you say won't be hit in practice. I think all the other arrays of fixed-length scalar are the same as on D-Bus, though, unless D-Bus has even weirder padding semantics than I remember? My interest in this is that I've just merged a branch into dbus-glib to convert GVariant into dbus-glib's GValue representation, so we can start migrating telepathy-glib's API to GVariant before we actually break ABI to switch to GDBus (the idea is that the API user passes in a GVariant, telepathy-glib converts it to the types it always used to use in order to do the D-Bus call, and that particular API call can remain the same after moving to GDBus). In that branch, I've used a fast path for types that are the same size in a dbus-glib GArray as in g_variant_get_fixed_array (effectively just memcpy them into the array as-is), and a very slow path via dbus-glib's specialized-container strangeness for anything that differs in size. Currently that means boolean and [u]int16 go via the slow path; in principle it might also be used for [u]int32 if we're ever ported to a platform where int is 64-bit.
(In reply to comment #5) > You'll have to have a slow path for arrays of boolean, which as you say won't > be hit in practice. I think all the other arrays of fixed-length scalar are the > same as on D-Bus, though, unless D-Bus has even weirder padding semantics than > I remember? > > My interest in this is that I've just merged a branch into dbus-glib to convert > GVariant into dbus-glib's GValue representation, so we can start migrating > telepathy-glib's API to GVariant before we actually break ABI to switch to > GDBus (the idea is that the API user passes in a GVariant, telepathy-glib > converts it to the types it always used to use in order to do the D-Bus call, > and that particular API call can remain the same after moving to GDBus). > > In that branch, I've used a fast path for types that are the same size in a > dbus-glib GArray as in g_variant_get_fixed_array (effectively just memcpy them > into the array as-is), and a very slow path via dbus-glib's > specialized-container strangeness for anything that differs in size. Currently > that means boolean and [u]int16 go via the slow path; in principle it might > also be used for [u]int32 if we're ever ported to a platform where int is > 64-bit. Sounds about right - but only if the endianness is the same!
(In reply to comment #6) > Sounds about right - but only if the endianness is the same! That's true, although byteswapping the data is somewhat orthogonal to converting word sizes, and could happen as a second "possibly-slow-path" step. Am I right in thinking that g_variant_get_fixed_array() always returns data in the current machine's native endianness, but GDBusMessage doesn't necessarily? Do the patches I've attached here look correct/useful to merge?
(In reply to comment #7) > (In reply to comment #6) > > Sounds about right - but only if the endianness is the same! > > That's true, although byteswapping the data is somewhat orthogonal to > converting word sizes, and could happen as a second "possibly-slow-path" step. True (and a good point). > Am I right in thinking that g_variant_get_fixed_array() always returns data in > the current machine's native endianness, Good question. I think so. At least g_variant_store() says that The stored data is in machine native byte order which is a shame as it means that payload differs on little- and big-endian boxes. (Which basically means that GVariant payload cannot be installed in /usr/share as it can't be shared between archs.. unless you don't care about multilib.. e.g. consider MyCoolLibrary.srpm that has a MyCoolLibrary.x86_64.rpm, MyCoolLibrary.i686.rpm and MyCoolLibrary-shared.noarch.rpm .. typically for a multilib install the user wants to have both the x86_64 and i686 version installed. Anyway..) Also g_variant_new_from_data() doesn't say if the data needs to be in host byte order and if not if conversion happens and so on... hmm.. maybe it even affects what g_variant_get_fixed_array() returns... Ryan? > but GDBusMessage doesn't necessarily? See http://library.gnome.org/devel/gio/unstable/GDBusMessage.html#g-dbus-message-to-blob http://library.gnome.org/devel/gio/unstable/GDBusMessage.html#g-dbus-message-new-from-blob FWIW, you can use g_dbus_message_set_byte_order() to get a blob of the byte order you want. > Do the patches I've attached here look correct/useful to merge? They definitely look useful to me.
There is g_variant_byteswap() that byteswaps each numeric value contained inside of a GVariant* to produce an equally-valid GVariant* (but with the numbers changed around). g_variant_store()ing the result of that gives you something that you can g_variant_new_from_data() on a system of the "other" endian to get the original result. This works because GVariant itself always uses little endian internally.
OK, now I'm confused. desrt, could you please answer these, and I'll write patches to add the answers to the docs: Does g_variant_get_fixed_array give you a pointer to little-endian data, native-endian data (i.e. LE on x86, BE on ppc etc.), or data whose endianness is determined in some other way? Does g_variant_store() write LE data, native-endian data, or data whose endianness is determined in some other way? If I call g_variant_new_from_data(), should I give it LE or native-endian input?
I'll give a couple of clarifying examples. Consider an array of int16 ('an') as the most simple type where this is important. An array of int16 is represented in GVariant exactly as it is in C: a number of bytes that is a multiple of two. The length of the array is apparent from its size in bytes. Those bytes are in "native" order (ie: they appear in memory in the same order as the naive C programmer would expect). Now consider an array of arrays of int16 ('aan'). We can't just have that as a raw bunch of int16s since we don't know where the boundaries between the sub-arrays are. Those boundaries ("offsets") are stored in the serialised data as integers, of course. The byte order of those integers is *always* *always* little-endian. That means that if you take a GVariant that was created on a big endian system (big endian integers, little endian offsets) and load it onto a little endian system (little endian integers, little endian offsets) the offsets will be interpreted in the same way but the integers of the contents (ie: the actual elements of the array) will be byteswapped. This means that the structure of a GVariant is always valid regardless of what sort of system it is loaded on, but the multi-byte integer content may be messed up (in exactly the way you'd expect). g_variant_byteswap() swaps the endianness of the multi-byte integer content without modifying endianness of the offsets (since those are always little endian). Note that g_variant_byteswap() is actually implemented with a call to g_variant_ref() in the case that the data is known not to contain multi-byte integer data (ie: arrays of strings, etc).
So, ugh, this means that it's impossible to implement g_variant_get_endianness()? E.g. you need out-of-band data to store such things...
Presumably, the rationale for not storing the endianness is that GVariant is a recursive type system, and in a sane format, endianness only needs to be stored once per blob of data (once per D-Bus message, once per file on disk, etc.). However, I think it's worth highlighting in the docs that the result of g_variant_store() has a specified endianness, and so can't go in an architecture-independent directory unless its endianness is indicated out-of-band. I infer that the answers to my questions are: (In reply to comment #10) > Does g_variant_get_fixed_array give you a pointer to little-endian data, > native-endian data (i.e. LE on x86, BE on ppc etc.), or data whose endianness > is determined in some other way? Native-endian. > Does g_variant_store() write LE data, native-endian data, or data whose > endianness is determined in some other way? Native-endian, with the endianness not represented in the blob: so if you want to store a blob of GVariants in an architecture-independent way, you must also store the endianness out-of-band. > If I call g_variant_new_from_data(), should I give it LE or native-endian > input? Native-endian. To find out what that is, test whether G_BYTE_ORDER is G_LITTLE_ENDIAN or G_BIG_ENDIAN. G_PDP_ENDIAN is not supported by GVariant. and some bonus questions: Q: If I have a blob of GVariants in memory, how do I save them to disk in a way suitable for /usr/share or a potentially NFS /home? A: - Test whether G_BYTE_ORDER is G_LITTLE_ENDIAN or G_BIG_ENDIAN. - Decide which endianness you want to write in: this is part of your serialization format. Your options are: always use BE, always use LE, or write a marker for the endianness followed by native-endian data. - If you want to write out "wrong-endian" data for the current CPU (e.g. your convention is "always use LE" but you find that you're running on powerpc, or vice versa), g_variant_byteswap() it. This is insufficient for PDP endianness, but that's not supported anyway. Q: If I have a blob of GVariants on my disk, how do I load it into memory in a form that makes sense? A: - Determine from out-of-band information which endianness it is. This might be a marker in the file, or it might be the knowledge that your app always writes out little-endian data, even on ppc. - Read it. - If it's "wrong-endian" for the current CPU, g_variant_byteswap() it, keep the byteswapped version, and throw away the original. Again, this is insufficient for PDP endianness, but that's not supported anyway.
(In reply to comment #13) > Presumably, the rationale for not storing the endianness is that GVariant is a > recursive type system, and in a sane format, endianness only needs to be stored > once per blob of data (once per D-Bus message, once per file on disk, etc.). > However, I think it's worth highlighting in the docs that the result of > g_variant_store() has a specified endianness, and so can't go in an > architecture-independent directory unless its endianness is indicated > out-of-band. Sure, sure, that's fine. What's missing is putting all the information gathered from all this detective work in this bug into the docs...
Created attachment 198090 [details] [review] [1/7] GVariant docs: be clear that the serialisation format isn't the same as D-Bus --- Now with what appears to be GVariant's preferred spelling of "serialise".
Created attachment 198091 [details] [review] [2/7] g_variant_get_fixed_array: document which types are appropriate
Created attachment 198092 [details] [review] [3/7] Escape percent sign in g_variant_new_parsed documentation Strings matching /%[a-z]/ are special syntax for gtk-doc. --- The rest of Attachment #172256 [details] was already fixed.
Created attachment 198093 [details] [review] [4/7] Fix a typo in describing GVariant fixed array functions
Created attachment 198094 [details] [review] [5/7] g_variant_new_from_data: mention that the caller might need to byteswap
Created attachment 198095 [details] [review] [6/7] g_variant_get_data: mention what you need to know to deserialise Also include a shorter version in the docs for g_variant_store, with a pointer to g_variant_get_data.
Created attachment 198096 [details] [review] [7/7] In the GVariant intro, mention lack of built-in endianness, and rationale Presumably, the rationale for not storing the endianness is that GVariant is a recursive type system, and in a sane format, endianness only needs to be stored once per blob of data (once per D-Bus message, once per file on disk, etc.).
(In reply to comment #8) > Also g_variant_new_from_data() doesn't say if the data needs to be in host byte > order and if not if conversion happens and so on... Fixed. I put more details in g_variant_store() and g_variant_get_data() too, because those are the places where you're getting a string suitable for putting in a file, which you might naively assume was in a platform-independent byte order. (In reply to comment #14) > Sure, sure, that's fine. What's missing is putting all the information gathered > from all this detective work in this bug into the docs... I *think* I've covered everything now... (In reply to comment #8) > > Do the patches I've attached here look correct/useful to merge? > > They definitely look useful to me. Please review / consider merging?
Review of attachment 198092 [details] [review]: ::: glib/gvariant-parser.c @@ +2488,3 @@ * You may not use this function to return, unmodified, a single * #GVariant pointer from the argument list. ie: @format may not solely + * be anything along the lines of "%*", "%?", "%r", or anything starting gtk-doc likes \%
Review of attachment 198095 [details] [review]: ::: glib/gvariant-core.c @@ +828,3 @@ + * %G_VARIANT_TYPE_VARDICT and it is always in little-endian order") or + * explicitly (by storing the type and/or endianness in addition to the + * serialised data). perhaps mention that a reasonable strategy is boxing inside a variant (and then you know that the type of the file is always 'v').
These all look great, Simon. Just a couple of tweaks, and then feel free to push to both master and glib-2-30.
Fixed in master for 2.31.0 (commits up to 7973d9f8bac34fa) and in glib-2-30 for 2.30.1 (commits up to 04da052c1e8710). (In reply to comment #24) > Review of attachment 198095 [details] [review]: > > ::: glib/gvariant-core.c > @@ +828,3 @@ > + * %G_VARIANT_TYPE_VARDICT and it is always in little-endian order") or > + * explicitly (by storing the type and/or endianness in addition to the > + * serialised data). > > perhaps mention that a reasonable strategy is boxing inside a variant (and then > you know that the type of the file is always 'v'). I changed the word VARDICT to VARIANT to hint at this strategy. Perhaps this deserves a more tutorial-style section, "Using GVariant in file formats and network protocols" or something, but I'm not going to write that right now (and you have more experience with that anyway); I'm more interested in documenting hard requirements than best practices at the moment. (In reply to comment #23) > gtk-doc likes \% Fixed.