GNOME Bugzilla – Bug 625408
make GVariant dictionaries more useful
Last modified: 2018-05-24 12:34:24 UTC
GVariant dictionaries are currently a straightforward mapping of what appears on D-Bus: an array of DICT_ENTRY, where a DICT_ENTRY is a pair (2-tuple) with the first entry constrained to be a basic type. However, D-Bus APIs typically use a dictionary as a lookup table/hash table sort of data structure, in which it's useful to do random-access lookups: * look up a value by a GVariant key, return the corresponding value or NULL * if there are duplicate keys, this is an error (the D-Bus spec forbids this); silently return either the first or the last, whichever makes more sense (the D-Bus spec allows this) * should be faster than O(n) * it's very convenient for C if the value returned is owned by the dict, meaning it doesn't need to be freed as long as the caller owns a ref to the dict I can see two viable ways to achieve this: 1. Put a GHashTable cache inside the GVariant, populate it on first use, and free it at the same time as the GVariant 2. Make a GVariantDict object which contains the GVariant and the GHashTable, and give it convenience API The second is the only one that can be done outside GLib, so I'll start prototyping it; we can turn it into the first later if that's considered better API. It would also be useful to have a more convenient API than this for some common special cases: * dict<string, something>, i.e. a{s*} in GVariant notation * dict<string, variant>, i.e. a{sv}, in which there's as little unboxing boilerplate as possible * perhaps dict<object path, something>, i.e. a{o*} in GVariant notation, if the string special case doesn't also accept object paths Some background on a{sv}: telepathy-glib currently has the tp_asv_get_* family of functions, which are a very convenient API to deal with dbus-glib's representation of an a{sv} (a GHashTable<string, GValue>): GHashTable *asv; const gchar *badger; guint mushroom; gboolean is_valid_integer; asv = tp_channel_get_properties (channel); badger = tp_asv_get_string (asv, "com.example.MyExtension.Badger"); mushroom = tp_asv_get_uint32 (asv, "com.example.MyExtension.Mushroom", &is_valid_integer); For more, see: http://telepathy.freedesktop.org/doc/telepathy-glib/telepathy-glib-asv.html Ideally, I'd like to have an API this easy to deal with a GVariant a{sv} in GLib itself. telepathy-glib would then be able to use a GVariant or GVariantDict in places where it currently uses a dbus-glib GHashTable<string,GValue> to give extensible data to Telepathy clients, and those clients would be able to do things like: GVariantDict *asv; const gchar *badger; asv = tp_channel_get_properties (channel); badger = g_variant_asv_get_string (asv, "com.example.MyExtension.Badger");
i started on something like this myself recently, calling it GVarDict -- i was planning on adding G_VARIANT_TYPE_VARDICT. another benefit of having a separate data structure is that we can build it in the usual GHashTable sort of way. i'll braindump on what i was planning: vardict = g_vardict_new (NULL); g_vardict_set_value (vardict, "value", g_variant_new_int32 (42)); g_vardict_set (vardict, "foo", "i", 123); g_vardict_set (vardict, "bar", "s", "hello"); g_vardict_set (vardict, "foo", "i", 456); /* replacement */ g_vardict_set_string (vardict, "baz", "maybe some API like this too"); g_vardict_get_string (vardict, "foo") /* returns NULL, not a string */ g_vardict_get (vardict, "foo", "s", &str); /* returns FALSE, 'str' not updated */ g_vardict_get (vardict, "foo", "i", &num); /* TRUE, 'num' set to 456 */ GVariant *value = g_vardict_end (vardict); although maybe we would do the memory management in the same way as GVariantBuilder... on the other side, we could do: vardict = g_variant_new (some_gvariant_asv_value); /* check if "foo" exists and has type "i" and set x, else return FALSE */ g_vardict_get (vardict, "foo", "i", &x); g_vardict_set (vardict, "foo", "i", 444); /* this fails */ the vardict would have the idea that we can 'lock' it. that would be the case after 'end' or if we provided a non-NULL value to start. in that case _set doesn't work, but _get is fine. i was also thinking that for _get, we would have some 'index' in the vardict that started at zero, then we do this: get (vardict v, string s) { if (v.table.get (s)) return that; while (index < v.value.n_children ()) { child = v.value.get_child (index++); if (v.table.get (child.key) == null) { v.table.set (child.key, child.value); if (s == child.key) return child.value; } } return null; } this way we never do more work than is required and we also don't duplicate effort.
(In reply to comment #1) > i started on something like this myself recently, calling it GVarDict -- i was > planning on adding G_VARIANT_TYPE_VARDICT. I assume you mean a boxed G_TYPE_VARDICT that wouldn't itself be stored/storable in GVariant? If so, then this sounds good to me. Am I right in inferring from the API that GVarDict would be analogous to Qt's QVariantMap, i.e. it would only support a{sv}, not a{s*} or a{?*}? I think a{sv} covers 90% of people's requirements, although APIs like Telepathy do contain maps keyed by object path and by uint32, and maps with string keys but non-variant values, so if GVarDict isn't suitable for those, it may be worth thinking about how (read-only?) convenience APIs for those could be named. When thinking about this as API on GVariant rather than a wrapper, I vaguely thought about g_variant_map_lookup (generic a{?*} case), g_variant_map_string_lookup (a{s*} case), and g_variant_map_asv_get_foo (a{sv} convenience API). > vardict = g_variant_new (some_gvariant_asv_value); > /* check if "foo" exists and has type "i" and set x, else return FALSE */ > g_vardict_get (vardict, "foo", "i", &x); In telepathy-glib we have a pattern like: guint tp_asv_get_uint32 (a{sv}, const gchar *, gboolean *valid) which returns 0 if invalid, setting *valid if given. This preserves the pattern of "make the important thing the return, and the not-always-needed thing the optional "out" argument", making it very convenient for the common case where you don't care about distinguishing between 0, invalid-type and omission - the 0 entry in an enum frequently means "unknown", "invalid" or "normal", so you can often do things like: /* sets TP_CHANNEL_TEXT_MESSAGE_TYPE_NORMAL if invalid, which is a sane fallback */ set_message_type (foo, tp_asv_get_uint32 (asv, "message-type", NULL)); For optional boolean flags this is extremely useful, because omission nearly always means False, so you rarely need @valid. We also do a very limited amount of type coercion. I can see that this is more controversial, but it's useful to be nice to bindings that are less pedantic about types. For instance, if dbus-python is given an int while establishing a Telepathy connection, and guesses that maybe you meant an Int32, but it's actually a port number so we actually wanted a UInt16, it's useful to just range-check it and carry on. Similarly, dbus-glib is unable to send 16-bit integers at all, even when not in a variant (it will quietly convert them into gint/guint, which are then interpreted as 32-bit). A byte ('y') counts as a sort of uint, in my mind, so when I say "[u]int" here, I mean bytes too. The type coercions implemented in telepathy-glib are: * any size of [unsigned] int -> any other size of [u]int, but only if the value is in range for the target size * any size of [u]int -> double In particular, we don't implement any of these, although I think they might be reasonable: * object path -> string (just stringify it) * signature -> string (likewise) * [u]int -> boolean (C-like "is it zero?" logic) * [u]int -> string (base 10) and neither do we implement these, which I think are cans of worms that should be avoided: * double -> string (precision?) * double -> [u]int (do you truncate or round or what?) * string -> [u]int, double, or anything you can parse from it (scary!)
yes. your assumptions are correct. and i'm mostly interested in the 90% case you mention. for the other cases i'm almost comfortable telling people to "deal". on the other hand, it's a little bit inconsistent to only support one particular type (even if in reality this is the common case). i like this pattern: if (something_worked (x, y, z)) { do stuff... } so i prefer the API to go that way -- it's also somewhat more conventional. fwiw, GVariant used to have a lot of API to handle situations like this but I pulled it out because it was half-baked (and worked by iterating). In particular, there used to be an API that took two format strings: one for the key (in) and one for the value (out). That was a little bit crazy. It's a shame you're not here to discuss this....
(In reply to comment #2) > In telepathy-glib we have a pattern like: > > guint tp_asv_get_uint32 (a{sv}, const gchar *, gboolean *valid) > > which returns 0 if invalid, setting *valid if given. I've remembered another reason why we did tp_asv_* this way round, which is rather more significant: as written here, you can assign the result of tp_asv_get_uint32 to a guint or gulong or whatever (anything sufficiently large), but if you're passing a pointer "out" argument, you have to use precisely the type that was expected. Worse, this type mismatch: gboolean hypothetical_getter (a{sv}, const gchar *, gulong *output); guint32 result = 0; if (!hypothetical_getter (map, "snakes", &result)) snakes = 42; will be fine on i386, but not on x86-64! I've implemented a prototype of this API, TpVariantMap (named by analogy with QVariantMap, which is also a string -> variant mapping), in a branch for telepathy-glib. I'd be interested to hear your thoughts on it: http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/variant-map (In reply to comment #1) > vardict = g_vardict_new (NULL); I decided tp_variant_map_new (void) and tp_variant_map_new_frozen (GVariant *) would be clearer; I also added tp_variant_map_is_frozen() so people can use it in their g_return_if_fail calls. > g_vardict_set_value (vardict, "value", g_variant_new_int32 (42)); tp_variant_map_set_value has these semantics; you can also use a NULL value, which is actually deletion. > g_vardict_set (vardict, "foo", "i", 123); > g_vardict_set (vardict, "bar", "s", "hello"); > g_vardict_set (vardict, "foo", "i", 456); /* replacement */ I made it plural: tp_variant_map_set (vardict, "foo", "i", 123, "pair", "(is)", 99, "red balloons", NULL); > g_vardict_set_string (vardict, "baz", "maybe some API like this too"); I didn't bother with this; the format-string version is just as easy. We can add type-specific setters later if language bindings need them. > g_vardict_get_string (vardict, "foo") /* returns NULL, not a string */ Same semantics. Also, you can already "get" before you've frozen the map. > g_vardict_get (vardict, "foo", "s", &str); /* returns FALSE, 'str' not updated > */ I haven't added anything like this yet. > g_vardict_get (vardict, "foo", "i", &num); /* TRUE, 'num' set to 456 */ > GVariant *value = g_vardict_end (vardict); I called it tp_variant_map_freeze() and made it safe to call repeatedly (you get the same GVariant every time, borrowed from the TpVariantMap). > vardict = g_variant_new (some_gvariant_asv_value); tp_variant_map_new_frozen > /* check if "foo" exists and has type "i" and set x, else return FALSE */ > g_vardict_get (vardict, "foo", "i", &x); I haven't implemented this yet, but could. One problem with this is that, like g_object_get, it requires you to point to precisely the right type, and has no way to warn you if you get it wrong. We've had real bugs from this in Telepathy: g_object_get (thing, "port-number", &port, NULL), where port is a guint16, but port-number is of G_TYPE_UINT because there's no uint16 type. I did implement typed getter helpers, which automatically coerce integer types as described above (very useful if you got your a{sv} from Python or dbus-glib). > g_vardict_set (vardict, "foo", "i", 444); /* this fails */ It fails with a critical warning by using g_return_if_fail, I assume that was what you had in mind? > the vardict would have the idea that we can 'lock' it. that would be the case > after 'end' or if we provided a non-NULL value to start. in that case _set > doesn't work, but _get is fine. I called locking "freezing" to avoid confusion with mutexes (my implementation isn't thread safe, although it could be made so), but otherwise yes. > i was also thinking that for _get, we would have some 'index' in the vardict I didn't bother with this optimization, but if profiling indicates that it's a major problem, it wouldn't be an API break to add it.
We have a prototype for telepathy-glib over at https://bugs.freedesktop.org/show_bug.cgi?id=30423 on which we'd appreciate comments. It's pretty important for our exit strategy from dbus-glib, so we'll be pushing ahead with it whether it gets GLib approval or not. In particular, the "freeze"/"end" API is problematic if you're building up a TpVariantMap to give to some sort of D-Bus call (after you've used it, you can't safely modify it any more (for instance to make a second similar call), a situation analogous to something taking ownership of your data structure), so we're considering some sort of copy-on-write semantics.
Practically speaking, how important is it to you that lookups are non-linear? From a purely practical standpoint, I have difficulty imagining a situation where more than a dozen or so properties are sent as part of what you are intending to use this API for. It seems that the extra time wasted linearly searching a list of that size is much less than the cost of building a hashtable. I understand that we need to have the hash table for building, of course. Just asking for the read side.
(In reply to comment #6) > Practically speaking, how important is it to you that lookups are non-linear? It's not essential for our uses, but it seems a good idea to have the receiving and sending side work the same, rather than needing two code paths for "get". We could change it to be a linear search for "get" in the no-hash-table case without changing the API, if you feel particularly strongly about that. At the moment I'm more interested in the API than the implementation, though. > From a purely practical standpoint, I have difficulty imagining a situation > where more than a dozen or so properties are sent Not far off; a conference call in Telepathy currently has 15 immutable properties (things in an a{sv} which stay the same over its lifetime and can be used for dispatching).
g_variant_lookup() does most of what I requested here; it makes lookup in a{s*} and a{o*} relatively easy, but it doesn't have the tolerance for minor type differences (aka "be nice to dbus-python and dbus-glib users") that I mentioned in Comment #2. One way to do this tolerance would for application authors to open-code it whenever they parse an a{sv}, or have an asv_lookup_uint16 function (either in the app or GLib), similar to this logic: guint16 port; if (!g_variant_lookup (asv, "port", "q", &port)) { guint32 maybe_its_dbus_glib; gint32 maybe_its_dbus_python; if (g_variant_lookup (asv, "port", "u", &maybe_its_dbus_glib) { if (maybe_its_dbus_glib > 65535) { g_set_error (error, ...); return; } } else if (g_variant_lookup (asv, "port", "i", &maybe_its_dbus_python)) { if (maybe_its_dbus_python < 0 || maybe_its_dbus_python > 65535) { g_set_error (error, ...); return; } } else { port = PORT_DEFAULT; } }
Created attachment 267134 [details] [review] GVariant varargs docs: add ^aay example Add a trivial example of the use of some '^' conversions with g_variant_new().
Created attachment 267135 [details] [review] add GVariantDict ...the long-requested mutable dictionary helper for GVariant.
Created attachment 267136 [details] [review] GVariant: add new "^a{sv}" format string This allows conversions to and from GVariantDict.
Created attachment 267137 [details] [review] test GVariantDict Add some testcases for GVariantDict
(In reply to comment #8) > g_variant_lookup() does most of what I requested here [...] > but it doesn't have the tolerance for minor type > differences (aka "be nice to dbus-python and dbus-glib users") that I > mentioned in Comment #2. Still the case for GVariantDict, afaics... I assume this means you never want to have that feature?
(In reply to comment #13) > Still the case for GVariantDict, afaics... I assume this means you never want > to have that feature? Probably not, no. At least I don't consider it to be particular related to the subject at hand. If we ever did add it, I'd want it to be more generic -- perhaps at the level of g_variant_get() or event g_variant_get_uint32() being able to deal with 'i' as well... and even then I lean toward "probably not".
Created attachment 267395 [details] [review] gobject: box GVariantDict We will want to use this in GApplication for a signal and a property.
Created attachment 267397 [details] [review] add GVariantDict Minor changes plus the addition of a _contains() method.
I must admit that I don't see the value of duplicating half the GHashTable api here. Whats wrong with just GHashTable *g_variant_dict_to_hash_table (GVariant *variant); GVariant *g_hash_table_to_variant (GHashTable *ht); ?
You could make the same argument about GVariantBuilder: why not just construct it with a GPtrArray of GVariant instances and then use g_variant_new_array() or g_variant_new_tuple() to convert it? From a practical standpoint you get: - stack allocation - a nice ability to create an empty hash table with the correct hash and free funcs set up for you - proper memory management including: - proper strdup of the key - proper handling of floating refs on adding values - proper refcounting when getting values out - varargs convenience APIs similar to g_variant_builder_add() - better bindability -- having GHashTable on an API for this purpose is unimaginably gross -- and this API needs to appear on GApplication, including being passed an an argument of a signal. - a very nice way to wrap it all up at the end with automatic freeing of the working-space memory in a way that would be completely inappropriate for an API that took a GHashTable. I can't tell the number of times I've written a function that ends like: ... g_variant_builder_end (&builder); } and the same will be true of this API.
Review of attachment 267397 [details] [review]: Nice work, thanks! ::: glib/gvariant.c @@ +3710,3 @@ + * Consider the following two examples that do the same thing in each + * style: take an existing dictionary and look up the "count" uint32 + * key, prefixing 1 to it if it is found, or returning an error if the s/prefixing/adding @@ +3797,3 @@ +/** + * g_variant_dict_new: + * @from_asv: (allow none): the #GVariant with which to initialise the allow-none @@ +3820,3 @@ + GVariantDict *dict; + + dict = (GVariantDict *) g_slice_new (struct heap_dict); g_slice_alloc() is better here. g_slice_new() only adds type safety which the cast makes moot. @@ +3831,3 @@ + * g_variant_dict_init: (skip) + * @dict: a #GVariantDict + * @from_asv: (allow none): the initial value for @dict allow-none @@ +3905,3 @@ + g_return_val_if_fail (is_valid_dict (dict), NULL); + g_return_val_if_fail (key != NULL, NULL); + g_return_val_if_fail (format_string != NULL, NULL); This function returns a boolean @@ +4096,3 @@ + * @dict: a #GVariantDict + * + * Returns the current value of @dict, clearing it in the process. I think this could be a bit more explicit: "Returns the current value of @dict as a #GVariant..."
(In reply to comment #18) > You could make the same argument about GVariantBuilder: why not just construct > it with a GPtrArray of GVariant instances and then use g_variant_new_array() or > g_variant_new_tuple() to convert it? > > From a practical standpoint you get: > > - stack allocation > > - a nice ability to create an empty hash table with the correct hash and > free funcs set up for you > > - proper memory management including: > > - proper strdup of the key > > - proper handling of floating refs on adding values > > - proper refcounting when getting values out > > - varargs convenience APIs similar to g_variant_builder_add() > > - better bindability -- having GHashTable on an API for this purpose is > unimaginably gross -- and this API needs to appear on GApplication, > including being passed an an argument of a signal. > > - a very nice way to wrap it all up at the end with automatic freeing of the > working-space memory in a way that would be completely inappropriate for an > API that took a GHashTable. I can't tell the number of times I've written > a function that ends like: > > ... > > g_variant_builder_end (&builder); > } > > and the same will be true of this API. Thanks, there are some good reasons here. I just had to ask...
(In reply to comment #18) > - better bindability -- having GHashTable on an API for this purpose is > unimaginably gross -- and this API needs to appear on GApplication, > including being passed an an argument of a signal. the other arguments are convincing, but just point out: higher level languages using gobject-introspection (though it is also true for higher level languages using static bindings) routinely transform GHashTables to and from native associative arrays/objects transparently, so it would definitely not be "gross" in at least: Python, Perl, and JavaScript. Vala would probably be the only language that looked bad, but that's because Vala does not have native associative array types. I'd even expect some ad hoc code that handles GVariantDict as a native associative array in languages that have them, to reduce the API footprint.
Created attachment 268053 [details] [review] add GVariantDict ...the long-requested mutable dictionary helper for GVariant.
Review of attachment 268053 [details] [review]: Thanks. ::: glib/gvariant.c @@ +3780,3 @@ + GVariantDict *dict; + + dict = (GVariantDict *) g_slice_new (struct heap_dict); Please change this to g_slice_alloc()
Review of attachment 267395 [details] [review]: Thanks
Review of attachment 267137 [details] [review]: Thanks.
Review of attachment 267136 [details] [review]: I don't think I like this. Right now, '^' is only used for basic c types (gchar ** and gchar *). If ^a{sv} starts returning a GVariantDict, this rule would break and I'd wonder whether ^ay returns a GBytes or a gchar *.
Attachment 267395 [details] pushed as 78ec35f - gobject: box GVariantDict Attachment 268053 [details] pushed as 14e62d1 - add GVariantDict I didn't push the testcase because it tests the varargs API which is not yet pushed.
(In reply to comment #27) > I don't think I like this. Right now, '^' is only used for basic c types (gchar > ** and gchar *). If ^a{sv} starts returning a GVariantDict, this rule would > break and I'd wonder whether ^ay returns a GBytes or a gchar *. Although I agree with this statement (and the example given about GBytes is particularly poignant) I would prefer not to continue to consume more punctuation... Another possibility is to do some very evil (which would actually work very well). We could retcon that 'a' means GVariantDict/GVariantIter only for non-dictionary types (which are already treated in special ways in many other APIs both in GVariant and generally in D-Bus). a{sv} would accept and return GVariantDict. It would be relatively easy to maintain compatibility with existing uses. This is a varargs API, so pointer types are not an issue. The structs all contain magic numbers, so we could consume either a builder or dict depending on what was passed. The only tricky part is that we need to return something that can be used either as an iter or a dict and that's not really very tricky at all: the dict accessors could simply check if we have an iter and do a one-time one-way conversion on first access. This sounds awful but I think it would actually be extremely manageable. We could also take a new character... maybe $ or something? This would let us also support GBytes for 'ay'. The final possibility is that we do nothing -- if you want to use GVariantDict then you have to figure it out for yourself.
Since we're talking about evil things: let us propose that nobody has ever used a GVariantBuilder or GVariantIter on individual characters and do a (real hard) API break here and just do GBytes for that case... (ie: 'ay' in format string).
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/326.