GNOME Bugzilla – Bug 756121
Add hash and equality functions for JsonNode
Last modified: 2016-03-01 16:09:42 UTC
In order to efficiently insert nodes into a hash table, it would be useful to have a memoised hash function for JsonNode, which stores the calculated hash value in the node to avoid recomputing it for (potentially) a very large tree of children. I suggest using the JSON Schema definition of node equality, and some complementary hash function: http://json-schema.org/latest/json-schema-core.html#anchor9 The problem with this is that JsonNode is not immutable, so any pre-computed hash function could potentially be invalidated by changes to a descendent node further down the tree. The hash function could only take local data into account, but that’s going to result in lots of collisions; for example, in the case of lots of JSON objects which have the same set of properties, but differing values.
Created attachment 315907 [details] [review] core: Add immutability support to core objects Add an immutable mode to JsonNode, JsonObject, JsonArray and JsonValue. This is an optional mode which objects enter by calling json_*_seal(). It is a one-way transition, which means that we can build and manipulate objects as much as desired, before sealing them and enjoying the benefits of immutable objects: no need to take copies when handling them, persistent hash values (still to be implemented).
Created attachment 315908 [details] [review] array: Ensure JsonArray is zero-initialised
Created attachment 315909 [details] [review] node: Add json_node_ref() and json_node_unref() Add reference counting semantics to JsonNode, in addition to the existing init/unset and alloc/free semantics. json_node_free() must only be used with nodes allocated using json_node_alloc(). json_node_unref() may be used with all nodes (if correctly paired; it may be paired with json_node_alloc()). It is not valid to call json_node_free() on a node whose reference count is not 1.
Created attachment 315910 [details] [review] core: Remove atomic operations for reference counting They are not needed — json-glib is not at all thread safe.
Created attachment 315911 [details] [review] core: Add JSON node, object, array hashes Now that these objects can be marked as immutable, it is possible to calculate and cache hash values for each of them. This allows efficient hash-based deduplication of large numbers of JSON nodes, as needed by Walbottle for JSON test vector generation. To complement the new hash functions, each of JsonNode, JsonValue, JsonObject and JsonArray also now have an equal() comparison method. This compares them structurally and recursively, using the definition of equality from the JSON Schema specification, which seems as good as any other. http://json-schema.org/latest/json-schema-core.html#anchor9
Emmanuele, ping?
Created attachment 319227 [details] [review] core: Add JSON node, object, array hashes Now that these objects can be marked as immutable, it is possible to calculate and cache hash values for each of them. This allows efficient hash-based deduplication of large numbers of JSON nodes, as needed by Walbottle for JSON test vector generation. To complement the new hash functions, each of JsonNode, JsonValue, JsonObject and JsonArray also now have an equal() comparison method. This compares them structurally and recursively, using the definition of equality from the JSON Schema specification, which seems as good as any other. http://json-schema.org/latest/json-schema-core.html#anchor9
(In reply to Philip Withnall from comment #7) > Created attachment 319227 [details] [review] [review] > core: Add JSON node, object, array hashes > > Now that these objects can be marked as immutable, it is possible to > calculate and cache hash values for each of them. This allows efficient > hash-based deduplication of large numbers of JSON nodes, as needed by > Walbottle for JSON test vector generation. > > To complement the new hash functions, each of JsonNode, JsonValue, > JsonObject and JsonArray also now have an equal() comparison method. > This compares them structurally and recursively, using the definition of > equality from the JSON Schema specification, which seems as good as any > other. > > http://json-schema.org/latest/json-schema-core.html#anchor9 Updated to fix some problems with memoising the immutable hashes, especially since JsonObject was being created with g_slice_new() instead of g_slice_new0().
Review of attachment 315907 [details] [review]: ::: json-glib/json-parser.c @@ +216,3 @@ + * + * Since: UNRELEASED + */ I'm not a fan of this being a construct-only property. Parsing is an "atomic" process, and calling load_from_* will generate a new tree. This means that it's entirely feasible to create a JsonParser and reuse it multiple times. On the other hand, immutability is only something applied to the currently parsed tree. This means that it should be possible to set and unset the immutable bit depending on the current parsed file. I'd even opt for making all trees generated by JsonParser immutable by default, but libraries like Clutter do some ad hoc injection of JsonNodes, which means that the tree would have to be set as immutable only at the end of the parsing process, thus not buying us much.
Review of attachment 315908 [details] [review]: Sure, even though we're overriding all the fields ourselves in the same function. Doesn't cost us much anyway.
Review of attachment 315909 [details] [review]: So, what happens if I put a node inside a JsonArray and then mark the array immutable; and then put the JsonNode into a JsonObject without marking it immutable? This is where the whole memory management semantics go belly up, I'm afraid.
Review of attachment 315910 [details] [review]: I'd rather like to make json-glib thread safe, but that's a more comprehensive work, so fine by me.
Review of attachment 319227 [details] [review]: Looks generally good, but I have the nagging feeling that marking everything immutable by default will end up breaking Clutter. I'll have to test it.
(In reply to Emmanuele Bassi (:ebassi) from comment #11) > Review of attachment 315909 [details] [review] [review]: > > So, what happens if I put a node inside a JsonArray and then mark the array > immutable; and then put the JsonNode into a JsonObject without marking it > immutable? I guess you end up with a mutable JsonObject containing an immutable JsonNode. > This is where the whole memory management semantics go belly up, I'm afraid. Is this a use case which was supported before? It's not something I remember thinking of, but if it needs to be supported I can see what I can do. I think the approach has to be for all of this to be optional: users like Clutter get the old behaviour (new/free-based memory management, mutable semantics) by default, unless they explicitly use the new functions, in which case magic happens. (In reply to Emmanuele Bassi (:ebassi) from comment #13) > Review of attachment 319227 [details] [review] [review]: > > Looks generally good, but I have the nagging feeling that marking everything > immutable by default will end up breaking Clutter. I'll have to test it. I think (pending your testing of Clutter) that making everything immutable by default would effectively be an API break. I suspect we will have to go with mutable-by-default and allow users to opt-in to immutability.
Committed the two smaller patches: 28c7347 core: Remove atomic operations for reference counting d8720da array: Ensure JsonArray is zero-initialised What’s the next step for this? Are you testing Clutter, or are you waiting for something from me?
In the end, I decided to JFCI, and pushed the patches. We can figure out small bits of API later on, as well as the semantics of JsonParser. Attachment 315907 [details] pushed as 58f479b - core: Add immutability support to core objects Attachment 315909 [details] pushed as 1de237a - node: Add json_node_ref() and json_node_unref() Attachment 319227 [details] pushed as 6ddbc94 - core: Add JSON node, object, array hashes