Bug 756121 – Add hash and equality functions for JsonNode

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 756121 - Add hash and equality functions for JsonNode


Summary:	Add hash and equality functions for JsonNode


Status:	RESOLVED FIXED

Product:	json-glib
Classification:	Core
Component:	Core
Version:	git master
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	json-glib-maint
QA Contact:	json-glib-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-10-06 13:51 UTC by Philip Withnall
Modified:	2016-03-01 16:09 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
core: Add immutability support to core objects (41.27 KB, patch) 2015-11-19 17:49 UTC, Philip Withnall	committed	Details \| Review
array: Ensure JsonArray is zero-initialised (971 bytes, patch) 2015-11-19 17:49 UTC, Philip Withnall	committed	Details \| Review
node: Add json_node_ref() and json_node_unref() (23.90 KB, patch) 2015-11-19 17:49 UTC, Philip Withnall	committed	Details \| Review
core: Remove atomic operations for reference counting (3.11 KB, patch) 2015-11-19 17:49 UTC, Philip Withnall	committed	Details \| Review
core: Add JSON node, object, array hashes (21.52 KB, patch) 2015-11-19 17:49 UTC, Philip Withnall	none	Details \| Review
core: Add JSON node, object, array hashes (22.76 KB, patch) 2016-01-17 22:06 UTC, Philip Withnall	committed	Details \| Review

Description Philip Withnall 2015-10-06 13:51:50 UTC

In order to efficiently insert nodes into a hash table, it would be useful to have a memoised hash function for JsonNode, which stores the calculated hash value in the node to avoid recomputing it for (potentially) a very large tree of children.

I suggest using the JSON Schema definition of node equality, and some complementary hash function:
http://json-schema.org/latest/json-schema-core.html#anchor9

The problem with this is that JsonNode is not immutable, so any pre-computed hash function could potentially be invalidated by changes to a descendent node further down the tree.

The hash function could only take local data into account, but that’s going to result in lots of collisions; for example, in the case of lots of JSON objects which have the same set of properties, but differing values.

Comment 1 Philip Withnall 2015-11-19 17:49:21 UTC

Created attachment 315907 [details] [review]
core: Add immutability support to core objects

Add an immutable mode to JsonNode, JsonObject, JsonArray and JsonValue.
This is an optional mode which objects enter by calling json_*_seal().
It is a one-way transition, which means that we can build and manipulate
objects as much as desired, before sealing them and enjoying the
benefits of immutable objects: no need to take copies when handling
them, persistent hash values (still to be implemented).

Comment 2 Philip Withnall 2015-11-19 17:49:25 UTC

Created attachment 315908 [details] [review]
array: Ensure JsonArray is zero-initialised

Comment 3 Philip Withnall 2015-11-19 17:49:29 UTC

Created attachment 315909 [details] [review]
node: Add json_node_ref() and json_node_unref()

Add reference counting semantics to JsonNode, in addition to the
existing init/unset and alloc/free semantics.

json_node_free() must only be used with nodes allocated using
json_node_alloc(). json_node_unref() may be used with all nodes (if
correctly paired; it may be paired with json_node_alloc()).

It is not valid to call json_node_free() on a node whose reference count
is not 1.

Comment 4 Philip Withnall 2015-11-19 17:49:33 UTC

Created attachment 315910 [details] [review]
core: Remove atomic operations for reference counting

They are not needed — json-glib is not at all thread safe.

Comment 5 Philip Withnall 2015-11-19 17:49:38 UTC

Created attachment 315911 [details] [review]
core: Add JSON node, object, array hashes

Now that these objects can be marked as immutable, it is possible to
calculate and cache hash values for each of them. This allows efficient
hash-based deduplication of large numbers of JSON nodes, as needed by
Walbottle for JSON test vector generation.

To complement the new hash functions, each of JsonNode, JsonValue,
JsonObject and JsonArray also now have an equal() comparison method.
This compares them structurally and recursively, using the definition of
equality from the JSON Schema specification, which seems as good as any
other.

http://json-schema.org/latest/json-schema-core.html#anchor9

Comment 6 Philip Withnall 2016-01-16 15:07:26 UTC

Emmanuele, ping?

Comment 7 Philip Withnall 2016-01-17 22:06:46 UTC

Created attachment 319227 [details] [review]
core: Add JSON node, object, array hashes

Now that these objects can be marked as immutable, it is possible to
calculate and cache hash values for each of them. This allows efficient
hash-based deduplication of large numbers of JSON nodes, as needed by
Walbottle for JSON test vector generation.

To complement the new hash functions, each of JsonNode, JsonValue,
JsonObject and JsonArray also now have an equal() comparison method.
This compares them structurally and recursively, using the definition of
equality from the JSON Schema specification, which seems as good as any
other.

http://json-schema.org/latest/json-schema-core.html#anchor9

Comment 8 Philip Withnall 2016-01-17 22:07:32 UTC

(In reply to Philip Withnall from comment #7)
> Created attachment 319227 [details] [review] [review]
> core: Add JSON node, object, array hashes
> 
> Now that these objects can be marked as immutable, it is possible to
> calculate and cache hash values for each of them. This allows efficient
> hash-based deduplication of large numbers of JSON nodes, as needed by
> Walbottle for JSON test vector generation.
> 
> To complement the new hash functions, each of JsonNode, JsonValue,
> JsonObject and JsonArray also now have an equal() comparison method.
> This compares them structurally and recursively, using the definition of
> equality from the JSON Schema specification, which seems as good as any
> other.
> 
> http://json-schema.org/latest/json-schema-core.html#anchor9

Updated to fix some problems with memoising the immutable hashes, especially since JsonObject was being created with g_slice_new() instead of g_slice_new0().

Comment 9 Emmanuele Bassi (:ebassi) 2016-01-21 12:55:27 UTC

Review of attachment 315907 [details] [review]:

::: json-glib/json-parser.c
@@ +216,3 @@
+   *
+   * Since: UNRELEASED
+   */

I'm not a fan of this being a construct-only property.

Parsing is an "atomic" process, and calling load_from_* will generate a new tree.

This means that it's entirely feasible to create a JsonParser and reuse it multiple times. On the other hand, immutability is only something applied to the currently parsed tree. This means that it should be possible to set and unset the immutable bit depending on the current parsed file.

I'd even opt for making all trees generated by JsonParser immutable by default, but libraries like Clutter do some ad hoc injection of JsonNodes, which means that the tree would have to be set as immutable only at the end of the parsing process, thus not buying us much.

Comment 10 Emmanuele Bassi (:ebassi) 2016-01-21 12:56:28 UTC

Review of attachment 315908 [details] [review]:

Sure, even though we're overriding all the fields ourselves in the same function. Doesn't cost us much anyway.

Comment 11 Emmanuele Bassi (:ebassi) 2016-01-21 12:59:21 UTC

Review of attachment 315909 [details] [review]:

So, what happens if I put a node inside a JsonArray and then mark the array immutable; and then put the JsonNode into a JsonObject without marking it immutable?

This is where the whole memory management semantics go belly up, I'm afraid.

Comment 12 Emmanuele Bassi (:ebassi) 2016-01-21 13:00:07 UTC

Review of attachment 315910 [details] [review]:

I'd rather like to make json-glib thread safe, but that's a more comprehensive work, so fine by me.

Comment 13 Emmanuele Bassi (:ebassi) 2016-01-21 13:02:32 UTC

Review of attachment 319227 [details] [review]:

Looks generally good, but I have the nagging feeling that marking everything immutable by default will end up breaking Clutter. I'll have to test it.

Comment 14 Philip Withnall 2016-01-21 13:11:31 UTC

(In reply to Emmanuele Bassi (:ebassi) from comment #11)
> Review of attachment 315909 [details] [review] [review]:
> 
> So, what happens if I put a node inside a JsonArray and then mark the array
> immutable; and then put the JsonNode into a JsonObject without marking it
> immutable?

I guess you end up with a mutable JsonObject containing an immutable JsonNode.

> This is where the whole memory management semantics go belly up, I'm afraid.

Is this a use case which was supported before? It's not something I remember thinking of, but if it needs to be supported I can see what I can do.

I think the approach has to be for all of this to be optional: users like Clutter get the old behaviour (new/free-based memory management, mutable semantics) by default, unless they explicitly use the new functions, in which case magic happens.

(In reply to Emmanuele Bassi (:ebassi) from comment #13)
> Review of attachment 319227 [details] [review] [review]:
> 
> Looks generally good, but I have the nagging feeling that marking everything
> immutable by default will end up breaking Clutter. I'll have to test it.

I think (pending your testing of Clutter) that making everything immutable by default would effectively be an API break. I suspect we will have to go with mutable-by-default and allow users to opt-in to immutability.

Comment 15 Philip Withnall 2016-01-28 09:25:53 UTC

Committed the two smaller patches:

28c7347 core: Remove atomic operations for reference counting
d8720da array: Ensure JsonArray is zero-initialised

What’s the next step for this? Are you testing Clutter, or are you waiting for something from me?

Comment 16 Emmanuele Bassi (:ebassi) 2016-03-01 16:09:25 UTC

In the end, I decided to JFCI, and pushed the patches. We can figure out small bits of API later on, as well as the semantics
of JsonParser.

Attachment 315907 [details] pushed as 58f479b - core: Add immutability support to core objects
Attachment 315909 [details] pushed as 1de237a - node: Add json_node_ref() and json_node_unref()
Attachment 319227 [details] pushed as 6ddbc94 - core: Add JSON node, object, array hashes