After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 747279 - Parser fails on empty string keys
Parser fails on empty string keys
Status: RESOLVED FIXED
Product: json-glib
Classification: Core
Component: Parser
0.16.x
Other Linux
: Normal normal
: ---
Assigned To: json-glib-maint
json-glib-maint
Depends on:
Blocks:
 
 
Reported: 2015-04-03 02:23 UTC by will334
Modified: 2017-03-11 15:51 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Allow empty string as object member name (5.99 KB, patch)
2017-03-05 01:36 UTC, Dave Gilbert
committed Details | Review

Description will334 2015-04-03 02:23:49 UTC
The valid JSON construct

    {"": "This is a string."}

causes the parser to throw an error (JSON_PARSER_ERROR_EMPTY_MEMBER_NAME) and stop parsing. This behaviour has been added in the following commit: https://github.com/ebassi/json-glib/commit/028e540bc2b78f87e5a1d2f0c36126b7636f2809

This should actually parse succesfully.
Comment 1 Emmanuele Bassi (:ebassi) 2015-04-03 02:51:41 UTC
An empty string is arguably a valid subset of a string, and the various RFCs only state that the member name should be a string and unique. Ostensibly, this would imply that an empty string is also a valid object member. In practice, though, this would make it impossible to retrieve that object member from, say, any dictionary or associative array type, so I'm not incredibly motivated to consider an empty string as a valid member name.

In summary: yes, it may be correct, but it's also, quite frankly, pointless.

Is this actually posing an interoperability problem?
Comment 2 will334 2015-04-03 10:43:10 UTC
Quoting RFC4627:

   A string is a sequence of zero or more Unicode characters [UNICODE].
   (...)
   An object is an unordered collection of zero or more name/value
   pairs, where a name is a string and a value is a string, number,
   boolean, null, object, or array.

So this explicitly allows strings with zero characters. I just looked at a few implementation in various languages (Python, JavaScript, PHP, RapidJSON (C++), Newtonsoft Json.NET, ...) and they do support empty names.

AFAIK you are using GHashTable to store keys (correct me if I'm wrong, not too familiar with the code) so this shouldn't be a problem to retrieve, e.g.:

    GHashTable* hash = g_hash_table_new(g_str_hash, g_str_equal);
    g_hash_table_insert(hash, "", "This is a string.");
    printf("Empty key: %s\n", g_hash_table_lookup(hash, ""));

I might be overlooking something, I'm not an expert with both json-glib and glib itself.
Comment 3 Richard van der Hoff 2016-10-25 09:23:33 UTC
Yes, an empty string can be used just fine as a hash key.

I'm working with an API which sometimes uses empty strings as object member names, and this prevents me using json-glib to parse the response.
Comment 4 Dave Gilbert 2017-02-19 13:49:10 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #1)
> An empty string is arguably a valid subset of a string, and the various RFCs
> only state that the member name should be a string and unique. Ostensibly,
> this would imply that an empty string is also a valid object member. In
> practice, though, this would make it impossible to retrieve that object
> member from, say, any dictionary or associative array type, so I'm not
> incredibly motivated to consider an empty string as a valid member name.

It would be pointless if it wasn't for json_object_iter_* and friends,
that lets you iterate through an object without actually using the key names, so you can retrieve the content.

> In summary: yes, it may be correct, but it's also, quite frankly, pointless.

No, it's useful - and being hit on existing APIs.
I think the use case is that you have a key'd hash and they want to serialise/deserialise it.  They don't know what the keys are - and don't care - but they know they're unique; so the hash becomes an object with each entry in the object having the key whatever that might be; it's valid and looking at it from that point of view seems reasonable.  And this parser is breaking on it.

Please remove this check.
 
> Is this actually posing an interoperability problem?
Comment 5 Dave Gilbert 2017-02-19 15:26:12 UTC
For reference here is a pretty-printed fragment of the Matrix sync reply that is triggering it today; I agree it's odd - but apparently legal;

        "ephemeral": {
          "events": [
            {
              "content": {
                "user_ids": []
              },
              "type": "m.typing"
            },
            {
              "content": {
 >>>>>>         "": {
                  "m.read": {
                    "@maralorn:darmstadt.ccc.de": {
                      "ts": 1477142860750
                    }
                  }
                },
                "$14866445213246603dWZQP:matrix.org": {
                  "m.read": {
                    "@jcgruenhage:gruenhage.xyz": {
                      "ts": 1486644671120
                    }
                  }
                },
Comment 6 Emmanuele Bassi (:ebassi) 2017-02-27 10:57:23 UTC
> No, it's useful - and being hit on existing APIs.

By *one* existing API.

> I think the use case is that you have a key'd hash and they want to
> serialise/deserialise it.  They don't know what the keys are - and don't
> care - but they know they're unique.

If they "don't care" or "don't know" what the keys are, how can they *possibly* known the keys are unique? And if they know they are unique, why not just dump them in there?

> so the hash becomes an object with each entry in the object having the
> key whatever that might be; it's valid and looking at it from that point
> of view seems reasonable.

It's entirely *not* reasonable, especially in light of how JSON works. Just because it's syntactically valid it does not imply it's a reasonable thing to do. What prevents more than one entry to have the same empty string? There's no validation performed at any point in time.

Additionally, one could argue that the fact that a key may be an empty string is yet another demonstration of why the JSON specification is badly written.

> And this parser is breaking on it.

Of course it is, because it's an eminently *bad* use of JSON as a serialization format. If you want an iterable value, use an array. Objects are for direct access, like a JavaScript object.

Anyway, since it seems this idiocy needs to be interoperable with other languages, I'd gladly review a patch that relaxed the check to allow *empty* strings (i.e. non-NULL values) in the JsonObject API, and added test cases, so that we don't break it.
Comment 7 Dave Gilbert 2017-02-27 11:55:23 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #6)
> > No, it's useful - and being hit on existing APIs.
> 
> By *one* existing API.

Well, one I and Richard hit - who knows how many more.

> > I think the use case is that you have a key'd hash and they want to
> > serialise/deserialise it.  They don't know what the keys are - and don't
> > care - but they know they're unique.
> 
> If they "don't care" or "don't know" what the keys are, how can they
> *possibly* known the keys are unique? And if they know they are unique, why
> not just dump them in there?

They're just dumping a hash; so at most only one of the entries in the hash is "", so they *are* dumping the keys into the JSON - it's just that one of the keys is "".

> > so the hash becomes an object with each entry in the object having the
> > key whatever that might be; it's valid and looking at it from that point
> > of view seems reasonable.
> 
> It's entirely *not* reasonable, especially in light of how JSON works. Just
> because it's syntactically valid it does not imply it's a reasonable thing
> to do. What prevents more than one entry to have the same empty string?
> There's no validation performed at any point in time.

Because it's dumping it out from a hash and this value is the key of the hash; if they weren't unique they'd have landed in the same hash bucket.  As for validation, that's no different from making sure two entries in the object don't have the same string - that's a separate problem, but one where serialising a hash isn't a problem from the source end.
 
> Additionally, one could argue that the fact that a key may be an empty
> string is yet another demonstration of why the JSON specification is badly
> written.

Yep, not arguing there.
 
> > And this parser is breaking on it.
> 
> Of course it is, because it's an eminently *bad* use of JSON as a
> serialization format. If you want an iterable value, use an array. Objects
> are for direct access, like a JavaScript object.

Yes, agreed, although the JSON array is unfortunately a lot more verbose to do the same thing, because you either need an array of one entry objects, or a pair of matched arrays. So where this is:

  { "": "1", "red": "2", "green", "3" }

an array would have to be one of either:

  [ { "key": "", "value": "1" }, { "key": "red", "value": "2" }, { "key": "green", "value": "3" } ]

or

  { "keys": [ "", "red", "green" ], "values": ["1", "2", "3"] }

neither of them are pretty, the original is the least verbose and arguably the easiest to read.

> Anyway, since it seems this idiocy needs to be interoperable with other
> languages, I'd gladly review a patch that relaxed the check to allow *empty*
> strings (i.e. non-NULL values) in the JsonObject API, and added test cases,
> so that we don't break it.

OK, thanks, I'll look at that.

Dave
Comment 8 Richard van der Hoff 2017-02-27 13:39:13 UTC
> If you want an iterable value, use an array. Objects are for direct access, like a JavaScript object.

I might be missing something, but I don't think this is anything specific to iteration. As already noted in comment 2, I ought to be able to retrieve the value for an empty key with g_hash_table_lookup:

    GHashTable* hash = g_hash_table_new(g_str_hash, g_str_equal);
    g_hash_table_insert(hash, "", "This is a string.");
    printf("Empty key: %s\n", g_hash_table_lookup(hash, ""));
Comment 9 Emmanuele Bassi (:ebassi) 2017-02-27 14:09:03 UTC
(In reply to Richard van der Hoff from comment #8)
> > If you want an iterable value, use an array. Objects are for direct access, like a JavaScript object.
> 
> I might be missing something, but I don't think this is anything specific to
> iteration. As already noted in comment 2, I ought to be able to retrieve the
> value for an empty key with g_hash_table_lookup:
> 
>     GHashTable* hash = g_hash_table_new(g_str_hash, g_str_equal);
>     g_hash_table_insert(hash, "", "This is a string.");
>     printf("Empty key: %s\n", g_hash_table_lookup(hash, ""));

That's why I said:

'''
I'd gladly review a patch that relaxed the check to allow *empty* strings (i.e. non-NULL values) in the JsonObject API
'''

Right now, we don't distinguish between "NULL" keys and "empty string" keys. I still want to catch people sending NULL keys to JsonObject, because that's a good debugging tool; I can allow empty ('\0') keys for the sake of interoperability.
Comment 10 Dave Gilbert 2017-03-05 01:36:39 UTC
Created attachment 347249 [details] [review]
Allow empty string as object member name

Commit 028e540 disallowed empty member names in objects, however
they are unfortunately valid JSON.  This patch reenables an empty
string as a member name.

Tests are updated to allow the empty string case, and to test
the use of an empty string in generation, iteration etc.
Comment 11 Emmanuele Bassi (:ebassi) 2017-03-10 11:06:46 UTC
Review of attachment 347249 [details] [review]:

Okay, this looks good.

Thanks!
Comment 12 Emmanuele Bassi (:ebassi) 2017-03-11 15:51:21 UTC
Attachment 347249 [details] pushed as 799e165 - Allow empty string as object member name