Bug 696407 – add a function to convert arbitrary data to valid JSON strings


Summary:	add a function to convert arbitrary data to valid JSON strings


Status:	RESOLVED OBSOLETE

Product:	json-glib
Classification:	Core
Component:	Core
Version:	git master
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	json-glib-maint
QA Contact:	json-glib-maint

Reported:	2013-03-22 16:58 UTC by Emmanuele Bassi (:ebassi)
Modified:	2017-09-05 10:39 UTC


Description Emmanuele Bassi (:ebassi) 2013-03-22 16:58:18 UTC

JSON strings are defined in the RFC as:

The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A through
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".

Alternatively, there are two-character sequence escape
representations of some popular characters. So, for example, a
string containing only a single reverse solidus character may be
represented more compactly as "\\".

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".

see: http://www.ietf.org/rfc/rfc4627.txt?number=4627
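
As a worked example of the surrogate pair rule quoted above, here is a minimal sketch in plain C. The helper names `to_surrogate_pair` and `format_surrogate_escape` are hypothetical and are not part of JSON-GLib; they only show the arithmetic that turns U+1D11E into the two escaped UTF-16 code units.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of JSON-GLib: split a code point that
 * lies outside the Basic Multilingual Plane into a UTF-16 surrogate pair. */
static void
to_surrogate_pair (uint32_t cp, uint16_t *high, uint16_t *low)
{
  assert (cp >= 0x10000 && cp <= 0x10FFFF);

  cp -= 0x10000;                               /* 20 significant bits remain */
  *high = (uint16_t) (0xD800 + (cp >> 10));    /* upper 10 bits */
  *low  = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* lower 10 bits */
}

/* Hypothetical helper: write the twelve-character JSON escape for cp
 * into buf, which must have room for 13 bytes (12 characters plus NUL). */
static void
format_surrogate_escape (uint32_t cp, char buf[13])
{
  uint16_t high, low;

  to_surrogate_pair (cp, &high, &low);
  snprintf (buf, 13, "\\u%04X\\u%04X", (unsigned) high, (unsigned) low);
}
```

Calling `format_surrogate_escape (0x1D11E, buf)` leaves the G clef escape from the RFC text in `buf`: the high surrogate is 0xD834 and the low surrogate is 0xDD1E.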

the parsing code can deal with escaped Unicode code points, both as UTF-8 and as UTF-16 surrogate pairs, but we don't have anything that can generate escape sequences from arbitrary data.

all functions in JSON-GLib dealing with strings assume that the string is UTF-8 encoded and contains no control characters; we cannot change that without breaking compatibility: we'd have to either add a "length" argument to every function dealing with strings, or duplicate each entry point dealing with strings.
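
One plausible reading of why a "length" argument would be needed is that arbitrary data may contain U+0000, which a NUL-terminated C string cannot carry. The sketch below uses plain C only, and both the `payload` array and the helper name are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative data: five payload bytes with a NUL in the middle,
 * followed by a terminating NUL (six bytes in total). */
static const char payload[] = { 'a', 'b', '\0', 'c', 'd', '\0' };

/* Hypothetical helper: the number of bytes an API that takes only a
 * NUL-terminated "const char *" would be able to see. */
static size_t
length_seen_by_c_string_api (const char *data)
{
  return strlen (data);
}
```

Here `length_seen_by_c_string_api (payload)` is 2, while `sizeof payload` is 6, so everything after the embedded NUL is invisible unless the caller passes the length explicitly.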

instead, we could add a function like:

char *json_escape_string (const guint8 *data, gsize len);

that behaves like g_markup_escape_text(), and escapes all Unicode characters (including control characters) into \uXXXX and \uXXXX\uXXXX sequences.
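
A rough sketch of what such a function could look like, written in plain C (`uint8_t` and `size_t` instead of `guint8` and `gsize`) so that it compiles without GLib. This is not the JSON-GLib implementation, and it makes several assumptions: the function is named `sketch_escape_string` to avoid confusion with any real API, the input is treated as UTF-8, printable ASCII is passed through unchanged, the quotation mark and reverse solidus use the two-character escapes, and malformed bytes are replaced with U+FFFD.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Decode one UTF-8 sequence at s, with n bytes available. Stores the
 * code point in *cp and returns the number of bytes consumed. Malformed
 * input yields U+FFFD and consumes a single byte (an assumption). */
static size_t
decode_utf8 (const uint8_t *s, size_t n, uint32_t *cp)
{
  static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t need, i;
  uint32_t c;

  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { need = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; }
  else { *cp = 0xFFFD; return 1; }

  if (need > n) { *cp = 0xFFFD; return 1; }
  for (i = 1; i < need; i++)
    {
      if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates and out-of-range values. */
  if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    { *cp = 0xFFFD; return 1; }

  *cp = c;
  return need;
}

/* Hypothetical stand-in for the proposed json_escape_string().
 * Returns a newly allocated, NUL-terminated string without surrounding
 * quotation marks, or NULL on allocation failure; free it with free(). */
char *
sketch_escape_string (const uint8_t *data, size_t len)
{
  /* Worst case is six output characters per input byte. */
  char *out = malloc (len * 6 + 1);
  size_t i = 0, o = 0;

  if (out == NULL)
    return NULL;

  while (i < len)
    {
      uint32_t cp;

      i += decode_utf8 (data + i, len - i, &cp);

      if (cp == '"' || cp == '\\')
        {
          out[o++] = '\\';
          out[o++] = (char) cp;
        }
      else if (cp >= 0x20 && cp < 0x7F)
        out[o++] = (char) cp;
      else if (cp <= 0xFFFF)
        o += (size_t) sprintf (out + o, "\\u%04X", (unsigned) cp);
      else
        {
          /* Outside the BMP: emit a UTF-16 surrogate pair. */
          cp -= 0x10000;
          o += (size_t) sprintf (out + o, "\\u%04X\\u%04X",
                                 (unsigned) (0xD800 + (cp >> 10)),
                                 (unsigned) (0xDC00 + (cp & 0x3FF)));
        }
    }

  out[o] = '\0';
  return out;
}
```

Because the length is passed explicitly, embedded NUL bytes are escaped as \u0000 instead of truncating the input. A real implementation would also need to guard the `len * 6 + 1` computation against overflow.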

Comment 1 Emmanuele Bassi (:ebassi) 2017-09-05 10:39:23 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed to further activity.

You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/json-glib/issues/5.