Bug 696407 - add a function to convert arbitrary data to valid JSON strings
Status: RESOLVED OBSOLETE
Product: json-glib
Classification: Core
Component: Core
Version: git master
Hardware: Other, OS: All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: json-glib-maint
QA Contact: json-glib-maint
Depends on:
Blocks:
Reported: 2013-03-22 16:58 UTC by Emmanuele Bassi (:ebassi)
Modified: 2017-09-05 10:39 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Emmanuele Bassi (:ebassi) 2013-03-22 16:58:18 UTC
JSON strings are defined in the RFC as:

   The representation of strings is similar to conventions used in the C
   family of programming languages.  A string begins and ends with
   quotation marks.  All Unicode characters may be placed within the
   quotation marks except for the characters that must be escaped:
   quotation mark, reverse solidus, and the control characters (U+0000
   through U+001F).

   Any character may be escaped.  If the character is in the Basic
   Multilingual Plane (U+0000 through U+FFFF), then it may be
   represented as a six-character sequence: a reverse solidus, followed
   by the lowercase letter u, followed by four hexadecimal digits that
   encode the character's code point.  The hexadecimal letters A though
   F can be upper or lowercase.  So, for example, a string containing
   only a single reverse solidus character may be represented as
   "\u005C".

   Alternatively, there are two-character sequence escape
   representations of some popular characters.  So, for example, a
   string containing only a single reverse solidus character may be
   represented more compactly as "\\".

   To escape an extended character that is not in the Basic Multilingual
   Plane, the character is represented as a twelve-character sequence,
   encoding the UTF-16 surrogate pair.  So, for example, a string
   containing only the G clef character (U+1D11E) may be represented as
   "\uD834\uDD1E".

see: http://www.ietf.org/rfc/rfc4627.txt?number=4627

the parsing code can already deal with Unicode code points written either as raw UTF-8 or as \uXXXX escapes, including UTF-16 surrogate pairs, but we don't have anything that can generate escape sequences from arbitrary data.

all functions in JSON-GLib dealing with strings assume that the string is UTF-8 encoded and contains no control characters (in particular no embedded NUL bytes, since the strings are NUL-terminated); we cannot change that in a compatible way: we'd have to add a "length" argument to all functions dealing with strings, or duplicate each entry point dealing with strings.

instead, we could add a function like:

  char *json_escape_string (const guint8 *data, gsize len);

that behaves like g_markup_escape_text() and escapes all Unicode characters (including control characters) into \uXXXX sequences, using \uXXXX\uXXXX surrogate pairs for characters outside the Basic Multilingual Plane.
Comment 1 Emmanuele Bassi (:ebassi) 2017-09-05 10:39:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/json-glib/issues/5.