GNOME Bugzilla – Bug 704709
Add support for the \uXXXX escape sequence
Last modified: 2018-05-22 14:53:52 UTC
Created attachment 249841 [details] [review] patch proposal for the \u escape character support Vala does not support \u escape sequences, so compilation logically ends with: error: invalid escape sequence. Moreover, there is no validation of the supported escape sequence \xYY, where Y represents a hex digit.
Review of attachment 249841 [details] [review]: Looks fine, thanks. Do you have commit access?
Created attachment 249872 [details] [review] patch proposal for the \u escape character support fixed indentation
Thank you. Committed.
Does this correctly handle surrogates?
(In reply to comment #4) > Does this correctly handle surrogates ? If you mean the \Uxxxxxxxx syntax, not yet. Do you mean anything else?
Created attachment 249885 [details] [review] Fix regression for the \x escape sequence
\U for directly referencing non-BMP characters would be nice too. But since \u is limited to 4 digits, I was wondering if it handles UTF-16 surrogate pairs, e.g. would this test pass? (the character is U+10000, but bugzilla can't handle non-BMP characters either) string s1 = "Non-BMP Test: \xF0\x90\x80\x80"; string s2 = "Non-BMP Test: \uD800\uDC00"; assert (s1 == s2);
(In reply to comment #7) > \U for directly referencing non-BMP characters would be nice too. > > But since \u is limited to 4 digits, I was wondering if it handles UTF-16 > surrogate pairs, e.g. would this test pass? (the character is U+10000, but > bugzilla can't handle non-BMP characters either) > > string s1 = "Non-BMP Test: \xF0\x90\x80\x80"; > string s2 = "Non-BMP Test: \uD800\uDC00"; > > assert (s1 == s2); It will not, due to the gcc errors: test-x.c:9:20: error: \uD800 is not a valid universal character test-x.c:9:20: error: \uDC00 is not a valid universal character
(In reply to comment #7) > But since \u is limited to 4 digits, I was wondering if it handles UTF-16 > surrogate pairs, e.g. would this test pass? Is there any reason why we should use UTF-16 at all? I'd expect that the Linux ecosystem has moved to UTF-8 anyway[1], and IIRC this is what the glib/gtk+ stack expects. To add to the problems, UTF-16 has 4 flavours (LE/BE and with/without BOM). [1] http://www.utf8everywhere.org/
I'd tend toward not supporting UTF-16 surrogate pairs; \U should be sufficient. Each escape sequence should denote a valid character, and \uD800 is not a valid character. If special handling of surrogate pairs is common in languages supporting \u, we should probably support it anyway for consistency. In a quick glance at the C11 spec, I haven't seen any mention of surrogate pairs, though, so I expect that it's not supported in C11.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vala/issues/397.