Bug 704709 - Add support for the \uXXXX escape sequence
Status: RESOLVED OBSOLETE
Product: vala
Classification: Core
Component: Basic Types
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 1.0
Assigned To: Vala maintainers
QA Contact: Vala maintainers
Reported: 2013-07-22 20:36 UTC by Evgeny Bobkin
Modified: 2018-05-22 14:53 UTC

Attachments
patch proposal for the \u escape character support (5.47 KB, patch)
2013-07-22 20:36 UTC, Evgeny Bobkin
accepted-commit_now
patch proposal for the \u escape character support (5.50 KB, patch)
2013-07-23 09:17 UTC, Evgeny Bobkin
committed
Fix regression for the \x escape sequence (3.40 KB, patch)
2013-07-23 11:38 UTC, Evgeny Bobkin
committed

Description Evgeny Bobkin 2013-07-22 20:36:39 UTC
Created attachment 249841
patch proposal for the \u escape character support

Vala does not support \uXXXX escape sequences.

Compiling a string literal that contains one therefore fails with:
error: invalid escape sequence

Moreover, the already supported escape sequence \xYY, where Y represents a hex digit, is not validated.
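
The kind of validation the report asks for can be sketched as follows. This is a minimal Python illustration, not the attached patch and not the Vala scanner's actual code. The function name `find_invalid_escapes`, the rule table, and the fixed digit counts (two for \x, four for \u, as written in the report) are assumptions made for the example.

```python
# Hypothetical rule table: escape letter -> number of hex digits required.
# The counts follow the report's notation (\xYY, \uXXXX); the real scanner may differ.
HEX_ESCAPES = {"x": 2, "u": 4}
# Assumed set of single-character escapes, for illustration only.
SIMPLE_ESCAPES = set("'\"\\0bfnrtv")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def find_invalid_escapes(literal_body):
    """Return (index, text) for every malformed escape in a string literal body."""
    errors = []
    i = 0
    while i < len(literal_body):
        if literal_body[i] != "\\":
            i += 1
            continue
        kind = literal_body[i + 1 : i + 2]  # empty if the backslash is the last char
        if kind in HEX_ESCAPES:
            width = HEX_ESCAPES[kind]
            digits = literal_body[i + 2 : i + 2 + width]
            if len(digits) == width and set(digits) <= HEX_DIGITS:
                i += 2 + width  # well-formed: skip the whole escape
                continue
            errors.append((i, literal_body[i : i + 2 + len(digits)]))
        elif kind not in SIMPLE_ESCAPES:
            errors.append((i, literal_body[i : i + 2]))
        i += 2  # skip the backslash and the escape letter
    return errors


print(find_invalid_escapes(r"caf\u00e9 \x41"))
print(find_invalid_escapes(r"bad: \xZZ and \u12"))
```

The first call reports no errors. The second reports both the non-hex \x escape and the too-short \u escape.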
Comment 1 Luca Bruno 2013-07-23 07:35:21 UTC
Review of attachment 249841:

Looks fine, thanks. Do you have commit access?
Comment 2 Evgeny Bobkin 2013-07-23 09:17:40 UTC
Created attachment 249872
patch proposal for the \u escape character support

Fixed indentation.
Comment 3 Evgeny Bobkin 2013-07-23 09:30:41 UTC
Thank you. Committed.
Comment 4 Christian Persch 2013-07-23 09:57:18 UTC
Does this correctly handle surrogates ?
Comment 5 Luca Bruno 2013-07-23 10:12:03 UTC
(In reply to comment #4)
> Does this correctly handle surrogates ?

If you mean the \Uxxxxxxxx syntax, not yet. Do you mean anything else?
Comment 6 Evgeny Bobkin 2013-07-23 11:38:15 UTC
Created attachment 249885
Fix regression for the \x escape sequence
Comment 7 Christian Persch 2013-07-23 11:44:53 UTC
\U for directly referencing non-BMP characters would be nice too.

But since \u is limited to 4 digits, I was wondering if it handles UTF-16 surrogate pairs, e.g. would this test pass? (The character is U+10000, but Bugzilla can't handle non-BMP characters either.)

	string s1 = "Non-BMP Test: \xF0\x90\x80\x80";
	string s2 = "Non-BMP Test: \uD800\uDC00";

	assert (s1 == s2);
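
For reference, the arithmetic behind this test can be sketched in Python (used here only so the snippet runs standalone; it says nothing about how Vala or the patch behaves). The helper name `combine_surrogates` is hypothetical. It applies the standard UTF-16 rule for turning a high/low surrogate pair into one code point and compares the result with the UTF-8 bytes from s1.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The four bytes from s1, decoded as UTF-8.
s1 = b"\xF0\x90\x80\x80".decode("utf-8")

# The two \u values from s2, interpreted as one surrogate pair.
code_point = combine_surrogates(0xD800, 0xDC00)
s2 = chr(code_point)

print(hex(code_point), s1 == s2)  # prints: 0x10000 True
```

So the two literals name the same character only if the \uD800\uDC00 pair is combined before being encoded as UTF-8. Whether a compiler should do that is the question raised here.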
Comment 8 Evgeny Bobkin 2013-07-23 11:58:59 UTC
(In reply to comment #7)
> \U for directly referencing non-BMP characters would be nice too.
> 
> But since \u is limited to 4 digits, I was wondering if it handles UTF-16
> surrogate pairs, e.g. would this test pass? (the character is U+10000, but
> bugzilla can't handle non-BMP characters either)
> 
>     string s1 = "Non-BMP Test: \xF0\x90\x80\x80";
>     string s2 = "Non-BMP Test: \uD800\uDC00";
> 
>     assert (s1 == s2);

It will not, because gcc rejects the generated C code:

test-x.c:9:20: error: \uD800 is not a valid universal character
test-x.c:9:20: error: \uDC00 is not a valid universal character
Comment 9 Maciej (Matthew) Piechotka 2013-07-23 12:00:12 UTC
(In reply to comment #7)
> But since \u is limited to 4 digits, I was wondering if it handles UTF-16
> surrogate pairs, e.g. would this test pass?

Is there any reason why we should use UTF-16 at all? I'd expect that the Linux ecosystem has moved to UTF-8 anyway[1], and IIRC this is what the glib/gtk+ stack expects. To add to the problems, UTF-16 has 4 flavours (LE/BE and with/without BOM).

[1] http://www.utf8everywhere.org/
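
To make the four flavours concrete, here is a small Python sketch (an illustration only, unrelated to the Vala code base) that serialises U+10000 in each of them and, for comparison, in UTF-8. The dictionary keys are labels invented for this example.

```python
import codecs

text = "\U00010000"  # the non-BMP character from comment 7

# The same character in the four UTF-16 byte serialisations.
variants = {
    "UTF-16LE, no BOM": text.encode("utf-16-le"),
    "UTF-16BE, no BOM": text.encode("utf-16-be"),
    "UTF-16LE with BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-16BE with BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
}
utf8 = text.encode("utf-8")

for label, data in variants.items():
    print(label, data.hex())
print("UTF-8", utf8.hex())
```

All four UTF-16 byte strings differ from one another, while UTF-8 has a single byte order and yields the f0 90 80 80 sequence used in s1.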
Comment 10 Jürg Billeter 2013-07-23 12:01:34 UTC
I'd tend towards not supporting UTF-16 surrogate pairs; \U should be sufficient. Each escape sequence should denote a valid character, and \uD800 is not a valid character. If special handling of surrogate pairs is common in languages supporting \u, we should probably support it anyway for consistency. At a quick glance at the C11 spec, I haven't seen any mention of surrogate pairs, though, so I expect that they are not supported in C11.
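
The rule "each escape sequence should denote a valid character" can be sketched as a check that the escaped value is a Unicode scalar value, i.e. in the range 0 to 0x10FFFF and outside the surrogate range D800 to DFFF. The Python below is a minimal illustration under that assumption. The names `is_unicode_scalar_value` and `decode_hex_escape` are hypothetical and do not come from Vala or the patch.

```python
def is_unicode_scalar_value(cp):
    """True if cp is a code point that can stand for a character on its own."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def decode_hex_escape(digits):
    """Decode the hex digits of a \\uXXXX or \\UXXXXXXXX escape to a character."""
    cp = int(digits, 16)
    if not is_unicode_scalar_value(cp):
        raise ValueError("U+%04X is not a valid character" % cp)
    return chr(cp)


print(decode_hex_escape("00E9") == "\u00e9")             # BMP character via \u
print(decode_hex_escape("00010000") == "\U00010000")     # non-BMP character via \U
for bad in ("D800", "DC00", "110000"):
    try:
        decode_hex_escape(bad)
    except ValueError as err:
        print(err)
```

Under this rule, \U00010000 covers the non-BMP case directly, and \uD800 or \uDC00 on its own is rejected, which matches the gcc errors quoted in comment 8.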
Comment 11 GNOME Infrastructure Team 2018-05-22 14:53:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vala/issues/397.