GNOME Bugzilla – Bug 704709
Add support for the \uXXXX escape sequence
Last modified: 2018-05-22 14:53:52 UTC
Created attachment 249841 [details] [review] patch proposal for the \u escape character support Vala does not support \u escape sequences, so compilation logically ends with: error: invalid escape sequence. Moreover, there is no validation of the supported escape sequence \xYY, where Y represents a hex digit.
Review of attachment 249841 [details] [review]: Looks fine, thanks. Do you have commit access?
Created attachment 249872 [details] [review] patch proposal for the \u escape character support fixed indentation
Thank you. Committed.
Does this correctly handle surrogates?
(In reply to comment #4) > Does this correctly handle surrogates ? If you mean the \Uxxxxxxxx syntax, not yet. Do you mean anything else?
Created attachment 249885 [details] [review] Fix regression for the \x escape sequence
\U for directly referencing non-BMP characters would be nice too. But since \u is limited to 4 digits, I was wondering if it handles UTF-16 surrogate pairs, e.g. would this test pass? (the character is U+10000, but bugzilla can't handle non-BMP characters either) string s1 = "Non-BMP Test: \xF0\x90\x80\x80"; string s2 = "Non-BMP Test: \uD800\uDC00"; assert (s1 == s2);
(In reply to comment #7) > \U for directly referencing non-BMP characters would be nice too. > > But since \u is limited to 4 digits, I was wondering if it handles UTF-16 > surrogate pairs, e.g. would this test pass? (the character is U+10000, but > bugzilla can't handle non-BMP characters either) > > string s1 = "Non-BMP Test: \xF0\x90\x80\x80"; > string s2 = "Non-BMP Test: \uD800\uDC00"; > > assert (s1 == s2); It will not, due to the gcc errors: test-x.c:9:20: error: \uD800 is not a valid universal character test-x.c:9:20: error: \uDC00 is not a valid universal character
(In reply to comment #7) > But since \u is limited to 4 digits, I was wondering if it handles UTF-16 > surrogate pairs, e.g. would this test pass? Is there any reason why we should use UTF-16 at all? I'd expect that the Linux ecosystem has moved to UTF-8 anyway[1], and IIRC this is what the glib/gtk+ stack expects. To add to the problems, UTF-16 has 4 flavours (LE/BE and with/without BOM). [1] http://www.utf8everywhere.org/
I'd tend toward not supporting UTF-16 surrogate pairs; \U should be sufficient. Each escape sequence should denote a valid character, and \uD800 is not a valid character. If special handling of surrogate pairs is common in languages supporting \u, we should probably support it anyway for consistency. In a quick glance at the C11 spec, I haven't seen any mention of surrogate pairs, though, so I expect that it's not supported in C11.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vala/issues/397.