After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 761504 - W32 registry GSettings backend does not use Unicode
W32 registry GSettings backend does not use Unicode
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: win32
unspecified
Other Windows
: Normal normal
: ---
Assigned To: gtk-win32 maintainers
gtk-win32 maintainers
Depends on:
Blocks:
 
 
Reported: 2016-02-03 14:07 UTC by LRN
Modified: 2016-02-05 09:04 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
registrybackend: use unicode calls intead of the ansi ones (11.83 KB, patch)
2016-02-04 07:43 UTC, Ignacio Casal Quinteiro (nacho)
none Details | Review
registrybackend: use unicode calls intead of the ansi ones (11.50 KB, patch)
2016-02-04 08:12 UTC, Ignacio Casal Quinteiro (nacho)
committed Details | Review

Description LRN 2016-02-03 14:07:53 UTC
The source code has the following comment:
> - RegCreateKeyA is used - Windows can also handle UTF16LE strings.
>   GSettings doesn't pay any attention to encoding, so by using ANSI we
>   hopefully avoid passing any invalid Unicode.

Doing that creates two problems:

1) The way this works is the following:
* "string" in UTF-8 is passed to registry-writing function as if it was ANSI string, encoded in ACP (active code page).
* registry function converts it to UTF16 for internal storage
* registry stores UTF16 string

* registry-reading function reads UTF16 string from the registry
* registry-reading function converts UTF16 string to ANSI string, encoding it in ACP
* ANSI string is returned to gsettings backend, which re-interprets it as UTF-8-encoded string

The problem here is that ACP is not static. It can be changed in the OS settings (the "Language for non-Unicode programs" setting).
If ACP changes, the UTF16<->ACP encoding changes, so any previously-stored UTF16 string will come back looking differently (the exact difference depends on previous and current ACP values), and interpreting it as UTF-8 string will produce different result that could even be invalid, if we're unlucky.

2) Any other code in existence that interacts with Windows Registry has no idea that the string values stored by GSettings are, in fact, UTF8-encoded-and-then-pretended-to-be-ANSI. For such code (including the stock Registry Editor) such strings will look like garbage (though an experienced eye will recognize them as mis-represented UTF-8, there's no convenient way to access the actual string). Conversely, trying to manually set some GSettings value using Registry Editor or other non-GSettings tools will not produce desirable result when such values are later read by GSettings.
Comment 1 LRN 2016-02-03 14:13:36 UTC
I can't find any reference that indicates what the internal string encoding format for the registry is, but since UTF16 is used practically everywhere in Windows, i'm assuming that it's used for the registry as well.

Anyway, the fix for this bug is to switch from ANSI to Wide-character functions, doing UTF8<->UTF16 conversion ourselves (this is a bit of a hassle for us, but should not degrade the performance, as W32 API would have done a ANSI<->UTF16 conversion anyway; it might be even faster).

Note that this will change the way GSettings interprets registry values, thus anything non-ASCII (and maybe even ASCII, depending on ACP!) that was previously stored in GSettings will be read as garbage. Some kind of autocorrection code could be added (which queries ANSI values and checks whether they look as UTF-8) to mitigate this somewhat.
Comment 2 Paolo Borelli 2016-02-03 14:28:16 UTC
(In reply to LRN from comment #1)
> Note that this will change the way GSettings interprets registry values,
> thus anything non-ASCII (and maybe even ASCII, depending on ACP!) that was
> previously stored in GSettings will be read as garbage. Some kind of
> autocorrection code could be added (which queries ANSI values and checks
> whether they look as UTF-8) to mitigate this somewhat.


I would not worry with autocorrection or any extra magic... in the worse case if we fail to read as utf8 we fail the read the setting and we fall back to the default value of the schema
Comment 3 Ignacio Casal Quinteiro (nacho) 2016-02-04 07:43:09 UTC
Created attachment 320409 [details] [review]
registrybackend: use unicode calls intead of the ansi ones
Comment 4 LRN 2016-02-04 07:55:39 UTC
Review of attachment 320409 [details] [review]:

Looks OK, apart from a few nitpicks. Did you try it with non-ASCII values? Also, by the way, does GSettings support non-ASCII setting names?

::: gio/gregistrysettingsbackend.c
@@ +1258,3 @@
                        RegistryEvent    *event)
 {
+  gunichar2 bufferw[MAX_KEY_NAME_LENGTH + 2];

Probably no need for +2, since it's already gunichar2.

@@ +1287,3 @@
   while (1)
     {
+      DWORD bufferw_size = MAX_KEY_NAME_LENGTH;

But here it's not even +1. Why?
MSDN says that bufferw_size on input should include the terminating NUL, and that on output it does not include the terminating NUL. So it should be MAX_KEY_NAME_LENGTH+2, to match the size of bufferw array.

@@ +1333,3 @@
   while (1)
     {
+      DWORD bufferw_size = MAX_KEY_NAME_LENGTH;

Ditto.
Comment 5 Ignacio Casal Quinteiro (nacho) 2016-02-04 08:12:18 UTC
Created attachment 320411 [details] [review]
registrybackend: use unicode calls intead of the ansi ones
Comment 6 LRN 2016-02-04 08:15:15 UTC
Review of attachment 320411 [details] [review]:

Okay.
Comment 7 Ignacio Casal Quinteiro (nacho) 2016-02-04 08:16:13 UTC
Attachment 320411 [details] pushed as 05dd91a - registrybackend: use unicode calls intead of the ansi ones
Comment 8 LRN 2016-02-05 05:42:42 UTC
Not okay.

Turns out, we've missed this line:

result = RegSetValueExW (hpath, value_namew, 0, value.type, value_data, value_data_size);

and value_data is UTF8-encoded at this point.
Comment 9 Ignacio Casal Quinteiro (nacho) 2016-02-05 09:04:11 UTC
This problem has been fixed in the unstable development version. The fix will be available in the next major software release. You may need to upgrade your Linux distribution to obtain that newer version.