GNOME Bugzilla – Bug 761504
W32 registry GSettings backend does not use Unicode
Last modified: 2016-02-05 09:04:11 UTC
The source code has the following comment: > - RegCreateKeyA is used - Windows can also handle UTF16LE strings. > GSettings doesn't pay any attention to encoding, so by using ANSI we > hopefully avoid passing any invalid Unicode. Doing that creates two problems: 1) The way this works is the following: * "string" in UTF-8 is passed to registry-writing function as if it was ANSI string, encoded in ACP (active code page). * registry function converts it to UTF16 for internal storage * registry stores UTF16 string * registry-reading function reads UTF16 string from the registry * registry-reading function converts UTF16 string to ANSI string, encoding it in ACP * ANSI string is returned to gsettings backend, which re-interprets it as UTF-8-encoded string The problem here is that ACP is not static. It can be changed in the OS settings (the "Language for non-Unicode programs" setting). If ACP changes, the UTF16<->ACP encoding changes, so any previously-stored UTF16 string will come back looking differently (the exact difference depends on previous and current ACP values), and interpreting it as UTF-8 string will produce different result that could even be invalid, if we're unlucky. 2) Any other code in existence that interacts with Windows Registry has no idea that the string values stored by GSettings are, in fact, UTF8-encoded-and-then-pretended-to-be-ANSI. For such code (including the stock Registry Editor) such strings will look like garbage (though an experienced eye will recognize them as mis-represented UTF-8, there's no convenient way to access the actual string). Conversely, trying to manually set some GSettings value using Registry Editor or other non-GSettings tools will not produce desirable result when such values are later read by GSettings.
I can't find any reference that indicates what the internal string encoding format for the registry is, but since UTF16 is used practically everywhere in Windows, i'm assuming that it's used for the registry as well. Anyway, the fix for this bug is to switch from ANSI to Wide-character functions, doing UTF8<->UTF16 conversion ourselves (this is a bit of a hassle for us, but should not degrade the performance, as W32 API would have done a ANSI<->UTF16 conversion anyway; it might be even faster). Note that this will change the way GSettings interprets registry values, thus anything non-ASCII (and maybe even ASCII, depending on ACP!) that was previously stored in GSettings will be read as garbage. Some kind of autocorrection code could be added (which queries ANSI values and checks whether they look as UTF-8) to mitigate this somewhat.
(In reply to LRN from comment #1) > Note that this will change the way GSettings interprets registry values, > thus anything non-ASCII (and maybe even ASCII, depending on ACP!) that was > previously stored in GSettings will be read as garbage. Some kind of > autocorrection code could be added (which queries ANSI values and checks > whether they look as UTF-8) to mitigate this somewhat. I would not worry with autocorrection or any extra magic... in the worse case if we fail to read as utf8 we fail the read the setting and we fall back to the default value of the schema
Created attachment 320409 [details] [review] registrybackend: use unicode calls intead of the ansi ones
Review of attachment 320409 [details] [review]: Looks OK, apart from a few nitpicks. Did you try it with non-ASCII values? Also, by the way, does GSettings support non-ASCII setting names? ::: gio/gregistrysettingsbackend.c @@ +1258,3 @@ RegistryEvent *event) { + gunichar2 bufferw[MAX_KEY_NAME_LENGTH + 2]; Probably no need for +2, since it's already gunichar2. @@ +1287,3 @@ while (1) { + DWORD bufferw_size = MAX_KEY_NAME_LENGTH; But here it's not even +1. Why? MSDN says that bufferw_size on input should include the terminating NUL, and that on output it does not include the terminating NUL. So it should be MAX_KEY_NAME_LENGTH+2, to match the size of bufferw array. @@ +1333,3 @@ while (1) { + DWORD bufferw_size = MAX_KEY_NAME_LENGTH; Ditto.
Created attachment 320411 [details] [review] registrybackend: use unicode calls intead of the ansi ones
Review of attachment 320411 [details] [review]: Okay.
Attachment 320411 [details] pushed as 05dd91a - registrybackend: use unicode calls intead of the ansi ones
Not okay. Turns out, we've missed this line: result = RegSetValueExW (hpath, value_namew, 0, value.type, value_data, value_data_size); and value_data is UTF8-encoded at this point.
This problem has been fixed in the unstable development version. The fix will be available in the next major software release. You may need to upgrade your Linux distribution to obtain that newer version.