GNOME Bugzilla – Bug 732127
Should start up with non-UTF-8
Last modified: 2021-06-10 20:48:33 UTC
Created attachment 279068 [details] [review] Fix proof of concept gnome-terminal-server refuses to start up with non-UTF-8 locales. This causes problems with those users who are still stuck with crappy legacy charsets. Yeah those charsets still such and everyone should've switched already, but g-t shouldn't be the first application forcing it. gnome-terminal-server hardly uses the locale for anything. Maybe for dbus messages, for fontconfig settings etc. It should either live happily with a non-UTF-8 locale, or (try to) set a UTF-8 even if the environment says otherwise. I attach a patch that tries to set a UTF-8 locale internally even if the locale says otherwise. It tries to fall back to the current locale with a .UTF-8 modifier, followed by C.UTF-8(*) and en_US.UTF-8, then gives up. (*) Is it officially part of glibc by now? Or added by eglibc or Ubuntu?
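The fallback order described in this comment (UTF-8 counterpart of the current locale, then C.UTF-8, then en_US.UTF-8) could be sketched roughly as follows. This is not the attached patch: the helper names `utf8_counterpart` and `try_utf8_fallbacks` are hypothetical, only libc calls are used, and any "@modifier" suffix is simply dropped for brevity.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn e.g. "hu_HU.ISO-8859-2" into "hu_HU.UTF-8".
 * Everything from the first '.' or '@' onwards is replaced by ".UTF-8".
 * Returns 0 on success, -1 if out is too small. */
int
utf8_counterpart (const char *locale, char *out, size_t out_len)
{
  assert (locale != NULL && out != NULL);
  size_t base_len = strcspn (locale, ".@");
  int n = snprintf (out, out_len, "%.*s.UTF-8", (int) base_len, locale);
  return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}

/* Hypothetical helper: try the candidates in order. Returns the name that
 * setlocale() accepted, or NULL if none is installed on this system. */
const char *
try_utf8_fallbacks (const char *current, char *buf, size_t buf_len)
{
  const char *candidates[3];
  size_t i, n = 0;

  if (utf8_counterpart (current, buf, buf_len) == 0)
    candidates[n++] = buf;
  candidates[n++] = "C.UTF-8";
  candidates[n++] = "en_US.UTF-8";

  for (i = 0; i < n; i++)
    if (setlocale (LC_ALL, candidates[i]) != NULL)
      return candidates[i];
  return NULL;
}
```

Because only setlocale() is called, the LANG/LC_* environment variables stay untouched; the override is internal to the process.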
s/such/suck/
By the way... without this patch, and no g-t-s running:

$ LC_ALL=hu_HU.ISO-8859-2 /usr/lib/gnome-terminal/gnome-terminal-server
Non UTF-8 locale is not supported!

$ LC_ALL=hu_HU.ISO-8859-2 gnome-terminal
=> starts up

So the check is only there if g-t-s is started explicitly, not when launched by the client??
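For illustration, a startup check of this kind can be written with plain libc. This is a sketch, not the actual gnome-terminal-server code: the function names are hypothetical, and it assumes the platform reports the codeset of a UTF-8 locale as exactly "UTF-8" (as glibc does).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: does this codeset name denote UTF-8?
 * Assumes the glibc spelling "UTF-8". */
int
codeset_is_utf8 (const char *codeset)
{
  assert (codeset != NULL);
  return strcmp (codeset, "UTF-8") == 0;
}

/* Adopt the locale named by the environment, then report whether its
 * character set is UTF-8. */
int
environment_locale_is_utf8 (void)
{
  if (setlocale (LC_ALL, "") == NULL)
    return 0;   /* the environment names a locale that is not installed */
  return codeset_is_utf8 (nl_langinfo (CODESET));
}
```

With LC_ALL=hu_HU.ISO-8859-2, as in the commands above, such a check would report a non-UTF-8 codeset, provided that locale is installed.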
This isn't the right fix, IMO. We shouldn't fix up the user's locale, but expect it to be configured correctly.

Now this forcing of UTF-8 in the *server* is because misconfiguration of the locale in the dbus activation environment seems to cause problems (e.g. debian bug #726363), likely caused by the user not using the full gnome stack (gdm, gnome-session, gnome-settings-daemon) to launch the session.

As for the *client*, it's expected that it works fine for non-UTF-8 locale. However, it doesn't do what it *should* do, that is to forward the locale and use its charset to set the vte encoding in the to-be-created tab. We should do that, and when bug 731208 is done, possibly also automatically set that encoding as active.
(In reply to comment #3) > This isn't the right fix, IMO. We shouldn't fix up the user's locale, but > expect it to be configured correctly. We're not "fixing it up" – at least I think "fixing it up" would mean modifying LANG/LC_* for our terminals. We're just ignoring/overriding them internally for our convenience which should be our internal private business. Using a non-UTF-8 locale is *not* incorrect. Or, if it is, it should be rejected/fixed at gdm/gnome-shell/unity/whatever high level to guarantee that the whole desktop is UTF-8. Refusing it in 1 single application while all the other apps happily start up and live with a legacy encoding is just not right. > Now this forcing of UTF-8 in the *server* is because misconfiguration of the > locale in the dbus activation environment seems to cause problems (e.g. debian > bug #726363), likely caused by the user not using the full gnome stack (gdm, > gnome-session, gnome-settings-demon) to launch the session. If dbus misbehaves with a non-UTF-8 locale then either dbus is crap and should be fixed, or we use its API incorrectly, or we should work around the issue (my patch would be one way to do that). [I'm not familiar with the dbus API.] The "default locale" is a quite vague concept, it's unclear what that should or should not influence. An app probably needs to deal with several charsets. Vte needs to convert back-n-forth among multiple ones. A text editor needs to open a file of any charset regardless of its own locale. A web browser needs to be able to properly display webpages of various encodings independently of its locale. Gtk1 used to handle strings in the locale's encoding, Gtk2 ignores the locale for this purpose and overrides to UTF-8, there's nothing wrong with that. 
If we need to ignore the locale's charset and use UTF-8 at some places, just let's do that, but don't act like a raging child who cries and refuses to do stuff because it'd take 5 lines of source code to do that but no, we're not doing this, we're letting users down instead. This just really doesn't sound right.

The setlocale() function was designed in a way that the process's internal locale does not need to match the environment variables, so it's a perfectly legitimate thing to override them for our internal purposes if that's how it is most convenient for us.

From the user's point of view: gnome-terminal should just freaking start up with any locale, there's no reason it shouldn't. From the developer's point of view: it's a 5-line fix, I've attached it. What's the problem??? I really don't get it.

> As for the *client*, it's expected that it works fine for non-UTF-8 locale.
> However, it doesn't do what it *should* do, that is to forward the locale and
> use its charset to set the vte encoding in the to-be-created tab. We should do
> that, and when bug 731208 is done, possibly also automatically set that
> encoding as active.

Bug 731208 is pretty much irrelevant here. Bug 732128 is relevant, but orthogonal to this issue.
I'm using the 16.04 alpha. This bug has been open for quite some time; is it intended that this will NOT be resolved for the LTS release?
WONTFIX as per comment 3.
*** Bug 773708 has been marked as a duplicate of this bug. ***
(In reply to Christian Persch from comment #3) > This isn't the right fix, IMO. We shouldn't fix up the user's locale, but > expect it to be configured correctly. I would like to suggest reconsidering this. In general I would agree that the user's locale ought to be configured correctly. However, it leads to a really bad failure mode for a GUI user: gnome-terminal does not start up, for no obvious reason unless you read logs, and looks to a naive user as though it's entirely broken; and steps to diagnose or fix it will typically start "open a terminal...", which doesn't work unless you either switch out of the GUI to a getty environment where it's hard to read documentation, or have a redundant non-GNOME terminal installed. Yes it logs a message, and yes it exits with a distinctive status, but neither the message nor the exit status ends up anywhere that a GUI user is likely to see it without using their (currently non-working) terminal. In particular, one semi-common scenario that fails in this way is a minimal system with no locale configuration at all, for example starting from a minimal Debian image (no locale support at all) and installing the gnome-core metapackage. In this specific case, glibc defaults to the C (aka POSIX) locale, whose character set is ASCII. This default is mandated by POSIX and seems highly unlikely to be changed. Even if gnome-terminal in Debian was given a dependency on the locales package (not necessarily desirable because it's rather large), we cannot guarantee that a system-wide default locale other than C has actually been chosen. Modern Linux distributions provide a C.UTF-8 locale, which (unlike the "full" locales like en_US.UTF-8) is typically an essential part of the system and always installed - at least, that's true in recent Debian. In the specific case where the locale is "C", switching to C.UTF-8 for the gnome-terminal would make it work correctly without imposing any particular choice of language on the system. 
If the user has somehow managed to configure a locale other than "C", then they are clearly able to set the environment variables somehow, so I can understand your reasoning for not wanting to interfere: whatever mechanism they used to set fr_FR.ISO-8859-1 or whatever, they can also use that mechanism to select a UTF-8 locale. However, C is special because it's the fallback, and can be achieved through inaction as well as through action. Would you consider a more minimal patch that treats C as C.UTF-8, and does not mess with any other locale? Whether we like it or not, the default in the absence of environment variables is defined to be C, so it would seem good to deal with that particular case gracefully. > Now this forcing of UTF-8 in the *server* is because misconfiguration of the > locale in the dbus activation environment seems to cause problems (e.g. > debian bug #726363) That bug report <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726363> does not appear to contain any particular diagnosis, and the bug reporter says he is using the C.UTF-8 locale already. Do you have information elsewhere pointing to his locale being wrong in his (systemd or) D-Bus activation environment, perhaps a GNOME bug report from the same person with more detail?
(In reply to Simon McVittie from comment #9) > In particular, one semi-common scenario that fails in this way is a minimal > system with no locale configuration at all, for example starting from a > minimal Debian image (no locale support at all) and installing the > gnome-core metapackage. Does gnome-session as started by gdm actually ever get an unset locale? (Note that the only supported configuration is using gnome-shell on a gnome-session started by gdm.) > In this specific case, glibc defaults to the C (aka > POSIX) locale, whose character set is ASCII. This default is mandated by > POSIX and seems highly unlikely to be changed. It's braindead, but that's POSIX for you :-) > If the user has somehow managed to configure a locale other than "C", then > they are clearly able to set the environment variables somehow, so I can > understand your reasoning for not wanting to interfere: whatever mechanism > they used to set fr_FR.ISO-8859-1 or whatever, they can also use that > mechanism to select a UTF-8 locale. However, C is special because it's the > fallback, and can be achieved through inaction as well as through action. > > Would you consider a more minimal patch that treats C as C.UTF-8, and does > not mess with any other locale? Whether we like it or not, the default in > the absence of environment variables is defined to be C, so it would seem > good to deal with that particular case gracefully. So this is the situation that LC_ALL, LC_* and LANG are all unset, so glibc uses C locale, right? I would be ok with a patch that does try a setlocale to C.UTF-8 in gnome-terminal-server in this case, and proceed with startup if that succeeds. > > Now this forcing of UTF-8 in the *server* is because misconfiguration of the > > locale in the dbus activation environment seems to cause problems (e.g. 
> > debian bug #726363) > > That bug report <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726363> > does not appear to contain any particular diagnosis, and the bug reporter > says he is using the C.UTF-8 locale already. Do you have information > elsewhere pointing to his locale being wrong in his (systemd or) D-Bus > activation environment, perhaps a GNOME bug report from the same person with > more detail? Hmm maybe I pasted the wrong bug number? I recall there were at least 3 different reports (perhaps not all on b.d.o) where non-ASCII characters were not being shown that turned out to be fixed by not using C locale for g-t-server.
(Reopening because you said you'd consider a patch)

(In reply to Christian Persch from comment #10)
> (In reply to Simon McVittie from comment #9)
> > In particular, one semi-common scenario that fails in this way is a minimal
> > system with no locale configuration at all, for example starting from a
> > minimal Debian image (no locale support at all) and installing the
> > gnome-core metapackage.
>
> Does gnome-session as started by gdm actually ever get an unset locale?
> (Note that the only supported configuration is using gnome-shell on a
> gnome-session started by gdm.)

I used a minimal virtual machine image (prepared with vmdebootstrap similar to what's described in https://manpages.debian.org/stretch/autopkgtest/autopkgtest-virt-qemu.1.en.html), ran "apt install gnome-core xterm", rebooted, logged in as the test user, clicked Activities, searched for "term" and ran GNOME Terminal. The startup notification spinner spins for a while, but the terminal does not start.

Note that this minimal virtual machine does not have the locales or locales-all packages, so according to `locale -a` the only locales available are C, C.UTF-8 and POSIX.

gnome-session's /proc/$pid/environ contains LANG=C and GDM_LANG=C. So do the /proc/$pid/environ for a process that was launched by systemd (I used gvfs-metadata.service as my example) and a process that was launched by dbus-daemon (I used dconf as my example).

> > Would you consider a more minimal patch that treats C as C.UTF-8, and does
> > not mess with any other locale? Whether we like it or not, the default in
> > the absence of environment variables is defined to be C, so it would seem
> > good to deal with that particular case gracefully.
>
> So this is the situation that LC_ALL, LC_* and LANG are all unset, so glibc
> uses C locale, right? I would be ok with a patch that does try a setlocale
> to C.UTF-8 in gnome-terminal-server in this case, and proceed with startup
> if that succeeds.
The patch I sketched out (which I'll test now) calls setlocale(LC_ALL, NULL) and checks whether it returns exactly "C" or "POSIX", to avoid having to hard-code knowledge of the various environment variables that glibc respects. This will happen if the environment variables either explicitly configure the C locale (as gdm does on my test system) or are completely missing.
Created attachment 356233 [details] [review] server: If locale is C or POSIX, quietly try C.UTF-8 before failing gnome-terminal explicitly does not support non-UTF-8 locales such as fr_FR.ISO-8859-1. Normally this is not a problem, because modern Unix systems normally use UTF-8 locales like fr_FR.UTF-8. However, there is one semi-common case where this is a problem: the situation where there has been no locale configuration at all. This should not normally be the case on a full installation of an OS distribution, but it's an easy situation to get into on developers' minimal test systems (for example installing the gnome-core metapackage onto a minimal Debian virtual machine, without also installing the locales package and configuring a non-trivial system-wide default locale). In this situation, gdm would set LANG=C for the C (aka POSIX) locale, whose character set is ANSI_X3.4-1968, better known as ASCII. gnome-terminal recognises this as not UTF-8 and fails to launch. The C locale is also used if the various locale-related environment variables are not set at all. Modern Linux distributions provide a C.UTF-8 locale, which differs from C only in its character set. For example, in Debian and its derivatives, C and C.UTF-8 are part of the Essential set, even though the locales package (containing the rest of the locales) is not. On unconfigured systems, we have nothing to lose from trying the C.UTF-8 locale; if it works, it's suitable for us, and if it doesn't work, we were about to exit unsuccessfully anyway. If the user is really using exclusively ASCII (which is what the C locale claims they are doing) then interpreting the same bytes as UTF-8 is equivalent, since UTF-8 is an ASCII superset. 
If the user has explicitly configured a non-C locale, either via LC_ALL, LANG or at least one of the other LC_* variables, then the result of setlocale() will not be "C" or "POSIX" (in non-trivial cases it might be a semicolon-separated list of individual settings), and we do nothing and continue to fail as before. This is done to avoid excess complexity: if the user has successfully configured one locale-related environment variable by some mechanism, then they should be able to configure all of them via that same mechanism, so there seems no need to handle that case further. On GNU/Linux, the POSIX locale is simply an alias for C, so setlocale() will never actually return "POSIX". I'm checking for both in case there are OSs where POSIX is the canonical name and C is the alias.
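The logic described in this commit message can be sketched with libc alone. This is an approximation of the idea, not the attached patch (which, per the review that follows, uses GLib string helpers); `locale_is_c_or_posix` and `upgrade_c_locale` are hypothetical names.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: true only if the name is exactly "C" or "POSIX".
 * A composite result such as "LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C" does not
 * match, so explicitly configured systems are left alone. */
int
locale_is_c_or_posix (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0;
}

/* Returns 1 if the process locale was switched to C.UTF-8,
 * 0 if the locale was not C/POSIX and was left unchanged,
 * -1 if the locale is C/POSIX but C.UTF-8 is not available. */
int
upgrade_c_locale (void)
{
  const char *current = setlocale (LC_ALL, NULL);   /* query, do not change */

  if (current == NULL || !locale_is_c_or_posix (current))
    return 0;
  return setlocale (LC_ALL, "C.UTF-8") != NULL ? 1 : -1;
}
```

Querying with setlocale(LC_ALL, NULL) rather than reading LANG, LC_ALL and the LC_* variables directly avoids hard-coding the precedence rules that the C library applies to them.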
(In reply to Christian Persch from comment #10) > > That bug report <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726363> > > does not appear to contain any particular diagnosis, and the bug reporter > > says he is using the C.UTF-8 locale already. Do you have information > > elsewhere pointing to his locale being wrong in his (systemd or) D-Bus > > activation environment, perhaps a GNOME bug report from the same person with > > more detail? > > Hmm maybe I pasted the wrong bug number? I recall there were at least 3 > different reports (perhaps not all on b.d.o) where non-ASCII characters were > not being shown that turned out to be fixed by not using C locale for > g-t-server. There doesn't seem to be anything on that bug to indicate a concrete diagnosis, or how it can be fixed or worked around, but perhaps there was discussion elsewhere that didn't get cc'd to the Debian bug tracking system. Note that my patch does not affect the LANG that is set in the actual shell session, which is sent by the launcher (/usr/bin/gnome-terminal) according to the environment in which it was invoked. Utilities like pstree and systemd-cgls will display ASCII-art when invoked in LANG=C, or proper Unicode box-drawing characters in LANG=C.UTF-8. If you want the C locale to be transformed into C.UTF-8 for the shell session, that would be a separate, similar change during the launcher's startup. In particular, on my test VM the patched gnome-terminal starts correctly, but using it to run pstree does not result in Unicode box-drawing characters. I wonder whether the check for a UTF-8 locale was put in the wrong place - should it have been in the launcher, not the server? Or perhaps both?
(In reply to Simon McVittie from comment #13) > In particular, on my test VM > the patched gnome-terminal starts correctly, but using it to run pstree does > not result in Unicode box-drawing characters. (unless you run LANG=C.UTF-8 pstree, which works fine)
Comment on attachment 356233 [details] [review]
server: If locale is C or POSIX, quietly try C.UTF-8 before failing

+ if (g_strcmp0 (locale, "C") == 0 || g_strcmp0 (locale, "POSIX") == 0) {

Since it's already established that locale != NULL, just use g_str_equal() instead of g_strcmp0(...) == 0.

> On GNU/Linux, the POSIX locale is simply an alias for C, so setlocale()
> will never actually return "POSIX". I'm checking for both in case there
> are OSs where POSIX is the canonical name and C is the alias.

Are there actually OSes like that? :-)

+ g_setenv ("LANG", "C.UTF-8", TRUE);

Is that really necessary?

+ g_printerr ("C.UTF-8 locale not supported.\n");

Should point out that this was a fallback for C locale, sth like "C locale not supported, and fallback C.UTF-8 not available"?

+ return _EXIT_FAILURE_UNSUPPORTED_LOCALE;

Please use a new, distinct EXIT value for this case.
(In reply to Simon McVittie from comment #13)
> Note that my patch does not affect the LANG that is set in the actual shell
> session, which is sent by the launcher (/usr/bin/gnome-terminal) according
> to the environment in which it was invoked.

Yes, and this is by design. Since this patch only fixes up the C -> C.UTF-8 locale for the g-t-server and not the gnome-session/gnome-shell environment, the newly created terminal (created by gnome-terminal client) will get the C locale, with all the drawbacks that has.
(In reply to Christian Persch from comment #15)
> + if (g_strcmp0 (locale, "C") == 0 || g_strcmp0 (locale, "POSIX") == 0) {
>
> Since it's already established that locale != NULL, just use g_str_equal()
> instead of g_strcmp0(...) == 0.

Makes sense

> > On GNU/Linux, the POSIX locale is simply an alias for C, so setlocale()
> > will never actually return "POSIX". I'm checking for both in case there
> > are OSs where POSIX is the canonical name and C is the alias.
>
> Are there actually OSes like that? :-)

If you're confident that there aren't, I can remove it. It seemed like a harmless bit of defensive programming.

> + g_setenv ("LANG", "C.UTF-8", TRUE);
>
> Is that really necessary?

It is if g-t-server spawns any subprocesses using its own environment (as opposed to the environment that was sent over by the gnome-terminal launcher) and those subprocesses care about having a UTF-8 locale. If you're confident that it doesn't, then the setlocale() is enough.

> + g_printerr ("C.UTF-8 locale not supported.\n");
>
> Should point out that this was a fallback for C locale, sth like "C locale
> not supported, and fallback C.UTF-8 not available"?

Sure.

> + return _EXIT_FAILURE_UNSUPPORTED_LOCALE;
>
> Please use a new, distinct EXIT value for this case.

I agree UNSUPPORTED_LOCALE was wrong, but actually, should it have been _EXIT_FAILURE_NO_UTF8? That's the exit status we'd have previously had for trying to use the non-UTF-8 C locale, which is the real reason we're failing; failing to fall back is secondary.

(I'm not sure what value these various different exit codes are actually adding: as far as I can tell the only difference they make is that they get logged to the Journal, but so does our stderr, which is easier to understand.)
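The distinction drawn here between setlocale() and setting LANG can be illustrated with a small sketch. It uses POSIX setenv() where the patch uses GLib's g_setenv(), and `switch_locale` is a hypothetical helper, not a function from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: switch the locale of the current process and,
 * if requested, also export it through LANG so that child processes
 * inheriting this environment see the same setting.
 * Returns 1 on success, 0 on failure. */
int
switch_locale (const char *name, int export_to_children)
{
  assert (name != NULL);
  if (setlocale (LC_ALL, name) == NULL)
    return 0;   /* locale not available; nothing was changed */
  if (export_to_children)
    return setenv ("LANG", name, 1) == 0;
  return 1;     /* process locale changed, environment untouched */
}
```

setlocale() on its own only affects the calling process: getenv("LANG") returns the same value before and after, so a spawned child that calls setlocale(LC_ALL, "") would still pick up the old setting unless LANG is exported as well.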
(In reply to Christian Persch from comment #16) > Yes, and this is by design. Since this patch only fixes up the C -> C.UTF-8 > locale for the g-t-server and not the gnome-session/gnome-shell environment, > the newly create terminal (created by gnome-terminal client) will get the C > locale, with all the drawbacks that has. And that's what you want? (Bear in mind that this makes pstree, systemd-cgls, etc. display ASCII art, not Unicode box-drawing characters, which seems weird if avoiding that is the reason you added the UTF-8 locale check in the server - hence me wondering whether this check was in the wrong place).
Is C.UTF-8 upstream glibc already, or still only a Debian&co patch?
(limited net access and time; will review the recent comments next week)

IIRC I attached a patch that tries to fall back on C.UTF-8, en_US.UTF-8 and the UTF-8 counterpart of the current locale, in some order. That patch was conceptually rejected then. _If_ C.UTF-8 is not mainstream then I believe it's better to try all three.

Ideally anything we do should remain inside g-t, invisible to the shell we launch.
We should locate and really understand the core issue. Most apps handle utf8 fine even with 8bit locales. Why is g-t an exception? Can we fix _that_? Any chance wcwidth is the only culprit, and changing it to g_unicode_char_width (or what's the exact name) as mentioned in another vte bug would solve all our issues? Or at least take us one big step closer to the right fix?
(In reply to Egmont Koblinger from comment #19) > Is C.UTF-8 upstream glibc already, or still only a Debian&co patch? Looks like it's *still* something everyone has to patch in individually :-( https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 https://sourceware.org/bugzilla/show_bug.cgi?id=17318 (But trying to use it is never worse behaviour than what we have now, even if we fail because it isn't there.)
(In reply to Egmont Koblinger from comment #20) > IIRC I attached a patch that tries to fallback on C.UTF-8, en_US.UTF-8 and > the UTF-8 counterpart of the current locale, in some order. That patch was > conceptually rejected then. _If_ C.UTF-8 is not mainstream then I believe > it's better to try all three. The conceptual change that I made, relative to your original patch, is that in my patch this fallback logic only happens if the current locale is "C" or unset. Your patch is active if the system is either unconfigured (no steps have been taken to create or enable any locales at all), or explicitly configured for a non-UTF-8 locale. My patch is only active if the system is unconfigured, and leaves the misconfigured case failing, which seems to be what Christian wants it to do. In the unconfigured case, "the UTF-8 counterpart of the current locale" *is* C.UTF-8, so there's nothing new to do there. Trying en_US.UTF-8 after C.UTF-8, just in case it happens to be present and functional, seems like a reasonable enhancement; although I'm a little reluctant to introduce too much complexity into this fallback path. (In reply to Egmont Koblinger from comment #21) > We should locate and really understand the core issue. Most apps handle utf8 > fine even with 8bit locales. Why is g-t an exception? Can we fix _that_? I would be very happy to see gnome-terminal working, either normally or with degraded functionality, in unsupported locales rather than just failing. The current UX (Comment #11) seems a lot more user-hostile than <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726363>, so perhaps the cure is worse than the disease here.
Created attachment 373502 [details] [review]
Updated Simon's patch based on master and comments/suggestions

I've updated Simon's previous patch to apply to master branch and taken the comments and suggestions into account including:

- Uses g_str_equal instead of g_strcmp0
- A new exit code (I've no druthers about the name, we can tweak as needed)
- A clearer exit message when trying the fallback locale fails as well.
Since, as noted above, this will only change the locale for g-t-server not anything running inside the terminal, I don't think this is the right place. C locale is not right for *any* UI; and so it is gnome-session/gdm that should fall forward to C.UTF-8 if the locale is C; and it should do so for all its child processes by updating the environment, including the 'systemd --user' activation environment (so that dbus autostarted processes, like g-t-server, get the new locale, too).
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gnome-terminal/-/issues/7472.