After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 522131 - Need a way to handle known-to-be-UTF-8 encoded arguments in GOption
Need a way to handle known-to-be-UTF-8 encoded arguments in GOption
Status: RESOLVED DUPLICATE of bug 722025
Product: glib
Classification: Platform
Component: general
unspecified
Other All
: High normal
: ---
Assigned To: gtkdev
gtkdev
: 659509 704252 (view as bug list)
Depends on:
Blocks: 361321 570592 619629 691169
 
 
Reported: 2008-03-12 23:56 UTC by Jody Goldberg
Modified: 2014-01-30 12:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
minimal api to do what we need. (3.34 KB, patch)
2008-03-12 23:57 UTC, Jody Goldberg
none Details | Review
Suggested patch (12.40 KB, patch)
2008-08-07 10:15 UTC, Tor Lillqvist
none Details | Review
Patch to use GetCommandLineW in GOption (5.94 KB, patch)
2008-12-10 17:49 UTC, Krzysztof Kosiński
none Details | Review
Patch introducing g_option_prepare_argv (8.90 KB, patch)
2008-12-13 21:35 UTC, Krzysztof Kosiński
none Details | Review
Patch using GetCommandLineW with __argv check (12.17 KB, patch)
2008-12-16 01:06 UTC, Krzysztof Kosiński
needs-work Details | Review
Patch using GetCommandLineW with __argv check - corrected (6.43 KB, patch)
2008-12-24 19:47 UTC, Krzysztof Kosiński
none Details | Review
GOption Windows test program (924 bytes, application/octet-stream)
2009-01-09 22:35 UTC, Krzysztof Kosiński
  Details
GOption Windows test program (924 bytes, application/octet-stream)
2009-01-09 22:37 UTC, Krzysztof Kosiński
  Details
Patch using CommandLineW + __argv check + glob check (7.89 KB, patch)
2009-01-09 23:43 UTC, Krzysztof Kosiński
none Details | Review
Patch using CommandLineW + __argv check + glob check v2 (7.53 KB, patch)
2009-01-11 16:11 UTC, Krzysztof Kosiński
none Details | Review
New GCommandLine API (28.50 KB, patch)
2010-01-29 23:27 UTC, Krzysztof Kosiński
reviewed Details | Review
Helper program (769 bytes, application/octet-stream)
2010-03-23 18:07 UTC, Krzysztof Kosiński
  Details
Add g_option_context_parse_utf8 (21.33 KB, patch)
2010-06-03 13:19 UTC, Christian Persch
none Details | Review
0001-Add-g_option_context_parse_command_line.patch (3.08 KB, patch)
2010-06-03 16:05 UTC, Hib Eris
none Details | Review
0001-Add-g_option_context_parse_utf8.patch (19.01 KB, patch)
2010-06-05 17:00 UTC, Hib Eris
none Details | Review
0002-Add-g_win32_get_utf8_argv.patch (3.56 KB, patch)
2010-06-05 17:01 UTC, Hib Eris
none Details | Review
0003-Add-g_option_context_parse_command_line.patch (3.12 KB, patch)
2010-06-05 17:02 UTC, Hib Eris
rejected Details | Review
0001-Add-g_option_context_parse_utf8.patch (16.92 KB, patch)
2013-02-06 09:59 UTC, Hib Eris
none Details | Review
0002-Add-g_win32_get_utf8_argv.patch (3.18 KB, patch)
2013-02-06 10:00 UTC, Hib Eris
none Details | Review
0001-Add-g_option_context_parse_utf8.patch (14.25 KB, patch)
2013-03-08 09:10 UTC, Hib Eris
none Details | Review
0002-Reorder-code-to-avoid-forward-declaration.patch (4.78 KB, patch)
2013-03-08 09:13 UTC, Hib Eris
none Details | Review
0003-Add-g_win32_get_utf8_argv.patch (3.18 KB, patch)
2013-03-08 09:13 UTC, Hib Eris
none Details | Review

Description Jody Goldberg 2008-03-12 23:56:55 UTC
On win32 we go out of our way to extract the UTF16 encoded argument list via GetCommandLineW to support complex filenames from the command line (see #361321).  The result is converted directly to utf8 rather than losing information going to locale encoding.  Unfortunately GOption insists on re-encoding the content from what it assumes i locale encoding -> utf8.

I'd like a mechanism to disable that.  Attached is a possible minimalist API.  It would be nice to have the GetCommandLineW and conversion in glib too, but I can't think of a reasonably elegant interface.
Comment 1 Jody Goldberg 2008-03-12 23:57:59 UTC
Created attachment 107190 [details] [review]
minimal api to do what we need.

not terribly pretty, but it gets the job done.
Comment 2 Tor Lillqvist 2008-03-28 22:37:17 UTC
Yeah, this is a problem. I don't like the word "delocalize", though, perhaps  "use_utf8" would be better?

void g_option_context_set_use_utf8 (GOptionContext *context,
                                    gboolean        use_utf8)

etc?

I wonder if we could even make the use of UTF-8 in GOption always turned on on Win32? It would techincally be a regression. But are there many serious GTK+ apps running on Windows out there other than gnumeric and GIMP, that we two can take care of, that need to handle exotic file names on the command line? This would mean that we would then need some new g_win32 API to set up a UTF-8 argv, and that apps would need to call that before doing their argv parsing. Would that be cleaner, I don't know...
Comment 3 Tor Lillqvist 2008-08-07 10:15:50 UTC
Created attachment 116044 [details] [review]
Suggested patch

This patch is mostly like Jody's patch, only the public API is a bit different. I add a new function g_option_context_parse_utf8() that assumes the passed argv is in UTF-8.

Note that file name arguments are still returned in the GLib file name encoding on POSIX when using g_option_context_parse_utf8(), i.e. the corresponding args will be passed through g_filename_from_utf8(). Does this make sense? Or should also file name args be returned as they are in the input argv in that case? OTOH, I guess most "sensible" POSIX systems these days use UTF-8 based locales anyway...
Comment 4 Tor Lillqvist 2008-08-18 08:37:37 UTC
Changed component to "general" as this really isn't Windows-specific.
Comment 5 Krzysztof Kosiński 2008-12-10 00:30:09 UTC
I stumbled into the same issue when trying to port Inkscape to GOption. Generally, this issue is pretty severe as it makes GOption useless on Windows and in any cross-platform applications that run on Windows, because files with names not representable in the system codepage aren't a rare sight there. An application using GOption won't be able to open those files at all.
Comment 6 Behdad Esfahbod 2008-12-10 00:34:28 UTC
I think we should simply try to validate the data as UTF-8 and if it is, use it.  Chances of backfiring are quite low.
Comment 7 Krzysztof Kosiński 2008-12-10 16:24:41 UTC
(In reply to comment #6)
> I think we should simply try to validate the data as UTF-8 and if it is, use
> it.  Chances of backfiring are quite low.
Chances of backfiring are low but non-zero - this is not an acceptable solution. There are still going to be cases where locale data validates as UTF-8, and it will cause unpredictable bugs where a program won't be able to open filenames with certain sequences in them. I'd rather use popt over GOption rather than subject users to such brain damage. The current API is not sufficient to handle both cases (UTF-8 and locale-encoded argv).

One good solution would be to call GetCommandLineW on Windows and ignore the arguments altogether. This would actually make GOption cross-platform, because people wouldn't have to call the Windows API to get fundamental things done. However, this would be a rather substantial change in behavior which should go into a major release rather than a bugfix release. I can create a patch that implements this if this approach is acceptable to others as well.
Comment 8 Tor Lillqvist 2008-12-10 17:20:04 UTC
> One good solution would be to call GetCommandLineW on Windows 
> and ignore the arguments altogether

Yes and no. Yes, calling GetCommandLineW() and then CommandLineToArgvW() is the only way to get a Unicode argument vector in a program that has a plain main(char **, int). (And mingw doesn't even support having a _wmain(wchar_t**, int) instead of the normal main()).

But no, I don't see how that as such by itself is a "solution" to this problem. Wouldn't it be natural to then convert the wchar_t** argv into a UTF-8 one and pass it on to a g_option_context_parse_utf8() as added in my patch.
Comment 9 Tor Lillqvist 2008-12-10 17:22:49 UTC
Argh, sorry for swapping the order of argv and argc in the parameter list of main() and _wmain() above;) 
Comment 10 Behdad Esfahbod 2008-12-10 17:27:42 UTC
I promise I'm not smoking anything.  How about this crack:

In GOption, on win32, if an option entry is of type G_OPTION_FLAG_FILENAME, internally call GetCommandLineW, CommandLineToArgvW, and their non-wide versions.  Find the option at hand in the non-wide arg list, then use its wide version instead.
Comment 11 Krzysztof Kosiński 2008-12-10 17:49:44 UTC
Created attachment 124367 [details] [review]
Patch to use GetCommandLineW in GOption

Proposed patch to use GetCommandLineW on Windows and ignore the contents of argc and argv parameters of g_option_context_parse altogether. Unparsed args would still be returned in them. (Note: this patch is not tested yet as I don't have a Windows dev environment at the moment.)
Comment 12 Krzysztof Kosiński 2008-12-10 18:09:25 UTC
(In reply to comment #10)
> In GOption, on win32, if an option entry is of type G_OPTION_FLAG_FILENAME,
> internally call GetCommandLineW, CommandLineToArgvW, and their non-wide
> versions.  Find the option at hand in the non-wide arg list, then use its wide
> version instead.

If we're going to use GetCommandLineW at all, we might as well just convert it to UTF-8 and then call g_shell_parse_argv to retrieve an UTF-8 argument vector, and solve this issue without introducing additional complexity, like I did in my patch.

(In reply to comment #8)
> But no, I don't see how that as such by itself is a "solution" to this problem.
> Wouldn't it be natural to then convert the wchar_t** argv into a UTF-8 one and
> pass it on to a g_option_context_parse_utf8() as added in my patch.

Maybe I wasn't clear enough: g_option_context_parse should just Do What I Mean, that is use argc and argv on non-Windows systems, and on Windows ignore them and call GetCommandLineW to retrieve the Unicode command line and parse it. I added a patch that does this. This way, we could just write things the straightforward way:

int main(int argc, char **argv)
{
   ...
   g_option_context_parse(ctx, &argc, &argv, NULL);
   ...
   /* Open an unparsed argument */
   GFile *file = g_file_new_from_commandline_arg(argv[0]);
   /* Open a file which is given as an option argument */
   GFile *template = g_file_new_from_commandline_arg(template_fn);
   ...
}

and it would work equally well on Windows and non-Windows. Adding extra API and forcing everyone to have conditional Windows code if they just want all files to open properly is IMO unreasonable.
Comment 13 Behdad Esfahbod 2008-12-10 18:12:43 UTC
Your solution is WRONG.  It makes zero sense to toss out what user passed in.  It doesn't have to be exactly what was passed to main.  User may have processed it himself first.  What I propose is the Do What I Mean solution.
Comment 14 Krzysztof Kosiński 2008-12-10 18:35:25 UTC
(In reply to comment #13)
> Your solution is WRONG.  It makes zero sense to toss out what user passed in. 
> It doesn't have to be exactly what was passed to main.  User may have processed
> it himself first.

I thought that special processing is supposed to happen after GOption parsing (that's what unparsed argument return and G_OPTION_REMAINING is for). Anyway, this behavior of ignoring arguments on Windows would be documented. If the user wants to do clever things with arguments, he shouldn't use GOption as it provides an interface adhering to GNU guidelines, or do it after GOption parsing.

When it comes to looking up the equivalent filename in the wide command line: if the user alters the command line before GOption parsing (e.g. modifies filenames or removes some arguments), I don't see how this could work either. Moreover, there's no CommandLineToArgvA function. You'd have to describe a more detailed algorithm of looking up the Unicode filename.

All possible solutions will be ugly, because the Windows command line parameter situation isn't pretty either. I think assuming the command line wasn't modified leads to the least ugly solution.
Comment 15 Behdad Esfahbod 2008-12-10 18:45:56 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Your solution is WRONG.  It makes zero sense to toss out what user passed in. 
> > It doesn't have to be exactly what was passed to main.  User may have processed
> > it himself first.
> 
> I thought that special processing is supposed to happen after GOption parsing
> (that's what unparsed argument return and G_OPTION_REMAINING is for). Anyway,
> this behavior of ignoring arguments on Windows would be documented. If the user
> wants to do clever things with arguments, he shouldn't use GOption as it
> provides an interface adhering to GNU guidelines, or do it after GOption
> parsing.

Not really.  Imagine I want to implement "sudo" using glib, I have to walk the arg list and terminate it at first non-option argument and only pass the options that belong to me to g_option.


> When it comes to looking up the equivalent filename in the wide command line:
> if the user alters the command line before GOption parsing (e.g. modifies
> filenames or removes some arguments), I don't see how this could work either.
> Moreover, there's no CommandLineToArgvA function. You'd have to describe a more
> detailed algorithm of looking up the Unicode filename.

If there is no way to get the original argc,argv from windows then my solution doesn't work.


> All possible solutions will be ugly, because the Windows command line parameter
> situation isn't pretty either. I think assuming the command line wasn't
> modified leads to the least ugly solution.

*If* we can get the original args in both locale and wide, we can match them together and do better.  That's all I'm saying.
Comment 16 Krzysztof Kosiński 2008-12-10 19:34:49 UTC
(In reply to comment #15)
> Not really.  Imagine I want to implement "sudo" using glib, I have to walk the
> arg list and terminate it at first non-option argument and only pass the
> options that belong to me to g_option.
I didn't think about this, though this use is rather uncommon. With my solution replicating the exact interface of sudo wouldn't be possible, but if it was modified to "sudolike-app [options] -- [command]", it would still work.

> If there is no way to get the original argc,argv from windows then my solution
> doesn't work.
Locale-encoded argv can still be retrieved from the parameters of main(), but it will contain broken filenames (this is the cause of this bug).

> *If* we can get the original args in both locale and wide, we can match them
> together and do better.  That's all I'm saying.
OK, I'll assume you want to use the locale filenames as patterns to match on the Unicode filenames (they contain ? in place of unrepresentable characters). If we match them one at a time, this would still create corner cases with wrong behavior. Consider two files: ł.txt and ń.txt. Those files will both be passed in argv as ?.txt. If we have a command line like "goption ł.txt ń.txt", matching the pattern ?.txt on Unicode arguments twice will produce the result equivalent to the command line "goption ł.txt ł.txt". Also consider a file named "ąc" without any extension. In the command line "goption -c -o ąc" it would match the "-c" option, again leading to an error. Some other approach would have to be used.

Can you come up with a more specific algorithm?
Comment 17 Krzysztof Kosiński 2008-12-10 20:16:54 UTC
Totally different solution: add a function called g_option_prepare_argv that does nothing on POSIX systems, but on Windows replaces argc and argv with UTF-8 strings recoded from GetCommandLineW. This function should be defined on all systems, and should always be called before any user processing is done on argc/argv when using GOption. GOption should then assume that on Windows argc/argv are in UTF-8.

void g_option_prepare_argv(gint *argcp, gchar ***argvp)
{
#ifdef G_OS_WIN32
  LPWSTR windows_cmdline = GetCommandLineW ();
  gchar *cmdline = g_convert ((gchar*) windows_cmdline, wcslen (windows_cmdline) * 2, "UTF-8", "UTF-16", NULL, NULL, NULL);
  g_shell_parse_argv(cmdline, argc, argv, NULL);
#endif
}

An alternative but essentially equivalent approach would be to have a function that converts argv to UTF-8 on all systems, e.g. g_argv_to_utf8.
Comment 18 Behdad Esfahbod 2008-12-11 06:02:24 UTC
Well, I'm looking for a solution that does not require new API, and that does not require changing every application.  Because if a solution needs changes to every application to make it work on win32, there's a huge chance that applications don't get it right initially.

This is the pseudo-code for the algorithm I have in mind:

in g_option_context_parse():

  argw = array of all UTF-16 args
  argn = array of all locale-encoded args
  argv = array of args passed to g_option_context_parse()

foreach arg in argv:

  if arg type is FILENAME:
    find arg in argn array, if found at index j:
      replace arg with argw[j] converted to UTF-8
    else
      try to convert arg to UTF-8


That's it.  For more robustness, we can limit the iteration on argn by just going forward.  Hope that makes it more clear.  If not, let me know and I'll try writing full C code.









Comment 19 Tor Lillqvist 2008-12-11 10:56:01 UTC
That might work, yes.
Comment 20 Krzysztof Kosiński 2008-12-13 12:24:28 UTC
I had some thought and I don't like the solutions based on "let's assume argc and argv are from the command line, and do the smart thing". Ultimately we're building in an assumption that either limits the usefulness of GOption, or introduces unpredictable behavior when it's used for things that we didn't foresee.

I think the best way to solve this is to introduce the g_option_prepare_argv function. On all systems, it will: 1. replace argv with UTF-8 strings, 2. switch all GOptionContexts to assume the command line to be in UTF-8 encoding (before that it is assumed to be in locale encoding). This way, unmodified programs will continue to run as they did, and new versions can add the call to g_option_prepare_argv to handle Unicode file names on Windows. This will also allow GOption to easily be used for things similar shell built-ins where the command line is entered by the user, without the need to convert user input to locale encoding, and simplify "manual" command line handling.

Calling g_option_context_parse before g_option_prepare_argv should be deprecated (g_option_context_parse will print a warning) - this way we can reduce the number of code paths in GLib 3.0.
Comment 21 Behdad Esfahbod 2008-12-13 17:57:42 UTC
I think your approach misses the point:  If people have to go out of their way to make win32 work, then majority of applications will not ever be fixed.  That defeats the entire portability goal.  I probably won't change my apps to make that useless extra call.
Comment 22 Krzysztof Kosiński 2008-12-13 21:26:38 UTC
(In reply to comment #21)
> I think your approach misses the point:  If people have to go out of their way
> to make win32 work, then majority of applications will not ever be fixed.  That
> defeats the entire portability goal.  I probably won't change my apps to make
> that useless extra call.
If this call improves functionality on Windows then obviously it is not useless. Generally we have 3 options:

1. Ignore arguments of g_option_context_parse on Windows
2. Add an extra call that fixes things (and converts the command line to UTF-8)
3. Read the arguments of g_option_context_parse but try to guess what the user really wanted by consulting the original command line

Solution 1 limits what can be done with GOption. 2 requires a minor modification to fix existing programs while preserving current behavior. 3 can cause rarely occurring bugs which will be hard to debug (e.g. passing ł.txt ń.txt ć.txt all received as ?.txt and manually removing the first file from argv - the program would open ł.txt and ń.txt instead of ń.txt and ć.txt). Given those choices, I would pick 2. The modification isn't intrusive and doesn't change the behavior of current applications.
Comment 23 Krzysztof Kosiński 2008-12-13 21:35:57 UTC
Created attachment 124605 [details] [review]
Patch introducing g_option_prepare_argv

This patch introduces g_option_prepare_argv - it retrieves the Unicode command line on Windows, converts it to UTF-8, parses it into argv and sets a global variable that tells g_option_context_parse to treat its arguments as UTF-8. If g_option_prepare_argv isn't called, g_option_context_parse will generate a warning. Not tested on Windows, but compiles on Linux.
Comment 24 Behdad Esfahbod 2008-12-13 21:42:30 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > I think your approach misses the point:  If people have to go out of their way
> > to make win32 work, then majority of applications will not ever be fixed.  That
> > defeats the entire portability goal.  I probably won't change my apps to make
> > that useless extra call.
> If this call improves functionality on Windows then obviously it is not
> useless.

If I don't care about Windows, the call is useless.

> Generally we have 3 options:
> 
> 1. Ignore arguments of g_option_context_parse on Windows

Definitely a no.

> 2. Add an extra call that fixes things (and converts the command line to UTF-8)
>
> 3. Read the arguments of g_option_context_parse but try to guess what the user
> really wanted by consulting the original command line
>
> Solution 1 limits what can be done with GOption. 2 requires a minor
> modification to fix existing programs while preserving current behavior.
>
> 3 can
> cause rarely occurring bugs which will be hard to debug (e.g. passing ł.txt
> ń.txt ć.txt all received as ?.txt and manually removing the first file from
> argv - the program would open ł.txt and ń.txt instead of ń.txt and ć.txt).

A full implementation can be smarter and avoid those issues.  The only problem would be when the user uses argv arguments that come totally from somewhere else, not the argv passed to main.

> Given those choices, I would pick 2. The modification isn't intrusive and
> doesn't change the behavior of current applications.

Well, that's your preference.  And I've expressed mine.
Comment 25 Krzysztof Kosiński 2008-12-13 22:51:44 UTC
(In reply to comment #24)
> If I don't care about Windows, the call is useless.
I can change the behavior so that the warning is only generated on Windows, so if you don't care you can skip the call entirely and you won't be bothered.

> A full implementation can be smarter and avoid those issues.  The only problem
> would be when the user uses argv arguments that come totally from somewhere
> else, not the argv passed to main.
How do you propose to solve that issue? All those files have the same path in locale encoding (?.txt). It's not possible to distinguish between them. Also consider files with names in Asian languages on an English machine, where all characters are ?'s in locale encoding - the filenames would only have to be the same length for this problem to occur. If all that is received in argv is "????.svg ????.svg ????.svg" and the user removes one of those filenames from argv before parsing, there's really no way to tell which one was removed.

I think the allowed operations on argv before parsing should be either everything (2) or nothing (1), not mostly anything with caveats (3)
Comment 26 Behdad Esfahbod 2008-12-14 00:01:50 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > If I don't care about Windows, the call is useless.
> I can change the behavior so that the warning is only generated on Windows, so
> if you don't care you can skip the call entirely and you won't be bothered.

I still will be bothered when I get reports that my software does not work on Windows.


> > A full implementation can be smarter and avoid those issues.  The only problem
> > would be when the user uses argv arguments that come totally from somewhere
> > else, not the argv passed to main.
> How do you propose to solve that issue? All those files have the same path in
> locale encoding (?.txt). It's not possible to distinguish between them. Also
> consider files with names in Asian languages on an English machine, where all
> characters are ?'s in locale encoding - the filenames would only have to be the
> same length for this problem to occur. If all that is received in argv is
> "????.svg ????.svg ????.svg" and the user removes one of those filenames from
> argv before parsing, there's really no way to tell which one was removed.
> 
> I think the allowed operations on argv before parsing should be either
> everything (2) or nothing (1), not mostly anything with caveats (3)

I disagree.
Comment 27 Tor Lillqvist 2008-12-14 13:34:10 UTC
> "let's assume argc and argv are from the command line, and 
> do the smart thing"

I might be misunderstanding what you meant here, but anyway: No such assumption is necessary. As far as I can see, it is perfectlty possible and reliable to actually check whether one of the char pointers in an argument vector passed to g_option_whateveritwas() is the same as one of the ones in the char *argv passed to main(), because the Microsoft C library generously provides an API to get at the argv passed to main() from anywhere, see __argc and __argv in stdlib.h. (The work like global variables but are actually macros that expand to function calls.)
Comment 28 Behdad Esfahbod 2008-12-14 18:36:11 UTC
Oh right, the missing piece!  The pointers are equal...  So yeah, that's what we should do.
Comment 29 Krzysztof Kosiński 2008-12-14 21:10:24 UTC
I saw __argc and __argv mentioned somewhere but I forgot about them, I think this may be the best idea so far. Working on next patch.
Comment 30 Behdad Esfahbod 2008-12-14 21:42:33 UTC
Note that if user modifies argv, that probably also modifies __argv.  That is, those are the same array.  So it has the same limitations still.
Comment 31 Krzysztof Kosiński 2008-12-14 21:56:21 UTC
I'll check whether modifying argv also modifies __argv. If it does, it's bad, since the only thing that makes sense is checking whether __argv == *argv, but that doesn't get us any closer to doing the right thing. Moreover it destroys the 1:1 correspondence between Unicode and ANSI argument vectors, which this idea is based upon.

If the __argv idea fails, maybe we should combine solutions 2 and 3 to give a free hand to those who want to play with argv as well as fixing behavior for the common case without any extra calls?
Comment 32 Tor Lillqvist 2008-12-14 22:07:04 UTC
Surely it makes more sense to check whether individual char pointers in the vector passed to g_option_context_parse() match one of those in __argv rather than checking whether the char** pointer matches __argv?
Comment 33 Behdad Esfahbod 2008-12-14 22:14:51 UTC
What I'm saying is that if argv == __argv, then argv[i] == __argv[i] for all i, but that's useless because __argv is not the original argument array anymore, it's the modified version.  But it's still useful.  If __argv and the Unicode arg list are of the same length, we can assume that __argv is the original thing and go ahead, match argv entries against it and replace the Unicode ones.
Comment 34 Krzysztof Kosiński 2008-12-14 22:37:37 UTC
(In reply to comment #32)
> Surely it makes more sense to check whether individual char pointers in the
> vector passed to g_option_context_parse() match one of those in __argv rather
> than checking whether the char** pointer matches __argv?

I checked this. Not only argv == __argv and argv[0] == __argv[0], but also
when I do argv[0] = "something else than it was";, argv[0] == __argv[0] still
holds. It seems that __argv is a stack lookup macro. So using those brings us
no advantage other than knowing whether the passed argv comes from main()
arguments (see below).

(In reply to comment #33)
> But it's still useful.  If __argv and the Unicode
> arg list are of the same length, we can assume that __argv is the original
> thing and go ahead, match argv entries against it and replace the Unicode ones.

Proposed solution 4:

if argv == __argv and argc == __argc
  parse the return value of GetCommandLineW(), ignoring argv
else
  parse argv

This behavior is fairly clear, easy to implement and verify, it does the sane thing when arguments are unrelated to main(), and it satisfies all use cases I have for GOption in Inkscape. Is it OK?
Comment 35 Tor Lillqvist 2008-12-14 22:50:06 UTC
What I am getting at is that a program might well do something like:

allocate a fresh array of char pointers
into some elements of that, copy some of the argv[i] arguments to main(),
into others insert random other strings, like partially hardcoded file names, option flags, whatever
then pass that constructed array of char pointers to g_option_context_parse()

Now, in such a case, clearly it would be benefical to notice in g_option_context_parse() which individual element in the passed-in argument vector equals one of the __argv members, and thus is known to not necessarily have survived the conversion from Unicode to system codepage intact, and for which instead be corresponding argument from GetCommandLineW() + CommandLineToArgvW() + g_utf16_to_utf8() should be used.
Comment 36 Behdad Esfahbod 2008-12-14 23:03:39 UTC
(In reply to comment #34)
> (In reply to comment #32)
> > Surely it makes more sense to check whether individual char pointers in the
> > vector passed to g_option_context_parse() match one of those in __argv rather
> > than checking whether the char** pointer matches __argv?
> 
> I checked this. Not only argv == __argv and argv[0] == __argv[0], but also
> when I do argv[0] = "something else than it was";, argv[0] == __argv[0] still
> holds. It seems that __argv is a stack lookup macro.

No.  It's just that argv and __argv are pointers pointing to the same array of pointers.

> So using those brings us
> no advantage other than knowing whether the passed argv comes from main()
> arguments (see below).
> 
> (In reply to comment #33)
> > But it's still useful.  If __argv and the Unicode
> > arg list are of the same length, we can assume that __argv is the original
> > thing and go ahead, match argv entries against it and replace the Unicode ones.
> 
> Proposed solution 4:
> 
> if argv == __argv and argc == __argc
>   parse the return value of GetCommandLineW(), ignoring argv
> else
>   parse argv
> 
> This behavior is fairly clear, easy to implement and verify, it does the sane
> thing when arguments are unrelated to main(), and it satisfies all use cases I
> have for GOption in Inkscape. Is it OK?

That should definitely be done. But I agree with Tor that we can be smarter.  We can then document that if user wants to modify argv, they should make a copy and modify the copy.

Comment 37 Krzysztof Kosiński 2008-12-14 23:44:07 UTC
(In reply to comment #36)
> No.  It's just that argv and __argv are pointers pointing to the same array of
> pointers.
Tor said that those are macros that expand to some function calls; What I wanted to say is that I suspect those functions (or whatever they are, maybe assembler inlines) just go to the bottom of the stack to retrieve the parameters of the outermost function, which is main(). But that's a technical detail which doesn't concern us.

(In reply to comment #35)
> What I am getting at is that a program might well do something like:
> 
> allocate a fresh array of char pointers
> into some elements of that, copy some of the argv[i] arguments to main(),
> into others insert random other strings, like partially hardcoded file names,
> option flags, whatever
> then pass that constructed array of char pointers to g_option_context_parse()
OK, I'll try to cover this case as well, since I guess it won't be terribly complex.
Comment 38 Krzysztof Kosiński 2008-12-16 01:06:29 UTC
Created attachment 124767 [details] [review]
Patch using GetCommandLineW with __argv check

This one uses GetCommandLineW when __argv == *argv, __argc == *argc and the number of arguments in the Unicode command line is the same. Otherwise, for each argument it checks whether the char pointer is the same as any of those in __argv, and uses the corresponding Unicode argument if it is. This depends on __argv not being modified before the call to g_option_context_parse. Not tested on Windows.
Comment 39 Behdad Esfahbod 2008-12-21 16:20:51 UTC
Krzysztof, please attach a patch generated using the '-u' option.  Also, keep your other changes out (submit them as another bug) and only include the changes needed to resolve this bug.  Thanks.
Comment 40 Krzysztof Kosiński 2008-12-24 19:47:43 UTC
Created attachment 125276 [details] [review]
Patch using GetCommandLineW with __argv check - corrected

Removed unrelated g_string_append -> g_string_append_c changes and rediffed with diff -ru.
Comment 41 Krzysztof Kosiński 2008-12-30 23:37:45 UTC
I tried to test this patch but failed rather miserably. Can someone point me to some instructions on how to build Glib on Windows? I tried to use MSYS but failed when I had to build pkg-config, which doesn't seem to be available as Windows binaries, and compilation attempts fail because of errors about unsupported universal characters...
Comment 42 Tor Lillqvist 2008-12-30 23:55:41 UTC
> Can someone point me to some instructions on how to build 
> Glib on Windows?

In theory, ./configure --prefix=whatever --disable-gtk-doc && make ;)

I.e., the problem is not so much building, but making sure that you have a working build environment. Make sure of that first, with something simpler that GLib. And make sure you initially learn to build from a GLib tarball, to avoid the additional problem of having to have a working autoconf/automake/libtool/etc installation. 

As you said, MSYS is the recommended native build environment. (Not Cygwin, not even if you understand to use gcc -mno-cygwin. There will still be confusion sooner or later.) And cross-compilation from Linux is said to be even easier, or at least faster.

> I had to build pkg-config, which doesn't seem to be available as
> Windows binaries

http://ftp.gnome.org/pub/GNOME/binaries/win32/dependencies/pkg-config-0.23-2.zip

which is linked to from http://www.gtk.org/download-windows.html , even

> compilation attempts fail because of errors about
> unsupported universal characters...

Sorry, that is a total wtf to me;)
Comment 43 Krzysztof Kosiński 2009-01-09 22:13:23 UTC
I have tested the patch, and it works. I can successfully use g_file_from_commandline_arg on Unicode filenames. I'll attach the test program.
Comment 44 Krzysztof Kosiński 2009-01-09 22:35:05 UTC
Created attachment 126142 [details]
GOption Windows test program

Test program. Note that cmd.exe itself is not unicode compliant, and generally it's impossible to specify an Unicode file from the command prompt (!). Globbing doesn't work as well (see below). I had to use the "Run..." box from the Start menu and specify the file manually.

There is an added layer of braindeadness that I wasn't aware of before: the Unicode command line has no glob expansions when the program is invoked from cmd.exe. So if the glob matches only one file, it is passed as the filename. Two files will trip the argc check, so it works correctly in this case.

Theoretically ? and * are not legal filename characters on Windows - should I add a check for this and ignore the Unicode command line when globs are detected in it?
Comment 45 Krzysztof Kosiński 2009-01-09 22:37:44 UTC
Created attachment 126143 [details]
GOption Windows test program

Test program. Note that cmd.exe itself is not unicode compliant, and generally it's impossible to specify an Unicode file from the command prompt (!). Globbing doesn't work as well (see below). I had to use the "Run..." box from the Start menu and specify the file manually.

There is an added layer of braindeadness that I wasn't aware of before: the Unicode command line has no glob expansions when the program is invoked from cmd.exe. So if the glob matches only one file, it is passed as the filename. Two files will trip the argc check, so it works correctly in this case.

Theoretically ? and * are not legal filename characters on Windows, so I'll add a check for this.
Comment 46 Krzysztof Kosiński 2009-01-09 23:43:30 UTC
Created attachment 126148 [details] [review]
Patch using CommandLineW + __argv check + glob check

Patch with a glob check added. Works on Windows.
Comment 47 Krzysztof Kosiński 2009-01-11 16:11:42 UTC
Created attachment 126223 [details] [review]
Patch using CommandLineW + __argv check + glob check v2

Same as before, but replaced g_convert calls with g_utf16_to_utf8 calls.
Comment 48 Krzysztof Kosiński 2009-01-18 21:07:05 UTC
Will this patch make it into Glib 2.20? Is there anything else I have to do?
Comment 49 Josh Andler 2009-09-22 04:38:30 UTC
I'm commenting on this to check to see if Krzysztof's patch fell through the cracks and if not, if it is acceptable. This really needs to get fixed.
Comment 50 Steffen Macke 2009-12-31 14:21:43 UTC
Dia is suffering from this problem (bug #570592). 
Any news regarding Krzystof's patch? Is there anything missing?
Comment 51 Krzysztof Kosiński 2010-01-02 18:57:13 UTC
My patches aren't good, for example I didn't know about the global __wargv that contains the glob-expanded UTF-16 command line. Moreover they subtly change the behavior of existing API.

A better way would be to add a new version of g_option_context_parse function (somewhat like Tor Lillqvist's patch from comment #3) that 'does the correct thing'. For example, there could be a new structure, GCommandLine, which would represent the command line and could be created from various sources.

I started a thread on gtk-devel-list about the best way of fixing this, though no response yet.
Comment 52 Krzysztof Kosiński 2010-01-29 23:27:11 UTC
Created attachment 152608 [details] [review]
New GCommandLine API

The best way to fix this is to abstract away the command line as an opaque structure.

I introduced a very simple GCommandLine structure, as well as a new function called g_option_context_parse_command_line that accepts it. Contents of GCommandLine are always in UTF-8, which simplifies some code. Memory management is adjusted to reflect the fact that we own all strings in a GCommandLine. The old g_option_context_parse function calls the new one and converts the remaining arguments back to locale encoding.
Comment 53 Sven Herzberg 2010-02-02 15:19:36 UTC
(In reply to comment #48)
> Will this patch make it into Glib 2.20? Is there anything else I have to do?

What I'd like to see is a proper test case in glib-2.xx.x/glib/tests.

It should be pretty simple to test exactly the patterns that don't work yet. This would allow people like Tor to quickly apply the patch and include it in glib, if it's okay.
Comment 54 Krzysztof Kosiński 2010-03-23 18:04:55 UTC
If someone has directions on how to make the GLib Git checkout compile on Windows, then I'd be glad to follow them, otherwise you'll have to add the test yourself. The idea is contained in the attached files but after many hours of trying I've determined I'm not cool enough to integrate this into the Windows build process.
Comment 55 Krzysztof Kosiński 2010-03-23 18:07:06 UTC
Created attachment 156898 [details]
Helper program

This helper program should be executed from option-context.c using g_spawn_sync.
Comment 56 Krzysztof Kosiński 2010-03-23 18:11:57 UTC
This is how the code in options-context.c could look. It's a bit sloppy because I added the g_build_filename bit as an afterthought.

void
windows_unicode_test (void)
{
  gint exit, i;
  gboolean ret;
  gchar *fn[4] = {
    "goption-test-ąśćżÓŁŃ.txt",
    "goption-test-ЕЖЗклмн.txt",
    "goption-test-ﺺﺻﺼﺑﺒﺓﺀ.txt",
    "goption-test-₡₪⅔℅№Ω↕.txt"
  };

  g_test_bug ("522131");

  for (i = 0; i < 4; ++i) {
    gchar *file = g_build_filename (g_get_tmp_dir (), fn[i]);
    g_file_set_contents (file, fn[i], -1, NULL);
    g_free (file);
  }

  for (i = 0; i < 4; ++i) {
    gchar *file = g_build_filename (g_get_tmp_dir (), fn[i]);
    gchar *argv[] = {
      "option-unicode.exe",
      "--file",
      file
    };

    ret = g_spawn_sync (NULL, argv, NULL,
                        G_SPAWN_CHILD_INHERITS_STDIN,
                        NULL, NULL, NULL, NULL, &exit, NULL);
    g_assert (ret == TRUE);
    g_assert (exit == 0);
    g_free (file);
  }

  for (i = 0; i < 4; ++i) {
    gchar *file = g_build_filename (g_get_tmp_dir (), fn[i]);
    g_unlink (file);
    g_free (file);
  }
}

By the way, all existing tests in option-context run correctly with my patch.
Comment 57 Behdad Esfahbod 2010-03-31 15:30:14 UTC
I don't feel like following this bug anymore.  If someone wants to pick it up, go for it.
Comment 58 Sven Herzberg 2010-06-02 12:45:43 UTC
(In reply to comment #56)
> This is how the code in options-context.c could look. It's a bit sloppy because
> I added the g_build_filename bit as an afterthought.
> 
> void
> windows_unicode_test (void)
> {
>   gint exit, i;
>   gboolean ret;
>   gchar *fn[4] = {
>     "goption-test-ąśćżÓŁŃ.txt",
>     "goption-test-ЕЖЗклмн.txt",
>     "goption-test-ﺺﺻﺼﺑﺒﺓﺀ.txt",
>     "goption-test-₡₪⅔℅№Ω↕.txt"
>   };

You should use "gchar *fn[] = {" in order to make this more easily expandable.

> 
>   g_test_bug ("522131");
> 
>   for (i = 0; i < 4; ++i) {

You should use G_N_ELEMENTS(fn) instead of the literal 4.

>     gchar *file = g_build_filename (g_get_tmp_dir (), fn[i]);
>     g_file_set_contents (file, fn[i], -1, NULL);
>     g_free (file);
>   }
> 
>   for (i = 0; i < 4; ++i) {

Same here.

>     gchar *file = g_build_filename (g_get_tmp_dir (), fn[i]);
>     gchar *argv[] = {
>       "option-unicode.exe",
>       "--file",
>       file
>     };

This one has a missing NULL sentinel.

> By the way, all existing tests in option-context run correctly with my patch.

Great. I'd like to look into this, once I recreated my setup for building windows things.
Comment 59 Colin Walters 2010-06-02 16:57:57 UTC
Review of attachment 152608 [details] [review]:

I think this patch is going in the right direction.  We should register GCommandLine as a boxed type in gobject.

Also missing some introspection annotations:

::: glib/goption.c
@@ +2008,3 @@
+ * @context: a #GOptionContext
+ * @argc: a pointer to the number of command line arguments
+ * @argv: a pointer to the array of command line arguments

* @argc: (inout): a pointer to the number of command line arguments
 * @argv: (array length=argc) (inout) (allow-none): a pointer to the array of command line arguments

@@ +2463,3 @@
+ * g_command_line_new_from_locale:
+ * @argc: Number of arguments, or -1 if argv is NULL-terminated
+ * @argv: Array of arguments

@argv: (array length=argc): Array of arguments

@@ +2501,3 @@
+ * g_command_line_new_from_utf8:
+ * @argc: Number of arguments, or -1 if argv is NULL-terminated
+ * @argv: Array of arguments

@argv: (array length=argc): Array of arguments

@@ +2555,3 @@
+  copy = g_new0 (GCommandLine, 1);
+  copy->argc = command_line->argc;
+  copy->argv = g_strdupv (command_line->argv);

Using g_strdupv isn't actually right here since it assumes it's NULL terminated, right?

@@ +2565,3 @@
+ * @command_line: The command line
+ * @argc: Place to store the argument count, or %NULL
+ * @argv: Place to store the array of arguments, or %NULL

These should be annotated:

 * @argc: (out) (allow-none): Place to store the argument count, or %NULL
 * @argv: (array length=argc) (out) (allow-none): Place to store the array of arguments, or %NULL
Comment 60 Colin Walters 2010-06-02 17:07:04 UTC
Review of attachment 152608 [details] [review]:

One more thought; does this imply we also need e.g. gtk_init_with_command_line?
Comment 61 Christian Persch 2010-06-02 18:25:15 UTC
First, this patch is an incompatible change wrt. filename arguments: g_option_context_parse creates a GCommandLine with g_command_line_new_from_locale() then parses that with g_option_context_parse_command_line(). Now suppose we have a FILENAME or FILENAME_ARRAY option, or a CALLBACK option with FLAG_FILENAME. The current code will use the original arg string as filename resp. pass it to the callback, while with this patch it will use/pass g_filename_from_utf8(g_locale_to_utf8(arg string)) which it in no way guaranteed to return the original arg string.

Moreover, g_command_line_new_from_locale just assumes g_locale_to_utf8 succeeds for each arg string. I don't see why this would be true? g_locale_to_utf8 docs says it may return NULL.

Next, after parsing the converted-to-UTF-8 arguments, g_option_context_parse() sets the inout ***argv array's to keep only the non-consumed arguments, but the strings it uses are g_locale_from_utf8 (command_line->argv[i]). Again, this is not guaranteed to be the same as the original arg string.

Also, I don't really like the whole approach. Hidden behing all the newGCommandLine API is the simple truth that all it does *for the caller* is

#if G_PLATFORM_WIN32
g_option_context_parse_win32_wargv(context, &err);
#else
g_option_context_parse(context, &my_argc, &my_argv, &err);
#endif

Finally, GCommandLine is just a fancy name for a { GStrv, len } struct, why not simply use GStrv in the API since len can be easly had from g_strv_length() when necessary.
Comment 62 Colin Walters 2010-06-02 19:10:31 UTC
(In reply to comment #61)
> [lots of good points about encoding]

That is a pretty major problem, yes.

> Also, I don't really like the whole approach. Hidden behing all the
> newGCommandLine API is the simple truth that all it does *for the caller* is
> 
> #if G_PLATFORM_WIN32
> g_option_context_parse_win32_wargv(context, &err);
> #else
> g_option_context_parse(context, &my_argc, &my_argv, &err);
> #endif

Well, on Windows then there'd be no way to get at the mutated argument list then, no?

Note on Linux at least, we can easily get argv from /proc/self/cmdline.  How about then an API like this?

char **
g_option_context_parse_default (context, &err)

which on Windows uses __wargv, and on Linux would parse /proc/self/cmdline, returning the remaining args as a UTF-8 encoded GStrv?
Comment 63 Christian Persch 2010-06-02 22:13:49 UTC
I think this bug got too much sidetracked with the idea of pressing the wargs  through g_option_context_parse(). Esp. the idea of mangling the argv array passed to parse() (comment 10 .. comment 37) is wrong, IMHO.


I much more like the API from the early patch (attachment 116044 [details] [review]), that is to add a g_option_context_parse_utf8() that acts just like g_option_context_parse() except that it treats argv as array of UTF-8 strings. Together with a new

char **g_win32_get_argv(int *argc);

that does what g_command_line_new_from_win32_wargv() from the latest patch does,
this should cover all cases, even allowing to manually preprocess the win32 argv before passing it to parse_utf8(), just like we can with the regular argv and parse().

Re the arguments from comment 20 onwards, I agree with comment 21 in that I don't think the argument from comment 20 about extra calls on win32 is right. Porting an app to win32 isn't a no-op anyway, so adding just one extra call and using parse_utf() instead of parse() isn't any extra inconvenience.
Comment 64 Matthias Clasen 2010-06-02 23:49:49 UTC
I concur that parse_utf8 sounds vastly simpler and better than introducing new opaque commandline representations.
Comment 65 Christian Persch 2010-06-03 13:19:26 UTC
Created attachment 162648 [details] [review]
Add g_option_context_parse_utf8

This patch implements g_option_context_parse_utf() and
g_win32_get_wargv() along the lines of comment 63.

Contrary to the original patch (attachment 116044 [details] [review]) I decided that
FILENAME arguments in UTF-8 mode should get the same treatment as in
non-UTF-8 mode, that is they get the exact argument string, and *not*
the string converted to filename encoding, since there's no guarantee it
will be representable in it.

Based on the patches by Tor Lillqvist and Krzysztof Kosiński.

Bug #522131.
Comment 66 Krzysztof Kosiński 2010-06-03 14:16:10 UTC
I would be satisfied with the g_option_context_parse_ut8() approach as well.
Comment 67 Krzysztof Kosiński 2010-06-03 14:24:47 UTC
> FILENAME arguments in UTF-8 mode should get the same treatment as in
> non-UTF-8 mode, that is they get the exact argument string, and *not*
> the string converted to filename encoding, since there's no guarantee it
> will be representable in it.

The UTF-8 mode would be mostly useful on Windows, and the filename encoding there is always UTF-8, so I guess it's OK.
Comment 68 Colin Walters 2010-06-03 14:55:31 UTC
Review of attachment 162648 [details] [review]:

So, application developers are going to have to do something like:

int
main (int argc, char **argv)
{
  GError *error = NULL;

#ifdef G_OS_WIN32
  char **wargv;
  int wargc;
  
  wargv = g_win32_get_wargv (&wargc);
  if (!g_option_context_parse_utf8 (context, wargc, wargv, &error))
    ...
#else
  if (!g_option_context_parse (context, argc, argv, &error))
    ...
#endif

Actually, this also implies you can no longer call e.g. gtk_init (); you *need* to get its option group.

This again leads me to desire a:

g_option_context_parse_default (context, &error);

That assumes we can get the arguments from the platform and DTRT.  We can then even for GTK3 change it to be: gtk_init (void) which is obviously nicer.

Alternatively,

g_option_context_parse_default (context, &argc, &argv, &error);

That's defined to use the platform arguments if possible, otherwise will use the C argc/argv.
Comment 69 Tor Lillqvist 2010-06-03 15:00:55 UTC
I haven't really read through all the recent comments, but just one note: Why the "w" in g_win32_get_wargv()? That makes me think of wide characters (wchar_t), while what apparently it gets is UTF-8. Wouldn't g_win32_get_utf8_argv() be a better name?
Comment 70 Hib Eris 2010-06-03 16:05:11 UTC
Created attachment 162666 [details] [review]
0001-Add-g_option_context_parse_command_line.patch

On top of patch 162648, I would like to suggest something along the lines of this patch. It adds a new function 

gboolean
g_option_context_parse_command_line (GOptionContext *context, GError **err)

This function can be used on either windows or linux (unix?) and finds the arguments from the command line and uses them to call g_option_context_parse(_utf8).

It has some assembler in it and I am not sure this is really portable, but it is at least proof that it can be done.
Comment 71 Hib Eris 2010-06-03 16:56:45 UTC
I forgot to mention, using __wargv does not seem to work for me, so I used CommandLineToArgvW(GetCommandLineW()) to get to the command line arguments.

I have read somewhere that this is the preferred method because __wargv is not always available. It had something to do with whether UNICODE is defined. Unfortunately, I cannot find that source anymore.
Comment 72 Krzysztof Kosiński 2010-06-04 00:01:23 UTC
(In reply to comment #71)
> I forgot to mention, using __wargv does not seem to work for me, so I used
> CommandLineToArgvW(GetCommandLineW()) to get to the command line arguments.

This is wrong because globs in the command line will not be expanded. You need to use the undocumented function __wgetmainargs(). Consult glib/gspawn-win32-helper.c for details.

> I have read somewhere that this is the preferred method because __wargv is not
> always available. It had something to do with whether UNICODE is defined.
> Unfortunately, I cannot find that source anymore.

GCC simply doesn't support __wargv, I learned about this the hard way.
Comment 73 Hib Eris 2010-06-05 17:00:35 UTC
Created attachment 162812 [details] [review]
0001-Add-g_option_context_parse_utf8.patch

I have split up Cristian's patch 102648 in two patches and changed a bit: 

1. 0001-Add-g_option_context_parse_utf8.patch
adding g_option_context_parse_utf(), mostly unchanged except for some whitespace.

2. 0002-Add-g_win32_get_utf8_argv.patch
adding g_win32_get_utf8_argv(), a rewrite from g_win32_get_wargv() using __wgetmainargs() instead of __wargv which is not available in the gcc compiler.
The function g_win32_get_utf8_argv() got an extra argument to control wildcard expansion (globbing).


Additionally, I rewrote my patch 162666 to use g_win32_get_utf8_argv().

3. 0003-Add-g_option_context_parse_command_line.patch
Comment 74 Hib Eris 2010-06-05 17:01:12 UTC
Created attachment 162813 [details] [review]
0002-Add-g_win32_get_utf8_argv.patch
Comment 75 Hib Eris 2010-06-05 17:02:48 UTC
Created attachment 162814 [details] [review]
0003-Add-g_option_context_parse_command_line.patch
Comment 76 Urmas 2011-02-17 23:29:28 UTC
So when to expect release?
Comment 77 André Klapper 2011-02-17 23:36:33 UTC
I'd rather ask for a patch review first. ;-)
Comment 78 Urmas 2011-03-30 03:57:05 UTC
I don't understand irresponcibility of developers. They are probably aware about other projects using this library. Maybe they think the only language in the world is english and only script is 26 letter latin?
Comment 79 André Klapper 2011-03-30 05:22:50 UTC
Urmas: You imply deliberate intention which is not the case. Please read http://live.gnome.org/CodeOfConduct before adding any further technically off-topic comments.
Comment 80 Behdad Esfahbod 2011-08-10 13:04:31 UTC
I hit this problem with mingw32 and wine, so I thought I give it a try.  I tried the patch in comment 74 as that's the approach I prefer.  But seems like wine's GetCommandLineW() also has '?'s.  Not sure if there's something wrong in how I'm invoking wine or they just don't implement it.  I will try when I get access to a Windows machine.  At any rate, I think the approach in that patch is the best way forward.  It fixes 99% of the cases without requiring any change to applications.
Comment 81 Krzysztof Kosiński 2011-08-10 14:06:20 UTC
The behavior of GetCommandLineW() you are seeing in Wine is correct - on Windows this function also does not expand blobs. To get glob expansion you need to call __wgetmainargs like in the patch. This function is already used in the spawn helper programs.
Comment 82 Behdad Esfahbod 2011-08-10 14:46:09 UTC
I tried with __wgetmainargs() too.  Still, with wine I only get "?"s.
Comment 83 Behdad Esfahbod 2011-08-10 16:16:50 UTC
Looks to me like a bug in wine.  Investigating.
Comment 84 Behdad Esfahbod 2011-08-11 05:55:06 UTC
Seems like the problem was mingw32 linking a static msvcrt.a.  So, not a wine bug.
Comment 85 Krzysztof Kosiński 2012-11-16 14:36:23 UTC
The function __wgetmainargs(), which is required for correct behavior (as discussed above), is now documented.
http://msdn.microsoft.com/en-us/library/ff770599.aspx
Comment 86 Max Mustermann 2012-12-20 05:20:37 UTC
*** Bug 659509 has been marked as a duplicate of this bug. ***
Comment 87 Max Mustermann 2012-12-20 05:23:40 UTC
Many of our GIMP users have this problem when trying to open a file with Non-Latin characters from their filemanagers. It was nice if somebody could finally fix it. If you need help testing it on Windows, then let me know.
Comment 88 Behdad Esfahbod 2012-12-20 05:42:13 UTC
Sven, to start with, can you try the patch in comment 74?  You need a bit shuffling to get it working.
Comment 89 Max Mustermann 2013-01-06 13:06:40 UTC
*** Bug 691169 has been marked as a duplicate of this bug. ***
Comment 90 Moltres_rider 2013-01-07 01:08:30 UTC
SERIOUSLY DEVS!!! I see that this bug dates back to 2008!!! *reality check* IT IS 2013!!!

meanwhile the devs are fiddling with making cute images in Gimp, people are unable to open files with Japanese characters in filenames in Gimp...
Comment 91 Behdad Esfahbod 2013-01-07 01:23:49 UTC
Most of us "DEVS" don't have access to Windows.  I tried fixing this using wine and mingw32, but hit a bug in mingw32's implementation that make it impossible to move forward.
Comment 92 Michael Natterer 2013-01-07 01:29:06 UTC
*** Bug 691169 has been marked as a duplicate of this bug. ***
Comment 93 Hib Eris 2013-01-07 13:06:27 UTC
Let's summarize this bug:

On Windows, the arguments of the function main(argv, argc) are in a locale
encoding that cannot represent all arguments that can be given to an
application at startup. For example, when you double-click a file with a
filename that contains Chinese characters on an English Windows version,
the argv from main() does not properly represent the filename.

To handle this properly, an application has to:

1. Get the application arguments from a function other than main() in an
   encoding that can represent them.
2. Parse these arguments.

On Windows it seems best to use __wgetmainargs() to get the application
arguments in the unicode encoding that can properly represent them. For
glib, a more natural choice for the encoding is utf8. The patch from
comment 74 adds a function g_win32_get_utf8_argv() to glib that gets
the application startup arguments encoded in utf8.

When parsing arguments with g_option_context_parse(), arguments are
considered to be in the locale encoding. The patch from comment 73 adds
a function g_option_context_parse_utf8() to glib that does the same
as g_option_context_parse(), except that it considers its arguments to
be in the utf8 encoding.

With these two new glib functions, an application using g_option can
handle its startup arguments correctly using code like this:

#ifdef G_OS_WIN32
	int argc_utf8;
	gchar** argv_utf8;
	argv_utf8 = g_win32_get_utf8_argv (TRUE, &argc_utf8);
	g_option_context_parse_utf8 (context, &argc_utf8, &argv_utf8, error);
#else
	g_option_context_parse (context, argc, argv, error);
#endif
Comment 94 Behdad Esfahbod 2013-02-05 19:27:28 UTC
Review of attachment 162814 [details] [review]:

::: glib/goption.c
@@ +2080,3 @@
+#ifndef G_OS_WIN32
+static
+void getmainargs(int *argc,

This function is a horrible hack.  We can't put this in glib!
Comment 95 Hib Eris 2013-02-05 21:46:34 UTC
(In reply to comment #94)
> Review of attachment 162814 [details] [review]:
> 
> ::: glib/goption.c
> @@ +2080,3 @@
> +#ifndef G_OS_WIN32
> +static
> +void getmainargs(int *argc,
> 
> This function is a horrible hack.  We can't put this in glib!

I agree with you that that patch is a hack. It is only a proof of concept for creating an uniform API that works on both Windows and Unix.

But I think we can better postpone discussing an uniform API and first discuss adding the functionality for glib that can allow application writers to use GOption on Windows, i.e. g_option_context_parse_utf8(): attachment 162812 [details] [review] and g_win32_get_utf8_argv(): attachment 162813 [details] [review]
Comment 96 Behdad Esfahbod 2013-02-05 22:34:07 UTC
(In reply to comment #95)
> (In reply to comment #94)
> > Review of attachment 162814 [details] [review] [details]:
> > 
> > ::: glib/goption.c
> > @@ +2080,3 @@
> > +#ifndef G_OS_WIN32
> > +static
> > +void getmainargs(int *argc,
> > 
> > This function is a horrible hack.  We can't put this in glib!
> 
> I agree with you that that patch is a hack. It is only a proof of concept for
> creating an uniform API that works on both Windows and Unix.
> 
> But I think we can better postpone discussing an uniform API and first discuss
> adding the functionality for glib that can allow application writers to use
> GOption on Windows, i.e. g_option_context_parse_utf8(): attachment 162812 [details] [review] and
> g_win32_get_utf8_argv(): attachment 162813 [details] [review]

Yes, I agree that those two can go in as is.  I'll ping Matthias to get them in.
Comment 97 Hib Eris 2013-02-06 07:42:30 UTC
(In reply to comment #96)
> (In reply to comment #95)
> > (In reply to comment #94)
> > > Review of attachment 162814 [details] [review] [details] [details]:
> > > 
> > > ::: glib/goption.c
> > > @@ +2080,3 @@
> > > +#ifndef G_OS_WIN32
> > > +static
> > > +void getmainargs(int *argc,
> > > 
> > > This function is a horrible hack.  We can't put this in glib!
> > 
> > I agree with you that that patch is a hack. It is only a proof of concept for
> > creating an uniform API that works on both Windows and Unix.
> > 
> > But I think we can better postpone discussing an uniform API and first discuss
> > adding the functionality for glib that can allow application writers to use
> > GOption on Windows, i.e. g_option_context_parse_utf8(): attachment 162812 [details] [review] [details] and
> > g_win32_get_utf8_argv(): attachment 162813 [details] [review] [details]
> 
> Yes, I agree that those two can go in as is.  I'll ping Matthias to get them
> in.

Okay, I will update my patches first to apply to current master.
Comment 98 Hib Eris 2013-02-06 09:59:20 UTC
Created attachment 235296 [details] [review]
0001-Add-g_option_context_parse_utf8.patch

g_option_context_parse_utf8(), updated for current master (2.35.7).
Comment 99 Hib Eris 2013-02-06 10:00:23 UTC
Created attachment 235297 [details] [review]
0002-Add-g_win32_get_utf8_argv.patch

g_win32_get_utf8_argv(), updated for current master (2.35.7)
Comment 100 Hib Eris 2013-03-08 09:10:10 UTC
Created attachment 238363 [details] [review]
0001-Add-g_option_context_parse_utf8.patch

Updated patch to make the diff smaller for easier review.
Comment 101 Hib Eris 2013-03-08 09:13:00 UTC
Created attachment 238364 [details] [review]
0002-Reorder-code-to-avoid-forward-declaration.patch

This code reorders the code to avoid a forward declaration. It might be merged with patch 238363, but is separated to make patch review easier.
Comment 102 Hib Eris 2013-03-08 09:13:50 UTC
Created attachment 238365 [details] [review]
0003-Add-g_win32_get_utf8_argv.patch
Comment 103 Chris Mohler 2013-07-15 20:05:37 UTC
*** Bug 704252 has been marked as a duplicate of this bug. ***
Comment 104 Allison Karlitskaya (desrt) 2014-01-22 22:59:37 UTC
I'm sorry that I didn't find this bug earlier -- I was just pointed to it now.  This has hopefully been fixed now with the introduction of g_win32_get_command_line() and g_option_context_parse_strv() (both of which deal in UTF-8).

*** This bug has been marked as a duplicate of bug 722025 ***
Comment 105 Murray Cumming 2014-01-29 10:32:48 UTC
If g_option_context_parse_strv() handle string encodings somehow differently than g_option_context_parse(), shouldn't their documentation mention that?
https://developer.gnome.org/glib/unstable/glib-Commandline-option-parser.html#g-
Comment 106 Murray Cumming 2014-01-29 11:13:33 UTC
Maybe the g_option_context_parse_strv() documentation should just mention that it's what you'd need to use if you need to use g_win32_get_command_line(), which you need to use on windows.

By the way, I'd rather have a do_the_right_thing_depending_on_what_platform_this_is() function, rather than having to ifdef uses of different functions, but I guess you had good reasons not to do that.
Comment 107 Allison Karlitskaya (desrt) 2014-01-29 13:50:51 UTC
(In reply to comment #105)
> If g_option_context_parse_strv() handle string encodings somehow differently
> than g_option_context_parse(), shouldn't their documentation mention that?
> https://developer.gnome.org/glib/unstable/glib-Commandline-option-parser.html#g-

It is mentioned in the docs, but the docs are not updated on the page yet:

https://git.gnome.org/browse/glib/tree/glib/goption.c#n2546

(In reply to comment #106)
> Maybe the g_option_context_parse_strv() documentation should just mention that
> it's what you'd need to use if you need to use g_win32_get_command_line(),
> which you need to use on windows.

Also already mentioned, including a code sample: https://git.gnome.org/browse/glib/tree/glib/goption.c#n134

> By the way, I'd rather have a
> do_the_right_thing_depending_on_what_platform_this_is() function, rather than
> having to ifdef uses of different functions, but I guess you had good reasons
> not to do that.

It's difficult to do this because on UNIX we have the command line arguments coming via argv, with no other way to get them.  On Win32, we want to completely ignore argv, but we have another mechanism.

Alex proposed a function that would take argv and pass it straight through on UNIX, and completely ignore it on Windows.  This seemed a bit too gross to me, which is why we didn't do it.

If you use GApplication then everything will be handled automatically.
Comment 108 Murray Cumming 2014-01-30 12:22:10 UTC
Yes, but, sorry, I still find this documentation on the function vague:
"
 * On Windows, the strings are expected to be in UTF-8.  This is in
 * contrast to g_option_context_parse() which expects them to be in the
 * system codepage, which is how they are passed as @argv to main().
 * See g_win32_get_command_line() for a solution.
"

That seems like an explanation of the implementation and a therefore a roundabout way of saying "On Windows, you'll need to use this, instead of g_option_context_parse(), and you'll need to use g_win32_get_command_line() to get the arguments first."

Could we add something like that before the existing paragraph that I've quoted above?