After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 111848 - function to canonicalize file names
function to canonicalize file names
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other All
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
: 152751 322822 394577 615899 629171 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2003-04-29 16:43 UTC by Havoc Pennington
Modified: 2018-06-08 17:33 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
port of canonicalize_file_name to glib (5.70 KB, text/plain)
2003-04-29 16:43 UTC, Havoc Pennington
  Details
Implement g_canonicalize_filename (6.58 KB, patch)
2010-03-16 18:37 UTC, Christian Dywan
none Details | Review
Implement g_canonicalize_filename #2 (8.69 KB, patch)
2010-03-30 15:15 UTC, Christian Dywan
none Details | Review
updated patch of comment #17, using flags instead of boolean (13.22 KB, patch)
2011-03-11 10:09 UTC, Paolo Bonzini
none Details | Review
fileutils: Add g_canonicalize_filename (6.73 KB, patch)
2018-04-27 16:09 UTC, Georges Basile Stavracas Neto
none Details | Review
glocalfile: Use g_canonicalize_filename (3.74 KB, patch)
2018-04-27 16:09 UTC, Georges Basile Stavracas Neto
none Details | Review
timezone: Correctly resolve symlink from /etc/localtime (1.37 KB, patch)
2018-04-27 16:09 UTC, Georges Basile Stavracas Neto
none Details | Review
timezone: Correctly resolve symlink from /etc/localtime (1.37 KB, patch)
2018-04-27 21:29 UTC, Georges Basile Stavracas Neto
none Details | Review
fileutils: Add g_canonicalize_filename (7.60 KB, patch)
2018-04-27 21:31 UTC, Georges Basile Stavracas Neto
none Details | Review
glocalfile: Use g_canonicalize_filename (3.74 KB, patch)
2018-04-27 21:31 UTC, Georges Basile Stavracas Neto
committed Details | Review
timezone: Correctly resolve symlink from /etc/localtime (1.37 KB, patch)
2018-04-27 21:32 UTC, Georges Basile Stavracas Neto
committed Details | Review
fileutils: Add g_canonicalize_filename (7.58 KB, patch)
2018-04-28 12:24 UTC, Georges Basile Stavracas Neto
none Details | Review
fileutils: Add g_canonicalize_filename (8.53 KB, patch)
2018-04-30 19:06 UTC, Georges Basile Stavracas Neto
none Details | Review
fileutils: Add g_canonicalize_filename (8.60 KB, patch)
2018-04-30 19:14 UTC, Georges Basile Stavracas Neto
committed Details | Review

Description Havoc Pennington 2003-04-29 16:43:13 UTC
glibc contains canonicalize_file_name() and realpath()

The trivial port to glib is attached. there may be issues of win32, etc.
to consider if it was added to glib.
Comment 1 Havoc Pennington 2003-04-29 16:43:53 UTC
Created attachment 16110 [details]
port of canonicalize_file_name to glib
Comment 2 Owen Taylor 2003-05-23 21:43:12 UTC
Thoughts:

 - I think textual canonicalization is frequently what
   you want instead of following symlinks. Symlinks in
   Unix are more frequently used for "make strange
   file system setup logical" rather than for shortcuts.

   If I go to /web/gtk on www.gtk.org, I don't want
   to see that as /gimp00/web-gtk/. And in bash  
   cd /web/gtk. cd .. gives you /web not /gimp00.

   Perhaps a flags field with FOLLOW_SYMLINKS?

   (I wrote a textual canonicalizer for GtkFileSystemUnix
   recently)

 - Windows does need to be dealt with. Some of it
   is just changing '/' to G_DIR_SEPARATOR, but there
   are other things as well:

   - Conditionalizing the symlink stuff
   - Dealing with drive letters at the start of the string
   - Not sure what the situation is with maximum path
     lengths under Windows. It probably doesn't matter
     since the result can't get longer without symlinks.
   - Adding any other "windows appropriate canonicalization"
Comment 3 Havoc Pennington 2003-05-30 04:39:12 UTC
What I'm using the symlink canonicalization for is a cache 
of file contents (indexed by canonical name because 
I don't want to deal with having two cache entries for the 
same file, e.g. if changing that file; though of course hard links
are also a problem).
Comment 4 Havoc Pennington 2003-05-30 04:41:47 UTC
In using this, it turned out to be useful to be able to canonicalize the 
filename even if the basename doesn't exist. 
i.e. if you have /foo/bar/baz still canonicalize that even with 
baz not existing.

Perhaps all this is a bit specific to my weird code. ;-)

Of course textual canonicalization is independent of file existence.
Comment 5 Matthias Clasen 2004-10-05 19:53:36 UTC
*** Bug 152751 has been marked as a duplicate of this bug. ***
Comment 6 Morten Welinder 2004-10-05 21:34:27 UTC
For the record, this came up in libgsf where in order to get atomic updates
we save to a temporary and a rename on top of the original.  That requires
expansion of the final path component because if it is a symlink it might
cross disks.
Comment 7 Tristan Van Berkom 2006-01-21 17:19:26 UTC
Hmmm,
    I just wrote glade_util_canonical_path() in glade3, I could easily
supply the code if desired (I'd just have refit the coding style)...

My implementation basicly does: 
    g_chdir ( g_path_get_dirname (path) );
    g_get_current_dir ();
    g_chdir ( original dir);
    return g_build_filename (dest_dir, basename (maybe null), NULL);

I decided it was a PITA to do any kind of algorythmic regexp
on the path since I'd have to handle path representations on
different platforms; and since libc's implementation of realpath
does the same "pushd; pwd; popd" thing; I thought it was an
acceptable approach.

I guess ideally this should work with unavailable paths and
provide an option to perform readlink() on systems bearing
symlinks...

Comment 8 Tristan Van Berkom 2006-01-21 17:29:34 UTC
*** Bug 322822 has been marked as a duplicate of this bug. ***
Comment 9 Christian Dywan 2010-03-16 18:37:20 UTC
Created attachment 156296 [details] [review]
Implement g_canonicalize_filename

I used the above patch and updated it. I removed the internal g_realpath, I removed the buffer argument, I removed the use of PATH_MAX in favour of pathconfig, I partly removed the requirement for the path to exist and I replaced / with G_DIR_SEPARATOR.

Left to do is support for resolving non-existing relative paths and handling C:\ and SERVER:\\ on Windows.
Comment 10 Morten Welinder 2010-03-16 19:08:34 UTC
If I read the code right, this actually changes "//foo" to "/foo".
Those two refer to different files.  (But "/foo//bar" is "/foo/bar".)
Comment 11 Christian Dywan 2010-03-16 19:46:45 UTC
(In reply to comment #10)
> If I read the code right, this actually changes "//foo" to "/foo".
> Those two refer to different files.  (But "/foo//bar" is "/foo/bar".)

The forward slash is no valid character in a unix filename and at best "//foo" is invalid if you don't strip the additional slash.
Comment 12 Morten Welinder 2010-03-16 19:58:42 UTC
welinder@sequoia:~> cd /tmp
welinder@sequoia:/tmp> cd //tmp
welinder@sequoia://tmp> cd ///tmp
welinder@sequoia:/tmp> 

Three or more initial slashes is the same as one.
Two is special:

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_11
Comment 13 Christian Dywan 2010-03-16 21:03:03 UTC
"A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner"

I'm not really sure that clarifies why //tmp shouldn't be resolved to /tmp. For what I want, dash and mksh resolve //tmp to /tmp, unlike bash like in your example.
Comment 14 Christian Dywan 2010-03-30 15:15:49 UTC
Created attachment 157503 [details] [review]
Implement g_canonicalize_filename #2

This update adds support for DOS and UNC paths, ie C:\Foo, C:Foo and \\HOST\Share. The resulting filename uses the native dir separator and \ for UNC paths. Separators except the first of an absolute path can be / or \. Additional unit tests cover that.
Right now the code is system agnostic. We could #ifdef G_OS_WIN32 or leave it.

Left to do: the resolve_links argument.
Comment 15 Tor Lillqvist 2010-03-30 16:22:00 UTC
To be consistent with other file name related GLib API, it presumably should take the name in the "GLib file name encoding" which is UTF-8 on Windows, and in that case it is wrong to concatenate what getcwd() fills in which is in "system codepage" with a relative path in UTF-8.

(And if it really would be intended to on Windows take the argument in the "system codepage", then it would be wrong to step through the file name one char at a time, as the East Asian codepages are double-byte ones, i.e. most characters take two bytes, and the trailing byte of some of these characters is a slash or backslash!)

For Windows, one probably often would wants this function to convert 8.3 names to long names if they exist. The second parameter could be a flag mask that directs whether to do that, or maybe the other way, in addition to resolving symlinks. Symlinks also exist on Windows, and also something similar called junctions. Even shortcuts can be considered similar in functionality, and it might be nice to have a flag to expand shortcuts, too.

Also, on Windows and other systems with case-insensitive but case-preseving file systems (at least Mac OS X I think), maybe convert the case of characters in names of existing files to that actually stored in the file system?

In general, complications like the above probably mean it's not worth to try to combine the same code for both Windows and Unix here.
Comment 16 Christian Dywan 2010-04-16 13:10:15 UTC
*** Bug 615899 has been marked as a duplicate of this bug. ***
Comment 17 Christian Dywan 2010-04-16 13:11:00 UTC
See https://bugzilla.gnome.org/attachment.cgi?id=158853 from bug 615899.
Comment 18 Paolo Bonzini 2010-04-19 16:34:46 UTC
I agree with Tor, for Win32 we can use GetFullPathNameW as in the patch of comment 17.
Comment 19 Tor Lillqvist 2010-09-09 14:33:21 UTC
*** Bug 629171 has been marked as a duplicate of this bug. ***
Comment 20 Paolo Bonzini 2011-01-20 17:25:13 UTC
ping for patch of comment 17?
Comment 21 Tor Lillqvist 2011-01-20 17:45:22 UTC
I think we should instead of just boolean resolve_symlinks have an argument of flag bits, or which G_WHATEVER_EXPAND_SYMLINKS would be an obvious one, and I as outlined in comment #15, we could come up with more, especially for Windows.
Comment 22 Matthew Barnes 2011-03-08 15:19:05 UTC
Another use case for this is in Camel, where we use realpath() to canonicalize the file path to a local mailbox, which may involve symlinks.  Portability issues with realpath() and PATH_MAX were cited recently in bug #642783.
Comment 23 Paolo Bonzini 2011-03-11 10:09:15 UTC
Created attachment 183124 [details] [review]
updated patch of comment #17, using flags instead of boolean
Comment 24 Paolo Bonzini 2012-04-12 12:12:30 UTC
Ping?
Comment 25 Luca Bruno 2012-04-12 17:44:38 UTC
Notice that g_file_get_path () does (almost?) the same job already.
Comment 26 Bastien Nocera 2012-12-18 17:00:52 UTC
(In reply to comment #25)
> Notice that g_file_get_path () does (almost?) the same job already.

It doesn't, and if you mean g_file_resolve_relative_path(), it isn't quite it, as you still need to use g_file_read_link() to read the link destination.
Comment 27 Bastien Nocera 2012-12-18 17:02:10 UTC
*** Bug 394577 has been marked as a duplicate of this bug. ***
Comment 28 Georges Basile Stavracas Neto 2018-04-27 16:09:25 UTC
Created attachment 371460 [details] [review]
fileutils: Add g_canonicalize_filename

Getting the canonical filename is a relatively common
operation when dealing with symbolic links.

This commit exposes GLocalFile's implementation of a
filename canonicalizer function, with a few additions
to make it more useful for consumers of it.

Instead of always assuming g_get_current_dir(), the
exposed function allows passing it as an additional
parameter.

This will be used to fix the GTimeZone code to retrieve
the local timezone from a zoneinfo symlink.
Comment 29 Georges Basile Stavracas Neto 2018-04-27 16:09:41 UTC
Created attachment 371461 [details] [review]
glocalfile: Use g_canonicalize_filename

Drop the local function in favor of the exposed
one.
Comment 30 Georges Basile Stavracas Neto 2018-04-27 16:09:53 UTC
Created attachment 371462 [details] [review]
timezone: Correctly resolve symlink from /etc/localtime

The return value of g_file_read_link ("/etc/localtime") can
be a relative path in the form of "../usr/share/zoneinfo".
This breaks the prefix check that is performed, and makes
the timezone identifier be "../usr/share/zoneinfo/America/Sao_Paulo",
for example, which breaks other parts of the system.

Fix that by canonicalizing the symlink path if we detect
is it a relative path.
Comment 31 Georges Basile Stavracas Neto 2018-04-27 21:29:56 UTC
Created attachment 371474 [details] [review]
timezone: Correctly resolve symlink from /etc/localtime

Added better tests, added the new function to the doc reference and renamed a variable.
Comment 32 Georges Basile Stavracas Neto 2018-04-27 21:31:18 UTC
Created attachment 371475 [details] [review]
fileutils: Add g_canonicalize_filename

I obviously meant to upload this one.
Comment 33 Georges Basile Stavracas Neto 2018-04-27 21:31:56 UTC
Created attachment 371476 [details] [review]
glocalfile: Use g_canonicalize_filename

Reuploading to maintain Bugzilla patch order.
Comment 34 Georges Basile Stavracas Neto 2018-04-27 21:32:41 UTC
Created attachment 371477 [details] [review]
timezone: Correctly resolve symlink from /etc/localtime

Reuploading to maintain Bugzilla patch order.
Comment 35 A. Walton 2018-04-27 22:54:41 UTC
(In reply to Georges Basile Stavracas Neto from comment #32)
> Created attachment 371475 [details] [review] [review]
> fileutils: Add g_canonicalize_filename
> 
> I obviously meant to upload this one.

You probably want to respin this one without using g_auto*, since glib is intended to be portable to compilers beyond GCC/Clang.
Comment 36 Georges Basile Stavracas Neto 2018-04-28 12:24:13 UTC
Created attachment 371493 [details] [review]
fileutils: Add g_canonicalize_filename

Drop g_autofree usage.
Comment 37 Philip Withnall 2018-04-30 15:26:47 UTC
Review of attachment 371476 [details] [review]:

Looks good pending review of the other patches.
Comment 38 Philip Withnall 2018-04-30 15:30:59 UTC
Review of attachment 371477 [details] [review]:

::: glib/gtimezone.c
@@ +448,3 @@
 
+      /* Resolve relative path */
+      if (!g_path_is_absolute (resolved_identifier))

Any reason to make this conditional? What if the symlink contains a non-canonical path? (Is that possible?)
Comment 39 Philip Withnall 2018-04-30 17:06:11 UTC
Review of attachment 371493 [details] [review]:

I was going to say that I think this should have a flags argument (as with previous approaches in thus bug, for example comment #23). However, I think it would be better to handle resolving symlinks (i.e. realpath() behaviour) in a separate function:
 • Resolving symlinks can fail, so such a function would need error handling. If you just want to canonicalise a path without doing I/O, you’d have to deal with the error handling for no reason.
 • There are various options about resolving symlinks, shortcuts, and case normalisation wrt the case actually stored in the file system (see comment #15). Those deserve more thought, and make things quite a bit more complex.
 • Keeping functions which touch the file system, and functions which don’t, separate seems like a good property to have (for much the same reasoning as with the error handling, above).
 • This patch is available now, uses code which has been tested in GIO for quite a while, and doesn’t need significant changes or in-depth review.

So modulo the small changes suggested below, I think this looks good to get into GLib ASAP.

::: glib/gfileutils.c
@@ +2480,3 @@
+/**
+ * g_canonicalize_filename:
+ * @file_name: (type filename): the name of the file

The parameter is actually called `filename`.

@@ +2481,3 @@
+ * g_canonicalize_filename:
+ * @file_name: (type filename): the name of the file
+ * @relative_to: (type filename)(nullable): the relative directory

Nitpick: The coding style is generally to put a space between annotations (`@relative_to: (type filename) (nullable): …`)

Same for the `Returns` line below.

Also, change the description to ‘the relative directory, or %NULL to use the current working directory’ to explain the nullability a bit.

@@ +2484,3 @@
+ *
+ * Gets the canonical file name from @filename, using @relative_to as the
+ * relative path if necessary.

It would be good to clarify what ‘canonical’ means. i.e. All triple slashes turned into single slashes, all `..` and `.`s resolved, returned path is guaranteed to be absolute, symlinks are not followed, no file system I/O is done, etc.

@@ +2490,3 @@
+ *
+ * If @filename is an absolute path, @relative_to is ignored. Otherwise,
+ * @relative_to will be used as the relative to path.

‘used as the relative to path’ doesn’t convey much information. How about ‘prepended to @file_name to make it absolute’?

@@ +2493,3 @@
+ *
+ * Returns: (type filename)(nullable)(transfer full): a newly allocated
+ * string with the canonical file path, or %NULL

When can you end up returning `NULL`? I can only see `NULL` being returned from the g_return_val_if_fail(), which we don’t document because it’s catching a programmer error.

So I don’t think this should be NULLable. In that case, you can document the function as never failing, which is a nice property to have. You should probably also clarify that it will canonicalise paths even if they’re nonexistent.

@@ +2500,3 @@
+{
+  gchar *canon, *start, *p, *q;
+  gint i;

guint

@@ +2508,3 @@
+      if (relative_to != NULL)
+        {
+          g_return_val_if_fail (g_path_is_absolute (relative_to), NULL);

I’d move this to the top of the function as:

g_return_val_if_fail (relative_to == NULL || g_path_is_absolute (relative_to), NULL);

Allowing it to be non-absolute if @filename is absolute is a bit too relaxed.

@@ +2509,3 @@
+        {
+          g_return_val_if_fail (g_path_is_absolute (relative_to), NULL);
+          cwd = g_strdup (relative_to);

You could avoid the allocation here by having separate `cwd` and `cwd_allocated` variables.

gchar *cwd_allocated = NULL;
const gchar *cwd;

if (relative_to != NULL)
  cwd = relative_to;
else
  cwd = cwd_allocated = g_get_current_dir ();

::: tests/testglib.c
@@ +850,3 @@
+    { "/double//dash", "../../foo/bar", "/foo/bar" },
+    { "/usr/share/foo", ".././././bar", "/usr/share/bar" },
+    { "/foo/bar", "../bar/./.././bar", "/foo/bar" },

Would be good to have a test of the triple-slash and double-slash behaviour from comment #12. e.g.

{ "/etc", "///triple/slash", "/triple/slash" },
{ "/etc", "//double/slash", "//double/slash" },
{ "///triple/slash", ".", "/triple/slash" },
{ "//double/slash", ".", "//double/slash" },

@@ +921,3 @@
     g_printerr ("ok\n");
+
+  if (g_test_verbose())

Nitpick: Missing space before `(` (but this might be copypasta from elsewhere in the file; this test file is pretty crufty). Same below.

(Don’t worry about cleaning anything else in the file up. I need to go through and port it all to GTest at some point.)
Comment 40 Georges Basile Stavracas Neto 2018-04-30 19:06:01 UTC
Created attachment 371560 [details] [review]
fileutils: Add g_canonicalize_filename

Applied review, thanks Philip.
Comment 41 Georges Basile Stavracas Neto 2018-04-30 19:07:04 UTC
Comment on attachment 371560 [details] [review]
fileutils: Add g_canonicalize_filename

The patch is missing one change.
Comment 42 Georges Basile Stavracas Neto 2018-04-30 19:14:18 UTC
Created attachment 371561 [details] [review]
fileutils: Add g_canonicalize_filename

This is the correct patch, with more tests, the documentation improvements and et cetera.
Comment 43 Philip Withnall 2018-04-30 20:27:06 UTC
Review of attachment 371561 [details] [review]:

One more comment to address to make sure we get full coverage of the documented behaviour of the function, then this is good to commit. Thanks :-)

::: tests/testglib.c
@@ +856,3 @@
+    { "/etc", "//double/slash", "//double/slash" },
+    { "///triple/slash", ".", "/triple/slash" },
+    { "//double/slash", ".", "//double/slash" },

Could do with a test where @relative_path is %NULL, to verify that the g_get_current_dir() control path is working correctly. Probably best to do as a separate test (rather than putting it in canonicalize_filename_checks) so you can compare the output against g_get_current_dir().
Comment 44 Philip Withnall 2018-04-30 20:35:14 UTC
Review of attachment 371561 [details] [review]:

I’ll fix these issues and push.

::: tests/testglib.c
@@ +937,3 @@
+  for (i = 0; i < n_canonicalize_filename_checks; i++)
+    {
+      g_autofree gchar *canonical_path = g_canonicalize_filename (canonicalize_filename_checks[i].relative_path,

And the g_autofree has to go from here. :-(
Comment 45 Philip Withnall 2018-04-30 21:04:57 UTC
(In reply to Philip Withnall from comment #38)
> Review of attachment 371477 [details] [review] [review]:
> 
> ::: glib/gtimezone.c
> @@ +448,3 @@
>  
> +      /* Resolve relative path */
> +      if (!g_path_is_absolute (resolved_identifier))
> 
> Any reason to make this conditional? What if the symlink contains a
> non-canonical path? (Is that possible?)

We discussed this in private messages and the only reason to have the conditional in there was to avoid an allocation; I’ve removed the conditional.
Comment 46 Philip Withnall 2018-04-30 21:06:17 UTC
All pushed to master, with some tweaks as per my previous comments.

Attachment 371476 [details] pushed as 0f37af7 - glocalfile: Use g_canonicalize_filename
Attachment 371477 [details] pushed as cb32614 - timezone: Correctly resolve symlink from /etc/localtime
Attachment 371561 [details] pushed as b9b642d - fileutils: Add g_canonicalize_filename
Comment 47 Georges Basile Stavracas Neto 2018-04-30 21:36:06 UTC
And a day-late happy birthday, bug 111848!

Sad to say goodbye o/
Comment 48 Jam Risser 2018-06-08 17:33:17 UTC
Can support for ~ be added? Most operating systems resolve it to the home directory.