GNOME Bugzilla – Bug 788180
G_FILE_ATTRIBUTE_ID_FILE is useless on W32
Last modified: 2017-11-07 14:39:41 UTC
Obtaining the G_FILE_ATTRIBUTE_ID_FILE file information does not work correctly on W32. It returns some small number in st_dev (which might be valid after all), but st_ino is always 0. This can be fixed by using GetFileInformationByHandle() internally, as it allows W32 file and volume IDs to be obtained (by luck, file ID is 64-bit unsigned integer, and volume ID is a 32-bit unsigned integer - just what we need). I even had a patch to do this. Problem is, the code in glocalfileinfo.c looks into st_dev a lot. So any changes i make tend to crawl all over the file. The other problem is that the code calls g_stat() and g_lstat() a lot, and subsequently uses the st_dev from the statbuf that it gets. This happens not only in glocalfileinfo.c, but also elsewhere. So if i go ahead with this patch in glocalfileinfo.c only, the user will get different values from g_stat() and from g_file_info. The logical thing to do here is to augment g_stat() and g_lstat() to call GetFileInformationByHandle() internally, and use it to supplement the statbuf that it fills, but i distinctly remember that some glib developers do not like enhancements like this and prefer to just wrap C runtime functions, warts and all, and reflect their problems in documentation. So...what should we do?
Ondrej, Chun-wei, what do you think? If we are sure that g_stat() and g_lstat() always return useless information in st_dev and st_ino on Windows, it seems likely that nobody is using that information on Windows, and hence I would be fine with augmenting g_stat() to also call GetFileInformationByHandle(). However, if either of those fields do sometimes return useful information, we need to be much more careful about changing the semantics of g_stat().
st_dev is meaningful, st_ino is not. Best i can do in this case is to continue to rely on st_dev that stat() reports (it's a small integer; it seems that MS CRT just converts the drive letter to a number, i.e. A == 0, B == 1, C == 2, and returns that; consequently, it probably won't work on filesystems that are), and set st_ino from the results of GetFileInformationByHandle(). MSDN specifically documents that st_ino is meaningless, so it should be safe to change.
...that are without a drive letter (though i can't tell from the top of my head how to access these, and i'm not sure that CRT stat() works on them at all).
Okay, looks like i not only left a sentence unfinished, but also didn't think this through. st_ino is 32-bit, so we can't set it to a 64-bit fileindex (well, we *could* maintain a hashmap of all fake_st_ino <-> real_fileindex...yeah, not happening). In other news: tested CRT stat() and GetFileInformationByHandle() with NT paths (such as "\\\\.\\HarddiskVolume10\\Windows\\Install.log"). Amazingly enough, they work, and stat() even figures out the right st_dev value. So, a correction is in order: we can continue to rely on st_dev. g_stat() will continue to return zero st_ino, nothing we can do. g_file_info() *can* obtain information by handle internally, and use fileindex as a unique identifier for the file. As long as users do not expect that st_ino and ID_FILE are the same, we'll be good (very few places in glib use st_ino, i should be able to plug all these). (For that matter, if users don't expect that st_dev and ID_FILESYSTEM are the same, then all my panic over presenting a inconsistent view of the system to the users is baseless - no one's going to notice; only glib will have to be more careful internally, to use the "right" st_dev). For the record, dwVolumeSerialNumber seems to refer to a partition. Thus, if you have two partitions on the same disk, dwVolumeSerialNumber will be different (i tested this). This seems sufficiently well-mapped to st_dev numbers (which also work on a partition basis). Related: Would it be ok to introduce a glib-internal stat-function that does, under the hood, call W32 API (and not the CRT stat()), and use that in glib where needed? In two variants - the one that accepts utf-16 paths, and the one that accepts utf-8 paths (does the conversion and calls the former). It would fill a special internal struct that is kind of like stat struct, but with 64-bit time, filesize and file ID. It can also fill a normal stat struct (where possible; or just call stat() internally, as an extra...haven't decided yet). Would really make my life simpler, as i'd just need to use a small #ifndef G_OS_WIN32 \ g_stat() \ #else \ g_win32_kind_of_stat_but_better() #endif instead of copying code all other the place.
Created attachment 360653 [details] [review] W32: Bump target NT version to 0x600 (Vista or newer) Also remove now-unnecessary if_nametoindex() implementation (the HAVE_IF_NAMETOINDEX configure check didn't work correctly, it turned out), which prevents glib from building. This patch is required for glib to be able to freely use newer APIs, without LoadLibrary() and GetProcAddress(). Other than that, it does not change existing code that already does LoadLibrary() and GetProcAddress(), nor does it remove #ifdefs that check for _WIN32_WINNT or WINVER at compile time.
Created attachment 360654 [details] [review] W32: Add a stat() implementation for private use This commit adds new W32-only functions to gstdio.c, and a new header file, gstdioprivate.h. These functions are: g_win32_stat_utf8() g_win32_lstat_utf8() g_win32_fstat() and they fill a private structure, GWin32PrivateStat, which has all the fields that normal stat has, as well as some extras. These functions are then used throughout glib and gio to get better data about the system. Specifically: * Full, 64-bit size, guaranteed (g_stat() is forced to use 32-bit st_size) * Full, 64-bit file identifier (st_ino is 0 when normal stat() is used, and still is) * W32 File attributes (which stat() doesn't report); in particular, this allows symlinks to be correctly identified * Full, 64-bit time, guaranteed (g_stat() uses 32-bit st_*time on 32-bit Windows) * Allocated file size (as a W32 replacement for the missing st_blocks) st_mode remains unchanged (thus, no S_ISLNK), so when these are given back to glib users (via g_stat(), for example, which is now implemented by calling g_win32_stat_utf8), this field does not contain anything unexpected. g_lstat() now calls g_win32_lstat_utf8(), which works on symlinks the way it's supposed to. Also adds the g_win32_readlink_utf8() function, which behaves like readlink() (including its inability to return 0-terminated strings and inability to say how large the output buffer should be; these limitations are purely for compatibility with existing glib code). Thus, symlink support should now be much better, although far from being complete. Apart from nitpicking and other usual concerns, this requires testing on MSVC. Specifically, gcc didn't complain about assigning 64-bit integers to 32-bit variables (is it even correct? Should we check for overflow and then just cap at 0xFFFFFFFF instead?) in g_stat() and g_lstat(). That sounds suspicious to me. Also, i'm not sure that '__stat64' is the type that MSVC understands. There may be some header difficulties as well. I had to graft one DDK type into the source code, as MinGW-w64 can't have SDK and DDK included in the code simultaneously (it's possible to work around this by carefully picking sub-headers of Windows.h instead of Windows.h itself, but that won't work in glib, which includes Windows.h without any reservations).
Review of attachment 360653 [details] [review]: Hi, I think I want to say "yay!" at this :) --in the Meson and MSVC builds, we went even further to bump _WIN32_WINNT to 0x0601 (Windows 7), after discussing with Nirbheek and Tim-Philip. Let's wait and see whether there are any objections to this bump (which I think there shouldn't be, but still, even though I think this is the best time for this kind of bump, early in a dev. cycle). We can do other cleanups for removing Windows 5.x support in another bug. With blessings, thank you!
Yes, i asked about this on the IRC, and was pointed to your recent changes to drop support for older versions of MSVC and make Windows 7 the oldest supported OS. Anyway, i don't care which version is it going to be, as long as it's newer than XP.
Review of attachment 360654 [details] [review]: Hi LRN, Compilation-wise, the code mostly builds even on Visual Studio 2008 (!), just one small part in that regard in glib/gfileutils.c... (In reply to LRN from comment #6) > Apart from nitpicking and other usual concerns, this requires testing on > MSVC. > Specifically, gcc didn't complain about assigning 64-bit integers to 32-bit > variables > (is it even correct? Should we check for overflow and then just cap at > 0xFFFFFFFF instead?) I build the code with Visual Studio, apart from the usual C4267 that shows up here and there on x64 builds (there possibly could be C4244 warnings that show up, but since we silence them already by default, I think they and C4267 shouldn't matter much, as they are quite normal), I think it is okay. I did not get any C4311/C4312 warnings when I used the /Wp64 option, as those can and should raise concern. As this is something that I am not that familiar with, is there a test case that demonstrates the changes here? With blessings, thank you! ::: glib/gfileutils.c @@ +2053,3 @@ gchar *buffer; size_t size; ssize_t read_size; We need to change ssize_t to gssize, because Visual Studio does not support ssize_t directly (not even in VS2017). Perhaps we should use gsize for the size_t above as well, to make things a little more consistent, although size_t is supported in Visual Studio for a good while.
Created attachment 361108 [details] [review] W32: Add a stat() implementation for private use v2 v2: * Previous version didn't set the allocated_size info (the file-measurement code inside glib knew enough to check the allocated_size field of the stat structure, but the generic info-retrieving function didn't). This is now fixed. * A testcase is added to gio to highlight some of the improvements: * 64-bit time * unique file IDs * allocated size Symlinks are not tested, as we need to create symlink for this, which might be a bit more difficult than it should be. MS made some moves to make symlink creation a non-privileged operation, but i haven't checked that out yet. Besides, the functions for creating symlinks are still not implemented in W32 glib, so we'd have to use system() or something like that...Problematic. * Added an extra error mapping (these should be expanded, by the way, and the mapping function might possibly be enhanced by giving it more context to allow it to map errors better, as the mapping is rather ambiguous otherwise). * Put GetLastError () result into a separate variable in some places (this is mostly for debugging, prevents gcc from discarding the erorr code too quickly for gdb to see it). * Changed ssize_t to gssize as requested All other uses of ssize_t seem to be in non-W32 sections of the source. Left the size_t alone for now. If that needs to change, it should be in a separate patch.
Review of attachment 360653 [details] [review]: The if_nametoindex() change looks OK to me, although please mention in the commit message that it’s now safe to remove because we depend on Vista (which is when it was introduced). Link to https://msdn.microsoft.com/en-us/library/windows/desktop/bb408409(v=vs.85).aspx (assuming that’s the right reference?). ::: configure.ac @@ +145,3 @@ esac + AC_DEFINE([_WIN32_WINNT], [0x0600], [Target the Windows Vista API]) Please go further and bump this to 0x0601. It seems like a bad idea to have the autoconf and Meson build systems define this differently. They should be in sync.
Review of attachment 361108 [details] [review]: I’ve taken a brief look, and the approach looks OK to me. I need to do a more detailed review. ::: gio/glocalfileinfo.h @@ +43,2 @@ #ifdef G_OS_WIN32 +/* We want 64-bit file size, file ID and symlink support support */ s/support support/support/ ::: gio/thumbnail-verify.h @@ +22,3 @@ #include <glib.h> #include <gstdio.h> +#include <gstdioprivate.h> Why is this #include needed here? ::: glib/Makefile.am @@ +162,3 @@ gslist.c \ gstdio.c \ + gstdioprivate.h \ This will need corresponding modifications to meson.build. ::: glib/gfileutils.c @@ +2053,3 @@ gchar *buffer; size_t size; + gssize read_size; I think it would be best to split the s/ssize_t/gssize/ and s/size_t/gsize/ changes out into a separate commit, and just run that as a regex across the whole of GIO for consistency.
(In reply to Philip Withnall from comment #12) > Review of attachment 361108 [details] [review] [review]: > > > ::: glib/Makefile.am > @@ +162,3 @@ > gslist.c \ > gstdio.c \ > + gstdioprivate.h \ > > This will need corresponding modifications to meson.build. I can make the modifications easily enough, but someone will need to test them (i'm currently not using meson to build glib, until that mess with gtk-doc is sorted out). Also, it doesn't look like meson knows anything about private headers (gasyncqueueprivate.h, for example, isn't mentioned), so i wouldn't even know where to put this. OTOH, from my experience, meson has no problem with people putting headers into the list of source files, so i can probably do just that.
Created attachment 361317 [details] [review] W32: Bump target NT version to 0x601 (7 or newer) Changed the commit message, changed 0x600 (Vista) to 0x601 (7), added more explanations about if_nametoindex().
Created attachment 361318 [details] [review] W32: Add a stat() implementation for private use v3 v3: * Removed unneeded include from gio/thumbnail-verify.h * Fixed a typo * Split the ssize_t -> gssize change into a separate patch
Created attachment 361319 [details] [review] Replace all instances of ssize_t with gssize ssize_t is supported widely, but not universally, so use gssize instead. Currently only one piece of code actually *needs* this change to be compilable with MSVC, the rest are mostly in *nix parts of the code, but these are changed too, for symmetry.
Review of attachment 361317 [details] [review]: Thanks.
Review of attachment 361319 [details] [review]: Lovely, thanks. For symmetry, it would be nice if we could have a separate patch for s/size_t/gsize/ too (even though it’s not required for building anything, since size_t is supported by MSVC).
Comment on attachment 361317 [details] [review] W32: Bump target NT version to 0x601 (7 or newer) Attachment 361317 [details] pushed as 6abdc06 - W32: Bump target NT version to 0x601 (7 or newer)
Created attachment 361323 [details] [review] Replace all instances of ssize_t with gssize v2 v2: * Rebased against git master
Review of attachment 361323 [details] [review]: Thanks for the rebase.
Review of attachment 361318 [details] [review]: Can you add some more tests please? I’d like the behaviour parallels between Unix and Win32 encoded in some tests; and that will also reduce the chance of regressions on the Windows side in future: • g_lstat() being called on symlinks • g_win32_readlink_utf8() (the existing readlink() test in file-test.c is disabled on Windows — maybe enable it?) • Looking at the values in the statbuf from g_stat() after being called on various files/directories/symlinks ::: gio/glocalfile.c @@ +1581,1 @@ res = readlink (link, symlink_value, sizeof (symlink_value) - 1); Should you be calling g_win32_readlink_utf8() here on Windows? @@ +2673,3 @@ return g_local_file_measure_size_error (state->flags, errsv, name, error); } +#elif defined (G_OS_WIN32) I think this preprocessor conditional can stay as just #else. It evaluates as: !defined (AT_FDCWD) && !defined (HAVE_LSTAT) && defined (G_OS_WIN32) which seems appropriate. If you think it’s unclear you can add a comment like: #else /* !AT_FDCWD && !HAVE_LSTAT && G_OS_WIN32 */ I want to avoid the conditionals being changed in future and ending up with no code being emitted for this block. ::: gio/glocalfileinfo.c @@ +140,3 @@ { + guint64 ino; +#ifndef G_OS_WIN32 I think it would be slightly clearer to use #ifdef G_OS_WIN32 (i.e. invert the conditional and its blocks). @@ +975,3 @@ +#elif defined (G_OS_WIN32) + _g_file_info_set_attribute_uint64_by_id (info, G_FILE_ATTRIBUTE_ID_STANDARD_ALLOCATED_SIZE, + statbuf->allocated_size); Nitpick: Looks like a slight indentation problem here. @@ +2162,3 @@ + is_symlink = (res == 0 && S_ISLNK (statbuf.st_mode)); +#else + /* TODO: implement lchmod for W32, should be doable */ Are you planning to do this as a follow-up patch? What’s the impact of not doing it now? ::: gio/tests/file.c @@ +1053,3 @@ +{ + const char *p0 = "zool"; + const char *p1 = "looz"; Probably should use temporary files in the system temp dir for this, otherwise the test will run into problems when run in a read-only source directory (for example; there are other failure modes). @@ +1069,3 @@ + g_remove (p1); + + /* Wa-a-ay past 02/07/2106 @ 6:28am (UTC) */ Probably worth mentioning in the comment that 2106 is the unsigned form of the year 2038 problem. @@ +1078,3 @@ + st.wMilliseconds = 0; + + if (!SystemTimeToFileTime (&st, &ft)) Since this is a test, it’s a little simpler to just do: g_assert_true (SystemTimeToFileTime (&st, &ft)); @@ +1083,3 @@ + f = g_fopen (p0, "w"); + if (f == NULL) + g_error ("failed, couldn't create file %s\n", p0); Similarly: f = g_fopen (…); g_assert_nonnull (f); @@ +1087,3 @@ + h = (HANDLE) _get_osfhandle (fileno (f)); + if (h == INVALID_HANDLE_VALUE) + g_error ("failed, couldn't get handle for opened file %s\n", p0); g_assert (h != INVALID_HANDLE_VALUE); And so on, through the rest of the tests. @@ +1146,3 @@ + * (could be 0 or the size of the FS cluster, but never 1). + */ + g_assert (size_p0 == statbuf_p0.st_size); g_assert_cmpuint (size_p0, ==, statbuf_p0.st_size); etc. @@ +1163,3 @@ + /* Check that GFileInfo doesn't suffer from Y2106 problem. + * Don't check stat(), as its contents may vary depending on + * host platform bitness and it *could* pass this test in ‘bitness’? ::: glib/gstdio.c @@ +57,3 @@ + +/* We can't include Windows DDK and Windows SDK simultaneously, + * so let's copy this here from MinGW-w64 DDK What license is that under? From what I can tell, some of it’s public domain and some of it is LGPL, so it should be fine to copy into GLib; but I’d like a source link to be sure. It would be good to include the link to the upstream sources in this comment anyway just for reference. @@ +121,3 @@ + +static int +_g_win32_stat_utf16_no_trailing_slashes (gunichar2 *filename, `const gunichar2 *filename`? @@ +196,3 @@ + */ +#define SANE_LIMIT 1024 * 10 + DWORD filename_target_len = 1024; This could probably start off a little smaller, say 256B. Typically paths are 100–200 characters long, ×2 for the fact this is UTF-16. @@ +202,3 @@ + while (filename_target_len < SANE_LIMIT) + { + new_len = GetFinalPathNameByHandleW (file_handle, filename_target, filename_target_len, 0); It would be clearer to pass a symbolic value as the fourth argument here. I think it should be FILE_NAME_NORMALIZED, right? Also, you should pass in (filename_target_len - 1) as the third argument, since the docs for GetFinalPathNameByHandleW() says it doesn’t include the nul terminator. @@ +208,3 @@ + + filename_target_len *= 2; + filename_target = g_realloc (filename_target, (filename_target_len) * sizeof (wchar_t)); Actually, the documentation for GetFinalPathNameByHandleW() says that it returns the size of the filename (including the nul terminator) if the input buffer is too small, so you can avoid this exponential buffer growth approach and just reallocate to new_len bytes (after comparing it to SANE_LIMIT for safety). @@ +221,3 @@ + g_clear_pointer (&filename_target, g_free); + CloseHandle (file_handle); + file_handle = INVALID_HANDLE_VALUE; Should we not bail out if this fails? It looks like the code proceeds and does the stat() of the wrong file if we end up with (filename_target == NULL). @@ +254,3 @@ + } + + if (fd < 0) If we end up in the state (file_handle != INVALID_HANDLE_VALUE && fd >= 0), file_handle will never be closed. @@ +316,3 @@ + wchar_t *wfilename; + int result; + int len; len should be size_t @@ +381,3 @@ + } + + h = CreateFileW (filename, Some comments explaining what’s going on here would be useful. @@ +413,3 @@ + + if (to_copy > buf_size * sizeof (wchar_t)) + to_copy = buf_size * sizeof (wchar_t); This multiplication could overflow if (buf_size > G_MAXSIZE / 2). Probably wise to include a bounds check on buf_size at the start of the function. Similarly there are probably other potential overflow problems elsewhere in the patch which should technically be handled. @@ +419,3 @@ + to_copy); + } + else if (rep_buf->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT) Indentation problem. If-statements should be formatted as: if (blah) { /* blah */ } else { /* blah blah */ } @@ +451,3 @@ + return result; + + if (result > 3 * sizeof (wchar_t) && memcmp (buf, L"\\??\\", 3 * sizeof (wchar_t)) == 0) This definitely deserves a comment explaining what’s going on. @@ +491,3 @@ + } + + memcpy (buf, tmp, tmp_len); The documentation for g_utf16_to_utf8() says that the returned items_written (in this case, tmp_len) does *not* include the trailing nul terminator: so I think you also need to do buf[tmp_len] = '\0'; Does the return value for readlink() need to include the nul terminator or not? `man 2 readlink` is not clear.
(In reply to Philip Withnall from comment #22) > Review of attachment 361318 [details] [review] [review]: > > • g_lstat() being called on symlinks Any tests on symlinks inevitably require creation of symlinks. Which will probably widen the scope of this patch, as i currently have not implemented the creation of symlinks (and creating them might run into problems due to permissions - symlinking is a highly-privileged operation by default, AFAIK). > • Looking at the values in the statbuf from g_stat() after being called on > various files/directories/symlinks Again, files/directories are easy, symlinks are not. > > ::: gio/glocalfile.c > @@ +1581,1 @@ > res = readlink (link, symlink_value, sizeof (symlink_value) - 1); > > Should you be calling g_win32_readlink_utf8() here on Windows? expand_symlink() is under #ifndef G_OS_WIN32. The call paths for it are the following: expand_symlink() get_parent() expand_all_symlinks() try_make_relative() g_local_file_trash() (non-W32 variant) expand_symlink() get_parent() find_mountpoint_for() (non-W32 variant) expand_symlink() get_parent() _g_local_file_find_topdir_for() g_local_file_trash() (non-W32 variant) I considered removing the ifdefs and making this work on W32, but eventually decided against that, as the things it does are poorly-compatible with Windows (Windows has no lost+found, and the logic for moving files into trash are completely different (knowing the location of the trash directory is not required)). > ::: gio/glocalfileinfo.c > @@ +2162,3 @@ > + is_symlink = (res == 0 && S_ISLNK (statbuf.st_mode)); > +#else > + /* TODO: implement lchmod for W32, should be doable */ > > Are you planning to do this as a follow-up patch? What’s the impact of not > doing it now? Ah, no, i don't have any concrete plans on doing this. I'm not even sure how useful chmod() / lchmod() is on Windows, as it doesn't even seem to be working with access control lists (the actual W32 permission system). Some comments elsewhere indicate that S_IWRITE and S_IREAD only control the read-only attribute of a file. > > ::: gio/tests/file.c > @@ +1146,3 @@ > + * (could be 0 or the size of the FS cluster, but never 1). > + */ > + g_assert (size_p0 == statbuf_p0.st_size); > > g_assert_cmpuint (size_p0, ==, statbuf_p0.st_size); Would it work correctly on uint64 instead of uint? That's the only reason i haven't used g_assert_cmpuint(). > @@ +1163,3 @@ > + /* Check that GFileInfo doesn't suffer from Y2106 problem. > + * Don't check stat(), as its contents may vary depending on > + * host platform bitness and it *could* pass this test in > > ‘bitness’? "bitness" is a real world. Or should be, if it isn't already. > > ::: glib/gstdio.c > @@ +57,3 @@ > + > +/* We can't include Windows DDK and Windows SDK simultaneously, > + * so let's copy this here from MinGW-w64 DDK > > What license is that under? From what I can tell, some of it’s public domain > and some of it is LGPL, so it should be fine to copy into GLib; but I’d like > a source link to be sure. > > It would be good to include the link to the upstream sources in this comment > anyway just for reference. I suspect that the *real* upstream for this definition is: https://msdn.microsoft.com/en-us/library/ff552012(v=vs.85).aspx and anything on MSDN (unless it bears a separate license, like some of the downloadable code samples) is considered fair game. > @@ +202,3 @@ > + while (filename_target_len < SANE_LIMIT) > + { > + new_len = GetFinalPathNameByHandleW (file_handle, > filename_target, filename_target_len, 0); > > It would be clearer to pass a symbolic value as the fourth argument here. I > think it should be FILE_NAME_NORMALIZED, right? Okay > > Also, you should pass in (filename_target_len - 1) as the third argument, > since the docs for GetFinalPathNameByHandleW() says it doesn’t include the > nul terminator. MSDN is being rather unclear, as usual. My reading of the documentation is that the output string *does* have a NUL-terminator, but returned *length* of the value does *not*, unless the buffer is too small to hold the result (*including* the NUL-terminator), in which case the returned length *does* count the NUL-terminator. So we always tell the API the real size of the buffer. If it returns a value less than filename_target_len - 1, then everything fits. If it returns a value greater than filename_target_len - 1, then it clearly didn't fit. If it returns filename_target_len - 1, it probably fits, but i decided to play it safe and pretend that it doesn't, to account for any weird corner cases, such as the one mentioned in the documentation. > > @@ +208,3 @@ > + > + filename_target_len *= 2; > + filename_target = g_realloc (filename_target, > (filename_target_len) * sizeof (wchar_t)); > > Actually, the documentation for GetFinalPathNameByHandleW() says that it > returns the size of the filename (including the nul terminator) if the input > buffer is too small, so you can avoid this exponential buffer growth > approach and just reallocate to new_len bytes (after comparing it to > SANE_LIMIT for safety). Yeah, now that you've mentioned it... > > @@ +221,3 @@ > + g_clear_pointer (&filename_target, g_free); > + CloseHandle (file_handle); > + file_handle = INVALID_HANDLE_VALUE; > > Should we not bail out if this fails? It looks like the code proceeds and > does the stat() of the wrong file if we end up with (filename_target == > NULL). It also sets file_handle to INVALID_HANDLE_VALUE, which will later trigger the file_handle == INVALID_HANDLE_VALUE || !gfibh_result condition and indeed cause a bailout. > > @@ +254,3 @@ > + } > + > + if (fd < 0) > > If we end up in the state (file_handle != INVALID_HANDLE_VALUE && fd >= 0), > file_handle will never be closed. This is intentional. If fd >= 0, we get file_handle by calling _get_osfhandle(). The handle obtained that way is not a duplicate and must not be closed with CloseHandle() (it will be closed by close() somewhere up the stack). > @@ +381,3 @@ > + } > + > + h = CreateFileW (filename, > > Some comments explaining what’s going on here would be useful. Yeah, and i should probably add an attributes & FILE_ATTRIBUTE_REPARSE_POINT check earlier, to bail out on files that are not reparse points (if they aren't, it's useless to try to readlink() on them). > @@ +419,3 @@ > + to_copy); > + } > + else if (rep_buf->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT) > > Indentation problem. If-statements should be formatted as: > > if (blah) > { > /* blah */ > } > else > { > /* blah blah */ > } > That isn't if-else, that's if-elseif (without a trailing "else" part). The "else" part here is implied to be a "do nothing" statement (unrecognized reparse tag -> don't read anything, to_copy remains 0, and that's what we return). > @@ +491,3 @@ > + } > + > + memcpy (buf, tmp, tmp_len); > > The documentation for g_utf16_to_utf8() says that the returned items_written > (in this case, tmp_len) does *not* include the trailing nul terminator: so I > think you also need to do > buf[tmp_len] = '\0'; > > Does the return value for readlink() need to include the nul terminator or > not? `man 2 readlink` is not clear. https://linux.die.net/man/2/readlink says that it doesn't NUL-terminate, and the example code deals with that fact explicitly, by putting a 0 at the end manually (just as glib code calling readlink() does). I've tried to reproduce that behaviour.
(In reply to LRN from comment #23) > (In reply to Philip Withnall from comment #22) > > Review of attachment 361318 [details] [review] [review] [review]: > > > > • g_lstat() being called on symlinks > Any tests on symlinks inevitably require creation of symlinks. Which will > probably widen the scope of this patch, as i currently have not implemented > the creation of symlinks (and creating them might run into problems due to > permissions - symlinking is a highly-privileged operation by default, AFAIK). Sure, defer adding symlink tests until later then. > > • Looking at the values in the statbuf from g_stat() after being called on > > various files/directories/symlinks > Again, files/directories are easy, symlinks are not. Would be good to get the tests in place for files/directories now, then it will be easy to extend them for symlinks later. > > ::: gio/glocalfile.c > > @@ +1581,1 @@ > > res = readlink (link, symlink_value, sizeof (symlink_value) - 1); > > > > Should you be calling g_win32_readlink_utf8() here on Windows? > > expand_symlink() is under #ifndef G_OS_WIN32. Ah, I missed that. Probably makes sense to split the change to expand_symlink() out into a separate patch, then, since it’s just removing dead code and is unrelated to the rest of the work. > > ::: gio/glocalfileinfo.c > > @@ +2162,3 @@ > > + is_symlink = (res == 0 && S_ISLNK (statbuf.st_mode)); > > +#else > > + /* TODO: implement lchmod for W32, should be doable */ > > > > Are you planning to do this as a follow-up patch? What’s the impact of not > > doing it now? > Ah, no, i don't have any concrete plans on doing this. I'm not even sure how > useful chmod() / lchmod() is on Windows, as it doesn't even seem to be > working with access control lists (the actual W32 permission system). Some > comments elsewhere indicate that S_IWRITE and S_IREAD only control the > read-only attribute of a file. Change you s/TODO/FIXME/ then please? I treat ‘TODO’ as ‘needs to be done before this patch can be merged’ and ‘FIXME’ as ‘should be fixed at some point in the future’. > > ::: gio/tests/file.c > > @@ +1146,3 @@ > > + * (could be 0 or the size of the FS cluster, but never 1). > > + */ > > + g_assert (size_p0 == statbuf_p0.st_size); > > > > g_assert_cmpuint (size_p0, ==, statbuf_p0.st_size); > > Would it work correctly on uint64 instead of uint? That's the only reason i > haven't used g_assert_cmpuint(). Yup. > > @@ +1163,3 @@ > > + /* Check that GFileInfo doesn't suffer from Y2106 problem. > > + * Don't check stat(), as its contents may vary depending on > > + * host platform bitness and it *could* pass this test in > > > > ‘bitness’? > > "bitness" is a real world. Or should be, if it isn't already. Unfortunately not a word I’ve come across. Do you mean endianness, signedness of basic types, or something else? Would be good to use a more widely understood term than ‘bitness’ in the comment. > > ::: glib/gstdio.c > > @@ +57,3 @@ > > + > > +/* We can't include Windows DDK and Windows SDK simultaneously, > > + * so let's copy this here from MinGW-w64 DDK > > > > What license is that under? From what I can tell, some of it’s public domain > > and some of it is LGPL, so it should be fine to copy into GLib; but I’d like > > a source link to be sure. > > > > It would be good to include the link to the upstream sources in this comment > > anyway just for reference. > > I suspect that the *real* upstream for this definition is: > https://msdn.microsoft.com/en-us/library/ff552012(v=vs.85).aspx > and anything on MSDN (unless it bears a separate license, like some of the > downloadable code samples) is considered fair game. Right. Please say so in that comment. > > Also, you should pass in (filename_target_len - 1) as the third argument, > > since the docs for GetFinalPathNameByHandleW() says it doesn’t include the > > nul terminator. > > MSDN is being rather unclear, as usual. My reading of the documentation is > that the output string *does* have a NUL-terminator, but returned *length* > of the value does *not*, unless the buffer is too small to hold the result > (*including* the NUL-terminator), in which case the returned length *does* > count the NUL-terminator. If the documentation is not entirely clear, please test the behaviour of the function in a small test program, and verify that we’re understanding it correctly. I really don’t want to have buffer overflows in this code. > > @@ +221,3 @@ > > + g_clear_pointer (&filename_target, g_free); > > + CloseHandle (file_handle); > > + file_handle = INVALID_HANDLE_VALUE; > > > > Should we not bail out if this fails? It looks like the code proceeds and > > does the stat() of the wrong file if we end up with (filename_target == > > NULL). > > It also sets file_handle to INVALID_HANDLE_VALUE, which will later trigger > the file_handle == INVALID_HANDLE_VALUE || !gfibh_result condition and > indeed cause a bailout. Ah, indeed. If there’s a way of simplifying this control flow (like bailing out at each unrecoverable error, rather than checking for that error throughout the rest of the code until the end of the function is reached and the error can be reported), that would be good. > > @@ +254,3 @@ > > + } > > + > > + if (fd < 0) > > > > If we end up in the state (file_handle != INVALID_HANDLE_VALUE && fd >= 0), > > file_handle will never be closed. > > This is intentional. If fd >= 0, we get file_handle by calling > _get_osfhandle(). The handle obtained that way is not a duplicate and must > not be closed with CloseHandle() (it will be closed by close() somewhere up > the stack). That definitely needs to be mentioned in a code comment. > > @@ +419,3 @@ > > + to_copy); > > + } > > + else if (rep_buf->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT) > > > > Indentation problem. If-statements should be formatted as: > > > > if (blah) > > { > > /* blah */ > > } > > else > > { > > /* blah blah */ > > } > > > > That isn't if-else, that's if-elseif (without a trailing "else" part). The > "else" part here is implied to be a "do nothing" statement (unrecognized > reparse tag -> don't read anything, to_copy remains 0, and that's what we > return). It was meant as an example of the indentation of blocks, not as an example of the code flow. The braces for each block should be indented 2 spaces further than the if-/else-if-/else-statement. In this particular case: if (rep_buf->ReparseTag == IO_REPARSE_TAG_SYMLINK) { … } else if (rep_buf->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT) { … } > > @@ +491,3 @@ > > + } > > + > > + memcpy (buf, tmp, tmp_len); > > > > The documentation for g_utf16_to_utf8() says that the returned items_written > > (in this case, tmp_len) does *not* include the trailing nul terminator: so I > > think you also need to do > > buf[tmp_len] = '\0'; > > > > Does the return value for readlink() need to include the nul terminator or > > not? `man 2 readlink` is not clear. > > https://linux.die.net/man/2/readlink says that it doesn't NUL-terminate, and > the example code deals with that fact explicitly, by putting a 0 at the end > manually (just as glib code calling readlink() does). I've tried to > reproduce that behaviour. Right. I still think you need to explicitly add a nul terminator to buf after the memcpy() call.
Created attachment 361595 [details] [review] W32: Add a stat() implementation for private use v4 v4: * Change #elif defined (G_OS_WIN32) to a plain #else * Reverse #ifdef condition in _g_local_file_info_create_file_id() * Fix indentation in the code that sets st_blocks and allocated_size * Replace a chmod-related TODO with a FIXME * Add a reference to MSDN near _REPARSE_DATA_BUFFER definition * Change _g_win32_stat_utf16_no_trailing_slashes() to take a const string * Rename gfibh_result to succeeded_so_far (as it is not used just by GetFileInformationByHandle anymore) * Move the target name resolution code a bit further down the function * Rework the control flow to do multiple error checks with a bailout throughout the function instead of deferring the error check until the end * Instead of calling GetFinalPathNameByHandleW() in a loop, call it twice: * First time with a 0-sized buffer to get the needed buffer size * Second time with a newly-allocated buffer to get the filename The buffer is slightly overallocated, and the number of returned bytes is checked against that slightly-larger-than-needed value. If the number of bytes approaches that value, the function bails out, thus ensuring its immunity to off-by-one errors. * Document the reason why file_handle is not closed when fd >= 0 * Slightly fix _g_win32_stat_utf8() indentation * Use size_t for strlen() return value * Check that the buffer given to _g_win32_readlink_utf16_raw() is not longer than G_MAXSIZE (when counted in bytes, not gunichar2s) * More of the "put the error code into a variable before feeding it to w32_error_to_errno()" kind of changes * Check that the file is a reparse point before trying to read irs reparse point-related information. If it isn't, bail out early. * Simplify the arguments to CreateFileW(), since the file being opened is always a reparse point * Fix indentation of the reparse tag check blocks * Document _g_win32_readlink_utf16() a bit better. Specifically, comment on the \??\ prefix stripping. * NUL-terminate the string returned by g_win32_readlink_utf8() * Move the new test from file.c to g-file-info.c (turns out, file.c is a unix-only testcase, no wonder it didn't pass!) * Expand the testcase to do checks on a sparse file (to verify that the new internal API works with >4GB files correctly, and that g_stat() does not) and on two directories, one of which is a symlink to the other (to work around the fact that glib currently can't create symlinks, we use pre-existing symlinks that Windows has). * Rebase on top of glib git master Bugfixes: * Add a W32-specific #elif to set file_type to G_FILE_TYPE_SYMBOLIC_LINK (existing non-W32 code looks at S_ISLNK, which isn't available; as a result, files were never given the symlink type, even though glib was aware that they were reparse points) Side-note: only symlinks are counted as symlinks. Junction points are not, even though they mostly behave as symlinks. This can be fixed at any time by expanding the reparse tag check, if that is what glib users want. * Strip the \\?\ prefix from the names returned by GetFinalPathNameByHandleW(), since _wstat64() doesn't understand it. * Remove a stray ';' after DeviceIoControl() call that made it always "fail" * Fix the \??\ prefix stripping (the prefix length is 4, not 3) * Fix the argument given to g_utf16_to_utf8 (it takes the length in gunichar2s, not in bytes) * Add an extra check to ensure that g_win32_readlink_utf8() doesn't overflow the buffer that it is given
Review of attachment 361595 [details] [review]: Looks good apart from 3 nitpicks and a couple of questions. ::: gio/tests/g-file-info.c @@ +460,3 @@ + * (could be 0 or the size of the FS cluster, but never 1). + */ + g_assert_cmpuint (size_p0, ==, statbuf_p0.st_size); Is statbuf_p0 valid at this point? The previous usage of it asserted that g_stat(p0) failed. ::: glib/gstdio.c @@ +235,3 @@ + + /* Just in case, give it a real memory location instead of NULL */ + new_len = GetFinalPathNameByHandleW (file_handle, (wchar_t *) &filename_target_len, 0, 0); Were you going to use FILE_NAME_NORMALIZED here? @@ +271,3 @@ +#define EXTENDED_PREFIX L"\\\\?\\" +#define EXTENDED_PREFIX_LEN wcslen (EXTENDED_PREFIX) +#define EXTENDED_PREFIX_LEN_BYTES (sizeof (wchar_t) * EXTENDED_PREFIX_LEN) Instead of defining these as #defines with the preprocessor, you could just define them as variables in the C code, right? That would make the job of static analysis tools a little easier, and reduce the NUMBER OF SHOUTY VARIABLE NAMES in the code. :-) @@ +354,3 @@ + wchar_t *wfilename; + int result; + size_t len; gsize for consistency with the rest of GLib please. @@ +518,3 @@ +#define NTOBJM_PREFIX L"\\??\\" +#define NTOBJM_PREFIX_LEN_UNICHAR2 (wcslen (NTOBJM_PREFIX)) +#define NTOBJM_PREFIX_LEN_BYTES (NTOBJM_PREFIX_LEN_UNICHAR2 * sizeof (gunichar2)) Similarly here, these could be normal C variables rather than SHOUTY PREPROCESSOR VARIABLES.
(In reply to Philip Withnall from comment #26) > Review of attachment 361595 [details] [review] [review]: > > Looks good apart from 3 nitpicks and a couple of questions. > > ::: gio/tests/g-file-info.c > @@ +460,3 @@ > + * (could be 0 or the size of the FS cluster, but never 1). > + */ > + g_assert_cmpuint (size_p0, ==, statbuf_p0.st_size); > > Is statbuf_p0 valid at this point? The previous usage of it asserted that > g_stat(p0) failed. "g_assert_false (g_stat (p0, &statbuf_p0));" means " assert that g_stat() returned 0". Which is what g_stat() returns on success. I could use g_assert_cmpint (..., ==, 0), i guess...
(In reply to LRN from comment #27) > (In reply to Philip Withnall from comment #26) > > Review of attachment 361595 [details] [review] [review] [review]: > > > > Looks good apart from 3 nitpicks and a couple of questions. > > > > ::: gio/tests/g-file-info.c > > @@ +460,3 @@ > > + * (could be 0 or the size of the FS cluster, but never 1). > > + */ > > + g_assert_cmpuint (size_p0, ==, statbuf_p0.st_size); > > > > Is statbuf_p0 valid at this point? The previous usage of it asserted that > > g_stat(p0) failed. > > "g_assert_false (g_stat (p0, &statbuf_p0));" means " assert that g_stat() > returned 0". Which is what g_stat() returns on success. I could use > g_assert_cmpint (..., ==, 0), i guess... Uff, of course. I think g_assert_cmpint (…, ==, 0) would indeed be clearer.
Created attachment 362744 [details] [review] W32: Add a stat() implementation for private use v5 v5: * Use g_assert_cmpuint () to check the return value of g_stat() * Use local variables instead of macros for prefix checkers * Use gsize instead of size_t for variables that hold string length
Review of attachment 362744 [details] [review]: Not quite. ::: glib/gstdio.c @@ +256,3 @@ + wchar_t *extended_prefix = L"\\\\?\\"; + gsize extended_prefix_len = wcslen (extended_prefix); + gsize extended_prefix_len_bytes = sizeof (wchar_t) * extended_prefix_len; These can all be `const`. @@ +504,3 @@ + wchar_t *ntobjm_prefix = L"\\??\\"; + gsize ntobjm_prefix_len_unichar2 = wcslen (ntobjm_prefix); + gsize ntobjm_prefix_len_bytes = sizeof (gunichar2) * ntobjm_prefix_len_unichar2; These can all be `const`.
Created attachment 362745 [details] [review] W32: Add a stat() implementation for private use v6 v6: * Use const variables for prefix checking
Review of attachment 362745 [details] [review]: OK, let’s go with this. Thanks.
Comment on attachment 362745 [details] [review] W32: Add a stat() implementation for private use v6 Thanks for your work on this. Is there anything else to do on this bug?
No, i think this is it... G_FILE_ATTRIBUTE_ID_FILE is not useless on W32 anymore (as the new testcase in the testsuite now proves), therefore the bug is officially fixed.
This is causing some compilation problems on VC8, as reported on the mailing list (https://mail.gnome.org/archives/gtk-devel-list/2017-November/msg00000.html) > 1) Firstly (in gstdio.c) the function:- '_g_win32_readlink_utf16_raw()' > uses a value called "IO_REPARSE_TAG_SYMLINK". This doesn't seem to be > defined for MSVC - at least not in my version (VC8). > > 2) gstdio.c uses some other identifiers too - notably, > 'FILE_STANDARD_INFO' and 'FILE_NAME_NORMALIZED'. These seem to be > declared in winbase.h which should (in theory) be getting #included by > windows.h. However, MSVC reports them both as undeclared identifiers > (even if I specifically #include winbase.h !!) LRN, can you take a look at these please?
(In reply to Philip Withnall from comment #35) > This is causing some compilation problems on VC8, as reported on the mailing > list > (https://mail.gnome.org/archives/gtk-devel-list/2017-November/msg00000.html) VC8? You mean Visual Studio 2005? https://blogs.msdn.microsoft.com/joshpoley/2009/12/15/the-many-faces-of-visual-c/ I don't think we should care about a 10+ years old toolchain, considering we're now requiring Windows 7+.
It's not really a bother to just do > #ifndef FILE_STANDARD_INFO > #define FILE_STANDARD_INFO ... > #endif and such. AFAIU, the policy is to "not to go out of your way to make glib compatible with tools and/or OSes that are too old", but a few ifdefs should be okay, i think. That said, the fact that FILE_STANDARD_INFO and FILE_NAME_NORMALIZED *are* defined in winbase.h, but MSVC doesn't see them, should be investigated. There might indeed be some macro that needs defining. I wouldn't know, since i don't use MSVC.
(In reply to LRN from comment #37) > It's not really a bother to just do > > #ifndef FILE_STANDARD_INFO > > #define FILE_STANDARD_INFO ... > > #endif > > and such. AFAIU, the policy is to "not to go out of your way to make glib > compatible with tools and/or OSes that are too old", but a few ifdefs should > be okay, i think. I would be happy with something like that. > That said, the fact that FILE_STANDARD_INFO and FILE_NAME_NORMALIZED *are* > defined in winbase.h, but MSVC doesn't see them, should be investigated. > There might indeed be some macro that needs defining. I wouldn't know, since > i don't use MSVC. John, can you provide any more information here?
Hi John, This issue seems like you will really need to use an later SDK, with _WIN32_WINNT defined to 0x0601 (or even 0x0600) or later, or you will need to define them yourself if updating your SDK is not an option, as these constants/macros are indeed defined in the later (Vista+, meaning the 7.0/7.1 SDKs or later that I mentioned in another bug previously) SDKs. With blessings, and cheers!
(In reply to Philip Withnall from comment #38) > > > That said, the fact that FILE_STANDARD_INFO and FILE_NAME_NORMALIZED *are* > > defined in winbase.h, but MSVC doesn't see them, should be investigated. > > There might indeed be some macro that needs defining. I wouldn't know, since > > i don't use MSVC. > > John, can you provide any more information here? Hi Philip, After a bit more work this morning I've realised I made a mistake yesterday. I found a copy of 'winbase.h' which did contain those symbols - but it turned out I'd been looking at the version for TDM-GCC. My MSVC version doesn't in fact contain them. So I guess I'll have to accept that VC8's days as a build tool are gone now... :-( It's not a big problem. For the moment, I'll just stick with the glib-2-54 branch.
Huh, it seems that you've reported these errors as they came to you, without investigating. My guess is, if we try to patch the code to include definitions for FILE_STANDARD_INFO (which is a struct) and FILE_NAME_NORMALIZED (which is just a macro), you'll still end up with link-time failures, because functions GetFinalPathNameByHandleW() and GetFileInformationByHandleEx() will be unavailable in your import libraries. My google-fu says that VC8 is MSVS 2005, which was released in 2005. Vista (which is what you need to get the aforementioned functions) was released in 2006, so there is indeed no way you can build glib-master with that SDK.
Hi John, Again, if you still need to use Visual Studio 2005, perhaps you might want to take the time to see whether your builds (and the programs that you build against) GLib continue to work after you install the Windows SDK 7.0 (Server 2008 SDK) or 7.1 (Windows 7 SDK), which were officially supported configurations (usable by Visual Studio 2005) by Microsoft, given that you have concerns using these SDKs against Visual Studio 2005, and set _WIN32_WINNT to be 0x0600 [Vista] (or like what is now done upstream, 0x0601 [Windows 7]). It is quite likely that XP-compatibility stuff in the code gets removed in the near future. By using the newer SDK and setting _WIN32_WINNT appropriately, your builds should continue to build and link. This also means if your program needs to support XP, GLib-2.56.0-to-be and later will no longer be options, as you might follow in this bug report, as a result of setting _WIN32_WINNT to be 0x0601. The thing is, none of us are running Visual Studio 2005 (so we can't really support it and the Windows/Platform SDK it bundles), and we really want to focus our support on Windows 7 and later, since XP (and Vista) are no longer supported by Microsoft for a while (and we don't have XP around to test on, either). Hope this makes the situation a bit clearer to you. With blessings, and cheers!
(In reply to LRN from comment #41) > > if we try to patch the code to include definitions for FILE_STANDARD_INFO > (which is a struct) and FILE_NAME_NORMALIZED (which is just a macro), > you'll still end up with link-time failures, because functions > GetFinalPathNameByHandleW() and GetFileInformationByHandleEx() will be > unavailable in your import libraries. > Hi Fan and LRN... I was aware that XP is no longer supported but I wasn't aware that VS2005 has also been dropped. That's why I flagged the problem up. If you look in 'glib/gmessages.c' you'll see that GetFileInformationByHandleEx() was in fact being supported already (via a function pointer). That makes it both compile and link, even with VS2005. I just wondered if maybe something similar needed to get done for the new code in gstdio.c It's not a problem if VS2005 is unsupported in master - I'll just revert to using the glib-2-54 branch.
(In reply to John Emmas from comment #43) > It's not a problem if VS2005 is unsupported in master - I'll just revert to > using the glib-2-54 branch. OK, if that works for you, I’m happy to not be including more decade-of-backwards-compatibility workarounds in GLib. Do you have a plan for when to upgrade from glib-2-54?
Hi Philip. I'm maintaining a project that I originally wrote using VS2005 (although I'm not involved in it much these days). We all accept that VS2005 support must come to an end eventually. For more recent stuff I've tended to use VS2013 / VS2015 etc.