Bug 701560 – various improvements for g_file_set_contents()

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 701560 - various improvements for g_file_set_contents()


Summary:	various improvements for g_file_set_contents()


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Duplicates:	702394 (view as bug list)
Depends on:
Blocks:

Reported:	2013-06-03 19:45 UTC by Allison Karlitskaya (desrt)
Modified:	2013-06-21 21:03 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
g_file_set_contents(): use posix_fallocate() (1.75 KB, patch) 2013-06-03 19:45 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
g_file_set_contents(): don't allocate display name (2.70 KB, patch) 2013-06-03 20:18 UTC, Allison Karlitskaya (desrt)	reviewed	Details \| Review
g_file_set_contents(): use unistd instead of stdio (3.78 KB, patch) 2013-06-03 20:29 UTC, Allison Karlitskaya (desrt)	reviewed	Details \| Review
g_file_set_contents(): don't fsync on ext3/4 (1.56 KB, patch) 2013-06-03 20:42 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
g_file_set_contents(): don't allocate display name (5.74 KB, patch) 2013-06-03 21:09 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
g_file_set_contents(): use unistd instead of stdio (3.32 KB, patch) 2013-06-03 21:18 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
g_file_set_contents: change {posix_ => }fallocate (1.73 KB, patch) 2013-06-03 21:36 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
simple test program to just generate many files with g_file_set_contents() and verify them (979 bytes, text/plain) 2013-06-20 17:11 UTC, Colin Walters		Details
0001-Revert-g_file_set_contents-don-t-fsync-on-ext3-4.patch (1.45 KB, patch) 2013-06-20 17:32 UTC, Colin Walters	committed	Details \| Review

Description Allison Karlitskaya (desrt) 2013-06-03 19:45:39 UTC

See patch.

Comment 1 Allison Karlitskaya (desrt) 2013-06-03 19:45:42 UTC

Created attachment 245955 [details] [review]
g_file_set_contents(): use posix_fallocate()

Extents-based filesystems like knowing in advance how much data will be
written to a file in order to prevent fragmentation.  If we have it, use
posix_fallocate() before writing data in g_file_set_contents().

Comment 2 Allison Karlitskaya (desrt) 2013-06-03 19:51:33 UTC

In general, this function could maybe use to be cleaned up a little.  There are two sore spots:

  a) we do a displayname conversion on the hope that there might be an error
     (since all error messages use it) and free it at the end

  b) for some reason we use fdopen() and fwrite() instead of just write()

ie: two unnecessary memory allocations.

Comment 3 Colin Walters 2013-06-03 20:04:57 UTC

Review of attachment 245955 [details] [review]:

Looks fine, one minor comment.

::: glib/gfileutils.c
@@ +1065,3 @@
+       * on the underlying filesystem.
+       */
+      posix_fallocate (fd, 0, length);

I like casting to (void) if we're ignoring the error, since it placates static analysis tools.

Comment 4 Allison Karlitskaya (desrt) 2013-06-03 20:09:51 UTC

Also: from my research online it seem that ext3 and ext4 (after Linux 2.6.30) support omitting the fsync() before rename() with the default mount options.  It would probably be nice to drop the fsync() for those filesystem as well (since we already do it for btrfs).

The question is if we attempt to detect the 'bad' mount options and do the fsync anyway.  I'd say no, because if people set those options then they are indicating that they are interested in the extra performance (even if at the cost of living life on the edge).

Comment 5 Allison Karlitskaya (desrt) 2013-06-03 20:12:40 UTC

Comment on attachment 245955 [details] [review]
g_file_set_contents(): use posix_fallocate()

Attachment 245955 [details] pushed as d3be43f - g_file_set_contents(): use posix_fallocate()

Comment 6 Allison Karlitskaya (desrt) 2013-06-03 20:18:42 UTC

Created attachment 245958 [details] [review]
g_file_set_contents(): don't allocate display name

g_file_set_contents() sets a GError in the event of various failures
that count occur.  It uses g_filename_display_name() in order to get the
filename to include in the messages.

Make it so that we only allocate this string in the actual event of an
error instead of doing it always.

Comment 7 Colin Walters 2013-06-03 20:25:11 UTC

Review of attachment 245958 [details] [review]:

If we're doing code cleanup, this code is crying out for a helper function that sets the error from errno and handles the display name there, like I did with https://git.gnome.org/browse/glib/commit/?id=7fc2ab9493dbb480e2b6815813af9cf8bbfc081e

Comment 8 Allison Karlitskaya (desrt) 2013-06-03 20:29:20 UTC

Created attachment 245959 [details] [review]
g_file_set_contents(): use unistd instead of stdio

Use a normal write() system call instead of fdopen() and fwrite().

This will definitely work on UNIX system and should work on Windows as
well...

As an added bonus, we can use g_close() now as well.

Comment 9 Colin Walters 2013-06-03 20:34:14 UTC

Review of attachment 245959 [details] [review]:

::: glib/gfileutils.c
@@ +1051,3 @@
       errno = 0;
 
+      n_written = write (fd, contents, length);

Er...you need to loop here.  And handle EINTR.

@@ +1123,2 @@
+  if (!g_close (fd, err))
+    goto out;

This change is losing the unlink()-on-error, right?

Comment 10 Allison Karlitskaya (desrt) 2013-06-03 20:42:58 UTC

Created attachment 245960 [details] [review]
g_file_set_contents(): don't fsync on ext3/4

ext3 and ext4 (for quite some time) with default mount options don't
need fsync() to ensure safety of replace-by-rename.  Stop doing that for
these filesystems.

Note: this patch also impacts ext2, which is probably not safe, but I
don't know of any way to check ext2. vs the others because they all have
the same magic numbers (short of opening /proc/mount).

Comment 11 Allison Karlitskaya (desrt) 2013-06-03 20:44:35 UTC

(In reply to comment #9)
> Er...you need to loop here.  And handle EINTR.

For a local file?  Seriously?

> This change is losing the unlink()-on-error, right?

Good catch.  Thanks.

Comment 12 Colin Walters 2013-06-03 21:04:16 UTC

Review of attachment 245960 [details] [review]:

We write out to a tmp file even if there wasn't an already extant file though; it looks like before we could lose data on BTRFS in this case, but you're now extending it to the far more widely deployed ext4 system.

If we're going to do this we should elaborate a bit on the guarantees in the docs.  All we say now is "This write is atomic in the sense that it is first written to a temporary file which is then renamed to the final name.", which doesn't really help much.  We could either:

1) Make it explicit we defer to the filesystem semantics for rename()  (maybe this is what the existing bit is trying to say, but we should at least mention data consistency)
2) Make it explicit we fsync() always (and do that)
3) Only fsync() in the case where we aren't renaming over an existing file (at the small cost of an lstat() of the target, this is what Gio does)

You should probably back up the assertion in the first paragraph with some links.  How about http://lwn.net/Articles/322823/ ?

I don't like much that we're just tossing away potential safety without any idea of the relative performance increase and safety guarantees.  There's lots of code that calls g_file_set_contents() now...

Comment 13 Allison Karlitskaya (desrt) 2013-06-03 21:09:36 UTC

Created attachment 245962 [details] [review]
g_file_set_contents(): don't allocate display name

g_file_set_contents() sets a GError in the event of various failures
that count occur.  It uses g_filename_display_name() in order to get the
filename to include in the messages.

Factor out the error handling to make it easier to allocate the display
name only when we need it (instead of allocating it every time).

Comment 14 Allison Karlitskaya (desrt) 2013-06-03 21:18:12 UTC

Created attachment 245963 [details] [review]
g_file_set_contents(): use unistd instead of stdio

Use a normal write() system call instead of fdopen() and fwrite().

This will definitely work on UNIX system and should work on Windows as
well...

As an added bonus, we can use g_close() now as well.

Comment 15 Allison Karlitskaya (desrt) 2013-06-03 21:24:08 UTC

(In reply to comment #12)
> Review of attachment 245960 [details] [review]:
> 
> We write out to a tmp file even if there wasn't an already extant file though;
> it looks like before we could lose data on BTRFS in this case, but you're now
> extending it to the far more widely deployed ext4 system.


Check out the logic that's already here, though:

    if (g_lstat (dest_file, &statbuf) == 0 &&
	statbuf.st_size > 0 &&
	fsync (fileno (file)) != 0)
      {

ie: we have only ever called fsync() in the "dest already exists and is non-empty" case.  We are not losing any extra safety here.

It seems that you're arguing we should do the opposite: add an explicit sync for the case that the file does _not_ exist.  I can understand that this is nice from the standpoint of "by the time this function returns, your data is on disk" but consider that this is not really true anyway.  The metadata for the rename has not been written out yet.

I think the original intention here was simply that "this function should not destroy what was already there".  We can't really make any further guarantees than this without becoming (much) slower.  I think that this original intention is right, and this is exactly the guarantee that ext3/4 are providing these days, so we don't need to do more than that.

Comment 16 Colin Walters 2013-06-03 21:30:03 UTC

Review of attachment 245962 [details] [review]:

This looks a lot nicer, one minor thing:

::: glib/gfileutils.c
@@ +1013,3 @@
+format_error_message (GError      **error,
+                      const gchar  *format_string,
+                      const gchar  *filename)

Minor: Can we put the filename first since it's not actually used by the format string directly?  Otherwise the 37 neurons in my brain that fire on "mismatched format string and arguments!" keep triggering when reading the code below.

Comment 17 Colin Walters 2013-06-03 21:31:41 UTC

Review of attachment 245963 [details] [review]:

Looks right to me.

Comment 18 Allison Karlitskaya (desrt) 2013-06-03 21:36:37 UTC

Created attachment 245965 [details] [review]
g_file_set_contents: change {posix_ => }fallocate

Use fallocate() instead of posix_fallocate() so that we just fail
instead of getting the emulated version from the libc.

Comment 19 Allison Karlitskaya (desrt) 2013-06-03 21:44:38 UTC

Comment on attachment 245962 [details] [review]
g_file_set_contents(): don't allocate display name

Attachment 245962 [details] pushed as c152ceb - g_file_set_contents(): don't allocate display name

Pushed with suggested change, and also removed the now-unused variable.

Comment 20 Colin Walters 2013-06-03 21:48:14 UTC

(In reply to comment #15)

> ie: we have only ever called fsync() in the "dest already exists and is
> non-empty" case.  We are not losing any extra safety here.

I missed that; OK then.

> It seems that you're arguing we should do the opposite: add an explicit sync
> for the case that the file does _not_ exist.  I can understand that this is
> nice from the standpoint of "by the time this function returns, your data is on
> disk" but consider that this is not really true anyway.  The metadata for the
> rename has not been written out yet.

Many of my applications want transactional semantics - either the file is there with the correct data, or it's not there.  Think cache files.  So it'd be OK if at the end of g_file_set_contents() and the system crashed, the tmpfile was still in place and the new file didn't exist.  In that case the app could just recreate it (although ideally we'd clean up the tmpfile, that's https://bugzilla.gnome.org/show_bug.cgi?id=629301#c2 )

But anyways I'm fine with the current semantics then.

> 
> I think the original intention here was simply that "this function should not
> destroy what was already there".

Yeah.

Comment 21 Allison Karlitskaya (desrt) 2013-06-03 21:50:13 UTC

Comment on attachment 245963 [details] [review]
g_file_set_contents(): use unistd instead of stdio

Attachment 245963 [details] pushed as e40435e - g_file_set_contents(): use unistd instead of stdio

Comment 22 Colin Walters 2013-06-03 22:08:42 UTC

Review of attachment 245965 [details] [review]:

You could reference David's commit in your commit message or here in the bug (at the moment it was just on IRC), otherwise looks good.

Comment 23 Allison Karlitskaya (desrt) 2013-06-04 02:55:48 UTC

Comment on attachment 245965 [details] [review]
g_file_set_contents: change {posix_ => }fallocate

Attachment 245965 [details] pushed as c828aef - g_file_set_contents: change {posix_ => }fallocate

Comment 24 Allison Karlitskaya (desrt) 2013-06-04 02:57:23 UTC

(In reply to comment #22)
> Review of attachment 245965 [details] [review]:
> 
> You could reference David's commit in your commit message or here in the bug
> (at the moment it was just on IRC), otherwise looks good.

<mclasen> desrt: around ?
<mclasen> desrt: seen this ? https://git.gnome.org/browse/gnome-disk-utility/commit/?id=d96d30229782682b1aecf697ec9ba573b7ea6aca
<mclasen> I wonder how that relates to https://git.gnome.org/browse/glib/commit/?id=d3be43fcc5165b7680c9073438ad60a3652c1703

Comment 25 Allison Karlitskaya (desrt) 2013-06-04 03:00:55 UTC

About the "don't bother with fsync" thing, my biggest concern is ext2.  How much do we care?  Should we figure out a way to work around this?

I see two reasonable solutions here:

1) ignore that ext2 exists, and don't ever fsync on these filesystems

2) figure out a way to detect the difference, including the possibility of opening /proc/mounts and parsing it, looking for the filesystem that we are on


The reason that I think that option 2 is better than "always fsync" is because

 - I assume ext2 will be a very rare case indeed

 - in the common case (ext3, ext4) the cost of opening and parsing /proc/mounts
   will be far less than the cost of the fsync().



That said, I'd be fine with just skipping the fsync() for ext2 as well... It's 2013, after all...

Comment 26 Alexander Larsson 2013-06-04 13:44:54 UTC

Review of attachment 245960 [details] [review]:

I agree, and i don't think ext2 is commonly used enough for this to be a real life problem.

Comment 27 Allison Karlitskaya (desrt) 2013-06-04 13:52:23 UTC

Attachment 245960 [details] pushed as 9d0c17b - g_file_set_contents(): don't fsync on ext3/4

Comment 28 Colin Walters 2013-06-20 15:44:54 UTC

Ok, there are two issues here we need to address:


1) We're no longer calling fsync() for *new* files on ext4.  This is a very common deployment, and this can easily lead to bad corruption.  I suspect most of the people hitting dconf database issues are people who are trying new logins.

2) Whether ext4 is really doing what we expect, and for what versions.  There was AFAIK no actual testing of this patch in real world conditions.

Comment 29 Colin Walters 2013-06-20 15:48:21 UTC

Er nevermind, 1) was already discussed here: https://bugzilla.gnome.org/show_bug.cgi?id=701560#c15 

But I have my doubts about 2).

Comment 30 Colin Walters 2013-06-20 17:10:56 UTC

Ok yeah, it's trivial to reproduce 0 size files on unclean shutdown after this patch.  Which was apparently backported to the glib-2-36 branch?!?  Why?

System: Fedora 19 x86_64, Linux 3.9.4
Filesystem: ext4 (data=ordered) on LVM  (Fedora default)

Comment 31 Colin Walters 2013-06-20 17:11:26 UTC

Created attachment 247362 [details]
simple test program to just generate many files with g_file_set_contents() and verify them

Comment 32 Colin Walters 2013-06-20 17:32:06 UTC

Created attachment 247364 [details] [review]
0001-Revert-g_file_set_contents-don-t-fsync-on-ext3-4.patch

Comment 33 Colin Walters 2013-06-20 17:58:54 UTC

Sorry, not zero-length files, but files filled with NUL.  This is AIUI characteristic of data blocks not being committed to disk.

It's worth a quick peursal of:
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

The docs claim that both data=ordered(*) and auto_da_alloc(*) are defaults.  The combination of these are what theoretically allow replace-by-rename to be safe.

Anyways, this reproduces easily in both my Fedora 19 VM (ext4-on-LVM) as well as my gnome-ostree VM (Linux 3.6.0 vanilla upstream, ext4 on bare virtio block).  More data here would be useful - is this a recent regression?

Comment 34 Colin Walters 2013-06-20 20:50:37 UTC

Comment on attachment 247364 [details] [review]
0001-Revert-g_file_set_contents-don-t-fsync-on-ext3-4.patch

Ryan gave a +1 to this on IRC.

Comment 35 Allison Karlitskaya (desrt) 2013-06-20 21:41:46 UTC

I wrote an email to ext4 list about this issue asking for clarification.  This smells like a kernel bug but I'm willing to entertain the possibility that our use of fallocate() voids the guarantee in the ext4 documentation about replace-by-rename being safe...

If that's the case then I'd either prefer that they fix the kernel or that we drop our use of fallocate() rather than bring back fsync().  If the whole point of putting my data at risk is for ext4 to perform delayed allocation then using fallocate() is probably a waste of time anyway.....

Comment 36 Adam Williamson 2013-06-20 23:38:47 UTC

colin: I don't know the code history, but on the simple 'end user experience' end, it sure quacks and walks like a regression: several people have shown up in IRC or on the downstream bug report lately reporting that their dconf databases have been zapped, and I don't recall *ever* hearing of a case of that prior to this month. It's happened to me personally and to mjg59 and to a bunch of other folks.

Comment 37 Colin Walters 2013-06-21 01:28:07 UTC

See also:

https://bugzilla.gnome.org/show_bug.cgi?id=702785

for fixing up the dconf side of this.

Comment 38 Allison Karlitskaya (desrt) 2013-06-21 12:46:30 UTC

*** Bug 702394 has been marked as a duplicate of this bug. ***

Comment 39 Allison Karlitskaya (desrt) 2013-06-21 13:03:10 UTC

Ted Ts'o's reply to my email, for reference: http://www.spinics.net/lists/linux-ext4/msg38778.html

Seems like the official line from the maintainer (despite what is written in the ext4 documentation) is that there are no guarantees.

I'm trying to get him to update his docs.

Meanwhile, with this issue addressed by Colin's reverts, I guess this is closed again...