Bug 679837 - Cannot create boxes if $HOME is on a filesystem without O_DIRECT support

Status: RESOLVED FIXED
Product: gnome-boxes
Classification: Applications
Component: general
Version: 3.15.x
Hardware/OS: Other / Linux
Importance: High major
Target Milestone: --
Assigned To: GNOME Boxes maintainer(s)
QA Contact: GNOME Boxes maintainer(s)
Duplicates: 738872 745571 746662 (view as bug list)
Depends on:
Blocks:

Reported: 2012-07-13 07:38 UTC by RedHand
Modified: 2016-03-31 13:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Log from gnome-boxes (2.28 KB, text/x-log), 2012-07-13 07:38 UTC, RedHand

Description RedHand 2012-07-13 07:38:37 UTC
Created attachment 218683 [details]
Log from gnome-boxes

When selecting an ISO file, I get "Box creation failed!".
Comment 1 Christophe Fergeau 2012-07-13 07:59:55 UTC
It's trying to use

    -drive file=/home/redhand/.local/share/gnome-boxes/images/debian-6.0.0-amd64-i386-netinst.iso,if=none,id=drive-ide0-0-0,format=qcow2,cache=none

as the disk image to install, but is failing to do so. What do you have in ~/.local/share/gnome-boxes/images? Which distro are you using, and which version of Boxes?
Comment 2 RedHand 2012-07-13 08:24:24 UTC
There is nothing in ~/.local/share/gnome-boxes/images. When I copied the ISO file there, nothing changed.

I'm using Debian sid. Boxes installed version: 3.4.3+dfsg-1.
Comment 3 Christophe Fergeau 2012-07-13 08:47:53 UTC
Do you have access to /dev/kvm?
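
(For reference, a minimal shell sketch of such a check, assuming the standard /dev/kvm device node:

    # Check that /dev/kvm exists and is read/write-accessible to you
    test -r /dev/kvm -a -w /dev/kvm && echo "KVM access OK" || ls -l /dev/kvm
)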
Comment 4 RedHand 2012-07-13 08:55:18 UTC
Yes, I do.
Comment 5 Zeeshan Ali 2012-07-13 13:26:41 UTC
From the log, this seems like some incompatibility between libvirt and qemu.
Comment 6 RedHand 2012-07-13 13:29:56 UTC
libvirt version: 0.9.12-3
qemu version: 1.1.0
Comment 7 Christophe Fergeau 2012-07-14 10:33:15 UTC
15:37 < danpb> zeenix: hmm, yes you have cache=none there which means it uses O_DIRECT
15:38 < danpb> so I bet the user's $HOME/.local/share is on a filesystem that doesn't support O_DIRECT
15:38 < danpb> ask them what filesystem their home directory is on

So what filesystem are you using for /home? Typing 'mount' in a terminal is one way of getting this info.
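
(A quick sketch of that check, assuming GNU coreutils; the path is the one Boxes uses by default:

    # Report the filesystem type backing the Boxes data directory;
    # on an eCryptfs-mounted home this should print "ecryptfs"
    stat -f -c %T ~/.local/share/gnome-boxes
    # or list the backing mount and its type
    df -T ~/.local/share
)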
Comment 8 RedHand 2012-07-15 20:00:59 UTC
Hi.
I have ext4 on my /home partition. Additionally, I have eCryptfs on top of it.
Comment 9 Christophe Fergeau 2012-07-16 07:31:23 UTC
OK, that explains it then:
http://thread.gmane.org/gmane.comp.file-systems.ecryptfs.general/220
Boxes should probably check whether the FS supports O_DIRECT before trying to use cache=none.
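
(A minimal shell sketch of such a probe, assuming GNU dd; the test path is illustrative. An O_DIRECT open on an unsupported filesystem such as eCryptfs fails with EINVAL, i.e. "Invalid argument":

    # Attempt one 4 KiB O_DIRECT write where Boxes keeps its images
    dd if=/dev/zero of="$HOME/.local/share/gnome-boxes/images/.odirect-test" \
        bs=4096 count=1 oflag=direct \
        && echo "O_DIRECT supported" || echo "no O_DIRECT support"
    rm -f "$HOME/.local/share/gnome-boxes/images/.odirect-test"
)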
Comment 10 RedHand 2012-07-16 07:40:34 UTC
OK. So for now the only option is to get rid of eCryptfs?
Comment 11 Christophe Fergeau 2012-07-16 07:47:34 UTC
Hmm, you could try crazy things like symlinking the dirs Boxes needs to a non-eCryptfs FS (a sketch follows below), or try to take advantage of the XDG base dir spec to 'redirect' these directories elsewhere. Not sure this would work though...
Or you could patch away disk.set_driver_cache (DomainDiskCacheType.NONE); from vm-configurator.vala, rebuild with --enable-vala passed to configure, and see if it helps. Not trivial either.
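
(A sketch of the symlink workaround; /var/tmp is just an example of a non-eCryptfs location, and Boxes should be closed first:

    # Relocate the Boxes image directory and leave a symlink behind
    mkdir -p /var/tmp/boxes
    mv "$HOME/.local/share/gnome-boxes/images" /var/tmp/boxes/images
    ln -s /var/tmp/boxes/images "$HOME/.local/share/gnome-boxes/images"
)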
Comment 12 RedHand 2012-07-16 07:50:59 UTC
Thanks for help.
I will try. Stay tuned. ;-)
Comment 13 RedHand 2012-07-16 12:49:59 UTC
I've linked the Boxes dirs* to a non-eCryptfs FS and it works. :-)

*
~/.libvirt and ~/.local/share/gnome-boxes, though I think linking only ~/.local/share/gnome-boxes would be enough.

I think that we can close this issue for now. Maybe I'll try XDG later.
Comment 14 Christophe Fergeau 2012-07-16 13:41:58 UTC
I'd rather keep it open; this solution is just a workaround.
Comment 15 Zeeshan Ali 2012-07-28 14:54:55 UTC
(In reply to comment #9)
> Ok, that explains it then
> http://thread.gmane.org/gmane.comp.file-systems.ecryptfs.general/220
> Boxes should probably check whether the FS supports O_DIRECT before trying to
> use cache=none

We probably need to add support for that in libvirt's storage pool API first.
Comment 16 Zeeshan Ali 2013-05-08 02:29:40 UTC
Filed a bug against libvirt about this: https://bugzilla.redhat.com/show_bug.cgi?id=960793
Comment 17 Zeeshan Ali 2014-10-15 23:13:21 UTC
(In reply to comment #16)
> Filed a bug against libvirt about this:
> https://bugzilla.redhat.com/show_bug.cgi?id=960793

The libvirt folks have shown no interest, so I'm going to mark this as WONTFIX unless/until they do.
Comment 18 Zeeshan Ali 2014-10-20 15:18:33 UTC
*** Bug 738872 has been marked as a duplicate of this bug. ***
Comment 19 Zeeshan Ali 2014-10-20 15:43:01 UTC
For the record, we added explicit cache=none on purpose and IIRC for very good reasons:

https://bugzilla.gnome.org/show_bug.cgi?id=670777#c1
Comment 20 Zeeshan Ali 2015-03-24 13:49:40 UTC
*** Bug 746662 has been marked as a duplicate of this bug. ***
Comment 21 Chris Murphy 2015-03-24 15:16:39 UTC
This bug should be reopened.

"cache=none is almost never useful" with explanation why
https://lists.fedoraproject.org/pipermail/devel/2013-October/190354.html

Detailed explanation of the various cache options
https://github.com/libguestfs/libguestfs/commit/749e947bb0103f19feda0f29b6cbbf3cbfa350da

writeback should give better performance than none because it uses the host cache, whereas none relies on the guest's cache alone. This is safe for Linux guests.

For Windows guests, I have no idea what the flushing behavior is; it might be that only writethrough is safe for Windows.

And while it's not a cache option, creating the qcow2 with '-o preallocation=metadata,lazy_refcounts=on,compat=1.1' offers a significant benefit.
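
For reference, those creation options map onto a qemu-img invocation along these lines (file name and size are illustrative):

    # Create a qcow2 with metadata preallocation and lazy refcounts
    qemu-img create -f qcow2 \
        -o preallocation=metadata,lazy_refcounts=on,compat=1.1 \
        disk.qcow2 20G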
Comment 22 Chris Murphy 2015-03-24 16:43:02 UTC
I do not have the privileges to reopen this myself.

Overall, cache=writeback has the best read and write performance, using the host cache for both reads and writes. Writethrough uses the host cache for reads but does not cache writes, and is hence both safer and a lot slower. I'm not finding any info so far on the Windows flush policy, but at least cache=writeback is no less safe than cache=none, and it does perform better when the virtio device is used with guest drivers. If IDE is used (which is maybe required for older versions of Windows?), then cache=none makes sense.

cache=none is also less portable due to its use of O_DIRECT.

http://jrs-s.net/2013/05/17/kvm-io-benchmarking/
Comment 23 Zeeshan Ali 2015-03-24 16:46:58 UTC
Re-opening on behalf of Chris.
Comment 24 Zeeshan Ali 2015-03-25 12:22:03 UTC
(In reply to Chris Murphy from comment #22)
> 
> If IDE (which is maybe
> required for older versions of Windows?) then cache=none makes sense.

Well, IDE (rather than virtio) is also used if the user chooses a non-express installation. Is there some issue with writeback and IDE? What is the libvirt default, btw? The docs list a "default" driver instead of saying which option it maps to.

Christophe and I tested with no cache option against a win7 express installation, and we didn't see any performance issues.
Comment 25 Chris Murphy 2015-03-25 14:45:31 UTC
(In reply to Zeeshan Ali (Khattak) from comment #24)
"Without VirtIO drivers, caching does very, very little for you at best, and kills your performance at worst.  (Hello again, horrible writethrough write performance.)  So you really want to use “cache=none” if you’re stuck on IDE."
http://jrs-s.net/2013/05/17/kvm-io-benchmarking/

That suggests the IDE default caching is writethrough, but I have no idea, and I haven't tested IDE caching; I've only tested caching with virtio. It's not directly in the write-up, but it sounds like cache=writeback would be no worse than cache=none; the limiting factor is IDE itself, which apparently costs quite a lot compared to virtio.

All the documentation I've found on how to install paravirt drivers for Windows using virt-manager makes me want to hit my head on a wall over and over. So if Boxes has some way of helping the user install Windows with a virtio disk bus and its paravirt drivers instead of IDE, that's a huge plus. However, when I follow identical UI paths to install Fedora 22 vs Windows 8.1 Enterprise, ps -ww -F shows that the resulting qemu commands for the two VMs use different buses: virtio for Fedora but IDE for Windows. *shrug*

Anyway, that's sort of getting off track. The main point is that cache=writeback is expected to be no worse than none. If the guest is Linux, or virtio drivers are installed on Windows, there is a performance gain over cache=none.
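
(One way to check which bus and cache mode a running VM actually got, assuming the qemu binary is named qemu-system-x86_64:

    # Dump the VM's full command line and pull out the -drive options
    ps -C qemu-system-x86_64 -o args= | tr ',' '\n' | grep -E 'file=|if=|cache='
)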
Comment 26 Zeeshan Ali 2015-03-25 15:08:20 UTC
(In reply to Chris Murphy from comment #25)
> (In reply to Zeeshan Ali (Khattak) from comment #24)
> "Without VirtIO drivers, caching does very, very little for you at best, and
> kills your performance at worst.  (Hello again, horrible writethrough write
> performance.)  So you really want to use “cache=none” if you’re stuck on
> IDE."
> http://jrs-s.net/2013/05/17/kvm-io-benchmarking/
> 
> That suggests the IDE default caching is writethrough; but I have no idea,
> and I haven't tested IDE caching. I've only tested caching with virtio. It's
> not directly in the write up, but it sounds like cache=writeback would be no
> worse than cache=none, the limiting factor is IDE which is apparently quite
> substantial compared to virtio.

Ah, OK. I'll test IDE with different options too, then.

> All the documentation I've found how to install paravirt drivers for Windows
> using virt-manager make me want to hit my head on a wall over and over. So
> if Boxes has some way of helping the user install Windows with a virtio disk
> bus and its paravirt drivers instead of IDE, that's a huge plus.

It does. :) It took me quite a long time to get this implemented for Windows XP and 7, but it's there, and it's implemented in libosinfo, so virt-manager can make use of it too if it wants.

> However,
> when I follow identical UI paths to install Fedora 22 vs Windows 8.1
> Enterprise, ps -ww -F shows the resulting qemu commands for the two VM's are
> using different buses it's using virtio for Fedora but IDE for Windows.
> *shrug*

There is a good reason for that: virt-manager doesn't support express installation or anything of the sort, so it relies on what the OS supports out of the box. Fedora supports virtio out of the box, whereas Windows does not.
Comment 27 Zeeshan Ali 2015-03-26 16:24:09 UTC
So I did some tests here against the win7 installer, once leaving the cache at the libvirt/qemu default (i.e. not specifying it) and once with cache=writeback. The installation still takes around 20 mins with writeback, but with the default (whatever that is) I see a bit of a slowdown and the installation takes more than 25 mins.

So if there are no portability issues with writeback, I'll simply switch Boxes to use that.
Comment 28 Zeeshan Ali 2015-03-26 17:14:36 UTC
commit: d77856f21b959646aedf28c07324be2dab34d79f

    vm-configurator: Use 'writeback' disk cache option
    
    Turns out that cache=none did not help much in reducing the installation
    time of Windows 7 from hours to 20mins (like we thought) and cache=none
    requires O_DIRECT and hence makes this option very non-portable.
    
    Let's use 'writeback' cache option, which is reported to be more
    efficient than the default option.
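
(A quick way to verify what a given VM ended up with after this change; Boxes uses the libvirt session connection, and the domain name here is illustrative:

    # The disk <driver> element carries the cache attribute
    virsh -c qemu:///session dumpxml win7 | grep 'driver name'
)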
Comment 29 Chris Murphy 2015-03-26 17:48:47 UTC
Update to original bugs prompting revisiting this one:
https://bugzilla.gnome.org/show_bug.cgi?id=746662
https://bugzilla.redhat.com/show_bug.cgi?id=1204569

I found that the guest blk I/O problem with cache=none + qcow2 on Btrfs happens with Fedora 21 and 22, but does not happen with Fedora 20, even fully updated. So it's not a very recent regression, and it is not (exclusively) a kernel regression at that; it's possibly in libvirt or qemu. I have no idea what's responsible for, or can affect, cache handling.

Anyway, if it's not difficult, a backport for Boxes 3.14 (Fedora 21) might help unsuspecting Windows users on Btrfs (an interesting combination), who may get weird behavior rather than the copious dmesg messages we get on Linux when this problem is triggered.
Comment 30 Kevin Wolf 2015-03-27 09:27:34 UTC
(In reply to Zeeshan Ali (Khattak) from comment #27)
> installation takes around 20 mins still with writeback but with defaults
> (whatever that is), I see a bit of slowdown and installation takes more than
> 25mins.

The default is writeback, so if you measured different numbers for both, this
probably just means that your benchmark isn't very stable.

More generally, what I think you need to understand is that blanket statements like
"writeback is the bestest!!1" are silly. Different cache modes behave differently
and have advantages in different respects. That's why there is an option and not
just a single hardcoded caching policy. This also means that you can't just say
"what's right for libguestfs must be right for Boxes, too", as they are different
use cases.

cache=writeback is great if you have rather short-lived VMs running on the same
image in quick succession, so that they can benefit from the warm host cache
while their own cache is still warming up. This is true for libguestfs.

cache=none is great for longer running VMs because it avoids the wasteful
duplicated caching in both the host and the guest, and it is more efficient
when you're running enough VMs or having heavy I/O load so that the data
doesn't keep sitting in the cache, but is soon evicted anyway. For live
migration with shared storage, O_DIRECT is a requirement, too. So for the big
enterprise setups, cache=none is the undoubted champion.

Boxes sits somewhere in the middle. I agree that writeback might make sense in
desktop use cases (which is why it's the default in qemu), but you need to
check for yourself what the overall optimum is for all typical Boxes use cases
(and not just the installation of one example OS).

One other thing to mention is that if we qemu developers look for top
performance (as in heavy I/O benchmarks), that's almost automatically
cache=none,aio=native, because everything else just doesn't even come close.

You might want to try aio=native (only effective with O_DIRECT, though!) as
another option. However, it's reported to help best when running directly on
block devices and not through a filesystem, so maybe it's not for Boxes.
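
(For concreteness, the combination Kevin mentions would look like this on a raw qemu command line; the machine options and file name are illustrative:

    # Heavy-I/O configuration: bypass the host page cache and use native AIO
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=disk.qcow2,if=virtio,cache=none,aio=native
)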
Comment 31 Richard Jones 2015-03-27 16:08:41 UTC
[Because someone asked me about this bug off-list]

Listen to Kevin; he knows what he's talking about.  The default
for libguestfs is chosen for reasons that don't apply to general-purpose
VMs:

 - our appliances are very short-lived

 - we don't need migration

 - we do want to handle images that exist on filesystems without
   O_DIRECT support

I should note that libguestfs also uses cache=unsafe, when it's
the right thing for libguestfs, but it's almost never the right
thing for general purpose VMs.

Also libguestfs has to interoperate with users using O_DIRECT, and
that requires judicious use of fsync() in libguestfs.  Using
writeback/unsafe is much more complicated than it appears to be.
Comment 32 Chris Murphy 2015-03-27 16:17:21 UTC
Interesting. So, as usual, there's no easy answer; it requires a test that adequately mimics the actual usage. I have no idea what the typical Boxes use cases are. Maybe Windows VMs tend to be long-lived, making cache=none more applicable as a default, while Linux VMs tend to be short-lived (?), making cache=writeback better?

I don't know if optimizing this is important enough for Boxes to be worth adding UI elements; but if so, maybe a cache pop-up or radio button could let the user choose a policy based on usage rather than the strict term, e.g. "Short-Lived VM" vs "Long-Lived VM".
Comment 33 Zeeshan Ali 2015-03-28 15:34:53 UTC
Thanks for chiming in, Kevin and Richard! I know that win7 installation is in no way *the* criterion for choosing the cache option. The reason I was using it as a test case is that we started using cache=none as one of the measures to bring the win7 installation time down from 3 hours to 20 minutes. Now that it turns out that cache=none played no (or no significant) role in that, and keeping in mind that the main issues here are the non-portability of cache=none and libvirt not providing a way to detect whether a storage pool supports cache=none, I decided to switch to 'writeback'.

Now that Kevin informs me that 'writeback' isn't good for long-running VMs, I must ask: how long are we talking here, and how bad does it get compared to cache=none?

As for Boxes' use case, AFAIK users typically do have permanent VMs.
Comment 34 Chris Murphy 2015-03-28 15:58:24 UTC
Could it make sense to use one cache setting for installations (first use of the VM?) and something else for subsequent uses?
Comment 35 Richard Jones 2015-03-28 16:03:18 UTC
Can we fix libvirt so it can report available caching modes?  The
check itself is trivial - in fact we used to have to do it in
libguestfs back when we were using O_DIRECT:

https://github.com/libguestfs/libguestfs/blob/stable-1.22/src/drives.c#L645-L662

Anyway it sounds like a useful and worthwhile addition to libvirt,
and there's an obvious need for it.
Comment 36 Zeeshan Ali 2015-03-28 16:05:01 UTC
(In reply to Chris Murphy from comment #34)
> Could it make sense to use one cache setting for installations (first use of
> the VM?) and something else for subsequent uses?

It's possible, but that wouldn't help with the main issue: portability. As we already observed, the difference between the two main cache options isn't significant during installation anyway.
Comment 37 Zeeshan Ali 2015-03-28 16:06:19 UTC
(In reply to Richard Jones from comment #35)
> Can we fix libvirt so it can report available caching modes?  The
> check itself is trivial - in fact we used to have to do it in
> libguestfs back when we were using O_DIRECT:
> 
> https://github.com/libguestfs/libguestfs/blob/stable-1.22/src/drives.c#L645-L662
> 
> Anyway it sounds like a useful and worthwhile addition to libvirt,
> and there's an obvious need for it.

Yeah, please see comment #16 and comment #17.
Comment 38 Chris Murphy 2015-03-28 16:23:56 UTC
(In reply to Zeeshan Ali (Khattak) from comment #36)
> (In reply to Chris Murphy from comment #34)
> > Could it make sense to use one cache setting for installations (first use of
> > the VM?) and something else for subsequent uses?
> 
> Its possible but that wouldn't help with the main issue: portability. As we
> already observed, the difference between the two main cache options here
> isn't significant enough during installation anyway.

Got it. I was thinking of cache=unsafe for the installation, which performs better than any of them; the unsafeness would be non-critical for an install, and it doesn't have the portability concern. The last time I tested it, it was a lot faster, but 20 minutes isn't so bad (3 hours is dreadful).
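
(As a sketch, the install-time variant suggested here would just swap in the unsafe mode; all file names are illustrative:

    # Fastest but unsafe: a host crash mid-install can corrupt the image
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -cdrom installer.iso \
        -drive file=install.qcow2,cache=unsafe
)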
Comment 39 Zeeshan Ali 2015-03-28 16:53:56 UTC
(In reply to Chris Murphy from comment #38)
> (In reply to Zeeshan Ali (Khattak) from comment #36)
> > (In reply to Chris Murphy from comment #34)
> > > Could it make sense to use one cache setting for installations (first use of
> > > the VM?) and something else for subsequent uses?
> > 
> > Its possible but that wouldn't help with the main issue: portability. As we
> > already observed, the difference between the two main cache options here
> > isn't significant enough during installation anyway.
> 
> Got it. I was thinking of cache=unsafe for the installation, which performs
> better than any of them, and the unsafeness for an install would be
> non-critical.

Why would it be non-critical? If installation ends up with a crash in the guest, that would make for a very bad user experience.
Comment 40 Chris Murphy 2015-03-28 18:54:29 UTC
If the installation ends up with a crash at any point, it's an unreliable installation and should be tossed. It doesn't matter what the cache setting is, or even whether it's bare metal. Get rid of it and start over.

However, I don't know whether a completely successful, crash-free installation followed by a fast reboot always results in a proper flush to the qcow2 when cache=unsafe is used. So it may be a bad idea anyway.
Comment 41 Richard Jones 2015-03-28 20:36:08 UTC
That is pretty much the principle behind virt-builder's use of
cache=unsafe.  It's a big performance win.  However what works for
virt-builder may not work for Boxes.

virt-builder strictly controls the guest installation and customization,
for all guest types.  Boxes may not "know" when a guest is in the
installation phase, when that finishes, or what counts as a failure.

Also, virt-builder has to do special fsync handling around the switch
from cache=unsafe to O_DIRECT methods, as I alluded to above.
Comment 42 Zeeshan Ali 2015-03-30 14:40:45 UTC
*** Bug 745571 has been marked as a duplicate of this bug. ***
Comment 43 Alberto Ruiz 2015-06-20 14:19:25 UTC
Any updates on this bug?
Comment 44 Zeeshan Ali 2015-06-23 13:50:01 UTC
(In reply to Alberto Ruiz from comment #43)
> Any updates on this bug?

Yes? It's in "FIXED" state. :)

commit: d77856f21b959646aedf28c07324be2dab34d79f
Date:   Thu Mar 26 17:08:57 2015 +0000

    vm-configurator: Use 'writeback' disk cache option
    
    Turns out that cache=none did not help much in reducing the installation
    time of Windows 7 from hours to 20mins (like we thought) and cache=none
    requires O_DIRECT and hence makes this option very non-portable.
    
    Let's use 'writeback' cache option, which is reported to be more
    efficient than the default option.