[one-users] File system performance testing suite tailored to OpenNebula

Carlo Daffara carlo.daffara at cloudweavers.eu
Wed Sep 11 12:41:54 PDT 2013


Actually the point is that *it is* possible to get near-native performance, when appropriate tuning or precautions are taken.
Take as an example the graphs on page 5:
for the filesystem workload the throughput is *higher* with XFS as the host filesystem than on the raw device (BD in the graph), and using XFS it stays within 10% (apart from ext3, which takes a higher performance hit); for the database workload it is JFS that is on a par or slightly faster.
Another important factor is latency (the latency added by stacking multiple filesystems): again, the graph on page 6 shows that specific guest/host FS combinations add very little latency from filesystem stacking.
It is also clear that the default ext4 used in many guest VMs is absolutely sub-optimal for write workloads, where JFS is twice as fast.
Other aspects to consider:
The default I/O scheduler in Linux is *abysmal* for VM workloads: deadline is the clear winner, along with noop for SSDs. Other small touches are tuning the default readahead for rotational media (and removing it for SSDs), increasing the retention of read-cache pages, and increasing (a little) the flush time of the write cache; even a 5-second sweep time raises the IOPS rate for write workloads by giving the elevator more opportunities to optimize the disk head path, and so on.
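A minimal sketch of this kind of host-side tuning is below; the device name, the choice of knobs and the values are only examples (my own assumptions), to be adapted to your hardware and kernel:

#!/usr/bin/env python
# Sketch only: apply the host-side block and VM tunings discussed above.
# /dev/sda and all values are illustrative; adjust per device and kernel.

TUNINGS = {
    # I/O scheduler: deadline for rotational disks, noop for SSDs.
    "/sys/block/sda/queue/scheduler": "deadline",
    # Readahead: larger for rotational media, 0 for SSDs.
    "/sys/block/sda/queue/read_ahead_kb": "1024",
    # One possible knob for keeping cached metadata/pages around longer.
    "/proc/sys/vm/vfs_cache_pressure": "50",
    # Let dirty pages age ~5 s before writeback, so the elevator can
    # merge and reorder writes and optimize the disk head path.
    "/proc/sys/vm/dirty_expire_centisecs": "500",
    "/proc/sys/vm/dirty_writeback_centisecs": "500",
}

for path, value in TUNINGS.items():
    try:
        with open(path, "w") as f:   # needs root
            f.write(value)
    except IOError as err:           # some files may not exist on all kernels
        print("skipping %s: %s" % (path, err))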
So my point is that it is possible, with relatively small effort, to get near-disk performance from KVM with libvirt (the same concept applies, with different details, to Xen).
It's a fascinating area of work; we had one of our people spend two weeks doing nothing but tests with a Windows VM running a benchmark application inside, over a large number of different FS/KVM parameter combinations. We found a lot of interesting cases :-)
cheers
carlo daffara
cloudweavers

----- Original Message -----
From: "João Pagaime" <joao.pagaime at gmail.com>
To: users at lists.opennebula.org
Sent: Wednesday, 11 September 2013 19:31:07
Subject: Re: [one-users] File system performance testing suite tailored to OpenNebula

Thanks for pointing out the paper.

I've glanced at it, and it somewhat confirmed my impressions on write
operations (which are very relevant in transactional environments): the
penalty on write operations doesn't seem to be negligible.

best regards,
João

On 11-09-2013 14:55, Carlo Daffara wrote:
> Not a simple answer; however, this article by Le and Huang provides quite a few details:
> https://www.usenix.org/legacy/event/fast12/tech/full_papers/Le.pdf
> We ended up using mainly ext4 and XFS, with btrfs for mirrored disks or for very slow rotational media.
> Raw is good if you can map disks directly and don't change them, but in our results the difference is not that great, while the inconvenience is major :-)
> When using KVM and virtio, the actual loss in I/O performance is not very high for the majority of workloads. Windows is a separate issue: NTFS has very poor performance on small blocks for sparse writes, and this tends to increase the apparent inefficiency of KVM.
> Actually, using the virtio device drivers the penalty is very small for most workloads; we tested a Windows 7 machine both native (physical) and virtualized using a simple CrystalMark test, and we found that with virtio the 4k random write test is just 15% slower, while the sequential ones are much faster virtualized (thanks to the native Linux page cache).
> For intensive I/O workloads we use a combination of a single SSD plus one or more rotational disks, combined using EnhanceIO.
> We observed an 8x increase in available random-write IOPS (especially important for database servers, AD machines...) using consumer-grade SSDs.
> cheers,
> Carlo Daffara
> cloudweavers
>
> ----- Original Message -----
> From: "João Pagaime" <joao.pagaime at gmail.com>
> To: users at lists.opennebula.org
> Sent: Wednesday, 11 September 2013 15:20:19
> Subject: Re: [one-users] File system performance testing suite tailored to OpenNebula
>
> Hello all,
>
> the topic is very interesting
>
> I wonder if anyone could answer this:
>
> what is the penalty of using a filesystem on top of a filesystem? That
> is what happens when the VM disk is a regular file on the hypervisor's
> filesystem: the VM has its own filesystem, and the hypervisor maps that
> VM disk onto a regular file on another filesystem (the hypervisor's).
> Hence the filesystem-on-top-of-a-filesystem issue.
>
> putting the question the other way around: what is the benefit of using a
> raw disk device (local disk, LVM, iSCSI, ...) as an OpenNebula datastore?
>
> I haven't tested this, but I feel the benefit should be substantial.
>
> Anyway, simple bonnie++ tests within a VM show heavy penalties when
> comparing a test run in the VM and one run outside (directly on the
> hypervisor). That isn't, of course, an OpenNebula-related performance
> issue, but a more general technology challenge.
>
> best regards,
> João
>
>
>
>
> On 11-09-2013 13:10, Gerry O'Brien wrote:
>> Hi Carlo,
>>
>>    Thanks for the reply. I should really look at XFS for the
>> replication and performance.
>>
>>    Do you have any thoughts on my second question about qcow2 copies
>> from /datastores/1 to /datastores/0 in a single filesystem?
>>
>>          Regards,
>>            Gerry
>>
>>
>> On 11/09/2013 12:53, Carlo Daffara wrote:
>>> It's difficult to provide an indication of what a typical workload
>>> may be, as it depends greatly on the I/O properties of the VMs that
>>> run inside (we found the "internal" load of OpenNebula itself to be
>>> basically negligible).
>>> For example, if you have lots of sequential I/O heavy VMs you may get
>>> benefits from one kind, while transactional and random I/O VMs may be
>>> more suitably served by other file systems.
>>> We tend to use fio for benchmarks (http://freecode.com/projects/fio),
>>> which is included in most Linux distributions; it provides flexible
>>> selection of read-vs-write patterns, can select different probability
>>> distributions, and includes a few common presets (like file server,
>>> mail server, etc.).
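
A minimal sketch of driving fio this way for a database-like test: the datastore path, block size, queue depth and runtime below are just examples I picked for illustration, not values from the tests mentioned above.

#!/usr/bin/env python
# Sketch only: run a 4k random-write fio job against a datastore path and
# print the resulting IOPS. All parameters are illustrative assumptions.
import json
import subprocess

cmd = [
    "fio",
    "--name=randwrite-4k",
    "--directory=/var/lib/one/datastores/0",   # filesystem under test (example path)
    "--rw=randwrite", "--bs=4k",                # transactional-style small writes
    "--size=1G",
    "--ioengine=libaio", "--direct=1", "--iodepth=16",
    "--runtime=60", "--time_based",
    "--group_reporting",
    "--output-format=json",
]

out = subprocess.check_output(cmd)
job = json.loads(out)["jobs"][0]
print("write IOPS: %s" % job["write"]["iops"])
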
>>> Selecting the underlying file system for the store is thus highly
>>> dependent on application, features and load. For example, in some
>>> configurations we use BTRFS with compression (slow rotational devices,
>>> especially when there are several of them in parallel), in others we
>>> use ext4 (a good, all-around balanced choice) and in others XFS, which
>>> supports filesystem replication in a way similar to that of zfs
>>> (not as sophisticated, though) and has excellent performance for
>>> multiple parallel I/O operations.
>>> ZFS in our tests tends to be extremely slow outside of a few "sweet
>>> spots", a fact confirmed by external benchmarks like this one:
>>> http://www.phoronix.com/scan.php?page=article&item=zfs_linux_062&num=3
>>> We tried it (and we continue to do so, for both the FUSE and the native
>>> kernel version), but for the moment the performance hit is excessive
>>> despite the nice feature set. BTRFS continues to improve nicely, and a
>>> set of patches implementing ZFS-like send/receive is here:
>>> https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive
>>> but it is still marked as experimental.
>>>
>>> I personally *love* ZFS, and the feature set is unparalleled.
>>> Unfortunately, the poor license choice means that it never got the
>>> kind of hammering and tuning that other Linux kernel filesystems get.
>>> regards,
>>> carlo daffara
>>> cloudweavers
>>>
>>> ----- Original Message -----
>>> From: "Gerry O'Brien" <gerry at scss.tcd.ie>
>>> To: "Users OpenNebula" <users at lists.opennebula.org>
>>> Sent: Wednesday, 11 September 2013 13:16:52
>>> Subject: [one-users] File system performance testing suite tailored
>>> to OpenNebula
>>>
>>> Hi,
>>>
>>>        Are there any recommendations for a file system performance testing
>>> suite tailored to typical OpenNebula workloads? I would like to compare
>>> the performance of zfs v. ext4. One of the reasons for considering zfs
>>> is that it allows replication to a remote site using snapshot streaming.
>>> Normal nightly backups, using something like rsync, are not suitable for
>>> virtual machine images, where a single block change means the whole image
>>> has to be copied. The amount of change is too great.
>>>
>>>        On a related issue, does it make sense to have datastores 0 and 1
>>> in a single filesystem, so that instantiating non-persistent
>>> images does not require a copy from one filesystem to another? I have
>>> in mind the case where the original image is a qcow2 image.
>>>
>>>            Regards,
>>>                Gerry
>>>
>>

_______________________________________________
Users mailing list
Users at lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org


