[one-users] Shared storage performance

Florian Heigl fheigl at wartungsfenster.de
Sat Jun 23 13:25:10 PDT 2012


Hi,

On 06/21/2012 10:55 AM, Andreas Calvo wrote:
> Hello,
> We are facing a performance issue in our OpenNebula infrastructure, and
> I'd like to hear your opinion on the best approach to solve it.
>
> We have 15 nodes plus 1 front-end. They all have the same shared storage
> through iSCSI, and they mount the OpenNebula home folder (/var/lib/one),
> which is a GFS2 partition.
> All machines are based on CentOS 6.2, using QEMU-KVM.
>
> We use the cloud to perform tests against a 120 VMs farm.
>
> Since we are using QCOW2, the need to write changes to disk is greatly
> reduced.
>
> However, all machines need to copy over 1G of data every time they
> start, and this really collapses our iSCSI network, until some machines
> receive a timeout accessing the data, which stops the test.
> The OpenNebula infrastructure suffers from a read/write penalty, leaving
> some VMs in a pending state and the system (almost) non-responsive.
>
> We are not using the local disks of the nodes at all.
>
> It seems that the only option is to use the local disk to write disk
> changes, but I wanted to know your experienced opinion on our problem.
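
On that last quoted point, writing disk changes to the local disks: a
minimal sketch of the idea, assuming hypothetical paths and that qemu-img
is available on the nodes, would be to keep the base image on the shared
GFS2 store and put the per-VM QCOW2 overlay on the node's local disk, so
only reads of unmodified blocks go over iSCSI:

  import subprocess

  # Hypothetical paths -- adjust to your actual datastore layout.
  shared_base   = "/var/lib/one/images/base.qcow2"        # base image on GFS2
  local_overlay = "/var/local/one-overlays/vm042.qcow2"   # per-VM overlay, local disk

  # Create a QCOW2 overlay on local disk backed by the shared base image.
  # Reads of untouched blocks still come from the shared store; all writes
  # stay on the node's local disk and never touch the iSCSI network.
  subprocess.check_call(
      ["qemu-img", "create", "-f", "qcow2", "-b", shared_base, local_overlay]
  )

Whether that maps cleanly onto your OpenNebula transfer drivers is
something you'd have to check, so treat it as a sketch of the technique
rather than a drop-in recipe.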

I have two suggestions and two comments:
Suggestion: Maybe you could try to move to multiple GFS2 partitions, 
potentially spread over multiple servers. This way the traffic will be 
more local.

Comment: find the person who first told you that iSCSI is fine for 
serious use. Consider hitting them.

Suggestion: Then either deploy a second/third/fourth iSCSI network under 
your multipathing to raise bandwidth. It seems you just need to have 
enough B/W to cover these spikes, so it should be possible to quantify 
how much B/W you're missing (rough sizing sketch below).
Comment: I'd immediately migrate away from iSCSI to FC instead of 
deploying more Ethernet, but that's just me.
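
To put rough numbers on the spike, here is a back-of-the-envelope sizing
sketch. The 120 VMs and ~1G per VM come from your description; the
deployment window and the single-GbE link are assumptions you should
replace with your real figures:

  # Back-of-the-envelope sizing for the boot spike (assumed figures).
  vms            = 120        # VMs started for a test (from your mail)
  copy_per_vm_gb = 1.0        # data copied per VM at start (from your mail)
  window_s       = 10 * 60    # how fast you want them all deployed (assumption)
  link_gbit_s    = 1.0        # one GbE iSCSI path (assumption)

  total_gbit    = vms * copy_per_vm_gb * 8     # total data to move, in gigabits
  needed_gbit_s = total_gbit / window_s        # sustained rate the spike requires
  links_needed  = needed_gbit_s / link_gbit_s  # GbE paths needed, ignoring overhead

  print("required throughput: %.2f Gbit/s" % needed_gbit_s)
  print("GbE paths needed:    %.1f" % links_needed)

With those assumed numbers that is about 1.6 Gbit/s sustained, i.e. more
than a single GbE path, which is exactly the kind of gap the extra
multipathed networks (or FC) would have to cover.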



