[one-users] Multi-deployment of VMs: Slow "Dispatching virtual machine"

Mon Jun 3 09:11:22 PDT 2013

Hi,

Since 2.2, OpenNebula performs several additional steps for each deployment:
ACL rules, the authorization driver, usage quotas, and probably some other
features I'm forgetting.

To rule out the scheduler, you can start the deployments with the onevm
deploy command. This command takes a range of VM ids, but in that case the
operations are sequential. I think this command should be enough to test
concurrent deployments:

$ for i in `seq 1 4`; do (onevm deploy $i 0 &); done
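
If you also want to time the whole batch, a variant that waits for all the
background calls could look like the sketch below. The function name, the
DEPLOY_CMD variable, and the VM and host ids are placeholders for
illustration, not part of OpenNebula; only "onevm deploy <vm_id> <host_id>"
is taken from the command above.

```shell
#!/bin/sh
# Sketch: start several deployments concurrently and wait for all of them.
# DEPLOY_CMD is a placeholder so the command can be swapped out for a dry run.
DEPLOY_CMD=${DEPLOY_CMD:-"onevm deploy"}

deploy_concurrently() {
    host_id=$1
    shift
    for vm_id in "$@"; do
        # Run each call in the background so the requests overlap.
        # $DEPLOY_CMD is intentionally unquoted so it splits into words.
        $DEPLOY_CMD "$vm_id" "$host_id" &
    done
    # Block until every background deploy call has returned.
    wait
}
```

For example, "deploy_concurrently 0 1 2 3 4" would send VMs 1 to 4 to host 0.
Because the function only returns once every call has finished, you can
compare timestamps taken before and after it.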

You may also want to test different values for the TM and VMM -t parameter,
to adjust the number of threads for each driver [1].
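
To see which values are currently in use, you could list the driver
"arguments" lines of oned.conf, since that is where the -t option is passed.
This is a rough sketch: the function name is made up, and the default path is
an assumption for a system-wide installation, so pass your own path if yours
differs.

```shell
#!/bin/sh
# Sketch: print the driver "arguments" lines of oned.conf with line numbers.
# The default path below is an assumption; pass another one as $1.
show_driver_arguments() {
    conf=${1:-/etc/one/oned.conf}
    # Lines starting with '#' are not matched, so commented-out drivers
    # are skipped.
    grep -n -E '^[[:space:]]*arguments[[:space:]]*=' "$conf"
}
```

After changing a -t value, oned has to be restarted for it to take effect.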

Regards

[1] http://opennebula.org/documentation:archives:rel3.8:oned_conf

--
Join us at OpenNebulaConf2013 <http://opennebulaconf.com> in Berlin, 24-26
September, 2013
--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org | cmartin at opennebula.org |
@OpenNebula <http://twitter.com/opennebula>

On Tue, May 28, 2013 at 5:45 PM, Michael Berlin <
michael.berlin.xtreemfs at gmail.com> wrote:

> Hi,
>
> I'm benchmarking the multi-deployment of VMs in OpenNebula to test the
> scalability of our distributed file system XtreemFS.
>
> Therefore, I do the following things:
>
> - stop the scheduler
> - "onevm create" multiple VMs
> - start the scheduler again
>
> - wait until the last VM has booted
>
> Recently, we upgraded our OpenNebula installation from 2.2 to 3.8 on our
> 32 node test cluster. With OpenNebula 2.2 the VMs were deployed almost
> simultaneously. But in 3.8 dispatching a single VM takes quite some time
> (1-2 seconds) for the scheduler.
>
> Here are the details:
>
> I benchmark the creation of the qcow2 snapshot in the "clone" transfer
> manager script and here's what it looked like for deploying 10 VMs with
> OpenNebula 2.2:
>
> 1362253295.5779 clone_starting n03
> 1362253295.5929 clone_starting n01
> 1362253295.6138 clone_starting n00
> 1362253295.6418 clone_starting n05
> 1362253295.6428 clone_starting n04
> 1362253295.6905 clone_starting n08
> 1362253295.6960 clone_starting n09
> 1362253295.7047 clone_starting n06
> 1362253295.7113 clone_starting n02
> 1362253295.7330 clone_starting n07
> 1362253296.7047 clone_finished n05
> 1362253296.7214 clone_finished n03
> 1362253296.7353 clone_finished n01
> 1362253296.7571 clone_finished n06
> 1362253296.7677 clone_finished n09
> 1362253296.7705 clone_finished n04
> 1362253296.8035 clone_finished n08
> 1362253296.8206 clone_finished n00
> 1362253296.8214 clone_finished n02
> 1362253296.8292 clone_finished n07
>
> The whole thing finished in under two seconds.
>
> With OpenNebula 3.8 it looks much different:
>
> 1369752457.4118 clone_starting n13
> 1369752457.4195 clone_finished n13
> 1369752459.6483 clone_starting n17
> 1369752459.6561 clone_finished n17
> 1369752460.6465 clone_starting n08
> 1369752460.6544 clone_finished n08
> 1369752461.9516 clone_starting n12
> 1369752461.9602 clone_finished n12
> 1369752463.2860 clone_starting n15
> 1369752463.2948 clone_finished n15
> 1369752465.7036 clone_starting n14
> 1369752465.7120 clone_finished n14
> 1369752466.7329 clone_starting n11
> 1369752466.7406 clone_finished n11
> 1369752467.9151 clone_starting n10
> 1369752467.9231 clone_finished n10
> 1369752468.8460 clone_starting n16
> 1369752468.8539 clone_finished n16
> 1369752469.8849 clone_starting n09
> 1369752469.8958 clone_finished n09
>
> Now, dispatching a single VM takes between 1 and 2 seconds. Here are the
> corresponding snippets from the sched.log:
>
> Tue May 28 16:47:35 2013 [VM][I]: Dispatching virtual machine 266 to host 98
> Tue May 28 16:47:36 2013 [VM][I]: Dispatching virtual machine 267 to host 102
> Tue May 28 16:47:36 2013 [VM][I]: Dispatching virtual machine 268 to host 93
> Tue May 28 16:47:39 2013 [VM][I]: Dispatching virtual machine 269 to host 97
> Tue May 28 16:47:41 2013 [VM][I]: Dispatching virtual machine 270 to host 100
> Tue May 28 16:47:41 2013 [VM][I]: Dispatching virtual machine 271 to host 99
> Tue May 28 16:47:43 2013 [VM][I]: Dispatching virtual machine 272 to host 96
> Tue May 28 16:47:44 2013 [VM][I]: Dispatching virtual machine 273 to host 95
> Tue May 28 16:47:44 2013 [VM][I]: Dispatching virtual machine 274 to host 101
> Tue May 28 16:47:45 2013 [VM][I]: Dispatching virtual machine 275 to host 94
>
> When I have a look at the sources, I suspect part of the problem is the
> blocking XML-RPC call to the one daemon (?):
>
> https://github.com/OpenNebula/one/blob/d732c5ae2fe774a2f0c0e24e6b60b3dc832a5f35/src/scheduler/src/pool/VirtualMachinePoolXML.cc#L133
>
> Nonetheless, it shouldn't take that long. Therefore, my questions are:
>
> - Is this normal? Can you please advise on how to track down what takes
> so long?
>
> - With 2.2 you can clearly see the interleaving of multiple deployments
> while 3.8 processes them one at a time. Is there a way to get the old
> behavior back in a recent OpenNebula installation?
>
> Thank you very much for your help.
>
> Best regards,
> Michael