[one-users] Some issues while using OpenNebula
Ruben S. Montero
rubensm at dacya.ucm.es
Thu Feb 5 14:01:55 PST 2009
Hi Boris,
Thank you very much for your interest in OpenNebula, and for sharing your
concerns with us. Comments below:
> The first issue is about ONE's behaviour in the case of malfunctions.
> Is it right that ONE does not detect it if a physical host crashes,
> and that it has problems with handling that?
Not really; the logic is already there, so if you have experienced problems
with this it should be a bug. In fact, OpenNebula checks the physical hosts at
several points. If there is a misconfiguration or just a failure (e.g. the
physical node crashed), the information system detects it and marks the host
as in error, like this:
HID NAME       RVM  TCPU  FCPU  ACPU    TMEM    FMEM STAT
  1 cluster02    0   100   100   100 1047552  896000  err
In that case OpenNebula will not allocate VMs on that node.
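(By the way, if you want a quick alert for hosts in that state, a one-liner
over the onehost list output above should do; it just assumes STAT is the
last column and NAME the second one:)

  $ onehost list | awk '$NF == "err" { print "host", $2, "is in error state" }'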
> What is the procedure if a host running virtual machines crashes?
Well, there is not much that can be done automatically right now ;). We are
implementing a general hook mechanism that lets you program actions on
different VM states, like executing a pre-defined action
(on_failure=reschedule) or a custom command on specific VM states like boot,
failure... (this feature will not be available in 1.2).
Note that this will be done at the cluster level, so you have to be cautious.
For example, automatically re-scheduling a VM with a cloned disk will not
preserve your data. And if you have a shared file system, the disk may contain
inconsistencies and need an fsck, so the VM may not boot...
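To give you an idea of what we have in mind (just a sketch, none of this is
in 1.2 and the attribute names will surely change), a template could carry
something like:

  # hypothetical sketch only -- NOT available in 1.2, final syntax will differ
  ON_FAILURE = "reschedule"   # pre-defined recovery action
  HOOK = [ STATE = "failure", COMMAND = "/usr/local/bin/my_hook.sh" ]  # custom command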
>
> Is it right that you cannot delete a virtual machine (VM) when it
> gets stuck in the boot status for any reason?
This is a known issue of 1.2, but I must say that it is only cosmetic. We
never really delete VMs, even when they are done; we just stop showing them
in the onevm list output (grep -v boot will hide the stuck one ;).
All that info is kept in the DB to generate accounting reports, or to use it
for billing purposes. Providing a friendly accounting API & CLI is also on our
short-term roadmap. Note that the info is in a standard sqlite DB, so it is
really easy to access the accounting data.
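For example, assuming a self-contained installation (the table name below is
just an example; check what your version actually uses with .tables):

  $ sqlite3 $ONE_LOCATION/var/one.db
  sqlite> .tables
  sqlite> SELECT * FROM vm_pool;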
> Also, when anything ONE tries to do via ssh fails, the ssh connection
> remains, which brought down our ssh clients because of too many
> connections.
I'd love to hear more about this one; we are very interested in making
OpenNebula scalable. In fact, we have successfully performed tests with ~100
simultaneous ssh connections. Could you give me more details so we can
reproduce this and track down the problem?
>
> Then we have got a problem with the tm_mv.sh in the NFS version we
> use. It does not seem to work. Maybe you can help us:
> If I try to use NFS and cloning, the VM runs as expected. But when I
> shut it down (the save tag in the template is set to "yes"), it is not
> saved.
> The VM log file says "Will not move, is not saving image"
You are right, this should not happen. In fact, as you mention, this is a very
simple script. I've filed a bug for this:
http://trac.opennebula.org/ticket/73
Could you send us the vm.log and transfer.0 files for the VM? And, just a
silly question: egrep is installed, isn't it?
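(With a self-contained installation those files should be in the VM
directory, e.g.:)

  $ ls $ONE_LOCATION/var/<VM_ID>/   # vm.log, transfer.0, ... live here
  $ which egrep                     # worth a quick check, the script uses it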
> Even more critical seems to be that the images subfolder gets deleted
> anyway, so the changes made to your image are lost and you can only
> set up another VM from the previous image saved in the images folder.
> (A VM's image is supposed to be saved back to the image folder if you
> use onevm shutdown, isn't it?)
If tm_mv.sh works properly you will have a copy in $VAR_LOCATION/<VM_ID>, so
it is expected that the temporary images dir is cleaned up. Then you can set
up a new template that uses the saved disk.
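Something along these lines (only a sketch; the paths are an example, and
double-check the saved image name in $VAR_LOCATION/<VM_ID>):

  # first run: clone the base image, save the modified disk on shutdown
  DISK = [ source = "/srv/images/base.img",
           target = "sda",
           clone  = "yes",
           save   = "yes" ]

  # later: a new template that boots from the saved image, once you have
  # moved it from $VAR_LOCATION/<VM_ID> to your image repository
  DISK = [ source = "/srv/images/base_modified.img",
           target = "sda" ]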
>
> Actually, I am still not sure which commands are meant for the regular
> operations. If I would like to stop a VM and continue it anywhere later,
> should I use shutdown and then submit its template again (therefore
> getting a new onevm id)? Or should I use suspend and resume?
OK, maybe we should improve the docs here:
Stop. You stop the VM, a checkpoint is generated, and the images are
"transferred back", where "transfer back" means different things for NFS and
SCP. If you then resume the VM, it is scheduled on another resource and
continues its execution.
Suspend. Same as above, but everything is left on the physical host where the
VM is running. When you resume it, the VM continues on the same resource. Note
that the scheduler is not invoked here and the images are not moved, so this
one should be faster.
Shutdown. In this case, imagine that you have installed a base system, then
you boot up the machine and configure something. You can make the
modifications persistent by cloning the disk (so you can continue the VM by
just submitting it again after shutdown) or by saving. In the latter case you
keep the original image, and the modifications are saved in the
$VAR_LOCATION/<VM_ID> directory; you then have to move the saved image to your
repository and write a new template for it.
Note that the first two are stateful (the memory checkpoint is preserved),
while shutdown is not.
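So, for your use case, something like this (VM id 42 is just an example):

  $ onevm stop 42       # checkpoint, images transferred back; resume may pick another host
  $ onevm resume 42

  $ onevm suspend 42    # checkpoint left on the host; resume is faster, same host
  $ onevm resume 42

  $ onevm shutdown 42   # no memory state kept; submit the template again to start over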
>
> One note:
> Before we even got it to run, we had to do a small change in
> XenDriver.cc. There you use tap:aio in the 1.2 beta version. But using
> "file:" is not deprecated, as claimed in the comment in XenDriver.cc;
> tap:aio just has a higher performance. But it does not work on all Linux
> distributions and Xen versions! So please consider making this
> configurable.
Maybe we are wrong and this is Xen/distribution dependent. We will add a note
under the known issues for 1.2.
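(For anyone hitting the same issue: the difference is just the prefix of the
disk line that ends up in the Xen domain config; the path below is only an
example:)

  disk = [ 'tap:aio:/srv/xen/disk.0,sda,w' ]  # blktap backend: faster, not supported everywhere
  disk = [ 'file:/srv/xen/disk.0,sda,w' ]     # loopback file backend: works on more setups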
Thanks for your feedback!
Ruben
>
> Kind regards,
> Boris Konrad
--
+---------------------------------------------------------------+
Dr. Ruben Santiago Montero
Associate Professor
Distributed System Architecture Group (http://dsa-research.org)
URL: http://dsa-research.org/doku.php?id=people:ruben
Weblog: http://blog.dsa-research.org/?author=7
GridWay, http://www.gridway.org
OpenNEbula, http://www.opennebula.org
+---------------------------------------------------------------+