[one-users] Some issues while using OpenNebula

Boris Konrad boris.konrad at uni-dortmund.de
Tue Feb 10 06:04:37 PST 2009


Dear Ruben,

thanks a lot for your quick and thorough response!

>> Is it right that ONE does not detect it if a physical host crashes,
>> and that it has problems with handling that?
>
> In that case OpenNebula should not allocate VMs in that node.

Yes, that works fine.

>> What is the procedure, if an host running virtual machines crashes?
>
> Well, there is not too much room to do things right ;). We are implementing
> a general hook mechanism that lets you program actions on different VM states,
> like executing a pre-defined action (on_failure=reschedule) or a custom command
> on specific VM states like boot, failure... (this feature will not be available
> in 1.2).
> Note that this will be done at the cluster level, so you have to be cautious.
> For example, automatically re-scheduling a VM with a cloned disk will not
> preserve your data. If you have a shared file-system the disk may contain
> inconsistencies and need an fsck, so the VM will not boot...

OK, that sounds good.
The problem I had was that a machine got shut down via the xm command
on the host itself. After that it no longer appeared in onevm list and
I did not know what had happened. Also, if a host crashes, ONE takes the
VM out of the list, but then Xen itself restarts the machine after the
host is rebooted, and OpenNebula does not find this machine anymore. So
it is up and running but not listed in onevm, and of course a new
submit via ONE on the same image does not work.
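
To make it concrete, the sequence was roughly this (the VM ID is made up,
and I am assuming the usual one-<id> Xen domain naming):

  # On the physical host, outside of OpenNebula's control:
  xm shutdown one-42        # shut down the Xen domain ONE created for VM 42
  # Back on the front-end:
  onevm list                # VM 42 has disappeared from the listing
  # After a host crash and reboot it is the other way round: Xen brings the
  # domain back up, but ONE no longer knows about it.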

>> Is it right that you cannot delete a virtual machine (VM) when it
>> gets stuck in the boot state for any reason?
>
> This is a known issue in 1.2, but I must say that it is only cosmetic. We do
> not delete VMs, even when they are done; we just do not show them in the
> onehost list command (grep -v boot will fix this ;).

Hm, I did not get this completely. I had a machine that stayed listed
in the "boot" state.
I could not run any command on it (I tried onevm delete, onevm resume,
onevm stop), so I had to kill the hung booting VM via Xen on the
selected host and resubmit the template (and I still have the boot entry
in the onevm list).
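
The workaround I used for now was basically this (the VM ID is made up):

  # On the host: kill the hung Xen domain by hand
  xm destroy one-17
  # On the front-end: hide the stale entry when listing
  onevm list | grep -v boot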


>> Also, when anything ONE tries to do via ssh fails, the ssh connection
>> remains, which brought down our ssh clients because of too many
>> connections.
>
> I'd love to hear more about this one, we are very interested in making
> OpenNebula scalable. In fact we have successfully performed tests with ~100
> ssh simultaneous connections. Could you give me more details so we can
> reproduce this and track down the problem?

Actually I cannot really reproduce it myself at the moment, since I
don't want to shut down the running machines.
There were some configuration problems after the installation, so not
everything was working.
While I was trying to fix this, many failed ssh connections that
OpenNebula had opened to one of the nodes stayed open, and then the node
did not respond to ssh at all anymore and had to be rebooted manually.
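
If it happens again I will try to capture the state of the node first;
I would check something like this (the sshd_config option is the standard
OpenSSH one, and the connection limit is just my guess at the cause):

  # On the affected node:
  ps ax | grep '[s]shd' | wc -l                         # lingering sshd processes
  netstat -tn | grep ':22 ' | grep ESTABLISHED | wc -l  # established ssh connections
  # and whether a limit like "MaxStartups 10" in /etc/ssh/sshd_config is being hit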

>> Then we have a problem with tm_mv.sh in the NFS version we
>> use. It does not seem to work. Maybe you can help us:
>> If I try to use NFS and cloning, the VM runs as expected. But when I
>> shut it down (the save tag in the template is set to "yes"), it is not
>> saved.
>> The VM log file says "Will not move, is not saving image".
>
> You are right, this should not happen. In fact, as you mention, this is a very
> simple script. I've filed a bug for this:
>
> http://ruben@trac.opennebula.org/ticket/73
>
> Could you send us the vm.log and transfer.0 files for the VM? And just a silly
> check: egrep is installed, isn't it?

Sure, please find the files attached.
(OpenNebula Server: wn001, used host wn002)
And yes, egrep is installed.
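
For completeness, the disk section of the template we use looks roughly
like this (the paths are placeholders):

  DISK = [
    source = "/srv/cloud/images/base.img",
    target = "sda",
    clone  = "yes",
    save   = "yes" ]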

>> Even more critical seems to be that the images subfolder gets deleted
>> anyway, so the changes made to your image are lost and you can only
>> set up another VM from the previous image saved in the images folder.
>> (A VM's image is supposed to be saved back to the image folder if you
>> use onevm shutdown, isn't it?)
>
> If the tm_mv.sh works properly you will have a copy in $VAR_LOCATION/<VMID>,
> so the temporary images dir should be cleaned up. Then you can set up a
> template with the save disk.

So when I resume the machine via onevm, will it use the clean initial
image instead of the cloned one it was working on before?
Is this also true if I use stop instead of shutdown?
For example, if I run an svn server in a virtual machine, I would want
any changes to be saved back to the image.
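
To put it in commands, what I would like to work is roughly (VM ID made up):

  onevm stop 7        # checkpoint the VM and move its disks back to the front-end
  onevm resume 7      # ...and get the VM back with its changes, not the clean base image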

> OK. Maybe we should improve the docs here:

Thanks a lot for the clarification, it was really helpful!

Kind regards,
Boris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: transfer.0
Type: application/octet-stream
Size: 126 bytes
Desc: not available
URL: <http://lists.opennebula.org/pipermail/users-opennebula.org/attachments/20090210/82e32959/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vm.log
Type: text/x-log
Size: 2247 bytes
Desc: not available
URL: <http://lists.opennebula.org/pipermail/users-opennebula.org/attachments/20090210/82e32959/attachment-0003.bin>

