[one-users] Some issues while using OpenNebula

Tue Feb 10 07:06:42 PST 2009

Hi Boris,
	Thank you very much for your clarification!

> Ok ,that sounds good.
> The problem I got was that a machine got shut down via the xm command
> on the host itself. Then it does not appear in onevm list anymore and
> I did not know what happened. Also, if a host crashes, ONE takes the vm
> out of the list, but then Xen itself restarts the machine after the
> host got rebooted and Nebula does not find this machine anymore. So
> it is up and running but not listed in onevm, and of course a new
> submit via ONE on the same image does not work.

Ok, now I understand the problem. Maybe OpenNebula tries to be too clever
here ;). We could add a new ERROR state for a VM. Instead of the current
behavior, i.e. guessing, we could leave the VM in the new ERROR state. That
way it stays listed and you can decide what to do with it. Additionally, we
could monitor the VM periodically to check whether the hypervisor has
recovered from the error.
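
To make the proposal concrete, here is a minimal Python sketch of the idea.
It is not OpenNebula code: the VmState enum, state_after_poll() and monitor()
are hypothetical names, and the only input is an assumed boolean saying
whether a periodic poll of the hypervisor found the VM on its host.

```python
from enum import Enum


class VmState(Enum):
    """Hypothetical subset of VM states, for illustration only."""
    RUNNING = "running"
    ERROR = "error"


def state_after_poll(hypervisor_reports_vm):
    """Return the state to record after one monitoring poll.

    hypervisor_reports_vm is an assumed boolean: True if the poll
    found the VM on its host, False otherwise.
    """
    if hypervisor_reports_vm:
        # The VM is visible, either because nothing went wrong or
        # because the hypervisor brought it back after a failure.
        return VmState.RUNNING
    # The VM is missing: keep it listed as ERROR instead of guessing
    # what happened and removing it from the list.
    return VmState.ERROR


def monitor(polls):
    """Apply state_after_poll to a sequence of poll results."""
    return [state_after_poll(seen) for seen in polls]


if __name__ == "__main__":
    # VM disappears for two polls (e.g. host reboot), then comes back.
    for state in monitor([True, False, False, True]):
        print(state.value)
```

The point of the sketch is that a VM that disappears from the hypervisor is
never dropped from the list; it stays visible as ERROR until a later poll
sees it again or the user decides what to do with it.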

>
> Hm.. I did not get this completely. I had a machine, that kept listed
> in the state "boot".
> I could not do any command on it (I tried onevm delete, onevm resume,
> onevm stop) so I had to kill the hung-up booting vm via xen on the
> selected host and resubmit the template (and still have the boot-entry
> in the onevm list).

Ok, I misunderstood the problem. Yes, we should be able to send a kill command
to the VM. I'll put this in the tentative 1.4 roadmap.

> Actually I can not really reproduce it myself at the moment, since I
> don't want to shutdown the running machines.
> There were some configuration problems after the installation, so not
> everything was working.
> While trying to fix this, many failed ssh connections that OpenNebula had
> opened to one of the nodes stayed open, and then the node did not
> respond to ssh at all anymore and had to be manually rebooted.
>

OK. We are re-engineering the OpenNebula drivers, so we'll have an opportunity 
to look at this one.

>
> Sure, please find the files attached.
> (OpenNebula Server: wn001, used host wn002)
> And yes, egrep is installed.

Thanks!!

>
> So when I resume the machine via onevm, it will use the clean initial
> image instead of the cloned one, it was working on before?
> Is this also true, if I use stop instead of shutdown?
> For example if I run a svn server in a virtual machine, I intend to
> have any changes saved back to the image?

When you use stop/suspend, it should use the saved image along with the
checkpoint file to restore the VM. However, if you shut down a VM and want to
keep the changes, you have two options:
  * No cloning. If you are using shared storage, you can work directly on
the image. However, you cannot reuse it for other VMs.
  * SAVE=yes for the DISK, if you do not have a shared image repo or if you
want to clone the images. OpenNebula will copy the disk back to the
$ONE_LOCATION/var/$VMID directory. It will NOT overwrite the original
disk, so you have to move it yourself.
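
The rules above can be summarised in a short Python sketch. This is only a
reading of this email, not OpenNebula's implementation: disk_outcome() is a
hypothetical helper, and two cases are assumptions that the thread does not
spell out (a cloned disk without SAVE=yes loses its changes on shutdown, and
a non-cloned disk is treated as the "no cloning" case even if SAVE=yes is
also set).

```python
def disk_outcome(action, clone, save):
    """Describe what happens to disk changes for a given action.

    action: "stop", "suspend" or "shutdown"
    clone:  True if the disk image was cloned for the VM
    save:   True if SAVE=yes was set for the DISK
    """
    if action in ("stop", "suspend"):
        # Saved image plus checkpoint file are used to restore the VM.
        return "restored from saved image and checkpoint"
    if action == "shutdown":
        if not clone:
            # Working directly on the image (shared storage).
            # Assumption: this wins even if save is also True.
            return "changes are in the original image"
        if save:
            # Copy lands next to the VM; the original is not overwritten.
            return "disk copied to $ONE_LOCATION/var/$VMID"
        # Assumption: a cloned disk without SAVE=yes is discarded.
        return "changes not kept"
    raise ValueError("unknown action: %r" % (action,))


if __name__ == "__main__":
    print(disk_outcome("stop", clone=True, save=False))
    print(disk_outcome("shutdown", clone=True, save=True))
    print(disk_outcome("shutdown", clone=True, save=False))
```

For the svn server example, that means either running on a non-cloned image
or setting SAVE=yes and moving the saved disk into place after shutdown.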

Thanks again for your valuable feedback!!
Cheers

Ruben
>
> > Ok. Maybe we should improve the doc here:
>
> Thanks a lot for the clarification, it was really helpful!
>
> Kind regards,
> Boris

-- 
+---------------------------------------------------------------+
 Dr. Ruben Santiago Montero
 Associate Professor
 Distributed System Architecture Group (http://dsa-research.org)

 URL:    http://dsa-research.org/doku.php?id=people:ruben
 Weblog: http://blog.dsa-research.org/?author=7

 GridWay, http://www.gridway.org
 OpenNebula, http://www.opennebula.org
+---------------------------------------------------------------+