[one-users] VM HA with RBD

Fri Sep 13 14:01:00 PDT 2013

Hi,

Thanks a lot for your answer.

[...]
> * Why has the VM to be recreated? The disk image lies on a shared
> storage (RBD) and should only be started on another host, not
> recreated.
> 
> Any other process will try to contact the failing host, so the only
> possible path is to recreate the VM. Note that these operations are
> agnostic of the underlying infrastructure, so it should work on RBD
> or a simple storage shared through SSH cp's.
> 
> That said, it seems that we need to modify the Ceph datastore to
> check whether the volume exists before trying to create a new one, so
> that the use case is fully supported.
> 
> http://dev.opennebula.org/issues/2324 [2]

Yes, that's a good idea. ONE should check this. If you can tell me
where to look for this in the source, I may be able to contribute a
fix.
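
To make the idea concrete, here is a rough sketch of the check I have in
mind. It is not the actual datastore driver code, which I have not read
yet. The function names, the pool name "one" and the injected "create"
callback are my own assumptions for illustration. The only real command
used is "rbd info", which exits non-zero when the image cannot be found.

```python
import subprocess


def rbd_image_exists(pool, image, run=subprocess.call):
    """Return True if `rbd info pool/image` exits with status 0.

    Caveat: a non-zero status can also mean the cluster is unreachable,
    so a real driver would have to tell those two cases apart.
    """
    status = run(
        ["rbd", "info", f"{pool}/{image}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return status == 0


def ensure_volume(pool, image, create, run=subprocess.call):
    """Reuse an existing RBD image, or call `create` if it is missing.

    `create` is a hypothetical callback standing in for whatever the
    datastore driver does today to create or clone the volume.
    """
    if rbd_image_exists(pool, image, run=run):
        return "reused"
    create(pool, image)
    return "created"
```

With something like this, a call such as
ensure_volume("one", "one-5-66-0", create_fn) would skip the creation
step for a volume that is already present instead of failing.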

> * The VM now has the state "FAILED". How is the VM supposed to be
> recovered?
> 
> You can try delete --recreate.

I ran a new test: I disabled the default FT hook and powered off one of
the KVM hosts while one VM was running on it. The VM then went into the
UNKNOWN state. In this state we can only issue "onevm boot 66", which
tries to start the VM on the failed node, or "onevm delete 66
--recreate", which would delete the VM and recreate it (but this fails
with "rbd image one-5-66-0 already exists").
Neither command can start the same VM on another host in the cluster,
and I can't find any other command to get this VM up and running again.
The best thing would be to simply reschedule the VM so that it can be
started on one of the remaining hosts. Of course, ticket #2324 needs to
be done so that the existing volume on the shared storage (in my case
RBD) is reused. What do you think about this?

Cheers,
Tobias