[one-users] Fwd: how running vms moved(not recreate) on another host on host error

Ruben S. Montero rsmontero at opennebula.org
Fri Dec 13 03:01:58 PST 2013


Hi

In general, we cannot assume that there will be access to the host in a
failure state. The current delete/recreate mechanism is based on the
following:

* If there is a shared FS (a reliable one), recreation does not affect
persistent VM disks. Changes will be available when the VM is recreated,
even if the disk is deleted (only the symbolic link is removed).

* If the disk is not persistent, you will lose the changes, even if you
have a shared FS. However, an HA service should not be based on
non-persistent disks, as OpenNebula assumes that their changes can be
discarded.

* If there is no shared FS and the host is down, there is no way to get
the disk out of it.
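The first case (persistent disk on a shared FS) can be illustrated with a small Python sketch. It assumes, for illustration only, that the VM directory in the system datastore holds a symbolic link to the persistent image in the image datastore; removing that link, which is what the delete amounts to in this case, leaves the image and the VM's changes intact. All directory and file names are hypothetical and live in a temporary directory.

```python
import os
import tempfile


def recreate_keeps_changes(base_dir: str) -> bool:
    """Simulate delete/recreate of a VM whose disk is a symlink to a persistent image."""
    # Hypothetical layout: image and system datastores on the same shared FS.
    image_ds = os.path.join(base_dir, "image_ds")
    vm_dir = os.path.join(base_dir, "system_ds", "vm_42")
    os.makedirs(image_ds)
    os.makedirs(vm_dir)

    # The persistent image lives in the image datastore; the VM writes to it.
    image = os.path.join(image_ds, "persistent.img")
    with open(image, "w") as f:
        f.write("work in progress")

    # The VM directory only holds a symbolic link to the image.
    disk = os.path.join(vm_dir, "disk.0")
    os.symlink(image, disk)

    # "Deleting" the disk removes only the link, not the file it points to.
    os.remove(disk)

    # On recreation a new link is created and the earlier changes are still there.
    os.symlink(image, disk)
    with open(disk) as f:
        return f.read() == "work in progress"


with tempfile.TemporaryDirectory() as tmp:
    print(recreate_keeps_changes(tmp))  # True
```

With a non-persistent disk the VM directory would instead hold a private copy of the image, so removing it discards the changes, which matches the second case.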

With this in mind, the current OpenNebula behavior should work, provided
that *YOU HAVE* a working fencing mechanism for the physical hosts.

You need to fence the failing host in the ha-hook to prevent a split-brain
condition on the shared disks.
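The fence-then-recreate ordering can be sketched in Python. This is not OpenNebula's hook code: `handle_host_error`, `fence` and `recreate` are hypothetical names, and a real `fence` would have to power the host off through some out-of-band mechanism (for example IPMI or a managed PDU) and confirm it. The sketch only shows that nothing is recreated unless fencing is confirmed, since otherwise the old VM could still be writing to the shared disks.

```python
from typing import Callable, Iterable, List


def handle_host_error(
    host: str,
    vm_ids: Iterable[int],
    fence: Callable[[str], bool],
    recreate: Callable[[int], None],
) -> List[int]:
    """Recreate the VMs of a failed host only after the host has been fenced.

    `fence` (hypothetical) must return True only when the host is confirmed
    to be powered off. If it returns False, no VM is recreated, to avoid two
    copies of a VM writing to the same shared disk (split brain).
    """
    if not fence(host):
        return []
    recreated: List[int] = []
    for vm_id in vm_ids:
        recreate(vm_id)  # hypothetical callback, e.g. resubmit the VM
        recreated.append(vm_id)
    return recreated


calls: List[int] = []
print(handle_host_error("host01", [1, 2], lambda h: True, calls.append))   # [1, 2]
print(handle_host_error("host01", [3], lambda h: False, calls.append))     # []
```

Passing `fence` and `recreate` as callables keeps the ordering logic separate from whatever fencing hardware and OpenNebula commands a given site actually uses.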

Cheers

Ruben





On Thu, Dec 5, 2013 at 3:45 PM, Dmitri Chebotarov <dchebota at gmu.edu> wrote:

>  Hi,
>
> Did you ever figure out how to “move” a VM when its VM host goes
> down?
> I ran into the same issue last night.
> RHEL6 cluster, same OS/KVM version. One of the VM hosts went down (error)
> and the VMs running on that host were recreated on available hosts.
> The recreated VMs lost their work progress.
>
> The RHEL6 cluster uses shared NFS storage for the system and “data”
> datastores.
> Once the host died, ONE attempted to connect to the dead host to access the
> system datastore, which is already mounted on the ONE controller under the
> same path (log below).
> This is how the system datastore is configured:
>
>  TYPE: SYSTEM_DS
> DISK_TYPE: file
> TM_MAD: shared
>
> It’s mounted on all cluster nodes and on the ONED controller under the
> same path (/var/lib/one).
> I’m probably missing something in the system datastore configuration that
> would tell ONED to access it locally, not via the dead VM host…
>
> Shouldn’t ONED start the VMs on an available host using the existing
> config/disk files in the system datastore,
> and not delete/recreate them?
>
>  Thank you.
>
>  Thu Dec 5 04:49:29 2013 [VMM][I]: Command execution fail:
> /var/tmp/one/vnm/ovswitch/clean
> Thu Dec 5 04:49:29 2013 [VMM][I]: ssh: connect to host BC4-10 port 22: No
> route to host
> Thu Dec 5 04:49:29 2013 [VMM][I]: ExitSSHCode: 255
> Thu Dec 5 04:49:29 2013 [VMM][E]: Error connecting to BC4-10
> Thu Dec 5 04:49:29 2013 [VMM][I]: Failed to execute network driver
> operation: clean.
> Thu Dec 5 04:49:32 2013 [VMM][I]: Command execution fail:
> /var/lib/one/remotes/tm/qcow2/delete
> BC4-10:/var/lib/one//datastores/111/11251/disk.0 11251 107
> Thu Dec 5 04:49:32 2013 [VMM][I]: delete: Deleting
> /var/lib/one/datastores/111/11251/disk.0
> Thu Dec 5 04:49:32 2013 [VMM][E]: delete: Command "rm -rf
> /var/lib/one/datastores/111/11251/disk.0" failed: ssh: connect to host
> BC4-10 port 22: No route to host
> Thu Dec 5 04:49:32 2013 [VMM][E]: Error deleting
> /var/lib/one/datastores/111/11251/disk.0
> Thu Dec 5 04:49:32 2013 [VMM][I]: ExitCode: 255
> Thu Dec 5 04:49:32 2013 [VMM][I]: Failed to execute transfer manager
> driver operation: tm_delete.
> Thu Dec 5 04:49:35 2013 [VMM][I]: Command execution fail:
> /var/lib/one/remotes/tm/shared/delete
> BC4-10:/var/lib/one//datastores/111/11251 11251 111
> Thu Dec 5 04:49:35 2013 [VMM][I]: delete: Deleting
> /var/lib/one/datastores/111/11251
> Thu Dec 5 04:49:35 2013 [VMM][E]: delete: Command "rm -rf
> /var/lib/one/datastores/111/11251" failed: ssh: connect to host BC4-10 port
> 22: No route to host
> Thu Dec 5 04:49:35 2013 [VMM][E]: Error deleting
> /var/lib/one/datastores/111/11251
> Thu Dec 5 04:49:35 2013 [VMM][I]: ExitCode: 255
> Thu Dec 5 04:49:35 2013 [VMM][I]: Failed to execute transfer manager
> driver operation: tm_delete.
> Thu Dec 5 04:49:35 2013 [VMM][I]: Host successfully cleaned.
> Thu Dec 5 04:49:35 2013 [DiM][I]: New VM state is PENDING
> Thank you,
>
> Dmitri Chebotarov
> VCL Sys Eng, Engineering & Architectural
> Support, TSD - Ent Servers & Messaging
> 223 Aquia Building, Ffx, MSN: 1B5
> Phone: (703) 993-6175 | Fax: (703) 993-3404
>
>
>   From: Carlos Martín Sánchez <cmartin at opennebula.org>
> Date: Wednesday, September 11, 2013 at 5:44
> To: Romany Nageh <engromanynageh at gmail.com>
> Cc: "users at lists.opennebula.org" <users at lists.opennebula.org>
> Subject: Re: [one-users] Fwd: how running vms moved(not recreate) on
> another host on host error
>
>   Hi,
>
> What exactly do you mean by "move"? If you are referring to migration,
> that's not possible: once a host goes down, the VM state is lost.
>
>  Regards
>
>  --
> Join us at OpenNebulaConf2013 <http://opennebulaconf.com> in Berlin,
> 24-26 September, 2013
> --
> Carlos Martín, MSc
> Project Engineer
> OpenNebula - The Open-source Solution for Data Center Virtualization
> www.OpenNebula.org | cmartin at opennebula.org | @OpenNebula <http://twitter.com/opennebula>
>
>
> On Tue, Sep 10, 2013 at 11:47 PM, Romany Nageh <engromanynageh at gmail.com> wrote:
>
>> Hi
>> I am using OpenNebula 4.2. How can VMs running on a specific host be moved
>> (not recreated) to another host when that host fails (goes down)?
>>
>> Could anyone please help me?
>>  Thanks
>>
>>  ---------- Forwarded message ----------
>> From: "Romany Nageh" <engromanynageh at gmail.com>
>> Date: Sep 9, 2013 9:46 PM
>> Subject: how running vms moved(not recreate) on another host on host error
>> To: <users at lists.opennebula.org>, "Carlos Martín Sánchez" <
>> cmartin at opennebula.org>
>>
>> Hi
>> I am using OpenNebula 4.2.
>> How can VMs running on a specific host be moved (not recreated) to
>> another host when that host fails (goes down)?
>>
>> Could anyone please help me?
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opennebula.org
>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>
>>
>


-- 
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula