[one-users] migration not working completely

Ross Nordeen rjnordee at mtu.edu
Mon Jul 26 08:13:02 PDT 2010


Tino, 

I figured out my live-migration problem, which turned out to be a bad default gateway.  As for the cold migration and checkpointing, I have the /srv/cloud/one directory shared out to all nodes via NFS, with full permissions for oneadmin. I think the VM's directory is /srv/cloud/one/var/18.  I will check the VM_DIR variable in the oned.conf file to see whether it is right.  Still, since everything else is working, it seems that the VM_DIR is exported correctly and functioning for the running VMs.
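
A quick way to spot this kind of gateway mismatch is to compare the default route that each node reports. What follows is only a rough sketch: the host names cn1 and cn2 are from this thread, while default_gw and show_gateways are hypothetical helper names, and it assumes the iproute2 `ip` tool is installed on the nodes and that the user running it can ssh to them without a password.

```shell
#!/bin/sh
# Sketch only: default_gw and show_gateways are hypothetical helpers.

# Read routing-table lines on stdin and print the first default gateway found.
default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print the default gateway each host reports, one host per line,
# so that a node with a different (or missing) gateway stands out.
show_gateways() {
    for host in "$@"; do
        gw=$(ssh -o BatchMode=yes "$host" ip route show | default_gw)
        printf '%s: %s\n' "$host" "${gw:-none}"
    done
}
```

Calling `show_gateways cn1 cn2` from the head node should print one line per node; the nodes would normally be expected to agree if they sit on the same subnet.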

-Ross

----- Original Message -----
From: "Tino Vazquez" <tinova at fdi.ucm.es>
To: "Ross Nordeen" <rjnordee at mtu.edu>
Cc: users at lists.opennebula.org
Sent: Monday, July 26, 2010 8:41:37 AM GMT -07:00 US/Canada Mountain
Subject: Re: [one-users] migration not working completely

Hi Ross,

There seem to be two issues here:

1) No live migration from cn2 to cn1 --> could it be that the
oneadmin user cannot ssh without a password from cn2 to cn1, but can
from cn1 to cn2?

2) The save problem seems to come from the inability to save the
checkpoint file. This may be because the /srv/cloud/one
directory doesn't exist on the remote nodes, in which case you will
need to set the VM_DIR variable in the oned.conf file.
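
Both points can be checked by hand with something along these lines. This is a minimal sketch rather than anything OpenNebula ships: can_ssh_between and can_write_dir are hypothetical helper names, the .one_write_test file name is made up, and it assumes it is run as oneadmin on the head node, which can already ssh to both compute nodes without a password.

```shell
#!/bin/sh
# Sketch only: can_ssh_between and can_write_dir are hypothetical helpers.

# Succeed only if the current user can ssh from host $1 to host $2
# without being prompted. BatchMode=yes makes ssh fail instead of
# asking for a password or passphrase.
can_ssh_between() {
    src=$1
    dst=$2
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$src" \
        "ssh -o BatchMode=yes -o ConnectTimeout=5 $dst true"
}

# Succeed only if a file can be created (and removed) in directory $2
# on host $1, which is roughly what saving a checkpoint there needs.
can_write_dir() {
    host=$1
    dir=$2
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" \
        "touch '$dir/.one_write_test' && rm -f '$dir/.one_write_test'"
}
```

For example, `can_ssh_between cn2 cn1 || echo "cn2 -> cn1 fails"` tests the direction that is not working, and `can_write_dir cn2 /srv/cloud/one/var/18/images` tests the checkpoint location from the log. Note that the write test runs as oneadmin, whereas a save through qemu:///system is carried out by the libvirt daemon, which may run as a different user, so a successful test here is only a first check.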

Hope it helps,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org



On Thu, Jul 22, 2010 at 11:39 PM, Ross Nordeen <rjnordee at mtu.edu> wrote:
> I have OpenNebula deployed with one head node and two compute nodes.  I have no problems live migrating from cn1 to cn2, but I get failures live/cold migrating from cn2 to cn1.  Is there any reason why a) I cannot save the state of any of my machines and b) live migration works one way but not the other?  Thanks
>
> -Ross
>
>
> Here is my vm.log file after a live migration, a migration, and then a suspend:
>
>
> Thu Jul 22 11:40:22 2010 [LCM][I]: New VM state is MIGRATE
> Thu Jul 22 11:40:22 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session
> Thu Jul 22 11:40:22 2010 [VMM][I]: STDERR follows.
> Thu Jul 22 11:40:22 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
> Thu Jul 22 11:40:22 2010 [VMM][I]: error: cannot recv data: Connection reset by peer
> Thu Jul 22 11:40:22 2010 [VMM][I]: ExitCode: 1
> Thu Jul 22 11:40:22 2010 [VMM][E]: Error live-migrating VM, -
> Thu Jul 22 11:40:23 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM).
> Thu Jul 22 11:40:23 2010 [VMM][D]: Monitor Information:
> .
> .
> .
> .
> .
> Thu Jul 22 15:09:04 2010 [LCM][I]: New VM state is MIGRATE
> Thu Jul 22 15:09:04 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session
> Thu Jul 22 15:09:04 2010 [VMM][I]: STDERR follows.
> Thu Jul 22 15:09:04 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
> Thu Jul 22 15:09:04 2010 [VMM][I]: error: cannot recv data: Connection reset by peer
> Thu Jul 22 15:09:04 2010 [VMM][I]: ExitCode: 1
> Thu Jul 22 15:09:04 2010 [VMM][E]: Error live-migrating VM, -
> Thu Jul 22 15:09:05 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM).
> Thu Jul 22 15:09:05 2010 [VMM][D]: Monitor Information:
> .
> .
> .
> .
> .
> Thu Jul 22 15:11:25 2010 [LCM][I]: New VM state is SAVE_MIGRATE
> Thu Jul 22 15:11:25 2010 [VMM][I]: Command execution fail: 'touch /srv/cloud/one/var//18/images/checkpoint;virsh --connect qemu:///system save one-18 /srv/cloud/one/var//18/images/checkpoint'
> Thu Jul 22 15:11:25 2010 [VMM][I]: STDERR follows.
> Thu Jul 22 15:11:25 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
> Thu Jul 22 15:11:25 2010 [VMM][I]: error: Failed to save domain one-18 to /srv/cloud/one/var//18/images/checkpoint
> Thu Jul 22 15:11:25 2010 [VMM][I]: error: operation failed: failed to create '/srv/cloud/one/var//18/images/checkpoint'
> Thu Jul 22 15:11:25 2010 [VMM][I]: ExitCode: 1
> Thu Jul 22 15:11:25 2010 [VMM][E]: Error saving VM state, -
> Thu Jul 22 15:11:25 2010 [LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).
> Thu Jul 22 15:11:26 2010 [VMM][I]: VM running but new state from monitor is PAUSED.
> Thu Jul 22 15:11:26 2010 [LCM][I]: VM is suspended.
> Thu Jul 22 15:11:26 2010 [DiM][I]: New VM state is SUSPENDED
> Thu Jul 22 15:13:20 2010 [DiM][I]: New VM state is ACTIVE.
> Thu Jul 22 15:13:20 2010 [LCM][I]: Restoring VM
> Thu Jul 22 15:13:20 2010 [LCM][I]: New state is BOOT
> Thu Jul 22 15:13:21 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system restore /srv/cloud/one/var//18/images/checkpoint
> Thu Jul 22 15:13:21 2010 [VMM][I]: STDERR follows.
> Thu Jul 22 15:13:21 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
> Thu Jul 22 15:13:21 2010 [VMM][I]: error: Failed to restore domain from /srv/cloud/one/var//18/images/checkpoint
> Thu Jul 22 15:13:21 2010 [VMM][I]: error: operation failed: cannot read domain image
> Thu Jul 22 15:13:21 2010 [VMM][I]: ExitCode: 1
> Thu Jul 22 15:13:21 2010 [VMM][E]: Error restoring VM, -
> Thu Jul 22 15:13:21 2010 [DiM][I]: New VM state is FAILED
> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Deleting /srv/cloud/one/var//18/images
>
> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Executed "rm -rf /srv/cloud/one/var//18/images".
>
> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: TRANSFER SUCCESS 18 -


