[one-users] migration not working completely

Ross Nordeen rjnordee at mtu.edu
Tue Jul 27 07:29:22 PDT 2010



I added the lines to the end of the /etc/apparmor.d/abstractions/libvirt-qemu file, and now migration and suspension work! But now I get these errors in the oned.log file: "internal error Failed to get security label"

Tue Jul 27 08:17:01 2010 [DiM][D]: Suspending VM 35
Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
Tue Jul 27 08:17:01 2010 [ReM][D]: HostPoolInfo method invoked
Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: 'touch /srv/cloud/one/var//35/images/checkpoint;virsh --connect qemu:///system save one-35 /srv/cloud/one/var//35/images/checkpoint'

Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.

Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.

Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: Failed to save domain one-35 to /srv/cloud/one/var//35/images/checkpoint

Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: operation failed: failed to create '/srv/cloud/one/var//35/images/checkpoint'

Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1

Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: SAVE FAILURE 35 -

Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: virsh --connect qemu:///system dominfo one-35

Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.

Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.

Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 error: internal error Failed to get security label

Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1

Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: POLL FAILURE 35 -

Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 35.
Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 36.
Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL SUCCESS 36  STATE=a USEDMEMORY=524288

Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: virsh --connect qemu:///system dominfo one-35

Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.

Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.

Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 error: internal error Failed to get security label

Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1

Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL FAILURE 35 -


--
Ross Nordeen
Computer Networking And Systems Administration
Michigan Technological University
http://www.linkedin.com/in/rjnordee

----- Original Message -----
From: "Ross Nordeen" <rjnordee at mtu.edu>
To: "Jaime Melis" <j.melis at fdi.ucm.es>
Cc: users at lists.opennebula.org
Sent: Monday, July 26, 2010 9:54:59 AM GMT -07:00 US/Canada Mountain
Subject: Re: [one-users] migration not working completely

Tino,

I am using ubuntu 10.04.

Jaime,

I will try that and let you know if it worked as soon as we can get our air conditioner fixed here.

--
Ross Nordeen
Computer Networking And Systems Administration
Michigan Technological University
http://www.linkedin.com/in/rjnordee

----- Original Message -----
From: "Jaime Melis" <j.melis at fdi.ucm.es>
To: "Tino Vazquez" <tinova at fdi.ucm.es>
Cc: "Ross Nordeen" <rjnordee at mtu.edu>, users at lists.opennebula.org
Sent: Monday, July 26, 2010 9:45:15 AM GMT -07:00 US/Canada Mountain
Subject: Re: [one-users] migration not working completly

Hi Ross, 


Actually, in my experience disabling apparmor won't work either. You will have to modify one of its configuration files in order to make it work. 

Add this to the end of /etc/apparmor.d/abstractions/libvirt-qemu: 
-------8<-------- 
/srv/cloud/one/var/** rw, 
------->8-------- 
(If you have a different VMDIR, change the above line accordingly.) 
Then restart the apparmor service. 
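
For reference, the steps above could be scripted roughly as follows. This is only a sketch: the helper name add_rule is made up for illustration, it appends the rule only if an identical line is not already present, and the restart command in the comment is an assumption for Ubuntu 10.04 that may differ on other systems.

```shell
# Hypothetical helper (not part of OpenNebula or AppArmor):
# append RULE to PROFILE unless an identical line already exists.
add_rule() {
    profile="$1"
    rule="$2"
    # -x: match the whole line, -F: treat the rule as a fixed string
    grep -qxF -- "$rule" "$profile" 2>/dev/null ||
        printf '%s\n' "$rule" >> "$profile"
}

# Example usage (run as root; paths are the ones from this thread):
#   add_rule /etc/apparmor.d/abstractions/libvirt-qemu '/srv/cloud/one/var/** rw,'
#   /etc/init.d/apparmor restart   # assumed restart command on Ubuntu 10.04
```

Running the helper twice leaves a single copy of the rule, so it is safe to re-run after changing other settings.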


Regards, 
Jaime 








On Mon, Jul 26, 2010 at 5:30 PM, Tino Vazquez < tinova at fdi.ucm.es > wrote: 


Hi Ross, 

Are you using Ubuntu by any chance? It may be an issue with the apparmor 
service; try disabling it to see if that is the one to blame. In case 
it is, we can provide rules to disable this apparmor behavior. 

Regards, 


-Tino 

-- 
Constantino Vázquez Blanco | dsa-research.org/tinova 
Virtualization Technology Engineer / Researcher 
OpenNebula Toolkit | opennebula.org 






On Mon, Jul 26, 2010 at 5:13 PM, Ross Nordeen < rjnordee at mtu.edu > wrote: 
> Tino, 
> 
> I figured out my live migrate problem, which turned out to be a bad default gw. As for the migration and checkpointing, though, I have the /srv/cloud/one directory shared out to all nodes via NFS with full permissions for oneadmin... I think it is /srv/cloud/one/var/18. I will check the VM_DIR variable in the oned.conf file, though, and see if it is right. Still, if everything else is working, it seems like the VM_DIR is exported correctly and functioning for the running VMs. 
> 
> -Ross 
> 
> ----- Original Message ----- 
> From: "Tino Vazquez" < tinova at fdi.ucm.es > 
> To: "Ross Nordeen" < rjnordee at mtu.edu > 
> Cc: users at lists.opennebula.org 
> Sent: Monday, July 26, 2010 8:41:37 AM GMT -07:00 US/Canada Mountain 
> Subject: Re: [one-users] migration not working completly 
> 
> Hi Ross, 
> 
> There seem to be two issues here: 
> 
> 1) Live/cold migration fails from cn2 to cn1 --> could it be that the 
> oneadmin user cannot ssh without a password from cn2 to cn1, but can 
> from cn1 to cn2? 
> 
> 2) The save problem seems to come from being unable to save the 
> checkpoint file. This may be because the /srv/cloud/one 
> directory doesn't exist on the remote nodes, in which case you will 
> need to use the VM_DIR variable in the oned.conf file. 
> 
> Hope it helps, 
> 
> -Tino 
> 
> -- 
> Constantino Vázquez Blanco | dsa-research.org/tinova 
> Virtualization Technology Engineer / Researcher 
> OpenNebula Toolkit | opennebula.org 
> 
> 
> 
> On Thu, Jul 22, 2010 at 11:39 PM, Ross Nordeen < rjnordee at mtu.edu > wrote: 
>> I have OpenNebula deployed with one head node and 2 compute nodes. I have no problems live migrating from cn1 to cn2, but I get failures live/cold migrating from cn2 to cn1. Is there any reason why a) I would not be able to save the state of any of my machines, and b) live migration works one way but not the other? Thanks 
>> 
>> -Ross 
>> 
>> 
>> Here is my vm.log file after a live migration, a migration, and then a suspend: 
>> 
>> 
>> Thu Jul 22 11:40:22 2010 [LCM][I]: New VM state is MIGRATE 
>> Thu Jul 22 11:40:22 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session 
>> Thu Jul 22 11:40:22 2010 [VMM][I]: STDERR follows. 
>> Thu Jul 22 11:40:22 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts. 
>> Thu Jul 22 11:40:22 2010 [VMM][I]: error: cannot recv data: Connection reset by peer 
>> Thu Jul 22 11:40:22 2010 [VMM][I]: ExitCode: 1 
>> Thu Jul 22 11:40:22 2010 [VMM][E]: Error live-migrating VM, - 
>> Thu Jul 22 11:40:23 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM). 
>> Thu Jul 22 11:40:23 2010 [VMM][D]: Monitor Information: 
>> . 
>> . 
>> . 
>> . 
>> . 
>> Thu Jul 22 15:09:04 2010 [LCM][I]: New VM state is MIGRATE 
>> Thu Jul 22 15:09:04 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session 
>> Thu Jul 22 15:09:04 2010 [VMM][I]: STDERR follows. 
>> Thu Jul 22 15:09:04 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts. 
>> Thu Jul 22 15:09:04 2010 [VMM][I]: error: cannot recv data: Connection reset by peer 
>> Thu Jul 22 15:09:04 2010 [VMM][I]: ExitCode: 1 
>> Thu Jul 22 15:09:04 2010 [VMM][E]: Error live-migrating VM, - 
>> Thu Jul 22 15:09:05 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM). 
>> Thu Jul 22 15:09:05 2010 [VMM][D]: Monitor Information: 
>> . 
>> . 
>> . 
>> . 
>> . 
>> Thu Jul 22 15:11:25 2010 [LCM][I]: New VM state is SAVE_MIGRATE 
>> Thu Jul 22 15:11:25 2010 [VMM][I]: Command execution fail: 'touch /srv/cloud/one/var//18/images/checkpoint;virsh --connect qemu:///system save one-18 /srv/cloud/one/var//18/images/checkpoint' 
>> Thu Jul 22 15:11:25 2010 [VMM][I]: STDERR follows. 
>> Thu Jul 22 15:11:25 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts. 
>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: Failed to save domain one-18 to /srv/cloud/one/var//18/images/checkpoint 
>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: operation failed: failed to create '/srv/cloud/one/var//18/images/checkpoint' 
>> Thu Jul 22 15:11:25 2010 [VMM][I]: ExitCode: 1 
>> Thu Jul 22 15:11:25 2010 [VMM][E]: Error saving VM state, - 
>> Thu Jul 22 15:11:25 2010 [LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM). 
>> Thu Jul 22 15:11:26 2010 [VMM][I]: VM running but new state from monitor is PAUSED. 
>> Thu Jul 22 15:11:26 2010 [LCM][I]: VM is suspended. 
>> Thu Jul 22 15:11:26 2010 [DiM][I]: New VM state is SUSPENDED 
>> Thu Jul 22 15:13:20 2010 [DiM][I]: New VM state is ACTIVE. 
>> Thu Jul 22 15:13:20 2010 [LCM][I]: Restoring VM 
>> Thu Jul 22 15:13:20 2010 [LCM][I]: New state is BOOT 
>> Thu Jul 22 15:13:21 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system restore /srv/cloud/one/var//18/images/checkpoint 
>> Thu Jul 22 15:13:21 2010 [VMM][I]: STDERR follows. 
>> Thu Jul 22 15:13:21 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts. 
>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: Failed to restore domain from /srv/cloud/one/var//18/images/checkpoint 
>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: operation failed: cannot read domain image 
>> Thu Jul 22 15:13:21 2010 [VMM][I]: ExitCode: 1 
>> Thu Jul 22 15:13:21 2010 [VMM][E]: Error restoring VM, - 
>> Thu Jul 22 15:13:21 2010 [DiM][I]: New VM state is FAILED 
>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Deleting /srv/cloud/one/var//18/images 
>> 
>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Executed "rm -rf /srv/cloud/one/var//18/images". 
>> 
>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: TRANSFER SUCCESS 18 - 
>> 
>> _______________________________________________ 
>> Users mailing list 
>> Users at lists.opennebula.org 
>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org 
>> 
> 