[one-users] migration not working completely

Ross Nordeen rjnordee at mtu.edu
Tue Jul 27 08:46:09 PDT 2010


Here is the output from:
~$ sudo /etc/init.d/apparmor status

libvirt-cd735fe4-b5d9-f550-7576-bbac95b44d86 (enforce)
/usr/sbin/tcpdump (enforce)
/usr/sbin/libvirtd (enforce)
/usr/lib/libvirt/virt-aa-helper (enforce)
/usr/lib/connman/scripts/dhclient-script (enforce)
/usr/lib/NetworkManager/nm-dhcp-client.action (enforce)
/sbin/dhclient3 (enforce)


For one-35 (a VM that has been suspended and resumed):
$ virsh --connect qemu:///system dominfo one-35
Id:             2
Name:           one-35
UUID:           3450f5d0-e0c7-a118-7259-0664c02df8fc
OS Type:        hvm
State:          paused
CPU(s):         1
CPU time:       1899.5s
Max memory:     524288 kB
Used memory:    524288 kB
Autostart:      disable
Security model: apparmor
Security DOI:   0
error: internal error Failed to get security label


Yes, for one of my running VMs I get:
$ virsh --connect qemu:///system dominfo one-37
Id:             5
Name:           one-37
UUID:           cd735fe4-b5d9-f550-7576-bbac95b44d86
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       0.7s
Max memory:     524288 kB
Used memory:    524288 kB
Autostart:      disable
Security model: apparmor
Security DOI:   0
Security label: libvirt-cd735fe4-b5d9-f550-7576-bbac95b44d86 (enforcing)
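
One way to cross-check the outputs above: the security label reported for one-37 is libvirt-<UUID>, and the status listing shows that profile (cd735fe4-...) but none for one-35 (3450f5d0-...). A minimal sketch of that check, assuming the status output format shown above; the function name has_libvirt_profile is made up for illustration and is not part of libvirt or OpenNebula:

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt or OpenNebula).
# Reads AppArmor status text on stdin and succeeds (exit 0) if a
# per-domain profile named libvirt-<UUID> is listed for the UUID in $1.
has_libvirt_profile() {
    # -F: match the string literally, -q: no output, only the exit status
    grep -q -F -e "libvirt-$1"
}
```

It could be used roughly like this:
sudo /etc/init.d/apparmor status | has_libvirt_profile 3450f5d0-e0c7-a118-7259-0664c02df8fc || echo "no profile loaded for one-35"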


-Ross 

----- Original Message -----
From: "Tino Vazquez" <tinova at fdi.ucm.es>
To: "Ross Nordeen" <rjnordee at mtu.edu>
Cc: "Jaime Melis" <j.melis at fdi.ucm.es>, users at lists.opennebula.org
Sent: Tuesday, July 27, 2010 8:49:20 AM GMT -07:00 US/Canada Mountain
Subject: Re: [one-users] migration not working completely

Dear Ross,

This looks like an issue with libvirt. What happens if you manually issue

$ virsh --connect qemu:///system dominfo one-35

on cn2?

Regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org



On Tue, Jul 27, 2010 at 4:29 PM, Ross Nordeen <rjnordee at mtu.edu> wrote:
>
>
> I added the lines to the end of the /etc/apparmor.d/abstractions/libvirt-qemu file and now the migration and suspension work! But now I get these errors in the oned.log file: "internal error Failed to get security label"
>
> Tue Jul 27 08:17:01 2010 [DiM][D]: Suspending VM 35
> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Tue Jul 27 08:17:01 2010 [ReM][D]: HostPoolInfo method invoked
> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: 'touch /srv/cloud/one/var//35/images/checkpoint;virsh --connect qemu:///system save one-35 /srv/cloud/one/var//35/images/checkpoint'
>
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: Failed to save domain one-35 to /srv/cloud/one/var//35/images/checkpoint
>
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: operation failed: failed to create '/srv/cloud/one/var//35/images/checkpoint'
>
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>
> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: SAVE FAILURE 35 -
>
> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: virsh --connect qemu:///system dominfo one-35
>
> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>
> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>
> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 error: internal error Failed to get security label
>
> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>
> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: POLL FAILURE 35 -
>
> Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 35.
> Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 36.
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL SUCCESS 36  STATE=a USEDMEMORY=524288
>
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: virsh --connect qemu:///system dominfo one-35
>
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 error: internal error Failed to get security label
>
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>
> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL FAILURE 35 -
>
>
> --
> Ross Nordeen
> Computer Networking And Systems Administration
> Michigan Technological University
> http://www.linkedin.com/in/rjnordee
>
> ----- Original Message -----
> From: "Ross Nordeen" <rjnordee at mtu.edu>
> To: "Jaime Melis" <j.melis at fdi.ucm.es>
> Cc: users at lists.opennebula.org
> Sent: Monday, July 26, 2010 9:54:59 AM GMT -07:00 US/Canada Mountain
> Subject: Re: [one-users] migration not working completely
>
> Tino,
>
> I am using Ubuntu 10.04.
>
> Jaime,
>
> I will try that and let you know if it worked as soon as we can get our air conditioner fixed here.
>
> --
> Ross Nordeen
> Computer Networking And Systems Administration
> Michigan Technological University
> http://www.linkedin.com/in/rjnordee
>
> ----- Original Message -----
> From: "Jaime Melis" <j.melis at fdi.ucm.es>
> To: "Tino Vazquez" <tinova at fdi.ucm.es>
> Cc: "Ross Nordeen" <rjnordee at mtu.edu>, users at lists.opennebula.org
> Sent: Monday, July 26, 2010 9:45:15 AM GMT -07:00 US/Canada Mountain
> Subject: Re: [one-users] migration not working completly
>
> Hi Ross,
>
>
> Actually, in my experience disabling apparmor won't work either. You will have to modify one of its configuration files in order to make it work.
>
> Add this to the end of /etc/apparmor.d/abstractions/libvirt-qemu:
> -------8<--------
> /srv/cloud/one/var/** rw,
> ------->8--------
> (If you have a different VMDIR, change the above line accordingly.)
> Then restart the apparmor service.
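
The edit described above can be scripted so that running it twice does not add the rule twice. This is only a sketch: the helper name append_rule_once is hypothetical, and it assumes the target file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: append a rule line to a file only if that exact
# line is not already present, so re-running it is harmless.
# $1 = file to edit, $2 = rule line (e.g. '/srv/cloud/one/var/** rw,')
append_rule_once() {
    # -x: whole-line match, -F: literal string (so '**' is not a regex)
    grep -q -x -F -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

As root, it would be called as
append_rule_once /etc/apparmor.d/abstractions/libvirt-qemu '/srv/cloud/one/var/** rw,'
followed by a restart of the apparmor service (on Ubuntu 10.04 this is presumably sudo /etc/init.d/apparmor restart).
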
>
>
> Regards,
> Jaime
>
> On Mon, Jul 26, 2010 at 5:30 PM, Tino Vazquez < tinova at fdi.ucm.es > wrote:
>
>
> Hi Ross,
>
> Are you using Ubuntu by any chance? It may be an issue with the apparmor
> service; try disabling it to see if that is the one to blame. In case
> it is, we can provide rules to disable this apparmor behavior.
>
> Regards,
>
>
> -Tino
>
> --
> Constantino Vázquez Blanco | dsa-research.org/tinova
> Virtualization Technology Engineer / Researcher
> OpenNebula Toolkit | opennebula.org
>
> On Mon, Jul 26, 2010 at 5:13 PM, Ross Nordeen < rjnordee at mtu.edu > wrote:
>> Tino,
>>
>> I figured out my live migrate problem, which turned out to be a bad default gateway. As for the migration and checkpointing, though, I have the /srv/cloud/one directory shared out to all nodes via NFS with full permissions for oneadmin... I think it is /srv/cloud/one/var/18. I will check the VM_DIR variable in the oned.conf file, though, and see if it is right. Still, if everything else is working, it seems like the VM_DIR is exported correctly and functioning for the running VMs.
>>
>> -Ross
>>
>> ----- Original Message -----
>> From: "Tino Vazquez" < tinova at fdi.ucm.es >
>> To: "Ross Nordeen" < rjnordee at mtu.edu >
>> Cc: users at lists.opennebula.org
>> Sent: Monday, July 26, 2010 8:41:37 AM GMT -07:00 US/Canada Mountain
>> Subject: Re: [one-users] migration not working completly
>>
>> Hi Ross,
>>
>> There seem to be two issues here:
>>
>> 1) No live migration from cn2 to cn1 --> could it be that the
>> oneadmin user cannot ssh without a password from cn2 to cn1, but it can
>> from cn1 to cn2?
>>
>> 2) The save problem seems to come from the inability to save the
>> checkpoint file. This may be due to the fact that the /srv/cloud/one
>> directory doesn't exist on the remote nodes, in which case you will
>> need to use the VM_DIR variable in the oned.conf file.
>>
>> Hope it helps,
>>
>> -Tino
>>
>> --
>> Constantino Vázquez Blanco | dsa-research.org/tinova
>> Virtualization Technology Engineer / Researcher
>> OpenNebula Toolkit | opennebula.org
>>
>>
>>
>> On Thu, Jul 22, 2010 at 11:39 PM, Ross Nordeen < rjnordee at mtu.edu > wrote:
>>> I have OpenNebula deployed with one head node and two compute nodes. I have no problems live migrating from cn1 to cn2, but I get failures live/cold migrating from cn2 to cn1. Is there any reason why a) I cannot save the state of any of my machines and b) live migration works one way but not the other? Thanks
>>>
>>> -Ross
>>>
>>>
>>> Here is my vm.log file after a live migration, a migration, and then a suspend:
>>>
>>>
>>> Thu Jul 22 11:40:22 2010 [LCM][I]: New VM state is MIGRATE
>>> Thu Jul 22 11:40:22 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session
>>> Thu Jul 22 11:40:22 2010 [VMM][I]: STDERR follows.
>>> Thu Jul 22 11:40:22 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>> Thu Jul 22 11:40:22 2010 [VMM][I]: error: cannot recv data: Connection reset by peer
>>> Thu Jul 22 11:40:22 2010 [VMM][I]: ExitCode: 1
>>> Thu Jul 22 11:40:22 2010 [VMM][E]: Error live-migrating VM, -
>>> Thu Jul 22 11:40:23 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM).
>>> Thu Jul 22 11:40:23 2010 [VMM][D]: Monitor Information:
>>> .
>>> .
>>> .
>>> .
>>> .
>>> Thu Jul 22 15:09:04 2010 [LCM][I]: New VM state is MIGRATE
>>> Thu Jul 22 15:09:04 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session
>>> Thu Jul 22 15:09:04 2010 [VMM][I]: STDERR follows.
>>> Thu Jul 22 15:09:04 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>> Thu Jul 22 15:09:04 2010 [VMM][I]: error: cannot recv data: Connection reset by peer
>>> Thu Jul 22 15:09:04 2010 [VMM][I]: ExitCode: 1
>>> Thu Jul 22 15:09:04 2010 [VMM][E]: Error live-migrating VM, -
>>> Thu Jul 22 15:09:05 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM).
>>> Thu Jul 22 15:09:05 2010 [VMM][D]: Monitor Information:
>>> .
>>> .
>>> .
>>> .
>>> .
>>> Thu Jul 22 15:11:25 2010 [LCM][I]: New VM state is SAVE_MIGRATE
>>> Thu Jul 22 15:11:25 2010 [VMM][I]: Command execution fail: 'touch /srv/cloud/one/var//18/images/checkpoint;virsh --connect qemu:///system save one-18 /srv/cloud/one/var//18/images/checkpoint'
>>> Thu Jul 22 15:11:25 2010 [VMM][I]: STDERR follows.
>>> Thu Jul 22 15:11:25 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: Failed to save domain one-18 to /srv/cloud/one/var//18/images/checkpoint
>>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: operation failed: failed to create '/srv/cloud/one/var//18/images/checkpoint'
>>> Thu Jul 22 15:11:25 2010 [VMM][I]: ExitCode: 1
>>> Thu Jul 22 15:11:25 2010 [VMM][E]: Error saving VM state, -
>>> Thu Jul 22 15:11:25 2010 [LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).
>>> Thu Jul 22 15:11:26 2010 [VMM][I]: VM running but new state from monitor is PAUSED.
>>> Thu Jul 22 15:11:26 2010 [LCM][I]: VM is suspended.
>>> Thu Jul 22 15:11:26 2010 [DiM][I]: New VM state is SUSPENDED
>>> Thu Jul 22 15:13:20 2010 [DiM][I]: New VM state is ACTIVE.
>>> Thu Jul 22 15:13:20 2010 [LCM][I]: Restoring VM
>>> Thu Jul 22 15:13:20 2010 [LCM][I]: New state is BOOT
>>> Thu Jul 22 15:13:21 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system restore /srv/cloud/one/var//18/images/checkpoint
>>> Thu Jul 22 15:13:21 2010 [VMM][I]: STDERR follows.
>>> Thu Jul 22 15:13:21 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: Failed to restore domain from /srv/cloud/one/var//18/images/checkpoint
>>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: operation failed: cannot read domain image
>>> Thu Jul 22 15:13:21 2010 [VMM][I]: ExitCode: 1
>>> Thu Jul 22 15:13:21 2010 [VMM][E]: Error restoring VM, -
>>> Thu Jul 22 15:13:21 2010 [DiM][I]: New VM state is FAILED
>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Deleting /srv/cloud/one/var//18/images
>>>
>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Executed "rm -rf /srv/cloud/one/var//18/images".
>>>
>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: TRANSFER SUCCESS 18 -
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opennebula.org
>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>
>>


