[one-users] migration not working completely

Tino Vazquez tinova at fdi.ucm.es
Tue Jul 27 10:06:45 PDT 2010


Hi Ross,

This issue may come from the sVirt model. Let's try disabling it for now.

--
To disable sVirt, and revert to the basic level of AppArmor protection
(host protection only), the /etc/libvirt/qemu.conf file can be used to
change the setting to security_driver="none".
--
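The quoted change is a one-line edit. A minimal sketch of scripting it, shown here against a scratch copy so it can be tried safely (apply the same edit to the real /etc/libvirt/qemu.conf as root, then restart libvirt, e.g. /etc/init.d/libvirt-bin restart on Ubuntu 10.04):

```shell
# Demonstrate the edit on a scratch copy of qemu.conf; point "conf" at
# /etc/libvirt/qemu.conf (as root) to apply it for real.
conf=$(mktemp)
printf '%s\n' '# security_driver = "selinux"' 'vnc_listen = "127.0.0.1"' > "$conf"
# Uncomment/override any existing security_driver line, or append one.
if grep -q '^#\? *security_driver' "$conf"; then
    sed -i 's/^#\? *security_driver.*/security_driver = "none"/' "$conf"
else
    echo 'security_driver = "none"' >> "$conf"
fi
grep '^security_driver' "$conf"
```

Note this only turns off the per-VM sVirt labelling done by the qemu driver; the AppArmor confinement of libvirtd itself is unaffected.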

Regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org



On Tue, Jul 27, 2010 at 5:46 PM, Ross Nordeen <rjnordee at mtu.edu> wrote:
> here is the output from:
> ~$ sudo /etc/init.d/apparmor status
>
> libvirt-cd735fe4-b5d9-f550-7576-bbac95b44d86 (enforce)
> /usr/sbin/tcpdump (enforce)
> /usr/sbin/libvirtd (enforce)
> /usr/lib/libvirt/virt-aa-helper (enforce)
> /usr/lib/connman/scripts/dhclient-script (enforce)
> /usr/lib/NetworkManager/nm-dhcp-client.action (enforce)
> /sbin/dhclient3 (enforce)
>
>
> for one-35 (a vm that has been suspended and resumed):
> $ virsh --connect qemu:///system dominfo one-35
> Id:             2
> Name:           one-35
> UUID:           3450f5d0-e0c7-a118-7259-0664c02df8fc
> OS Type:        hvm
> State:          paused
> CPU(s):         1
> CPU time:       1899.5s
> Max memory:     524288 kB
> Used memory:    524288 kB
> Autostart:      disable
> Security model: apparmor
> Security DOI:   0
> error: internal error Failed to get security label
>
>
> yes, for one of my running VMs I get:
> $ virsh --connect qemu:///system dominfo one-37
> Id:             5
> Name:           one-37
> UUID:           cd735fe4-b5d9-f550-7576-bbac95b44d86
> OS Type:        hvm
> State:          running
> CPU(s):         1
> CPU time:       0.7s
> Max memory:     524288 kB
> Used memory:    524288 kB
> Autostart:      disable
> Security model: apparmor
> Security DOI:   0
> Security label: libvirt-cd735fe4-b5d9-f550-7576-bbac95b44d86 (enforcing)
>
>
> -Ross
>
> ----- Original Message -----
> From: "Tino Vazquez" <tinova at fdi.ucm.es>
> To: "Ross Nordeen" <rjnordee at mtu.edu>
> Cc: "Jaime Melis" <j.melis at fdi.ucm.es>, users at lists.opennebula.org
> Sent: Tuesday, July 27, 2010 8:49:20 AM GMT -07:00 US/Canada Mountain
> Subject: Re: [one-users] migration not working completely
>
> Dear Ross,
>
> This looks like an issue with libvirt. What happens if you manually run
>
> $ virsh --connect qemu:///system dominfo one-35
>
> on cn2?
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez Blanco | dsa-research.org/tinova
> Virtualization Technology Engineer / Researcher
> OpenNebula Toolkit | opennebula.org
>
>
>
> On Tue, Jul 27, 2010 at 4:29 PM, Ross Nordeen <rjnordee at mtu.edu> wrote:
>>
>>
>> I added the lines to the end of the /etc/apparmor.d/abstractions/libvirt-qemu file, and now migration and suspension work! But now I get these errors in the oned.log file: "internal error Failed to get security label"
>>
>> Tue Jul 27 08:17:01 2010 [DiM][D]: Suspending VM 35
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: HostPoolInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [ReM][D]: VirtualMachineInfo method invoked
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: 'touch /srv/cloud/one/var//35/images/checkpoint;virsh --connect qemu:///system save one-35 /srv/cloud/one/var//35/images/checkpoint'
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: Failed to save domain one-35 to /srv/cloud/one/var//35/images/checkpoint
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 error: operation failed: failed to create '/srv/cloud/one/var//35/images/checkpoint'
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>>
>> Tue Jul 27 08:17:01 2010 [VMM][D]: Message received: SAVE FAILURE 35 -
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: virsh --connect qemu:///system dominfo one-35
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 error: internal error Failed to get security label
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>>
>> Tue Jul 27 08:17:02 2010 [VMM][D]: Message received: POLL FAILURE 35 -
>>
>> Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 35.
>> Tue Jul 27 08:17:04 2010 [VMM][I]: Monitoring VM 36.
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL SUCCESS 36  STATE=a USEDMEMORY=524288
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Command execution fail: virsh --connect qemu:///system dominfo one-35
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 STDERR follows.
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 error: internal error Failed to get security label
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: LOG - 35 ExitCode: 1
>>
>> Tue Jul 27 08:17:04 2010 [VMM][D]: Message received: POLL FAILURE 35 -
>>
>>
>> --
>> Ross Nordeen
>> Computer Networking And Systems Administration
>> Michigan Technological University
>> http://www.linkedin.com/in/rjnordee
>>
>> ----- Original Message -----
>> From: "Ross Nordeen" <rjnordee at mtu.edu>
>> To: "Jaime Melis" <j.melis at fdi.ucm.es>
>> Cc: users at lists.opennebula.org
>> Sent: Monday, July 26, 2010 9:54:59 AM GMT -07:00 US/Canada Mountain
>> Subject: Re: [one-users] migration not working completely
>>
>> Tino,
>>
>> I am using Ubuntu 10.04.
>>
>> Jaime,
>>
>> I will try that and let you know if it worked as soon as we can get our air conditioner fixed here.
>>
>> --
>> Ross Nordeen
>> Computer Networking And Systems Administration
>> Michigan Technological University
>> http://www.linkedin.com/in/rjnordee
>>
>> ----- Original Message -----
>> From: "Jaime Melis" <j.melis at fdi.ucm.es>
>> To: "Tino Vazquez" <tinova at fdi.ucm.es>
>> Cc: "Ross Nordeen" <rjnordee at mtu.edu>, users at lists.opennebula.org
>> Sent: Monday, July 26, 2010 9:45:15 AM GMT -07:00 US/Canada Mountain
>> Subject: Re: [one-users] migration not working completely
>>
>> Hi Ross,
>>
>>
>> Actually, in my experience disabling AppArmor won't work either; you will have to modify one of its configuration files to make it work.
>>
>> Add this to the end of /etc/apparmor.d/abstractions/libvirt-qemu:
>> -------8<--------
>> /srv/cloud/one/var/** rw,
>> ------->8--------
>> (If you have a different VM_DIR, change the above line accordingly.)
>> Then restart the apparmor service.
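Scripted, the change above might look like this (a sketch assuming root and Ubuntu's stock init-script names; the /srv/cloud/one path is this thread's VM_DIR, so adjust it to yours):

```shell
# Append the rule to libvirt's qemu abstraction (run as root on each host;
# the trailing comma and the "rw" permissions are AppArmor profile syntax).
echo '  /srv/cloud/one/var/** rw,' >> /etc/apparmor.d/abstractions/libvirt-qemu
# Restart AppArmor so the abstraction is re-read...
/etc/init.d/apparmor restart
# ...and restart libvirt so its per-VM profiles are regenerated.
/etc/init.d/libvirt-bin restart
```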
>>
>>
>> Regards,
>> Jaime
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 26, 2010 at 5:30 PM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
>>
>>
>> Hi Ross,
>>
>> Are you using Ubuntu by chance? It may be an issue with the AppArmor
>> service; try disabling it to see if that is the one to blame. In case
>> it is, we can provide rules to disable this AppArmor behavior.
>>
>> Regards,
>>
>>
>> -Tino
>>
>> --
>> Constantino Vázquez Blanco | dsa-research.org/tinova
>> Virtualization Technology Engineer / Researcher
>> OpenNebula Toolkit | opennebula.org
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 26, 2010 at 5:13 PM, Ross Nordeen <rjnordee at mtu.edu> wrote:
>>> Tino,
>>>
>>> I figured out my live-migration problem, which turned out to be a bad default gateway. As for the migration and checkpointing, I have the /srv/cloud/one directory shared out to all nodes via NFS, with full permissions for oneadmin... I think it is /srv/cloud/one/var/18. I will check the VM_DIR variable in the oned.conf file, though, and see if it is right. Still, since everything else is working, it seems like the VM_DIR is exported correctly and functioning for the running VMs.
>>>
>>> -Ross
>>>
>>> ----- Original Message -----
>>> From: "Tino Vazquez" <tinova at fdi.ucm.es>
>>> To: "Ross Nordeen" <rjnordee at mtu.edu>
>>> Cc: users at lists.opennebula.org
>>> Sent: Monday, July 26, 2010 8:41:37 AM GMT -07:00 US/Canada Mountain
>>> Subject: Re: [one-users] migration not working completely
>>>
>>> Hi Ross,
>>>
>>> There seem to be two issues here:
>>>
>>> 1) Live migration failing from cn2 to cn1 --> could it be that the
>>> oneadmin user cannot ssh passwordlessly from cn2 to cn1, but can
>>> from cn1 to cn2?
>>>
>>> 2) The save problem seems to come from the inability to write the
>>> checkpoint file. This may be because the /srv/cloud/one
>>> directory doesn't exist on the remote nodes, in which case you will
>>> need to use the VM_DIR variable in the oned.conf file.
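Both hypotheses are quick to check from the shell. A diagnostic sketch (the cn1/cn2 hostnames and the /srv/cloud/one path come from this thread; BatchMode makes ssh fail instead of prompting when key authentication is not set up):

```shell
# Helper: does passwordless ssh to a host work?
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true \
        && echo "passwordless ssh to $1: ok" \
        || echo "passwordless ssh to $1: FAILED"
}
# Helper: does a directory exist and is it writable?
check_dir() {
    [ -d "$1" ] && [ -w "$1" ] \
        && echo "$1: writable" \
        || echo "$1: missing or not writable"
}
check_dir /srv/cloud/one/var    # run on each node as oneadmin
# check_ssh cn1                 # run from cn2 (and the reverse from cn1)
```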
>>>
>>> Hope it helps,
>>>
>>> -Tino
>>>
>>> --
>>> Constantino Vázquez Blanco | dsa-research.org/tinova
>>> Virtualization Technology Engineer / Researcher
>>> OpenNebula Toolkit | opennebula.org
>>>
>>>
>>>
>>> On Thu, Jul 22, 2010 at 11:39 PM, Ross Nordeen <rjnordee at mtu.edu> wrote:
>>>> I have OpenNebula deployed with one head node and 2 compute nodes. I have no problems live-migrating from cn1 to cn2, but I get failures live/cold migrating from cn2 to cn1. Is there any reason why a) I am not able to save the state of any of my machines, and b) live migration works one way but not the other? Thanks
>>>>
>>>> -Ross
>>>>
>>>>
>>>> Here is my vm.log file after a live migration, a migration, and then a suspend:
>>>>
>>>>
>>>> Thu Jul 22 11:40:22 2010 [LCM][I]: New VM state is MIGRATE
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: error: cannot recv data: Connection reset by peer
>>>> Thu Jul 22 11:40:22 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 11:40:22 2010 [VMM][E]: Error live-migrating VM, -
>>>> Thu Jul 22 11:40:23 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM).
>>>> Thu Jul 22 11:40:23 2010 [VMM][D]: Monitor Information:
>>>> .
>>>> .
>>>> .
>>>> .
>>>> .
>>>> Thu Jul 22 15:09:04 2010 [LCM][I]: New VM state is MIGRATE
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system migrate --live one-18 qemu+ssh://cn1/session
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: error: cannot recv data: Connection reset by peer
>>>> Thu Jul 22 15:09:04 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 15:09:04 2010 [VMM][E]: Error live-migrating VM, -
>>>> Thu Jul 22 15:09:05 2010 [LCM][I]: Fail to life migrate VM. Assuming that the VM is still RUNNING (will poll VM).
>>>> Thu Jul 22 15:09:05 2010 [VMM][D]: Monitor Information:
>>>> .
>>>> .
>>>> .
>>>> .
>>>> .
>>>> Thu Jul 22 15:11:25 2010 [LCM][I]: New VM state is SAVE_MIGRATE
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: Command execution fail: 'touch /srv/cloud/one/var//18/images/checkpoint;virsh --connect qemu:///system save one-18 /srv/cloud/one/var//18/images/checkpoint'
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: Failed to save domain one-18 to /srv/cloud/one/var//18/images/checkpoint
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: error: operation failed: failed to create '/srv/cloud/one/var//18/images/checkpoint'
>>>> Thu Jul 22 15:11:25 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 15:11:25 2010 [VMM][E]: Error saving VM state, -
>>>> Thu Jul 22 15:11:25 2010 [LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).
>>>> Thu Jul 22 15:11:26 2010 [VMM][I]: VM running but new state from monitor is PAUSED.
>>>> Thu Jul 22 15:11:26 2010 [LCM][I]: VM is suspended.
>>>> Thu Jul 22 15:11:26 2010 [DiM][I]: New VM state is SUSPENDED
>>>> Thu Jul 22 15:13:20 2010 [DiM][I]: New VM state is ACTIVE.
>>>> Thu Jul 22 15:13:20 2010 [LCM][I]: Restoring VM
>>>> Thu Jul 22 15:13:20 2010 [LCM][I]: New state is BOOT
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: Command execution fail: virsh --connect qemu:///system restore /srv/cloud/one/var//18/images/checkpoint
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: STDERR follows.
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: Warning: Permanently added 'cn2,192.168.1.105' (RSA) to the list of known hosts.
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: Failed to restore domain from /srv/cloud/one/var//18/images/checkpoint
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: error: operation failed: cannot read domain image
>>>> Thu Jul 22 15:13:21 2010 [VMM][I]: ExitCode: 1
>>>> Thu Jul 22 15:13:21 2010 [VMM][E]: Error restoring VM, -
>>>> Thu Jul 22 15:13:21 2010 [DiM][I]: New VM state is FAILED
>>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Deleting /srv/cloud/one/var//18/images
>>>>
>>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: LOG - 18 tm_delete.sh: Executed "rm -rf /srv/cloud/one/var//18/images".
>>>>
>>>> Thu Jul 22 15:13:21 2010 [TM][W]: Ignored: TRANSFER SUCCESS 18 -
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opennebula.org
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>
>>>
>


