[one-users] Migration issues with OpenNebula 3.8.1
Javier Fontan
jfontan at opennebula.org
Fri Mar 8 06:12:47 PST 2013
It is very strange indeed: the whole directory is moved from one
machine to the other, so there should be no images left behind.
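Conceptually, the ssh TM driver's mv operation amounts to something
like the following (a rough sketch, not the actual
/var/lib/one/remotes/tm/ssh/mv script; host and datastore paths taken
from your log):

    # recreate the VM directory on the destination, copy it whole,
    # then remove it from the source
    ssh server2 "mkdir -p /var/lib/one/datastores/0"
    ssh server1 "tar -C /var/lib/one/datastores/0 -cf - 37" | \
        ssh server2 "tar -C /var/lib/one/datastores/0 -xf -"
    ssh server1 "rm -rf /var/lib/one/datastores/0/37"

So disk.0 should have traveled together with checkpoint and deployment.0.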
First we will try to fix the problem with the checkpoint being owned
by root. I think you have configured /etc/libvirt/qemu.conf with
"dynamic_ownership = 1". Change it to 0, as stated in the documentation [1].
Maybe the two problems are related and fixing one will solve the other.
[1] http://opennebula.org/documentation:rel3.8:kvmg#kvm_configuration
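For reference, the relevant part of /etc/libvirt/qemu.conf on both
hosts should end up as below (stock CentOS paths assumed), followed by
a libvirtd restart:

    # /etc/libvirt/qemu.conf -- on server1 and server2
    # keep libvirt from chown'ing disks and checkpoints to root while
    # the domain runs; they must stay owned by oneadmin for the ssh TM
    dynamic_ownership = 0

    # then, as root on each host
    service libvirtd restart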
On Mon, Mar 4, 2013 at 12:15 PM, Martin Herfurt
<martin.herfurt at toothr.com> wrote:
> Hi Javier,
> thank you for your response!
> Here is the VM-log of the failing migration:
>
> Mon Mar 4 12:05:08 2013 [VMM][I]: ExitCode: 0
> Mon Mar 4 12:05:08 2013 [VMM][D]: Monitor Information:
> CPU : 9
> Memory: 2097152
> Net_TX: 1550
> Net_RX: 6594
> Mon Mar 4 12:05:14 2013 [LCM][I]: New VM state is SAVE_MIGRATE
> Mon Mar 4 12:05:45 2013 [VMM][I]: ExitCode: 0
> Mon Mar 4 12:05:45 2013 [VMM][I]: Successfully execute virtualization
> driver operation: save.
> Mon Mar 4 12:05:45 2013 [VMM][I]: ExitCode: 0
> Mon Mar 4 12:05:45 2013 [VMM][I]: Successfully execute network driver
> operation: clean.
> Mon Mar 4 12:05:45 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
> Mon Mar 4 12:05:45 2013 [TM][I]: ExitCode: 0
> Mon Mar 4 12:05:51 2013 [TM][I]: mv: Moving
> server1:/var/lib/one/datastores/0/37 to server2:/var/lib/one/datastores/0/37
> Mon Mar 4 12:05:51 2013 [TM][I]: ExitCode: 0
> Mon Mar 4 12:05:51 2013 [LCM][I]: New VM state is BOOT
> Mon Mar 4 12:05:51 2013 [VMM][I]: ExitCode: 0
> Mon Mar 4 12:05:51 2013 [VMM][I]: Successfully execute network driver
> operation: pre.
> Mon Mar 4 12:05:51 2013 [VMM][I]: Command execution fail:
> /var/lib/one/vmm/kvm/restore /var/lib/one//datastores/0/37/checkpoint
> server2 37 server2
> Mon Mar 4 12:05:51 2013 [VMM][E]: restore: Command "virsh --connect
> qemu:///system restore /var/lib/one//datastores/0/37/checkpoint" failed:
> error: Failed to restore domain from
> /var/lib/one//datastores/0/37/checkpoint
> Mon Mar 4 12:05:51 2013 [VMM][I]: error: Unable to allow access for
> disk path /var/lib/one//datastores/0/37/disk.0: No such file or directory
> Mon Mar 4 12:05:51 2013 [VMM][E]: Could not restore from
> /var/lib/one//datastores/0/37/checkpoint
> Mon Mar 4 12:05:51 2013 [VMM][I]: ExitCode: 1
> Mon Mar 4 12:05:51 2013 [VMM][I]: Failed to execute virtualization
> driver operation: restore.
> Mon Mar 4 12:05:51 2013 [VMM][E]: Error restoring VM: Could not restore
> from /var/lib/one//datastores/0/37/checkpoint
> Mon Mar 4 12:05:52 2013 [DiM][I]: New VM state is FAILED
>
> Here are the configurations of the datastores on both systems:
> server1:
>
> [oneadmin at server1 ~]$ onedatastore show 0
> DATASTORE 0 INFORMATION
> ID : 0
> NAME : system
> USER : oneadmin
> GROUP : oneadmin
> CLUSTER : myCluster
> DS_MAD : -
> TM_MAD : ssh
> BASE PATH : /var/lib/one/datastores/0
>
> PERMISSIONS
> OWNER : um-
> GROUP : u--
> OTHER : ---
>
> DATASTORE TEMPLATE
> DS_MAD="-"
> SYSTEM="YES"
> TM_MAD="ssh"
>
> IMAGES
>
> [oneadmin at server1 ~]$ onedatastore show 1
> DATASTORE 1 INFORMATION
> ID : 1
> NAME : default
> USER : oneadmin
> GROUP : oneadmin
> CLUSTER : myCluster
> DS_MAD : fs
> TM_MAD : ssh
> BASE PATH : /var/lib/one/datastores/1
>
> PERMISSIONS
> OWNER : um-
> GROUP : u--
> OTHER : u--
>
> DATASTORE TEMPLATE
> DS_MAD="fs"
> TM_MAD="ssh"
>
> server2:
>
> [oneadmin at server2 ~]$ onedatastore show 0
> DATASTORE 0 INFORMATION
> ID : 0
> NAME : system
> USER : oneadmin
> GROUP : oneadmin
> CLUSTER : myCluster
> DS_MAD : -
> TM_MAD : ssh
> BASE PATH : /var/lib/one/datastores/0
>
> PERMISSIONS
> OWNER : um-
> GROUP : u--
> OTHER : ---
>
> DATASTORE TEMPLATE
> DS_MAD="-"
> SYSTEM="YES"
> TM_MAD="ssh"
>
> IMAGES
>
> [oneadmin at server2 ~]$ onedatastore show 1
> DATASTORE 1 INFORMATION
> ID : 1
> NAME : default
> USER : oneadmin
> GROUP : oneadmin
> CLUSTER : myCluster
> DS_MAD : fs
> TM_MAD : ssh
> BASE PATH : /var/lib/one/datastores/1
>
> PERMISSIONS
> OWNER : um-
> GROUP : u--
> OTHER : u--
>
> DATASTORE TEMPLATE
> DS_MAD="fs"
> TM_MAD="ssh"
>
> IMAGES
> 0
>
> When investigating the issue, I found no disk.0 file in the destination
> datastore. IMHO that is the cause of the restore failure, since the
> deployment.0 file has a reference to it.
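>
> The dangling reference should be easy to confirm: deployment.0 is just
> the libvirt domain XML, so
>
>     [oneadmin at server2 37]$ grep disk.0 deployment.0
>
> ought to print the <source file='...disk.0'/> element that points at
> the missing file.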
>
> [oneadmin at server2 37]$ ls -lh /var/lib/one/datastores/0/37/
> total 119M
> -rw-rw-r-- 1 root root 119M Mar 4 2013 checkpoint
> -rw-rw-r-- 1 oneadmin oneadmin 675 Mar 4 2013 deployment.0
>
> What confuses me is that the checkpoint file is owned by root. Could
> this be part of the issue?
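>
> (Resetting the ownership by hand, e.g.
>
>     chown oneadmin:oneadmin /var/lib/one/datastores/0/37/checkpoint
>
> presumably only treats the symptom, since whatever chowned the file to
> root would do so again on the next save.)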
>
> Thanks for your help!
> Martin
>
> On 3/4/2013 11:27, Javier Fontan wrote:
>> Can you send us the log file of the VM that had this problem? Also
>> tell us the configuration you have for the storage, that is, the
>> drivers you have configured for both the system datastore (0) and the
>> datastore where you have the images.
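>>
>> For example, on the frontend (assuming the default log locations):
>>
>>     less /var/log/one/<vmid>.log
>>     onedatastore show 0
>>     onedatastore show 1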
>>
>> On Thu, Feb 28, 2013 at 3:34 PM, Martin Herfurt
>> <martin.herfurt at toothr.com> wrote:
>>> Hello,
>>> as I am currently getting acquainted with OpenNebula, I need some
>>> help with the save/migrate operation.
>>>
>>> I have set up a cluster with two identical servers. I installed CentOS 6.3
>>> with the OpenNebula packages from the repository (version 3.8.1-2.6) on each
>>> of them. Using the Sunstone frontend on server1, I was able to add the two
>>> hosts (server1 and server2) to the cluster. Testing with the minimal
>>> KVM image, I managed to create a template and could deploy VMs on
>>> both hosts (the datastore uses the ssh transfer manager).
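>>>
>>> (For reference, I believe the CLI equivalent of what I did in
>>> Sunstone is roughly the following; the MAD names are the stock KVM
>>> ones:)
>>>
>>>     onehost create server1 --im im_kvm --vm vmm_kvm --net dummy
>>>     onehost create server2 --im im_kvm --vm vmm_kvm --net dummy
>>>     onecluster addhost myCluster server1
>>>     onecluster addhost myCluster server2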
>>>
>>> When trying to migrate a VM from server1 to server2, the VM status changes
>>> to SAVE_MIGRATE and, after a few seconds, to FAILED.
>>> After doing some research, I found out that during the migration process
>>> the disk image is not transferred to the target host's system datastore.
>>> All that arrives are a file called checkpoint and a file called
>>> deployment.0, so the restore operation fails because the disk.0 file is missing.
>>>
>>> Did anyone else experience these issues, or does anyone know what causes them?
>>>
>>> Thanks for your help,
>>> Martin
>>>
--
Javier Fontán Muiños
Project Engineer
OpenNebula - The Open Source Toolkit for Data Center Virtualization
www.OpenNebula.org | jfontan at opennebula.org | @OpenNebula