[one-users] Migration issues with OpenNebula 3.8.1
Martin Herfurt
martin.herfurt at toothr.com
Mon Mar 4 03:15:05 PST 2013
Hi Javier,
thank you for your response!
Here is the VM log of the failing migration:
Mon Mar 4 12:05:08 2013 [VMM][I]: ExitCode: 0
Mon Mar 4 12:05:08 2013 [VMM][D]: Monitor Information:
CPU : 9
Memory: 2097152
Net_TX: 1550
Net_RX: 6594
Mon Mar 4 12:05:14 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Mon Mar 4 12:05:45 2013 [VMM][I]: ExitCode: 0
Mon Mar 4 12:05:45 2013 [VMM][I]: Successfully execute virtualization
driver operation: save.
Mon Mar 4 12:05:45 2013 [VMM][I]: ExitCode: 0
Mon Mar 4 12:05:45 2013 [VMM][I]: Successfully execute network driver
operation: clean.
Mon Mar 4 12:05:45 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Mon Mar 4 12:05:45 2013 [TM][I]: ExitCode: 0
Mon Mar 4 12:05:51 2013 [TM][I]: mv: Moving
server1:/var/lib/one/datastores/0/37 to server2:/var/lib/one/datastores/0/37
Mon Mar 4 12:05:51 2013 [TM][I]: ExitCode: 0
Mon Mar 4 12:05:51 2013 [LCM][I]: New VM state is BOOT
Mon Mar 4 12:05:51 2013 [VMM][I]: ExitCode: 0
Mon Mar 4 12:05:51 2013 [VMM][I]: Successfully execute network driver
operation: pre.
Mon Mar 4 12:05:51 2013 [VMM][I]: Command execution fail:
/var/lib/one/vmm/kvm/restore /var/lib/one//datastores/0/37/checkpoint
server2 37 server2
Mon Mar 4 12:05:51 2013 [VMM][E]: restore: Command "virsh --connect
qemu:///system restore /var/lib/one//datastores/0/37/checkpoint" failed:
error: Failed to restore domain from
/var/lib/one//datastores/0/37/checkpoint
Mon Mar 4 12:05:51 2013 [VMM][I]: error: Unable to allow access for
disk path /var/lib/one//datastores/0/37/disk.0: No such file or directory
Mon Mar 4 12:05:51 2013 [VMM][E]: Could not restore from
/var/lib/one//datastores/0/37/checkpoint
Mon Mar 4 12:05:51 2013 [VMM][I]: ExitCode: 1
Mon Mar 4 12:05:51 2013 [VMM][I]: Failed to execute virtualization
driver operation: restore.
Mon Mar 4 12:05:51 2013 [VMM][E]: Error restoring VM: Could not restore
from /var/lib/one//datastores/0/37/checkpoint
Mon Mar 4 12:05:52 2013 [DiM][I]: New VM state is FAILED
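The TM mv step reports ExitCode 0, so to convince myself what that step is
supposed to produce I put together a tiny local simulation (temporary
directories stand in for the two system datastores; the file names are just
placeholders mirroring my VM 37):

```shell
# Local sketch of what the ssh TM "mv" step should achieve: the whole VM
# directory, disk image included, ends up at the destination.
SRC=$(mktemp -d)   # stands in for server1:/var/lib/one/datastores/0
DST=$(mktemp -d)   # stands in for server2:/var/lib/one/datastores/0
mkdir -p "$SRC/37"
touch "$SRC/37/disk.0" "$SRC/37/checkpoint" "$SRC/37/deployment.0"

# Move the whole VM directory, as the driver is expected to do.
mv "$SRC/37" "$DST/37"

ls "$DST/37"       # expect checkpoint, deployment.0 AND disk.0
```

So if the move had really carried the whole directory over, disk.0 should be
sitting next to the checkpoint on server2.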
Here are the configurations of the datastores on both systems:
server1:
[oneadmin at server1 ~]$ onedatastore show 0
DATASTORE 0 INFORMATION
ID : 0
NAME : system
USER : oneadmin
GROUP : oneadmin
CLUSTER : myCluster
DS_MAD : -
TM_MAD : ssh
BASE PATH : /var/lib/one/datastores/0
PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---
DATASTORE TEMPLATE
DS_MAD="-"
SYSTEM="YES"
TM_MAD="ssh"
IMAGES
[oneadmin at server1 ~]$ onedatastore show 1
DATASTORE 1 INFORMATION
ID : 1
NAME : default
USER : oneadmin
GROUP : oneadmin
CLUSTER : myCluster
DS_MAD : fs
TM_MAD : ssh
BASE PATH : /var/lib/one/datastores/1
PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : u--
DATASTORE TEMPLATE
DS_MAD="fs"
TM_MAD="ssh"
server2:
[oneadmin at server2 ~]$ onedatastore show 0
DATASTORE 0 INFORMATION
ID : 0
NAME : system
USER : oneadmin
GROUP : oneadmin
CLUSTER : myCluster
DS_MAD : -
TM_MAD : ssh
BASE PATH : /var/lib/one/datastores/0
PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---
DATASTORE TEMPLATE
DS_MAD="-"
SYSTEM="YES"
TM_MAD="ssh"
IMAGES
[oneadmin at server2 ~]$ onedatastore show 1
DATASTORE 1 INFORMATION
ID : 1
NAME : default
USER : oneadmin
GROUP : oneadmin
CLUSTER : myCluster
DS_MAD : fs
TM_MAD : ssh
BASE PATH : /var/lib/one/datastores/1
PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : u--
DATASTORE TEMPLATE
DS_MAD="fs"
TM_MAD="ssh"
IMAGES
0
When investigating the issue, I found no disk.0 file in the destination
datastore. IMHO, that is the cause of the restore failure, since the
deployment.0 file references it.
[oneadmin at server2 37]$ ls -lh /var/lib/one/datastores/0/37/
total 119M
-rw-rw-r-- 1 root root 119M Mar 4 2013 checkpoint
-rw-rw-r-- 1 oneadmin oneadmin 675 Mar 4 2013 deployment.0
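For reference, deployment.0 is the libvirt domain XML, and the disk <source>
path inside it is what virsh tries to open on restore. A quick way to list
those paths (the sample file below is a made-up excerpt modeled on a typical
deployment.0, not a copy of my actual one):

```shell
# Write a sample deployment.0 excerpt (illustrative, not my real file).
cat > /tmp/deployment.0.sample <<'EOF'
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/one//datastores/0/37/disk.0'/>
      <target dev='hda'/>
    </disk>
  </devices>
</domain>
EOF

# List the file paths the restore will need to find on the destination host.
grep -o "source file='[^']*'" /tmp/deployment.0.sample
```

Running the same grep against the real deployment.0 on server2 confirms it
points at the missing disk.0.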
What confuses me is that the checkpoint file is owned by root. Could
this be part of the issue?
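For what it's worth, root ownership like this is often libvirt's doing: with
dynamic_ownership enabled in /etc/libvirt/qemu.conf, libvirt chowns the files
it touches to the configured qemu user. A sketch of the relevant settings
(the values shown are illustrative, not read from my hosts):

```
# /etc/libvirt/qemu.conf (relevant settings; values here are illustrative)
# The user/group qemu processes run as; OpenNebula KVM setups commonly
# use oneadmin.
user = "oneadmin"
group = "oneadmin"
# When set to 1, libvirt changes ownership of image/checkpoint files itself,
# which can leave files root-owned after a save.
dynamic_ownership = 0
```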
Thanks for your help!
Martin
On 3/4/2013 11:27, Javier Fontan wrote:
> Can you send us the log file of the VM that had this problem? Also
> tell us the configuration you have for the storage, that is, the
> drivers you have configured for both the system datastore (0) and the
> datastore where you have the images.
>
> On Thu, Feb 28, 2013 at 3:34 PM, Martin Herfurt
> <martin.herfurt at toothr.com> wrote:
>> Hello,
>> as I am currently starting to get acquainted with OpenNebula, I need some
>> help with save migration.
>>
>> I have set up a cluster with two identical servers. I installed CentOS 6.3
>> with the OpenNebula packages from the repository (version 3.8.1-2.6) on each
>> of them. Using the Sunstone frontend on server1, I was able to add the two
>> hosts (server1 and server2) to the cluster. Testing with the
>> minimal KVM image, I managed to create a template and could deploy VMs on
>> both hosts (the datastore uses the ssh transfer manager).
>>
>> When trying to migrate a VM from server1 to server2, the VM status changes
>> to SAVE_MIGRATE and after a few seconds to FAILED.
>> After doing some research, I found out that during the migration process
>> the disk image is not transferred to the target host's system datastore. All
>> that is there is a file called checkpoint and a file called deployment.0
>> - so the restore operation fails because the disk.0 file is missing.
>>
>> Has anyone experienced these issues as well, or does anyone know what causes them?
>>
>> Thanks for your help,
>> Martin
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opennebula.org
>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>