[one-users] Stop-Resume failing with shared storage
Rangababu Chakravarthula
rbabu at hexagrid.com
Sun Mar 7 20:04:55 PST 2010
Thank you Tino. Sorry for the late reply. Here are the detailed logs.
Any help is appreciated.
NFS SHARED IMAGES DIRECTORY BETWEEN ALL HOSTS /mnt/sharedimagesdir
Contents of ONED.CONF
VM_DIR=/mnt/sharedimagesdir
IM_MAD = [
name = "im_kvm",
executable = "one_im_ssh",
arguments = "im_kvm/im_kvm.conf",
default = "im_kvm/im_kvm.conf" ]
VM_MAD = [
name = "vmm_kvm",
executable = "one_vmm_kvm",
default = "vmm_kvm/vmm_kvm.conf",
type = "kvm" ]
TM_MAD = [
name = "tm_nfs",
executable = "one_tm",
arguments = "tm_nfs/tm_nfs.conf",
default = "tm_nfs/tm_nfs.conf" ]
WE MODIFIED tm_clone.sh & tm_ln.sh to add SSH
SUBMITTED NEW VM
onevm show 433
VID : 433
UID : 0
STATE : ACTIVE
LCM STATE : RUNNING
DEPLOY ID : one-433
MEMORY : 262144
CPU : 0
LAST POLL : 1267828125
START TIME : 03/05 16:12:02
STOP TIME : 12/31 18:00:00
NET TX : 0
NET RX : 0
....: Template :....
DISK : CLONE=no,SOURCE=/mnt/sharedimagesdir/images/onetest0,TARGET=hda,TYPE=disk
GRAPHICS : LISTEN=0.0.0.0,PORT=6003,TYPE=vnc
INPUT : TYPE=tablet
MEMORY : 256
NAME : onetest
NIC : BRIDGE=br171,MAC=00:04:c9:5b:44:8a
OS : BOOT=hd
VCPU : 1
ON THE MANAGEMENT NODE
root at ManagementNode:/etc/one/tm_nfs# ls -al /var/lib/one/433/
total 24
drwxrwxrwx 2 oneadmin nogroup 4096 2010-03-05 16:12 .
drwxr-xr-x 437 oneadmin root 12288 2010-03-05 16:26 ..
-rw-r--r-- 1 oneadmin nogroup 549 2010-03-05 16:12 deployment.0
-rw-r--r-- 1 oneadmin nogroup 89 2010-03-05 16:12 transfer.0
/var/log/one/433.log
Fri Mar 5 16:12:11 2010 [DiM][I]: New VM state is ACTIVE.
Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is PROLOG.
Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Creating directory /mnt/sharedimagesdir/433/images
Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 mkdir -p /mnt/sharedimagesdir/433/images".
Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 chmod a+w /mnt/sharedimagesdir/433/images".
Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Link /mnt/sharedimagesdir/images/onetest0
Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 ln -s /mnt/sharedimagesdir/images/onetest0 /mnt/sharedimagesdir/433/images/disk.0".
Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is BOOT
Fri Mar 5 16:12:11 2010 [VMM][I]: Generating deployment file: /var/lib/one/433/deployment.0
Fri Mar 5 16:12:11 2010 [VMM][I]: Command: scp /var/lib/one/433/deployment.0 10.10.20.190:/mnt/sharedimagesdir/433/images/deployment.0
Fri Mar 5 16:12:11 2010 [VMM][I]: Copy success
Fri Mar 5 16:12:12 2010 [VMM][I]: Connecting to uri: qemu:///system
Fri Mar 5 16:12:12 2010 [VMM][I]: ExitCode: 0
Fri Mar 5 16:12:12 2010 [LCM][I]: New VM state is RUNNING
onevm list
433 onetest runn 0 262144 10.10.20.190 00 00:16:44
ON THE HOST
root at 00238bbda914:/mnt/sharedimagesdir# ls -ltr /mnt/sharedimagesdir/433/images/
total 2
lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
-rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
root at 00238bbda914:/mnt/sharedimagesdir#
/var/log/libvirt/qemu/433.log on HOST
LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/usr/bin/kvm -S -M pc-0.11 -m 256 -smp 1 -name one-433 -uuid
74c151d6-b1f5-3e41-fc45-e7fdc9247722 -monitor
unix:/var/run/libvirt/qemu/one-433.monitor,server,nowait -boot c -drive
file=/mnt/sharedimagesdir/433/images/disk.0,if=ide,index=0,boot=on -net
nic,macaddr=00:04:c9:5b:44:8a,vlan=0,name=nic.0 -net
tap,fd=20,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice
tablet -vnc 0.0.0.0:103 -vga cirrus
deployment.0 file on HOST
<domain type='kvm'>
<name>one-433</name>
<vcpu>1</vcpu>
<memory>262144</memory>
<os>
<type>hvm</type>
<boot dev='hd'/>
</os>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
<source file='/mnt/sharedimagesdir/433/images/disk.0'/>
<target dev='hda'/>
</disk>
<interface type='bridge'>
<source bridge='br171'/>
<mac address='00:04:c9:5b:44:8a'/>
</interface>
<graphics type='vnc' listen='0.0.0.0' port='6003'/>
<input type='tablet'/>
</devices>
<features>
<acpi/>
</features>
</domain>
SUSPEND INVOKED
onevm list
433 onetest susp 0 262144 10.10.20.190 00 00:25:08
433.log
Fri Mar 5 16:35:28 2010 [LCM][I]: New VM state is SAVE_SUSPEND
Fri Mar 5 16:35:29 2010 [VMM][I]: Connecting to uri: qemu:///system
Fri Mar 5 16:35:29 2010 [VMM][I]: ExitCode: 0
Fri Mar 5 16:35:29 2010 [DiM][I]: New VM state is SUSPENDED
oned.log
Fri Mar 5 16:35:28 2010 [ReM][D]: VirtualMachineAction invoked
Fri Mar 5 16:35:28 2010 [DiM][D]: Suspending VM 433
Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 Connecting to uri: qemu:///system
Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 ExitCode: 0
Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: SAVE SUCCESS 433
ON THE HOST
root at 00238bbda914:/mnt/sharedimagesdir/433/images# ls -ltr
total 3
lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
-rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
-rw-------+ 1 root root 940894 2010-03-05 22:31 checkpoint
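
Note that in the listing above the checkpoint is owned by root with mode -rw-------, while the other files belong to oneadmin. A quick way to see whether the checkpoint is present and readable for the current user on a given host might look like the sketch below. check_checkpoint is a made-up helper name for illustration, not part of OpenNebula, and it only tests what the invoking user can see.

```shell
#!/bin/sh
# check_checkpoint DIR
# Hypothetical helper: prints "missing", "unreadable" or "ok" depending on
# whether DIR/checkpoint exists and can be read by the user running it.
check_checkpoint() {
    file="$1/checkpoint"
    if [ ! -e "$file" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "unreadable"
        return 2
    fi
    echo "ok"
}
```

Run as oneadmin on each host, for example: check_checkpoint /mnt/sharedimagesdir/433/images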
Tino Vazquez wrote:
> Hi Ranga,
>
> If you are using a shared repository (I'll assume you use NFS or a
> similar distributed FS), then the "<vmid>/images/" directory is shared
> between all the remote hosts, so there is no need to move the checkpoint
> files; they should be available on all the nodes.
>
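
Whether the checkpoint really is visible on every node can be checked from the management node. The sketch below only prints one ssh command per host so the commands can be reviewed before running them. checkpoint_cmds is a hypothetical helper name, and the path layout is assumed to be the /mnt/sharedimagesdir/<vmid>/images one shown in the listings in this thread.

```shell
#!/bin/sh
# checkpoint_cmds VMID HOST...
# Hypothetical helper: prints, for each host, the ssh command that lists the
# checkpoint file of the given VM. Nothing is executed here.
checkpoint_cmds() {
    vmid=$1
    shift
    for host in "$@"; do
        echo "ssh $host ls -l /mnt/sharedimagesdir/$vmid/images/checkpoint"
    done
}
```

For example, checkpoint_cmds 433 10.10.20.190 prints the command for the host used above; piping the output to sh runs it.
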
> Please send us the log of the VM that is failing so we can try and
> reproduce the problem.
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Thu, Feb 18, 2010 at 2:44 AM, Rangababu Chakravarthula
> <rbabu at hexagrid.com> wrote:
>
>> We are using shared storage as defined here
>>
>> http://www.opennebula.org/doku.php?id=documentation:rel1.2:sm#samplea_shared_image_repository
>>
>> When we run onevm stop or onevm suspend, it goes through SAVE_STOP or
>> SAVE_SUSPEND and creates a checkpoint file on the host in
>> /var/lib/one/<vmid>/images/
>>
>> and in the logs we see
>> tm_mv.sh: Will not move, is not saving image
>>
>> I think it is trying to move the checkpoint file back to the management node,
>> and based on the logic in tm_mv.sh it does not move it.
>>
>> Later, when we run onevm resume, ONE picks a different host and tries
>> to move the checkpoint file from the management node to the new host. It
>> again says "Will not move, is not saving image", and on the host it fails to
>> bring up the VM since there is no checkpoint file on the new host.
>>
>> How can we ask ONE not to resume from the checkpoint file but instead boot from
>> the disk file that is in the template?
>>
>> Ranga