[one-users] Stop-Resume failing with shared storage
Tino Vazquez
tinova at fdi.ucm.es
Wed Apr 7 04:25:54 PDT 2010
Hi,
The checkpoint file is needed to resume the VM from the point where it
has been stopped.
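
A minimal sketch of that dependency, assuming the layout visible in the logs quoted below (the checkpoint ends up in VM_DIR/<vmid>/images/checkpoint). The function names are made up for illustration and are not part of OpenNebula; the idea is only that a host can restore the VM if it can see a non-empty checkpoint file at that path:

```python
import os
import tempfile


def checkpoint_path(vm_dir, vmid):
    """Path where the logs in this thread show the checkpoint being saved."""
    return os.path.join(vm_dir, str(vmid), "images", "checkpoint")


def can_resume(vm_dir, vmid):
    """True if a non-empty checkpoint file is visible from this host."""
    path = checkpoint_path(vm_dir, vmid)
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    # Simulate a shared VM_DIR with a temporary directory.
    with tempfile.TemporaryDirectory() as vm_dir:
        print(can_resume(vm_dir, 433))  # no checkpoint written yet
        images = os.path.join(vm_dir, "433", "images")
        os.makedirs(images)
        with open(os.path.join(images, "checkpoint"), "wb") as fh:
            fh.write(b"saved state")
        print(can_resume(vm_dir, 433))  # checkpoint is now visible
```

Running a check like this on the host chosen for the resume would show whether the shared directory really is visible there.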
Regards,
-Tino
--
Constantino Vázquez, Grid & Virtualization Technology
Engineer/Researcher: http://www.dsa-research.org/tinova
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
On Fri, Apr 2, 2010 at 7:52 PM, Rangababu Chakravarthula
<rbabu at hexagrid.com> wrote:
> Tino
> I think the original problem was that we were sharing only the NFS path where
> disk images are stored, but not the <vmid>/images directory. After we changed
> that and defined the path in VM_DIR in oned.conf, suspend and resume work
> well.
>
> Since <vmid>/images is accessible by all hosts, even when the VM is resumed
> on a different machine, that host is able to access the <vmid>/images
> directory.
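
To illustrate the layout described above, here is a rough sketch that reads VM_DIR from oned.conf-style text and derives the per-VM images directory that every host has to see at the same path. The parsing is deliberately simplistic (it assumes a plain KEY=VALUE line like the one in the oned.conf excerpt quoted further down), and the helper names are hypothetical:

```python
import re


def read_vm_dir(conf_text):
    """Return the VM_DIR value from oned.conf-style text, or None if unset.

    Assumption: VM_DIR appears as a simple KEY=VALUE line; anything after
    a '#' is treated as a comment.
    """
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.match(r"VM_DIR\s*=\s*(\S+)$", line)
        if match:
            return match.group(1).strip('"')
    return None


def vm_images_dir(vm_dir, vmid):
    """Directory that must be on the shared filesystem for a given VM."""
    return "{}/{}/images".format(vm_dir.rstrip("/"), vmid)


if __name__ == "__main__":
    conf = 'VM_DIR=/mnt/sharedimagesdir\nTM_MAD = [\n name = "tm_nfs" ]\n'
    print(vm_images_dir(read_vm_dir(conf), 433))
```

With the values from this thread, the derived directory is the same one that tm_ln.sh creates in the VM log.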
>
> However, the checkpoint file is still being created. Is there a way to have
> OpenNebula not create the checkpoint file?
>
> Ranga
>
> On Mon, Mar 15, 2010 at 4:35 AM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
>>
>> Hi there,
>>
>> Sorry, but I'm failing to see the
>>
>> tm_mv.sh: Will not move, is not saving image
>>
>> message anywhere in your logs.
>>
>> Regards,
>>
>> -Tino
>>
>> On Mon, Mar 8, 2010 at 5:04 AM, Rangababu Chakravarthula
>> <rbabu at hexagrid.com> wrote:
>> > Thank you Tino. Sorry for the late reply. Here are the detailed logs.
>> > Any help is appreciated.
>> >
>> > NFS SHARED IMAGES DIRECTORY BETWEEN ALL HOSTS /mnt/sharedimagesdir
>> >
>> > Contents of ONED.CONF
>> >
>> > VM_DIR=/mnt/sharedimagesdir
>> > IM_MAD = [
>> > name = "im_kvm",
>> > executable = "one_im_ssh",
>> > arguments = "im_kvm/im_kvm.conf",
>> > default = "im_kvm/im_kvm.conf" ]
>> > VM_MAD = [
>> > name = "vmm_kvm",
>> > executable = "one_vmm_kvm",
>> > default = "vmm_kvm/vmm_kvm.conf",
>> > type = "kvm" ]
>> > TM_MAD = [
>> > name = "tm_nfs",
>> > executable = "one_tm",
>> > arguments = "tm_nfs/tm_nfs.conf",
>> > default = "tm_nfs/tm_nfs.conf" ]
>> >
>> > WE MODIFIED tm_clone.sh & tm_ln.sh to add SSH
>> >
>> >
>> > SUBMITTED NEW VM
>> >
>> > onevm show 433
>> >
>> > VID : 433 UID : 0
>> > STATE : ACTIVE LCM STATE : RUNNING
>> > DEPLOY ID : one-433 MEMORY : 262144
>> > CPU : 0 LAST POLL : 1267828125
>> > START TIME : 03/05 16:12:02 STOP TIME : 12/31 18:00:00
>> > NET TX : 0 NET RX : 0
>> > ....: Template :....
>> > DISK : CLONE=no,SOURCE=/mnt/sharedimagesdir/images/onetest0,TARGET=hda,TYPE=disk
>> > GRAPHICS : LISTEN=0.0.0.0,PORT=6003,TYPE=vnc
>> > INPUT : TYPE=tablet MEMORY : 256
>> > NAME : onetest NIC : BRIDGE=br171,MAC=00:04:c9:5b:44:8a
>> > OS : BOOT=hd VCPU : 1
>> >
>> >
>> > ON THE MANAGEMENT NODE
>> >
>> > root at ManagementNode:/etc/one/tm_nfs# ls -al /var/lib/one/433/
>> > total 24
>> > drwxrwxrwx 2 oneadmin nogroup 4096 2010-03-05 16:12 .
>> > drwxr-xr-x 437 oneadmin root 12288 2010-03-05 16:26 ..
>> > -rw-r--r-- 1 oneadmin nogroup 549 2010-03-05 16:12 deployment.0
>> > -rw-r--r-- 1 oneadmin nogroup 89 2010-03-05 16:12 transfer.0
>> >
>> > /var/log/one/433.log
>> >
>> > Fri Mar 5 16:12:11 2010 [DiM][I]: New VM state is ACTIVE.
>> > Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is PROLOG.
>> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Creating directory /mnt/sharedimagesdir/433/images
>> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 mkdir -p /mnt/sharedimagesdir/433/images".
>> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 chmod a+w /mnt/sharedimagesdir/433/images".
>> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Link /mnt/sharedimagesdir/images/onetest0
>> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 ln -s /mnt/sharedimagesdir/images/onetest0 /mnt/sharedimagesdir/433/images/disk.0".
>> > Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is BOOT
>> > Fri Mar 5 16:12:11 2010 [VMM][I]: Generating deployment file: /var/lib/one/433/deployment.0
>> > Fri Mar 5 16:12:11 2010 [VMM][I]: Command: scp /var/lib/one/433/deployment.0 10.10.20.190:/mnt/sharedimagesdir/433/images/deployment.0
>> > Fri Mar 5 16:12:11 2010 [VMM][I]: Copy success
>> > Fri Mar 5 16:12:12 2010 [VMM][I]: Connecting to uri: qemu:///system
>> > Fri Mar 5 16:12:12 2010 [VMM][I]: ExitCode: 0
>> > Fri Mar 5 16:12:12 2010 [LCM][I]: New VM state is RUNNING
>> >
>> >
>> > onevm list
>> >
>> > 433 onetest runn 0 262144 10.10.20.190 00 00:16:44
>> >
>> >
>> >
>> >
>> >
>> > ON THE HOST
>> >
>> > root at 00238bbda914:/mnt/sharedimagesdir# ls -ltr /mnt/sharedimagesdir/433/images/
>> > total 2
>> > lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
>> > -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
>> > root at 00238bbda914:/mnt/sharedimagesdir#
>> >
>> >
>> > /var/log/libvirt/qemu/433.log on HOST
>> >
>> > LC_ALL=C
>> > PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
>> > /usr/bin/kvm -S -M pc-0.11 -m 256 -smp 1 -name one-433 -uuid
>> > 74c151d6-b1f5-3e41-fc45-e7fdc9247722 -monitor
>> > unix:/var/run/libvirt/qemu/one-433.monitor,server,nowait -boot c -drive
>> > file=/mnt/sharedimagesdir/433/images/disk.0,if=ide,index=0,boot=on -net
>> > nic,macaddr=00:04:c9:5b:44:8a,vlan=0,name=nic.0 -net
>> > tap,fd=20,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice
>> > tablet -vnc 0.0.0.0:103 -vga cirrus
>> >
>> > deployment.0 file on HOST
>> >
>> > <domain type='kvm'>
>> >   <name>one-433</name>
>> >   <vcpu>1</vcpu>
>> >   <memory>262144</memory>
>> >   <os>
>> >     <type>hvm</type>
>> >     <boot dev='hd'/>
>> >   </os>
>> >   <devices>
>> >     <emulator>/usr/bin/kvm</emulator>
>> >     <disk type='file' device='disk'>
>> >       <source file='/mnt/sharedimagesdir/433/images/disk.0'/>
>> >       <target dev='hda'/>
>> >     </disk>
>> >     <interface type='bridge'>
>> >       <source bridge='br171'/>
>> >       <mac address='00:04:c9:5b:44:8a'/>
>> >     </interface>
>> >     <graphics type='vnc' listen='0.0.0.0' port='6003'/>
>> >     <input type='tablet'/>
>> >   </devices>
>> >   <features>
>> >     <acpi/>
>> >   </features>
>> > </domain>
>> >
>> >
>> > SUSPEND INVOKED
>> >
>> >
>> > onevm list
>> >
>> > 433 onetest susp 0 262144 10.10.20.190 00 00:25:08
>> >
>> > 433.log
>> >
>> > Fri Mar 5 16:35:28 2010 [LCM][I]: New VM state is SAVE_SUSPEND
>> > Fri Mar 5 16:35:29 2010 [VMM][I]: Connecting to uri: qemu:///system
>> > Fri Mar 5 16:35:29 2010 [VMM][I]: ExitCode: 0
>> > Fri Mar 5 16:35:29 2010 [DiM][I]: New VM state is SUSPENDED
>> >
>> > Oned.log
>> >
>> > Fri Mar 5 16:35:28 2010 [ReM][D]: VirtualMachineAction invoked
>> > Fri Mar 5 16:35:28 2010 [DiM][D]: Suspending VM 433
>> > Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 Connecting to uri: qemu:///system
>> >
>> > Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 ExitCode: 0
>> >
>> > Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: SAVE SUCCESS 433
>> >
>> > ON THE HOST
>> >
>> > root at 00238bbda914:/mnt/sharedimagesdir/433/images# ls -ltr
>> > total 3
>> > lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
>> > -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
>> > -rw-------+ 1 root root 940894 2010-03-05 22:31 checkpoint
>> >
>> >
>> >
>> >
>> >
>> >
>> > Tino Vazquez wrote:
>> >>
>> >> Hi Ranga,
>> >>
>> >> If you are using a shared repository (I'll assume you use NFS or a
>> >> similar distributed FS), then "<vmid>/images/" is shared between
>> >> all the remote hosts, so there is no need to move the checkpoint files;
>> >> they should be available on all the nodes.
>> >>
>> >> Please send us the log of the VM that is failing so we can try and
>> >> reproduce the problem.
>> >>
>> >> Regards,
>> >>
>> >> -Tino
>> >>
>> >> On Thu, Feb 18, 2010 at 2:44 AM, Rangababu Chakravarthula
>> >> <rbabu at hexagrid.com> wrote:
>> >>
>> >>>
>> >>> We are using shared storage as defined here
>> >>>
>> >>>
>> >>>
>> >>> http://www.opennebula.org/doku.php?id=documentation:rel1.2:sm#samplea_shared_image_repository
>> >>>
>> >>> When we run onevm stop or onevm suspend, it tries to do SAVE_STOP or
>> >>> SAVE_SUSPEND and creates a checkpoint file on the host in
>> >>> /var/lib/one/<vmid>/images/
>> >>>
>> >>> and in the logs we see
>> >>> tm_mv.sh: Will not move, is not saving image
>> >>>
>> >>> I think it is trying to move the checkpoint file back to the
>> >>> management node and, based on the logic in tm_mv.sh, it does not move it.
>> >>>
>> >>> Later, when we try onevm resume, ONE picks a different host and tries
>> >>> to move the checkpoint file from the management node to the new host,
>> >>> and again says "Will not move, is not saving image". On the new host it
>> >>> fails to bring up the VM since there is no checkpoint file there.
>> >>>
>> >>> How can we ask ONE not to resume from the checkpoint file but instead
>> >>> load from the disk file that is in the template?
>> >>>
>> >>> Ranga
>> >
>> >
>
>