[one-users] Stop-Resume failing with shared storage
Rangababu Chakravarthula
rbabu at hexagrid.com
Fri Apr 2 10:52:12 PDT 2010
Tino,
I think the original problem was that we were sharing only the NFS path where
the disk images are stored, but not the <vmid>/images directory. After we
shared that directory as well and set its path as VM_DIR in oned.conf, suspend
and resume work correctly.
Since <vmid>/images is accessible from all hosts, a VM that is resumed on a
different machine can still reach its <vmid>/images directory from the new
host.
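
A minimal sketch of how the shared directory could be checked on each host before suspending or resuming. The helper names vm_images_dir and check_vm_dir are hypothetical and not part of OpenNebula; the only assumption taken from this thread is that the per-VM directory is laid out as VM_DIR/<vmid>/images.

```shell
#!/bin/sh
# Sketch only: vm_images_dir and check_vm_dir are hypothetical helpers,
# not OpenNebula commands.

# Print the per-VM directory that has to live on the shared mount.
# $1 = VM_DIR as set in oned.conf, $2 = VM id
vm_images_dir() {
    printf '%s/%s/images\n' "$1" "$2"
}

# Succeed only if that directory exists and is writable on this host.
check_vm_dir() {
    dir=$(vm_images_dir "$1" "$2")
    [ -d "$dir" ] && [ -w "$dir" ]
}
```

Run on each host with the values from the logs below, this would be something like: check_vm_dir /mnt/sharedimagesdir 433 && echo "VM 433 directory is visible here". A host where the check fails would presumably not find the checkpoint file on resume.
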
However, the checkpoint file is still being created. Is there a way to have
OpenNebula not create the checkpoint file?
Ranga
On Mon, Mar 15, 2010 at 4:35 AM, Tino Vazquez <tinova at fdi.ucm.es> wrote:
> Hi there,
>
> Sorry, but I'm failing to see the
>
> tm_mv.sh: Will not move, is not saving image
>
> message anywhere in your logs.
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Mon, Mar 8, 2010 at 5:04 AM, Rangababu Chakravarthula
> <rbabu at hexagrid.com> wrote:
> > Thank you Tino. Sorry for the late reply. Here are the detailed logs. Any
> > help is appreciated.
> >
> > NFS SHARED IMAGES DIRECTORY BETWEEN ALL HOSTS /mnt/sharedimagesdir
> >
> > Contents of ONED.CONF
> >
> > VM_DIR=/mnt/sharedimagesdir
> > IM_MAD = [
> > name = "im_kvm",
> > executable = "one_im_ssh",
> > arguments = "im_kvm/im_kvm.conf",
> > default = "im_kvm/im_kvm.conf" ]
> > VM_MAD = [
> > name = "vmm_kvm",
> > executable = "one_vmm_kvm",
> > default = "vmm_kvm/vmm_kvm.conf",
> > type = "kvm" ]
> > TM_MAD = [
> > name = "tm_nfs",
> > executable = "one_tm",
> > arguments = "tm_nfs/tm_nfs.conf",
> > default = "tm_nfs/tm_nfs.conf" ]
> >
> > WE MODIFIED tm_clone.sh & tm_ln.sh to add SSH
> >
> >
> > SUBMITTED NEW VM
> >
> > onevm show 433
> >
> > VID : 433
> > UID : 0
> > STATE : ACTIVE
> > LCM STATE : RUNNING
> > DEPLOY ID : one-433
> > MEMORY : 262144
> > CPU : 0
> > LAST POLL : 1267828125
> > START TIME : 03/05 16:12:02
> > STOP TIME : 12/31 18:00:00
> > NET TX : 0
> > NET RX : 0
> > ....: Template :....
> > DISK : CLONE=no,SOURCE=/mnt/sharedimagesdir/images/onetest0,TARGET=hda,TYPE=disk
> > GRAPHICS : LISTEN=0.0.0.0,PORT=6003,TYPE=vnc
> > INPUT : TYPE=tablet
> > MEMORY : 256
> > NAME : onetest
> > NIC : BRIDGE=br171,MAC=00:04:c9:5b:44:8a
> > OS : BOOT=hd
> > VCPU : 1
> >
> >
> > ON THE MANAGEMENT NODE
> >
> > root at ManagementNode:/etc/one/tm_nfs# ls -al /var/lib/one/433/
> > total 24
> > drwxrwxrwx 2 oneadmin nogroup 4096 2010-03-05 16:12 .
> > drwxr-xr-x 437 oneadmin root 12288 2010-03-05 16:26 ..
> > -rw-r--r-- 1 oneadmin nogroup 549 2010-03-05 16:12 deployment.0
> > -rw-r--r-- 1 oneadmin nogroup 89 2010-03-05 16:12 transfer.0
> >
> > /var/log/one/433.log
> >
> > Fri Mar 5 16:12:11 2010 [DiM][I]: New VM state is ACTIVE.
> > Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is PROLOG.
> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Creating directory /mnt/sharedimagesdir/433/images
> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 mkdir -p /mnt/sharedimagesdir/433/images".
> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 chmod a+w /mnt/sharedimagesdir/433/images".
> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Link /mnt/sharedimagesdir/images/onetest0
> > Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 ln -s /mnt/sharedimagesdir/images/onetest0 /mnt/sharedimagesdir/433/images/disk.0".
> > Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is BOOT
> > Fri Mar 5 16:12:11 2010 [VMM][I]: Generating deployment file: /var/lib/one/433/deployment.0
> > Fri Mar 5 16:12:11 2010 [VMM][I]: Command: scp /var/lib/one/433/deployment.0 10.10.20.190:/mnt/sharedimagesdir/433/images/deployment.0
> > Fri Mar 5 16:12:11 2010 [VMM][I]: Copy success
> > Fri Mar 5 16:12:12 2010 [VMM][I]: Connecting to uri: qemu:///system
> > Fri Mar 5 16:12:12 2010 [VMM][I]: ExitCode: 0
> > Fri Mar 5 16:12:12 2010 [LCM][I]: New VM state is RUNNING
> >
> >
> > onevm list
> >
> > 433 onetest runn 0 262144 10.10.20.190 00 00:16:44
> >
> >
> >
> >
> >
> > ON THE HOST
> >
> > root at 00238bbda914:/mnt/sharedimagesdir# ls -ltr /mnt/sharedimagesdir/433/images/
> > total 2
> > lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
> > -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
> > root at 00238bbda914:/mnt/sharedimagesdir#
> >
> >
> > /var/log/libvirt/qemu/433.log on HOST
> >
> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
> > /usr/bin/kvm -S -M pc-0.11 -m 256 -smp 1 -name one-433 -uuid
> > 74c151d6-b1f5-3e41-fc45-e7fdc9247722 -monitor
> > unix:/var/run/libvirt/qemu/one-433.monitor,server,nowait -boot c -drive
> > file=/mnt/sharedimagesdir/433/images/disk.0,if=ide,index=0,boot=on -net
> > nic,macaddr=00:04:c9:5b:44:8a,vlan=0,name=nic.0 -net
> > tap,fd=20,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice
> > tablet -vnc 0.0.0.0:103 -vga cirrus
> >
> > deployment.0 file on HOST
> >
> > <domain type='kvm'>
> > <name>one-433</name>
> > <vcpu>1</vcpu>
> > <memory>262144</memory>
> > <os>
> > <type>hvm</type>
> > <boot dev='hd'/>
> > </os>
> > <devices>
> > <emulator>/usr/bin/kvm</emulator>
> > <disk type='file' device='disk'>
> > <source
> > file='/mnt/sharedimagesdir/433/images/disk.0'/>
> > <target dev='hda'/>
> > </disk>
> > <interface type='bridge'>
> > <source bridge='br171'/>
> > <mac address='00:04:c9:5b:44:8a'/>
> > </interface>
> > <graphics type='vnc' listen='0.0.0.0' port='6003'/>
> > <input type='tablet'/>
> > </devices>
> > <features>
> > <acpi/>
> > </features>
> > </domain>
> >
> >
> > SUSPEND INVOKED
> >
> >
> > onevm list
> >
> > 433 onetest susp 0 262144 10.10.20.190 00 00:25:08
> >
> > 433.log
> >
> > Fri Mar 5 16:35:28 2010 [LCM][I]: New VM state is SAVE_SUSPEND
> > Fri Mar 5 16:35:29 2010 [VMM][I]: Connecting to uri: qemu:///system
> > Fri Mar 5 16:35:29 2010 [VMM][I]: ExitCode: 0
> > Fri Mar 5 16:35:29 2010 [DiM][I]: New VM state is SUSPENDED
> >
> > Oned.log
> >
> > Fri Mar 5 16:35:28 2010 [ReM][D]: VirtualMachineAction invoked
> > Fri Mar 5 16:35:28 2010 [DiM][D]: Suspending VM 433
> > Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 Connecting to uri: qemu:///system
> >
> > Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 ExitCode: 0
> >
> > Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: SAVE SUCCESS 433
> >
> > ON THE HOST
> >
> > root at 00238bbda914:/mnt/sharedimagesdir/433/images# ls -ltr
> > total 3
> > lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
> > -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
> > -rw-------+ 1 root root 940894 2010-03-05 22:31 checkpoint
> >
> >
> >
> >
> >
> >
> > Tino Vazquez wrote:
> >>
> >> Hi Ranga,
> >>
> >> If you are using a shared repository (I'll assume you use NFS or a
> >> similar distributed FS), then the "<vmid>/images/" is shared between
> >> all the remote hosts, so there is no need to move the checkpoint files
> >> and they should be available in all the nodes.
> >>
> >> Please send us the log of the VM that is failing so we can try and
> >> reproduce the problem.
> >>
> >> Regards,
> >>
> >> -Tino
> >>
> >>
> >>
> >>
> >> On Thu, Feb 18, 2010 at 2:44 AM, Rangababu Chakravarthula
> >> <rbabu at hexagrid.com> wrote:
> >>
> >>>
> >>> We are using shared storage as defined here
> >>>
> >>>
> >>>
> >>> http://www.opennebula.org/doku.php?id=documentation:rel1.2:sm#samplea_shared_image_repository
> >>>
> >>> When we run onevm stop or onevm suspend it tries to do SAVE_STOP and
> >>> SAVE_SUSPEND and creates a checkpoint file on the host
> >>> /var/lib/one/<vmid>/images/
> >>>
> >>> and in the logs we see
> >>> tm_mv.sh: Will not move, is not saving image
> >>>
> >>> I think it is trying to move the checkpoint file back to the management
> >>> node and, based on the logic in tm_mv.sh, it is not moving it.
> >>>
> >>> Later, when we try to do onevm resume, ONE picks a different host and
> >>> tries to move the checkpoint file from the management node to the new
> >>> host, and again says "Will not move, is not saving image". On the host it
> >>> then fails to bring up the VM, since there is no checkpoint file on the
> >>> new host.
> >>>
> >>> How can we ask ONE to not resume from the checkpoint file but instead
> >>> load from the disk file that is in the template?
> >>>
> >>> Ranga
> >>> _______________________________________________
> >>> Users mailing list
> >>> Users at lists.opennebula.org
> >>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >>>
> >>>
> >
> >
>