Tino<br>I think the original problem was that we were sharing only the NFS path where the disk images are stored, but not the <vmid>/images directory. After we changed that and defined the path as VM_DIR in oned.conf, suspend and resume are working well.<br>
<br>Since <vmid>/images is accessible from all hosts, even when the VM gets resumed on a different machine, that host is able to access the <vmid>/images directory.<br><br>However, the checkpoint file is still getting created. Is there a way to have OpenNebula not create the checkpoint file?<br>
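For anyone who finds this thread later, the working setup boils down to one shared export plus VM_DIR. A minimal sketch; the export options are assumptions on my part, not copied from our servers:<br>

```
# /etc/exports on the NFS server (options are illustrative assumptions)
/mnt/sharedimagesdir  *(rw,sync,no_subtree_check,no_root_squash)

# oned.conf on the OpenNebula front-end
VM_DIR=/mnt/sharedimagesdir
```

no_root_squash is listed only because the checkpoint file below ends up owned by root; drop it if your security policy forbids it.<br>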
<br>Ranga<br><br><div class="gmail_quote">On Mon, Mar 15, 2010 at 4:35 AM, Tino Vazquez <span dir="ltr"><<a href="mailto:tinova@fdi.ucm.es">tinova@fdi.ucm.es</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi there,<br>
<br>
Sorry, but I'm failing to see the<br>
<div class="im"><br>
tm_mv.sh: Will not move, is not saving image<br>
<br>
</div>message anywhere in your logs.<br>
<div class="im"><br>
Regards,<br>
<br>
-Tino<br>
<br>
--<br>
Constantino Vázquez, Grid & Virtualization Technology<br>
Engineer/Researcher: <a href="http://www.dsa-research.org/tinova" target="_blank">http://www.dsa-research.org/tinova</a><br>
DSA Research Group: <a href="http://dsa-research.org" target="_blank">http://dsa-research.org</a><br>
Globus GridWay Metascheduler: <a href="http://www.GridWay.org" target="_blank">http://www.GridWay.org</a><br>
OpenNebula Virtual Infrastructure Engine: <a href="http://www.OpenNebula.org" target="_blank">http://www.OpenNebula.org</a><br>
<br>
<br>
<br>
</div>On Mon, Mar 8, 2010 at 5:04 AM, Rangababu Chakravarthula<br>
<div><div></div><div class="h5"><<a href="mailto:rbabu@hexagrid.com">rbabu@hexagrid.com</a>> wrote:<br>
> Thank you Tino. Sorry for the late reply. Here are the detailed logs. Any<br>
> help is appreciated.<br>
><br>
> NFS SHARED IMAGES DIRECTORY BETWEEN ALL HOSTS /mnt/sharedimagesdir<br>
><br>
> Contents of ONED.CONF<br>
><br>
> VM_DIR=/mnt/sharedimagesdir<br>
> IM_MAD = [<br>
> name = "im_kvm",<br>
> executable = "one_im_ssh",<br>
> arguments = "im_kvm/im_kvm.conf",<br>
> default = "im_kvm/im_kvm.conf" ]<br>
> VM_MAD = [<br>
> name = "vmm_kvm",<br>
> executable = "one_vmm_kvm",<br>
> default = "vmm_kvm/vmm_kvm.conf",<br>
> type = "kvm" ]<br>
> TM_MAD = [<br>
> name = "tm_nfs",<br>
> executable = "one_tm",<br>
> arguments = "tm_nfs/tm_nfs.conf",<br>
> default = "tm_nfs/tm_nfs.conf" ]<br>
><br>
> WE MODIFIED tm_clone.sh & tm_ln.sh to add SSH<br>
><br>
><br>
> SUBMITTED NEW VM<br>
><br>
> onevm show 433<br>
><br>
> VID        : 433<br>
> UID        : 0<br>
> STATE      : ACTIVE<br>
> LCM STATE  : RUNNING<br>
> DEPLOY ID  : one-433<br>
> MEMORY     : 262144<br>
> CPU        : 0<br>
> LAST POLL  : 1267828125<br>
> START TIME : 03/05 16:12:02<br>
> STOP TIME  : 12/31 18:00:00<br>
> NET TX     : 0<br>
> NET RX     : 0<br>
> ....: Template :....<br>
> DISK     : CLONE=no,SOURCE=/mnt/sharedimagesdir/images/onetest0,TARGET=hda,TYPE=disk<br>
> GRAPHICS : LISTEN=0.0.0.0,PORT=6003,TYPE=vnc<br>
> INPUT    : TYPE=tablet<br>
> MEMORY   : 256<br>
> NAME     : onetest<br>
> NIC      : BRIDGE=br171,MAC=00:04:c9:5b:44:8a<br>
> OS       : BOOT=hd<br>
> VCPU     : 1<br>
><br>
><br>
> ON THE MANAGEMENT NODE<br>
><br>
> root@ManagementNode:/etc/one/tm_nfs# ls -al /var/lib/one/433/<br>
> total 24<br>
> drwxrwxrwx 2 oneadmin nogroup 4096 2010-03-05 16:12 .<br>
> drwxr-xr-x 437 oneadmin root 12288 2010-03-05 16:26 ..<br>
> -rw-r--r-- 1 oneadmin nogroup 549 2010-03-05 16:12 deployment.0<br>
> -rw-r--r-- 1 oneadmin nogroup 89 2010-03-05 16:12 transfer.0<br>
><br>
> /var/log/one/433.log<br>
><br>
> Fri Mar 5 16:12:11 2010 [DiM][I]: New VM state is ACTIVE.<br>
> Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is PROLOG.<br>
> Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Creating directory<br>
> /mnt/sharedimagesdir/433/images<br>
> Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 mkdir<br>
> -p /mnt/sharedimagesdir/433/images".<br>
> Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 chmod<br>
> a+w /mnt/sharedimagesdir/433/images".<br>
> Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Link<br>
> /mnt/sharedimagesdir/images/onetest0<br>
> Fri Mar 5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 ln -s<br>
> /mnt/sharedimagesdir/images/onetest0<br>
> /mnt/sharedimagesdir/433/images/disk.0".<br>
> Fri Mar 5 16:12:11 2010 [LCM][I]: New VM state is BOOT<br>
> Fri Mar 5 16:12:11 2010 [VMM][I]: Generating deployment file:<br>
> /var/lib/one/433/deployment.0<br>
> Fri Mar 5 16:12:11 2010 [VMM][I]: Command: scp<br>
> /var/lib/one/433/deployment.0<br>
> 10.10.20.190:/mnt/sharedimagesdir/433/images/deployment.0<br>
> Fri Mar 5 16:12:11 2010 [VMM][I]: Copy success<br>
> Fri Mar 5 16:12:12 2010 [VMM][I]: Connecting to uri: qemu:///system<br>
> Fri Mar 5 16:12:12 2010 [VMM][I]: ExitCode: 0<br>
> Fri Mar 5 16:12:12 2010 [LCM][I]: New VM state is RUNNING<br>
><br>
><br>
> onevm list<br>
><br>
> 433 onetest runn 0 262144 10.10.20.190 00 00:16:44<br>
><br>
> ON THE HOST<br>
><br>
> root@00238bbda914:/mnt/sharedimagesdir# ls -ltr<br>
> /mnt/sharedimagesdir/433/images/<br>
> total 2<br>
> lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -><br>
> /mnt/sharedimagesdir/images/onetest0<br>
> -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0<br>
> root@00238bbda914:/mnt/sharedimagesdir#<br>
><br>
><br>
> /var/log/libvirt/qemu/433.log on HOST<br>
><br>
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin<br>
> /usr/bin/kvm -S -M pc-0.11 -m 256 -smp 1 -name one-433 -uuid<br>
> 74c151d6-b1f5-3e41-fc45-e7fdc9247722 -monitor<br>
> unix:/var/run/libvirt/qemu/one-433.monitor,server,nowait -boot c -drive<br>
> file=/mnt/sharedimagesdir/433/images/disk.0,if=ide,index=0,boot=on -net<br>
> nic,macaddr=00:04:c9:5b:44:8a,vlan=0,name=nic.0 -net<br>
> tap,fd=20,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice<br>
> tablet -vnc <a href="http://0.0.0.0:103" target="_blank">0.0.0.0:103</a> -vga cirrus<br>
><br>
> deployment.0 file on HOST<br>
><br>
> <domain type='kvm'><br>
> <name>one-433</name><br>
> <vcpu>1</vcpu><br>
> <memory>262144</memory><br>
> <os><br>
> <type>hvm</type><br>
> <boot dev='hd'/><br>
> </os><br>
> <devices><br>
> <emulator>/usr/bin/kvm</emulator><br>
> <disk type='file' device='disk'><br>
> <source<br>
> file='/mnt/sharedimagesdir/433/images/disk.0'/><br>
> <target dev='hda'/><br>
> </disk><br>
> <interface type='bridge'><br>
> <source bridge='br171'/><br>
> <mac address='00:04:c9:5b:44:8a'/><br>
> </interface><br>
> <graphics type='vnc' listen='0.0.0.0' port='6003'/><br>
> <input type='tablet'/><br>
> </devices><br>
> <features><br>
> <acpi/><br>
> </features><br>
> </domain><br>
><br>
><br>
> SUSPEND INVOKED<br>
><br>
><br>
> onevm list<br>
><br>
> 433 onetest susp 0 262144 10.10.20.190 00 00:25:08<br>
><br>
> 433.log<br>
><br>
> Fri Mar 5 16:35:28 2010 [LCM][I]: New VM state is SAVE_SUSPEND<br>
> Fri Mar 5 16:35:29 2010 [VMM][I]: Connecting to uri: qemu:///system<br>
> Fri Mar 5 16:35:29 2010 [VMM][I]: ExitCode: 0<br>
> Fri Mar 5 16:35:29 2010 [DiM][I]: New VM state is SUSPENDED<br>
><br>
> Oned.log<br>
><br>
> Fri Mar 5 16:35:28 2010 [ReM][D]: VirtualMachineAction invoked<br>
> Fri Mar 5 16:35:28 2010 [DiM][D]: Suspending VM 433<br>
> Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 Connecting to<br>
> uri: qemu:///system<br>
><br>
> Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 ExitCode: 0<br>
><br>
> Fri Mar 5 16:35:29 2010 [VMM][D]: Message received: SAVE SUCCESS 433<br>
><br>
> ON THE HOST<br>
><br>
> root@00238bbda914:/mnt/sharedimagesdir/433/images# ls -ltr<br>
> total 3<br>
> lrwxrwxrwx 1 oneadmin nogroup 32 2010-03-05 22:08 disk.0 -><br>
> /mnt/sharedimagesdir/images/onetest0<br>
> -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0<br>
> -rw-------+ 1 root root 940894 2010-03-05 22:31 checkpoint<br>
><br>
><br>
> Tino Vazquez wrote:<br>
>><br>
>> Hi Ranga,<br>
>><br>
>> If you are using a shared repository (I'll assume you use NFS or a<br>
>> similar distributed FS), then the "<vmid>/images/" is shared between<br>
>> all the remote hosts, so there is no need to move the checkpoint files<br>
>> and they should be available in all the nodes.<br>
>><br>
>> Please send us the log of the VM that is failing so we can try and<br>
>> reproduce the problem.<br>
>><br>
>> Regards,<br>
>><br>
>> -Tino<br>
>><br>
>> --<br>
>> Constantino Vázquez, Grid & Virtualization Technology<br>
>> Engineer/Researcher: <a href="http://www.dsa-research.org/tinova" target="_blank">http://www.dsa-research.org/tinova</a><br>
>> DSA Research Group: <a href="http://dsa-research.org" target="_blank">http://dsa-research.org</a><br>
>> Globus GridWay Metascheduler: <a href="http://www.GridWay.org" target="_blank">http://www.GridWay.org</a><br>
>> OpenNebula Virtual Infrastructure Engine: <a href="http://www.OpenNebula.org" target="_blank">http://www.OpenNebula.org</a><br>
>><br>
>><br>
>><br>
>> On Thu, Feb 18, 2010 at 2:44 AM, Rangababu Chakravarthula<br>
>> <<a href="mailto:rbabu@hexagrid.com">rbabu@hexagrid.com</a>> wrote:<br>
>><br>
>>><br>
>>> We are using shared storage as defined here<br>
>>><br>
>>><br>
>>> <a href="http://www.opennebula.org/doku.php?id=documentation:rel1.2:sm#samplea_shared_image_repository" target="_blank">http://www.opennebula.org/doku.php?id=documentation:rel1.2:sm#samplea_shared_image_repository</a><br>
>>><br>
>>> When we run onevm stop or onevm suspend it tries to do SAVE_STOP and<br>
>>> SAVE_SUSPEND and creates a checkpoint file on the host<br>
>>> /var/lib/one/<vmid>/images/<br>
>>><br>
>>> and in the logs we see<br>
>>> tm_mv.sh: Will not move, is not saving image<br>
>>><br>
>>> I think it is trying to move the checkpoint file back to the management<br>
>>> node, but based on the logic in tm_mv.sh it is not moving it.<br>
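A hypothetical illustration of the kind of guard that would produce that "Will not move" message is sketched below. This is my own paraphrase for discussion, not the actual OpenNebula tm_nfs/tm_mv.sh; the path pattern and VM_DIR value are assumptions.

```shell
# Hypothetical illustration of a tm_mv.sh-style guard; NOT the actual
# OpenNebula script. Assumption: only paths under a VM's images
# directory count as save state worth moving.
VM_DIR=/mnt/sharedimagesdir

is_saving_image() {
    case "$1" in
        "$VM_DIR"/*/images) return 0 ;;   # e.g. /mnt/sharedimagesdir/433/images
        *)                  return 1 ;;
    esac
}

tm_mv() {
    if is_saving_image "$1"; then
        echo "tm_mv.sh: Moving $1 to $2"
        # mv "$1" "$2"  # a real script would move the directory here
    else
        echo "tm_mv.sh: Will not move, is not saving image"
    fi
}
```

Under this reading, anything that is not the whole <vmid>/images directory (a lone checkpoint file, for instance) would be skipped, which matches the behaviour described below.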
>>><br>
>>> Later, when we try to do onevm resume, ONE picks a different host and<br>
>>> tries to move the checkpoint file from the management node to the new<br>
>>> host, again says "Will not move, is not saving image", and on the host<br>
>>> it fails to bring up the VM since there is no checkpoint file there.<br>
>>><br>
>>> How can we ask ONE to not resume from the checkpoint file but instead<br>
>>> load from the disk file that is in the template?<br>
>>><br>
>>> Ranga<br>
>>> _______________________________________________<br>
>>> Users mailing list<br>
>>> <a href="mailto:Users@lists.opennebula.org">Users@lists.opennebula.org</a><br>
>>> <a href="http://lists.opennebula.org/listinfo.cgi/users-opennebula.org" target="_blank">http://lists.opennebula.org/listinfo.cgi/users-opennebula.org</a><br>
>>><br>
>>><br>
><br>
><br>
</div></div></blockquote></div><br>