[one-users] Stop-Resume failing with shared storage

Tino Vazquez tinova at fdi.ucm.es
Mon Mar 15 03:35:31 PDT 2010


Hi there,

Sorry, but I can't find the

tm_mv.sh: Will not move, is not saving image

message anywhere in your logs.
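To check whether the message appears at all, the logs can be searched directly. This is a minimal sketch, not an OpenNebula tool. The function name find_tm_mv_message is hypothetical. The per-VM log path follows the /var/log/one/433.log location quoted below. Keeping oned.log in the same directory is an assumption about this installation.

```shell
#!/bin/sh
# Hypothetical helper: search OpenNebula logs for the tm_mv.sh message.
# Arguments: $1 = log directory (e.g. /var/log/one), $2 = VM id.
# Assumes the per-VM log is <dir>/<vmid>.log and oned.log is in the same dir.
find_tm_mv_message() {
    log_dir=${1:-/var/log/one}
    vmid=$2
    # -H: print file name, -n: print line number,
    # -s: suppress errors about missing or unreadable files
    grep -Hns "tm_mv.sh: Will not move" "$log_dir/$vmid.log" "$log_dir/oned.log"
}
```

For the VM in this thread: find_tm_mv_message /var/log/one 433. No output means the message was not logged for that VM.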

Regards,

-Tino

--
Constantino Vázquez, Grid & Virtualization Technology
Engineer/Researcher: http://www.dsa-research.org/tinova
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org



On Mon, Mar 8, 2010 at 5:04 AM, Rangababu Chakravarthula
<rbabu at hexagrid.com> wrote:
> Thank you Tino. Sorry for the late reply. Here are the detailed logs. Any
> help is appreciated.
>
> NFS-SHARED IMAGES DIRECTORY ON ALL HOSTS: /mnt/sharedimagesdir
>
> Contents of oned.conf
>
> VM_DIR=/mnt/sharedimagesdir
> IM_MAD = [
>     name       = "im_kvm",
>     executable = "one_im_ssh",
>     arguments  = "im_kvm/im_kvm.conf",
>     default    = "im_kvm/im_kvm.conf" ]
> VM_MAD = [
>     name       = "vmm_kvm",
>     executable = "one_vmm_kvm",
>     default    = "vmm_kvm/vmm_kvm.conf",
>     type       = "kvm" ]
> TM_MAD = [
>     name       = "tm_nfs",
>     executable = "one_tm",
>     arguments  = "tm_nfs/tm_nfs.conf",
>     default    = "tm_nfs/tm_nfs.conf" ]
>
> WE MODIFIED tm_clone.sh AND tm_ln.sh TO RUN THEIR COMMANDS ON THE HOST OVER SSH
>
>
> SUBMITTED NEW VM
>
> onevm show 433
>
> VID            : 433                UID            : 0
> STATE          : ACTIVE             LCM STATE      : RUNNING
> DEPLOY ID      : one-433            MEMORY         : 262144
> CPU            : 0                  LAST POLL      : 1267828125
> START TIME     : 03/05 16:12:02     STOP TIME      : 12/31 18:00:00
> NET TX         : 0                  NET RX         : 0
> ....: Template :....
>   DISK            : CLONE=no,SOURCE=/mnt/sharedimagesdir/images/onetest0,TARGET=hda,TYPE=disk
>   GRAPHICS        : LISTEN=0.0.0.0,PORT=6003,TYPE=vnc
>   INPUT           : TYPE=tablet
>   MEMORY          : 256
>   NAME            : onetest
>   NIC             : BRIDGE=br171,MAC=00:04:c9:5b:44:8a
>   OS              : BOOT=hd
>   VCPU            : 1
>
>
> ON THE MANAGEMENT NODE
>
> root@ManagementNode:/etc/one/tm_nfs# ls -al /var/lib/one/433/
> total 24
> drwxrwxrwx   2 oneadmin nogroup  4096 2010-03-05 16:12 .
> drwxr-xr-x 437 oneadmin root    12288 2010-03-05 16:26 ..
> -rw-r--r--   1 oneadmin nogroup   549 2010-03-05 16:12 deployment.0
> -rw-r--r--   1 oneadmin nogroup    89 2010-03-05 16:12 transfer.0
>
> /var/log/one/433.log
>
> Fri Mar  5 16:12:11 2010 [DiM][I]: New VM state is ACTIVE.
> Fri Mar  5 16:12:11 2010 [LCM][I]: New VM state is PROLOG.
> Fri Mar  5 16:12:11 2010 [TM][I]: tm_ln.sh: Creating directory /mnt/sharedimagesdir/433/images
> Fri Mar  5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 mkdir -p /mnt/sharedimagesdir/433/images".
> Fri Mar  5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 chmod a+w /mnt/sharedimagesdir/433/images".
> Fri Mar  5 16:12:11 2010 [TM][I]: tm_ln.sh: Link /mnt/sharedimagesdir/images/onetest0
> Fri Mar  5 16:12:11 2010 [TM][I]: tm_ln.sh: Executed "ssh 10.10.20.190 ln -s /mnt/sharedimagesdir/images/onetest0 /mnt/sharedimagesdir/433/images/disk.0".
> Fri Mar  5 16:12:11 2010 [LCM][I]: New VM state is BOOT
> Fri Mar  5 16:12:11 2010 [VMM][I]: Generating deployment file: /var/lib/one/433/deployment.0
> Fri Mar  5 16:12:11 2010 [VMM][I]: Command: scp /var/lib/one/433/deployment.0 10.10.20.190:/mnt/sharedimagesdir/433/images/deployment.0
> Fri Mar  5 16:12:11 2010 [VMM][I]: Copy success
> Fri Mar  5 16:12:12 2010 [VMM][I]: Connecting to uri: qemu:///system
> Fri Mar  5 16:12:12 2010 [VMM][I]: ExitCode: 0
> Fri Mar  5 16:12:12 2010 [LCM][I]: New VM state is RUNNING
>
>
> onevm list
>
> 433  onetest runn   0  262144    10.10.20.190 00 00:16:44
>
> ON THE HOST
>
> root@00238bbda914:/mnt/sharedimagesdir# ls -ltr /mnt/sharedimagesdir/433/images/
> total 2
> lrwxrwxrwx  1 oneadmin nogroup  32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
> -rw-r--r--+ 1 oneadmin nogroup 549 2010-03-05 22:08 deployment.0
> root@00238bbda914:/mnt/sharedimagesdir#
>
>
> /var/log/libvirt/qemu/433.log on HOST
>
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
> /usr/bin/kvm -S -M pc-0.11 -m 256 -smp 1 -name one-433 -uuid
> 74c151d6-b1f5-3e41-fc45-e7fdc9247722 -monitor
> unix:/var/run/libvirt/qemu/one-433.monitor,server,nowait -boot c -drive
> file=/mnt/sharedimagesdir/433/images/disk.0,if=ide,index=0,boot=on -net
> nic,macaddr=00:04:c9:5b:44:8a,vlan=0,name=nic.0 -net
> tap,fd=20,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice
> tablet -vnc 0.0.0.0:103 -vga cirrus
>
> deployment.0 file on HOST
>
> <domain type='kvm'>
>       <name>one-433</name>
>       <vcpu>1</vcpu>
>       <memory>262144</memory>
>       <os>
>               <type>hvm</type>
>               <boot dev='hd'/>
>       </os>
>       <devices>
>               <emulator>/usr/bin/kvm</emulator>
>               <disk type='file' device='disk'>
>                       <source file='/mnt/sharedimagesdir/433/images/disk.0'/>
>                       <target dev='hda'/>
>               </disk>
>               <interface type='bridge'>
>                       <source bridge='br171'/>
>                       <mac address='00:04:c9:5b:44:8a'/>
>               </interface>
>               <graphics type='vnc' listen='0.0.0.0' port='6003'/>
>               <input type='tablet'/>
>       </devices>
>       <features>
>               <acpi/>
>       </features>
> </domain>
>
>
> SUSPEND INVOKED
>
>
> onevm list
>
> 433  onetest susp   0  262144    10.10.20.190 00 00:25:08
>
> 433.log
>
> Fri Mar  5 16:35:28 2010 [LCM][I]: New VM state is SAVE_SUSPEND
> Fri Mar  5 16:35:29 2010 [VMM][I]: Connecting to uri: qemu:///system
> Fri Mar  5 16:35:29 2010 [VMM][I]: ExitCode: 0
> Fri Mar  5 16:35:29 2010 [DiM][I]: New VM state is SUSPENDED
>
> oned.log
>
> Fri Mar  5 16:35:28 2010 [ReM][D]: VirtualMachineAction invoked
> Fri Mar  5 16:35:28 2010 [DiM][D]: Suspending VM 433
> Fri Mar  5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 Connecting to uri: qemu:///system
>
> Fri Mar  5 16:35:29 2010 [VMM][D]: Message received: LOG - 433 ExitCode: 0
>
> Fri Mar  5 16:35:29 2010 [VMM][D]: Message received: SAVE SUCCESS 433
>
> ON THE HOST
>
> root@00238bbda914:/mnt/sharedimagesdir/433/images# ls -ltr
> total 3
> lrwxrwxrwx  1 oneadmin nogroup     32 2010-03-05 22:08 disk.0 -> /mnt/sharedimagesdir/images/onetest0
> -rw-r--r--+ 1 oneadmin nogroup    549 2010-03-05 22:08 deployment.0
> -rw-------+ 1 root     root    940894 2010-03-05 22:31 checkpoint
>
> Tino Vazquez wrote:
>>
>> Hi Ranga,
>>
>> If you are using a shared repository (I'll assume you use NFS or a
>> similar distributed FS), then the <vmid>/images/ directory is shared
>> between all the remote hosts, so there is no need to move the checkpoint
>> files: they should be available on all the nodes.
>>
>> Please send us the log of the VM that is failing so we can try and
>> reproduce the problem.
>>
>> Regards,
>>
>> -Tino
>>
>> On Thu, Feb 18, 2010 at 2:44 AM, Rangababu Chakravarthula
>> <rbabu at hexagrid.com> wrote:
>>
>>>
>>> We are using shared storage as described here:
>>>
>>>
>>> http://www.opennebula.org/doku.php?id=documentation:rel1.2:sm#samplea_shared_image_repository
>>>
>>> When we run onevm stop or onevm suspend, the VM goes through SAVE_STOP or
>>> SAVE_SUSPEND respectively, and a checkpoint file is created on the host in
>>> /var/lib/one/<vmid>/images/.
>>>
>>> In the logs we see:
>>> tm_mv.sh: Will not move, is not saving image
>>>
>>> I think it is trying to move the checkpoint file back to the management
>>> node and, based on the logic in tm_mv.sh, it does not move it.
>>>
>>> Later, when we run onevm resume, ONE picks a different host and tries to
>>> move the checkpoint file from the management node to the new host. Again
>>> it logs "Will not move, is not saving image", and the VM fails to start on
>>> the new host because there is no checkpoint file there.
>>>
>>> How can we tell ONE not to resume from the checkpoint file and instead
>>> boot from the disk file specified in the template?
>>>
>>> Ranga
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opennebula.org
>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>
>>>
>
>
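Tino's point that <vmid>/images/ is shared can be checked directly: the checkpoint written on suspend should be visible from every host under VM_DIR. This is a minimal sketch, assuming VM_DIR=/mnt/sharedimagesdir and the <vmid>/images/checkpoint layout seen in the listings above. The function names are hypothetical, and the remote check assumes passwordless SSH to each host. The listing shows the checkpoint as root-owned with mode 600, so the check uses test -s (exists and is non-empty), which needs only directory search permission, not read access to the file.

```shell
#!/bin/sh
# Hypothetical sketch: confirm that a VM's checkpoint is visible on the shared VM_DIR.
# Assumed layout (from the listings in this thread):
#   $VM_DIR/<vmid>/images/checkpoint, with VM_DIR=/mnt/sharedimagesdir

# Print the expected checkpoint path for a VM: $1 = VM_DIR, $2 = VM id.
checkpoint_path() {
    printf '%s/%s/images/checkpoint\n' "$1" "$2"
}

# Local check: succeed if the checkpoint exists and is non-empty.
# test -s only stats the file, so it works on a root-owned mode-600 file.
has_checkpoint() {
    [ -s "$(checkpoint_path "$1" "$2")" ]
}

# Remote check on each listed host (assumes passwordless SSH and a path
# without spaces, since ssh joins its arguments into one command string).
check_hosts() {
    vm_dir=$1
    vmid=$2
    shift 2
    path=$(checkpoint_path "$vm_dir" "$vmid")
    for host in "$@"; do
        if ssh "$host" test -s "$path"; then
            echo "$host: OK $path"
        else
            echo "$host: MISSING $path"
        fi
    done
}
```

For example, check_hosts /mnt/sharedimagesdir 433 10.10.20.190, adding any other hosts that ONE might pick on resume. A host reporting MISSING would point to the NFS mount rather than to tm_mv.sh.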


