[one-users] vm status is failed but the actual VM is fine

Shi Jin jinzishuai at gmail.com
Thu Oct 8 21:18:32 PDT 2009


Hi there,

I just had an interesting experience. OpenNebula reports a FAILED
status, but the VM is running fine on the node: virsh list shows it
running and I am able to log in to the VM.
The vm.log from OpenNebula shows:
Thu Oct  8 21:01:14 2009 [TM][I]: tm_clone.sh: Executed "scp
onefrontend64:/opt/cloud/ImgRep/Haemonetics/client/biomat_client_xp.qcow2
node1:/opt/cloud/VM/57/images/disk.0".
Thu Oct  8 21:01:14 2009 [TM][I]: tm_clone.sh: Executed "ssh node1
chmod a+w /opt/cloud/VM/57/images/disk.0".
Thu Oct  8 21:01:14 2009 [LCM][I]: New VM state is BOOT
Thu Oct  8 21:01:14 2009 [VMM][I]: Generating deployment file:
/srv/cloud/one/var/57/deployment.0
Thu Oct  8 21:01:29 2009 [VMM][I]: Command execution fail: 'cat >
/opt/cloud/VM/57/images/deployment.0 && virsh --connect qemu:///system
create /opt/cloud/VM/57/images/deployment.0'
Thu Oct  8 21:01:29 2009 [VMM][I]: STDERR follows.
Thu Oct  8 21:01:29 2009 [VMM][I]: Connecting to uri: qemu:///system
Thu Oct  8 21:01:29 2009 [VMM][I]: error: Failed to create domain from
/opt/cloud/VM/57/images/deployment.0
Thu Oct  8 21:01:29 2009 [VMM][I]: error: server closed connection
Thu Oct  8 21:01:29 2009 [VMM][I]: ExitCode: 141
Thu Oct  8 21:01:29 2009 [VMM][E]: Error deploying virtual machine
Thu Oct  8 21:01:29 2009 [DiM][I]: New VM state is FAILED
Thu Oct  8 21:01:35 2009 [TM][W]: Ignored: LOG - 57 tm_delete.sh:
Deleting /opt/cloud/VM/57/images

Thu Oct  8 21:01:35 2009 [TM][W]: Ignored: LOG - 57 tm_delete.sh:
Executed "ssh node1 rm -rf /opt/cloud/VM/57/images".

Thu Oct  8 21:01:35 2009 [TM][W]: Ignored: TRANSFER SUCCESS 57 -

Thu Oct  8 21:56:42 2009 [DiM][I]: New VM state is DONE.

The funny part is that the actual /opt/cloud/VM/57/images/disk.0
on the node was removed after I ran "onevm delete", but the VM is
still running. I guess this is how Linux handles files: a file is only
actually removed once every process using it has closed it.
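
The file behaviour described above can be reproduced with a small sketch.
It assumes a Linux (POSIX) system; the function name read_after_unlink is
made up for illustration and has nothing to do with OpenNebula or libvirt.
It only shows that a file deleted by name stays readable through a
descriptor that was opened beforehand, which is presumably what the qemu
process is doing with disk.0.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, delete the file by name while it is
    still open, then read the data back through the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # remove the directory entry
        assert not os.path.exists(path)    # the name is gone...
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))   # ...but the data is still there
    finally:
        # Closing the last open descriptor is what lets the kernel
        # actually free the file's blocks.
        os.close(fd)


if __name__ == "__main__":
    print(read_after_unlink(b"disk image bytes"))
```

By the same logic, the space used by the deleted disk.0 would only be
released once the VM process exits.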

Does the error code 141 give us any more information?
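
One possible reading, offered as a sketch rather than a diagnosis: shells
conventionally report a process killed by signal N as exit status 128 + N,
and on Linux 141 - 128 = 13 is SIGPIPE. That would fit the "server closed
connection" line in the log, but whether the OpenNebula driver passes the
status through in this form is an assumption. The helper name
describe_exit_code is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the shell's 128 + N signal convention."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with this number on the current platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(141))
```

On a Linux host this prints "killed by SIGPIPE" for 141.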

I should also mention that this error happened when I tried to deploy
many VMs at the same time.

Thanks.
Shi

-- 
Shi Jin, Ph.D.


