[one-users] vm status is failed but the actual VM is fine

Javier Fontan jfontan at gmail.com
Tue Oct 13 03:43:46 PDT 2009


Hello,

I have been stress testing OpenNebula on some libvirt/KVM nodes and
had the same problem. When libvirt gets lots of connections (around 5)
to create virtual machines, it drops some of them, and others behave as
you describe. That said, most of the VMs that failed because of this
didn't start up at all; only a couple of them appeared as running after
the error.

We are looking into ways to minimize these problems, but we are still
not sure how to solve them. Maybe newer libvirt versions behave better
under stress.
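
Until then, one possible workaround is to cap how many create requests
reach libvirt at once. The following is only a minimal sketch, not
something OpenNebula does itself: deploy_throttled, virsh_create and the
limit of 4 are hypothetical names and values chosen for illustration
(4 simply stays below the "around 5" mentioned above). Only the virsh
command line is taken from the log quoted below.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def virsh_create(deployment_file):
    """Hypothetical helper: run the same virsh command seen in vm.log.

    Returns True when virsh exits with status 0.
    """
    cmd = ["virsh", "--connect", "qemu:///system", "create", deployment_file]
    return subprocess.run(cmd).returncode == 0


def deploy_throttled(items, deploy, max_concurrent=4):
    """Call deploy(item) for every item, at most max_concurrent at a time.

    deploy is any callable that returns True on success.
    Returns a dict mapping each item to its True/False result.
    """
    items = list(items)
    # The pool size is what caps the number of calls in flight at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(deploy, items))
    return dict(zip(items, results))
```

It would be called as deploy_throttled(list_of_deployment_files,
virsh_create). A failed create is deliberately not retried here: as
described above, a VM can end up running even though the create call
reported an error, so it seems safer to check "virsh list" on the node
before trying again.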

Bye

On Fri, Oct 9, 2009 at 6:18 AM, Shi Jin <jinzishuai at gmail.com> wrote:
> Hi there,
>
> I just had an interesting experience. OpenNebula returns a failed
> status but the VM is running fine on the node. virsh list shows it
> running and I am able to login to the VM.
> The vm.log from OpenNebula shows:
> Thu Oct  8 21:01:14 2009 [TM][I]: tm_clone.sh: Executed "scp
> onefrontend64:/opt/cloud/ImgRep/Haemonetics/client/biomat_client_xp.qcow2
> node1:/opt/cloud/VM/57/images/disk.0".
> Thu Oct  8 21:01:14 2009 [TM][I]: tm_clone.sh: Executed "ssh node1
> chmod a+w /opt/cloud/VM/57/images/disk.0".
> Thu Oct  8 21:01:14 2009 [LCM][I]: New VM state is BOOT
> Thu Oct  8 21:01:14 2009 [VMM][I]: Generating deployment file:
> /srv/cloud/one/var/57/deployment.0
> Thu Oct  8 21:01:29 2009 [VMM][I]: Command execution fail: 'cat >
> /opt/cloud/VM/57/images/deployment.0 && virsh --connect qemu:///system
> create /opt/cloud/VM/57/images/deployment.0'
> Thu Oct  8 21:01:29 2009 [VMM][I]: STDERR follows.
> Thu Oct  8 21:01:29 2009 [VMM][I]: Connecting to uri: qemu:///system
> Thu Oct  8 21:01:29 2009 [VMM][I]: error: Failed to create domain from
> /opt/cloud/VM/57/images/deployment.0
> Thu Oct  8 21:01:29 2009 [VMM][I]: error: server closed connection
> Thu Oct  8 21:01:29 2009 [VMM][I]: ExitCode: 141
> Thu Oct  8 21:01:29 2009 [VMM][E]: Error deploying virtual machine
> Thu Oct  8 21:01:29 2009 [DiM][I]: New VM state is FAILED
> Thu Oct  8 21:01:35 2009 [TM][W]: Ignored: LOG - 57 tm_delete.sh:
> Deleting /opt/cloud/VM/57/images
>
> Thu Oct  8 21:01:35 2009 [TM][W]: Ignored: LOG - 57 tm_delete.sh:
> Executed "ssh node1 rm -rf /opt/cloud/VM/57/images".
>
> Thu Oct  8 21:01:35 2009 [TM][W]: Ignored: TRANSFER SUCCESS 57 -
>
> Thu Oct  8 21:56:42 2009 [DiM][I]: New VM state is DONE.
>
> The funny part is that the actual /opt/cloud/VM/57/images/disk.0 on
> the node was removed after I ran "onevm delete", but the VM is still
> running. I guess this is how Linux handles files: a file is actually
> removed only when all processes using it have finished.
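
That guess matches standard POSIX behaviour: deleting a file removes
its name from the directory, and the data is released only once the
last open descriptor is closed. A small sketch of the effect, assuming
a Linux or other POSIX system (read_after_unlink is a made-up name, and
a temporary file stands in for the disk image):

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, delete its name, then read it back
    through the descriptor that is still open."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # remove the directory entry
        gone = not os.path.exists(path)    # the name is gone...
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))   # ...but the data is still there
    finally:
        os.close(fd)                       # last reference dropped here
    return gone, data
```

By the same mechanism, a qemu process that still holds disk.0 open can
keep running after "rm -rf", but the image can no longer be reached by
its path and its contents go away when the VM stops.
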
>
> Does the error code 141 tell us anything more?
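
Possibly. By shell convention, a command killed by signal N is reported
with exit status 128 + N, and 141 - 128 = 13, which is SIGPIPE on Linux.
That would be consistent with the "server closed connection" line in the
log (the client writing to a connection the daemon had already closed),
though this is an interpretation and not something the log confirms; a
program may also choose to exit with 141 on its own. A small sketch for
decoding such codes, assuming a POSIX system (describe_exit_status is a
made-up helper name):

```python
import signal


def describe_exit_status(code):
    """Interpret a shell-style exit status.

    Shells report a command killed by signal N as 128 + N.
    """
    if code > 128:
        signum = code - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Number does not map to a named signal on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with status %d" % code
```

On Linux, describe_exit_status(141) gives "killed by SIGPIPE".
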
>
> I should also mention that this error happens when I try to deploy
> many VMs at the same time.
>
> Thanks.
> Shi
>
> --
> Shi Jin, Ph.D.
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>



-- 
Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org


