[one-users] Running VMs listed as zombies on KVM host?
Dmitri Chebotarov
dchebota at gmu.edu
Mon Nov 17 07:48:02 PST 2014
Sorry for the late response.
I ended up rebuilding the troubled nodes as part of the RHEL7 upgrade.
I believe the problem was related to the absence of proper fencing for the hosts.
VMs that use persistent storage ended up with corrupted filesystems - I repaired some disks with fsck and restored the others from backup.
I'm now looking into how to implement fencing (via power reset) with a HOST_HOOK. All hosts can be controlled through xCAT, and I have xCAT's REST API configured. I still need to figure out how to execute multiple HOST_HOOK...
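
For illustration, here is a rough sketch of the kind of fencing script a hook could call. It only shows the power-reset request itself. It assumes that xCAT's REST API accepts a PUT on /nodes/<node>/power with a JSON body such as {"action": "reset"} and credentials passed as userName/userPW query parameters. The base URL, the credentials, and the names build_power_request and fence are placeholders, not taken from a working setup. Mapping an OpenNebula host to its xCAT node name is left out.

```python
#!/usr/bin/env python3
"""Hypothetical fencing helper: power-reset a node through xCAT's REST API."""
import json
import sys
import urllib.parse
import urllib.request

# Placeholder values (assumptions): adjust to the local xCAT management node.
XCAT_BASE_URL = "https://xcat-mn.example.org/xcatws"
XCAT_USER = "fence"
XCAT_PASSWORD = "changeme"


def build_power_request(node, action="reset", base_url=XCAT_BASE_URL,
                        user=XCAT_USER, password=XCAT_PASSWORD):
    """Build, but do not send, the PUT request that changes a node's power state."""
    query = urllib.parse.urlencode({"userName": user, "userPW": password})
    url = "{}/nodes/{}/power?{}".format(
        base_url, urllib.parse.quote(node), query)
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"})


def fence(node):
    """Send the reset request; return 0 on an HTTP 2xx answer, 1 otherwise."""
    request = build_power_request(node)
    with urllib.request.urlopen(request, timeout=30) as response:
        return 0 if 200 <= response.status < 300 else 1


# Only act when called with exactly one argument: the xCAT node name.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(fence(sys.argv[1]))
```

Called as "fence_xcat.py BC3-6" (the file name is arbitrary), this would ask xCAT to reset node BC3-6. Keeping the request construction separate from the network call makes it possible to check the URL and body without touching a real host.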
--
Thank you,
Dmitri Chebotarov
VCL Sys Eng, Engineering & Architectural Support, TSD - Ent Servers & Messaging
223 Aquia Building, Ffx, MSN: 1B5
Phone: (703) 993-6175 | Fax: (703) 993-3404
> On Oct 20, 2014, at 11:30 , Carlos Martín Sánchez <cmartin at opennebula.org> wrote:
>
> Hi Dmitri,
>
> Do you still experience this issue? If so, we could use more information to further debug this.
>
> Please send the output of onehost show -x.
> And from the host, the output of this command: cd /var/tmp/one/im/kvm-probes.d/; ./poll.sh
>
> Thank you.
>
> --
> Carlos Martín, MSc
> Project Engineer
> OpenNebula - Flexible Enterprise Cloud Made Simple
> www.OpenNebula.org | cmartin at opennebula.org | @OpenNebula
>
> On Wed, Oct 1, 2014 at 6:30 PM, Carlos Martín Sánchez <cmartin at opennebula.org> wrote:
> Hi,
>
> That is strange, so we opened a bug to look into it for the next release:
> http://dev.opennebula.org/issues/3217
>
> Cheers,
> Carlos
>
> On Tue, Sep 30, 2014 at 5:46 PM, Dmitri Chebotarov <dchebota at gmu.edu> wrote:
> Hi,
>
> Something strange is happening with my VMs...
> Below is the 'onehost show 161' output, which shows that ONE marks running VMs as zombies on the host. It happens to multiple VMs/HOSTs, but not all. I tried to delete the ZOMBIES attribute, but a few minutes later it was back with the list of running VMs. What would cause this behavior?
>
> (ONE 4.8)
>
> $ onehost show 161
> HOST 161 INFORMATION
> ID : 161
> NAME : BC3-6
> CLUSTER : RHEL
> STATE : MONITORED
> IM_MAD : kvm
> VM_MAD : kvm
> VN_MAD : ovswitch
> LAST MONITORING TIME : 09/30 11:35:22
>
> HOST SHARES
> TOTAL MEM : 47.1G
> USED MEM (REAL) : 16.9G
> USED MEM (ALLOCATED) : 14G
> TOTAL CPU : 2400
> USED CPU (REAL) : 50
> USED CPU (ALLOCATED) : 700
> RUNNING VMS : 4
>
> MONITORING INFORMATION
> ARCH="x86_64"
> ARCH="x86_64"
> CPUSPEED="1596"
> CPUSPEED="1596"
> HOSTNAME="BC3-6"
> HOSTNAME="BC3-6"
> HYPERVISOR="kvm"
> HYPERVISOR="kvm"
> MODELNAME="Intel(R) Xeon(R) CPU X5660 @ 2.80GHz"
> MODELNAME="Intel(R) Xeon(R) CPU X5660 @ 2.80GHz"
> NETRX="45361301966553"
> NETRX="0"
> NETTX="2465939705538"
> NETTX="0"
> RESERVED_CPU=""
> RESERVED_MEM=""
> TOTAL_ZOMBIES="4"
> VERSION="4.8.0"
> ZOMBIES="one-51701, one-51774, one-51747, one-51763"
>
> VIRTUAL MACHINES
>
> ID USER GROUP NAME STAT UCPU UMEM HOST TIME
> 51701 vcl-gmu- vcl vmguest-vcl323 runn 14 4G BC3-6 0d 12h40
> 51747 vcl-gmu- vcl vmguest-vcl65 ( runn 9 2G BC3-6 0d 10h25
> 51763 vcl-gmu- vcl vmguest-vcl312 runn 12 4G BC3-6 0d 09h10
> 51774 vcl-gmu- vcl vmguest-vcl17 ( runn 14 4G BC3-6 0d 06h10
>
> --
> Thank you,
>
> Dmitri Chebotarov
> VCL Sys Eng, Engineering & Architectural Support, TSD - Ent Servers & Messaging
> 223 Aquia Building, Ffx, MSN: 1B5
> Phone: (703) 993-6175 | Fax: (703) 993-3404
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>