[one-users] What remotes commands does one 4.6 use:

Ruben S. Montero rsmontero at opennebula.org
Wed Jul 30 04:17:18 PDT 2014


You could try running the monitoring probes by hand on the node:

1. ssh to the node
2. Go to /var/tmp/one/im
3. Execute run_probes kvm-probes
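
The steps above can be sketched as a small script run from the front-end. This is only a sketch under stated assumptions: the NODE default, the RUN_PROBES switch and the probe_cmd helper are made up for illustration, and calling the script as ./run_probes assumes it is executable from inside that directory.

```shell
#!/bin/sh
# Sketch: run the KVM monitoring probes by hand on a node.
# NODE, RUN_PROBES and probe_cmd are illustrative names, not part of OpenNebula.
NODE="${NODE:-fgtest14}"
PROBE_DIR="/var/tmp/one/im"

# Build the remote command as a string so it can be inspected first.
probe_cmd() {
    printf 'cd %s && ./run_probes kvm-probes' "$1"
}

# Only contact the node when explicitly asked to (RUN_PROBES=1).
if [ "${RUN_PROBES:-0}" = "1" ]; then
    ssh "$NODE" "$(probe_cmd "$PROBE_DIR")"
fi
```

With RUN_PROBES=1 set, whatever the probes print on the node is shown on the terminal, so it can be compared with what onehost show reports for the same host.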

Also make sure that no other host is using the same hostname (fgtest14)
and running a collectd process.
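
One way to look for a stray collectd process is to count matching lines in a process listing taken on each candidate host. The helper name count_collectd and the host name other-host below are placeholders, and matching on the bare string "collectd" is an assumption about how the process shows up in ps.

```shell
#!/bin/sh
# Sketch: count lines mentioning collectd in a `ps` listing read from stdin.
# count_collectd is an illustrative helper, not an OpenNebula command.
count_collectd() {
    # The [c] bracket keeps a grep process in the listing from matching itself.
    # grep -c exits non-zero on zero matches, so mask that exit status.
    grep -c '[c]ollectd' || true
}

# Usage (other-host is a placeholder):
#   ssh other-host 'hostname; ps -eo pid,args' | count_collectd
```

A non-zero count on any machine other than the real fgtest14 would be worth a closer look.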
On Jul 29, 2014 4:35 PM, "Steven Timm" <timm at fnal.gov> wrote:

>
> I am still trying to debug a nasty monitoring inconsistency.
>
> -bash-4.1$ onevm list | grep fgtest14
>     26 oneadmin oneadmin fgt6x4-26       runn    6      4G fgtest14   117d 19h50
>     27 oneadmin oneadmin fgt5x4-27       runn   10      4G fgtest14   117d 17h57
>     28 oneadmin oneadmin fgt1x1-28       runn   10    4.1G fgtest14   117d 16h59
>     30 oneadmin oneadmin fgt5x1-30       runn    0      4G fgtest14   116d 23h50
>     33 oneadmin oneadmin ip6sl5vda-33    runn    6      4G fgtest14   116d 19h57
> -bash-4.1$ onehost list
>   ID NAME            CLUSTER   RVM      ALLOCATED_CPU      ALLOCATED_MEM STAT
>    3 fgtest11        ipv6        0       0 / 400 (0%)    0K / 15.7G (0%) on
>    4 fgtest12        ipv6        0       0 / 400 (0%)    0K / 15.7G (0%) on
>    7 fgtest13        ipv6        0       0 / 800 (0%)    0K / 23.6G (0%) on
>    8 fgtest14        ipv6        5       0 / 800 (0%)    0K / 23.6G (0%) on
>    9 fgtest20        ipv6        3    300 / 800 (37%)  12G / 31.4G (38%) on
>   11 fgtest19        ipv6        0       0 / 800 (0%)    0K / 31.5G (0%) on
> -bash-4.1$ onehost show 8
> HOST 8 INFORMATION
> ID                    : 8
> NAME                  : fgtest14
> CLUSTER               : ipv6
> STATE                 : MONITORED
> IM_MAD                : kvm
> VM_MAD                : kvm
> VN_MAD                : dummy
> LAST MONITORING TIME  : 07/29 09:25:45
>
> HOST SHARES
> TOTAL MEM             : 23.6G
> USED MEM (REAL)       : 876.4M
> USED MEM (ALLOCATED)  : 0K
> TOTAL CPU             : 800
> USED CPU (REAL)       : 0
> USED CPU (ALLOCATED)  : 0
> RUNNING VMS           : 5
>
> LOCAL SYSTEM DATASTORE #102 CAPACITY
> TOTAL:                : 548.8G
> USED:                 : 175.3G
> FREE:                 : 345.6G
>
> MONITORING INFORMATION
> ARCH="x86_64"
> CPUSPEED="2992"
> HOSTNAME="fgtest14.fnal.gov"
> HYPERVISOR="kvm"
> MODELNAME="Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz"
> NETRX="234844577"
> NETTX="21553126"
> RESERVED_CPU=""
> RESERVED_MEM=""
> VERSION="4.6.0"
>
> VIRTUAL MACHINES
>
>     ID USER     GROUP    NAME            STAT UCPU    UMEM HOST TIME
>     26 oneadmin oneadmin fgt6x4-26       runn    6      4G fgtest14   117d 19h50
>     27 oneadmin oneadmin fgt5x4-27       runn   10      4G fgtest14   117d 17h57
>     28 oneadmin oneadmin fgt1x1-28       runn   10    4.1G fgtest14   117d 17h00
>     30 oneadmin oneadmin fgt5x1-30       runn    0      4G fgtest14   116d 23h50
>     33 oneadmin oneadmin ip6sl5vda-33    runn    6      4G fgtest14   116d 19h57
> -----------------------------------------------------------------------------------
>
> All of this looks great, right?
> Just one problem: there are no VMs running on fgtest14, and there
> haven't been for 4 days.
>
> [root@fgtest14 ~]# virsh list
>  Id    Name                           State
> ----------------------------------------------------
>
> [root@fgtest14 ~]#
>
> -------------------------------------------------------------------------
> Yet the monitoring reports no errors.
>
> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully monitored.
>
> -----------------------------------------------------------------------------
> At the same time, there is no evidence that ONE is actually trying to
> monitor these five VMs, let alone succeeding, yet they are still stuck in
> "runn", which means I can't do a onevm restart to restart them.
> (The VM images of these five VMs are still out there on the VM host, and
> I would like to save and restart them if I can.)
>
> What is the remotes command that ONE 4.6 would use to monitor this host?
> Can I do it manually and see what output I get?
>
> Are we dealing with some kind of a bug, or just a very confused system?
> Any help is appreciated. I have to get this sorted out before
> I dare deploy one4.x in production.
>
> Steve Timm
>
>
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525
> timm at fnal.gov  http://home.fnal.gov/~timm/
> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing