[one-users] What remotes commands does one 4.6 use:
Steven Timm
timm at fnal.gov
Wed Jul 30 06:50:08 PDT 2014
On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>
> Maybe you could try to execute the monitor probes in the node,
>
> 1. ssh the node
> 2. Go to /var/tmp/one/im
> 3. Execute run_probes kvm-probes
When I do that (using sh -x), I get the following:
-bash-4.1$ sh -x ./run_probes kvm-probes
++ dirname ./run_probes
+ source ./../scripts_common.sh
++ export LANG=C
++ LANG=C
++ export PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
++ PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
++ AWK=awk
++ BASH=bash
++ CUT=cut
++ DATE=date
++ DD=dd
++ DF=df
++ DU=du
++ GREP=grep
++ ISCSIADM=iscsiadm
++ LVCREATE=lvcreate
++ LVREMOVE=lvremove
++ LVRENAME=lvrename
++ LVS=lvs
++ LN=ln
++ MD5SUM=md5sum
++ MKFS=mkfs
++ MKISOFS=genisoimage
++ MKSWAP=mkswap
++ QEMU_IMG=qemu-img
++ RADOS=rados
++ RBD=rbd
++ READLINK=readlink
++ RM=rm
++ SCP=scp
++ SED=sed
++ SSH=ssh
++ SUDO=sudo
++ SYNC=sync
++ TAR=tar
++ TGTADM=tgtadm
++ TGTADMIN=tgt-admin
++ TGTSETUPLUN=tgt-setup-lun-one
++ TR=tr
++ VGDISPLAY=vgdisplay
++ VMKFSTOOLS=vmkfstools
++ WGET=wget
+++ uname -s
++ '[' xLinux = xLinux ']'
++ SED='sed -r'
+++ basename ./run_probes
++ SCRIPT_NAME=run_probes
+ export LANG=C
+ LANG=C
+ HYPERVISOR_DIR=kvm-probes.d
+ ARGUMENTS=kvm-probes
++ dirname ./run_probes
+ SCRIPTS_DIR=.
+ cd .
++ '[' -d kvm-probes.d ']'
++ run_dir kvm-probes.d
++ cd kvm-probes.d
+++ ls architecture.sh collectd-client-shepherd.sh cpu.sh kvm.rb monitor_ds.sh name.sh poll.sh version.sh
++ for i in '`ls *`'
++ '[' -x architecture.sh ']'
++ ./architecture.sh kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x collectd-client-shepherd.sh ']'
++ ./collectd-client-shepherd.sh kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x cpu.sh ']'
++ ./cpu.sh kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x kvm.rb ']'
++ ./kvm.rb kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x monitor_ds.sh ']'
++ ./monitor_ds.sh kvm-probes
[sudo] password for oneadmin:
and it hangs at the password prompt for oneadmin.
What's going on?
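
To narrow down which probe blocks, each one can be run under a time limit so that a hung probe is reported instead of stalling the whole run. The following is only a sketch under stated assumptions: run_probes_with_timeout is a hypothetical helper, not part of OpenNebula, and it relies on the timeout utility from GNU coreutils, which exits with status 124 when the limit is reached. It imitates the loop visible in the trace above (run every executable file in the probes directory and pass the same argument) but discards the probe output and prints only one status line per probe.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OpenNebula): run every executable
# probe in a directory under a time limit and report how each one ended.
# Usage: run_probes_with_timeout DIR SECONDS [ARGS...]
# The body runs in a subshell so its variables do not leak into the caller.
run_probes_with_timeout() (
    if [ "$#" -lt 2 ]; then
        echo "usage: run_probes_with_timeout DIR SECONDS [ARGS...]" >&2
        exit 2
    fi
    probe_dir=$1
    limit=$2
    shift 2

    for probe in "$probe_dir"/*; do
        # Skip anything that is not a regular, executable file.
        [ -f "$probe" ] && [ -x "$probe" ] || continue

        # Probe output is discarded; only the exit status is inspected.
        timeout "$limit" "$probe" "$@" </dev/null >/dev/null 2>&1
        status=$?

        if [ "$status" -eq 124 ]; then
            # GNU timeout returns 124 when it had to stop the command.
            echo "$(basename "$probe"): TIMEOUT"
        else
            echo "$(basename "$probe"): exit $status"
        fi
    done
)

# Example (paths taken from the trace above; the 10 second limit is arbitrary):
#   run_probes_with_timeout /var/tmp/one/im/kvm-probes.d 10 kvm-probes
```

A probe that reports TIMEOUT here, as monitor_ds.sh presumably would, is the one waiting for input. Running "sudo -n true" as oneadmin on the node is a quick related check: with -n, sudo fails instead of prompting when a password would be required. It only tests the command "true", so a sudoers entry limited to specific commands could still behave differently.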
Also, you mentioned collectd. Are you saying that OpenNebula 4.6 now
needs to run a daemon on every single VM host? Where is it documented
how to set it up?
Steve
>
> Make sure you do not have a host using the same hostname fgtest14 and running a collectd process
>
> On Jul 29, 2014 4:35 PM, "Steven Timm" <timm at fnal.gov> wrote:
>
> I am still trying to debug a nasty monitoring inconsistency.
>
> -bash-4.1$ onevm list | grep fgtest14
> 26 oneadmin oneadmin fgt6x4-26 runn 6 4G fgtest14 117d 19h50
> 27 oneadmin oneadmin fgt5x4-27 runn 10 4G fgtest14 117d 17h57
> 28 oneadmin oneadmin fgt1x1-28 runn 10 4.1G fgtest14 117d 16h59
> 30 oneadmin oneadmin fgt5x1-30 runn 0 4G fgtest14 116d 23h50
> 33 oneadmin oneadmin ip6sl5vda-33 runn 6 4G fgtest14 116d 19h57
> -bash-4.1$ onehost list
> ID NAME CLUSTER RVM ALLOCATED_CPU ALLOCATED_MEM STAT
> 3 fgtest11 ipv6 0 0 / 400 (0%) 0K / 15.7G (0%) on
> 4 fgtest12 ipv6 0 0 / 400 (0%) 0K / 15.7G (0%) on
> 7 fgtest13 ipv6 0 0 / 800 (0%) 0K / 23.6G (0%) on
> 8 fgtest14 ipv6 5 0 / 800 (0%) 0K / 23.6G (0%) on
> 9 fgtest20 ipv6 3 300 / 800 (37%) 12G / 31.4G (38%) on
> 11 fgtest19 ipv6 0 0 / 800 (0%) 0K / 31.5G (0%) on
> -bash-4.1$ onehost show 8
> HOST 8 INFORMATION
> ID : 8
> NAME : fgtest14
> CLUSTER : ipv6
> STATE : MONITORED
> IM_MAD : kvm
> VM_MAD : kvm
> VN_MAD : dummy
> LAST MONITORING TIME : 07/29 09:25:45
>
> HOST SHARES
> TOTAL MEM : 23.6G
> USED MEM (REAL) : 876.4M
> USED MEM (ALLOCATED) : 0K
> TOTAL CPU : 800
> USED CPU (REAL) : 0
> USED CPU (ALLOCATED) : 0
> RUNNING VMS : 5
>
> LOCAL SYSTEM DATASTORE #102 CAPACITY
> TOTAL: : 548.8G
> USED: : 175.3G
> FREE: : 345.6G
>
> MONITORING INFORMATION
> ARCH="x86_64"
> CPUSPEED="2992"
> HOSTNAME="fgtest14.fnal.gov"
> HYPERVISOR="kvm"
> MODELNAME="Intel(R) Xeon(R) CPU E5450 @ 3.00GHz"
> NETRX="234844577"
> NETTX="21553126"
> RESERVED_CPU=""
> RESERVED_MEM=""
> VERSION="4.6.0"
>
> VIRTUAL MACHINES
>
> ID USER GROUP NAME STAT UCPU UMEM HOST TIME
> 26 oneadmin oneadmin fgt6x4-26 runn 6 4G fgtest14 117d 19h50
> 27 oneadmin oneadmin fgt5x4-27 runn 10 4G fgtest14 117d 17h57
> 28 oneadmin oneadmin fgt1x1-28 runn 10 4.1G fgtest14 117d 17h00
> 30 oneadmin oneadmin fgt5x1-30 runn 0 4G fgtest14 116d 23h50
> 33 oneadmin oneadmin ip6sl5vda-33 runn 6 4G fgtest14 116d 19h57
> -----------------------------------------------------------------------------------
>
> All of this looks great, right?
> Just one problem: there are no VMs running on fgtest14, and there
> haven't been for 4 days.
>
> [root at fgtest14 ~]# virsh list
> Id Name State
> ----------------------------------------------------
>
> [root at fgtest14 ~]#
>
> -------------------------------------------------------------------------
> Yet the monitoring reports no errors.
>
> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully monitored.
>
> -----------------------------------------------------------------------------
> At the same time, there is no evidence that ONE is actually trying to
> monitor these five VMs, let alone succeeding, yet they are still stuck in
> "runn", which means I can't do a onevm restart to restart them.
> (The VM images of these 5 VMs are still out there on the VM host, and
> I would like to save and restart them if I can.)
>
> What is the remotes command that ONE4.6 would use to monitor this host?
> Can I do it manually and see what output I get?
>
> Are we dealing with some kind of a bug, or just a very confused system?
> Any help is appreciated. I have to get this sorted out before
> I dare deploy one4.x in production.
>
> Steve Timm
>
>
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D (630) 840-8525
> timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>
>
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing