[one-users] What remotes commands does one 4.6 use:
Ruben S. Montero
rsmontero at opennebula.org
Wed Jul 30 07:58:35 PDT 2014
This seems to be a bug: when collectd does not respond (because it is
waiting for a sudo password), OpenNebula does not move the host to ERROR.
The probes are designed not to start another collectd process, but we
should probably also check that the running one is actually working and,
if not, send an ERROR message to OpenNebula.
Pointer to the issue:
http://dev.opennebula.org/issues/3118
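
A note on the mechanics: the shepherd probe currently only checks that a
collectd client process exists; a liveness check would also have to verify
that it is still reporting. A rough bash sketch of the idea (hypothetical,
not the actual fix; the stamp file and its path are assumptions):

  #!/bin/bash
  # Sketch: flag a collectd client that exists but has stopped reporting.
  # STAMP is a hypothetical file the client would touch on every send.
  STAMP=/var/tmp/one/im/collectd-client.stamp
  if pgrep -f collectd-client >/dev/null 2>&1; then
      # Missing or stale stamp (older than ~3 monitor steps) => presume
      # the client is hung and emit an ERROR line for oned to pick up.
      if [ ! -f "$STAMP" ] || [ -n "$(find "$STAMP" -mmin +3)" ]; then
          echo 'ERROR="collectd client running but not reporting"'
          exit 1
      fi
  fi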
Cheers
On Wed, Jul 30, 2014 at 4:53 PM, Steven Timm <timm at fnal.gov> wrote:
> On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>
> Hi,
>> 1.- monitor_ds.sh may use LVM commands (vgdisplay) that need sudo
>> access. This should be set up automatically by the opennebula-node
>> packages (an example sudoers entry is sketched after this list).
>>
>> 2.- It is not a real daemon: the first time a host is monitored, a
>> process is left behind that periodically sends information. OpenNebula
>> restarts it if no information is received within 3 monitor steps (the
>> step length is MONITORING_INTERVAL in oned.conf). Nothing needs to be
>> set up...
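>>
>> As for the sudo access in point 1: the node packages typically install
>> a sudoers entry along these lines (a sketch only; the exact paths and
>> command list vary by distribution and version):
>>
>>   # /etc/sudoers.d/opennebula -- hypothetical example
>>   Defaults:oneadmin !requiretty
>>   oneadmin ALL=(ALL) NOPASSWD: /sbin/vgdisplay, /sbin/lvs, /sbin/lvcreate, /sbin/lvremove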
>>
>> Cheers
>>
>>
> On further inspection I found that this collectd was running on my nodes,
> and had obviously been failing up until now because sudoers was not set
> up correctly. But there was nothing to warn us about it: nothing on the
> opennebula head node to even tell us that the information was stale, and
> no log file on the node to show the errors we were getting. In short, it
> was just quietly dying and we had no idea. How can we make sure this
> doesn't happen again? For now I may cron a crude staleness check like the
> sketch below.
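>
> A sketch of such a check (it assumes the LAST_MON_TIME field in the host
> XML and host ID 8; adjust to taste):
>
>   #!/bin/bash
>   # Alert if the host's last monitor time is more than 10 minutes old.
>   NOW=$(date +%s)
>   LAST=$(onehost show 8 -x | sed -n 's:.*<LAST_MON_TIME>\(.*\)</LAST_MON_TIME>.*:\1:p')
>   [ -n "$LAST" ] && [ $((NOW - LAST)) -gt 600 ] && echo "host 8 monitoring is stale"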
>
> Steve Timm
>
>> On Wed, Jul 30, 2014 at 3:50 PM, Steven Timm <timm at fnal.gov> wrote:
>> On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>>
>>
>> Maybe you could try to execute the monitor probes on the node:
>>
>> 1. ssh to the node
>> 2. Go to /var/tmp/one/im
>> 3. Execute run_probes kvm-probes
>>
>>
>> When I do that (using sh -x), I get the following:
>>
>> -bash-4.1$ sh -x ./run_probes kvm-probes
>> ++ dirname ./run_probes
>> + source ./../scripts_common.sh
>> ++ export LANG=C
>> ++ LANG=C
>> ++ export PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
>> ++ PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
>> ++ AWK=awk
>> ++ BASH=bash
>> ++ CUT=cut
>> ++ DATE=date
>> ++ DD=dd
>> ++ DF=df
>> ++ DU=du
>> ++ GREP=grep
>> ++ ISCSIADM=iscsiadm
>> ++ LVCREATE=lvcreate
>> ++ LVREMOVE=lvremove
>> ++ LVRENAME=lvrename
>> ++ LVS=lvs
>> ++ LN=ln
>> ++ MD5SUM=md5sum
>> ++ MKFS=mkfs
>> ++ MKISOFS=genisoimage
>> ++ MKSWAP=mkswap
>> ++ QEMU_IMG=qemu-img
>> ++ RADOS=rados
>> ++ RBD=rbd
>> ++ READLINK=readlink
>> ++ RM=rm
>> ++ SCP=scp
>> ++ SED=sed
>> ++ SSH=ssh
>> ++ SUDO=sudo
>> ++ SYNC=sync
>> ++ TAR=tar
>> ++ TGTADM=tgtadm
>> ++ TGTADMIN=tgt-admin
>> ++ TGTSETUPLUN=tgt-setup-lun-one
>> ++ TR=tr
>> ++ VGDISPLAY=vgdisplay
>> ++ VMKFSTOOLS=vmkfstools
>> ++ WGET=wget
>> +++ uname -s
>> ++ '[' xLinux = xLinux ']'
>> ++ SED='sed -r'
>> +++ basename ./run_probes
>> ++ SCRIPT_NAME=run_probes
>> + export LANG=C
>> + LANG=C
>> + HYPERVISOR_DIR=kvm-probes.d
>> + ARGUMENTS=kvm-probes
>> ++ dirname ./run_probes
>> + SCRIPTS_DIR=.
>> + cd .
>> ++ '[' -d kvm-probes.d ']'
>> ++ run_dir kvm-probes.d
>> ++ cd kvm-probes.d
>> +++ ls architecture.sh collectd-client-shepherd.sh cpu.sh kvm.rb monitor_ds.sh name.sh poll.sh version.sh
>> ++ for i in '`ls *`'
>> ++ '[' -x architecture.sh ']'
>> ++ ./architecture.sh kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x collectd-client-shepherd.sh ']'
>> ++ ./collectd-client-shepherd.sh kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x cpu.sh ']'
>> ++ ./cpu.sh kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x kvm.rb ']'
>> ++ ./kvm.rb kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x monitor_ds.sh ']'
>> ++ ./monitor_ds.sh kvm-probes
>> [sudo] password for oneadmin:
>>
>> and it stays hung on the password for oneadmin.
>>
>> What's going on?
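>>
>> (For what it's worth, a non-interactive test avoids the hang: with
>> sudo -n, sudo fails immediately instead of prompting when no
>> password-less rule matches.)
>>
>>   sudo -n vgdisplay && echo "passwordless sudo OK"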
>>
>> Also, you mentioned a collectd--are you saying that OpenNebula 4.6
>> now needs to run a daemon on every single VM host? Where is it
>> documented how to set it up?
>>
>> Steve
>>
>> Make sure you do not have another host using the same hostname
>> fgtest14 and running a collectd process.
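>>
>> A quick way to look for a stray collectd client on the node (a sketch;
>> the process name to grep for may differ):
>>
>>   ssh fgtest14 'ps aux | grep [c]ollectd'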
>>
>> On Jul 29, 2014 4:35 PM, "Steven Timm" <timm at fnal.gov> wrote:
>>
>> I am still trying to debug a nasty monitoring inconsistency.
>>
>> -bash-4.1$ onevm list | grep fgtest14
>> 26 oneadmin oneadmin fgt6x4-26    runn  6 4G   fgtest14 117d 19h50
>> 27 oneadmin oneadmin fgt5x4-27    runn 10 4G   fgtest14 117d 17h57
>> 28 oneadmin oneadmin fgt1x1-28    runn 10 4.1G fgtest14 117d 16h59
>> 30 oneadmin oneadmin fgt5x1-30    runn  0 4G   fgtest14 116d 23h50
>> 33 oneadmin oneadmin ip6sl5vda-33 runn  6 4G   fgtest14 116d 19h57
>> -bash-4.1$ onehost list
>>  ID NAME     CLUSTER RVM ALLOCATED_CPU   ALLOCATED_MEM     STAT
>>   3 fgtest11 ipv6      0   0 / 400 (0%)   0K / 15.7G (0%)  on
>>   4 fgtest12 ipv6      0   0 / 400 (0%)   0K / 15.7G (0%)  on
>>   7 fgtest13 ipv6      0   0 / 800 (0%)   0K / 23.6G (0%)  on
>>   8 fgtest14 ipv6      5   0 / 800 (0%)   0K / 23.6G (0%)  on
>>   9 fgtest20 ipv6      3 300 / 800 (37%) 12G / 31.4G (38%) on
>>  11 fgtest19 ipv6      0   0 / 800 (0%)   0K / 31.5G (0%)  on
>> -bash-4.1$ onehost show 8
>> HOST 8 INFORMATION
>> ID : 8
>> NAME : fgtest14
>> CLUSTER : ipv6
>> STATE : MONITORED
>> IM_MAD : kvm
>> VM_MAD : kvm
>> VN_MAD : dummy
>> LAST MONITORING TIME : 07/29 09:25:45
>>
>> HOST SHARES
>> TOTAL MEM : 23.6G
>> USED MEM (REAL) : 876.4M
>> USED MEM (ALLOCATED) : 0K
>> TOTAL CPU : 800
>> USED CPU (REAL) : 0
>> USED CPU (ALLOCATED) : 0
>> RUNNING VMS : 5
>>
>> LOCAL SYSTEM DATASTORE #102 CAPACITY
>> TOTAL: 548.8G
>> USED: 175.3G
>> FREE: 345.6G
>>
>> MONITORING INFORMATION
>> ARCH="x86_64"
>> CPUSPEED="2992"
>> HOSTNAME="fgtest14.fnal.gov"
>> HYPERVISOR="kvm"
>> MODELNAME="Intel(R) Xeon(R) CPU E5450 @ 3.00GHz"
>> NETRX="234844577"
>> NETTX="21553126"
>> RESERVED_CPU=""
>> RESERVED_MEM=""
>> VERSION="4.6.0"
>>
>> VIRTUAL MACHINES
>>
>> ID USER     GROUP    NAME         STAT UCPU UMEM HOST     TIME
>> 26 oneadmin oneadmin fgt6x4-26    runn    6 4G   fgtest14 117d 19h50
>> 27 oneadmin oneadmin fgt5x4-27    runn   10 4G   fgtest14 117d 17h57
>> 28 oneadmin oneadmin fgt1x1-28    runn   10 4.1G fgtest14 117d 17h00
>> 30 oneadmin oneadmin fgt5x1-30    runn    0 4G   fgtest14 116d 23h50
>> 33 oneadmin oneadmin ip6sl5vda-33 runn    6 4G   fgtest14 116d 19h57
>> -----------------------------------------------------------------------------------
>>
>> All of this looks great, right?
>> Just one problem: there are no VMs running on fgtest14, and there
>> haven't been any for 4 days.
>>
>> [root at fgtest14 ~]# virsh list
>> Id Name State
>> ----------------------------------------------------
>>
>> [root at fgtest14 ~]#
>>
>> -------------------------------------------------------------------------
>> Yet the monitoring reports no errors.
>>
>> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully monitored.
>>
>> -----------------------------------------------------------------------------
>> At the same time, there is no evidence that ONE is actually trying,
>> or succeeding, to monitor these five VMs, yet they are still stuck in
>> "runn", which means I can't do a onevm restart to restart them.
>> (The VM images of these 5 VMs are still out there on the VM host, and
>> I would like to save and restart them if I can.)
>>
>> What is the remotes command that ONE 4.6 would use to monitor this
>> host? Can I do it manually and see what output I get?
>>
>> Are we dealing with some kind of bug, or just a very confused system?
>> Any help is appreciated. I have to get this sorted out before I dare
>> deploy one4.x in production.
>>
>> Steve Timm
>>
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D (630) 840-8525
> timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>
--
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula