[one-users] What remotes commands does one 4.6 use:
Ruben S. Montero
rsmontero at opennebula.org
Wed Jul 30 08:02:58 PDT 2014
Sorry for the noise, just saw it in the other thread...
On Wed, Jul 30, 2014 at 5:01 PM, Ruben S. Montero <rsmontero at opennebula.org>
wrote:
> BTW, could you paste the output of the run_probes command once it finishes?
>
>
> On Wed, Jul 30, 2014 at 4:58 PM, Ruben S. Montero <
> rsmontero at opennebula.org> wrote:
>
>> This seems to be a bug: when collectd does not respond (because it is
>> waiting for a sudo password), OpenNebula does not move the host to ERROR.
>> The probes are designed not to start another collectd process, but we
>> should probably check that a running one is actually working and, if it
>> is not, send the ERROR message to OpenNebula.
>>
>> Pointer to the issue:
>> http://dev.opennebula.org/issues/3118
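>>
>> A rough sketch of the kind of check I mean (this is NOT the shipped
>> probe code; the process pattern and marker file below are illustrative):
>>
>>   CLIENT_RE='collectd-client'            # assumed pattern for the client process
>>   STAMP=/tmp/one-collectd-client.stamp   # hypothetical activity marker file
>>
>>   PID=$(pgrep -f "$CLIENT_RE")
>>   if [ -n "$PID" ] && [ -n "$(find "$STAMP" -mmin +5 2>/dev/null)" ]; then
>>       # The client exists but its marker is stale: assume it is hung
>>       # (e.g. blocked on a sudo password prompt), kill it and report
>>       # the error instead of failing silently.
>>       kill $PID
>>       echo "ERROR: collectd client running but inactive" >&2
>>       exit 1
>>   fi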
>>
>> Cheers
>>
>>
>> On Wed, Jul 30, 2014 at 4:53 PM, Steven Timm <timm at fnal.gov> wrote:
>>
>>> On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>>>
>>>> Hi,
>>>> 1.- monitor_ds.sh may use LVM commands (vgdisplay) that need sudo
>>>> access. This should be set up automatically by the opennebula node
>>>> packages (see the sudoers sketch below).
>>>>
>>>> 2.- It is not a real daemon; the first time a host is monitored, a
>>>> process is left running to periodically send information. OpenNebula
>>>> restarts it if no information is received within 3 monitor steps.
>>>> Nothing needs to be set up...
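>>>>
>>>> For 1, the node packages should drop a sudoers entry for oneadmin;
>>>> something along these lines (illustrative only, the exact file name
>>>> and command paths depend on your distro and package version):
>>>>
>>>>   # /etc/sudoers.d/opennebula-node  (example location)
>>>>   oneadmin ALL=(ALL) NOPASSWD: /sbin/vgdisplay, /sbin/lvs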
>>>>
>>>> Cheers
>>>>
>>>>
>>> On further inspection I found that this collectd was running on my
>>> nodes, and it had obviously been failing up until now because the
>>> sudoers file was not set up correctly. But there was nothing to warn
>>> us about it: nothing on the opennebula head node to even tell us that
>>> the information was stale, and no log file on the node to show the
>>> errors we were getting. In short, it was just quietly dying and we had
>>> no idea. How can we make sure this doesn't happen again in the future?
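>>>
>>> One thing we could do on our side is cron a staleness check against
>>> the host pool. A rough sketch, assuming onehost list -x prints one
>>> NAME and one LAST_MON_TIME element per host (the 600-second threshold
>>> is arbitrary; pick several monitor intervals):
>>>
>>>   #!/bin/bash
>>>   # Warn if any host's monitoring data is older than THRESHOLD seconds.
>>>   THRESHOLD=600
>>>   NOW=$(date +%s)
>>>   onehost list -x | grep -E '<(NAME|LAST_MON_TIME)>' \
>>>       | sed 's/<[^>]*>//g' | paste - - \
>>>       | while read NAME LAST; do
>>>           AGE=$(( NOW - LAST ))
>>>           if [ "$AGE" -gt "$THRESHOLD" ]; then
>>>               echo "WARNING: host $NAME last monitored ${AGE}s ago"
>>>           fi
>>>       done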
>>>
>>> Steve Timm
>>>
>>>> On Wed, Jul 30, 2014 at 3:50 PM, Steven Timm <timm at fnal.gov> wrote:
>>>> On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>>>>
>>>>
>>>> Maybe you could try to execute the monitor probes on the
>>>> node:
>>>>
>>>> 1. ssh to the node
>>>> 2. Go to /var/tmp/one/im
>>>> 3. Execute run_probes kvm-probes
>>>>
>>>>
>>>> When I do that (using sh -x), I get the following:
>>>>
>>>> -bash-4.1$ sh -x ./run_probes kvm-probes
>>>> ++ dirname ./run_probes
>>>> + source ./../scripts_common.sh
>>>> ++ export LANG=C
>>>> ++ LANG=C
>>>> ++ export PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
>>>> ++ PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
>>>> ++ AWK=awk
>>>> ++ BASH=bash
>>>> ++ CUT=cut
>>>> ++ DATE=date
>>>> ++ DD=dd
>>>> ++ DF=df
>>>> ++ DU=du
>>>> ++ GREP=grep
>>>> ++ ISCSIADM=iscsiadm
>>>> ++ LVCREATE=lvcreate
>>>> ++ LVREMOVE=lvremove
>>>> ++ LVRENAME=lvrename
>>>> ++ LVS=lvs
>>>> ++ LN=ln
>>>> ++ MD5SUM=md5sum
>>>> ++ MKFS=mkfs
>>>> ++ MKISOFS=genisoimage
>>>> ++ MKSWAP=mkswap
>>>> ++ QEMU_IMG=qemu-img
>>>> ++ RADOS=rados
>>>> ++ RBD=rbd
>>>> ++ READLINK=readlink
>>>> ++ RM=rm
>>>> ++ SCP=scp
>>>> ++ SED=sed
>>>> ++ SSH=ssh
>>>> ++ SUDO=sudo
>>>> ++ SYNC=sync
>>>> ++ TAR=tar
>>>> ++ TGTADM=tgtadm
>>>> ++ TGTADMIN=tgt-admin
>>>> ++ TGTSETUPLUN=tgt-setup-lun-one
>>>> ++ TR=tr
>>>> ++ VGDISPLAY=vgdisplay
>>>> ++ VMKFSTOOLS=vmkfstools
>>>> ++ WGET=wget
>>>> +++ uname -s
>>>> ++ '[' xLinux = xLinux ']'
>>>> ++ SED='sed -r'
>>>> +++ basename ./run_probes
>>>> ++ SCRIPT_NAME=run_probes
>>>> + export LANG=C
>>>> + LANG=C
>>>> + HYPERVISOR_DIR=kvm-probes.d
>>>> + ARGUMENTS=kvm-probes
>>>> ++ dirname ./run_probes
>>>> + SCRIPTS_DIR=.
>>>> + cd .
>>>> ++ '[' -d kvm-probes.d ']'
>>>> ++ run_dir kvm-probes.d
>>>> ++ cd kvm-probes.d
>>>> +++ ls architecture.sh collectd-client-shepherd.sh cpu.sh kvm.rb monitor_ds.sh name.sh poll.sh version.sh
>>>> ++ for i in '`ls *`'
>>>> ++ '[' -x architecture.sh ']'
>>>> ++ ./architecture.sh kvm-probes
>>>> ++ EXIT_CODE=0
>>>> ++ '[' x0 '!=' x0 ']'
>>>> ++ for i in '`ls *`'
>>>> ++ '[' -x collectd-client-shepherd.sh ']'
>>>> ++ ./collectd-client-shepherd.sh kvm-probes
>>>> ++ EXIT_CODE=0
>>>> ++ '[' x0 '!=' x0 ']'
>>>> ++ for i in '`ls *`'
>>>> ++ '[' -x cpu.sh ']'
>>>> ++ ./cpu.sh kvm-probes
>>>> ++ EXIT_CODE=0
>>>> ++ '[' x0 '!=' x0 ']'
>>>> ++ for i in '`ls *`'
>>>> ++ '[' -x kvm.rb ']'
>>>> ++ ./kvm.rb kvm-probes
>>>> ++ EXIT_CODE=0
>>>> ++ '[' x0 '!=' x0 ']'
>>>> ++ for i in '`ls *`'
>>>> ++ '[' -x monitor_ds.sh ']'
>>>> ++ ./monitor_ds.sh kvm-probes
>>>> [sudo] password for oneadmin:
>>>>
>>>> and it stays hung on the password for oneadmin.
>>>>
>>>> What's going on?
>>>>
>>>> Also, you mentioned a collectd: are you saying that OpenNebula
>>>> 4.6 now needs to run a daemon on every single VM host? Where is
>>>> it documented how to set it up?
>>>>
>>>> Steve
>>>>
>>>> Make sure you do not have another host using the same hostname
>>>> fgtest14 and running a collectd process.
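>>>>
>>>> A quick way to check for stray clients on the node (the brackets
>>>> just keep grep from matching its own command line):
>>>>
>>>>   ps aux | grep -i '[c]ollectd'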
>>>>
>>>> On Jul 29, 2014 4:35 PM, "Steven Timm" <timm at fnal.gov>
>>>> wrote:
>>>>
>>>> I am still trying to debug a nasty monitoring inconsistency.
>>>>
>>>> -bash-4.1$ onevm list | grep fgtest14
>>>> 26 oneadmin oneadmin fgt6x4-26    runn    6   4G fgtest14 117d 19h50
>>>> 27 oneadmin oneadmin fgt5x4-27    runn   10   4G fgtest14 117d 17h57
>>>> 28 oneadmin oneadmin fgt1x1-28    runn   10 4.1G fgtest14 117d 16h59
>>>> 30 oneadmin oneadmin fgt5x1-30    runn    0   4G fgtest14 116d 23h50
>>>> 33 oneadmin oneadmin ip6sl5vda-33 runn    6   4G fgtest14 116d 19h57
>>>> -bash-4.1$ onehost list
>>>> ID NAME     CLUSTER RVM   ALLOCATED_CPU    ALLOCATED_MEM     STAT
>>>>  3 fgtest11 ipv6      0     0 / 400 (0%)   0K / 15.7G (0%)   on
>>>>  4 fgtest12 ipv6      0     0 / 400 (0%)   0K / 15.7G (0%)   on
>>>>  7 fgtest13 ipv6      0     0 / 800 (0%)   0K / 23.6G (0%)   on
>>>>  8 fgtest14 ipv6      5     0 / 800 (0%)   0K / 23.6G (0%)   on
>>>>  9 fgtest20 ipv6      3   300 / 800 (37%)  12G / 31.4G (38%) on
>>>> 11 fgtest19 ipv6      0     0 / 800 (0%)   0K / 31.5G (0%)   on
>>>> -bash-4.1$ onehost show 8
>>>> HOST 8 INFORMATION
>>>> ID : 8
>>>> NAME : fgtest14
>>>> CLUSTER : ipv6
>>>> STATE : MONITORED
>>>> IM_MAD : kvm
>>>> VM_MAD : kvm
>>>> VN_MAD : dummy
>>>> LAST MONITORING TIME : 07/29 09:25:45
>>>>
>>>> HOST SHARES
>>>> TOTAL MEM : 23.6G
>>>> USED MEM (REAL) : 876.4M
>>>> USED MEM (ALLOCATED) : 0K
>>>> TOTAL CPU : 800
>>>> USED CPU (REAL) : 0
>>>> USED CPU (ALLOCATED) : 0
>>>> RUNNING VMS : 5
>>>>
>>>> LOCAL SYSTEM DATASTORE #102 CAPACITY
>>>> TOTAL: : 548.8G
>>>> USED: : 175.3G
>>>> FREE: : 345.6G
>>>>
>>>> MONITORING INFORMATION
>>>> ARCH="x86_64"
>>>> CPUSPEED="2992"
>>>> HOSTNAME="fgtest14.fnal.gov"
>>>> HYPERVISOR="kvm"
>>>> MODELNAME="Intel(R) Xeon(R) CPU E5450 @
>>>> 3.00GHz"
>>>> NETRX="234844577"
>>>> NETTX="21553126"
>>>> RESERVED_CPU=""
>>>> RESERVED_MEM=""
>>>> VERSION="4.6.0"
>>>>
>>>> VIRTUAL MACHINES
>>>>
>>>> ID USER     GROUP    NAME         STAT UCPU UMEM HOST     TIME
>>>> 26 oneadmin oneadmin fgt6x4-26    runn    6   4G fgtest14 117d 19h50
>>>> 27 oneadmin oneadmin fgt5x4-27    runn   10   4G fgtest14 117d 17h57
>>>> 28 oneadmin oneadmin fgt1x1-28    runn   10 4.1G fgtest14 117d 17h00
>>>> 30 oneadmin oneadmin fgt5x1-30    runn    0   4G fgtest14 116d 23h50
>>>> 33 oneadmin oneadmin ip6sl5vda-33 runn    6   4G fgtest14 116d 19h57
>>>> -----------------------------------------------------------------------------------
>>>>
>>>> All of this looks great, right?
>>>> Just one problem: there are no VMs running on fgtest14,
>>>> and there haven't been for 4 days.
>>>>
>>>> [root at fgtest14 ~]# virsh list
>>>> Id Name State
>>>> ----------------------------------------------------
>>>>
>>>> [root at fgtest14 ~]#
>>>>
>>>> -------------------------------------------------------------------------
>>>> Yet the monitoring reports no errors.
>>>>
>>>> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8)
>>>> successfully monitored.
>>>>
>>>> -----------------------------------------------------------------------------
>>>> At the same time, there is no evidence that ONE is
>>>> actually trying, or succeeding, to monitor these five VMs,
>>>> yet they are still stuck in "runn", which means I can't do
>>>> a onevm restart to restart them. (The VM images of these 5
>>>> VMs are still out there on the VM host, and I would like to
>>>> save and restart them if I can.)
>>>>
>>>> What is the remotes command that ONE 4.6 would use to
>>>> monitor this host? Can I run it manually and see what
>>>> output I get?
>>>>
>>>> Are we dealing with some kind of bug, or just a very
>>>> confused system? Any help is appreciated. I have to get
>>>> this sorted out before I dare deploy ONE 4.x in production.
>>>>
>>>> Steve Timm
>>>>
>>>>
>>>> ------------------------------------------------------------------
>>>> Steven C. Timm, Ph.D (630) 840-8525
>>>> timm at fnal.gov http://home.fnal.gov/~timm/
>>>> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
>>>> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opennebula.org
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>
>>>> ------------------------------------------------------------------
>>>> Steven C. Timm, Ph.D (630) 840-8525
>>>> timm at fnal.gov http://home.fnal.gov/~timm/
>>>> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
>>>> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>>>>
>>>> --
>>>> --
>>>> Ruben S. Montero, PhD
>>>> Project co-Lead and Chief Architect
>>>> OpenNebula - Flexible Enterprise Cloud Made Simple
>>>> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>>>>
>>> ------------------------------------------------------------------
>>> Steven C. Timm, Ph.D (630) 840-8525
>>> timm at fnal.gov http://home.fnal.gov/~timm/
>>> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
>>> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>>>
>>
>>
>>
>> --
>> --
>> Ruben S. Montero, PhD
>> Project co-Lead and Chief Architect
>> OpenNebula - Flexible Enterprise Cloud Made Simple
>> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>>
>
>
>
> --
> --
> Ruben S. Montero, PhD
> Project co-Lead and Chief Architect
> OpenNebula - Flexible Enterprise Cloud Made Simple
> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>
--
--
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula