[one-users] What remotes commands does one 4.6 use:
Ruben S. Montero
rsmontero at opennebula.org
Wed Jul 30 07:53:06 PDT 2014
BTW, if you are not using the LVM Datastore, just replace the LVM_SIZE_CMD
assignment with LVM_SIZE_CMD=""
We are looking for a better way to handle this:
http://dev.opennebula.org/issues/2912
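As a concrete sketch of that edit on a node (the remotes path below is the default location on 4.x nodes; treat it as an assumption and adjust for your install):

```shell
# Disable the LVM size probe on a node that has no LVM Datastore.
# Default 4.x remotes path on the node (an assumption; adjust as needed):
PROBE=/var/tmp/one/im/kvm-probes.d/monitor_ds.sh

# Replace the LVM_SIZE_CMD assignment with an empty one, keeping a backup.
sed -i.bak 's/^LVM_SIZE_CMD=.*/LVM_SIZE_CMD=""/' "$PROBE"

# Verify the change took effect.
grep '^LVM_SIZE_CMD=' "$PROBE"
```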
Note that this change was driven not by the new monitoring system itself but
by the need to monitor the Datastore size.
Also, any change made to monitor_ds.sh can be propagated with onehost sync
and versioned with the VERSION attribute:
http://docs.opennebula.org/4.6/administration/hosts_and_clusters/host_guide.html#sync
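For example, after editing a probe in the front-end's remotes directory, a minimal sync workflow looks like this (run as oneadmin on the front-end; the host name below is taken from this thread and used only as an example):

```shell
# Push the updated remotes (including monitor_ds.sh) to the nodes.
# "onehost sync" is described in the 4.6 host guide linked above.
onehost sync            # update every host
onehost sync fgtest14   # or only the host being debugged
```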
On Wed, Jul 30, 2014 at 4:45 PM, Ruben S. Montero <rsmontero at opennebula.org>
wrote:
> Hi,
>
> 1.- monitor_ds.sh may use LVM commands (vgdisplay) that need sudo access.
> This should be set up automatically by the opennebula-node packages.
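The sudo requirement above is normally satisfied by a sudoers entry shipped with the node packages; a sketch of what such an entry looks like (the file name and command paths are assumptions, verify against the package on your node):

```text
# /etc/sudoers.d/opennebula  (hypothetical file name)
oneadmin ALL=(ALL) NOPASSWD: /sbin/vgdisplay, /sbin/lvs
```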
>
> 2.- It is not a real daemon: the first time a host is monitored, a process
> is left behind to send information periodically. OpenNebula restarts it if no
> information is received within 3 monitor steps. Nothing needs to be set up...
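The restart threshold described above works out as follows (MONITORING_INTERVAL is an oned.conf setting; the 60-second default assumed here is an assumption for your install):

```shell
# How long OpenNebula waits before restarting a silent collectd client:
# no data for 3 monitor steps, each MONITORING_INTERVAL seconds long.
MONITORING_INTERVAL=60   # assumed oned.conf default, in seconds
STEPS=3
echo "restart after $((MONITORING_INTERVAL * STEPS)) seconds of silence"
```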
>
> Cheers
>
>
> On Wed, Jul 30, 2014 at 3:50 PM, Steven Timm <timm at fnal.gov> wrote:
>
>> On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>>
>>
>>> Maybe you could try to execute the monitor probes in the node,
>>>
>>> 1. ssh the node
>>> 2. Go to /var/tmp/one/im
>>> 3. Execute run_probes kvm-probes
>>>
>>
>> When I do that, (using sh -x ) I get the following:
>>
>> -bash-4.1$ sh -x ./run_probes kvm-probes
>> ++ dirname ./run_probes
>> + source ./../scripts_common.sh
>> ++ export LANG=C
>> ++ LANG=C
>> ++ export PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
>> ++ PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
>> ++ AWK=awk
>> ++ BASH=bash
>> ++ CUT=cut
>> ++ DATE=date
>> ++ DD=dd
>> ++ DF=df
>> ++ DU=du
>> ++ GREP=grep
>> ++ ISCSIADM=iscsiadm
>> ++ LVCREATE=lvcreate
>> ++ LVREMOVE=lvremove
>> ++ LVRENAME=lvrename
>> ++ LVS=lvs
>> ++ LN=ln
>> ++ MD5SUM=md5sum
>> ++ MKFS=mkfs
>> ++ MKISOFS=genisoimage
>> ++ MKSWAP=mkswap
>> ++ QEMU_IMG=qemu-img
>> ++ RADOS=rados
>> ++ RBD=rbd
>> ++ READLINK=readlink
>> ++ RM=rm
>> ++ SCP=scp
>> ++ SED=sed
>> ++ SSH=ssh
>> ++ SUDO=sudo
>> ++ SYNC=sync
>> ++ TAR=tar
>> ++ TGTADM=tgtadm
>> ++ TGTADMIN=tgt-admin
>> ++ TGTSETUPLUN=tgt-setup-lun-one
>> ++ TR=tr
>> ++ VGDISPLAY=vgdisplay
>> ++ VMKFSTOOLS=vmkfstools
>> ++ WGET=wget
>> +++ uname -s
>> ++ '[' xLinux = xLinux ']'
>> ++ SED='sed -r'
>> +++ basename ./run_probes
>> ++ SCRIPT_NAME=run_probes
>> + export LANG=C
>> + LANG=C
>> + HYPERVISOR_DIR=kvm-probes.d
>> + ARGUMENTS=kvm-probes
>> ++ dirname ./run_probes
>> + SCRIPTS_DIR=.
>> + cd .
>> ++ '[' -d kvm-probes.d ']'
>> ++ run_dir kvm-probes.d
>> ++ cd kvm-probes.d
>> +++ ls architecture.sh collectd-client-shepherd.sh cpu.sh kvm.rb monitor_ds.sh name.sh poll.sh version.sh
>> ++ for i in '`ls *`'
>> ++ '[' -x architecture.sh ']'
>> ++ ./architecture.sh kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x collectd-client-shepherd.sh ']'
>> ++ ./collectd-client-shepherd.sh kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x cpu.sh ']'
>> ++ ./cpu.sh kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x kvm.rb ']'
>> ++ ./kvm.rb kvm-probes
>> ++ EXIT_CODE=0
>> ++ '[' x0 '!=' x0 ']'
>> ++ for i in '`ls *`'
>> ++ '[' -x monitor_ds.sh ']'
>> ++ ./monitor_ds.sh kvm-probes
>> [sudo] password for oneadmin:
>>
>> and it hangs at the sudo password prompt for oneadmin.
>>
>> What's going on?
>>
>> Also, you mentioned collectd. Are you saying that OpenNebula 4.6 now
>> needs to run a daemon on every single VM host? Where is it documented
>> how to set it up?
>>
>> Steve
>>
>>> Make sure you do not have a host using the same hostname fgtest14 and
>>> running a collectd process
>>>
>>> On Jul 29, 2014 4:35 PM, "Steven Timm" <timm at fnal.gov> wrote:
>>>
>>> I am still trying to debug a nasty monitoring inconsistency.
>>>
>>> -bash-4.1$ onevm list | grep fgtest14
>>>  26 oneadmin oneadmin fgt6x4-26    runn    6   4G fgtest14 117d 19h50
>>>  27 oneadmin oneadmin fgt5x4-27    runn   10   4G fgtest14 117d 17h57
>>>  28 oneadmin oneadmin fgt1x1-28    runn   10 4.1G fgtest14 117d 16h59
>>>  30 oneadmin oneadmin fgt5x1-30    runn    0   4G fgtest14 116d 23h50
>>>  33 oneadmin oneadmin ip6sl5vda-33 runn    6   4G fgtest14 116d 19h57
>>> -bash-4.1$ onehost list
>>>  ID NAME     CLUSTER RVM    ALLOCATED_CPU    ALLOCATED_MEM     STAT
>>>   3 fgtest11 ipv6      0     0 / 400 (0%)    0K / 15.7G (0%)   on
>>>   4 fgtest12 ipv6      0     0 / 400 (0%)    0K / 15.7G (0%)   on
>>>   7 fgtest13 ipv6      0     0 / 800 (0%)    0K / 23.6G (0%)   on
>>>   8 fgtest14 ipv6      5     0 / 800 (0%)    0K / 23.6G (0%)   on
>>>   9 fgtest20 ipv6      3   300 / 800 (37%)  12G / 31.4G (38%)  on
>>>  11 fgtest19 ipv6      0     0 / 800 (0%)    0K / 31.5G (0%)   on
>>> -bash-4.1$ onehost show 8
>>> HOST 8 INFORMATION
>>> ID : 8
>>> NAME : fgtest14
>>> CLUSTER : ipv6
>>> STATE : MONITORED
>>> IM_MAD : kvm
>>> VM_MAD : kvm
>>> VN_MAD : dummy
>>> LAST MONITORING TIME : 07/29 09:25:45
>>>
>>> HOST SHARES
>>> TOTAL MEM : 23.6G
>>> USED MEM (REAL) : 876.4M
>>> USED MEM (ALLOCATED) : 0K
>>> TOTAL CPU : 800
>>> USED CPU (REAL) : 0
>>> USED CPU (ALLOCATED) : 0
>>> RUNNING VMS : 5
>>>
>>> LOCAL SYSTEM DATASTORE #102 CAPACITY
>>> TOTAL: : 548.8G
>>> USED: : 175.3G
>>> FREE: : 345.6G
>>>
>>> MONITORING INFORMATION
>>> ARCH="x86_64"
>>> CPUSPEED="2992"
>>> HOSTNAME="fgtest14.fnal.gov"
>>> HYPERVISOR="kvm"
>>> MODELNAME="Intel(R) Xeon(R) CPU E5450 @ 3.00GHz"
>>> NETRX="234844577"
>>> NETTX="21553126"
>>> RESERVED_CPU=""
>>> RESERVED_MEM=""
>>> VERSION="4.6.0"
>>>
>>> VIRTUAL MACHINES
>>>
>>> ID USER     GROUP    NAME         STAT UCPU UMEM HOST     TIME
>>> 26 oneadmin oneadmin fgt6x4-26    runn    6   4G fgtest14 117d 19h50
>>> 27 oneadmin oneadmin fgt5x4-27    runn   10   4G fgtest14 117d 17h57
>>> 28 oneadmin oneadmin fgt1x1-28    runn   10 4.1G fgtest14 117d 17h00
>>> 30 oneadmin oneadmin fgt5x1-30    runn    0   4G fgtest14 116d 23h50
>>> 33 oneadmin oneadmin ip6sl5vda-33 runn    6   4G fgtest14 116d 19h57
>>> -----------------------------------------------------------------------------------
>>>
>>> All of this looks great, right?
>>> Just one problem: there are no VMs running on fgtest14, and there
>>> haven't been for 4 days.
>>>
>>> [root at fgtest14 ~]# virsh list
>>> Id Name State
>>> ----------------------------------------------------
>>>
>>> [root at fgtest14 ~]#
>>>
>>> -------------------------------------------------------------------------
>>> Yet the monitoring reports no errors.
>>>
>>> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully
>>> monitored.
>>>
>>> -----------------------------------------------------------------------------
>>> At the same time, there is no evidence that ONE is actually trying,
>>> or succeeding, to monitor these five VMs, yet they are still stuck in
>>> "runn", which means I can't do a onevm restart to restart them.
>>> (The images of these 5 VMs are still out there on the VM host, and
>>> I would like to save and restart them if I can.)
>>>
>>> What is the remotes command that ONE 4.6 would use to monitor this
>>> host? Can I run it manually and see what output I get?
>>>
>>> Are we dealing with some kind of bug, or just a very confused
>>> system? Any help is appreciated. I have to get this sorted out before
>>> I dare deploy ONE 4.x in production.
>>>
>>> Steve Timm
>>>
>>>
>>> ------------------------------------------------------------------
>>> Steven C. Timm, Ph.D (630) 840-8525
>>> timm at fnal.gov http://home.fnal.gov/~timm/
>>> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
>>> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opennebula.org
>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>
>>>
>> ------------------------------------------------------------------
>> Steven C. Timm, Ph.D (630) 840-8525
>> timm at fnal.gov http://home.fnal.gov/~timm/
>> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
>> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>
> --
> Ruben S. Montero, PhD
> Project co-Lead and Chief Architect
> OpenNebula - Flexible Enterprise Cloud Made Simple
> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>
--
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula