[one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
Javier Fontan
jfontan at opennebula.org
Mon Jan 20 07:15:20 PST 2014
The problem seems to be the large number of collectd processes running.
Try killing all "collectd-client.rb" processes. There should be only
one running per host.
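
A minimal sketch of that cleanup, to be run on the affected host. It
assumes the probes run as the oneadmin user, as in the ps output quoted
below, and that pkill from procps is installed. The function names are
hypothetical helpers, not part of OpenNebula.

```shell
#!/bin/sh
# Sketch only: these helper names are made up for illustration.

# Count collectd-client.rb entries in `ps`-style text read from stdin.
count_collectd_clients() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c 'collectd-client\.rb' || true
}

# List the collectd-client.rb processes owned by a user (default: oneadmin).
list_collectd_clients() {
    ps -u "${1:-oneadmin}" -o pid,ppid,args | grep '[c]ollectd-client\.rb'
}

# Kill every collectd-client.rb process owned by that user.
kill_collectd_clients() {
    pkill -u "${1:-oneadmin}" -f 'collectd-client\.rb'
}
```

Running "list_collectd_clients | count_collectd_clients" first shows how
many copies are alive. Anything above 1 matches the symptom described
here, and kill_collectd_clients then clears them all.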
If you would rather use the old method of monitoring, you can follow this guide:
http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg
On Mon, Jan 20, 2014 at 2:17 PM, Gerry O'Brien <gerry at scss.tcd.ie> wrote:
> Hi Ruben,
>
> Below is the output of 'ps -ef | grep one' on a host that has been
> disabled, rebooted and enabled. There are multiple instances of
> collectd-client.rb kvm running.
>
>
> We discovered a serious issue today that is having an adverse
> effect on our DNS system. As soon as the machine below is enabled,
> our DNS server is flooded with requests from the host (see a sample below).
> Our logs show that this only started happening after the upgrade to
> 4.4. If we don't get a fix for this we will have to go back to 4.2, which is
> something I really don't want to do.
>
> Regards,
> Gerry
>
>
>
>
> oneadmin 3628 1 0 13:04 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 4600 1 0 13:05 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 6400 1 0 13:07 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 9003 1 0 13:08 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 12953 3628 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 12955 6400 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 12969 12953 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 12970 12969 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 12972 12955 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 12973 12972 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 13029 12973 0 13:10 ? 00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 13030 12970 0 13:10 ? 00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>
>
>
> -2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN A + (134.226.32.57)
> 20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
> 20-Jan-2014 13:14:31.302 client 134.226
>
> On 17/01/2014 17:45, Ruben S. Montero wrote:
>>
>> Hi Gerry
>>
>> Just to check, are you using 4.4 Final? We've seen this in the betas and
>> "thought" we had fixed it for the final version. Also, could you check that
>> there is just one monitoring process on each host (collectd-client.sh, or
>> equiv should be the name of the process)?
>>
>> Also, could you send us the lines from oned.log between Thu Jan 16 16:56:25
>> 2014 and Thu Jan 16 17:25:43 2014, plus the first lines, which include your
>> oned.conf values (we are especially interested in those related to the
>> monitoring interval)?
>>
>>
>> Cheers
>>
>> Ruben
>>
>>
>>
>>
>> On Fri, Jan 17, 2014 at 2:27 PM, Gerry O'Brien <gerry at scss.tcd.ie> wrote:
>>
>>> Hi,
>>>
>>> Below is a truncated log file for a VM. The monitor continually
>>> cycles through finding the machine RUNNING and then in state UNKNOWN.
>>> This occurs for many machines at the same time. All machines were
>>> created by a script.
>>>
>>> The VMs are Microsoft Windows 7 64bit Enterprise. Individual context
>>> is created by a startup script. They run fine but eventually /var/log/one
>>> is going to overflow.
>>>
>>> Restarting oned seems to fix the problem but this is hardly a
>>> long-term solution.
>>>
>>> Any suggestions on what could be causing this?
>>>
>>> Regards,
>>> Gerry
>>>
>>>
>>>
>>>
>>> Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
>>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
>>> Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
>>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
>>> Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/1788/deployment.0
>>> Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
>>> Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver operation: pre.
>>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
>>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver operation: post.
>>> Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
>>> Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
>>> Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
>>> Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN
>>>
>>> --
>>> Gerry O'Brien
>>>
>>> Systems Manager
>>> School of Computer Science and Statistics
>>> Trinity College Dublin
>>> Dublin 2
>>> IRELAND
>>>
>>> 00 353 1 896 1341
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opennebula.org
>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>
>>
>>
>
>
> --
> Gerry O'Brien
>
> Systems Manager
> School of Computer Science and Statistics
> Trinity College Dublin
> Dublin 2
> IRELAND
>
> 00 353 1 896 1341
>
--
Javier Fontán Muiños
Developer
OpenNebula - The Open Source Toolkit for Data Center Virtualization
www.OpenNebula.org | @OpenNebula | github.com/jfontan