[one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

Gerry O'Brien gerry at scss.tcd.ie
Mon Jan 20 05:17:07 PST 2014


Hi Ruben,

     Below is the output of 'ps -ef | grep one' on a host that has been 
disabled, rebooted and enabled. There are multiple instances of 
'collectd-client.rb kvm' running.


     We discovered a serious issue today that is having an adverse 
effect on our DNS system. When the machine below was enabled, our DNS 
server was immediately flooded with requests from the host (see a 
sample below).
     Our logs show that this has only started happening since the 
upgrade to 4.4. If we don't get a fix for this we will have to go back 
to 4.2, which is something I really don't want to do.

         Regards,
             Gerry




oneadmin  3628     1  0 13:04 ?        00:00:00 ruby 
/var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin  4600     1  0 13:05 ?        00:00:00 ruby 
/var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin  6400     1  0 13:07 ?        00:00:00 ruby 
/var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin  9003     1  0 13:08 ?        00:00:00 ruby 
/var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 12953  3628  0 13:10 ?        00:00:00 /bin/bash 
/var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 12955  6400  0 13:10 ?        00:00:00 /bin/bash 
/var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 12969 12953  0 13:10 ?        00:00:00 /bin/bash 
/var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 12970 12969  0 13:10 ?        00:00:00 /bin/bash 
/var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 12972 12955  0 13:10 ?        00:00:00 /bin/bash 
/var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 12973 12972  0 13:10 ?        00:00:00 /bin/bash 
/var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 
4124 20 0 host101.scss.tcd.ie
oneadmin 13029 12973  0 13:10 ?        00:00:00 /bin/bash 
./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 
host101.scss.tcd.ie
oneadmin 13030 12970  0 13:10 ?        00:00:00 /bin/bash 
./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 
host101.scss.tcd.ie
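
A minimal sketch of how the duplicate-client check could be automated, assuming the `ps -ef` output is available with one process per line (the listing above is wrapped by the mail client). The helper names `count_collectd_clients` and `duplicated` are hypothetical and not part of OpenNebula.

```python
import re
from collections import Counter

# Hypothetical helper, not part of OpenNebula: looks for the monitoring
# client script name followed by its first argument (e.g. "kvm").
CLIENT_RE = re.compile(r"collectd-client\.(?:rb|sh)\s+(\S+)")


def count_collectd_clients(ps_lines):
    """Return a Counter mapping the client's first argument to process count."""
    counts = Counter()
    for line in ps_lines:
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def duplicated(counts):
    """Return the arguments for which more than one client is running."""
    return sorted(name for name, n in counts.items() if n > 1)
```

Feeding it the unwrapped lines from the listing above would report `kvm` as duplicated, since four `collectd-client.rb kvm` processes are shown. The `run_probes` and `monitor_ds.sh` children are deliberately not counted.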



-2014 13:14:26.675 client 134.226.59.101#52314: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: 
host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:31.302 client 134.226
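
To put a number on the flood, a sketch along these lines could tally queries per client per second. It assumes each log entry has been rejoined onto a single line and follows the format shown in the sample; the function name `queries_per_second` is hypothetical, and only IPv4 client addresses are matched.

```python
import re
from collections import Counter

# Matches query log lines in the format of the sample above. Truncated or
# malformed lines simply fail to match and are skipped.
QUERY_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ "
    r"client (?P<client>[\d.]+)#\d+: query: (?P<name>\S+) IN (?P<rtype>\S+)"
)


def queries_per_second(log_lines):
    """Count queries keyed by (client IP, HH:MM:SS)."""
    counts = Counter()
    for line in log_lines:
        match = QUERY_RE.match(line)
        if match:
            counts[(match.group("client"), match.group("time"))] += 1
    return counts
```

Calling `queries_per_second(lines).most_common(5)` would list the busiest client/second pairs, which makes it easier to compare the query rate before and after enabling a host.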






On 17/01/2014 17:45, Ruben S. Montero wrote:
> Hi Gerry
>
> Just to check, are you using 4.4 Final? We've seen this in the betas and
> "thought" we had fixed it for the final version. Also, could you check that
> there is just one monitoring process on each host (collectd-client.sh, or
> equivalent, should be the name of the process)?
>
> Also, could you send us the lines from oned.log between Thu Jan 16 16:56:25
> 2014 and Thu Jan 16 17:25:43 2014, plus the first lines, which include your
> oned.conf values (we are especially interested in those related to the
> monitoring interval)?
>
>
> Cheers
>
> Ruben
>
>
>
>
> On Fri, Jan 17, 2014 at 2:27 PM, Gerry O'Brien <gerry at scss.tcd.ie> wrote:
>
>> Hi,
>>
>>      Below is a truncated log file for a VM. The monitor continually cycles
>> between finding the machine RUNNING and marking it UNKNOWN. This occurs for
>> many machines at the same time. All machines were created by a script.
>>
>>      The VMs are Microsoft Windows 7 64bit Enterprise. Individual context
>> is created by a startup script. They run fine but eventually /var/log/one
>> is going to overflow.
>>
>>      Restarting oned seems to fix the problem but this is hardly a long
>> term solution.
>>
>>      Any suggestions on what could be causing this?
>>
>>          Regards,
>>              Gerry
>>
>>
>>
>>
>> Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
>> Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
>> Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file:
>> /var/lib/one/vms/1788/deployment.0
>> Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
>> Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver
>> operation: pre.
>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization
>> driver operation: deploy.
>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver
>> operation: post.
>> Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
>> Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
>> Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
>> Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN
>>
>> --
>> Gerry O'Brien
>>
>> Systems Manager
>> School of Computer Science and Statistics
>> Trinity College Dublin
>> Dublin 2
>> IRELAND
>>
>> 00 353 1 896 1341
>>
>>
>
>


-- 
Gerry O'Brien

Systems Manager
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
IRELAND

00 353 1 896 1341



