[one-users] OpenNebula Load Average/CPU Usage
Alberto Zuin - Liste
liste at albertozuin.eu
Thu Jul 5 23:23:22 PDT 2012
Hello Tao,
I had the same problem on my server (see "[one-users] OpenNebula Drain
CPU").
Thanks for your solution; it saved me from a very big hardware upgrade!
On localized servers, use:
date -s "$(LC_ALL=C date)"
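A minimal sketch of the workaround discussed in this thread (hypothetical session; the dmesg check assumes the kernel's leap-second message has not yet rotated out of the ring buffer, and the clock reset requires root, so it is guarded here):

```shell
# Look for the kernel's leap-second insertion message (prints nothing
# if it has already rotated out of the ring buffer):
dmesg 2>/dev/null | grep -i 'leap second' || true

# Re-setting the clock to its current value clears the stuck timer
# state without a reboot. LC_ALL=C makes date print in the C locale,
# the only format that "date -s" reliably parses on localized systems.
if [ "$(id -u)" -eq 0 ]; then
    date -s "$(LC_ALL=C date)"
else
    echo 'run as root: date -s "$(LC_ALL=C date)"'
fi
```

Plain date -s "$(date)" fails on non-English locales because date prints localized month and day names that date -s cannot parse back.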
Alberto Zuin
On 06/07/2012 00:24, Tao Craig wrote:
> Hi Marshall,
> I think this could be related to the "leap second bug". Did you notice
> any lag prior to this past weekend?
> If not, try issuing this command on your cloud controller(s): date -s
> "`date`"
> Alternatively, you can reboot. I had an issue pretty much identical to
> yours, and it appears it was related to the "leap second bug". A reboot
> fixed it for me, but some people have had success with the date command.
>
> ----- Original Message -----
> *From:* Marshall Grillos <mailto:mgrillos at optimalpath.com>
> *To:* users at lists.opennebula.org <mailto:users at lists.opennebula.org>
> *Cc:* Rusty Wolf <mailto:rwolf at optimalpath.com>
> *Sent:* Thursday, July 05, 2012 11:53 AM
> *Subject:* [one-users] OpenNebula Load Average/CPU Usage
>
> My company is running OpenNebula 3.4. It has been running in
> production now for just over a month. Recently we started
> noticing issues in Sunstone: mainly, the VM list and several
> other parts of the GUI would not load. I attempted to restart
> Sunstone and oned, but the problem persisted.
>
> One item of note is the high CPU utilization of several of
> OpenNebula's processes. Here are the top details from our cloud
> controller (this server also serves up the shared data for our VMs
> via NFS):
> top - 13:47:21 up 48 days, 17:07, 1 user, load average: 7.78,
> 5.09, 2.69
> Tasks: 305 total, 2 running, 303 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 52.2%sy, 6.5%ni, 39.1%id, 0.0%wa, 0.0%hi,
> 2.2%si, 0.0%st
> Mem: 32865312k total, 32622008k used, 243304k free, 65196k
> buffers
> Swap: 16383992k total, 0k used, 16383992k free, 30481224k
> cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 16664 oneadmin 39 19 53652 15m 1548 S 69.8 0.0 9:37.50 ruby
> 16718 oneadmin 39 19 39392 4044 1452 S 67.8 0.0 8:49.46 ruby
> 16697 oneadmin 39 19 39220 3880 1452 S 67.1 0.0 9:04.05 ruby
> 16687 oneadmin 39 19 39388 4040 1452 S 65.1 0.0 9:20.84 ruby
> 16677 oneadmin 39 19 39500 4228 1464 S 54.2 0.0 8:13.68 ruby
> 16708 oneadmin 39 19 43520 6780 1524 S 42.2 0.0 8:09.96 ruby
>
> Here is the process list for oneadmin on the cloud controller:
> oneadmin 16629 0.0 0.0 108288 1900 pts/0 S 13:31 0:00 bash
> oneadmin 16641 0.0 0.0 1461156 12604 pts/0 Sl 13:31 0:00
> /usr/bin/oned -f
> oneadmin 16642 0.0 0.0 192200 5688 pts/0 Sl 13:31 0:00
> /usr/bin/mm_sched
> oneadmin 16664 58.9 0.0 53652 16160 pts/0 SNl 13:31 7:04
> ruby /usr/lib/one/mads/one_vmm_exec.rb -t 15 -r 0 xen
> oneadmin 16677 51.8 0.0 39500 4228 pts/0 SNl 13:31 6:12
> ruby /usr/lib/one/mads/one_im_exec.rb xen
> oneadmin 16687 56.4 0.0 39388 4040 pts/0 SNl 13:31 6:46
> ruby /usr/lib/one/mads/one_tm.rb -t 15 -d
> dummy,lvm,shared,qcow2,ssh,vmware,iscsi
> oneadmin 16697 53.9 0.0 39220 3880 pts/0 SNl 13:31 6:28
> ruby /usr/lib/one/mads/one_hm.rb
> oneadmin 16708 53.1 0.0 43520 6780 pts/0 SNl 13:31 6:21
> ruby /usr/lib/one/mads/one_datastore.rb -t 15 -d fs,vmware,iscsi
> oneadmin 16718 52.5 0.0 39392 4044 pts/0 SNl 13:31 6:17
> ruby /usr/lib/one/mads/one_auth_mad.rb --authn
> ssh,x509,ldap,server_cipher,server_x509
> oneadmin 16997 1.0 0.0 110212 1164 pts/0 R+ 13:43 0:00
> ps -aux
> oneadmin 16998 0.0 0.0 103116 908 pts/0 S+ 13:43 0:00 more
>
> When I stop one, the load average on the server returns to normal
> with over 98% idle CPU. I can't seem to find anything bad in the
> logs:
> Thu Jul 5 13:47:24 2012 [VMM][I]: --Mark--
> Thu Jul 5 13:47:50 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:47:50 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:47:50 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:48:19 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:48:19 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:48:19 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:48:48 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:48:48 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:48:48 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:49:17 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:49:17 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:49:17 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:49:46 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:49:46 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:49:46 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:50:15 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:50:15 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:50:15 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:50:44 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:50:44 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:50:44 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:51:13 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:51:13 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:51:13 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 134.
> Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 184.
> Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 200.
> Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 202.
> Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 206.
> Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 123.
> Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 130.
> Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 162.
> Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 186.
> Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 208.
> Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.31 (6)
> Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.32 (7)
> Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.33 (9)
> Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.34 (10)
> Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 127.
> Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 141.
> Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 146.
> Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 190.
> Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 201.
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 202
> ExitCode: 0
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
> 202 NAME=one-202 STATE=a USEDCPU=0.3 USEDMEMORY=4197164 NETTX=5147
> NETRX=24499
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 206
> ExitCode: 0
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 184
> ExitCode: 0
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 134
> ExitCode: 0
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
> 206 NAME=one-206 STATE=a USEDCPU=0.3 USEDMEMORY=4197164
> NETTX=37568 NETRX=640851
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
> 184 NAME=one-184 STATE=a USEDCPU=2.2 USEDMEMORY=4197164
> NETTX=1220134 NETRX=496270
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
> 134 NAME=one-134 STATE=a USEDCPU=0.9 USEDMEMORY=8391468 NETTX=665
> NETRX=1451
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 200
> ExitCode: 0
>
> Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
> 200 NAME=one-200 STATE=a USEDCPU=10.3 USEDMEMORY=8392616
> NETTX=265144 NETRX=370045
>
> Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
> Thu Jul 5 13:51:39 2012 [InM][D]: Host 6 successfully monitored.
> Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
> Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
> Thu Jul 5 13:51:39 2012 [InM][D]: Host 7 successfully monitored.
> Thu Jul 5 13:51:39 2012 [InM][D]: Host 9 successfully monitored.
> Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
> Thu Jul 5 13:51:39 2012 [InM][D]: Host 10 successfully monitored.
> Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 120.
> Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 143.
> Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 191.
> Thu Jul 5 13:51:42 2012 [ReM][D]: HostPoolInfo method invoked
> Thu Jul 5 13:51:42 2012 [ReM][D]: VirtualMachinePoolInfo method
> invoked
> Thu Jul 5 13:51:42 2012 [ReM][D]: AclInfo method invoked
> Thu Jul 5 13:51:42 2012 [VMM][D]: Message received: LOG I 208
> ExitCode: 0
>
> Thu Jul 5 13:51:42 2012 [VMM][D]: Message received: POLL SUCCESS
> 208 NAME=one-208 STATE=a USEDCPU=3.7 USEDMEMORY=8391468
> NETTX=250167 NETRX=162294
>
> The running VMs are not impacted in any way -- we have resorted to
> leaving one stopped until we can resolve the issue. What
> can/should I look at to begin debugging this problem?
>
> Thanks,
> Marshall
>
> ------------------------------------------------------------------------
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
--
----------------------------
Alberto Zuin
via Mare, 36/A
36030 Lugo di Vicenza (VI)
Italy
P.I. 04310790284
Tel. +39.0499271575
Fax. +39.0492106654
Cell. +39.3286268626
www.azns.it - alberto at azns.it