[one-users] OpenNebula Load Average/CPU Usage

Alberto Zuin - Liste liste at albertozuin.eu
Thu Jul 5 23:23:22 PDT 2012


Hello Tao,
I had the same problem on my server (see "[one-users] OpenNebula Drain 
CPU").
Thanks for your solution; it saved me from a very big hardware upgrade!
Use:

date -s "$(LC_ALL=C date)"

on localized servers.
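For anyone hitting this later: the leap second inserted on 2012-06-30 left some Linux kernels with inconsistent timer state, which made futex-based timeouts fire immediately and sent threaded processes (the Ruby driver processes here, and commonly Java and MySQL too) into a busy loop. Setting the clock to its own current value resets that state without a reboot. A minimal sketch of the workaround; the LC_ALL=C part matters because on a localized server date(1) may print month and day names that `date -s` cannot parse back:

```shell
#!/bin/sh
# Sketch of the leap-second workaround (applying it requires root).
# LC_ALL=C forces date(1) to print in the POSIX locale, so the resulting
# string is guaranteed to be re-parseable by `date -s` regardless of the
# server's configured locale.
now="$(LC_ALL=C date)"
echo "Would reset clock to: $now"
# date -s "$now"   # uncomment to apply; needs root and briefly steps the clock
```

A reboot achieves the same reset, as Tao notes below, but the date command avoids downtime on the cloud controller.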

Alberto Zuin


On 06/07/2012 00:24, Tao Craig wrote:
> Hi Marshall,
> I think this could be related to the "leap second bug". Did you notice 
> any lag prior to this past weekend?
> If not, try issuing this command on your cloud controller(s): date -s 
> "`date`"
> Alternatively, you can reboot. I had pretty much an identical issue as 
> you and it appears it was related to the "leap second bug". A reboot 
> fixed it for me, but some people have had success with the date command.
>
>     ----- Original Message -----
>     *From:* Marshall Grillos <mailto:mgrillos at optimalpath.com>
>     *To:* users at lists.opennebula.org <mailto:users at lists.opennebula.org>
>     *Cc:* Rusty Wolf <mailto:rwolf at optimalpath.com>
>     *Sent:* Thursday, July 05, 2012 11:53 AM
>     *Subject:* [one-users] OpenNebula Load Average/CPU Usage
>
>     My company is running OpenNebula 3.4.  It's been running in
>     production now for just over a month.  Recently we started
>     noticing issues in Sunstone.  Mainly the VM List wouldn't load and
>     several other parts of the GUI would not load.  I attempted to
>     restart Sunstone and oned and the problem persists.
>
>     One item of note is the high CPU utilization of several of
>     OpenNebula's processes.  Here are the top details from our cloud
>     controller (this server also serves up the shared data for our VMs
>     via NFS):
>     top - 13:47:21 up 48 days, 17:07,  1 user,  load average: 7.78,
>     5.09, 2.69
>     Tasks: 305 total,   2 running, 303 sleeping,   0 stopped,   0 zombie
>     Cpu(s):  0.0%us, 52.2%sy,  6.5%ni, 39.1%id,  0.0%wa,  0.0%hi,
>      2.2%si,  0.0%st
>     Mem:  32865312k total, 32622008k used,   243304k free,    65196k
>     buffers
>     Swap: 16383992k total,        0k used, 16383992k free, 30481224k
>     cached
>
>       PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
>     16664 oneadmin  39  19 53652  15m 1548 S 69.8  0.0 9:37.50 ruby
>     16718 oneadmin  39  19 39392 4044 1452 S 67.8  0.0 8:49.46 ruby
>     16697 oneadmin  39  19 39220 3880 1452 S 67.1  0.0 9:04.05 ruby
>     16687 oneadmin  39  19 39388 4040 1452 S 65.1  0.0 9:20.84 ruby
>     16677 oneadmin  39  19 39500 4228 1464 S 54.2  0.0 8:13.68 ruby
>     16708 oneadmin  39  19 43520 6780 1524 S 42.2  0.0 8:09.96 ruby
>
>     Here is the process list for oneadmin on the cloud controller:
>     oneadmin 16629  0.0  0.0 108288  1900 pts/0    S    13:31   0:00 bash
>     oneadmin 16641  0.0  0.0 1461156 12604 pts/0   Sl   13:31   0:00
>     /usr/bin/oned -f
>     oneadmin 16642  0.0  0.0 192200  5688 pts/0    Sl   13:31   0:00
>     /usr/bin/mm_sched
>     oneadmin 16664 58.9  0.0  53652 16160 pts/0    SNl  13:31   7:04
>     ruby /usr/lib/one/mads/one_vmm_exec.rb -t 15 -r 0 xen
>     oneadmin 16677 51.8  0.0  39500  4228 pts/0    SNl  13:31   6:12
>     ruby /usr/lib/one/mads/one_im_exec.rb xen
>     oneadmin 16687 56.4  0.0  39388  4040 pts/0    SNl  13:31   6:46
>     ruby /usr/lib/one/mads/one_tm.rb -t 15 -d
>     dummy,lvm,shared,qcow2,ssh,vmware,iscsi
>     oneadmin 16697 53.9  0.0  39220  3880 pts/0    SNl  13:31   6:28
>     ruby /usr/lib/one/mads/one_hm.rb
>     oneadmin 16708 53.1  0.0  43520  6780 pts/0    SNl  13:31   6:21
>     ruby /usr/lib/one/mads/one_datastore.rb -t 15 -d fs,vmware,iscsi
>     oneadmin 16718 52.5  0.0  39392  4044 pts/0    SNl  13:31   6:17
>     ruby /usr/lib/one/mads/one_auth_mad.rb --authn
>     ssh,x509,ldap,server_cipher,server_x509
>     oneadmin 16997  1.0  0.0 110212  1164 pts/0    R+   13:43   0:00
>     ps -aux
>     oneadmin 16998  0.0  0.0 103116   908 pts/0    S+   13:43   0:00 more
>
>     When I stop one, the load average on the server returns to normal
>     with over 98% idle CPU.  I can't seem to find anything bad in the
>     logs:
>     Thu Jul  5 13:47:24 2012 [VMM][I]: --Mark--
>     Thu Jul  5 13:47:50 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:47:50 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:47:50 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:48:19 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:48:19 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:48:19 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:48:48 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:48:48 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:48:48 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:49:17 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:49:17 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:49:17 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:49:46 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:49:46 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:49:46 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:50:15 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:50:15 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:50:15 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:50:44 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:50:44 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:50:44 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:51:13 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:51:13 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:51:13 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:51:28 2012 [VMM][I]: Monitoring VM 134.
>     Thu Jul  5 13:51:28 2012 [VMM][I]: Monitoring VM 184.
>     Thu Jul  5 13:51:28 2012 [VMM][I]: Monitoring VM 200.
>     Thu Jul  5 13:51:28 2012 [VMM][I]: Monitoring VM 202.
>     Thu Jul  5 13:51:28 2012 [VMM][I]: Monitoring VM 206.
>     Thu Jul  5 13:51:32 2012 [VMM][I]: Monitoring VM 123.
>     Thu Jul  5 13:51:32 2012 [VMM][I]: Monitoring VM 130.
>     Thu Jul  5 13:51:32 2012 [VMM][I]: Monitoring VM 162.
>     Thu Jul  5 13:51:32 2012 [VMM][I]: Monitoring VM 186.
>     Thu Jul  5 13:51:32 2012 [VMM][I]: Monitoring VM 208.
>     Thu Jul  5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.31 (6)
>     Thu Jul  5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.32 (7)
>     Thu Jul  5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.33 (9)
>     Thu Jul  5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.34 (10)
>     Thu Jul  5 13:51:36 2012 [VMM][I]: Monitoring VM 127.
>     Thu Jul  5 13:51:36 2012 [VMM][I]: Monitoring VM 141.
>     Thu Jul  5 13:51:36 2012 [VMM][I]: Monitoring VM 146.
>     Thu Jul  5 13:51:36 2012 [VMM][I]: Monitoring VM 190.
>     Thu Jul  5 13:51:36 2012 [VMM][I]: Monitoring VM 201.
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: LOG I 202
>     ExitCode: 0
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
>     202 NAME=one-202 STATE=a USEDCPU=0.3 USEDMEMORY=4197164 NETTX=5147
>     NETRX=24499
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: LOG I 206
>     ExitCode: 0
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: LOG I 184
>     ExitCode: 0
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: LOG I 134
>     ExitCode: 0
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
>     206 NAME=one-206 STATE=a USEDCPU=0.3 USEDMEMORY=4197164
>     NETTX=37568 NETRX=640851
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
>     184 NAME=one-184 STATE=a USEDCPU=2.2 USEDMEMORY=4197164
>     NETTX=1220134 NETRX=496270
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
>     134 NAME=one-134 STATE=a USEDCPU=0.9 USEDMEMORY=8391468 NETTX=665
>     NETRX=1451
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: LOG I 200
>     ExitCode: 0
>
>     Thu Jul  5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
>     200 NAME=one-200 STATE=a USEDCPU=10.3 USEDMEMORY=8392616
>     NETTX=265144 NETRX=370045
>
>     Thu Jul  5 13:51:39 2012 [InM][I]: ExitCode: 0
>     Thu Jul  5 13:51:39 2012 [InM][D]: Host 6 successfully monitored.
>     Thu Jul  5 13:51:39 2012 [InM][I]: ExitCode: 0
>     Thu Jul  5 13:51:39 2012 [InM][I]: ExitCode: 0
>     Thu Jul  5 13:51:39 2012 [InM][D]: Host 7 successfully monitored.
>     Thu Jul  5 13:51:39 2012 [InM][D]: Host 9 successfully monitored.
>     Thu Jul  5 13:51:39 2012 [InM][I]: ExitCode: 0
>     Thu Jul  5 13:51:39 2012 [InM][D]: Host 10 successfully monitored.
>     Thu Jul  5 13:51:40 2012 [VMM][I]: Monitoring VM 120.
>     Thu Jul  5 13:51:40 2012 [VMM][I]: Monitoring VM 143.
>     Thu Jul  5 13:51:40 2012 [VMM][I]: Monitoring VM 191.
>     Thu Jul  5 13:51:42 2012 [ReM][D]: HostPoolInfo method invoked
>     Thu Jul  5 13:51:42 2012 [ReM][D]: VirtualMachinePoolInfo method
>     invoked
>     Thu Jul  5 13:51:42 2012 [ReM][D]: AclInfo method invoked
>     Thu Jul  5 13:51:42 2012 [VMM][D]: Message received: LOG I 208
>     ExitCode: 0
>
>     Thu Jul  5 13:51:42 2012 [VMM][D]: Message received: POLL SUCCESS
>     208 NAME=one-208 STATE=a USEDCPU=3.7 USEDMEMORY=8391468
>     NETTX=250167 NETRX=162294
>
>     The running VMs are not impacted in any way -- we have resorted to
>     leaving one stopped until we can resolve the issue.  What
>     can/should I look at to begin debugging this problem?
>
>     Thanks,
>     Marshall
>
>     ------------------------------------------------------------------------
>     _______________________________________________
>     Users mailing list
>     Users at lists.opennebula.org
>     http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>
>
>


-- 
----------------------------
Alberto Zuin
via Mare, 36/A
36030 Lugo di Vicenza (VI)
Italy
P.I. 04310790284
Tel. +39.0499271575
Fax. +39.0492106654
Cell. +39.3286268626
www.azns.it - alberto at azns.it


