<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=Windows-1252" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 9.00.8112.16446">
<STYLE></STYLE>
</HEAD>
<BODY
style="FONT-FAMILY: Calibri, sans-serif; WORD-WRAP: break-word; COLOR: rgb(0,0,0); FONT-SIZE: 14px; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space"
bgColor=#ffffff>
<DIV><FONT size=2 face=Arial>Hi Marshall,</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>I think this could be related to the "leap second
bug". Did you notice any lag prior to this past weekend?</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>If not, try issuing this command on your cloud
controller(s): date -s "`date`"</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>Alternatively, you can reboot. I had an almost
identical issue, and it appears to have been related to the "leap second bug". A
reboot fixed it for me, but some people have had success with the date
command.</FONT></DIV>
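<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>For what it's worth, one way to judge whether a
machine is a candidate is to check whether it has been up continuously since
before the leap second, which was inserted at 23:59:60 UTC on June 30, 2012.
Below is a rough Python sketch of that check, not a definitive diagnostic. The
function names are my own, the sample figures come from the top output in your
message (up 48 days, 17:07), and I'm assuming UTC because the time zone isn't
shown. It also can't tell whether the clock has already been reset with the
date command.</FONT></DIV>
<PRE>
```python
from datetime import datetime, timedelta, timezone

# First instant after the leap second inserted at 2012-06-30 23:59:60 UTC.
LEAP_SECOND_END = datetime(2012, 7, 1, 0, 0, 0, tzinfo=timezone.utc)


def booted_before_leap_second(now: datetime, uptime: timedelta) -> bool:
    """Return True if the host has been up since before the leap second."""
    boot_time = now - uptime
    return LEAP_SECOND_END > boot_time


def read_uptime(path: str = "/proc/uptime") -> timedelta:
    """Read uptime from a Linux /proc/uptime file (first field is seconds)."""
    with open(path) as f:
        return timedelta(seconds=float(f.read().split()[0]))


# Sample values taken from the top output in the quoted message.
# Assumption: the time zone is not stated there, so UTC is used here.
example_now = datetime(2012, 7, 5, 13, 47, 21, tzinfo=timezone.utc)
example_uptime = timedelta(days=48, hours=17, minutes=7)
print(booted_before_leap_second(example_now, example_uptime))
```
</PRE>
<DIV><FONT size=2 face=Arial>On a live Linux host you could pass
datetime.now(timezone.utc) and read_uptime() instead of the sample
values.</FONT></DIV>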
<BLOCKQUOTE
style="BORDER-LEFT: #000000 2px solid; PADDING-LEFT: 5px; PADDING-RIGHT: 0px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>
<DIV
style="FONT: 10pt arial; BACKGROUND: #e4e4e4; font-color: black"><B>From:</B>
<A title=mgrillos@optimalpath.com
href="mailto:mgrillos@optimalpath.com">Marshall Grillos</A> </DIV>
<DIV style="FONT: 10pt arial"><B>To:</B> <A title=users@lists.opennebula.org
href="mailto:users@lists.opennebula.org">users@lists.opennebula.org</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Cc:</B> <A title=rwolf@optimalpath.com
href="mailto:rwolf@optimalpath.com">Rusty Wolf</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Sent:</B> Thursday, July 05, 2012 11:53
AM</DIV>
<DIV style="FONT: 10pt arial"><B>Subject:</B> [one-users] OpenNebula Load
Average/CPU Usage</DIV>
<DIV><BR></DIV>
<DIV>My company is running OpenNebula 3.4. It has been running in
production for just over a month. Recently we started noticing
issues in Sunstone: the VM list and several other
parts of the GUI would not load. I restarted Sunstone and
oned, but the problem persists.</DIV>
<DIV><BR></DIV>
<DIV>One item of note is the high CPU utilization of several of OpenNebula's
processes. Here are the top details from our cloud controller (this
server also serves up the shared data for our VMs via NFS):</DIV>
<DIV>
<DIV>
<DIV>
<DIV>top - 13:47:21 up 48 days, 17:07, 1 user, load average: 7.78,
5.09, 2.69</DIV>
<DIV>Tasks: 305 total, 2 running, 303 sleeping, 0 stopped,
0 zombie</DIV>
<DIV>Cpu(s): 0.0%us, 52.2%sy, 6.5%ni, 39.1%id, 0.0%wa,
0.0%hi, 2.2%si, 0.0%st</DIV>
<DIV>Mem: 32865312k total, 32622008k used, 243304k free,
65196k buffers</DIV>
<DIV>Swap: 16383992k total, 0k used, 16383992k
free, 30481224k cached</DIV>
<DIV><BR></DIV>
<DIV> PID USER PR NI VIRT RES
SHR S %CPU %MEM TIME+ COMMAND
</DIV>
<DIV>16664 oneadmin 39 19 53652 15m 1548 S 69.8 0.0
9:37.50 ruby
</DIV>
<DIV>16718 oneadmin 39 19 39392 4044 1452 S 67.8 0.0
8:49.46 ruby
</DIV>
<DIV>16697 oneadmin 39 19 39220 3880 1452 S 67.1 0.0
9:04.05 ruby
</DIV>
<DIV>16687 oneadmin 39 19 39388 4040 1452 S 65.1 0.0
9:20.84 ruby
</DIV>
<DIV>16677 oneadmin 39 19 39500 4228 1464 S 54.2 0.0
8:13.68 ruby
</DIV>
<DIV>16708 oneadmin 39 19 43520 6780 1524 S 42.2 0.0
8:09.96 ruby </DIV></DIV></DIV></DIV>
<DIV><BR></DIV>
<DIV>Here is the process list for oneadmin on the cloud controller:</DIV>
<DIV>
<DIV>oneadmin 16629 0.0 0.0 108288 1900 pts/0 S
13:31 0:00 bash</DIV>
<DIV>oneadmin 16641 0.0 0.0 1461156 12604 pts/0 Sl
13:31 0:00 /usr/bin/oned -f</DIV>
<DIV>oneadmin 16642 0.0 0.0 192200 5688 pts/0
Sl 13:31 0:00 /usr/bin/mm_sched</DIV>
<DIV>oneadmin 16664 58.9 0.0 53652 16160 pts/0 SNl
13:31 7:04 ruby /usr/lib/one/mads/one_vmm_exec.rb -t 15 -r 0
xen</DIV>
<DIV>oneadmin 16677 51.8 0.0 39500 4228 pts/0
SNl 13:31 6:12 ruby /usr/lib/one/mads/one_im_exec.rb
xen</DIV>
<DIV>oneadmin 16687 56.4 0.0 39388 4040 pts/0
SNl 13:31 6:46 ruby /usr/lib/one/mads/one_tm.rb -t 15 -d
dummy,lvm,shared,qcow2,ssh,vmware,iscsi</DIV>
<DIV>oneadmin 16697 53.9 0.0 39220 3880 pts/0
SNl 13:31 6:28 ruby /usr/lib/one/mads/one_hm.rb</DIV>
<DIV>oneadmin 16708 53.1 0.0 43520 6780 pts/0
SNl 13:31 6:21 ruby /usr/lib/one/mads/one_datastore.rb -t
15 -d fs,vmware,iscsi</DIV>
<DIV>oneadmin 16718 52.5 0.0 39392 4044 pts/0
SNl 13:31 6:17 ruby /usr/lib/one/mads/one_auth_mad.rb
--authn ssh,x509,ldap,server_cipher,server_x509</DIV>
<DIV>oneadmin 16997 1.0 0.0 110212 1164 pts/0
R+ 13:43 0:00 ps -aux</DIV>
<DIV>oneadmin 16998 0.0 0.0 103116 908 pts/0
S+ 13:43 0:00 more</DIV></DIV>
<DIV><BR></DIV>
<DIV>When I stop one, the load average on the server returns to normal with
over 98% idle CPU. I can't seem to find anything bad in the logs:</DIV>
<DIV>
<DIV>
<DIV>Thu Jul 5 13:47:24 2012 [VMM][I]: --Mark--</DIV>
<DIV>Thu Jul 5 13:47:50 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:47:50 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:47:50 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:48:19 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:48:19 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:48:19 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:48:48 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:48:48 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:48:48 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:49:17 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:49:17 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:49:17 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:49:46 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:49:46 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:49:46 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:50:15 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:50:15 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:50:15 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:50:44 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:50:44 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:50:44 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:51:13 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:51:13 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:51:13 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 134.</DIV>
<DIV>Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 184.</DIV>
<DIV>Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 200.</DIV>
<DIV>Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 202.</DIV>
<DIV>Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 206.</DIV>
<DIV>Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 123.</DIV>
<DIV>Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 130.</DIV>
<DIV>Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 162.</DIV>
<DIV>Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 186.</DIV>
<DIV>Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 208.</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.31
(6)</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.32
(7)</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.33
(9)</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.34
(10)</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 127.</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 141.</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 146.</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 190.</DIV>
<DIV>Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 201.</DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 202
ExitCode: 0</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
202 NAME=one-202 STATE=a USEDCPU=0.3 USEDMEMORY=4197164 NETTX=5147
NETRX=24499</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 206
ExitCode: 0</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 184
ExitCode: 0</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 134
ExitCode: 0</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
206 NAME=one-206 STATE=a USEDCPU=0.3 USEDMEMORY=4197164 NETTX=37568
NETRX=640851</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
184 NAME=one-184 STATE=a USEDCPU=2.2 USEDMEMORY=4197164 NETTX=1220134
NETRX=496270</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
134 NAME=one-134 STATE=a USEDCPU=0.9 USEDMEMORY=8391468 NETTX=665
NETRX=1451</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 200
ExitCode: 0</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS
200 NAME=one-200 STATE=a USEDCPU=10.3 USEDMEMORY=8392616 NETTX=265144
NETRX=370045</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][D]: Host 6 successfully
monitored.</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][D]: Host 7 successfully
monitored.</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][D]: Host 9 successfully
monitored.</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0</DIV>
<DIV>Thu Jul 5 13:51:39 2012 [InM][D]: Host 10 successfully
monitored.</DIV>
<DIV>Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 120.</DIV>
<DIV>Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 143.</DIV>
<DIV>Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 191.</DIV>
<DIV>Thu Jul 5 13:51:42 2012 [ReM][D]: HostPoolInfo method invoked</DIV>
<DIV>Thu Jul 5 13:51:42 2012 [ReM][D]: VirtualMachinePoolInfo method
invoked</DIV>
<DIV>Thu Jul 5 13:51:42 2012 [ReM][D]: AclInfo method invoked</DIV>
<DIV>Thu Jul 5 13:51:42 2012 [VMM][D]: Message received: LOG I 208
ExitCode: 0</DIV>
<DIV><BR></DIV>
<DIV>Thu Jul 5 13:51:42 2012 [VMM][D]: Message received: POLL SUCCESS
208 NAME=one-208 STATE=a USEDCPU=3.7 USEDMEMORY=8391468 NETTX=250167
NETRX=162294</DIV></DIV></DIV>
<DIV><BR></DIV>
<DIV>The running VMs are not impacted in any way – we have resorted to leaving
one stopped until we can resolve the issue. What can/should I look at to
begin debugging this problem? </DIV>
<DIV><BR></DIV>
<DIV>Thanks,</DIV>
<DIV>Marshall</DIV>
</BLOCKQUOTE></BODY></HTML>