[one-users] Fixing high cpu load due to leap second bug (Was: Sunstone does not load any stats)

Jhon Masschelein jhon.masschelein at sara.nl
Thu Jul 5 01:24:06 PDT 2012


Hi,

As this is apparently not yet common knowledge on this list: rebooting 
is usually not required to fix the leap second bug (unless the system 
has become unresponsive and you need to reset it, of course).
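Before choosing between a reboot and the clock reset that follows, it may help to confirm that a box really shows this pattern. The POSIX shell sketch below rests on two assumptions: that 2012-era Linux kernels log the event as "Clock: inserting leap second 23:59:60 UTC", and that ps is the procps version shipped with CentOS. The function names are hypothetical, made up for this example; the two filters read stdin, so they can also be run against saved dmesg or ps output.

```shell
# Hypothetical helpers for spotting the post-leap-second CPU spin.
# Assumptions: kernel log wording of 2012-era Linux kernels and a
# procps-style `ps` (as shipped with CentOS).

# Succeeds if the kernel log on stdin reports an inserted leap second.
leap_second_logged() {
    grep -q 'Clock: inserting leap second'
}

# Reads "PID %CPU COMMAND ARGS..." lines and prints only the ruby
# processes, busiest first, keeping the full command line.
busiest_ruby() {
    awk '$3 ~ /ruby/' | LC_ALL=C sort -k2,2 -rn
}

# Convenience wrapper for a live system (dmesg may need root).
check_leap_second_spin() {
    if dmesg 2>/dev/null | leap_second_logged; then
        echo "kernel log: leap second was inserted"
    else
        echo "kernel log: no leap second message (the log may have wrapped)"
    fi
    # pid=,pcpu=,args= selects the columns and suppresses the header.
    ps -eo pid=,pcpu=,args= | busiest_ruby | head -n 5
}
```

If the kernel message is present and a ruby process sits near 100 percent CPU while otherwise idle, the clock reset is worth trying before a reboot.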

On all my servers (cloud and others) I was able to fix this by issuing 
the command:

date -s "`date`"

(Credit for this fix goes to the tweakers.net site, but I cannot find 
the exact page link anymore.)
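The one-liner above feeds the text printed by date back into date -s. As widely reported at the time, setting the clock, even to the value it already shows, makes the kernel recompute its timer offsets, which stops the prematurely firing timers that kept processes such as ruby busy. The sketch below is a variant, not the command from the original tip: it passes epoch seconds instead (GNU date accepts "@SECONDS"), which avoids depending on date's localized text output being parseable. It assumes GNU coreutils date and root privileges; the name reset_clock and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical wrapper around the clock-reset fix.
# Assumes GNU coreutils `date` (which accepts "@EPOCH_SECONDS") and
# root privileges. Set DRY_RUN=1 to print the command instead of
# changing the clock.
reset_clock() {
    # Current time in whole seconds since the epoch. Setting the clock
    # to this value drops the sub-second part, so the clock moves back
    # by less than one second; a running ntpd should correct that.
    now=$(date +%s) || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: date -s @$now"
        return 0
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "reset_clock: must be run as root" >&2
        return 1
    fi
    date -s "@$now"
}
```

To try it, save it as e.g. reset_clock.sh, check the dry run with DRY_RUN=1 sh -c '. ./reset_clock.sh && reset_clock', then run it for real with sudo sh -c '. ./reset_clock.sh && reset_clock'.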

Kind regards,

Jhon

On 07/05/2012 09:24 AM, Tao Craig wrote:
> Nice catch, Rolandas!
>
> I'm sorry to clog up the list with such simple fixes, but I guess this 
> just goes to show anyone who may be reading this that sometimes the 
> fix really is as simple as a reboot. I should have done that a long 
> time ago, but I got distracted with all the details.
>
> Anyway, I manually forced ntp to resync and, when that had no effect, I
> rebooted both the private and public cloud controllers. Sure enough,
> everything is nice and fast and stable now... ruby CPU usage went from
> 100+ percent down to 0.3 percent and all my graphs are online.
>
> Thanks again, everyone!
>
> ----- Original Message ----- From: "Rolandas Naujikas" 
> <rolandas.naujikas at mif.vu.lt>
> To: <users at lists.opennebula.org>
> Cc: "Tao Craig" <tao at leadmesh.com>; "Hector Sanjuan" 
> <hsanjuan at opennebula.org>
> Sent: Wednesday, July 04, 2012 10:34 PM
> Subject: Re: [one-users] Sunstone does not load any stats. (Users 
> Digest, Vol 53, Issue 14)
>
>
>> Hi,
>>
>> If this started on Monday, after Saturday's leap second, then it
>> could be related to
>> http://it.slashdot.org/story/12/07/01/1920217/leap-second-bug-causes-crashes.
>> Our OpenNebula server also had an increased load (to ~20) after
>> that. A reboot helped.
>>
>> Regards, Rolandas Naujikas
>>
>> On 2012-07-05 05:48, Tao Craig wrote:
>>> Hi Hector,
>>>
>>> At first, I see the orange spinning balls... then, after some time,
>>> they are replaced with "undefined". I also noticed that ruby seems to be
>>> fairly stable until I try to load these graphs. Then one ruby script
>>> will jump to 100+ percent CPU usage and pretty much stay there.
>>> Sometimes I will get a "Could not connect..." alert during this time
>>> and ruby will return to normal.
>>>
>>> I see output like this when I trace the PID of the ruby script:
>>>
>>> rt_sigreturn(0x1a)                      = 121
>>> --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
>>>
>>> sunstone.log:
>>>
>>> Wed Jul 04 19:32:04 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:04]
>>> "GET /vmtemplate?timeout=true HTTP/1.1" 200 3907 53.3096
>>> Wed Jul 04 19:32:08 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:07]
>>> "GET /acl?timeout=true HTTP/1.1" 200 377 33.8135
>>> Wed Jul 04 19:32:12 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:11]
>>> "GET /vnet?timeout=true HTTP/1.1" 200 649 39.9651
>>> Wed Jul 04 19:32:22 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:22]
>>> "GET /datastore?timeout=true HTTP/1.1" 200 2335 55.5165
>>> Wed Jul 04 19:32:28 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:28]
>>> "GET /user?timeout=true HTTP/1.1" 200 1505 44.8972
>>> Wed Jul 04 19:32:41 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:41]
>>> "GET /cluster?timeout=true HTTP/1.1" 200 27 26.3875
>>> Wed Jul 04 19:32:44 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:44]
>>> "GET /image?timeout=true HTTP/1.1" 200 2957 65.6978
>>> Wed Jul 04 19:32:48 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:48]
>>> "GET /host?timeout=true HTTP/1.1" 200 11110 139.9275
>>> Wed Jul 04 19:32:51 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:51]
>>> "GET /group?timeout=true HTTP/1.1" 200 796 36.0109
>>> Wed Jul 04 19:32:58 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:58]
>>> "GET /datastore?timeout=true HTTP/1.1" 200 2335 56.8328
>>>
>>> sunstone.error is empty, except for this:
>>>
>>> == Sinatra/1.3.2 has taken the stage on 9869 for development with backup from Thin
>>>
>>> Output of "gem list":
>>>
>>> addressable (2.2.8)
>>> amazon-ec2 (0.9.17)
>>> bcrypt-ruby (3.0.1)
>>> curb (0.8.0)
>>> daemons (1.1.8)
>>> data_mapper (1.2.0)
>>> data_objects (0.10.8)
>>> dm-aggregates (1.2.0)
>>> dm-constraints (1.2.0)
>>> dm-core (1.2.0)
>>> dm-do-adapter (1.2.0)
>>> dm-migrations (1.2.0)
>>> dm-mysql-adapter (1.2.0)
>>> dm-serializer (1.2.1)
>>> dm-sqlite-adapter (1.2.0)
>>> dm-timestamps (1.2.0)
>>> dm-transactions (1.2.0)
>>> dm-types (1.2.1)
>>> dm-validations (1.2.0)
>>> do_mysql (0.10.8)
>>> do_sqlite3 (0.10.8)
>>> eventmachine (0.12.10)
>>> fastercsv (1.5.4)
>>> json (1.7.0, 1.6.7)
>>> json_pure (1.6.7)
>>> multi_json (1.0.4)
>>> mysql (2.8.1)
>>> net-ldap (0.3.1)
>>> nokogiri (1.5.2)
>>> rack (1.4.1)
>>> rack-protection (1.2.0)
>>> rake (0.8.7)
>>> sequel (3.35.0)
>>> sinatra (1.3.2)
>>> sqlite3 (1.3.6)
>>> stringex (1.3.3)
>>> thin (1.3.1)
>>> tilt (1.3.3)
>>> uuidtools (2.1.2)
>>> xml-simple (1.1.1)
>>>
>>> I am running Sunstone and OpenNebula on the same box... it does seem to
>>> be ruby related, but I never had a problem until Monday. Prior to that,
>>> nothing had changed on my end. I just came into the office on Monday and
>>> discovered I could not log in to the older, private cloud, and the public
>>> cloud was very slow. Upgrading the public cloud seemed to help (aside
>>> from the issues mentioned above), but I can't upgrade the private cloud
>>> just yet, and I would rather identify the source of this problem first.
>>>
>>> The CLI is very fast and responsive, and other tools, such as the VNC
>>> console, work fine.
>>>
>>> I haven't really been using the self-service portal (although I would
>>> like to in the future), but when I try to start it, I get the following
>>> error:
>>>
>>> Wed Jul 04 19:43:17 2012 [E]: Error initializing authentication system
>>> Wed Jul 04 19:43:17 2012 [E]: [UserPoolInfo] User couldn't be authenticated, aborting call.
>>>
>>> Thanks again for your help.
>>>
>>> ----- Original Message ----- From: "Hector Sanjuan"
>>> <hsanjuan at opennebula.org>
>>> To: <users at lists.opennebula.org>; "Tao Craig" <tao at leadmesh.com>
>>> Sent: Wednesday, July 04, 2012 4:06 PM
>>> Subject: Re: [one-users] Sunstone does not load any stats.
>>>
>>>
>>> Hi,
>>>
>>>> monitoring graphs on my hosts and virtual machines no longer appear.
>>>
>>> Is there an empty graph in its place, or is there an error message? If
>>> you can attach sunstone.log and sunstone.error (if not empty) after
>>> trying to see those graphs, perhaps I can spot something...
>>>
>>> It's not normal for the dashboard to take 30 seconds to load. I guess
>>> the CLI is not as slow when issuing a listing command (onehost list,
>>> onevm list, etc.), right?
>>>
>>> And which ruby script exactly is consuming 100%? (Grep for the PID in
>>> 'ps aux', or press 'c' while 'top' is running to see the full command.)
>>>
>>> If you have this long-wait problem in two different clouds and ruby is
>>> consuming so much CPU, I would suspect an issue with your boxes'
>>> configuration, perhaps related to ruby. What's the output of 'gem list'?
>>> Are you running Sunstone and OpenNebula on the same box? Have you tried
>>> the Self-Service interface? Is it as slow as well?
>>>
>>> Hector
>>>
>>>
>>> On Thu, 05 Jul 2012 00:40:19 +0200, Tao Craig <tao at leadmesh.com>
>>> wrote:
>>>
>>>> Hector,
>>>>
>>>> Thanks for the prompt reply. I am ashamed to admit that the browser
>>>> cache was the problem in this case. The dashboard still takes about 30
>>>> seconds to load, but at least it is loading now. I noticed a few other
>>>> minor issues, though, that I cannot track down in my logs. For example,
>>>> the monitoring graphs on my hosts and virtual machines no longer
>>>> appear.
>>>>
>>>> ... any advice?
>>>>
>>>> Part of the reason I didn't catch the browser cache issue earlier is
>>>> that I have a second CentOS/KVM cloud running version 3.2.0, and its
>>>> dashboard recently stopped loading as well. This was not fixed by
>>>> clearing my browser cache. Eventually, I get a "Could not connect..."
>>>> alert and the page never finishes loading. During this time, a ruby
>>>> script is consuming 100+ percent of CPU resources. When I kill this
>>>> script, the cloud is still functional, but Sunstone is no longer
>>>> running.
>>>>
>>>> The logs all appear normal as far as I can tell, and all CLI commands
>>>> work without error. Any suggestions here would be greatly appreciated
>>>> as well.
>>>>
>>>> Thanks.
>>>> ----- Original Message ----- From: "Hector Sanjuan"
>>>> <hsanjuan at opennebula.org>
>>>> To: <users at lists.opennebula.org>
>>>> Sent: Wednesday, July 04, 2012 4:17 AM
>>>> Subject: Re: [one-users] Sunstone does not load any stats.
>>>>
>>>>
>>>>> Hello,
>>>>>
>>>>> Can you try clearing your browser's cache and see if that fixes it?
>>>>>
>>>>> Hector
>>>>>
>>>>> On Wed, 04 Jul 2012 02:10:34 +0200, Tao Craig <tao at leadmesh.com>
>>>>> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> I recently upgraded my CentOS OpenNebula installation from 3.4 to
>>>>>> 3.6 (Lagoon).
>>>>>>
>>>>>> Prior to the upgrade, I noticed my Sunstone dashboard was loading
>>>>>> slowly on login (the page would load fine, but it took a while to
>>>>>> load the graphs, number of hosts, etc.). I saw there were some
>>>>>> improvements to the Sunstone dashboard in this upgrade, so I
>>>>>> applied it hoping it would help. Now, my Sunstone dashboard doesn't
>>>>>> load any stats or graphs... I just see those spinning orange dots,
>>>>>> and the rest of the Sunstone interface does not work either (I'm
>>>>>> assuming because this information is never gathered).
>>>>>>
>>>>>> There are no errors in my logs anywhere that I can find. The only
>>>>>> thing I am noticing is that ruby scripts are consuming a large
>>>>>> amount of CPU resources.
>>>>>>
>>>>>> If it helps, I am currently running 13 virtual machines on 9 hosts
>>>>>> and all "one" CLI commands work fine.
>>>>>>
>>>>>> Any help would be appreciated.
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>>
>>>>> -- Hector Sanjuan
>>>>> OpenNebula Developer
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users at lists.opennebula.org
>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>
>>>>
>>>
>>>
>>
>>
>

-- 
Jhon Masschelein
Senior Systems Programmer
SARA - HPCV

Science Park 140
1098 XG Amsterdam
T +31 (0)20 592 8099
F +31 (0)20 668 3167
M +31 (0)6 4748 9328
E jhon.masschelein at sara.nl
http://www.sara.nl





