<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" 
xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=WordSection1>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Hi,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>We again ran into issues while many VMs were deployed across many hosts at
the same time and we were deploying more (log excerpts below).<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Over the last two weeks we found more than 25 runaway VMs still
running that OpenNebula had marked as DONE. Deploy, copy and stop operations also
failed at random quite often.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>This is becoming a major problem: we can’t run
OpenNebula in a stable and predictable manner on larger clouds… <o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>We have the following intervals configured; we feel we do need to
monitor more often than every 10 minutes.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>HOST_MONITORING_INTERVAL = 20<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>VM_POLLING_INTERVAL = 30<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>So we first used our snmp driver again, which solved a large
part of the problems, but our cloud is still growing, so we reached the next
limit…<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>What seems to be happening is that “virsh --connect
qemu:///system dominfo” interferes with other virsh commands. Virsh appears to lock libvirt-sock,
so multiple processes cannot connect at the same time.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>The solution we are now trying is to monitor VMs in read-only
mode: “virsh --readonly --connect qemu:///system dominfo”<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>We made this change in the file: /usr/lib/one/mads/one_vmm_kvm.rb<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>As far as we can see, virsh no longer locks
libvirt-sock.<o:p></o:p></span></p>
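<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>For illustration only, here is a minimal Python sketch of the read-only poll described above. It is not the code in one_vmm_kvm.rb; the function names (dominfo_command, parse_dominfo, poll_domain) are hypothetical, and it assumes only that virsh dominfo prints one "Key: value" pair per line.<o:p></o:p></span></p>
<pre>
```python
import subprocess

LIBVIRT_URI = "qemu:///system"  # URI taken from the commands quoted in this mail


def dominfo_command(domain, readonly=True):
    """Build the virsh argument list used to poll one domain."""
    cmd = ["virsh"]
    if readonly:
        # Open the libvirt connection in read-only mode, as in the workaround.
        cmd.append("--readonly")
    cmd += ["--connect", LIBVIRT_URI, "dominfo", domain]
    return cmd


def parse_dominfo(output):
    """Turn 'Key:   value' lines from virsh dominfo into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


def poll_domain(domain):
    """Run the read-only poll; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        dominfo_command(domain), capture_output=True, text=True, check=True
    )
    return parse_dominfo(result.stdout)
```
</pre>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>With this sketch, poll_domain("one-428") would run the same read-only dominfo command and return its fields as a dictionary; whether read-only mode really avoids the contention is still only our observation.<o:p></o:p></span></p>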
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Currently we do not see the error messages we had before, but
some kind of robust, scalable and fail-safe monitoring solution for OpenNebula
is needed.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Hope this helps<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Kind regards,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Floris<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thu Jul 22 16:16:21 2010 [VMM][I]: Command execution fail: virsh
--connect qemu:///system dominfo one-428<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thu Jul 22 16:16:21 2010 [VMM][I]: STDERR follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thu Jul 22 16:16:21 2010 [VMM][I]: error: unable to connect to
'/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Permission
denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thu Jul 22 16:16:21 2010 [VMM][I]: error: failed to connect to
the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thu Jul 22 16:16:21 2010 [VMM][I]: ExitCode: 1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thu Jul 22 16:16:21 2010 [VMM][E]: Error monitoring VM, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>And sometimes destroy would fail:<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [LCM][I]: New VM state is SAVE_STOP<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [VMM][I]: Command execution fail:
'touch /var/lib/one/585/images/checkpoint;virsh --connect qemu:///system save
one-585 /var/lib/one/585/images/checkpoint'<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [VMM][I]: STDERR follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [VMM][I]: error: unable to connect to
'/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Permission
denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [VMM][I]: error: failed to connect to
the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [VMM][I]: ExitCode: 1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:34 2010 [VMM][E]: Error saving VM state, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:35 2010 [LCM][I]: Fail to save VM state.
Assuming that the VM is still RUNNING (will poll VM).<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:38 2010 [VMM][I]: Command execution fail: virsh
--connect qemu:///system dominfo one-585<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:38 2010 [VMM][I]: STDERR follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:38 2010 [VMM][I]: error: unable to connect to
'/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Permission
denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:38 2010 [VMM][I]: error: failed to connect to
the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:38 2010 [VMM][I]: ExitCode: 1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:05:38 2010 [VMM][E]: Error monitoring VM, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>…retried about 10 times…<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:14 2010 [VMM][E]: Error monitoring VM, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [LCM][I]: New VM state is SAVE_STOP<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: Command execution fail:
'touch /var/lib/one/585/images/checkpoint;virsh --connect qemu:///system save
one-585 /var/lib/one/585/images/checkpoint'<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: STDERR follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: error: unable to connect to
'/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Permission
denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: error: failed to connect to
the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: ExitCode: 1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][E]: Error saving VM state, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [LCM][I]: Fail to save VM state.
Assuming that the VM is still RUNNING (will poll VM).<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: Command execution fail: virsh
--connect qemu:///system dominfo one-585<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: STDERR follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: error: unable to connect to
'/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Permission
denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: error: failed to connect to
the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][I]: ExitCode: 1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:09:56 2010 [VMM][E]: Error monitoring VM, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:24 2010 [VMM][I]: Command execution fail: virsh
--connect qemu:///system dominfo one-585<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:24 2010 [VMM][I]: STDERR follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:24 2010 [VMM][I]: error: unable to connect to
'/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Permission
denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:24 2010 [VMM][I]: error: failed to connect to
the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:24 2010 [VMM][I]: ExitCode: 1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:24 2010 [VMM][E]: Error monitoring VM, -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [DiM][I]: New VM state is DONE<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: LOG - 585 Driver
command for 585 cancelled<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [TM][W]: Ignored: LOG - 585
tm_delete.sh: Deleting /var/lib/one/585/images<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [TM][W]: Ignored: LOG - 585
tm_delete.sh: Executed "ssh node13-one rm -rf
/var/lib/one/585/images".<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [TM][W]: Ignored: TRANSFER SUCCESS 585
-<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: LOG - 585 Command
execution fail: virsh --connect qemu:///system destroy one-585<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: LOG - 585 STDERR
follows.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: LOG - 585 error:
unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be
started: Permission denied<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: LOG - 585 error:
failed to connect to the hypervisor<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: LOG - 585 ExitCode:
1<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Wed Jul 28 13:10:45 2010 [VMM][W]: Ignored: CANCEL FAILURE 585 -<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
users-bounces@lists.opennebula.org [mailto:users-bounces@lists.opennebula.org] <b>On
Behalf Of </b>Floris Sluiter<br>
<b>Sent:</b> maandag 19 juli 2010 18:18<br>
<b>To:</b> 'Tino Vazquez'; DuDu<br>
<b>Cc:</b> users@lists.opennebula.org<br>
<b>Subject:</b> Re: [one-users] oned hang<o:p></o:p></span></p>
</div>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Hi Dudu, Tino and all,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>We saw the exact same messages (</span>Command execution
fail and bad interpreter: Text file busy)<span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:#1F497D'> on our cluster last week when
we expanded it from 12 to 16 hosts (with add host) and deployed 10 VMs at
the same time. We did not have multiple instances of OpenNebula running; we
only added hosts to a running one, so that is unlikely to have been the issue (the
cluster had already been running stably for a while). We investigated and suspected it
was a timing issue, with the monitoring (ssh) driver set to 60 seconds and
many hosts and many VMs. <o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>We had started using the ssh monitoring driver again after the
latest update to OpenNebula; before that we used our in-house developed SNMP
monitoring driver. <o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Once we redeployed our SNMP driver, the error messages stopped, and
for the last week we have had a stable cloud again, now with 16 hosts…<o:p></o:p></span></p>
<p><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>For
people who think they are seeing the same timing issues as we did, the snmp_driver is
available in the ecosystem (but make sure you know what SNMP is before you try
;-)): </span><span style='font-size:9.5pt;font-family:"Verdana","sans-serif";
color:#484848'><a href="http://opennebula.org/software:ecosystem:snmp_im_driver">http://opennebula.org/software:ecosystem:snmp_im_driver</a><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Regards,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Floris <o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>HPC project leader<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Sara<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
users-bounces@lists.opennebula.org [mailto:users-bounces@lists.opennebula.org] <b>On
Behalf Of </b>Tino Vazquez<br>
<b>Sent:</b> maandag 19 juli 2010 16:15<br>
<b>To:</b> DuDu<br>
<b>Cc:</b> users@lists.opennebula.org<br>
<b>Subject:</b> Re: [one-users] oned hang<o:p></o:p></span></p>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal>Dear DuDu,<o:p></o:p></p>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>This happens when two monitoring actions take place at
the same time.<o:p></o:p></p>
</div>
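<div>
<p class=MsoNormal>As a rough illustration of how such a collision could be avoided: "bad interpreter: Text file busy" is what the shell reports when a script is executed while it is still open for writing. The Python sketch below assumes the monitoring probe is written to a path and then executed from that same path; it writes to a private temporary file first and renames it into place, so the final path never refers to a file that is still being written. This is not OpenNebula's driver code, and install_script and run_script are hypothetical names.<o:p></o:p></p>
<pre>
```python
import os
import stat
import subprocess
import tempfile


def install_script(body, target_path):
    """Write body to a private temp file, then rename it onto target_path."""
    directory = os.path.dirname(target_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".probe-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        # The file is closed here, so nobody holds it open for writing.
        os.chmod(tmp_path, stat.S_IRWXU)
        # Rename within one directory is atomic on POSIX file systems.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target_path


def run_script(body, target_path):
    """Install the script and execute it, returning its standard output."""
    install_script(body, target_path)
    result = subprocess.run(
        [target_path], capture_output=True, text=True, check=True
    )
    return result.stdout
```
</pre>
<p class=MsoNormal>Two runs that install the same probe at the same time each write their own temporary file, so neither executes a file the other is still writing.<o:p></o:p></p>
</div>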
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>First thing, which OpenNebula version are you using?<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>Are you by any chance running two OpenNebula instances? Did you
change the host polling time?<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>Regards,<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>-Tino<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal style='margin-bottom:12.0pt'><br clear=all>
--<br>
Constantino Vázquez Blanco | <a href="http://dsa-research.org/tinova">dsa-research.org/tinova</a><br>
Virtualization Technology Engineer / Researcher<br>
OpenNebula Toolkit | <a href="http://opennebula.org">opennebula.org</a><o:p></o:p></p>
<div>
<p class=MsoNormal>On Wed, Jul 14, 2010 at 3:13 PM, DuDu &lt;<a
href="mailto:blackass@gmail.com">blackass@gmail.com</a>&gt; wrote:<o:p></o:p></p>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>Hi,<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>We deployed a small OpenNebula cluster with 8 hosts, using
the default OpenNebula installation. After several days of running, oned
hung. All CLI commands hang too, and no new entries are written to
one_xmlrpc.log. There are quite a few error messages like the following in
oned.log:<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>[root@vm-container-31-0 logdir]# tail oned.log<br>
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed:
xauth key data not generated<br>
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake
authentication data for X11 forwarding.<br>
Wed Jul 14 14:51:02 2010 [InM][I]: bash:
/tmp/one-im//one_im-c4718299a313d89398ea693104dcce5f: /bin/sh: bad interpreter:
Text file busy<br>
Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126<br>
Wed Jul 14 14:51:02 2010 [InM][I]: Command execution fail: 'mkdir -p
/tmp/one-im/; cat > /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; if
[ "x$?" != "x0" ]; then exit -1; fi; chmod +x
/tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822;
/tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822'<br>
Wed Jul 14 14:51:02 2010 [InM][I]: STDERR follows.<br>
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup
failed: xauth key data not generated<br>
Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake
authentication data for X11 forwarding.<br>
Wed Jul 14 14:51:02 2010 [InM][I]: bash:
/tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822: /bin/sh: bad interpreter:
Text file busy<br>
Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>We have to SIGKILL oned and restart it, which resolves all
the problems.<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>Any idea what causes this?<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal style='margin-bottom:12.0pt'>Thanks!<o:p></o:p></p>
</div>
<p class=MsoNormal style='margin-bottom:12.0pt'><br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@lists.opennebula.org">Users@lists.opennebula.org</a><br>
<a href="http://lists.opennebula.org/listinfo.cgi/users-opennebula.org"
target="_blank">http://lists.opennebula.org/listinfo.cgi/users-opennebula.org</a><o:p></o:p></p>
</div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
</div>
</body>
</html>