Hi Fabian,<br><br>The automatic reconnection bug was fixed for OpenNebula 3.0 [1].<br>I agree that a lost DB connection could lead to inconsistencies, but I don't think that's the case here; otherwise everything would stop working, even the onehost list command.<br>
<br>Regards.<br><br>[1] <a href="http://dev.opennebula.org/issues/408">http://dev.opennebula.org/issues/408</a><br clear="all"><span style="border-collapse:collapse;color:rgb(136, 136, 136);font-family:arial,sans-serif;font-size:13px">--<br>
Carlos Martín, MSc</span><font color="#888888"><br>Project Engineer</font><br><span style="border-collapse:collapse;color:rgb(136, 136, 136);font-family:arial, sans-serif;font-size:13px">OpenNebula - The Open Source Toolkit for Cloud Computing<br>
<a href="http://www.opennebula.org/" style="color:rgb(42, 93, 176)" target="_blank">www.OpenNebula.org</a> | <a href="mailto:cmartin@opennebula.org" style="color:rgb(42, 93, 176)" target="_blank">cmartin@opennebula.org</a></span><br>
<br><br><div class="gmail_quote">On Tue, Sep 27, 2011 at 4:14 PM, Fabian Wenk <span dir="ltr"><<a href="mailto:fabian@wenks.ch">fabian@wenks.ch</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hello<div class="im"><br>
<br>
On 26.09.2011 13:44, Carlos Martín Sánchez wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The host_shares table contains the "running_vms" column; you need to update that<br>
column's value while OpenNebula is stopped.<br>
<br>
We are still trying to figure out what causes this bug, so if you come<br>
across it again, it would be great if you could write down the operations<br>
that led to it.<br>
</blockquote>
<br></div>
I do not know if this is related or not, but I guess it could be an indication.<br>
<br>
I am running OpenNebula 2.2.1 with a MySQL database. I just restarted mysqld, and now all the one* commands report errors like this:<br>
<br>
# onevm list<br>
[VirtualMachinePoolInfo] Error getting VM Pool.<br>
<br>
In oned.log I see the following messages (regarding the 'onevm list' command):<br>
<br>
Tue Sep 27 13:47:20 2011 [ReM][D]: VirtualMachinePoolInfo method invoked<br>
Tue Sep 27 13:47:20 2011 [ONE][E]: SQL command was: SELECT vm_pool.oid, vm_pool.uid, vm_pool.name, vm_pool.last_poll, vm_pool.state, vm_pool.lcm_state, vm_pool.stime, vm_pool.etime, vm_pool.deploy_id, vm_pool.memory, vm_pool.cpu, vm_pool.net_tx, vm_pool.net_rx, vm_pool.last_seq, vm_pool.template, user_pool.user_name, history.vid, history.seq, history.host_name, history.vm_dir, history.hid, history.vm_mad, history.tm_mad, history.stime, history.etime, history.pstime, history.petime, history.rstime, history.retime, history.estime, history.eetime, history.reason FROM vm_pool LEFT OUTER JOIN history ON vm_pool.oid = history.vid AND history.seq = vm_pool.last_seq LEFT OUTER JOIN (SELECT oid,user_name FROM user_pool) AS user_pool ON vm_pool.uid = user_pool.oid WHERE vm_pool.state <> 6, error 2006 : MySQL server has gone away<br>
Tue Sep 27 13:47:20 2011 [ReM][E]: [VirtualMachinePoolInfo] Error getting VM Pool.<br>
<br>
And some other general messages, probably from monitoring:<br>
<br>
Tue Sep 27 13:47:13 2011 [ONE][E]: SQL command was: SELECT oid, im_mad FROM host_pool WHERE state != 4 ORDER BY last_mon_time ASC LIMIT 15, error 2006 : MySQL server has gone away<br>
Tue Sep 27 13:47:13 2011 [ONE][E]: SQL command was: SELECT oid FROM vm_pool WHERE last_poll <= 1317130633 and state = 3 and ( lcm_state = 3 or lcm_state = 16 ) ORDER BY last_poll ASC LIMIT 5, error 2006 : MySQL server has gone away<br>
<br>
For some reason oned does not reconnect to the MySQL server. I do not know how this is implemented (or whether it depends on my system), but I think that if the mysql library is used, the reconnect should happen automatically and transparently. A mysql client that was still running across the restart of mysqld handles this just fine and transparently (with just an informational message):<br>
<br>
mysql> show databases;<br>
ERROR 2006 (HY000): MySQL server has gone away<br>
No connection. Trying to reconnect...<br>
Connection id: 1<br>
Current database: *** NONE ***<br>
<br>
+--------------------+<br>
| Database |<br>
+--------------------+<br>
| information_schema |<br>
<br>
<br>
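The reconnect behaviour described above can be sketched roughly as follows. This is only a minimal illustration in Python, not how oned or the MySQL client library actually implements it: FakeConnection, ServerGoneError and execute_with_reconnect are hypothetical names made up for this example, and the fake connection merely simulates a server that went away (error 2006).<br>
<br>
```python
import time


class ServerGoneError(Exception):
    """Hypothetical stand-in for a driver error such as MySQL error 2006."""


class FakeConnection:
    """Hypothetical connection that fails until reconnect() is called."""

    def __init__(self):
        self.alive = True
        self.reconnects = 0

    def execute(self, sql):
        # Simulate "MySQL server has gone away" after a server restart.
        if not self.alive:
            raise ServerGoneError(2006, "MySQL server has gone away")
        return "ok: " + sql

    def reconnect(self):
        self.alive = True
        self.reconnects += 1


def execute_with_reconnect(conn, sql, retries=1, delay=0.0):
    """Run sql; on a lost connection, reconnect and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return conn.execute(sql)
        except ServerGoneError:
            if attempt >= retries:
                raise  # give up and let the caller see the error
            attempt += 1
            time.sleep(delay)
            conn.reconnect()


conn = FakeConnection()
conn.alive = False  # pretend mysqld was restarted
print(execute_with_reconnect(conn, "SELECT 1"))
```
<br>
Note that a blind retry like this is only safe for statements that can be repeated without side effects; a connection lost in the middle of a multi-statement update would still need extra care.<br>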
After also restarting OpenNebula (oned, scheduler), everything seems to work fine again. But I guess that if mysqld is down (or going down) at the wrong moment, the database might not have saved all the needed information, e.g. at the moment when the scheduler is deploying a VM to a cluster node. Could something like this cause the reporting errors Steve is seeing?<br>
<br>
<br>
bye<br><font color="#888888">
Fabian</font><div><div></div><div class="h5"><br>
</div></div></blockquote></div><br>