[one-users] ONE 2.0 underreporting running VM's in onehost list

Carlos Martín Sánchez cmartin at opennebula.org
Tue Oct 11 03:09:32 PDT 2011


Hi Fabian,

The automatic reconnection bug was solved for OpenNebula 3.0 [1].
I agree with you that a lost DB connection could lead to inconsistencies...
but I don't think that's the case here, otherwise everything would stop
working, even the onehost list command.
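
To illustrate the reconnect-and-retry behaviour discussed in this thread, here is a minimal Python sketch. It is not the oned code and not the fix from [1]. FakeConnection, DBError and run_with_reconnect are hypothetical names made up for this example, and the fake connection only simulates a server restart so the snippet runs without a MySQL server. The one real detail is the client error code 2006 ("MySQL server has gone away").

```python
# Illustrative sketch only: every name below is hypothetical and is not
# part of OpenNebula or of any MySQL client library.

CR_SERVER_GONE_ERROR = 2006  # client error "MySQL server has gone away"


class DBError(Exception):
    """Database error carrying a numeric code, like the ones in oned.log."""

    def __init__(self, code, message):
        super().__init__(f"error {code} : {message}")
        self.code = code


class FakeConnection:
    """Stand-in for a DB connection that dies when the server restarts."""

    def __init__(self):
        self.alive = True
        self.connects = 1  # number of times a connection was established

    def reconnect(self):
        self.alive = True
        self.connects += 1

    def query(self, sql):
        if not self.alive:
            raise DBError(CR_SERVER_GONE_ERROR, "MySQL server has gone away")
        return f"ok: {sql}"


def run_with_reconnect(conn, sql, retries=1):
    """Run sql; on error 2006, reconnect and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return conn.query(sql)
        except DBError as err:
            # Re-raise anything that is not a lost connection, and give up
            # once the retry budget is used.
            if err.code != CR_SERVER_GONE_ERROR or attempt == retries:
                raise
            conn.reconnect()


conn = FakeConnection()
conn.alive = False  # simulate a mysqld restart
result = run_with_reconnect(conn, "SELECT oid FROM vm_pool")
```

Note that a retry like this only covers statements issued after the connection was lost. It does not recover a write that was in flight when mysqld went down.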

Regards.

[1] http://dev.opennebula.org/issues/408
--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open Source Toolkit for Cloud Computing
www.OpenNebula.org <http://www.opennebula.org/> | cmartin at opennebula.org


On Tue, Sep 27, 2011 at 4:14 PM, Fabian Wenk <fabian at wenks.ch> wrote:

> Hello
>
>
> On 26.09.2011 13:44, Carlos Martín Sánchez wrote:
>
>> The host_shares contains the "running_vms" column; you need to update that
>> column value with OpenNebula stopped.
>>
>> We are still trying to figure out what causes this bug, so if you come
>> across it again, it would be great if you could write down the operations
>> that led to it.
>>
>
> I do not know if this is related or not, but I guess it could be an
> indication.
>
> I am running OpenNebula 2.2.1 with a MySQL database. I just restarted
> mysqld, and now all the one* commands report errors like this:
>
> # onevm list
> [VirtualMachinePoolInfo] Error getting VM Pool.
>
> In oned.log I see the following messages (regarding the 'onevm list'
> command):
>
> Tue Sep 27 13:47:20 2011 [ReM][D]: VirtualMachinePoolInfo method invoked
> Tue Sep 27 13:47:20 2011 [ONE][E]: SQL command was: SELECT vm_pool.oid,
> vm_pool.uid, vm_pool.name, vm_pool.last_poll, vm_pool.state,
> vm_pool.lcm_state, vm_pool.stime, vm_pool.etime, vm_pool.deploy_id,
> vm_pool.memory, vm_pool.cpu, vm_pool.net_tx, vm_pool.net_rx,
> vm_pool.last_seq, vm_pool.template, user_pool.user_name, history.vid,
> history.seq, history.host_name, history.vm_dir, history.hid, history.vm_mad,
> history.tm_mad, history.stime, history.etime, history.pstime,
> history.petime, history.rstime, history.retime, history.estime,
> history.eetime, history.reason FROM vm_pool LEFT OUTER JOIN history ON
> vm_pool.oid = history.vid AND history.seq = vm_pool.last_seq LEFT OUTER JOIN
> (SELECT oid,user_name FROM user_pool) AS user_pool ON vm_pool.uid =
> user_pool.oid WHERE vm_pool.state <> 6, error 2006 : MySQL server has gone
> away
> Tue Sep 27 13:47:20 2011 [ReM][E]: [VirtualMachinePoolInfo] Error getting
> VM Pool.
>
> And some other general messages, probably from monitoring:
>
> Tue Sep 27 13:47:13 2011 [ONE][E]: SQL command was: SELECT oid, im_mad FROM
> host_pool WHERE state != 4 ORDER BY last_mon_time ASC LIMIT 15, error 2006 :
> MySQL server has gone away
> Tue Sep 27 13:47:13 2011 [ONE][E]: SQL command was: SELECT oid FROM vm_pool
> WHERE last_poll <= 1317130633 and state = 3 and ( lcm_state = 3 or lcm_state
> = 16 ) ORDER BY last_poll ASC LIMIT 5, error 2006 : MySQL server has gone
> away
>
> For some reason oned does not reconnect to the MySQL server. I do not know
> how this is implemented (or whether it depends on my system), but I think
> that if the mysql library is used, the reconnect should be automatic and
> transparent. A mysql client that was still running across the restart of
> mysqld handles this just fine and transparently (with just an
> informational message):
>
> mysql> show databases;
> ERROR 2006 (HY000): MySQL server has gone away
> No connection. Trying to reconnect...
> Connection id:    1
> Current database: *** NONE ***
>
> +--------------------+
> | Database           |
> +--------------------+
> | information_schema |
>
>
> After also restarting OpenNebula (oned, scheduler), everything seems to
> work fine again. But I guess that if mysqld is down (or is going down) at
> the wrong moment, the database might not have saved all the needed
> information, e.g. at the moment when the scheduler is deploying a VM to a
> cluster node. Could something like this cause the reporting errors Steve
> is seeing?
>
>
> bye
> Fabian
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>