[one-users] Unknown state
Ruben S. Montero
rubensm at dacya.ucm.es
Wed Oct 27 13:22:44 PDT 2010
Hi,
This is the problem:
A shutdown action is triggered and the shutdown process starts. The VM
cannot proceed to the next state (epilog, i.e. image transfer) until it has
been completely stopped. Note that starting a file movement operation while
the VM is still shutting down will lead to a corrupted image. Also note that
the shutdown operation itself succeeds instantaneously, well before the VM
has really shut down.
Solution:
The driver has to report a successful shutdown only when the VM has
completely shut down, i.e. when it disappears from the hypervisor. However,
we need a timeout, or we may end up waiting forever for the VM to shut down
(e.g. if the VM has no ACPI support).
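
To make the idea concrete, here is a minimal sketch of such a wait loop in
Python. It is not the actual driver code: wait_for_shutdown and vm_is_listed
are hypothetical names, and vm_is_listed stands in for whatever query the
driver uses to ask the hypervisor whether the VM still exists. The defaults
simply mirror the 40 seconds (20 iterations of 2 seconds) mentioned further
down in this thread.

```python
import time
from typing import Callable


def wait_for_shutdown(
    vm_is_listed: Callable[[], bool],
    timeout_s: float = 40.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the VM is gone from the hypervisor, False on timeout.

    vm_is_listed is a hypothetical callable (an assumption of this sketch):
    it returns True while the hypervisor still reports the VM.
    """
    attempts = int(timeout_s // poll_interval_s)
    for _ in range(attempts):
        if not vm_is_listed():
            # The VM has disappeared: only now is it safe to report success
            # and let the epilog (image transfer) start.
            return True
        sleep(poll_interval_s)
    # One last check after the final sleep, then give up.
    return not vm_is_listed()
```

The sleep function is passed in only so the loop can be exercised without
really waiting; a per-VM timeout could be supplied through timeout_s.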
About reverting to running:
We assume that a VM is valuable, so we would rather play it safe. If the VM
returns to running, it will be monitored again, and if it is not really
there it will be moved to unknown.
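
As a rough illustration of these transitions (a sketch only, not the
OpenNebula state machine; VmState, after_shutdown_attempt and
after_monitor_poll are names made up for this example):

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"
    EPILOG = "epilog"
    UNKNOWN = "unknown"


def after_shutdown_attempt(vm_gone: bool) -> VmState:
    """State once the driver reports the result of the shutdown.

    If the VM disappeared, the epilog (image transfer) can start.
    If the timeout expired first, play it safe and go back to running.
    """
    return VmState.EPILOG if vm_gone else VmState.RUNNING


def after_monitor_poll(state: VmState, vm_listed: bool) -> VmState:
    """A running VM that the hypervisor no longer lists is moved to unknown."""
    if state is VmState.RUNNING and not vm_listed:
        return VmState.UNKNOWN
    return state
```

So a VM that shuts down late ends up as unknown only after monitoring has
confirmed that it is gone, and its image is never moved while it might
still be running.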
Cheers
Ruben
On Wed, Oct 27, 2010 at 5:06 PM, Rich Wellner <rkw at objenv.com> wrote:
> Yeah, even better. I like this idea if there has to be a timeout.
>
> Though the more I think about it, the less I'm sure I understand why the
> timeout needs to exist or why the state reverts to Running instead of
> Unknown once it triggers. Seems like maybe the state model needs another
> node "Shutdown Failed" or something for when the guest fails to disappear
> (for example if acpid isn't installed). Otherwise an administrator looking
> at 'onevm list' doesn't get a complete picture of how the current state
> differs from the desired state.
>
> So, what was the use case for having the time out in the first place?
>
> rw2
>
>
> On 10/27/10 9:50 AM, Igor Rosenberg wrote:
>
>> Humble opinion: shutdown-time is VM specific. I may have, running
>> concurrently, an image of shutdown time ~ 10s (a tiny linux), and another
>> with shutdown time ~ 5 minutes or more (fat J2EE container with remote DB
>> dependencies).
>>
>> The shutdown-time would typically be known by the person who originally
>> created the VM image. So can this information be embedded in the image
>> itself? If not, the VM template would be another likely place. But having a
>> maximum value for all possible images may create problems as VMs grow in
>> embedded service complexity.
>>
>> -----Original Message-----
>> From: users-bounces at lists.opennebula.org [mailto:
>> users-bounces at lists.opennebula.org] On Behalf Of Rich Wellner
>> Sent: Wednesday, October 27, 2010 16:19
>> To: Tino Vazquez
>> Cc: users at lists.opennebula.org
>> Subject: Re: [one-users] Unknown state
>>
>> Ok, I'll check that out. FYI: my RHEL 5.5 machines on reasonably
>> capable hardware take longer than that default. Might be worth
>> considering a longer default.
>>
>> rw2
>>
>> On 10/27/10 8:33 AM, Tino Vazquez wrote:
>>
>>> Hi Rich,
>>>
>>> OpenNebula ceases its monitoring when the VM enters the shutdown
>>> state. What is probably happening is that the VM takes more time to
>>> shut down than the default timeout, which is 40 seconds (20 iterations
>>> of a 2-second sleep), so for OpenNebula it is as if the shutdown
>>> failed. This default timeout can be adjusted in
>>> $ONE_LOCATION/bin/remotes/vmm/kvm/shutdown.
>>>
>>> Best regards,
>>>
>>> -Tino
>>>
>>> --
>>> Constantino Vázquez Blanco | dsa-research.org/tinova
>>> Virtualization Technology Engineer / Researcher
>>> OpenNebula Toolkit | opennebula.org
>>>
>>>
>>>
>>> On Wed, Oct 27, 2010 at 1:08 AM, Rich Wellner <rkw at objenv.com> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I have monitoring turned down to a minute so that I don't have much
>>>> latency on my management while we're doing testing. As a result, when
>>>> I do a shutdown on a vm, sometimes the shutdown isn't complete before
>>>> the next monitoring update. What ends up happening is that the state
>>>> of the machine goes from running to shutdown, then a bit later to
>>>> running again. Finally, when the guest shutdown actually completes,
>>>> the state goes to unknown because One doesn't know why the machine
>>>> disappeared.
>>>>
>>>> It would be better if this race condition were handled more elegantly
>>>> and One could tolerate that the machine took a while to shut down. As
>>>> it is, a manual clean-up has to happen. I have also confirmed that my
>>>> one minute monitor cycle only makes the problem more likely. If, by
>>>> coincidence, someone asks One to shut down a vm slightly before the
>>>> monitor thread kicks off, this issue shows up. So it seems any machine
>>>> that is shut down where timeToShutdown > timeUntilMonitorRefresh will
>>>> end up in an unknown state.
>>>>
>>>> rw2
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opennebula.org
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>
>>>>
>> ------------------------------------------------------------------
>> This e-mail and the documents attached are confidential and intended
>> solely for the addressee; it may also be privileged. If you receive
>> this e-mail in error, please notify the sender immediately and destroy it.
>> As its integrity cannot be secured on the Internet, the Atos Origin
>> group liability cannot be triggered for the message content. Although
>> the sender endeavours to maintain a computer virus-free network,
>> the sender does not warrant that this transmission is virus-free and
>> will not be liable for any damages resulting from any virus transmitted.
>> ------------------------------------------------------------------
>>
>
--
Dr. Ruben Santiago Montero
Associate Professor (Profesor Titular), Complutense University of Madrid
URL: http://dsa-research.org/doku.php?id=people:ruben
Weblog: http://blog.dsa-research.org/?author=7