[one-users] VM deleted when fails to get domain

Ruben S. Montero rubensm at dacya.ucm.es
Mon Jun 15 09:18:42 PDT 2009


BTW,

In the meantime, you can avoid this behavior by commenting out line
566 in src/vmm/VirtualMachineManagerDriver.cc (the lcm->trigger call below):

                case 'd': //The VM was not found
                    os.str("");
                    os  << "VM running but it was not found. Restart and delete ";
                    os  << "actions available or try to recover it manually";
                    vm->log("VMM",Log::INFO,os);

                    //lcm->trigger(LifeCycleManager::MONITOR_DONE, id);

                    break;

This would leave the VM in RUNNING state...
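
To make the effect of the change concrete, here is a minimal, self-contained sketch of how a poll result could map to the next VM state under three policies: the behavior seen in the log quoted below (cleanup after STATE=d), the workaround above (trigger commented out), and the UNKNOWN state planned for the next release. All names here (VmState, MissingPolicy, next_state) are made up for illustration and are not the actual OpenNebula code or API.

```cpp
#include <cassert>

// Hypothetical, simplified model; not OpenNebula's real classes.
enum class VmState { RUNNING, UNKNOWN, DONE };

enum class MissingPolicy {
    DELETE_VM,     // behavior seen in the log: cleanup follows STATE=d
    KEEP_RUNNING,  // workaround: the MONITOR_DONE trigger is commented out
    MARK_UNKNOWN   // planned: park the VM in a special UNKNOWN state
};

// Returns the state a RUNNING VM moves to after one poll result.
// poll_state is the character reported in STATE=<c> ('a' active, 'd' not found).
VmState next_state(char poll_state, MissingPolicy policy)
{
    if (poll_state != 'd')
        return VmState::RUNNING;   // domain was found, nothing changes

    switch (policy)
    {
        case MissingPolicy::DELETE_VM:    return VmState::DONE;
        case MissingPolicy::KEEP_RUNNING: return VmState::RUNNING;
        case MissingPolicy::MARK_UNKNOWN: return VmState::UNKNOWN;
    }
    return VmState::RUNNING;       // not reached; keeps compilers quiet
}

// Small self-check of the three policies for a "domain not found" poll.
void self_check()
{
    assert(next_state('d', MissingPolicy::DELETE_VM)    == VmState::DONE);
    assert(next_state('d', MissingPolicy::KEEP_RUNNING) == VmState::RUNNING);
    assert(next_state('d', MissingPolicy::MARK_UNKNOWN) == VmState::UNKNOWN);
}
```

In this model the workaround corresponds to KEEP_RUNNING: a failed lookup no longer ends the VM's life cycle, so the images are not removed, but the VM also keeps showing as RUNNING even though the hypervisor does not report it.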
Cheers

2009/6/15 Fermín Manzanedo Guzmán <fmanzanedo at jetmultimedia.es>:
> Hi Rubén,
>
> I searched Google for this issue but didn't find anything :)
>
> Your solution fits our environment very well. (Right now, when a VM
> disappears we have to recover it from our NAS snapshot; fortunately we
> have a snapshot filesystem.)
>
> Thanks a lot.
>
> Regards.
>
> Ruben S. Montero wrote:
>> Hi Fermín,
>>
>> Yes, this is a known issue; see:
>>
>> http://dev.opennebula.org/issues/75
>> http://dev.opennebula.org/issues/91
>> http://dev.opennebula.org/issues/96
>>
>> After some discussion the solution implemented is:
>>   1.- Do not remove VMs that are not reported by the underlying hypervisor
>>   2.- Set the VM in a special state (called UNKNOWN)
>>   3.- Let the sysadmin either recover the VM manually (it will appear
>> in the next polling) or issue a restart request to start the VM again
>> (note that the VM is already defined and the images are in place)
>>   4.- The user may also decide MANUALLY to delete the VM
>>
>> This has been implemented in one of the development branches, and will
>> be included in the next release (by the end of the month we will
>> release the first beta.)
>>
>> Please let me know if the previous solution makes sense in your
>> environment, as we could improve it before the code freeze.
>>
>> Cheers
>>
>> Ruben
>>
>>
>> 2009/6/15 Fermín Manzanedo Guzmán <fmanzanedo at jetmultimedia.es>:
>>
>>> Hi all.
>>>
>>> We have our cloud up and running, but sometimes our VM images disappear
>>> for no apparent reason. It seems that ONE fails to connect to the
>>> domain and then executes the delete command. This is the log for two VMs
>>> deleted today:
>>>
>>> Mon Jun 15 12:35:57 2009 [VMM][I]: Monitoring VM 134.
>>> Mon Jun 15 12:36:01 2009 [VMM][D]: Message received: LOG - 134 Connecting to uri: qemu:///system
>>> Mon Jun 15 12:36:01 2009 [VMM][D]: Message received: LOG - 134 error: failed to get domain 'one-134'
>>> Mon Jun 15 12:36:01 2009 [VMM][D]: Message received: LOG - 134 error: Domain not found
>>> Mon Jun 15 12:36:01 2009 [VMM][D]: Message received: LOG - 134 ExitCode: 1
>>> Mon Jun 15 12:36:01 2009 [VMM][D]: Message received: POLL SUCCESS 134 STATE=d
>>> Mon Jun 15 12:36:02 2009 [TM][D]: Message received: LOG - 134 tm_delete.sh: Deleting /VM//134/images
>>> Mon Jun 15 12:36:02 2009 [TM][D]: Message received: LOG - 134 tm_delete.sh: Executed "rm -rf /VM//134/images".
>>> Mon Jun 15 12:36:02 2009 [TM][D]: Message received: TRANSFER SUCCESS 134 -
>>> [...]
>>> Mon Jun 15 13:41:06 2009 [VMM][I]: Monitoring VM 132.
>>> Mon Jun 15 13:41:12 2009 [VMM][D]: Message received: LOG - 132 Connecting to uri: qemu:///system
>>> Mon Jun 15 13:41:12 2009 [VMM][D]: Message received: LOG - 132 ExitCode: 0
>>> Mon Jun 15 13:41:12 2009 [VMM][D]: Message received: POLL SUCCESS 132 USEDMEMORY=524288 STATE=a
>>> Mon Jun 15 13:41:39 2009 [VMM][I]: Monitoring VM 132.
>>> Mon Jun 15 13:41:41 2009 [VMM][D]: Message received: LOG - 132 Connecting to uri: qemu:///system
>>> Mon Jun 15 13:41:41 2009 [VMM][D]: Message received: LOG - 132 error: failed to get domain 'one-132'
>>> Mon Jun 15 13:41:41 2009 [VMM][D]: Message received: LOG - 132 error: Domain not found
>>> Mon Jun 15 13:41:41 2009 [VMM][D]: Message received: LOG - 132 ExitCode: 1
>>> Mon Jun 15 13:41:41 2009 [VMM][D]: Message received: POLL SUCCESS 132 STATE=d
>>> Mon Jun 15 13:41:42 2009 [TM][D]: Message received: LOG - 132 tm_delete.sh: Deleting /VM//132/images
>>> Mon Jun 15 13:41:42 2009 [TM][D]: Message received: LOG - 132 tm_delete.sh: Executed "rm -rf /VM//132/images".
>>> Mon Jun 15 13:41:42 2009 [TM][D]: Message received: TRANSFER SUCCESS 132 -
>>>
>>> Is there any way to override the tm_delete execution when we get the
>>> "Domain not found" error? Perhaps move the image or save it to another path?
>>>
>>> Thanks in advance.
>>>
>>> --
>>>
>>> Fermin Manzanedo
>>>
>>> Senior Systems Administrator
>>>
>>> Madrid
>>> Parque Empresarial Cristalia,
>>> Vía de los Poblados nº3 Edificio 5, Planta 4ª
>>> 28033 Madrid
>>>
>>> fmanzanedo at jetmultimedia.es
>>> Tel: 902 01 02 01
>>> Fax: 902 01 00 00
>>>
>>> Jet Multimedia España, S.A. ® Legal notice: This electronic message is
>>> intended solely for the address(es) indicated above; its confidential,
>>> personal and non-transferable nature is legally protected. Any
>>> unauthorized disclosure, use or forwarding, in whole or in part, is
>>> prohibited. If you have received this message in error, notify the sender
>>> immediately and delete the original message together with its attachments
>>> without reading or saving it, in whole or in part. Thank you



-- 
+---------------------------------------------------------------+
 Dr. Ruben Santiago Montero
 Associate Professor
 Distributed System Architecture Group (http://dsa-research.org)

 URL:    http://dsa-research.org/doku.php?id=people:ruben
 Weblog: http://blog.dsa-research.org/?author=7

 GridWay, http://www.gridway.org
 OpenNebula, http://www.opennebula.org
+---------------------------------------------------------------+


