[one-users] A resumption failure results in the deletion of images

Shi Jin jinzishuai at gmail.com
Tue Jun 22 21:37:59 PDT 2010


Thank you very much.

Shi

On Mon, Jun 21, 2010 at 12:56 PM, Javier Fontan <jfontan at gmail.com> wrote:
> I have created a ticket to track this problem.
> http://dev.opennebula.org/issues/265
>
> On Wed, Jun 16, 2010 at 12:08 AM, Shi Jin <jinzishuai at gmail.com> wrote:
>> Hi there,
>>
>> I recently had a very serious problem.
>> I called "onevm stop" on a VM to hibernate it into a checkpoint file.
>> Then I tried to call "onevm resume" to bring it back online.
>> However, the resumption process went wrong.
>> There can be several reasons for it to go wrong.
>> For example, libvirt would fail if there is another volume attached to the VM.
>> But this is not relevant to this thread (I am planning on starting a
>> new one on this soon).
>> The key point here is that, as soon as the restore fails, the
>> OpenNebula code triggers the DEPLOY_FAILURE action in the LCM
>> (LifeCycleManager).
>> This can be found in src/vmm/VirtualMachineManagerDriver.cc:
>> 399     else if ( action == "RESTORE" )
>> 400     {
>> 401         Nebula              &ne  = Nebula::instance();
>> 402         LifeCycleManager    *lcm = ne.get_lcm();
>> 403
>> 404         if (result == "SUCCESS")
>> 405         {
>> 406             lcm->trigger(LifeCycleManager::DEPLOY_SUCCESS, id);
>> 407         }
>> 408         else
>> 409         {
>> 410             string          info;
>> 411
>> 412             getline(is,info);
>> 413
>> 414             os.str("");
>> 415             os << "Error restoring VM, " << info;
>> 416
>> 417             vm->log("VMM",Log::ERROR,os);
>> 418
>> 419             lcm->trigger(LifeCycleManager::DEPLOY_FAILURE, id);
>> 420         }
>> 421     }
>>
>>
>> The LCM would eventually delete the images directory, and the user
>> would lose all the precious data he/she has obtained so far, with
>> no way to get it back!
>>
>> So I desperately need to prevent OpenNebula from deleting the precious images.
>> A quick hack I did was to comment out line 419 above so that the
>> LCM is not triggered at all. But I am sure this is not clean and we
>> need more than this.
>> I am thinking maybe one needs a way to distinguish a freshly booting VM
>> from a resuming VM. For now, OpenNebula treats them the same and both
>> are in the BOOT state.
>> So please let me know if what I reported is a bug and if this can be
>> fixed in the future.
>> I could submit this on the dev site as well.
>> Thank you very much.
>>
>> Shi
>>
>> --
>> Shi Jin, Ph.D.
>>
>
>
>
> --
> Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
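
To make the idea from the original report concrete, here is a minimal sketch of how a failed restore could get its own outcome instead of reusing DEPLOY_FAILURE, so that the cleanup step can keep the images. This is only an illustration under assumptions: LcmAction, RESTORE_FAILURE, Cleanup, action_for and cleanup_for are hypothetical names, and the "DEPLOY" driver action string is assumed as well. Only DEPLOY_SUCCESS, DEPLOY_FAILURE and the "RESTORE"/"SUCCESS" strings come from the quoted snippet. It is not the OpenNebula implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration; not part of the OpenNebula source.
enum class LcmAction { DEPLOY_SUCCESS, DEPLOY_FAILURE, RESTORE_FAILURE };
enum class Cleanup   { DELETE_IMAGES, KEEP_IMAGES };

// Map a driver reply to a life-cycle action. Unlike the quoted code, a
// failed RESTORE is reported as RESTORE_FAILURE (an assumed new action)
// rather than DEPLOY_FAILURE.
LcmAction action_for(const std::string& driver_action,
                     const std::string& result)
{
    // This sketch only models the two actions discussed in the thread.
    assert(driver_action == "DEPLOY" || driver_action == "RESTORE");

    if (result == "SUCCESS")
    {
        return LcmAction::DEPLOY_SUCCESS;
    }

    return driver_action == "RESTORE" ? LcmAction::RESTORE_FAILURE
                                      : LcmAction::DEPLOY_FAILURE;
}

// Decide what happens to the images directory. Only a failed fresh
// deployment discards it; a failed restore keeps the images and the
// checkpoint so the resume can be retried.
Cleanup cleanup_for(LcmAction action)
{
    return action == LcmAction::DEPLOY_FAILURE ? Cleanup::DELETE_IMAGES
                                               : Cleanup::KEEP_IMAGES;
}
```

With this split, cleanup_for(action_for("RESTORE", "FAILURE")) yields KEEP_IMAGES, while a failed fresh deployment still yields DELETE_IMAGES as before. Whether a failed restore should then return the VM to a stopped state or to some new failure state is left open here.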



-- 
Shi Jin, Ph.D.
