[one-users] A resumption failure results in the deletion of images

Javier Fontan jfontan at gmail.com
Mon Jun 21 09:07:08 PDT 2010


Hello,

It does indeed look like a problem. We are discussing the best way to
solve it, and I hope we can have a workaround in place for the next
version.

Thank you for reporting it.


On Wed, Jun 16, 2010 at 12:08 AM, Shi Jin <jinzishuai at gmail.com> wrote:
> Hi there,
>
> I recently had a very serious problem.
> I called "onevm stop" on a VM to hibernate it into a checkpoint file.
> Then I tried to call "onevm resume" to bring it back online.
> However, the resumption process went wrong.
> There can be several reasons for it to go wrong.
> For example, libvirt fails if there is another volume attached to the VM.
> But this is not relevant to this thread (I am planning on starting a
> new one on this soon).
> The key point here is that, as soon as the restore fails, the
> OpenNebula code triggers the DEPLOY_FAILURE action in the LCM
> (LifeCycleManager).
> This can be found in src/vmm/VirtualMachineManagerDriver.cc:
> 399     else if ( action == "RESTORE" )
> 400     {
> 401         Nebula              &ne  = Nebula::instance();
> 402         LifeCycleManager    *lcm = ne.get_lcm();
> 403
> 404         if (result == "SUCCESS")
> 405         {
> 406             lcm->trigger(LifeCycleManager::DEPLOY_SUCCESS, id);
> 407         }
> 408         else
> 409         {
> 410             string          info;
> 411
> 412             getline(is,info);
> 413
> 414             os.str("");
> 415             os << "Error restoring VM, " << info;
> 416
> 417             vm->log("VMM",Log::ERROR,os);
> 418
> 419             lcm->trigger(LifeCycleManager::DEPLOY_FAILURE, id);
> 420         }
> 421     }
>
>
> The LCM would eventually delete the images directory, and the user
> would lose all the precious data he/she has obtained so far, with
> no way to get it back!
>
> So I desperately need to prevent OpenNebula from deleting the precious images.
> A quick hack I did was to comment out line 419 above so that the
> LCM is not triggered at all. But I am sure this is not clean and we
> need more than this.
> I am thinking that one needs a way to distinguish a freshly booting VM
> from a resuming VM. For now, OpenNebula treats them identically: both
> are in the BOOT state.
> So please let me know if what I reported is a bug and if this can be
> fixed in the future.
> I could submit this on the dev site as well.
> Thank you very much.
>
> Shi
>
> --
> Shi Jin, Ph.D.
>
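
The suggestion above, treating a failed resume differently from a failed
fresh boot, could be sketched roughly as follows. This is only an
illustration and not the actual OpenNebula code or the eventual fix: the
RESTORE_FAILURE action and both helper functions are hypothetical names
made up for this example, and the "DEPLOY" action string is an assumption,
since only the "RESTORE" branch is quoted above.

```cpp
#include <string>

// Hypothetical sketch: RESTORE_FAILURE, action_for() and
// should_delete_images() do not exist in the OpenNebula source quoted above.
enum class LcmAction
{
    DEPLOY_SUCCESS,
    DEPLOY_FAILURE,
    RESTORE_FAILURE  // assumed new action for a failed resume
};

// Map a driver reply (action name plus result string) to the life-cycle
// action that would be triggered.
LcmAction action_for(const std::string& action, const std::string& result)
{
    if (result == "SUCCESS")
    {
        return LcmAction::DEPLOY_SUCCESS;
    }

    // A failed RESTORE gets its own action instead of reusing
    // DEPLOY_FAILURE, so the two cases can be handled differently.
    if (action == "RESTORE")
    {
        return LcmAction::RESTORE_FAILURE;
    }

    return LcmAction::DEPLOY_FAILURE;
}

// Only a failed fresh deployment is allowed to clean up the images
// directory; after a failed resume the images and checkpoint are kept.
bool should_delete_images(LcmAction lcm_action)
{
    return lcm_action == LcmAction::DEPLOY_FAILURE;
}
```

With this split, should_delete_images(action_for("RESTORE", "FAILURE"))
is false, while a failed "DEPLOY" still allows cleanup. Which state the
VM should move to after a failed resume is left open here.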



-- 
Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org


