[one-users] A resumption failure results in the deletion of images
Shi Jin
jinzishuai at gmail.com
Tue Jun 15 15:08:48 PDT 2010
Hi there,
I recently had a very serious problem.
I called "onevm stop" on a VM to hibernate it into a checkpoint file.
Then I tried to call "onevm resume" to bring it back online.
However, the resumption process went wrong.
There can be several reasons for it to go wrong.
For example, libvirt would fail if there is another volume attached to it.
But this is not relevant to this thread (I am planning on starting a
new one on this soon).
The key point here is that, as soon as the restore fails, the
OpenNebula code triggers DEPLOY_FAILURE in the LCM (LifeCycleManager).
This can be found in src/vmm/VirtualMachineManagerDriver.cc:
399     else if ( action == "RESTORE" )
400     {
401         Nebula &ne = Nebula::instance();
402         LifeCycleManager *lcm = ne.get_lcm();
403
404         if (result == "SUCCESS")
405         {
406             lcm->trigger(LifeCycleManager::DEPLOY_SUCCESS, id);
407         }
408         else
409         {
410             string info;
411
412             getline(is,info);
413
414             os.str("");
415             os << "Error restoring VM, " << info;
416
417             vm->log("VMM",Log::ERROR,os);
418
419             lcm->trigger(LifeCycleManager::DEPLOY_FAILURE, id);
420         }
421     }
The LCM eventually deletes the images directory, so the user
loses all the precious data he/she has obtained so far, and there
is no way to get it back!
So I desperately need to prevent OpenNebula from deleting the precious images.
A quick hack I did was to comment out line 419 above so that the
LCM is not triggered at all. But I am sure this is not clean and we
need more than this.
I am thinking maybe one needs a way to distinguish a freshly booting VM
from a resuming VM. For now, OpenNebula treats them identically: both
are in the BOOT state.
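
To make the idea concrete, here is a minimal standalone sketch of how a
failure handler might tell the two cases apart. It is not OpenNebula
code: BootOrigin, FailureAction, on_boot_failure and describe are all
hypothetical names invented for illustration, and the real fix would
have to be wired into the LCM states and triggers.

```cpp
#include <string>

// All names below are hypothetical; they are not OpenNebula identifiers.

// How the VM entered the BOOT state.
enum class BootOrigin { FreshDeploy, Restore };

// What the failure handler should do with the VM's images directory.
enum class FailureAction { CleanUpImages, KeepImages };

// Decide how to react when the driver reports a failure for a VM in BOOT.
// A fresh deployment holds no user data yet, so cleaning up is harmless.
// A failed restore still has the disk images and the checkpoint file on
// disk, so they are kept and the user can retry the resume later.
FailureAction on_boot_failure(BootOrigin origin)
{
    if (origin == BootOrigin::Restore)
    {
        return FailureAction::KeepImages;
    }

    return FailureAction::CleanUpImages;
}

// Human-readable description of the chosen action, e.g. for a log line.
std::string describe(FailureAction action)
{
    return action == FailureAction::KeepImages
        ? "keep images and checkpoint; return VM to a stopped state"
        : "clean up images as for a failed deployment";
}
```

With something like this, the else branch of the RESTORE handler would
take the Restore path instead of sharing DEPLOY_FAILURE with a fresh
deployment.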
So please let me know if what I reported is a bug and if this can be
fixed in the future.
I could submit this on the dev site as well.
Thank you very much.
Shi
--
Shi Jin, Ph.D.