[one-users] wrong restart -> delete disk image!

Carlos Martín Sánchez cmartin at opennebula.org
Thu Sep 8 07:08:18 PDT 2011


Hi Samuel,

It is safe to change the code, just a couple of comments:

Before stopping OpenNebula, check that no VMs are in a transient
state (migrating, saving, etc.).
Then stop it, back up /var/lib/one (or $ONE_LOCATION/var), and use the '-k'
option of install.sh to keep your current etc folder.
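
For illustration, the backup step could be scripted roughly as follows.
This is an untested sketch and not part of OpenNebula: the function names
one_var_dir and backup_one_var, the archive name, and the destination
directory are made up for the example.

```shell
#!/bin/sh
# Sketch only: helper names and archive naming are assumptions.

# Print the var directory to back up: $ONE_LOCATION/var for a
# self-contained install, /var/lib/one for a system-wide one.
one_var_dir() {
    if [ -n "${ONE_LOCATION:-}" ]; then
        echo "$ONE_LOCATION/var"
    else
        echo "/var/lib/one"
    fi
}

# Archive a directory into a timestamped tarball.
# $1: directory to back up
# $2: directory where the archive is written (created if missing)
# Prints the archive path on success, returns non-zero on failure.
backup_one_var() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "backup_one_var: $src is not a directory" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    archive="$dest/one-var-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the paths inside the archive relative to the parent of $src
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

With OpenNebula already stopped (for example with "one stop") and after
checking with "onevm list" that no VM is in a transient state, it would be
called as: backup_one_var "$(one_var_dir)" "$HOME/one-backup". Only then
run ./install.sh -k from the modified sources.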

Also take into account that we haven't tested the proposed solution.

Regards.
--
Carlos Martín, MSc
Project Major Contributor
OpenNebula - The Open Source Toolkit for Cloud Computing
www.OpenNebula.org <http://www.opennebula.org/> | cmartin at opennebula.org


On Thu, Sep 8, 2011 at 1:32 PM, samuel <samu60 at gmail.com> wrote:

> Thank you very much!
>
> Is it safe to manually change the code and just perform a ./install.sh from
> the sources on a running installation? I'm using the MySQL backend, so I
> expect that modifying the sources will only affect the compilation of
> the modified library and the rest will continue working OK.
>
> Am I right?
>
> I really appreciate the fast response.
>
> Samuel Osorio.
>
>
> On 8 September 2011 12:20, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:
>
>> Hi,
>>
>> Yes, you are right. There is an open issue [1]. We are planning to
>> apply the solution proposed in that issue for 3.0 (i.e. clean-up will
>> happen only when you issue a delete operation). I think this will
>> address your use case.
>>
>> [1] http://dev.opennebula.org/issues/265
>>
>> Thanks
>>
>> Ruben
>> On Tue, Sep 6, 2011 at 5:28 PM, samuel <samu60 at gmail.com> wrote:
>> > Hi folks,
>> >
>> > Recently there was a network problem and one instance became
>> > unreachable. We tried to restart it with the stop and resume actions,
>> > but something went wrong and the disk was deleted. My main concern is
>> > why, after the restart attempt hit an error, the directory where the
>> > disk image resides was deleted. There was no sensitive data on it, but
>> > I just don't understand why an rm -rf was run on the directory.
>> >
>> > Details:
>> >
>> > The configuration is KVM with shared storage, using OpenNebula 2.2.
>> >
>> > Output of virsh version:
>> >     Compiled against library: libvir 0.8.8
>> >     Using library: libvir 0.8.8
>> >     Using API: QEMU 0.8.8
>> >     Running hypervisor: QEMU 0.14.0
>> >
>> > related logs:
>> >
>> > Tue Sep  6 12:37:49 2011 [VMM][D]: Message received: SAVE SUCCESS 22 Domain one-22 saved to /srv/cloud/one/var//22/images/checkpoint
>> > Tue Sep  6 12:37:49 2011 [VMM][D]: Message received:
>> > Tue Sep  6 12:37:49 2011 [TM][D]: Message received: LOG - 22 tm_mv.sh: Will not move, is not saving image
>> > Tue Sep  6 12:37:49 2011 [TM][D]: Message received: TRANSFER SUCCESS 22 -
>> >
>> > Tue Sep  6 12:38:12 2011 [DiM][D]: Restarting VM 22
>> > Tue Sep  6 12:38:12 2011 [DiM][E]: Could not restart VM 22, wrong state.
>> > Tue Sep  6 12:38:12 2011 [ReM][E]: Wrong state to perform action
>> >
>> > Tue Sep  6 12:38:18 2011 [ReM][D]: VirtualMachineAction invoked
>> > Tue Sep  6 12:38:18 2011 [DiM][D]: Resuming VM 22
>> > Tue Sep  6 12:38:47 2011 [DiM][D]: Deploying VM 22
>> >
>> > Tue Sep  6 12:38:47 2011 [ReM][D]: VirtualMachineInfo method invoked
>> > Tue Sep  6 12:38:47 2011 [TM][D]: Message received: LOG - 22 tm_mv.sh: Will not move, is not saving image
>> >
>> > Tue Sep  6 12:38:47 2011 [TM][D]: Message received: TRANSFER SUCCESS 22 -
>> >
>> > Tue Sep  6 12:38:48 2011 [ReM][D]: VirtualMachineInfo method invoked
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: LOG - 22 Command execution fail: 'if [ -x "/var/tmp/one/vmm/kvm/restore" ]; then /var/tmp/one/vmm/kvm/restore /srv/cloud/one/var//22/images/checkpoint; else                              exit 42; fi'
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: LOG - 22 STDERR follows.
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: LOG - 22 error: Failed to restore domain from /srv/cloud/one/var//22/images/checkpoint
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: LOG - 22 error: cannot close file: Bad file descriptor
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: LOG - 22 ExitCode: 1
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: RESTORE FAILURE 22 error: Failed to restore domain from /srv/cloud/one/var//22/images/checkpoint
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: error: cannot close file: Bad file descriptor
>> > Tue Sep  6 12:38:49 2011 [VMM][D]: Message received: ExitCode: 1
>> >
>> > Tue Sep  6 12:38:50 2011 [TM][D]: Message received: LOG - 22 tm_delete.sh: Deleting /srv/cloud/one/var//22/images
>> > Tue Sep  6 12:38:50 2011 [TM][D]: Message received: LOG - 22 tm_delete.sh: Executed "rm -rf /srv/cloud/one/var//22/images".
>> > Tue Sep  6 12:38:50 2011 [TM][D]: Message received: TRANSFER SUCCESS 22 -
>> >
>> >
>> > Thank you in advance for any hint!
>> > Samuel.
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users at lists.opennebula.org
>> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>> >
>> >
>>
>>
>>
>> --
>> Dr. Ruben Santiago Montero
>> Associate Professor (Profesor Titular), Complutense University of Madrid
>>
>> URL: http://dsa-research.org/doku.php?id=people:ruben
>> Weblog: http://blog.dsa-research.org/?author=7
>>
>
>