[one-users] fault tolerance

Tino Vazquez tinova at opennebula.org
Wed Sep 19 03:28:15 PDT 2012


Dear Gareth,

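OK, it looks like there is a bug in the TM drivers: the shared ln script
fails with "File exists", apparently because disk.0 is still there from
the previous deployment. I've opened a ticket [1] to fix this in upcoming
releases. Meanwhile, you can modify /var/lib/one/remotes/tm/shared/ln,
line 82, to read: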

		     "cd $DST_DIR; rm $DST_PATH ; ln -s $SRC_PATH $DST_PATH" \

Best regards, and thanks for the feedback,

-Tino

[1] http://dev.opennebula.org/issues/1482

--
Constantino Vázquez Blanco, MSc
Project Engineer
OpenNebula - The Open-Source Solution for Data Center Virtualization
www.OpenNebula.org | @tinova79 | @OpenNebula


On Tue, Sep 18, 2012 at 5:27 PM, Gareth de Vaux <opennebula at lordcow.org> wrote:
> On Fri 2012-09-14 (12:14), Tino Vazquez wrote:
>> It looks like there was a problem with the VM after it was put in
>> PENDING. Could you please send us the log of the individual VM
>> (/var/lib/one/<vid>/vm.log) to take a look?
>
> Attaching at the end of the mail but it mostly looks like the oned.log.
>
>> It should be possible to resubmit a VM in failed state, what is the
>> error message you get?
>
> Sorry I think I tried to 'restart' it - I've successfully resubmitted it now.
>
>> Regarding the problem with "onehost sync", does oneadmin have write
>> permissions on /var/lib/one/remotes?
>
> Nope, it was owned by root, which I figured was normal for /var, forgetting
> that this is where one's $HOME is. Fixed now, thanks. For the record, Debian
> seems to set the ownership of $HOME correctly except for ~/remotes and
> ~/datastores.
>
> So, I'm able to resubmit manually but the hook still doesn't work.
>
>
> 10.log:
>
> Wed Sep 12 17:54:13 2012 [VMM][D]: Monitor Information:
>         CPU   : 6
>         Memory: 524288
>         Net_TX: 4761
>         Net_RX: 49874
> Wed Sep 12 17:56:27 2012 [VMM][I]: Command execution fail: 'if [ -x "/var/tmp/one/vmm/kvm/poll" ]; then /var/tmp/one/vmm/kvm/poll one-10 arcus 10 arcus; else                              exit 42; fi'
> Wed Sep 12 17:56:27 2012 [VMM][I]: ssh: connect to host arcus port 22: Connection timed out
> Wed Sep 12 17:56:27 2012 [VMM][I]: ExitCode: 255
> Wed Sep 12 17:56:27 2012 [VMM][E]: Error monitoring VM
> Wed Sep 12 17:56:27 2012 [VMM][E]: Error monitoring VM
> Wed Sep 12 17:57:29 2012 [DiM][I]: New VM state is PENDING
> Wed Sep 12 17:57:29 2012 [TM][W]: Ignored: LOG I 10 ExitCode: 0
>
> Wed Sep 12 17:57:29 2012 [VMM][W]: Ignored: LOG I 10 Driver command for 10 cancelled
>
> Wed Sep 12 17:57:37 2012 [DiM][I]: New VM state is ACTIVE.
> Wed Sep 12 17:57:38 2012 [LCM][I]: New VM state is PROLOG.
> Wed Sep 12 17:57:38 2012 [VM][I]: Virtual Machine has no context
> Wed Sep 12 17:57:38 2012 [TM][I]: Command execution fail: /var/lib/one/remotes/tm/shared/ln cirrus:/var/lib/one/datastores/1/aab3c5409d45f015626af354c827a776 nimbus:/var/lib/one//datastores/0/10/disk.0
> Wed Sep 12 17:57:38 2012 [TM][I]: ln: Linking ../../1/aab3c5409d45f015626af354c827a776 in nimbus:/var/lib/one//datastores/0/10/disk.0
> Wed Sep 12 17:57:38 2012 [TM][E]: ln: Command "cd /var/lib/one/datastores/0/10; ln -s ../../1/aab3c5409d45f015626af354c827a776 /var/lib/one/datastores/0/10/disk.0" failed: ln: failed to create symbolic link `/var/lib/one/datastores/0/10/disk.0': File exists
> Wed Sep 12 17:57:38 2012 [TM][E]: Error linking cirrus:/var/lib/one/datastores/1/aab3c5409d45f015626af354c827a776 to nimbus:/var/lib/one//datastores/0/10/disk.0
> Wed Sep 12 17:57:38 2012 [TM][I]: ExitCode: 1
> Wed Sep 12 17:57:38 2012 [TM][E]: Error executing image transfer script: Error linking cirrus:/var/lib/one/datastores/1/aab3c5409d45f015626af354c827a776 to nimbus:/var/lib/one//datastores/0/10/disk.0
> Wed Sep 12 17:57:38 2012 [DiM][I]: New VM state is FAILED
> Wed Sep 12 17:58:32 2012 [TM][W]: Ignored: LOG I 10 Command execution fail: /var/lib/one/remotes/tm/shared/delete arcus:/var/lib/one//datastores/0/10
>
> Wed Sep 12 17:58:32 2012 [TM][W]: Ignored: LOG I 10 delete: Deleting /var/lib/one/datastores/0/10
>
> Wed Sep 12 17:58:32 2012 [TM][W]: Ignored: LOG E 10 delete: Command "rm -rf /var/lib/one/datastores/0/10" failed: ssh: connect to host arcus port 22: Connection timed out
>
> Wed Sep 12 17:58:32 2012 [TM][W]: Ignored: LOG E 10 Error deleting /var/lib/one/datastores/0/10
>
> Wed Sep 12 17:58:32 2012 [TM][W]: Ignored: LOG I 10 ExitCode: 255
>
> Wed Sep 12 17:58:32 2012 [TM][W]: Ignored: TRANSFER FAILURE 10 Error deleting /var/lib/one/datastores/0/10
>
> Wed Sep 12 17:58:32 2012 [VMM][W]: Ignored: LOG I 10 Command execution fail: /var/tmp/one/vmm/kvm/cancel one-10 arcus 10 arcus
>
> Wed Sep 12 17:58:32 2012 [VMM][W]: Ignored: LOG I 10 ssh: connect to host arcus port 22: Connection timed out
>
> Wed Sep 12 17:58:32 2012 [VMM][W]: Ignored: LOG I 10 ExitSSHCode: 255
>
> Wed Sep 12 17:58:32 2012 [VMM][W]: Ignored: LOG E 10 Error connecting to arcus
>
> Wed Sep 12 17:58:32 2012 [VMM][W]: Ignored: LOG I 10 Failed to execute virtualization driver operation: cancel.
>
> Wed Sep 12 17:58:32 2012 [VMM][W]: Ignored: CANCEL FAILURE 10 Error connecting to arcus


