<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>


<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

</head>

<body bgcolor="#ffffff" text="#000000">

<small>Hi All...<br>

<br>

I've set up a testbed with 2 cluster nodes and a frontend to test

OpenNebula 1.4. By the way, when I try to download version 1.4, I get

one-1.3.80. Is this right? <br>

<br>

Anyway, I think I've installed and configured everything as

described in the docs, but since I'm new to OpenNebula, I'm most

likely doing something wrong. I have set up a self-contained

OpenNebula installation and a storage area for the images, both shared

via iSCSI between the frontend and the cluster nodes. So far everything

seems OK: my cluster nodes are properly monitored, and I can start

Xen virtual machines.<br>

<br>

---*---<br>

<br>

-bash-3.2$ onehost list<br>

&nbsp;HID NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RVM&nbsp;&nbsp; TCPU&nbsp;&nbsp; FCPU&nbsp;&nbsp; ACPU&nbsp;&nbsp;&nbsp; TMEM&nbsp;&nbsp;&nbsp; FMEM

STAT<br>

&nbsp;&nbsp; 1 core19&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp; 800&nbsp;&nbsp;&nbsp; 799&nbsp;&nbsp;&nbsp; 799 2516480

2371686&nbsp;&nbsp; on<br>

&nbsp;&nbsp; 2 core05&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2&nbsp;&nbsp;&nbsp; 800&nbsp;&nbsp;&nbsp; 800&nbsp;&nbsp;&nbsp; 800 2516480

2161971&nbsp;&nbsp; on<br>

<br>

<br>

---*---<br>

<br>

-bash-3.2$ onevm list<br>

&nbsp; ID&nbsp;&nbsp;&nbsp;&nbsp; USER&nbsp;&nbsp;&nbsp;&nbsp; NAME STAT CPU&nbsp;&nbsp;&nbsp;&nbsp; MEM&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; HOSTNAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TIME<br>

&nbsp;&nbsp; 7 oneadmin sge02.nc runn&nbsp;&nbsp; 0 1048328&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; core05 00 00:13:55<br>

&nbsp;&nbsp; 8 oneadmin sge03.nc runn&nbsp;&nbsp; 0 1048412&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; core05 00 00:12:29<br>

<br>

---*---<br>

<br>

However, starting VMs seems to be the only relevant thing I'm able to

do. I'm interested in the live migration feature, and that was the

first thing I tested. The result was no migration at all, and

the following logs: <br>

<br>

### $ONE_LOCATION/var/oned.log ###<br>

<br>

Wed Aug 12 23:46:43 2009 [ReM][D]: VirtualMachineMigrate invoked<br>

Wed Aug 12 23:46:43 2009 [DiM][D]: Live-migrating VM 7<br>

Wed Aug 12 23:46:44 2009 [VMM][D]: Message received: LOG - 7 Command

execution fail: sudo /usr/sbin/xm migrate -l one-7 core19<br>

Wed Aug 12 23:46:44 2009 [VMM][D]: Message received: LOG - 7 STDERR

follows.<br>

Wed Aug 12 23:46:44 2009 [VMM][D]: Message received: LOG - 7 Error:

can't connect: Connection refused<br>

Wed Aug 12 23:46:44 2009 [VMM][D]: Message received: LOG - 7 ExitCode: 1<br>

Wed Aug 12 23:46:44 2009 [VMM][D]: Message received: MIGRATE FAILURE 7 -<br>

Wed Aug 12 23:46:47 2009 [VMM][D]: Message received: POLL SUCCESS 7

USEDMEMORY=1048384 USEDCPU=0.0 NETTX=7 NETRX=165&nbsp; STATE=a<br>

<br>

### $ONE_LOCATION/var/7/vm.log ###<br>

<br>

Wed Aug 12 23:46:43 2009 [LCM][I]: New VM state is MIGRATE<br>

Wed Aug 12 23:46:44 2009 [VMM][I]: Command execution fail: sudo

/usr/sbin/xm migrate -l one-7 core19<br>

Wed Aug 12 23:46:44 2009 [VMM][I]: STDERR follows.<br>

Wed Aug 12 23:46:44 2009 [VMM][I]: Error: can't connect: Connection

refused<br>

Wed Aug 12 23:46:44 2009 [VMM][I]: ExitCode: 1<br>

Wed Aug 12 23:46:44 2009 [VMM][E]: Error live-migrating VM, -<br>

Wed Aug 12 23:46:44 2009 [LCM][I]: Fail to life migrate VM. Assuming

that the VM is still RUNNING (will poll VM).<br>

<br>

There are no firewalls in between that could block connections, so I don't understand

where the message "Error: can't connect: Connection refused" is coming

from. <br>

<br>
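If I understand Xen correctly, "Connection refused" from xm migrate means that nothing is listening on the destination's xend relocation port. I'm assuming that the relocation server is enabled by xend-relocation-server in /etc/xen/xend-config.sxp and listens on its usual default port, 8002. Here is a minimal Python sketch to check this from the source node. The port number is an assumption, so adjust it to your xend configuration:<br>

```python
import socket

# Assumption: xend's relocation server, when enabled, listens on TCP port
# 8002 (the usual xend-relocation-port default). Adjust if configured otherwise.
XEND_RELOCATION_PORT = 8002


def port_accepts_connections(host: str, port: int = XEND_RELOCATION_PORT,
                             timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Refused, unreachable, timed-out or unresolvable targets all return
    False; only the refused case matches the error xm migrate printed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running port_accepts_connections("core19") on core05 should return True once the relocation server on core19 is enabled and reachable.<br>

<br>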

I then tried a plain (non-live) migration. This time, it complains that it

cannot restore the VM.<br>

<br>

### $ONE_LOCATION/var/oned.log ###<br>

<br>

Wed Aug 12 23:56:58 2009 [DiM][D]: Migrating VM 7<br>

Wed Aug 12 23:57:19 2009 [VMM][I]: Monitoring VM 8.<br>

Wed Aug 12 23:57:22 2009 [VMM][D]: Message received: POLL SUCCESS 8

USEDMEMORY=1048320 USEDCPU=0.0 NETTX=8 NETRX=160&nbsp; STATE=a<br>

Wed Aug 12 23:57:29 2009 [VMM][D]: Message received: SAVE SUCCESS 7 -<br>

Wed Aug 12 23:57:29 2009 [TM][D]: Message received: LOG - 7 tm_mv.sh:

Will not move, source and destination are equal<br>

Wed Aug 12 23:57:29 2009 [TM][D]: Message received: TRANSFER SUCCESS 7 -<br>

Wed Aug 12 23:57:29 2009 [VMM][D]: Message received: LOG - 7 Command

execution fail: sudo /usr/sbin/xm restore

/srv01/cloud/images/7/images/checkpoint<br>

Wed Aug 12 23:57:29 2009 [VMM][D]: Message received: LOG - 7 STDERR

follows.<br>

Wed Aug 12 23:57:29 2009 [VMM][D]: Message received: LOG - 7 Error:

Restore failed<br>

Wed Aug 12 23:57:29 2009 [VMM][D]: Message received: LOG - 7 ExitCode: 1<br>

Wed Aug 12 23:57:29 2009 [VMM][D]: Message received: RESTORE FAILURE 7 -<br>

Wed Aug 12 23:57:30 2009 [TM][D]: Message received: LOG - 7

tm_delete.sh: Deleting /srv01/cloud/images/7/images<br>

Wed Aug 12 23:57:30 2009 [TM][D]: Message received: LOG - 7

tm_delete.sh: Executed "rm -rf /srv01/cloud/images/7/images".<br>

Wed Aug 12 23:57:30 2009 [TM][D]: Message received: TRANSFER SUCCESS 7 -<br>

<br>

### $ONE_LOCATION/var/7/vm.log ###<br>

<br>

Wed Aug 12 23:56:58 2009 [LCM][I]: New VM state is SAVE_MIGRATE<br>

Wed Aug 12 23:57:29 2009 [LCM][I]: New VM state is PROLOG_MIGRATE<br>

Wed Aug 12 23:57:29 2009 [TM][I]: tm_mv.sh: Will not move, source and

destination are equal<br>

Wed Aug 12 23:57:29 2009 [LCM][I]: New VM state is BOOT<br>

Wed Aug 12 23:57:29 2009 [VMM][I]: Command execution fail: sudo

/usr/sbin/xm restore /srv01/cloud/images/7/images/checkpoint<br>

Wed Aug 12 23:57:29 2009 [VMM][I]: STDERR follows.<br>

Wed Aug 12 23:57:29 2009 [VMM][I]: Error: Restore failed<br>

Wed Aug 12 23:57:29 2009 [VMM][I]: ExitCode: 1<br>

Wed Aug 12 23:57:29 2009 [VMM][E]: Error restoring VM, -<br>

Wed Aug 12 23:57:29 2009 [DiM][I]: New VM state is FAILED<br>

Wed Aug 12 23:57:30 2009 [TM][W]: Ignored: LOG - 7 tm_delete.sh:

Deleting /srv01/cloud/images/7/images<br>

Wed Aug 12 23:57:30 2009 [TM][W]: Ignored: LOG - 7 tm_delete.sh:

Executed "rm -rf /srv01/cloud/images/7/images".<br>

Wed Aug 12 23:57:30 2009 [TM][W]: Ignored: TRANSFER SUCCESS 7 -<br>

<br>
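To compare these failures side by side, you can extract the failing command and its error text from a vm.log like the ones above. Here is a small Python sketch. It assumes that every entry follows the layout seen here (date, then [MODULE][LEVEL], then the message) and that each entry fits on a single line:<br>

```python
import re

# Assumed entry layout, copied from the vm.log lines above, e.g.
#   Wed Aug 12 23:57:29 2009 [VMM][I]: Error: Restore failed
# Groups: 1 = date, 2 = module, 3 = level, 4 = message.
LOG_LINE = re.compile(
    r"^(\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) \[(\w+)\]\[([A-Z])\]: (.*)$"
)

FAIL_PREFIX = "Command execution fail: "
ERROR_PREFIX = "Error: "


def failures(lines):
    """Yield (failed command, error text) pairs from VMM entries of a vm.log."""
    command = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group(2) != "VMM":
            continue  # skip non-VMM entries and lines that don't parse
        message = match.group(4)
        if message.startswith(FAIL_PREFIX):
            # Remember the command until its "Error: ..." line shows up.
            command = message[len(FAIL_PREFIX):]
        elif message.startswith(ERROR_PREFIX) and command is not None:
            yield command, message[len(ERROR_PREFIX):]
            command = None
```

Given the plain-migration vm.log above, this should yield a single pair: the xm restore command and "Restore failed".<br>

<br>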

Even stopping a VM and then resuming it fails, with the following logs:<br>

<br>

### $ONE_LOCATION/var/oned.log ###<br>

<br>

Thu Aug 13 00:25:01 2009 [InM][I]: Monitoring host core19 (1)<br>

Thu Aug 13 00:25:02 2009 [VMM][D]: Message received: SAVE SUCCESS 10 -<br>

Thu Aug 13 00:25:03 2009 [TM][D]: Message received: LOG - 10 tm_mv.sh:

Will not move, is not saving image<br>

Thu Aug 13 00:25:03 2009 [TM][D]: Message received: TRANSFER SUCCESS 10

-<br>

Thu Aug 13 00:25:05 2009 [InM][D]: Host 1 successfully monitored.<br>

Thu Aug 13 00:25:12 2009 [ReM][D]: VirtualMachineDeploy invoked<br>

Thu Aug 13 00:25:31 2009 [InM][I]: Monitoring host core05 (2)<br>

Thu Aug 13 00:25:34 2009 [InM][D]: Host 2 successfully monitored.<br>

Thu Aug 13 00:25:36 2009 [ReM][D]: VirtualMachineAction invoked<br>

Thu Aug 13 00:25:36 2009 [DiM][D]: Restarting VM 10<br>

Thu Aug 13 00:25:36 2009 [DiM][E]: Could not restart VM 10, wrong state.<br>

Thu Aug 13 00:25:52 2009 [ReM][D]: VirtualMachineAction invoked<br>

Thu Aug 13 00:25:52 2009 [DiM][D]: Resuming VM 10<br>

Thu Aug 13 00:26:01 2009 [InM][I]: Monitoring host core19 (1)<br>

Thu Aug 13 00:26:02 2009 [ReM][D]: VirtualMachineDeploy invoked<br>

Thu Aug 13 00:26:02 2009 [DiM][D]: Deploying VM 10<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 Command

execution fail: /srv01/cloud/one/lib/tm_commands/nfs/tm_mv.sh

one01.ncg.ingrid.pt:/srv01/cloud/one/var/10/images

core19:/srv01/cloud/images/10/images<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 STDERR

follows.<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 ERROR

MESSAGE --8&lt;------<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 mv: cannot

stat `/srv01/cloud/one/var/10/images': No such file or directory<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 ERROR

MESSAGE ------&gt;8--<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 ExitCode:

255<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 tm_mv.sh:

Moving /srv01/cloud/one/var/10/images<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 tm_mv.sh:

ERROR: Command "mv /srv01/cloud/one/var/10/images

/srv01/cloud/images/10/images" failed.<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: LOG - 10 tm_mv.sh:

ERROR: mv: cannot stat `/srv01/cloud/one/var/10/images': No such file

or directory<br>

Thu Aug 13 00:26:02 2009 [TM][D]: Message received: TRANSFER FAILURE 10

mv: cannot stat `/srv01/cloud/one/var/10/images': No such file or

directory<br>

Thu Aug 13 00:26:03 2009 [TM][D]: Message received: LOG - 10

tm_delete.sh: Deleting /srv01/cloud/images/10/images<br>

Thu Aug 13 00:26:03 2009 [TM][D]: Message received: LOG - 10

tm_delete.sh: Executed "rm -rf /srv01/cloud/images/10/images".<br>

Thu Aug 13 00:26:03 2009 [TM][D]: Message received: TRANSFER SUCCESS 10

-<br>

<br>
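If I read the resume logs correctly, the transfer manager looked for the saved images under /srv01/cloud/one/var/10/images. On stop, however, tm_mv.sh had reported "Will not move, is not saving image", so the images apparently stayed under /srv01/cloud/images/10/images on core19. The tm_mv.sh arguments have the form host:path. Below is a minimal Python sketch of that argument split and of the path comparison implied by the "source and destination are equal" message. It is my own reconstruction from the log lines, not the script's actual code, and it reproduces the command that failed:<br>

```python
def split_tm_arg(arg: str) -> tuple[str, str]:
    """Split a transfer-manager argument of the form 'host:path'.

    Sketch assumption: an argument without ':' is a plain local path.
    """
    host, sep, path = arg.partition(":")
    if not sep:
        return "", arg
    return host, path


def planned_move(src: str, dst: str):
    """Return the mv command a host:path move would run, or None when the
    source and destination paths are equal. Hosts are ignored here,
    assuming both ends see the same shared filesystem."""
    _, src_path = split_tm_arg(src)
    _, dst_path = split_tm_arg(dst)
    if src_path == dst_path:
        return None
    return f"mv {src_path} {dst_path}"
```

With the arguments from the log, planned_move("one01.ncg.ingrid.pt:/srv01/cloud/one/var/10/images", "core19:/srv01/cloud/images/10/images") returns "mv /srv01/cloud/one/var/10/images /srv01/cloud/images/10/images". That is the same command the log reports as failed, which suggests that $ONE_LOCATION/var and the VM directory on the nodes do not line up in my setup.<br>

<br>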

So, any feedback on these issues is most welcome.<br>

<br>

On a different subject, I'd like to ask whether this OpenNebula version

supports recovery of virtual machines. A colleague of mine saw in

previous ONE versions that, if a cluster node goes down, the VMs

running there are marked as failed in the DB and are never

restarted, even after the physical host recovers completely. What I

(and, I think, most site admins) would like to see is those VMs being started again. I don't

care about checkpointing; I would just like to see the VMs start. Whether

they come up in an inconsistent state is a completely

separate question. In any case, 90% of the time a simple file

system check is enough to recover a machine.<br>

<br>
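To make the request concrete, here is a Python sketch of the behaviour I have in mind. The VM record, the function name and the "fail" state label are hypothetical, and only the "on" label is taken from the onehost listing above. The point is just the policy: once a host is back "on", its failed VMs are booted again from their images, with no checkpoint restore:<br>

```python
from dataclasses import dataclass


@dataclass
class VM:
    vm_id: int
    host: str
    state: str  # hypothetical short state label, e.g. "runn" or "fail"


def vms_to_restart(vms: list[VM], host_states: dict[str, str]) -> list[int]:
    """Return the IDs of failed VMs whose host is reported as "on" again.

    Hypothetical policy: no checkpoint is restored; each VM would simply be
    booted again from its disk images, leaving any repair to a guest fsck.
    """
    return [
        vm.vm_id
        for vm in vms
        if vm.state == "fail" and host_states.get(vm.host) == "on"
    ]
```

For example, suppose VM 7 has failed on core05, VM 8 is running on core05, VM 10 has failed on core19, and only core05 is back "on". In that case the function returns [7].<br>

<br>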

Thanks for any feedback. I probably won't be able to reply before Monday.<br>

<br>

Cheers<br>

Goncalo</small>

</body>

</html>