[one-users] Incomplete information from hosts polling (VMWare ESXi 4.1 and OpenNebula 2.0.1)

Tue Feb 8 03:56:42 PST 2011

OK, I ran some tests today.
The hardware/software environment consists of 2 cluster nodes (ESXi 4.1),
each with 2 GB of RAM and 2 AMD Opteron 246 processors (2 GHz), running on
trial licenses. The OpenNebula installation is *self-contained*.
About 800 MB of memory is used by the hypervisor itself (that info comes
from vSphere Client), so only about 1.2 GB is free, but OpenNebula seems
unaware of that :-(

oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
  ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
   2 custom7.sns.it    default    0    200    200    200      2G      0K   on
   1 custom6.sns.it    default    0    200    200    200      2G      0K   on

oneadmin at custom2:/srv/cloud/templates/vm$ onehost show 1
HOST 1 INFORMATION

ID                    : 1
NAME                  : custom6.sns.it
CLUSTER               : default
STATE                 : MONITORED
IM_MAD                : im_vmware
VM_MAD                : vmm_vmware
TM_MAD                : tm_vmware

HOST SHARES

MAX MEM               : 2096460
USED MEM (REAL)       : 0
USED MEM (ALLOCATED)  : 0
MAX CPU               : 200
USED CPU (REAL)       : 0
USED CPU (ALLOCATED)  : 0
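
The mismatch can be made explicit with a little arithmetic. The following is
only a minimal sketch: it assumes that MAX MEM is expressed in KB (which is
consistent with the 2G shown by onehost list) and it uses the approximate
800 MB hypervisor figure read from vSphere Client. The function name
parse_host_shares is made up for this illustration and is not part of
OpenNebula.

```python
import re

# HOST SHARES block copied from the `onehost show 1` output above.
ONEHOST_SHOW = """\
MAX MEM               : 2096460
USED MEM (REAL)       : 0
USED MEM (ALLOCATED)  : 0
MAX CPU               : 200
USED CPU (REAL)       : 0
USED CPU (ALLOCATED)  : 0
"""


def parse_host_shares(text):
    """Turn 'KEY : VALUE' lines into a dict of integers (hypothetical helper)."""
    shares = {}
    for line in text.splitlines():
        match = re.match(r"^(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            shares[match.group(1).strip()] = int(match.group(2))
    return shares


shares = parse_host_shares(ONEHOST_SHOW)

# Free memory implied by the values OpenNebula reports (assumed to be KB).
implied_free_kb = shares["MAX MEM"] - shares["USED MEM (REAL)"]

# Free memory expected once the ~800 MB used by the hypervisor is subtracted
# (approximate figure taken from vSphere Client, not from OpenNebula).
hypervisor_kb = 800 * 1024
expected_free_kb = shares["MAX MEM"] - hypervisor_kb

print("implied free  :", implied_free_kb, "KB")
print("expected free :", expected_free_kb, "KB")
```

With the numbers above, the reported values imply that the whole 2 GB is
free, while the expected figure is roughly 1.2 GB.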

In each test I tried to start 3 VMs using a non-persistent image. A single
cluster node cannot satisfy the requirements of all three VMs.

FIRST TEST:

The VM template for the first test is:
NAME = "Debian Server"
CPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]

Only CPU and Memory info.

Here is the result:
oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
   ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
   66 oneadmin Debian S pend   0      0K                 00 00:07:47
   67 oneadmin Debian S pend   0      0K                 00 00:07:45
   68 oneadmin Debian S pend   0      0K                 00 00:07:18

They stay in the "pending" state forever... The VMs don't get scheduled.

oned.log doesn't report anything but resource polling informational
messages.
sched.log repeats this sequence:
Tue Feb  8 10:02:06 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
Tue Feb  8 10:02:06 2011 [VM][D]: Pending virtual machines : 66 67 68
Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
Tue Feb  8 10:02:06 2011 [SCHED][I]: Select hosts
        PRI     HID
        -------------------
Virtual Machine: 66

Virtual Machine: 67

Virtual Machine: 68

SECOND TEST:
VM template:
NAME = "Debian Server"
VCPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]

Only VCPU and MEMORY info.

Results:
oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
   ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
   76 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:07:40
   77 oneadmin Debian S runn   0      0K  custom6.sns.it 00 00:07:38
   78 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:05:58

Everything seems fine, but it isn't: as I said previously, each host has
only 1.2 GB of free memory, so there should be no room for two VMs on the
same host.

oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
  ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
   2 custom7.sns.it    default    2    200    200    200      2G      0K   on
   1 custom6.sns.it    default    1    200    200    200      2G      0K   on

Neither the hosts nor the VMs report any useful info on resource usage.
By logging in to the console of each VM and running "free -m", I checked
that every VM has 1 GB of total memory allocated. So I decided to test the
full 1 GB of memory on both VMs at the same time using a utility called
"memtester", which allocates a given amount of free memory using malloc and
tests it. The results reported memory access problems.

I decided to go on and check whether OpenNebula and VMWare ESXi refuse to
allocate VMs that exceed the resource capacity of the hosts, by starting two
more VMs (requiring 1 VCPU and 1 GB of memory each).
Results:
oneadmin at custom2:~/var/79$ onevm list
   ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
   76 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:54:47
   77 oneadmin Debian S runn   0      0K  custom6.sns.it 00 00:54:45
   78 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:53:05
   79 oneadmin Debian S boot   0      0K  custom7.sns.it 00 00:10:22
   80 oneadmin Debian S boot   0      0K  custom7.sns.it 00 00:09:47

The new VMs are allocated on the custom7 machine (why???) but remain frozen
in the "boot" state.
That is a problem because those two new VMs should not be allocated to any
cluster node.
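
To put numbers on the overcommitment, here is a minimal sketch that sums the
memory requested per host from the placements listed above. It assumes every
VM asks for the 1024 MB given in the template, that MAX MEM is in KB, and
that about 1200 MB is really free on each host (the vSphere Client figure).
The helper name requested_per_host is made up for this illustration.

```python
from collections import defaultdict

# (VM id, host) pairs copied from the `onevm list` output above.
PLACEMENTS = [
    (76, "custom7.sns.it"),
    (77, "custom6.sns.it"),
    (78, "custom7.sns.it"),
    (79, "custom7.sns.it"),
    (80, "custom7.sns.it"),
]

VM_MEMORY_MB = 1024               # MEMORY value from the VM template
HOST_TOTAL_MB = 2096460 // 1024   # MAX MEM from onehost show (assumed KB)
HOST_FREE_MB = 1200               # approximate figure from vSphere Client


def requested_per_host(placements, vm_memory_mb):
    """Sum the memory requested by the VMs placed on each host."""
    totals = defaultdict(int)
    for _vm_id, host in placements:
        totals[host] += vm_memory_mb
    return dict(totals)


requested = requested_per_host(PLACEMENTS, VM_MEMORY_MB)
for host, mb in sorted(requested.items()):
    print(f"{host}: requested {mb} MB, "
          f"over free: {mb > HOST_FREE_MB}, over total: {mb > HOST_TOTAL_MB}")
```

Under these assumptions custom7 has been asked for 4096 MB, which is more
than both its free memory and its total physical memory, while custom6 still
has only one VM.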

THIRD TEST:
Here I followed Ruben's suggestion...
The VM template:
oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
NAME = "Debian Server"
CPU = 1
VCPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]

Both CPU/VCPU and MEMORY info.

Output with 3 VM:
oneadmin at custom2:~/var$ onevm list
   ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
   81 oneadmin Debian S pend   0      0K                 00 00:02:32
   82 oneadmin Debian S pend   0      0K                 00 00:02:30
   83 oneadmin Debian S pend   0      0K                 00 00:02:29

As in the FIRST TEST, the VMs don't get scheduled and remain in the "pending" state.
sched.log repeats this message:
Tue Feb  8 12:00:05 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
Tue Feb  8 12:00:05 2011 [VM][D]: Pending virtual machines : 81 82 83
Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
Tue Feb  8 12:00:05 2011 [SCHED][I]: Select hosts
PRI HID
 -------------------
Virtual Machine: 81

Virtual Machine: 82

Virtual Machine: 83

From this I assumed that I probably should not declare the number of
physical CPUs in the VM template.
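
The three tests so far can be put side by side. The following is a minimal
sketch that only restates the observations: it extracts the numeric
attributes from each template and prints them next to the state the VMs
ended up in. The helper numeric_attrs is made up for this illustration and
is not how OpenNebula parses templates, and the comparison says nothing
about why the scheduler behaves this way.

```python
import re

# Templates and resulting VM states copied from the three tests above.
TESTS = {
    "FIRST": ('''NAME = "Debian Server"
CPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]''', "pend"),
    "SECOND": ('''NAME = "Debian Server"
VCPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]''', "runn"),
    "THIRD": ('''NAME = "Debian Server"
CPU = 1
VCPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]''', "pend"),
}


def numeric_attrs(template_text):
    """Return the simple 'KEY = number' attributes of a template as a dict.

    Quoted strings and vector attributes such as OS = [ ... ] are skipped.
    """
    attrs = {}
    for line in template_text.splitlines():
        match = re.match(r"^\s*(\w+)\s*=\s*(\d+)\s*$", line)
        if match:
            attrs[match.group(1)] = int(match.group(2))
    return attrs


for name, (template, state) in TESTS.items():
    attrs = numeric_attrs(template)
    has_cpu = "CPU" in attrs
    has_vcpu = "VCPU" in attrs
    print(f"{name:6} CPU={has_cpu!s:5} VCPU={has_vcpu!s:5} -> {state}")

# True when exactly the templates that declare CPU stayed pending.
cpu_matches_pending = all(
    ("CPU" in numeric_attrs(template)) == (state == "pend")
    for template, state in TESTS.values()
)
print("CPU declared <=> pending:", cpu_matches_pending)
```

In these three runs the only visible pattern is that the templates
declaring CPU are the ones that stayed pending.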

One last test...
FOURTH TEST:
Here I disabled a host, custom6, and started 3 VMs.
The VM template is the one that worked before:
oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
NAME = "Debian Server"
VCPU = 1
MEMORY = 1024
OS = [ ARCH = "i686" ]
DISK = [IMAGE="Debian Server"]

Output:
oneadmin at custom2:~$ onehost list
  ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
   2 custom7.sns.it    default    3    200    200    200      2G      0K   on
   1 custom6.sns.it    default    0    200    200    200      2G      0K  off
oneadmin at custom2:~$ onevm list
   ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
   92 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:53
   93 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:46
   94 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:46

I verified that the VMs were up and running by logging in to the console of
each one through vSphere Client: they were all running, and each reported
1 GB of total memory. Since less than 1.2 GB of memory is actually free on a
cluster node before the VMs are instantiated, how can those VMs run
consistently? Why does OpenNebula schedule those VMs on the same machine,
exceeding even the host's resource capacity?

On Fri, Feb 4, 2011 at 11:04 PM, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:

> Hi,
>
> You also have to add the CPU capacity for the VM (apart from the number of
> virtual CPUs). The CPU value is used at the allocation phase. However,
> you are specifying MEMORY, and it should be included in the allocated memory
> (USED MEMORY in onehost show). So I guess there is some other problem with
> your template.
>
> Cheers
>
> Ruben
>
>
> On Fri, Feb 4, 2011 at 10:50 AM, Luigi Fortunati <
> luigi.fortunati at gmail.com> wrote:
>
>> I can post the VM template content on monday. However, as far as I
>> remember, the vm template was really simple:
>> NAME="Debian"
>> VCPU= 2
>> MEMORY=1024
>> DISK=[IMAGE="Debian5-i386"]
>> OS=[ARCH=i686]
>>
>> The VMs can boot and run, I can log on console through vSphere Client on
>> the newly created VMs.
>>
>> I noticed that if you don't declare the number of VCPUs, the VM doesn't get
>> scheduled on a cluster node. This option seems mandatory, but I didn't find
>> any mention of it in the documentation.
>> Another thing that seems mandatory is declaring the CPU architecture as
>> i686; otherwise OpenNebula returns an error when writing the deployment.0
>> file.
>>
>>
>> On Thu, Feb 3, 2011 at 5:42 PM, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:
>>
>>> Hi,
>>>
>>> I am not sure this is related to the VMware monitoring... Can you send
>>> the VM Templates?
>>>
>>> Thanks
>>>
>>> Ruben
>>>
>>> On Thu, Feb 3, 2011 at 5:10 PM, Luigi Fortunati <
>>> luigi.fortunati at gmail.com> wrote:
>>>
>>>> Hi,
>>>> I noticed a serious problem when using VMWare ESXi 4.1 with
>>>> OpenNebula 2.0.1.
>>>> I'm currently using the VMWare driver addon which can be found on the
>>>> OpenNebula website (ver. 1.0) and libvirt (ver. 0.8.7).
>>>> It turns out that OpenNebula can't get information about resource
>>>> usage on the cluster nodes.
>>>> Running 2 VMs (each one requires 2 VCPUs and 1 GB of memory) and
>>>> executing some commands, I get this output.
>>>>
>>>> oneadmin at custom2:~/src$ onehost list
>>>>   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
>>>>    2 custom7.sns.it    default    0    200    200    200      2G      0K  off
>>>>    1 custom6.sns.it    default    2    200    200    200      2G      0K   on
>>>> oneadmin at custom2:~/src$ onehost show 1
>>>> HOST 1 INFORMATION
>>>>
>>>> ID                    : 1
>>>> NAME                  : custom6.sns.it
>>>> CLUSTER               : default
>>>> STATE                 : MONITORED
>>>> IM_MAD                : im_vmware
>>>> VM_MAD                : vmm_vmware
>>>> TM_MAD                : tm_vmware
>>>>
>>>> HOST SHARES
>>>>
>>>> MAX MEM               : 2096460
>>>> USED MEM (REAL)       : 0
>>>> USED MEM (ALLOCATED)  : 0
>>>> MAX CPU               : 200
>>>> USED CPU (REAL)       : 0
>>>> USED CPU (ALLOCATED)  : 0
>>>> RUNNING VMS           : 2
>>>>
>>>> MONITORING INFORMATION
>>>>
>>>> CPUSPEED=1992
>>>> HYPERVISOR=vmware
>>>> TOTALCPU=200
>>>> TOTALMEMORY=2096460
>>>>
>>>> As you can see, OpenNebula is unable to get correct information about
>>>> resource usage on the cluster nodes.
>>>> Since this information is used by the VM scheduler, OpenNebula is unable
>>>> to schedule the VMs correctly.
>>>> I tried to create several VMs and all of them were placed on the same
>>>> host, even though it was unable to satisfy the resource requirements of
>>>> all the VMs.
>>>> I think this problem is closely related to libvirt, since OpenNebula
>>>> uses it to retrieve information about hosts and VMs.
>>>>
>>>> Do you get the same behavior? Do you know if there is a way to solve
>>>> this big issue?
>>>>
>>>> --
>>>> Luigi Fortunati
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dr. Ruben Santiago Montero
>>> Associate Professor (Profesor Titular), Complutense University of Madrid
>>>
>>> URL: http://dsa-research.org/doku.php?id=people:ruben
>>> Weblog: http://blog.dsa-research.org/?author=7
>>>
>>
>>
>>
>> --
>> Luigi Fortunati
>>
>
>
>
> --
> Dr. Ruben Santiago Montero
> Associate Professor (Profesor Titular), Complutense University of Madrid
>
> URL: http://dsa-research.org/doku.php?id=people:ruben
> Weblog: http://blog.dsa-research.org/?author=7
>

-- 
Luigi Fortunati