[one-users] Incomplete information from hosts polling (VMware ESXi 4.1 and OpenNebula 2.0.1)

Tino Vazquez tinova at opennebula.org
Fri Feb 11 06:10:41 PST 2011


Hi Luigi,

I've updated the ticket; I will be implementing this for the next release.

Regards,

-Tino

--
Constantino Vázquez Blanco, MSc
OpenNebula Major Contributor  / Cloud Researcher
www.OpenNebula.org | @tinova79



On Wed, Feb 9, 2011 at 3:14 PM, Luigi Fortunati
<luigi.fortunati at gmail.com> wrote:
> Thanks Tino,
> That is probably more a problem of libvirt, since the VMware IM driver
> uses it to access information about the hosts.
> To gather that information, OpenNebula launches a virsh command and
> parses its output.
> The script that does this work is located in $ONE_LOCATION/lib/remotes/im,
> and the output of the virsh command is:
> oneadmin at custom2:~/lib/remotes/im$ virsh -c
> esx://custom6.sns.it/?no_verify=1 nodeinfo
> Enter username for custom6.sns.it [root]:
> Enter root's password for custom6.sns.it:
> CPU model:           AMD Opteron(tm) Processor 246
> CPU(s):              2
> CPU frequency:       1992 MHz
> CPU socket(s):       2
> Core(s) per socket:  1
> Thread(s) per core:  1
> NUMA cell(s):        2
> Memory size:         2096460 kB
> I always get the same output, no matter how many VMs are running on the
> cluster node.
> That is why OpenNebula reports output like this:
> oneadmin at custom2:~/var/96$ onehost show 1
> HOST 1 INFORMATION
>
> ID                    : 1
> NAME                  : custom6.sns.it
> CLUSTER               : default
> STATE                 : MONITORING
> IM_MAD                : im_vmware
> VM_MAD                : vmm_vmware
> TM_MAD                : tm_vmware
> HOST SHARES
>
> MAX MEM               : 2096460
> USED MEM (REAL)       : 0
> USED MEM (ALLOCATED)  : 0
> MAX CPU               : 200
> USED CPU (REAL)       : 0
> USED CPU (ALLOCATED)  : 0
> RUNNING VMS           : 1
> MONITORING INFORMATION
>
> CPUSPEED=1992
> HYPERVISOR=vmware
> TOTALCPU=200
> TOTALMEMORY=2096460
> OpenNebula polls cluster nodes periodically but gets only the hypervisor
> type, CPU frequency, total CPU, and total memory size.
> The limitation here is caused by libvirt (virsh), which is unable to
> return any information about the actual usage of resources.
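> To make that parsing step concrete, here is a minimal sketch of the idea
> (illustrative only, not the actual driver code): run virsh nodeinfo, split
> each line on the first colon, and emit the static attributes that later
> show up under MONITORING INFORMATION:
>
> #!/usr/bin/env ruby
> # Sketch of the VMware IM probe idea: parse "virsh nodeinfo" output.
> host = ARGV[0] || 'custom6.sns.it'        # host name is illustrative
> out  = `virsh -c "esx://#{host}/?no_verify=1" nodeinfo`
>
> info = {}
> out.each_line do |line|
>   key, value = line.split(':', 2)
>   info[key.strip] = value.strip if value
> end
>
> puts "HYPERVISOR=vmware"
> puts "CPUSPEED=#{info['CPU frequency'].to_i}"    # "1992 MHz"   -> 1992
> puts "TOTALCPU=#{info['CPU(s)'].to_i * 100}"     # 100 per core
> puts "TOTALMEMORY=#{info['Memory size'].to_i}"   # "2096460 kB" -> 2096460
> # nodeinfo carries no usage figures, so FREEMEMORY or USEDCPU simply
> # cannot be derived from it.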
> The integration of OpenNebula with Xen, on the other hand, can rely on
> ssh access to the cluster nodes.
> The IM driver for Xen hypervisors launches xentop on every cluster node
> to get information about the VMs and then parses the output.
> As an example, here is the output of the xentop and xm commands (some
> info is trimmed):
> custom9:/ # xentop -bi2
>       NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k)
>   Domain-0 -----r        102    0.0    1930260   93.7   no limit       n/a     2    0        0        0
>       NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k)
>   Domain-0 -----r        102    0.3    1930260   93.7   no limit       n/a     2    0        0        0
> custom9:/ # xm info
> host                   : custom9
> release                : 2.6.34.7-0.5-xen
> version                : #1 SMP 2010-10-25 08:40:12 +0200
> machine                : x86_64
> nr_cpus                : 2
> nr_nodes               : 2
> cores_per_socket       : 1
> threads_per_core       : 1
> cpu_mhz                : 1991
> [...]
> total_memory           : 2011
> free_memory            : 135
> free_cpus              : 0
> max_free_memory        : 1508
> max_para_memory        : 1504
> max_hvm_memory         : 1492
> [...]
> The script $ONE_LOCATION/lib/remotes/im/xen.d/xen.rb parses those two
> outputs and retrieves data about memory, CPU, and network usage.
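> In the same spirit as above, here is a minimal sketch of that parsing
> idea (simplified, not the real xen.rb): take the last xentop iteration,
> since CPU(%) in the first one is always 0, and sum per-domain usage:
>
> #!/usr/bin/env ruby
> # Sketch: aggregate guest usage from "xentop -bi2" output.
> out  = `ssh custom9 xentop -bi2`          # host name is illustrative
> rows = out.lines.map(&:split)
>
> # Data rows of the last iteration follow the final "NAME ..." header.
> last_header = rows.rindex { |r| r.first == 'NAME' }
> used_cpu = used_mem = 0.0
> rows[(last_header + 1)..-1].each do |name, _state, _sec, cpu_pct, mem_k, *_rest|
>   next if name == 'Domain-0'              # Dom0 is not a guest VM
>   used_cpu += cpu_pct.to_f
>   used_mem += mem_k.to_i
> end
> # Note: columns after MEM(%) can shift when MAXMEM(k) is "no limit";
> # the real probe has to cope with that when reading the network fields.
>
> puts "USEDCPU=#{used_cpu}"
> puts "USEDMEMORY=#{used_mem.to_i}"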
> I think the VMware drivers are of limited use if they can't provide the
> level of information that can be achieved with Xen hypervisors and
> OpenNebula; I've seen the effects of this issue first-hand in my tests.
> On Tue, Feb 8, 2011 at 6:34 PM, Tino Vazquez <tinova at opennebula.org> wrote:
>>
>> Hi Luigi,
>>
>> There is a bug in the IM driver for VMware: it is not reporting the free
>> memory at all. I've opened a ticket to keep track of the issue [1]; it
>> will be solved in the next release.
>>
>> Regards,
>>
>> -Tino
>>
>> [1] http://dev.opennebula.org/issues/481
>>
>> --
>> Constantino Vázquez Blanco, MSc
>> OpenNebula Major Contributor  / Cloud Researcher
>> www.OpenNebula.org | @tinova79
>>
>>
>>
>> On Tue, Feb 8, 2011 at 12:56 PM, Luigi Fortunati
>> <luigi.fortunati at gmail.com> wrote:
>> > Ok, I tried some tests today.
>> > The hardware/software environment includes 2 cluster nodes (ESXi 4.1),
>> > each with 2 GB of RAM and 2 AMD Opteron 246 processors (2 GHz), on
>> > trial licenses. The OpenNebula installation is self-contained.
>> > 800 MB of memory are used by the hypervisor itself (that info comes
>> > from vSphere Client), so only 1.2 GB are free, but OpenNebula seems
>> > unaware of that :-(
>> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
>> >   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
>> >    2 custom7.sns.it    default    0    200    200    200      2G      0K   on
>> >    1 custom6.sns.it    default    1    200    200    200      2G      0K   on
>> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost show 1
>> > HOST 1 INFORMATION
>> >
>> > ID                    : 1
>> > NAME                  : custom6.sns.it
>> > CLUSTER               : default
>> > STATE                 : MONITORED
>> > IM_MAD                : im_vmware
>> > VM_MAD                : vmm_vmware
>> > TM_MAD                : tm_vmware
>> > HOST SHARES
>> >
>> > MAX MEM               : 2096460
>> > USED MEM (REAL)       : 0
>> > USED MEM (ALLOCATED)  : 0
>> > MAX CPU               : 200
>> > USED CPU (REAL)       : 0
>> > USED CPU (ALLOCATED)  : 0
>> >
>> > In each test I tried to start 3 VMs using a non-persistent image. The
>> > requirements of all three VMs together cannot be satisfied by a single
>> > cluster node.
>> > FIRST TEST:
>> > The VM template for the first test is:
>> > NAME = "Debian Server"
>> > CPU = 1
>> > MEMORY = 1024
>> > OS = [ ARCH = "i686" ]
>> > DISK = [IMAGE="Debian Server"]
>> > Only CPU and MEMORY info.
>> > Here is the result:
>> > oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
>> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>> >    66 oneadmin Debian S pend   0      0K                 00 00:07:47
>> >    67 oneadmin Debian S pend   0      0K                 00 00:07:45
>> >    68 oneadmin Debian S pend   0      0K                 00 00:07:18
>> > The VMs stay forever in the "pending" state and never get scheduled.
>> > oned.log doesn't report anything but informational resource-polling
>> > messages.
>> > sched.log repeats this sequence:
>> > Tue Feb  8 10:02:06 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
>> > Tue Feb  8 10:02:06 2011 [VM][D]: Pending virtual machines : 66 67 68
>> > Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
>> > Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
>> > Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
>> > Tue Feb  8 10:02:06 2011 [SCHED][I]: Select hosts
>> >         PRI     HID
>> >         -------------------
>> > Virtual Machine: 66
>> > Virtual Machine: 67
>> > Virtual Machine: 68
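>> > A side note on those [RANK][W] lines: as far as I can tell they are
>> > only warnings, meaning the template carries no RANK expression, so the
>> > scheduler falls back to its default placement policy. Something like
>> > the following (assuming the 2.x host attribute names) would rank
>> > candidate hosts by free CPU instead:
>> > RANK = FREECPU
>> > The real symptom is the empty table under "PRI HID": no host matched
>> > the VMs' requirements, so there was nothing to rank at all.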
>> > SECOND TEST:
>> > VM template:
>> > NAME = "Debian Server"
>> > VCPU = 1
>> > MEMORY = 1024
>> > OS = [ ARCH = "i686" ]
>> > DISK = [IMAGE="Debian Server"]
>> > Only VCPU and MEMORY info.
>> > Results:
>> > oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
>> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>> >    76 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:07:40
>> >    77 oneadmin Debian S runn   0      0K  custom6.sns.it 00 00:07:38
>> >    78 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:05:58
>> > Everything seems fine, but it's not: as I said previously, each host
>> > has only 1.2 GB of memory free, so there should be no room for two VMs
>> > on the same host.
>> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
>> >   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
>> >    2 custom7.sns.it    default    2    200    200    200      2G      0K   on
>> >    1 custom6.sns.it    default    1    200    200    200      2G      0K   on
>> > Neither the hosts nor the VMs report any useful info on resource usage.
>> > Logging into the console of each VM and running "free -m", I checked
>> > that every VM has 1 GB of total memory allocated. So I decided to
>> > exercise that gigabyte of memory on both VMs at the same time with the
>> > "memtester" utility, which allocates a given amount of memory with
>> > malloc and tests it. The results reported memory access problems.
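>> > For reference, memtester takes the amount of memory to allocate and an
>> > iteration count, so a run like this (the size is illustrative)
>> > exercises most of the guest's gigabyte:
>> > memtester 900M 1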
>> > I then decided to go on and check whether OpenNebula and VMware ESXi
>> > fail to allocate VMs that exceed the resource capacity of the hosts,
>> > by starting two more VMs (requiring 1 VCPU and 1 GB of memory each).
>> > Results:
>> > oneadmin at custom2:~/var/79$ onevm list
>> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>> >    76 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:54:47
>> >    77 oneadmin Debian S runn   0      0K  custom6.sns.it 00 00:54:45
>> >    78 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:53:05
>> >    79 oneadmin Debian S boot   0      0K  custom7.sns.it 00 00:10:22
>> >    80 oneadmin Debian S boot   0      0K  custom7.sns.it 00 00:09:47
>> > The new VMs are allocated on the custom7 machine (why?) but remain
>> > frozen in the "boot" state.
>> > That is a problem, because those two new VMs should not have been
>> > allocated to any cluster node at all.
>> > THIRD TEST:
>> > Here I followed Ruben's suggestion...
>> > The VM template:
>> > oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
>> > NAME = "Debian Server"
>> > CPU = 1
>> > VCPU = 1
>> > MEMORY = 1024
>> > OS = [ ARCH = "i686" ]
>> > DISK = [IMAGE="Debian Server"]
>> > Both CPU/VCPU and MEMORY info.
>> > Output with 3 VM:
>> > oneadmin at custom2:~/var$ onevm list
>> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>> >    81 oneadmin Debian S pend   0      0K                 00 00:02:32
>> >    82 oneadmin Debian S pend   0      0K                 00 00:02:30
>> >    83 oneadmin Debian S pend   0      0K                 00 00:02:29
>> > As in the FIRST TEST, the VMs don't get scheduled and remain in the
>> > "pending" state.
>> > sched.log repeats this message:
>> > Tue Feb  8 12:00:05 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
>> > Tue Feb  8 12:00:05 2011 [VM][D]: Pending virtual machines : 81 82 83
>> > Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
>> > Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
>> > Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
>> > Tue Feb  8 12:00:05 2011 [SCHED][I]: Select hosts
>> > PRI HID
>> > -------------------
>> > Virtual Machine: 81
>> > Virtual Machine: 82
>> > Virtual Machine: 83
>> > From this I assumed that I should probably not declare the physical
>> > CPU capacity in the VM template.
>> > One last test...
>> > FOURTH TEST:
>> > Here I disabled a host, custom6, and started 3 VMs.
>> > The VM template is the one that worked before:
>> > oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
>> > NAME = "Debian Server"
>> > VCPU = 1
>> > MEMORY = 1024
>> > OS = [ ARCH = "i686" ]
>> > DISK = [IMAGE="Debian Server"]
>> > Output:
>> > oneadmin at custom2:~$ onehost list
>> >   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
>> >    2 custom7.sns.it    default    3    200    200    200      2G      0K   on
>> >    1 custom6.sns.it    default    0    200    200    200      2G      0K  off
>> > oneadmin at custom2:~$ onevm list
>> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
>> >    92 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:53
>> >    93 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:46
>> >    94 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:46
>> > I verified that the VMs were up and running by logging into the
>> > console of each one through vSphere Client: they were all running,
>> > each reporting 1 GB of total memory. Since less than 1.2 GB of memory
>> > is effectively free on a cluster node before the VMs are instantiated,
>> > how can those VMs run consistently? And why does OpenNebula schedule
>> > those VMs on the same machine, exceeding even the host's resource
>> > capacity?
>> > On Fri, Feb 4, 2011 at 11:04 PM, Ruben S. Montero <rubensm at dacya.ucm.es>
>> > wrote:
>> >>
>> >> Hi,
>> >> You also have to add the CPU capacity for the VM (apart from the
>> >> number of virtual CPUs, VCPU). The CPU value is used at the
>> >> allocation phase. However, you are specifying MEMORY, and that should
>> >> be included in the allocated memory (USED MEMORY in onehost show), so
>> >> I guess there may be another problem with your template.
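>> >> For example (the values are just illustrative), a template with:
>> >> CPU    = 0.5
>> >> VCPU   = 2
>> >> MEMORY = 1024
>> >> reserves half a physical CPU and 1 GB at allocation time, while the
>> >> guest still sees two virtual CPUs.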
>> >> Cheers
>> >> Ruben
>> >>
>> >> On Fri, Feb 4, 2011 at 10:50 AM, Luigi Fortunati
>> >> <luigi.fortunati at gmail.com> wrote:
>> >>>
>> >>> I can post the VM template content on Monday. However, as far as I
>> >>> remember, the VM template was really simple:
>> >>> NAME="Debian"
>> >>> VCPU= 2
>> >>> MEMORY=1024
>> >>> DISK=[IMAGE="Debian5-i386"]
>> >>> OS=[ARCH=i686]
>> >>> The VMs can boot and run; I can log into the console of the newly
>> >>> created VMs through vSphere Client.
>> >>> I noticed that if you don't declare the number of VCPUs, the VM
>> >>> doesn't get scheduled on a cluster node. This option seems
>> >>> mandatory, but I didn't find any mention of it in the documentation.
>> >>> Another thing that seems mandatory is declaring the CPU architecture
>> >>> as i686; otherwise OpenNebula returns an error when writing the
>> >>> deployment.0 file.
>> >>>
>> >>> On Thu, Feb 3, 2011 at 5:42 PM, Ruben S. Montero
>> >>> <rubensm at dacya.ucm.es>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> I am not sure this is related to the VMware monitoring... Can you
>> >>>> send
>> >>>> the VM Templates?
>> >>>> Thanks
>> >>>> Ruben
>> >>>>
>> >>>> On Thu, Feb 3, 2011 at 5:10 PM, Luigi Fortunati
>> >>>> <luigi.fortunati at gmail.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>> I noticed a serious problem with the combination of VMware ESXi
>> >>>>> 4.1 and OpenNebula 2.0.1.
>> >>>>> I'm using the VMware driver addon that can be found on the
>> >>>>> OpenNebula website (ver. 1.0) and libvirt (ver. 0.8.7).
>> >>>>> OpenNebula can't get information about the usage of resources on
>> >>>>> the cluster nodes.
>> >>>>> Running 2 VMs (each one requiring 2 VCPUs and 1 GB of memory) and
>> >>>>> executing some commands, I get this output.
>> >>>>> oneadmin at custom2:~/src$ onehost list
>> >>>>>   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
>> >>>>>    2 custom7.sns.it    default    0    200    200    200      2G      0K  off
>> >>>>>    1 custom6.sns.it    default    2    200    200    200      2G      0K   on
>> >>>>> oneadmin at custom2:~/src$ onehost show 1
>> >>>>> HOST 1 INFORMATION
>> >>>>>
>> >>>>> ID                    : 1
>> >>>>> NAME                  : custom6.sns.it
>> >>>>> CLUSTER               : default
>> >>>>> STATE                 : MONITORED
>> >>>>> IM_MAD                : im_vmware
>> >>>>> VM_MAD                : vmm_vmware
>> >>>>> TM_MAD                : tm_vmware
>> >>>>> HOST SHARES
>> >>>>>
>> >>>>> MAX MEM               : 2096460
>> >>>>> USED MEM (REAL)       : 0
>> >>>>> USED MEM (ALLOCATED)  : 0
>> >>>>> MAX CPU               : 200
>> >>>>> USED CPU (REAL)       : 0
>> >>>>> USED CPU (ALLOCATED)  : 0
>> >>>>> RUNNING VMS           : 2
>> >>>>> MONITORING INFORMATION
>> >>>>>
>> >>>>> CPUSPEED=1992
>> >>>>> HYPERVISOR=vmware
>> >>>>> TOTALCPU=200
>> >>>>> TOTALMEMORY=2096460
>> >>>>> As you can see, OpenNebula is unable to get correct information
>> >>>>> about the usage of resources on the cluster nodes.
>> >>>>> As this information is used by the VM scheduler, OpenNebula is
>> >>>>> unable to schedule the VMs correctly.
>> >>>>> I tried to create several VMs, and all of them were placed on the
>> >>>>> same host even though it was unable to satisfy the resource
>> >>>>> requirements of all the VMs.
>> >>>>> I think this problem is strongly related to libvirt, as OpenNebula
>> >>>>> uses it to retrieve information about hosts and VMs.
>> >>>>> Do you get the same behavior? Do you know if there is a way to
>> >>>>> solve this issue?
>> >>>>> --
>> >>>>> Luigi Fortunati
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Users mailing list
>> >>>>> Users at lists.opennebula.org
>> >>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Dr. Ruben Santiago Montero
>> >>>> Associate Professor (Profesor Titular), Complutense University of
>> >>>> Madrid
>> >>>>
>> >>>> URL: http://dsa-research.org/doku.php?id=people:ruben
>> >>>> Weblog: http://blog.dsa-research.org/?author=7
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Luigi Fortunati
>> >>
>> >>
>> >>
>> >> --
>> >> Dr. Ruben Santiago Montero
>> >> Associate Professor (Profesor Titular), Complutense University of
>> >> Madrid
>> >>
>> >> URL: http://dsa-research.org/doku.php?id=people:ruben
>> >> Weblog: http://blog.dsa-research.org/?author=7
>> >
>> >
>> >
>> > --
>> > Luigi Fortunati
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users at lists.opennebula.org
>> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>> >
>> >
>
>
>
> --
> Luigi Fortunati
>


