[one-users] Incomplete information from hosts polling (VMWare ESXi 4.1 and OpenNebula 2.0.1)

Fri Feb 11 07:02:13 PST 2011

Thanks Tino!
Contrary to what is written in the documentation, *it is possible* to
connect to the ESXi hypervisor machines via ssh and launch commands (but
only as the root user). I noticed that the ESXi 4.1 machines we have
installed ship a handy program called esxtop, which can also be executed
in batch mode. That command can output more of the information that
OpenNebula needs in order to work and schedule VMs correctly. I believe it
may be a good idea to rethink the IM Driver so that it gathers
information about resource usage using esxtop instead of virsh commands.
With the latter I didn't find a command capable of retrieving the memory
usage of the hypervisor, which is 800 MB in my case.
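
To make the gap concrete, here is a minimal sketch (not the actual IM driver code) that parses the `virsh nodeinfo` output quoted further down in this thread and works out the memory arithmetic. The function names are hypothetical, the TOTALCPU = CPUs x 100 rule is an assumption inferred from the `onehost show` output, and the 800 MB figure is the one reported by vSphere Client, since nodeinfo itself carries no usage data.

```python
# Sample taken from the virsh nodeinfo output quoted in this thread.
NODEINFO = """\
CPU model:           AMD Opteron(tm) Processor 246
CPU(s):              2
CPU frequency:       1992 MHz
CPU socket(s):       2
Core(s) per socket:  1
Thread(s) per core:  1
NUMA cell(s):        2
Memory size:         2096460 kB
"""

# Assumption: hypervisor overhead as read from vSphere Client (800 MB).
HYPERVISOR_USED_KB = 800 * 1024


def parse_nodeinfo(text):
    """Turn 'Key: value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def monitoring_attributes(info):
    """Build the attributes that nodeinfo can provide.

    Note what is absent: nothing here describes used or free resources.
    """
    return {
        "HYPERVISOR": "vmware",
        "CPUSPEED": int(info["CPU frequency"].split()[0]),
        "TOTALCPU": int(info["CPU(s)"]) * 100,  # assumed: 100 per CPU
        "TOTALMEMORY": int(info["Memory size"].split()[0]),
    }


def free_memory_kb(total_kb, used_kb):
    """Memory left once the reported usage is subtracted."""
    return total_kb - used_kb


def fits(total_kb, used_kb, vm_memory_mb, count):
    """True if `count` VMs of `vm_memory_mb` each fit in the free memory."""
    return count * vm_memory_mb * 1024 <= free_memory_kb(total_kb, used_kb)


if __name__ == "__main__":
    attrs = monitoring_attributes(parse_nodeinfo(NODEINFO))
    total = attrs["TOTALMEMORY"]
    print(attrs)
    print("free kB:", free_memory_kb(total, HYPERVISOR_USED_KB))
    print("one 1024 MB VM fits:", fits(total, HYPERVISOR_USED_KB, 1024, 1))
    print("two 1024 MB VMs fit:", fits(total, HYPERVISOR_USED_KB, 1024, 2))
```

With the 800 MB overhead subtracted, 1277260 kB remain, which is enough for one 1024 MB VM but not for two. The used-memory value has to come from somewhere other than nodeinfo, which is why a source such as esxtop looks attractive.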

On Fri, Feb 11, 2011 at 3:10 PM, Tino Vazquez <tinova at opennebula.org> wrote:

> Hi Luigi,
>
> I've updated the ticket; I will be implementing this for the next release.
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez Blanco, MSc
> OpenNebula Major Contributor  / Cloud Researcher
> www.OpenNebula.org | @tinova79
>
>
>
> On Wed, Feb 9, 2011 at 3:14 PM, Luigi Fortunati
> <luigi.fortunati at gmail.com> wrote:
> > Thanks Tino,
> > That is probably more a problem with libvirt, since the VMware IM Driver
> > uses it to access information about the hosts.
> > In order to get information about the hosts OpenNebula launches a virsh
> > command and parses the output.
> > The script that does this work is located in $ONE_LOCATION/lib/remotes/im
> > and the output of the virsh command is:
> > oneadmin at custom2:~/lib/remotes/im$ virsh -c esx://custom6.sns.it/?no_verify=1 nodeinfo
> > Enter username for custom6.sns.it [root]:
> > Enter root's password for custom6.sns.it:
> > CPU model:           AMD Opteron(tm) Processor 246
> > CPU(s):              2
> > CPU frequency:       1992 MHz
> > CPU socket(s):       2
> > Core(s) per socket:  1
> > Thread(s) per core:  1
> > NUMA cell(s):        2
> > Memory size:         2096460 kB
> > I always get the same output, no matter how many VMs are running on the
> > cluster node.
> > That is why OpenNebula returns with an output like this:
> > oneadmin at custom2:~/var/96$ onehost show 1
> > HOST 1 INFORMATION
> >
> > ID                    : 1
> > NAME                  : custom6.sns.it
> > CLUSTER               : default
> > STATE                 : MONITORING
> > IM_MAD                : im_vmware
> > VM_MAD                : vmm_vmware
> > TM_MAD                : tm_vmware
> > HOST SHARES
> >
> > MAX MEM               : 2096460
> > USED MEM (REAL)       : 0
> > USED MEM (ALLOCATED)  : 0
> > MAX CPU               : 200
> > USED CPU (REAL)       : 0
> > USED CPU (ALLOCATED)  : 0
> > RUNNING VMS           : 1
> > MONITORING INFORMATION
> >
> > CPUSPEED=1992
> > HYPERVISOR=vmware
> > TOTALCPU=200
> > TOTALMEMORY=2096460
> > OpenNebula polls the cluster nodes periodically and gets only information
> > about hypervisor type, CPU frequency, total CPU and total memory size.
> > The limitation here is caused by libvirt (virsh), which is unable to return
> > more information about the actual usage of resources.
> > The integration of OpenNebula with Xen can rely on ssh access to the
> > cluster nodes.
> > The IM Driver for Xen hypervisors launches xentop on every cluster node in
> > order to get information about the VMs and then parses the output.
> > As an example, here is the output of the commands xm and xentop (some info
> > is purged):
> > custom9:/ # xentop -bi2
> >       NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k)
> >   Domain-0 -----r        102    0.0    1930260   93.7   no limit       n/a     2    0        0        0
> >       NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k)
> >   Domain-0 -----r        102    0.3    1930260   93.7   no limit       n/a     2    0        0        0
> > custom9:/ # xm info
> > host                   : custom9
> > release                : 2.6.34.7-0.5-xen
> > version                : #1 SMP 2010-10-25 08:40:12 +0200
> > machine                : x86_64
> > nr_cpus                : 2
> > nr_nodes               : 2
> > cores_per_socket       : 1
> > threads_per_core       : 1
> > cpu_mhz                : 1991
> > [...]
> > total_memory           : 2011
> > free_memory            : 135
> > free_cpus              : 0
> > max_free_memory        : 1508
> > max_para_memory        : 1504
> > max_hvm_memory         : 1492
> > [...]
> > The script $ONE_LOCATION/lib/remotes/im/xen.d/xen.rb parses those two
> > outputs and retrieves data about memory, cpu, and network usage.
> > I think that the VMware drivers are scarcely useful if they can't provide the
> > degree of information that can be achieved with Xen hypervisors and
> > OpenNebula. I observed the effects of this issue in my tests.
> > On Tue, Feb 8, 2011 at 6:34 PM, Tino Vazquez <tinova at opennebula.org> wrote:
> >>
> >> Hi Luigi,
> >>
> >> There is a bug in the IM driver for VMware: it is not reporting the free
> >> memory at all. I've opened a ticket to keep track of the issue [1]; it
> >> will be solved in the next release.
> >>
> >> Regards,
> >>
> >> -Tino
> >>
> >> [1] http://dev.opennebula.org/issues/481
> >>
> >> --
> >> Constantino Vázquez Blanco, MSc
> >> OpenNebula Major Contributor  / Cloud Researcher
> >> www.OpenNebula.org | @tinova79
> >>
> >>
> >>
> >> On Tue, Feb 8, 2011 at 12:56 PM, Luigi Fortunati
> >> <luigi.fortunati at gmail.com> wrote:
> >> > Ok, I tried some tests today.
> >> > The hardware/software environment includes 2 cluster nodes (ESXi 4.1),
> >> > 2 GB of RAM, 2 AMD Opteron 246 processors (2 GHz), trial version licenses.
> >> > The OpenNebula installation is self-contained.
> >> > 800 MB of memory are used by the hypervisor itself (that info comes from
> >> > vSphere Client), so only 1.2 GB are free, but OpenNebula seems unaware of
> >> > that :-(
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
> >> >   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
> >> >    2 custom7.sns.it    default    0    200    200    200      2G      0K   on
> >> >    1 custom6.sns.it    default    0    200    200    200      2G      0K   on
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost show 1
> >> > HOST 1 INFORMATION
> >> >
> >> > ID                    : 1
> >> > NAME                  : custom6.sns.it
> >> > CLUSTER               : default
> >> > STATE                 : MONITORED
> >> > IM_MAD                : im_vmware
> >> > VM_MAD                : vmm_vmware
> >> > TM_MAD                : tm_vmware
> >> > HOST SHARES
> >> >
> >> > MAX MEM               : 2096460
> >> > USED MEM (REAL)       : 0
> >> > USED MEM (ALLOCATED)  : 0
> >> > MAX CPU               : 200
> >> > USED CPU (REAL)       : 0
> >> > USED CPU (ALLOCATED)  : 0
> >> >
> >> > In each test I tried to start 3 VMs using a non-persistent image. The
> >> > requirements of all three VMs cannot be satisfied by a single cluster
> >> > node.
> >> > FIRST TEST:
> >> > The VM template for the first test is:
> >> > NAME = "Debian Server"
> >> > CPU = 1
> >> > MEMORY = 1024
> >> > OS = [ ARCH = "i686" ]
> >> > DISK = [IMAGE="Debian Server"]
> >> > Only CPU and Memory info.
> >> > Here is the result:
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
> >> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
> >> >    66 oneadmin Debian S pend   0      0K                 00 00:07:47
> >> >    67 oneadmin Debian S pend   0      0K                 00 00:07:45
> >> >    68 oneadmin Debian S pend   0      0K                 00 00:07:18
> >> > Forever in "pending" state... The VMs don't get scheduled
> >> > oned.log doesn't report anything but resource polling informational
> >> > messages.
> >> > sched.log repeats this sequence:
> >> > Tue Feb  8 10:02:06 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
> >> > Tue Feb  8 10:02:06 2011 [VM][D]: Pending virtual machines : 66 67 68
> >> > Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
> >> > Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
> >> > Tue Feb  8 10:02:06 2011 [RANK][W]: No rank defined for VM
> >> > Tue Feb  8 10:02:06 2011 [SCHED][I]: Select hosts
> >> >         PRI     HID
> >> >         -------------------
> >> > Virtual Machine: 66
> >> > Virtual Machine: 67
> >> > Virtual Machine: 68
> >> > SECOND TEST:
> >> > VM template:
> >> > NAME = "Debian Server"
> >> > VCPU = 1
> >> > MEMORY = 1024
> >> > OS = [ ARCH = "i686" ]
> >> > DISK = [IMAGE="Debian Server"]
> >> > Only VCPU and MEMORY info.
> >> > Results:
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
> >> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
> >> >    76 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:07:40
> >> >    77 oneadmin Debian S runn   0      0K  custom6.sns.it 00 00:07:38
> >> >    78 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:05:58
> >> > Everything seems fine, but it's not: as I said previously, each host
> >> > has only 1.2 GB of memory free, so there should be no space for two
> >> > VMs on the same host.
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
> >> >   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
> >> >    2 custom7.sns.it    default    2    200    200    200      2G      0K   on
> >> >    1 custom6.sns.it    default    1    200    200    200      2G      0K   on
> >> > Both the hosts and the VMs report no useful info on the resource usage.
> >> > Logging in to the console of each VM and executing the "free -m" command,
> >> > I checked that every VM has 1 GB of total memory allocated. So I decided
> >> > to test the GB of memory on both VMs at the same time using the utility
> >> > called "memtester", which allocates a given amount of free memory using
> >> > malloc and tests it. The results reported memory access problems.
> >> > I decided here to go on and check whether OpenNebula and VMware ESXi fail
> >> > to allocate VMs exceeding the resource capacity of the hosts, by starting
> >> > two more VMs (requiring 1 VCPU and 1 GB of memory each).
> >> > Results:
> >> > oneadmin at custom2:~/var/79$ onevm list
> >> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
> >> >    76 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:54:47
> >> >    77 oneadmin Debian S runn   0      0K  custom6.sns.it 00 00:54:45
> >> >    78 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:53:05
> >> >    79 oneadmin Debian S boot   0      0K  custom7.sns.it 00 00:10:22
> >> >    80 oneadmin Debian S boot   0      0K  custom7.sns.it 00 00:09:47
> >> > The new VMs are allocated on the custom7 machine (why???) but remain frozen
> >> > in the "boot" state.
> >> > That is a problem because those two new VMs should not be allocated to any
> >> > cluster node.
> >> > THIRD TEST:
> >> > Here I followed Ruben's suggestion...
> >> > The VM template:
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
> >> > NAME = "Debian Server"
> >> > CPU = 1
> >> > VCPU = 1
> >> > MEMORY = 1024
> >> > OS = [ ARCH = "i686" ]
> >> > DISK = [IMAGE="Debian Server"]
> >> > Both CPU/VCPU and MEMORY info.
> >> > Output with 3 VM:
> >> > oneadmin at custom2:~/var$ onevm list
> >> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
> >> >    81 oneadmin Debian S pend   0      0K                 00 00:02:32
> >> >    82 oneadmin Debian S pend   0      0K                 00 00:02:30
> >> >    83 oneadmin Debian S pend   0      0K                 00 00:02:29
> >> > As in the FIRST TEST, the VMs don't get scheduled and remain in the
> >> > "pending" state.
> >> > sched.log repeats this message:
> >> > Tue Feb  8 12:00:05 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
> >> > Tue Feb  8 12:00:05 2011 [VM][D]: Pending virtual machines : 81 82 83
> >> > Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
> >> > Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
> >> > Tue Feb  8 12:00:05 2011 [RANK][W]: No rank defined for VM
> >> > Tue Feb  8 12:00:05 2011 [SCHED][I]: Select hosts
> >> > PRI HID
> >> > -------------------
> >> > Virtual Machine: 81
> >> > Virtual Machine: 82
> >> > Virtual Machine: 83
> >> > Here I assumed that I probably should not declare the number of physical
> >> > CPUs in the VM template.
> >> > One last test...
> >> > FOURTH TEST:
> >> > Here I disabled a host, custom6, and started 3 VMs.
> >> > The VM template is the one that worked before:
> >> > oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
> >> > NAME = "Debian Server"
> >> > VCPU = 1
> >> > MEMORY = 1024
> >> > OS = [ ARCH = "i686" ]
> >> > DISK = [IMAGE="Debian Server"]
> >> > Output:
> >> > oneadmin at custom2:~$ onehost list
> >> >   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
> >> >    2 custom7.sns.it    default    3    200    200    200      2G      0K   on
> >> >    1 custom6.sns.it    default    0    200    200    200      2G      0K  off
> >> > oneadmin at custom2:~$ onevm list
> >> >    ID     USER     NAME STAT CPU     MEM        HOSTNAME        TIME
> >> >    92 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:53
> >> >    93 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:46
> >> >    94 oneadmin Debian S runn   0      0K  custom7.sns.it 00 00:12:46
> >> > I verified that the VMs were up and running by logging in to the console
> >> > of each one of them through vSphere Client; they were all running and each
> >> > reported 1 GB of total memory. Since there is less than 1.2 GB of memory
> >> > effectively free on a cluster node before the VMs are instantiated, how
> >> > can those VMs run consistently? Why does OpenNebula schedule those VMs on
> >> > the same machine, exceeding even the host resource capacity?
> >> > On Fri, Feb 4, 2011 at 11:04 PM, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:
> >> >>
> >> >> Hi,
> >> >> You also have to add the CPU capacity for the VM (apart from the number
> >> >> of virtual CPUs). The CPU value is used at the allocation phase.
> >> >> However, you are specifying MEMORY and it should be included in the
> >> >> allocated memory (USED MEMORY in onehost show), so I guess there must be
> >> >> some other problem with your template.
> >> >> Cheers
> >> >> Ruben
> >> >>
> >> >> On Fri, Feb 4, 2011 at 10:50 AM, Luigi Fortunati
> >> >> <luigi.fortunati at gmail.com> wrote:
> >> >>>
> >> >>> I can post the VM template content on monday. However, as far as I
> >> >>> remember, the vm template was really simple:
> >> >>> NAME="Debian"
> >> >>> VCPU= 2
> >> >>> MEMORY=1024
> >> >>> DISK=[IMAGE="Debian5-i386"]
> >> >>> OS=[ARCH=i686]
> >> >>> The VMs can boot and run, and I can log in on the console of the newly
> >> >>> created VMs through vSphere Client.
> >> >>> I noticed that if you don't declare the number of VCPUs the VM doesn't
> >> >>> get scheduled on a cluster node. This option seems mandatory but I
> >> >>> didn't find any mention of it in the documentation.
> >> >>> Another thing that seems mandatory is declaring the CPU architecture as
> >> >>> i686, otherwise OpenNebula will return an error when writing the
> >> >>> deployment.0 file.
> >> >>>
> >> >>> On Thu, Feb 3, 2011 at 5:42 PM, Ruben S. Montero
> >> >>> <rubensm at dacya.ucm.es>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>> I am not sure this is related to the VMware monitoring... Can you send
> >> >>>> the VM templates?
> >> >>>> Thanks
> >> >>>> Ruben
> >> >>>>
> >> >>>> On Thu, Feb 3, 2011 at 5:10 PM, Luigi Fortunati
> >> >>>> <luigi.fortunati at gmail.com> wrote:
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>> I noticed a serious problem with the use of VMware ESXi 4.1 and
> >> >>>>> OpenNebula 2.0.1.
> >> >>>>> I'm currently using the VMware driver addon which can be found on the
> >> >>>>> OpenNebula website (ver. 1.0) and libvirt (ver. 0.8.7).
> >> >>>>> It happens that OpenNebula can't get information about the usage of
> >> >>>>> resources on the cluster nodes.
> >> >>>>> By running 2 VMs (each one requires 2 VCPUs and 1 GB of memory) and
> >> >>>>> executing some commands I get this output.
> >> >>>>> oneadmin at custom2:~/src$ onehost list
> >> >>>>>   ID NAME              CLUSTER  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
> >> >>>>>    2 custom7.sns.it    default    0    200    200    200      2G      0K  off
> >> >>>>>    1 custom6.sns.it    default    2    200    200    200      2G      0K   on
> >> >>>>> oneadmin at custom2:~/src$ onehost show 1
> >> >>>>> HOST 1 INFORMATION
> >> >>>>>
> >> >>>>> ID                    : 1
> >> >>>>> NAME                  : custom6.sns.it
> >> >>>>> CLUSTER               : default
> >> >>>>> STATE                 : MONITORED
> >> >>>>> IM_MAD                : im_vmware
> >> >>>>> VM_MAD                : vmm_vmware
> >> >>>>> TM_MAD                : tm_vmware
> >> >>>>> HOST SHARES
> >> >>>>>
> >> >>>>> MAX MEM               : 2096460
> >> >>>>> USED MEM (REAL)       : 0
> >> >>>>> USED MEM (ALLOCATED)  : 0
> >> >>>>> MAX CPU               : 200
> >> >>>>> USED CPU (REAL)       : 0
> >> >>>>> USED CPU (ALLOCATED)  : 0
> >> >>>>> RUNNING VMS           : 2
> >> >>>>> MONITORING INFORMATION
> >> >>>>>
> >> >>>>> CPUSPEED=1992
> >> >>>>> HYPERVISOR=vmware
> >> >>>>> TOTALCPU=200
> >> >>>>> TOTALMEMORY=2096460
> >> >>>>> As you can see, OpenNebula is unable to get correct information about
> >> >>>>> the usage of resources on the cluster nodes.
> >> >>>>> As this information is used by the VM scheduler, OpenNebula is
> >> >>>>> unable to schedule the VMs correctly.
> >> >>>>> I tried to create several VMs and all of them were placed on the same
> >> >>>>> host even though the latter was unable to satisfy the resource
> >> >>>>> requirements of all the VMs.
> >> >>>>> I think that this problem is strongly related to libvirt, as
> >> >>>>> OpenNebula uses it to retrieve information about hosts and VMs.
> >> >>>>> Do you get the same behavior? Do you know if there is a way to solve
> >> >>>>> this big issue?
> >> >>>>> --
> >> >>>>> Luigi Fortunati
> >> >>>>>
> >> >>>>> _______________________________________________
> >> >>>>> Users mailing list
> >> >>>>> Users at lists.opennebula.org
> >> >>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Dr. Ruben Santiago Montero
> >> >>>> Associate Professor (Profesor Titular), Complutense University of
> >> >>>> Madrid
> >> >>>>
> >> >>>> URL: http://dsa-research.org/doku.php?id=people:ruben
> >> >>>> Weblog: http://blog.dsa-research.org/?author=7
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Luigi Fortunati
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Dr. Ruben Santiago Montero
> >> >> Associate Professor (Profesor Titular), Complutense University of
> >> >> Madrid
> >> >>
> >> >> URL: http://dsa-research.org/doku.php?id=people:ruben
> >> >> Weblog: http://blog.dsa-research.org/?author=7
> >> >
> >> >
> >> >
> >> > --
> >> > Luigi Fortunati
> >
> >
> >
> > --
> > Luigi Fortunati
> >
>

-- 
Luigi Fortunati