[one-users] Incomplete information from hosts polling (VMware ESXi 4.1 and OpenNebula 2.0.1)
Luigi Fortunati
luigi.fortunati at gmail.com
Wed Feb 9 06:14:16 PST 2011
Thanks Tino,
That is probably more a problem of libvirt, since the VMware IM driver uses it
to access information about the hosts.
To get information about the hosts, OpenNebula launches a virsh command and
parses its output.
The script that does this work is located in $ONE_LOCATION/lib/remotes/im,
and the output of the virsh command is:
oneadmin at custom2:~/lib/remotes/im$ virsh -c esx://custom6.sns.it/?no_verify=1 nodeinfo
Enter username for custom6.sns.it [root]:
Enter root's password for custom6.sns.it:
CPU model: AMD Opteron(tm) Processor 246
CPU(s): 2
CPU frequency: 1992 MHz
CPU socket(s): 2
Core(s) per socket: 1
Thread(s) per core: 1
NUMA cell(s): 2
Memory size: 2096460 kB
I always get the same output, no matter how many VMs are running on the
cluster node.
That is why OpenNebula returns output like this:
oneadmin at custom2:~/var/96$ onehost show 1
HOST 1 INFORMATION
ID : 1
NAME : custom6.sns.it
CLUSTER : default
STATE : MONITORING
IM_MAD : im_vmware
VM_MAD : vmm_vmware
TM_MAD : tm_vmware
HOST SHARES
MAX MEM : 2096460
USED MEM (REAL) : 0
USED MEM (ALLOCATED) : 0
MAX CPU : 200
USED CPU (REAL) : 0
USED CPU (ALLOCATED) : 0
RUNNING VMS : 1
MONITORING INFORMATION
CPUSPEED=1992
HYPERVISOR=vmware
TOTALCPU=200
TOTALMEMORY=2096460
OpenNebula polls the cluster nodes periodically, but it only obtains the
hypervisor type, CPU frequency, total CPU, and total memory size.
The limitation here is caused by libvirt (virsh), which is unable to return
any information about the actual usage of resources.
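To make the gap concrete, here is a minimal Ruby sketch of what an IM driver can extract from that nodeinfo output (illustrative only, not the actual im_vmware code; the attribute names mirror the MONITORING INFORMATION shown above):

```ruby
# Illustrative sketch: turning `virsh nodeinfo` output into the static
# monitoring attributes OpenNebula shows. Not the actual im_vmware driver.
nodeinfo = <<~OUT
  CPU model:           AMD Opteron(tm) Processor 246
  CPU(s):              2
  CPU frequency:       1992 MHz
  Memory size:         2096460 kB
OUT

info = {}
nodeinfo.each_line do |line|
  key, value = line.split(':', 2)
  info[key.strip] = value.strip if value
end

puts "HYPERVISOR=vmware"
puts "CPUSPEED=#{info['CPU frequency'].to_i}"    # 1992
puts "TOTALCPU=#{info['CPU(s)'].to_i * 100}"     # 200 (100 per CPU)
puts "TOTALMEMORY=#{info['Memory size'].to_i}"   # 2096460
# There is simply no per-VM usage field in nodeinfo, so USEDMEMORY and
# USEDCPU cannot be computed from this output alone.
```

Everything nodeinfo provides is static capacity, which matches the identical output I get regardless of how many VMs are running.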
The integration of OpenNebula with Xen can rely on ssh access to the cluster
nodes.
The IM driver for Xen hypervisors launches xentop on every cluster node to
get information about the VMs and then parses the output.
As an example, here is the output of the xentop and xm commands (some info is
omitted):
custom9:/ # xentop -bi2
NAME      STATE  CPU(sec)  CPU(%)  MEM(k)   MEM(%)  MAXMEM(k)  MAXMEM(%)  VCPUS  NETS  NETTX(k)  NETRX(k)
Domain-0  -----r      102     0.0  1930260    93.7   no limit        n/a      2     0         0         0
NAME      STATE  CPU(sec)  CPU(%)  MEM(k)   MEM(%)  MAXMEM(k)  MAXMEM(%)  VCPUS  NETS  NETTX(k)  NETRX(k)
Domain-0  -----r      102     0.3  1930260    93.7   no limit        n/a      2     0         0         0
custom9:/ # xm info
host : custom9
release : 2.6.34.7-0.5-xen
version : #1 SMP 2010-10-25 08:40:12 +0200
machine : x86_64
nr_cpus : 2
nr_nodes : 2
cores_per_socket : 1
threads_per_core : 1
cpu_mhz : 1991
[...]
total_memory : 2011
free_memory : 135
free_cpus : 0
max_free_memory : 1508
max_para_memory : 1504
max_hvm_memory : 1492
[...]
The script $ONE_LOCATION/lib/remotes/im/xen.d/xen.rb parses those two
outputs and retrieves data about memory, CPU, and network usage.
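As a rough illustration of what that parsing yields, here is a simplified Ruby sketch in the spirit of xen.rb (not the actual driver code; the real column handling is more involved):

```ruby
# Simplified sketch of parsing `xentop -bi2` batch output for per-domain
# usage, similar in spirit to xen.rb. Not the actual driver code.
xentop_output = <<~OUT
  NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k)
  Domain-0 -----r 102 0.3 1930260 93.7 no limit n/a 2 0 0 0
OUT

used_cpu = 0.0   # sum of CPU(%) over domains
used_mem = 0     # sum of MEM(k) over domains

xentop_output.each_line do |line|
  fields = line.split
  next if fields.empty? || fields[0] == 'NAME'  # skip header rows
  used_cpu += fields[3].to_f  # CPU(%) column
  used_mem += fields[4].to_i  # MEM(k) column
end

puts "USEDCPU=#{used_cpu}"
puts "USEDMEMORY=#{used_mem}"
```

This per-domain usage data is exactly what the virsh nodeinfo path above cannot provide.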
I think the VMware drivers are of limited use if they can't provide the
degree of information that can be achieved with Xen hypervisors and
OpenNebula; I've seen the effects of this issue in my tests.
On Tue, Feb 8, 2011 at 6:34 PM, Tino Vazquez <tinova at opennebula.org> wrote:
> Hi Luigi,
>
> There is a bug in the IM driver for VMware: it is not reporting the free
> memory at all. I've opened a ticket to keep track of the issue [1]; it
> will be solved in the next release.
>
> Regards,
>
> -Tino
>
> [1] http://dev.opennebula.org/issues/481
>
> --
> Constantino Vázquez Blanco, MSc
> OpenNebula Major Contributor / Cloud Researcher
> www.OpenNebula.org | @tinova79
>
>
>
> On Tue, Feb 8, 2011 at 12:56 PM, Luigi Fortunati
> <luigi.fortunati at gmail.com> wrote:
> > Ok, I tried some tests today.
> > The hardware/software environment consists of 2 cluster nodes (ESXi 4.1),
> > each with 2 GB of RAM and 2 AMD Opteron 246 processors (2 GHz), running
> > trial licenses. The OpenNebula installation is self-contained.
> > 800 MB of memory is used by the hypervisor itself (that info comes from
> > vSphere Client), so only 1.2 GB are free, but OpenNebula seems unaware of
> > that :-(
> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
> >   ID NAME            CLUSTER  RVM  TCPU  FCPU  ACPU  TMEM  FMEM STAT
> >    2 custom7.sns.it  default    0   200   200   200    2G    0K   on
> >    1 custom6.sns.it  default    0   200   200   200    2G    0K   on
> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost show 1
> > HOST 1 INFORMATION
> >
> > ID : 1
> > NAME : custom6.sns.it
> > CLUSTER : default
> > STATE : MONITORED
> > IM_MAD : im_vmware
> > VM_MAD : vmm_vmware
> > TM_MAD : tm_vmware
> > HOST SHARES
> >
> > MAX MEM : 2096460
> > USED MEM (REAL) : 0
> > USED MEM (ALLOCATED) : 0
> > MAX CPU : 200
> > USED CPU (REAL) : 0
> > USED CPU (ALLOCATED) : 0
> >
> > In each test I tried to start 3 VMs using a non-persistent image. The
> > requirements of all three VMs cannot be satisfied by a single cluster
> > node.
> > FIRST TEST:
> > The VM template for the first test is:
> > NAME = "Debian Server"
> > CPU = 1
> > MEMORY = 1024
> > OS = [ ARCH = "i686" ]
> > DISK = [IMAGE="Debian Server"]
> > Only CPU and Memory info.
> > Here is the result:
> > oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
> > ID USER NAME STAT CPU MEM HOSTNAME TIME
> > 66 oneadmin Debian S pend 0 0K 00 00:07:47
> > 67 oneadmin Debian S pend 0 0K 00 00:07:45
> > 68 oneadmin Debian S pend 0 0K 00 00:07:18
> > Forever in "pending" state... the VMs don't get scheduled.
> > oned.log doesn't report anything but informational resource-polling
> > messages.
> > sched.log repeats this sequence:
> > Tue Feb 8 10:02:06 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
> > Tue Feb 8 10:02:06 2011 [VM][D]: Pending virtual machines : 66 67 68
> > Tue Feb 8 10:02:06 2011 [RANK][W]: No rank defined for VM
> > Tue Feb 8 10:02:06 2011 [RANK][W]: No rank defined for VM
> > Tue Feb 8 10:02:06 2011 [RANK][W]: No rank defined for VM
> > Tue Feb 8 10:02:06 2011 [SCHED][I]: Select hosts
> > PRI HID
> > -------------------
> > Virtual Machine: 66
> > Virtual Machine: 67
> > Virtual Machine: 68
> > SECOND TEST:
> > VM template:
> > NAME = "Debian Server"
> > VCPU = 1
> > MEMORY = 1024
> > OS = [ ARCH = "i686" ]
> > DISK = [IMAGE="Debian Server"]
> > Only VCPU and MEMORY info.
> > Results:
> > oneadmin at custom2:/srv/cloud/templates/vm$ onevm list
> > ID USER NAME STAT CPU MEM HOSTNAME TIME
> > 76 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:07:40
> > 77 oneadmin Debian S runn 0 0K custom6.sns.it 00 00:07:38
> > 78 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:05:58
> > Everything seems fine, but it's not, since, as I said previously, each
> > host has only 1.2 GB of memory free, so there should be no space for two
> > VMs on the same host.
> > oneadmin at custom2:/srv/cloud/templates/vm$ onehost list
> >   ID NAME            CLUSTER  RVM  TCPU  FCPU  ACPU  TMEM  FMEM STAT
> >    2 custom7.sns.it  default    2   200   200   200    2G    0K   on
> >    1 custom6.sns.it  default    1   200   200   200    2G    0K   on
> > Both the hosts and the VMs report no useful info on resource usage.
> > Logging in to the console of each VM and executing the "free -m" command,
> > I checked that every VM has 1 GB of total memory allocated. So I decided
> > to test that memory on both VMs at the same time using the "memtester"
> > utility, which allocates a given amount of free memory with malloc and
> > tests it. The results reported memory access errors.
> > I then decided to go on and check whether OpenNebula and VMware ESXi fail
> > to allocate VMs exceeding the resource capacity of the hosts, by starting
> > two more VMs (requiring 1 VCPU and 1 GB of memory each).
> > Results:
> > oneadmin at custom2:~/var/79$ onevm list
> > ID USER NAME STAT CPU MEM HOSTNAME TIME
> > 76 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:54:47
> > 77 oneadmin Debian S runn 0 0K custom6.sns.it 00 00:54:45
> > 78 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:53:05
> > 79 oneadmin Debian S boot 0 0K custom7.sns.it 00 00:10:22
> > 80 oneadmin Debian S boot 0 0K custom7.sns.it 00 00:09:47
> > The new VMs are allocated on the custom7 machine (why???) but remain
> > frozen in the "boot" state.
> > That is a problem because those two new VMs should not be allocated to
> > any cluster node.
> > THIRD TEST:
> > Here I followed Ruben's suggestion...
> > The VM template:
> > oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
> > NAME = "Debian Server"
> > CPU = 1
> > VCPU = 1
> > MEMORY = 1024
> > OS = [ ARCH = "i686" ]
> > DISK = [IMAGE="Debian Server"]
> > Both CPU/VCPU and MEMORY info.
> > Output with 3 VM:
> > oneadmin at custom2:~/var$ onevm list
> > ID USER NAME STAT CPU MEM HOSTNAME TIME
> > 81 oneadmin Debian S pend 0 0K 00 00:02:32
> > 82 oneadmin Debian S pend 0 0K 00 00:02:30
> > 83 oneadmin Debian S pend 0 0K 00 00:02:29
> > As in the FIRST TEST, the VMs don't get scheduled and remain in the
> > "pending" state.
> > sched.log repeats this message:
> > Tue Feb 8 12:00:05 2011 [HOST][D]: Discovered Hosts (enabled): 1 2
> > Tue Feb 8 12:00:05 2011 [VM][D]: Pending virtual machines : 81 82 83
> > Tue Feb 8 12:00:05 2011 [RANK][W]: No rank defined for VM
> > Tue Feb 8 12:00:05 2011 [RANK][W]: No rank defined for VM
> > Tue Feb 8 12:00:05 2011 [RANK][W]: No rank defined for VM
> > Tue Feb 8 12:00:05 2011 [SCHED][I]: Select hosts
> > PRI HID
> > -------------------
> > Virtual Machine: 81
> > Virtual Machine: 82
> > Virtual Machine: 83
> > Here I assumed that I should probably not declare the number of physical
> > CPUs in the VM template.
> > One last test...
> > FOURTH TEST:
> > Here I disabled a host, custom6, and started 3 VMs.
> > The VM template is the one that worked before:
> > oneadmin at custom2:/srv/cloud/templates/vm$ cat debian.vm
> > NAME = "Debian Server"
> > VCPU = 1
> > MEMORY = 1024
> > OS = [ ARCH = "i686" ]
> > DISK = [IMAGE="Debian Server"]
> > Output:
> > oneadmin at custom2:~$ onehost list
> >   ID NAME            CLUSTER  RVM  TCPU  FCPU  ACPU  TMEM  FMEM STAT
> >    2 custom7.sns.it  default    3   200   200   200    2G    0K   on
> >    1 custom6.sns.it  default    0   200   200   200    2G    0K  off
> > oneadmin at custom2:~$ onevm list
> > ID USER NAME STAT CPU MEM HOSTNAME TIME
> > 92 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:12:53
> > 93 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:12:46
> > 94 oneadmin Debian S runn 0 0K custom7.sns.it 00 00:12:46
> > I verified that the VMs were up and running by logging in to the console
> > of each of them through vSphere Client; they were all running, each
> > reporting 1 GB of total memory. Since less than 1.2 GB of memory is
> > effectively free on a cluster node before the VMs' instantiation, how can
> > those VMs run consistently? And why does OpenNebula schedule those VMs on
> > the same machine, exceeding even the host's resource capacity?
> > On Fri, Feb 4, 2011 at 11:04 PM, Ruben S. Montero <rubensm at dacya.ucm.es>
> > wrote:
> >>
> >> Hi,
> >> You also have to add the CPU capacity for the VM (apart from the number
> >> of virtual CPUs, VCPU). The CPU value is used at the allocation phase.
> >> However, since you are specifying MEMORY, it should be included in the
> >> allocated memory (USED MEMORY in onehost show), so I guess there is some
> >> other problem with your template.
> >> Cheers
> >> Ruben
> >>
> >> On Fri, Feb 4, 2011 at 10:50 AM, Luigi Fortunati
> >> <luigi.fortunati at gmail.com> wrote:
> >>>
> >>> I can post the VM template content on Monday. However, as far as I
> >>> remember, the VM template was really simple:
> >>> NAME="Debian"
> >>> VCPU= 2
> >>> MEMORY=1024
> >>> DISK=[IMAGE="Debian5-i386"]
> >>> OS=[ARCH=i686]
> >>> The VMs can boot and run; I can log in to the console of the newly
> >>> created VMs through vSphere Client.
> >>> I noticed that if you don't declare the number of VCPUs, the VM doesn't
> >>> get scheduled on a cluster node. This option seems mandatory, but I
> >>> didn't find any mention of it in the documentation.
> >>> Another thing that seems mandatory is declaring the CPU architecture as
> >>> i686, otherwise OpenNebula will return an error when writing the
> >>> deployment.0 file.
> >>>
> >>> On Thu, Feb 3, 2011 at 5:42 PM, Ruben S. Montero <rubensm at dacya.ucm.es>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>> I am not sure this is related to the VMware monitoring... Can you send
> >>>> the VM Templates?
> >>>> Thanks
> >>>> Ruben
> >>>>
> >>>> On Thu, Feb 3, 2011 at 5:10 PM, Luigi Fortunati
> >>>> <luigi.fortunati at gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> I noticed a serious problem with the use of VMware ESXi 4.1 and
> >>>>> OpenNebula 2.0.1.
> >>>>> I'm using the VMware driver addon that can be found on the
> >>>>> OpenNebula website (ver. 1.0) and libvirt (ver. 0.8.7).
> >>>>> It happens that OpenNebula can't get information about the usage of
> >>>>> resources on the cluster nodes.
> >>>>> By running 2 VMs (each requiring 2 VCPUs and 1 GB of memory) and
> >>>>> executing some commands, I get this output.
> >>>>> oneadmin at custom2:~/src$ onehost list
> >>>>>   ID NAME            CLUSTER  RVM  TCPU  FCPU  ACPU  TMEM  FMEM STAT
> >>>>>    2 custom7.sns.it  default    0   200   200   200    2G    0K  off
> >>>>>    1 custom6.sns.it  default    2   200   200   200    2G    0K   on
> >>>>> oneadmin at custom2:~/src$ onehost show 1
> >>>>> HOST 1 INFORMATION
> >>>>>
> >>>>> ID : 1
> >>>>> NAME : custom6.sns.it
> >>>>> CLUSTER : default
> >>>>> STATE : MONITORED
> >>>>> IM_MAD : im_vmware
> >>>>> VM_MAD : vmm_vmware
> >>>>> TM_MAD : tm_vmware
> >>>>> HOST SHARES
> >>>>>
> >>>>> MAX MEM : 2096460
> >>>>> USED MEM (REAL) : 0
> >>>>> USED MEM (ALLOCATED) : 0
> >>>>> MAX CPU : 200
> >>>>> USED CPU (REAL) : 0
> >>>>> USED CPU (ALLOCATED) : 0
> >>>>> RUNNING VMS : 2
> >>>>> MONITORING INFORMATION
> >>>>>
> >>>>> CPUSPEED=1992
> >>>>> HYPERVISOR=vmware
> >>>>> TOTALCPU=200
> >>>>> TOTALMEMORY=2096460
> >>>>> As you can see, OpenNebula is unable to get correct information about
> >>>>> the usage of resources on the cluster nodes.
> >>>>> As this information is used by the VM scheduler, OpenNebula is unable
> >>>>> to schedule the VMs correctly.
> >>>>> I tried to create several VMs, and all of them were placed on the same
> >>>>> host, even though it was unable to satisfy the resource requirements
> >>>>> of all the VMs.
> >>>>> I think this problem is strongly related to libvirt, as OpenNebula
> >>>>> uses it to retrieve information about hosts and VMs.
> >>>>> Do you get the same behavior? Do you know if there is a way to solve
> >>>>> this big issue?
> >>>>> --
> >>>>> Luigi Fortunati
> >>>>>
> >>>>> _______________________________________________
> >>>>> Users mailing list
> >>>>> Users at lists.opennebula.org
> >>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Dr. Ruben Santiago Montero
> >>>> Associate Professor (Profesor Titular), Complutense University of
> Madrid
> >>>>
> >>>> URL: http://dsa-research.org/doku.php?id=people:ruben
> >>>> Weblog: http://blog.dsa-research.org/?author=7
> >>>
> >>>
> >>>
> >>> --
> >>> Luigi Fortunati
> >>
> >>
> >>
> >> --
> >> Dr. Ruben Santiago Montero
> >> Associate Professor (Profesor Titular), Complutense University of Madrid
> >>
> >> URL: http://dsa-research.org/doku.php?id=people:ruben
> >> Weblog: http://blog.dsa-research.org/?author=7
> >
> >
> >
> > --
> > Luigi Fortunati
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opennebula.org
> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >
> >
>
--
Luigi Fortunati