[one-users] Making scheduler allocation aware
Shashank Rachamalla
shashank.rachamalla at hexagrid.com
Thu Nov 11 03:58:46 PST 2010
Hi
What Marlon said regarding ranking on RUNNING_VMS is exactly my concern.
However, I will check the source code. We are using OpenNebula 1.4 with the
following template. (Note that the value of PRIORITY is calculated from
FREEMEMORY, FREECPU and RUNNING_VMS; an illustrative RANK expression is
shown right after the template.)
DISK=[
clone=no,
source=/mnt/dev_store_100000/images/glpnnu0,
target=hda,
type=disk
]
DISK=[
clone=no,
source=/mnt/dev_store_100000/iso/FABCAC11-768B-4683-EB99-085ECB800000,
target=hdb,
type=cdrom
]
MEMORY=2000
REQUIREMENTS="FREEMEMORY>2048000"
RANK=PRIORITY
NAME=glpnnu
OS=[
boot=cdrom
]
NIC=[
model=e1000,
bridge=br101,
MAC=00:89:10:36:10:26
]
INPUT=[
type=tablet
]
VCPU=1
GRAPHICS=[
port=5971,
type=vnc,
listen=0.0.0.0
]
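For reference, an equivalent RANK expression built directly from those host
attributes could look like the line below; the weights are purely
illustrative assumptions, not the formula we actually use:

RANK = "FREECPU + FREEMEMORY / 1024 - RUNNING_VMS * 100"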
On 11 November 2010 16:08, <opennebula at nerling.ch> wrote:
> Hi Ruben.
> I'm sure RANK=-RUNNING_VMS will not help in such a scenario, because the
> scheduler does not update RUNNING_VMS after a VM is created, only after
> the host is monitored again!
> So between these events the RUNNING_VMS value stays unchanged and, in my
> experience, that host will remain the 'chosen one' for newly deployed VMs
> up to the next host monitoring.
> And I'm not really sure whether the scheduler sums USED MEMORY (ALLOCATED)
> and the memory used by the VM to prevent overcommitting; we could look in
> the source code for this.
> I must say I have never seen a host receive more VMs than possible.
>
> Best regards
>
> Marlon
>
> Quoting "Ruben S. Montero" <rubensm at dacya.ucm.es>:
>
>
>> Hi,
>>
>> We use the total memory to allocate VMs; this is not going to change
>> between monitoring events. Right now we are able to schedule multiple
>> VMs in the same scheduling step to the same host without
>> overcommitting memory.
>>
>> Cheers
>>
>> On Thu, Nov 11, 2010 at 10:42 AM, Shashank Rachamalla
>> <shashank.rachamalla at hexagrid.com> wrote:
>>
>>> Hi
>>>
>>> Thanks for the reply. I agree that a VM will not be allowed to use more
>>> than what has been specified in its template during creation, but I was
>>> referring to a scenario where the memory available on the host is
>>> overcommitted. I guess the problem here is with allowing multiple VMs to
>>> be dispatched onto a single host in one scheduling cycle.
>>>
>>>
>>> On 11 November 2010 14:47, Ruben S. Montero <rubensm at dacya.ucm.es>
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Regarding capacity (CPU, Memory), it is updated every time a VM is
>>>> submitted, so no overcommitment is possible (beyond that specified by
>>>> the CPU attribute in the VM template). This also works in 1.4.
>>>>
>>>> Cheers
>>>>
>>>> Ruben
>>>>
>>>> On Thu, Nov 11, 2010 at 10:08 AM, <opennebula at nerling.ch> wrote:
>>>> > Hello Shashank.
>>>> > I'm having the same problem in 1.4.
>>>> > You must work around it yourself: instead of calling onevm directly,
>>>> > use a wrapper script that checks whether any VM is still pending and
>>>> > waits for it to be deployed (a rough sketch follows below).
>>>> > I hope this behaviour is fixed in 2.0 (hello, developers??)
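A rough, untested sketch of such a wrapper, in the spirit of Marlon's
suggestion. It assumes that "onevm create <template>" submits the VM and
that "onevm list" prints a state column showing "pend" while a VM is still
pending; the exact command names and state strings may differ between
versions:

--8<------
#!/usr/bin/env ruby
# submit_vm.rb <vm_template_file> -- hypothetical wrapper around onevm.
# It waits until no VM is pending, so every request is scheduled against
# fresh host monitoring information.

template = ARGV[0] or abort("Usage: submit_vm.rb <vm_template_file>")

def pending_vms?
  # assumes `onevm list` shows "pend" in its state column for pending VMs
  `onevm list`.lines.any? { |line| line =~ /\bpend\b/ }
end

sleep 10 while pending_vms?   # poll until the previous VM has been placed

system("onevm create #{template}")
------>8--

Serialising submissions this way trades deployment speed for placement
accuracy, since each VM is only scheduled against up-to-date host data.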
>>>> >
>>>> > Best regards
>>>> >
>>>> > Marlon
>>>> > Quoting Shashank Rachamalla <shashank.rachamalla at hexagrid.com>:
>>>> >
>>>> >> Hi Javier
>>>> >>
>>>> >> Thanks for the inputs, but I came across another problem while
>>>> >> testing:
>>>> >>
>>>> >> If opennebula receives multiple vm requests in a short span of time,
>>>> >> the scheduler might take decisions for all of these vms using the host
>>>> >> monitoring information available from the last monitoring cycle.
>>>> >> Ideally, before processing every pending request, fresh host
>>>> >> monitoring information has to be taken into account, as the previous
>>>> >> set of requests might already have changed the host's state. This can
>>>> >> result in overcommitting when a host is being used close to its full
>>>> >> capacity.
>>>> >>
>>>> >> *Is there any workaround that helps the scheduler overcome the above
>>>> >> problem?*
>>>> >>
>>>> >> steps to reproduce the problem scenario:
>>>> >>
>>>> >> Host 1 : Total memory = 3GB
>>>> >> Host 2 : Total memory = 2GB
>>>> >> Assume Host1 and Host2 have the same number of CPU cores. (Host1 will
>>>> >> have a higher RANK value.)
>>>> >>
>>>> >> VM1: memory = 2GB
>>>> >> VM2: memory = 2GB
>>>> >>
>>>> >> Start VM1 and VM2 immediately one after the other. Both VM1 and VM2
>>>> >> will come up on Host1. (Thus overcommitting.)
>>>> >>
>>>> >> Start VM1 and VM2 with an intermediate delay of 60 sec. VM1 will come
>>>> >> up on Host1 and VM2 will come up on Host2. This is true because
>>>> >> opennebula would have fetched a fresh set of host monitoring
>>>> >> information in that time.
>>>> >>
>>>> >>
>>>> >> On 4 November 2010 02:04, Javier Fontan <jfontan at gmail.com> wrote:
>>>> >>
>>>> >>> Hello,
>>>> >>>
>>>> >>> It looks fine to me. I think that taking out the memory the
>>>> >>> hypervisor may be consuming is key to making it work.
>>>> >>>
>>>> >>> Bye
>>>> >>>
>>>> >>> On Wed, Nov 3, 2010 at 8:32 PM, Rangababu Chakravarthula
>>>> >>> <rbabu at hexagrid.com> wrote:
>>>> >>> > Javier
>>>> >>> >
>>>> >>> > Yes we are using KVM and OpenNebula 1.4.
>>>> >>> >
>>>> >>> > We have been having this problem for a long time, and we were doing
>>>> >>> > all kinds of validations ourselves before submitting the request to
>>>> >>> > OpenNebula (there should be enough memory in the cloud to match the
>>>> >>> > requested memory, and there should be at least one host with memory >
>>>> >>> > the requested memory). We had to do that because OpenNebula would
>>>> >>> > schedule to an arbitrary host based on the existing logic it had.
>>>> >>> > So at last we decided we needed to make OpenNebula aware of the
>>>> >>> > memory allocated to running VMs on each host, and started this
>>>> >>> > discussion.
>>>> >>> >
>>>> >>> > Thanks for taking up this issue as a priority. We appreciate it.
>>>> >>> >
>>>> >>> > Shashank came up with the patch to kvm.rb below. Please take a look
>>>> >>> > and let us know if it will work until we get a permanent solution.
>>>> >>> >
>>>> >>> >
>>>> >>> > =====================================================================
>>>> >>> >
>>>> >>> > # Sum the maximum memory (in KB) allocated to every running domain.
>>>> >>> > $mem_allocated_for_running_vms = 0
>>>> >>> > `virsh list | grep running | tr -s ' ' | cut -f2 -d' '`.split("\n").each do |dom|
>>>> >>> >   dominfo = `virsh dominfo #{dom}`
>>>> >>> >   dominfo.split(/\n/).each do |line|
>>>> >>> >     if line.match(/^Max memory/)
>>>> >>> >       $mem_allocated_for_running_vms += line.split(" ")[2].to_i
>>>> >>> >     end
>>>> >>> >   end
>>>> >>> > end
>>>> >>> >
>>>> >>> > # KB set aside for the base hypervisor itself (site-specific value).
>>>> >>> > $mem_used_by_base_hypervisor = [some xyz kb that we want to set aside
>>>> >>> > for the hypervisor]
>>>> >>> >
>>>> >>> > # $total_memory is set earlier in the kvm.rb probe.
>>>> >>> > $free_memory = $total_memory.to_i -
>>>> >>> >   ($mem_allocated_for_running_vms.to_i + $mem_used_by_base_hypervisor.to_i)
>>>> >>> >
>>>> >>> > =====================================================================
>>>> >>> >
>>>> >>> > Ranga
>>>> >>> >
>>>> >>> > On Wed, Nov 3, 2010 at 2:16 PM, Javier Fontan <jfontan at gmail.com>
>>>> >>> > wrote:
>>>> >>> >>
>>>> >>> >> Hello,
>>>> >>> >>
>>>> >>> >> Sorry for the delay in the response.
>>>> >>> >>
>>>> >>> >> It looks like the problem is how OpenNebula calculates available
>>>> >>> >> memory. For xen >= 3.2 there is a reliable way to get available
>>>> >>> >> memory, which is calling "xm info" and reading the
>>>> >>> >> "max_free_memory" attribute. Unfortunately, for kvm or xen < 3.2
>>>> >>> >> there is no such attribute. I suppose you are using kvm, as you
>>>> >>> >> mention the "free" command.
>>>> >>> >>
>>>> >>> >> I began analyzing the kvm IM probe that gets memory information,
>>>> >>> >> and there is a problem with the way it gets total memory. Here is
>>>> >>> >> how it currently gets memory information:
>>>> >>> >>
>>>> >>> >> TOTALMEMORY: runs virsh nodeinfo, which reports the real physical
>>>> >>> >> memory installed in the machine
>>>> >>> >> FREEMEMORY: runs the free command and takes the free column,
>>>> >>> >> without buffers and cache
>>>> >>> >> USEDMEMORY: runs the top command and takes the used memory from it
>>>> >>> >> (this counts buffers and cache)
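For illustration only, a minimal sketch of a probe that derives all three
values from a single source (/proc/meminfo, values in KB) so that they are
at least self-consistent; this is an assumption of mine, not the actual
kvm.rb code:

--8<------
#!/usr/bin/env ruby
# Read every figure from /proc/meminfo so the numbers agree with each other.
meminfo = {}
File.readlines("/proc/meminfo").each do |line|
  key, value = line.split(":")
  meminfo[key] = value.to_i            # /proc/meminfo reports values in KB
end

total = meminfo["MemTotal"]
free  = meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]
used  = total - free

puts "TOTALMEMORY=#{total}"
puts "FREEMEMORY=#{free}"
puts "USEDMEMORY=#{used}"
------>8--

Even with consistent numbers, FREEMEMORY here still reflects actual usage
rather than allocation, which is what the kvm.rb patch earlier in this
thread tries to address.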
>>>> >>> >>
>>>> >>> >> This is a big problem, as those values do not match one another (I
>>>> >>> >> don't really know how I failed to see this before). Here is the
>>>> >>> >> monitoring data from a host without VMs:
>>>> >>> >>
>>>> >>> >> --8<------
>>>> >>> >> TOTALMEMORY=8193988
>>>> >>> >> USEDMEMORY=7819952
>>>> >>> >> FREEMEMORY=7911924
>>>> >>> >> ------>8--
>>>> >>> >>
>>>> >>> >> As you can see, it makes no sense at all. Even the TOTALMEMORY
>>>> >>> >> obtained from virsh nodeinfo is very misleading for oned, as the
>>>> >>> >> host linux instance does not have access to all that memory (some
>>>> >>> >> is consumed by the hypervisor itself), as a free command shows:
>>>> >>> >>
>>>> >>> >> --8<------
>>>> >>> >>              total       used       free     shared    buffers     cached
>>>> >>> >> Mem:       8193988    7819192     374796          0      64176    7473992
>>>> >>> >> ------>8--
>>>> >>> >>
>>>> >>> >> I am also copying this text into an issue to track this problem:
>>>> >>> >> http://dev.opennebula.org/issues/388. It is marked to be solved in
>>>> >>> >> 2.0.1, but the change will be compatible with 1.4, as it seems the
>>>> >>> >> only change needed is in the IM probe.
>>>> >>> >>
>>>> >>> >> I cannot offer you an immediate solution, but we'll try to come up
>>>> >>> >> with one as soon as possible.
>>>> >>> >>
>>>> >>> >> Bye
>>>> >>> >>
>>>> >>> >> On Wed, Nov 3, 2010 at 7:08 PM, Rangababu Chakravarthula
>>>> >>> >> <rbabu at hexagrid.com> wrote:
>>>> >>> >> > Hello Javier
>>>> >>> >> > Please let us know if you want us to provide more detailed
>>>> >>> >> > information
>>>> >>> >> > with
>>>> >>> >> > examples?
>>>> >>> >> >
>>>> >>> >> > Ranga
>>>> >>> >> >
>>>> >>> >> > On Fri, Oct 29, 2010 at 9:46 AM, Rangababu Chakravarthula
>>>> >>> >> > <rbabu at hexagrid.com> wrote:
>>>> >>> >> >>
>>>> >>> >> >> Javier
>>>> >>> >> >>
>>>> >>> >> >> We saw that VMs were being deployed to a host where the memory
>>>> >>> >> >> allocated to all the VMs was higher than the memory available
>>>> >>> >> >> on the host.
>>>> >>> >> >>
>>>> >>> >> >> We think OpenNebula is executing the free command on the host to
>>>> >>> >> >> determine if there is any room, and since free always returns
>>>> >>> >> >> the memory actually being consumed rather than the memory
>>>> >>> >> >> allocated, opennebula would push the new jobs to the host.
>>>> >>> >> >>
>>>> >>> >> >> That's the reason we want OpenNebula to be aware of the memory
>>>> >>> >> >> allocated to the VMs on the host.
>>>> >>> >> >>
>>>> >>> >> >> Ranga
>>>> >>> >> >>
>>>> >>> >> >> On Thu, Oct 28, 2010 at 2:02 PM, Javier Fontan
>>>> >>> >> >> <jfontan at gmail.com>
>>>> >>> >> >> wrote:
>>>> >>> >> >>>
>>>> >>> >> >>> Hello,
>>>> >>> >> >>>
>>>> >>> >> >>> Could you describe the problem you had? By default the
>>>> >>> >> >>> scheduler will not overcommit cpu or memory.
>>>> >>> >> >>>
>>>> >>> >> >>> Bye
>>>> >>> >> >>>
>>>> >>> >> >>> On Thu, Oct 28, 2010 at 4:50 AM, Shashank Rachamalla
>>>> >>> >> >>> <shashank.rachamalla at hexagrid.com> wrote:
>>>> >>> >> >>> > Hi
>>>> >>> >> >>> >
>>>> >>> >> >>> > We have a requirement wherein the scheduler should not allow
>>>> >>> >> >>> > memory overcommitting while choosing a host for a new vm. In
>>>> >>> >> >>> > order to achieve this, we have changed the way FREEMEMORY is
>>>> >>> >> >>> > calculated for each host:
>>>> >>> >> >>> >
>>>> >>> >> >>> > FREE MEMORY = TOTAL MEMORY - [ sum of memory values allocated
>>>> >>> >> >>> > to VMs which are currently running on the host ]
>>>> >>> >> >>> >
>>>> >>> >> >>> > Please let us know if the above approach is fine or if there
>>>> >>> >> >>> > is a better way to accomplish the task. We are using
>>>> >>> >> >>> > opennebula 1.4.
>>>> >>> >> >>> >
>>>> >>> >> >>> > --
>>>> >>> >> >>> > Regards,
>>>> >>> >> >>> > Shashank Rachamalla
>>>> >>> >> >>> >
>>>> >>> >> >>> > _______________________________________________
>>>> >>> >> >>> > Users mailing list
>>>> >>> >> >>> > Users at lists.opennebula.org
>>>> >>> >> >>> >
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>> >>> >> >>> >
>>>> >>> >> >>> >
>>>> >>> >> >>>
>>>> >>> >> >>>
>>>> >>> >> >>>
>>>> >>> >> >>> --
>>>> >>> >> >>> Javier Fontan, Grid & Virtualization Technology
>>>> >>> >> >>> Engineer/Researcher
>>>> >>> >> >>> DSA Research Group: http://dsa-research.org
>>>> >>> >> >>> Globus GridWay Metascheduler: http://www.GridWay.org
>>>> >>> >> >>> OpenNebula Virtual Infrastructure Engine:
>>>> >>> >> >>> http://www.OpenNebula.org
>>>> >>> >> >>> _______________________________________________
>>>> >>> >> >>> Users mailing list
>>>> >>> >> >>> Users at lists.opennebula.org
>>>> >>> >> >>>
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>> >>> >> >>
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> --
>>>> >>> >> Javier Fontan, Grid & Virtualization Technology
>>>> Engineer/Researcher
>>>> >>> >> DSA Research Group: http://dsa-research.org
>>>> >>> >> Globus GridWay Metascheduler: http://www.GridWay.org
>>>> >>> >> OpenNebula Virtual Infrastructure Engine:
>>>> http://www.OpenNebula.org
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
>>>> >>> DSA Research Group: http://dsa-research.org
>>>> >>> Globus GridWay Metascheduler: http://www.GridWay.org
>>>> >>> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Regards,
>>>> >> Shashank Rachamalla
>>>> >>
>>>> >
>>>> > _______________________________________________
>>>> > Users mailing list
>>>> > Users at lists.opennebula.org
>>>> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Ruben Santiago Montero
>>>> Associate Professor (Profesor Titular), Complutense University of Madrid
>>>>
>>>> URL: http://dsa-research.org/doku.php?id=people:ruben
>>>> Weblog: http://blog.dsa-research.org/?author=7
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opennebula.org
>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shashank Rachamalla
>>>
>>>
>>
>>
>> --
>> Dr. Ruben Santiago Montero
>> Associate Professor (Profesor Titular), Complutense University of Madrid
>>
>> URL: http://dsa-research.org/doku.php?id=people:ruben
>> Weblog: http://blog.dsa-research.org/?author=7
>>
>>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
--
Regards,
Shashank Rachamalla