[one-users] Making scheduler allocation aware

opennebula at nerling.ch opennebula at nerling.ch
Thu Nov 11 02:38:46 PST 2010


Hi Ruben.
I'm sure RANK=-RUNNING_VMS will not help in such a scenario, because
the scheduler does not update RUNNING_VMS after the creation of a VM,
but only after monitoring the host!
So between these events the RUNNING_VMS value stays unchanged, and in
my experience that host becomes the 'chosen one' for newly deployed
VMs until the next host monitoring.
And I'm not really sure whether the scheduler sums USED MEMORY
(ALLOCATED) and the memory requested by the new VM to prevent
overcommitting; we could look into the source code for this.
I must say I have never experienced a host receiving more VMs than it
could hold.
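A toy illustration (plain Ruby, not OpenNebula code) of the stale-counter effect described above: with RANK=-RUNNING_VMS, placements only spread across hosts if the counter is refreshed between dispatches. Host names and VM counts here are made up.

```ruby
# Two hosts, tracked as name => RUNNING_VMS.
hosts = { 'host1' => 0, 'host2' => 0 }

# Pick the host with the highest rank, i.e. the fewest running VMs.
def pick(hosts)
  hosts.max_by { |_name, running| -running }.first
end

# Stale counters: the same host wins every dispatch until the next
# monitoring cycle updates RUNNING_VMS.
stale = hosts.dup
placements_stale = Array.new(4) { pick(stale) }

# Counters updated after each dispatch: VMs alternate between hosts.
fresh = hosts.dup
placements_fresh = Array.new(4) { pick(fresh).tap { |h| fresh[h] += 1 } }
```

With stale counters all four VMs land on the same host; with per-dispatch updates they are spread evenly.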

Best regards

Marlon

Quoting "Ruben S. Montero" <rubensm at dacya.ucm.es>:

> Hi,
>
> We use the total memory to allocate VMs; this is not going to change
> between monitoring events. Right now we are able to schedule multiple
> VMs to the same host in the same scheduling step without
> overcommitting memory.
>
> Cheers
>
> On Thu, Nov 11, 2010 at 10:42 AM, Shashank Rachamalla
> <shashank.rachamalla at hexagrid.com> wrote:
>> Hi
>>
>> Thanks for the reply. I agree that a VM will not be allowed to use more
>> than what has been specified in its template during creation, but I was
>> referring to a scenario where the memory available on the host is
>> overcommitted. I guess the problem here is with allowing multiple VMs
>> to be dispatched onto a single host in one scheduling cycle.
>>
>>
>> On 11 November 2010 14:47, Ruben S. Montero <rubensm at dacya.ucm.es> wrote:
>>>
>>> Hi,
>>>
>>> Regarding capacity (CPU, memory), it is updated every time a VM is
>>> submitted, so no overcommitment is possible (beyond that specified by
>>> the CPU attribute in the VM template). This also works in 1.4.
>>>
>>> Cheers
>>>
>>> Ruben
>>>
>>> On Thu, Nov 11, 2010 at 10:08 AM,  <opennebula at nerling.ch> wrote:
>>> > Hello Shashank.
>>> > I'm having the same problem in 1.4.
>>> > You must work around it yourself: instead of using onevm directly,
>>> > use it with a wrapper script that checks and waits to deploy if a
>>> > VM is pending.
>>> > I hope this behaviour is fixed in 2.0 (Hello developers??)
>>> >
>>> > Best regards
>>> >
>>> > Marlon
>>> > Quoting Shashank Rachamalla <shashank.rachamalla at hexagrid.com>:
>>> >
>>> >> Hi Javier
>>> >>
>>> >> Thanks for the inputs but I came across another problem while testing:
>>> >>
>>> >> If OpenNebula receives multiple VM requests in a short span of time,
>>> >> the scheduler might take decisions for all these VMs using the host
>>> >> monitoring information from the last monitoring cycle. Ideally,
>>> >> before processing each pending request, fresh host monitoring
>>> >> information should be taken into account, as the previous requests
>>> >> might have already changed the host's state. This can result in
>>> >> overcommitting when a host is being used close to its full capacity.
>>> >>
>>> >> *Is there any workaround which helps the scheduler to overcome the
>>> >> above problem?*
>>> >>
>>> >> steps to reproduce the problem scenario:
>>> >>
>>> >> Host 1 : Total memory = 3GB
>>> >> Host 2 : Total memory = 2GB
>>> >> Assume Host1 and Host2 have the same number of CPU cores.
>>> >> (Host1 will have a higher RANK value.)
>>> >>
>>> >> VM1: memory = 2GB
>>> >> VM2: memory = 2GB
>>> >>
>>> >> Start VM1 and VM2 immediately one after the other. Both VM1 and
>>> >> VM2 will come up on Host1 (thus overcommitting).
>>> >>
>>> >> Start VM1 and VM2 with an intermediate delay of 60sec. VM1 will come up
>>> >> on
>>> >> Host1 and VM2 will come up on Host2. This is true because opennebula
>>> >> would
>>> >> have fetched a fresh set of host monitoring information in that time.
>>> >>
>>> >>
>>> >> On 4 November 2010 02:04, Javier Fontan <jfontan at gmail.com> wrote:
>>> >>
>>> >>> Hello,
>>> >>>
>>> >>> It looks fine to me. I think that taking out the memory the
>>> >>> hypervisor may be consuming is key to making it work.
>>> >>>
>>> >>> Bye
>>> >>>
>>> >>> On Wed, Nov 3, 2010 at 8:32 PM, Rangababu Chakravarthula
>>> >>> <rbabu at hexagrid.com> wrote:
>>> >>> > Javier
>>> >>> >
>>> >>> > Yes we are using KVM and OpenNebula 1.4.
>>> >>> >
>>> >>> > We have been having this problem for a long time, and we were
>>> >>> > doing all kinds of validations ourselves before submitting the
>>> >>> > request to OpenNebula (there should be enough memory in the cloud
>>> >>> > to match the requested memory, and there should be at least one
>>> >>> > host with memory > requested memory). We had to do those because
>>> >>> > OpenNebula would schedule to an arbitrary host based on its
>>> >>> > existing logic.
>>> >>> > So at last we thought we needed to make OpenNebula aware of the
>>> >>> > memory allocated to running VMs on each host, and started this
>>> >>> > discussion.
>>> >>> >
>>> >>> > Thanks for taking up this issue as priority. Appreciate it.
>>> >>> >
>>> >>> > Shashank came up with this patch to kvm.rb. Please take a look and
>>> >>> > let
>>> >>> > us
>>> >>> > know if that will work until we get a permanent solution.
>>> >>> >
>>> >>> >
>>> >>> > ====================================================================
>>> >>> >
>>> >>> > $mem_allocated_for_running_vms = 0
>>> >>> > `virsh list`.each_line do |l|
>>> >>> >   next unless l.include?('running')
>>> >>> >   name = l.split[1]
>>> >>> >   `virsh dominfo #{name}`.each_line do |line|
>>> >>> >     if line.match(/^Max memory/)
>>> >>> >       $mem_allocated_for_running_vms += line.split[2].to_i
>>> >>> >     end
>>> >>> >   end
>>> >>> > end
>>> >>> >
>>> >>> > # kB to set aside for the hypervisor itself
>>> >>> > $mem_used_by_base_hypervisor = [some xyz kb that we want to set
>>> >>> > aside for hypervisor]
>>> >>> >
>>> >>> > $free_memory = $total_memory.to_i -
>>> >>> >   ($mem_allocated_for_running_vms + $mem_used_by_base_hypervisor.to_i)
>>> >>> >
>>> >>> >
>>> >>> > ====================================================================
>>> >>> >
>>> >>> > Ranga
>>> >>> >
>>> >>> > On Wed, Nov 3, 2010 at 2:16 PM, Javier Fontan <jfontan at gmail.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Hello,
>>> >>> >>
>>> >>> >> Sorry for the delay in the response.
>>> >>> >>
>>> >>> >> It looks like the problem is how OpenNebula calculates available
>>> >>> >> memory. For xen >= 3.2 there is a reliable way to get available
>>> >>> >> memory: calling "xm info" and reading the "max_free_memory"
>>> >>> >> attribute. Unfortunately, for kvm or xen < 3.2 there is no such
>>> >>> >> attribute. I suppose you are using kvm, as you mention the
>>> >>> >> "free" command.
>>> >>> >>
>>> >>> >> I began analyzing the kvm IM probe that gets memory information,
>>> >>> >> and there is a problem in the way it gets total memory. Here is
>>> >>> >> how it currently gets memory information:
>>> >>> >>
>>> >>> >> TOTALMEMORY: runs virsh info that gets the real physical memory
>>> >>> >> installed in the machine
>>> >>> >> FREEMEMORY: runs free command and gets the free column data without
>>> >>> >> buffers and cache
>>> >>> >> USEDMEMORY: runs top command and gets used memory from it (this
>>> >>> >> counts
>>> >>> >> buffers and cache)
>>> >>> >>
>>> >>> >> This is a big problem, as those values do not match one another
>>> >>> >> (I don't really know how I failed to see this before). Here is
>>> >>> >> the monitoring data from a host without VMs.
>>> >>> >>
>>> >>> >> --8<------
>>> >>> >> TOTALMEMORY=8193988
>>> >>> >> USEDMEMORY=7819952
>>> >>> >> FREEMEMORY=7911924
>>> >>> >> ------>8--
>>> >>> >>
>>> >>> >> As you can see, it makes no sense at all. Even the TOTALMEMORY
>>> >>> >> obtained from virsh info is very misleading for oned, as the
>>> >>> >> host Linux instance does not have access to all that memory
>>> >>> >> (some is consumed by the hypervisor itself), as seen by calling
>>> >>> >> the free command:
>>> >>> >>
>>> >>> >> --8<------
>>> >>> >>              total       used       free     shared    buffers     cached
>>> >>> >> Mem:       8193988    7819192     374796          0      64176    7473992
>>> >>> >> ------>8--
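One way the probe could derive all three values consistently is from a single run of free; this is a hypothetical helper (not the actual fix committed for issue 388), assuming the classic procps free layout shown above:

```ruby
# Sketch: compute TOTALMEMORY, USEDMEMORY and FREEMEMORY from one run
# of `free`, so the three values are consistent with each other.
def memory_from_free(free_output)
  fields = free_output.lines.find { |l| l =~ /^Mem:/ }.split
  total, used, free, _shared, buffers, cached = fields[1, 6].map { |v| v.to_i }
  {
    :totalmemory => total,                   # physical memory seen by Linux
    :usedmemory  => used - buffers - cached, # real usage, caches excluded
    :freememory  => free + buffers + cached  # free plus reclaimable caches
  }
end
```

Fed the free output above, this yields TOTALMEMORY=8193988, USEDMEMORY=281024 and FREEMEMORY=7912964, and the last two sum to the first.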
>>> >>> >>
>>> >>> >> I have also filed this text as an issue to solve this problem:
>>> >>> >> http://dev.opennebula.org/issues/388. It is marked to be solved
>>> >>> >> for 2.0.1, but the change will be compatible with 1.4, as it
>>> >>> >> seems the only change needed is in the IM probe.
>>> >>> >>
>>> >>> >> I can not offer you an immediate solution but we'll try to come up
>>> >>> >> with one as soon as possible.
>>> >>> >>
>>> >>> >> Bye
>>> >>> >>
>>> >>> >> On Wed, Nov 3, 2010 at 7:08 PM, Rangababu Chakravarthula
>>> >>> >> <rbabu at hexagrid.com> wrote:
>>> >>> >> > Hello Javier
>>> >>> >> > Please let us know if you want us to provide more detailed
>>> >>> >> > information
>>> >>> >> > with
>>> >>> >> > examples?
>>> >>> >> >
>>> >>> >> > Ranga
>>> >>> >> >
>>> >>> >> > On Fri, Oct 29, 2010 at 9:46 AM, Rangababu Chakravarthula
>>> >>> >> > <rbabu at hexagrid.com> wrote:
>>> >>> >> >>
>>> >>> >> >> Javier
>>> >>> >> >>
>>> >>> >> >> We saw that VMs were being deployed to a host where the
>>> >>> >> >> allocated memory of all the VMs was higher than the available
>>> >>> >> >> memory on the host.
>>> >>> >> >>
>>> >>> >> >> We think OpenNebula is executing the free command on the host
>>> >>> >> >> to determine if there is any room, and since free always
>>> >>> >> >> returns the memory actually consumed, not the memory
>>> >>> >> >> allocated, OpenNebula would push the new jobs to the host.
>>> >>> >> >>
>>> >>> >> >> That's the reason we want OpenNebula to be aware of the
>>> >>> >> >> memory allocated to the VMs on the host.
>>> >>> >> >>
>>> >>> >> >> Ranga
>>> >>> >> >>
>>> >>> >> >> On Thu, Oct 28, 2010 at 2:02 PM, Javier Fontan
>>> >>> >> >> <jfontan at gmail.com>
>>> >>> >> >> wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hello,
>>> >>> >> >>>
>>> >>> >> >>> Could you describe the problem you had? By default the
>>> >>> >> >>> scheduler will not overcommit CPU or memory.
>>> >>> >> >>>
>>> >>> >> >>> Bye
>>> >>> >> >>>
>>> >>> >> >>> On Thu, Oct 28, 2010 at 4:50 AM, Shashank Rachamalla
>>> >>> >> >>> <shashank.rachamalla at hexagrid.com> wrote:
>>> >>> >> >>> > Hi
>>> >>> >> >>> >
>>> >>> >> >>> > We have a requirement wherein the scheduler should not
>>> >>> >> >>> > allow memory overcommitting while choosing a host for a
>>> >>> >> >>> > new VM. In order to achieve this, we have changed the way
>>> >>> >> >>> > FREEMEMORY is calculated for each host:
>>> >>> >> >>> >
>>> >>> >> >>> > FREE MEMORY = TOTAL MEMORY - [ sum of memory values
>>> >>> >> >>> > allocated to VMs currently running on the host ]
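The proposed calculation is simple enough to state as code; a minimal sketch (illustrative names, values in kB) using the 3 GB host and 2 GB VMs from the reproduction scenario earlier in the thread:

```ruby
# Hypothetical sketch: free memory = total minus the memory allocated
# to running VMs, ignoring what the VMs actually consume right now.
def free_memory_kb(total_kb, allocated_kb)
  total_kb - allocated_kb.inject(0) { |sum, m| sum + m }
end

# Host1 (3 GB) after VM1 (2 GB) is placed: only 1 GB remains.
host1_free = free_memory_kb(3 * 1024 * 1024, [2 * 1024 * 1024])
```

With this accounting, VM2 (2 GB) no longer fits on Host1, so the scheduler would have to place it on Host2.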
>>> >>> >> >>> >
>>> >>> >> >>> > Please let us know if the above approach is fine or if
>>> >>> >> >>> > there is a better way to accomplish the task. We are
>>> >>> >> >>> > using opennebula 1.4.
>>> >>> >> >>> >
>>> >>> >> >>> > --
>>> >>> >> >>> > Regards,
>>> >>> >> >>> > Shashank Rachamalla
>>> >>> >> >>> >
>>> >>> >> >>> > _______________________________________________
>>> >>> >> >>> > Users mailing list
>>> >>> >> >>> > Users at lists.opennebula.org
>>> >>> >> >>> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>> >>> >> >>> >
>>> >>> >> >>> >
>>> >>> >> >>>
>>> >>> >> >>>
>>> >>> >> >>>
>>> >>> >> >>> --
>>> >>> >> >>> Javier Fontan, Grid & Virtualization Technology
>>> >>> >> >>> Engineer/Researcher
>>> >>> >> >>> DSA Research Group: http://dsa-research.org
>>> >>> >> >>> Globus GridWay Metascheduler: http://www.GridWay.org
>>> >>> >> >>> OpenNebula Virtual Infrastructure Engine:
>>> >>> >> >>> http://www.OpenNebula.org
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
>>> >>> >> DSA Research Group: http://dsa-research.org
>>> >>> >> Globus GridWay Metascheduler: http://www.GridWay.org
>>> >>> >> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
>>> >>> DSA Research Group: http://dsa-research.org
>>> >>> Globus GridWay Metascheduler: http://www.GridWay.org
>>> >>> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Shashank Rachamalla
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Dr. Ruben Santiago Montero
>>> Associate Professor (Profesor Titular), Complutense University of Madrid
>>>
>>> URL: http://dsa-research.org/doku.php?id=people:ruben
>>> Weblog: http://blog.dsa-research.org/?author=7
>>
>>
>>
>> --
>> Regards,
>> Shashank Rachamalla
>>
>
>
>
> --
> Dr. Ruben Santiago Montero
> Associate Professor (Profesor Titular), Complutense University of Madrid
>
> URL: http://dsa-research.org/doku.php?id=people:ruben
> Weblog: http://blog.dsa-research.org/?author=7
>



