[one-users] Making scheduler allocation aware

Thu Nov 11 06:06:10 PST 2010

Hello,

Are you sure that those are the exact values for the host? OpenNebula
counts "real" (from probes) and allocated (from database) memory so
that should not happen.

Snippet from a onehost show:

--8<------
USED MEM (REAL)       : 0
USED MEM (ALLOCATED)  : 65536
------>8--
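
To illustrate why the two counters matter, here is a minimal sketch, not the scheduler's actual code. The helper name fits? and the 131072 total are assumptions made up for the example; only the two USED MEM values come from the snippet above. The idea is to trust whichever usage figure is larger before placing a VM.

```ruby
# Hypothetical helper, not OpenNebula code: a VM fits only if the request is
# covered after subtracting the larger of the real and allocated usage.
def fits?(total_mem, used_real, used_allocated, requested)
  used = [used_real, used_allocated].max
  total_mem - used >= requested
end

# The total of 131072 is an assumed value; 0 and 65536 are the
# USED MEM (REAL) and USED MEM (ALLOCATED) figures from the snippet.
puts fits?(131_072, 0, 65_536, 65_536)   # true
puts fits?(131_072, 0, 65_536, 65_537)   # false
```

With only the real figure (0) the second request would look fine, which is the kind of overcommit the allocated counter is meant to prevent.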

I am now working on the kvm monitoring and I have noticed another
mismatch even with your probe changes. The values stored in the
database for total memory should be changed and that's what I am
working on.

I am connected to irc.freenode.org in channel #opennebula if you want
to discuss this further.

Bye

On Thu, Nov 11, 2010 at 5:20 AM, Shashank Rachamalla
<shashank.rachamalla at hexagrid.com> wrote:
> Hi Javier
>
> Thanks for the inputs but I came across another problem while testing:
>
> If opennebula receives multiple vm requests in a short span of time, the
> scheduler might take decisions for all these vms considering the host
> monitoring information available from the last monitoring cycle. Ideally,
> before processing every pending request, fresh host monitoring information
> has to be taken into account, as the previous set of requests might have
> already changed the host’s state. This can result in overcommitting when
> a host is being used close to its full capacity.
>
> Is there any workaround that helps the scheduler overcome the above
> problem?
>
> steps to reproduce the problem scenario:
>
> Host 1 : Total memory = 3GB
> Host 2 : Total memory = 2GB
> Assume Host1 and Host2 have same number of CPU cores. ( Host1 will have a
> higher RANK value )
>
> VM1: memory = 2GB
> VM2: memory = 2GB
>
> Start VM1 and VM2 immediately one after the other. Both VM1 and VM2 will
> come up on Host1.  ( Thus over committing )
>
> Start VM1 and VM2 with an intermediate delay of 60sec. VM1 will come up on
> Host1 and VM2 will come up on Host2. This is true because opennebula would
> have fetched a fresh set of host monitoring information in that time.
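
The scenario above can be modelled with a small sketch. Everything in it is an assumption for illustration: the Host struct, the schedule method, and the choice to rank hosts by free memory are not taken from the OpenNebula scheduler. It compares one scheduling pass that only reads the last monitoring snapshot with one that subtracts each placement from its own view of the host right away.

```ruby
# Hypothetical model of one scheduling pass; names are illustrative only.
Host = Struct.new(:name, :free_mem)

# Place each VM on the host with the most free memory that still fits it.
# When reserve is true, the chosen host's free memory is reduced immediately,
# so later decisions in the same pass see the earlier placements.
def schedule(hosts, vm_sizes, reserve)
  view = hosts.map(&:dup)   # snapshot from the last monitoring cycle
  vm_sizes.map do |size|
    host = view.select { |h| h.free_mem >= size }.max_by(&:free_mem)
    next nil if host.nil?   # no host can take this VM
    host.free_mem -= size if reserve
    host.name
  end
end

hosts = [Host.new("Host1", 3), Host.new("Host2", 2)]   # free memory in GB
p schedule(hosts, [2, 2], false)   # ["Host1", "Host1"], Host1 is overcommitted
p schedule(hosts, [2, 2], true)    # ["Host1", "Host2"]
```

The second call gives the same placement as waiting 60 seconds between the two requests, without depending on a fresh monitoring cycle.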
>
>
> On 4 November 2010 02:04, Javier Fontan <jfontan at gmail.com> wrote:
>>
>> Hello,
>>
>> It looks fine to me. I think that subtracting the memory the hypervisor
>> may be consuming is key to making it work.
>>
>> Bye
>>
>> On Wed, Nov 3, 2010 at 8:32 PM, Rangababu Chakravarthula
>> <rbabu at hexagrid.com> wrote:
>> > Javier
>> >
>> > Yes we are using KVM and OpenNebula 1.4.
>> >
>> > We have been having this problem for a long time and we were doing all
>> > kinds of validations ourselves before submitting the request to
>> > OpenNebula (there should be enough memory in the cloud to match the
>> > requested memory, and there should be at least one host that has
>> > memory > requested memory). We had to do those because OpenNebula would
>> > schedule to an arbitrary host based on the existing logic it had.
>> > So at last we thought that we need to make OpenNebula aware of the memory
>> > allocated to running VMs on the host and started this discussion.
>> >
>> > Thanks for taking up this issue as a priority. Appreciate it.
>> >
>> > Shashank came up with this patch to kvm.rb. Please take a look and let
>> > us know if that will work until we get a permanent solution.
>> >
>> >
>> > ====================================================================================
>> >
>> > # Sum the "Max memory" value of every running domain reported by virsh.
>> > $mem_allocated_for_running_vms = 0
>> > running = `virsh list | grep running | tr -s ' ' ' ' | cut -f2 -d' '`
>> > running.split("\n").each do |i|
>> >         dominfo = `virsh dominfo #{i}`
>> >         dominfo.split(/\n/).each do |line|
>> >                 if line.match('^Max memory')
>> >                         $mem_allocated_for_running_vms += line.split(" ")[2].strip.to_i
>> >                 end
>> >         end
>> > end
>> >
>> > $mem_used_by_base_hypervisor = [some xyz kb that we want to set aside for hypervisor]
>> >
>> > $free_memory = $total_memory.to_i -
>> >         ( $mem_allocated_for_running_vms.to_i + $mem_used_by_base_hypervisor.to_i )
>> >
>> >
>> > ======================================================================================
>> >
>> > Ranga
>> >
>> > On Wed, Nov 3, 2010 at 2:16 PM, Javier Fontan <jfontan at gmail.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> Sorry for the delay in the response.
>> >>
>> >> It looks like the problem is in how OpenNebula calculates available memory.
>> >> For xen >= 3.2 there is a reliable way to get available memory, which is
>> >> calling "xm info" and reading the "max_free_memory" attribute.
>> >> Unfortunately for kvm or xen < 3.2 there is no such attribute. I
>> >> suppose you are using kvm, as you mention the "free" command.
>> >>
>> >> I began analyzing the kvm IM probe that gets memory information and
>> >> there is a problem in the way it gets total memory. Here is how it
>> >> currently gets memory information:
>> >>
>> >> TOTALMEMORY: runs virsh nodeinfo, which reports the real physical memory
>> >> installed in the machine
>> >> FREEMEMORY: runs free command and gets the free column data without
>> >> buffers and cache
>> >> USEDMEMORY: runs top command and gets used memory from it (this counts
>> >> buffers and cache)
>> >>
>> >> This is a big problem as those values do not match one another (I
>> >> don't really know how I failed to see this before). Here is the
>> >> monitoring data from a host without VMs:
>> >>
>> >> --8<------
>> >> TOTALMEMORY=8193988
>> >> USEDMEMORY=7819952
>> >> FREEMEMORY=7911924
>> >> ------>8--
>> >>
>> >> As you can see, it makes no sense at all. Even the TOTALMEMORY obtained
>> >> from virsh nodeinfo is very misleading for oned, as the host linux
>> >> instance does not have access to all that memory (some is consumed by
>> >> the hypervisor itself), as seen when calling the free command:
>> >>
>> >> --8<------
>> >>              total       used       free     shared    buffers     cached
>> >> Mem:       8193988    7819192     374796          0      64176    7473992
>> >> ------>8--
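
As a worked example of the "free without buffers and cache" figure, here is a minimal sketch that parses the Mem: line shown above. The helper name free_without_cache is an assumption for illustration and is not part of the kvm probe.

```ruby
# Hypothetical parser for the "Mem:" line of `free` output in the layout
# quoted above (total, used, free, shared, buffers, cached).
def free_without_cache(free_output)
  mem = free_output.lines.find { |l| l.start_with?("Mem:") }
  _total, _used, free, _shared, buffers, cached = mem.split[1..6].map(&:to_i)
  # Buffers and cache can be reclaimed, so count them as available.
  free + buffers + cached
end

sample = <<~OUT
               total       used       free     shared    buffers     cached
  Mem:       8193988    7819192     374796          0      64176    7473992
OUT

puts free_without_cache(sample)   # 7912964
```

The result, 374796 + 64176 + 7473992 = 7912964, is close to the FREEMEMORY value in the monitoring data; the small difference is presumably because the two samples were taken at different moments. The "used" column (7819192) includes buffers and cache, which is why it cannot be combined with that free figure.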
>> >>
>> >> I am also copying this text into an issue to track this problem:
>> >> http://dev.opennebula.org/issues/388. It is marked to be solved for
>> >> 2.0.1, but the change will be compatible with 1.4, as it seems the
>> >> only change needed is in the IM probe.
>> >>
>> >> I cannot offer you an immediate solution but we'll try to come up
>> >> with one as soon as possible.
>> >>
>> >> Bye
>> >>
>> >> On Wed, Nov 3, 2010 at 7:08 PM, Rangababu Chakravarthula
>> >> <rbabu at hexagrid.com> wrote:
>> >> > Hello Javier
>> >> > Please let us know if you want us to provide more detailed
>> >> > information with examples.
>> >> >
>> >> > Ranga
>> >> >
>> >> > On Fri, Oct 29, 2010 at 9:46 AM, Rangababu Chakravarthula
>> >> > <rbabu at hexagrid.com> wrote:
>> >> >>
>> >> >> Javier
>> >> >>
>> >> >> We saw that VMs were being deployed to a host where the allocated
>> >> >> memory of all the VMs was higher than the available memory on the
>> >> >> host.
>> >> >>
>> >> >> We think OpenNebula is executing the free command on the host to
>> >> >> determine if there is any room, and since free always returns the
>> >> >> memory actually being consumed and not the allocated memory,
>> >> >> OpenNebula would push the new jobs to the host.
>> >> >>
>> >> >> That's the reason we want OpenNebula to be aware of the memory
>> >> >> allocated to the VMs on the host.
>> >> >>
>> >> >> Ranga
>> >> >>
>> >> >> On Thu, Oct 28, 2010 at 2:02 PM, Javier Fontan <jfontan at gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hello,
>> >> >>>
>> >> >>> Could you describe the problem you had? By default the scheduler
>> >> >>> will not overcommit CPU or memory.
>> >> >>>
>> >> >>> Bye
>> >> >>>
>> >> >>> On Thu, Oct 28, 2010 at 4:50 AM, Shashank Rachamalla
>> >> >>> <shashank.rachamalla at hexagrid.com> wrote:
>> >> >>> > Hi
>> >> >>> >
>> >> >>> > We have a requirement wherein the scheduler should not allow
>> >> >>> > memory overcommitting while choosing a host for a new vm. In order
>> >> >>> > to achieve this, we have changed the way in which FREEMEMORY is
>> >> >>> > being calculated for each host:
>> >> >>> >
>> >> >>> > FREE MEMORY = TOTAL MEMORY - [ Sum of memory values allocated to
>> >> >>> > VMs which are currently running on the host ]
>> >> >>> >
>> >> >>> > Please let us know if the above approach is fine or if there is any
>> >> >>> > better way to accomplish the task. We are using opennebula 1.4.
>> >> >>> >
>> >> >>> > --
>> >> >>> > Regards,
>> >> >>> > Shashank Rachamalla
>> >> >>> >
>> >> >>> > _______________________________________________
>> >> >>> > Users mailing list
>> >> >>> > Users at lists.opennebula.org
>> >> >>> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>> >> >>> >
>> >> >>> >
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
>> >> >>> DSA Research Group: http://dsa-research.org
>> >> >>> Globus GridWay Metascheduler: http://www.GridWay.org
>> >> >>> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
>
>
> --
> Regards,
> Shashank Rachamalla
>

-- 
Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org