[one-users] Very high unavailable service
Erico Augusto Cavalcanti Guedes
eacg at cin.ufpe.br
Sun Aug 26 22:06:19 PDT 2012
Hi,
2012/8/26 Ruben S. Montero <rsmontero at opennebula.org>
> Hi
>
> If you want to try the cpu pinning suggested by Steven, simply add a
> RAW attribute in your VM template. Something similar to:
>
> RAW=[
> TYPE="kvm",
> DATA="<cputune><vcpupin vcpu=\"0\" cpuset=\"1\"/></cputune>" ]
>
Added to the VM template. Running an experiment right now to observe
availability.
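For the record, once the VM is redeployed I will double-check on the node that
the pinning took effect, using virsh (one-609 is the libvirt domain name of my
VM, as shown in the ps output further below):

virsh vcpuinfo one-609          # CPU Affinity mask should show 'y' only at CPU 1
virsh dumpxml one-609 | grep -A 2 cputune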
>
> Cheers
>
> Ruben
>
>
> On Sun, Aug 26, 2012 at 3:27 AM, Steven C Timm <timm at fnal.gov> wrote:
> > I run high-availability squid servers on virtual machines although not
> > yet in OpenNebula.
> >
> > It can be done with very high availability.
> >
> > I am not familiar with Ubuntu Server 12.04, but if it has libvirt 0.9.7 or
> > better, and you are using the KVM hypervisor, you should be able to use the
> > cpu-pinning and numa-aware features of libvirt to pin each virtual machine
> > to a given physical cpu. That will beat the migration issue you are seeing
> > now.
>
Done. The migration processes were beaten, as you said. Thank you.
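A quick way to confirm it on the nodes is to check that the cumulative CPU time
of the migration/X kernel threads stops growing during a test run:

ps -eo pid,comm,time | grep migration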
> >
> > With Xen hypervisor you can (and should) also pin.
> >
> > I think if you beat the cpu and memory pinning problem you will be OK.
> >
> >
> >
> > However, you did not say what network topology you are using for your
> > virtual machine,
>
That is my vnet template:
TYPE = RANGED
BRIDGE = br0
NETWORK_SIZE = C
NETWORK_ADDRESS = 192.168.15.0
NETMASK = 255.255.255.0
GATEWAY = 192.168.15.10
IP_START = 192.168.15.110
IP_END = 192.168.15.190
Furthermore, there is one squid instance running on each VM, connected in a
mesh architecture with the LRU replacement policy. The Web Polygraph clients
are configured to send requests, round-robin, to each of the VMs (only three
currently: 192.168.15.110-112). The Web Polygraph server, which represents
the Internet, is running on 192.168.15.10, closing the topology.
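For completeness, the mesh is built with sibling cache_peer entries in each
squid.conf; on 192.168.15.110 the relevant lines look roughly like this (the
ports are squid's defaults, 3128 for HTTP and 3130 for ICP, assumed here):

cache_peer 192.168.15.111 sibling 3128 3130 proxy-only
cache_peer 192.168.15.112 sibling 3128 3130 proxy-only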
Remember that when the same test is performed on physical machines,
availability is invariably 100%.
> > and what kind of virtual network drivers,
>
The VMs were created with VirtualBox and are based on Ubuntu Server 12.04.
Executing:
ps ax | grep kvm
on the 192.168.15.110 cloud node shows:
/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 -smp
1,sockets=1,cores=1,threads=1 -name one-609 -uuid
a7115950-3f0e-0fab-ea20-3f829d835889 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-609.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc -no-acpi
-boot c -drive
file=/srv3/cloud/one/var/datastores/0/609/disk.0,if=none,id=drive-ide0-0-0,format=qcow2
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive
file=/srv3/cloud/one/var/datastores/0/609/disk.1,if=none,id=drive-ide0-1-1,format=raw
-device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
tap,fd=20,id=hostnet0 -device
rtl8139,netdev=hostnet0,id=net0,mac=02:00:c0:a8:0f:70,bus=pci.0,addr=0x3
-usb -vnc 0.0.0.0:609 -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
The network driver used in the VMs is rtl8139.
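As a side note, rtl8139 is a fully emulated NIC; since the Ubuntu 12.04 guests
ship virtio drivers, one thing I may try is requesting the virtio model in the
NIC section of the VM template, roughly like this (the network name is just a
placeholder):

NIC = [
  NETWORK = "squid_vnet",
  MODEL   = "virtio" ]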
>
>
>
> > That is important too. Also, is your squid cache mostly disk-resident or
> > mostly RAM-resident?
>
disk-resident (from squid.conf):
cache_mem 128 MB
cache_dir aufs /var/spool/squid3 1024 16 256
Note that the cache disk space is reduced to 1024 MB for testing purposes.
> > If the former, then the virtual disk drivers matter too, a lot.
>
Any reading suggestions? I would appreciate them a lot.
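Meanwhile, since the ps output above shows the disks attached as IDE
(ide-drive), another thing I intend to try is exposing them as virtio disks by
changing the target in the DISK section of the VM template, for example (the
image name is a placeholder):

DISK = [
  IMAGE  = "squid-cache-image",
  DRIVER = "qcow2",
  TARGET = "vda" ]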
One additional piece of information that may be relevant. I am monitoring VM
processes in D state with the command:
ps -eo pid,tid,class,rtprio,psr,pcpu,stat,wchan:25,comm | grep D
  PID  TID CLS RTPRIO PSR %CPU STAT WCHAN                  COMMAND
  170  170 TS       -   0  0.1 D    sleep_on_page          jbd2/sda1-8
 1645 1645 TS       -   0 11.4 Dsl  get_write_access       squid3
 1647 1647 TS       -   0  0.2 D    get_write_access       autExperiment.sh
 1679 1679 TS       -   0  0.8 D    get_write_access       autExperiment.sh
 1680 1680 TS       -   0  0.5 D    get_write_access       autExperiment.sh
It reveals that the ext4 journaling process (jbd2/sda1-8), squid, and my
monitoring scripts are in D state, always waiting in the sleep_on_page and
get_write_access kernel functions. I am researching the relevance/influence
of these WCHANs. I performed tests without my monitoring scripts and
availability is as bad as with them.
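To see whether the VMs are simply stuck waiting on storage, my next step is to
watch the backing device on the nodes and the I/O wait inside the VMs with
something like this (sysstat package; the device name is an assumption):

iostat -dxm 5 /dev/sda     # on the node: high await/%util points to the disk
vmstat 5                   # inside the VM: a large 'wa' column means I/O wait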
Thanks to Steve and Ruben for your time and answers.
Erico.
> >
> >
> >
> > Steve Timm
> >
> >
> >
> >
> >
> >
> >
> > From: users-bounces at lists.opennebula.org
> > [mailto:users-bounces at lists.opennebula.org] On Behalf Of Erico Augusto
> > Cavalcanti Guedes
> > Sent: Saturday, August 25, 2012 6:33 PM
> > To: users at lists.opennebula.org
> > Subject: [one-users] Very high unavailable service
> >
> >
> >
> > Dears,
> >
> > I'm running the Squid web cache proxy server on Ubuntu Server 12.04 VMs
> > (kernel 3.2.0-23-generic-pae), OpenNebula 3.4.
> > My private cloud is composed of one frontend and three nodes. VMs are
> > running on those 3 nodes, initially one per node.
> > Outside the cloud, there are 2 hosts, one working as web clients and the
> > other as a web server, using the Web Polygraph benchmarking tool.
> >
> > The goal of the tests is to stress the Squid cache running on the VMs.
> > When the same test is executed outside the cloud, using the three nodes as
> > physical machines, there is 100% cache service availability.
> > Nevertheless, when the cache service is provided by VMs, nothing better
> > than 45% service availability is reached.
> > Web clients do not receive responses from squid when it is running on VMs
> > 55% of the time.
> >
> > I have monitored the load average of the VMs and of the PMs where the VMs
> > are being executed. The first load average field reaches 15 after some
> > hours of tests on the VMs, and 3 on the physical machines.
> > Furthermore, there is a set of processes, called migration/X, that are the
> > champions in CPU TIME when the VMs are running. A sample:
> >
> > top - 20:01:38 up 1 day, 3:36, 1 user, load average: 5.50, 5.47, 4.20
> >
> >   PID USER PR NI VIRT RES SHR S %CPU %MEM      TIME+    TIME COMMAND
> >    13 root RT  0     0   0   0 S    0  0.0  408:27.25  408:27 migration/2
> >     8 root RT  0     0   0   0 S    0  0.0  404:13.63  404:13 migration/1
> >     6 root RT  0     0   0   0 S    0  0.0  401:36.78  401:36 migration/0
> >    17 root RT  0     0   0   0 S    0  0.0  400:59.10  400:59 migration/3
> >
> >
> > It isn't possible to offer a web cache service via VMs the way the
> > service is behaving, with such low availability.
> >
> > So, my questions:
> >
> > 1. Has anybody experienced a similar problem of an unresponsive service?
> > (Whatever service.)
> > 2. How can I identify the bottleneck that is overloading the system, so
> > that it can be minimized?
> >
> > Thanks a lot,
> >
> > Erico.
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opennebula.org
> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >
>
>
>
> --
> Ruben S. Montero, PhD
> Project co-Lead and Chief Architect
> OpenNebula - The Open Source Solution for Data Center Virtualization
> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula
>