[one-users] kvm stack traces with hi I/O load
Jurgen Weber
jurgen.weber at theiconic.com.au
Mon Aug 13 23:42:05 PDT 2012
Hi Guys
I have a new KVM server running software RAID (mdadm); the VM disks
are held in a RAID 5 with 5 disks (the system is on SSDs in a mirror).
So far I have about 10 VMs set up, but they are all unable to function:
after we have a few up and then start to deploy/resubmit the VMs which
have never booted properly, the disk I/O stops, the scp process hangs
and it all grinds to a halt. You will then find the following error in
dmesg:
[ 1201.890311] INFO: task kworker/1:1:6185 blocked for more than 120
seconds.
[ 1201.890430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1201.890569] kworker/1:1 D ffff88203fc13740 0 6185 2
0x00000000
[ 1201.890573] ffff881ffe510140 0000000000000046 0000000000000000
ffff881039023590
[ 1201.890580] 0000000000013740 ffff8820393fffd8 ffff8820393fffd8
ffff881ffe510140
[ 1201.890586] 0000000000000000 0000000100000000 0000000000000001
7fffffffffffffff
[ 1201.890593] Call Trace:
[ 1201.890597] [<ffffffff81349d2e>] ? schedule_timeout+0x2c/0xdb
[ 1201.890605] [<ffffffff810ebdbf>] ? kmem_cache_alloc+0x86/0xea
[ 1201.890610] [<ffffffff8134a58a>] ? __down_common+0x9b/0xee
[ 1201.890631] [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
[ 1201.890635] [<ffffffff81063111>] ? down+0x25/0x34
[ 1201.890648] [<ffffffffa041566f>] ? xfs_buf_lock+0x65/0x9d [xfs]
[ 1201.890665] [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
[ 1201.890685] [<ffffffffa045b957>] ? xfs_trans_getsb+0x64/0xb4 [xfs]
[ 1201.890704] [<ffffffffa0452a40>] ? xfs_mod_sb+0x21/0x77 [xfs]
[ 1201.890720] [<ffffffffa0422736>] ? xfs_reclaim_inode+0x22d/0x22d [xfs]
[ 1201.890734] [<ffffffffa041a43e>] ? xfs_fs_log_dummy+0x61/0x75 [xfs]
[ 1201.890754] [<ffffffffa04573a7>] ? xfs_log_need_covered+0x4d/0x8d [xfs]
[ 1201.890769] [<ffffffffa0422770>] ? xfs_sync_worker+0x3a/0x6a [xfs]
[ 1201.890773] [<ffffffff8105aeaa>] ? process_one_work+0x163/0x284
[ 1201.890778] [<ffffffff8105be72>] ? worker_thread+0xc2/0x145
[ 1201.890782] [<ffffffff8105bdb0>] ? manage_workers.isra.23+0x15b/0x15b
[ 1201.890787] [<ffffffff8105efad>] ? kthread+0x76/0x7e
[ 1201.890794] [<ffffffff81351cf4>] ? kernel_thread_helper+0x4/0x10
[ 1201.890799] [<ffffffff8105ef37>] ? kthread_worker_fn+0x139/0x139
[ 1201.890804] [<ffffffff81351cf0>] ? gs_change+0x13/0x13
and lots of them. With this stack trace the CPU load just keeps
increasing and I have to power cycle the machine to get the system
back. I have added the following sysctls:
fs.file-max = 262144
kernel.pid_max = 262144
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
net.core.rmem_max = 25165824
net.core.rmem_default = 25165824
net.core.wmem_max = 25165824
net.core.wmem_default = 131072
net.core.netdev_max_backlog = 8192
net.ipv4.tcp_window_scaling = 1
net.core.optmem_max = 25165824
net.core.somaxconn = 65536
net.ipv4.ip_local_port_range = 1024 65535
kernel.shmmax = 4294967296
vm.max_map_count = 262144
but the important part I found was:
# http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
vm.dirty_ratio = 10
which does not seem to help, though.
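One possible reason vm.dirty_ratio=10 doesn't help (my guess, not verified): with 128 GB of RAM, 10% still lets roughly 13 GB of dirty pages pile up before writers are throttled, and a 5-disk RAID 5 flushes that slowly. A byte-based cap is finer-grained; the values below are illustrative assumptions only, not tested recommendations:

```shell
# How much dirty data vm.dirty_ratio=10 actually allows on this box:
memtotal_kb=132259720                 # MemTotal from /proc/meminfo above
echo $((memtotal_kb / 10 / 1024)) MB  # ~12915 MB can be dirty before throttling

# A possible sysctl.conf alternative (illustrative values, needs tuning):
#   vm.dirty_bytes = 536870912             # throttle writers at ~512 MB dirty
#   vm.dirty_background_bytes = 134217728  # start background writeback at ~128 MB

# Watch the actual dirty/writeback backlog live while deploying VMs:
grep -E '^(Dirty|Writeback):' /proc/meminfo
```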
Now some info on the disk:
#mount
/dev/md2 on /data type xfs
(rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota)
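As a sanity check (my assumption that alignment could matter here, not a diagnosis): the sunit/swidth in those mount options are in 512-byte sectors, so they imply a 512 KB chunk striped over 4 data disks, which is consistent with a 5-disk RAID 5. A sketch of how to confirm that against the array itself:

```shell
# sunit/swidth from the mount line are in 512-byte sectors:
sunit_kb=$((1024 * 512 / 1024))   # stripe unit -> 512 KB
data_disks=$((4096 / 1024))       # swidth / sunit -> 4 data disks
echo "stripe unit: ${sunit_kb} KB across ${data_disks} data disks"

# On the host, compare against the md geometry (needs root; guarded so
# this is harmless on machines without /dev/md2):
mdadm --detail /dev/md2 2>/dev/null | grep -iE 'chunk|raid devices' || true
xfs_info /data 2>/dev/null | grep -E 'sunit|swidth' || true
```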
cat /proc/meminfo
MemTotal: 132259720 kB
MemFree: 122111692 kB
cat /proc/cpuinfo (32 virtual cores)
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Some info on the host:
# cat /etc/debian_version
wheezy/sid
# uname -a
Linux chaos 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64
GNU/Linux
ii kvm 1:1.1.0+dfsg-3 dummy transitional package from kvm to qemu-kvm
ii qemu-kvm 1.1.0+dfsg-3 Full virtualization on x86 hardware
ii libvirt-bin 0.9.12-4 programs for the libvirt library
ii libvirt0 0.9.12-4 library for interfacing with different virtualization systems
ii python-libvirt 0.9.12-4 libvirt Python bindings
ii opennebula 3.4.1-3+b1 controller which executes the OpenNebula cluster services
ii opennebula-common 3.4.1-3 empty package to create OpenNebula users and directories
ii opennebula-sunstone 3.4.1-3 web interface to the OpenNebula cluster services
ii opennebula-tools 3.4.1-3 Command-line tools for OpenNebula Cloud
ii ruby-opennebula 3.4.1-3 Ruby bindings for OpenNebula Cloud API (OCA)
Any ideas on how to get this working? Right now the server is a lemon! :0
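For what it's worth, next time it wedges I intend to dump the blocked tasks over sysrq instead of power cycling straight away (a sketch, assuming this kernel has sysrq support compiled in):

```shell
# Read-only checks first: is sysrq available, and what is the hung-task
# timeout that produces the "blocked for more than 120 seconds" messages?
cat /proc/sys/kernel/sysrq 2>/dev/null || echo "sysrq not available"
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null || echo "no hung-task timer"

# With root, this dumps every D-state task's stack to dmesg without a reboot:
#   echo w > /proc/sysrq-trigger
```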
--
Jurgen Weber
Systems Engineer
IT Infrastructure Team Leader
THE ICONIC | E jurgen.weber at theiconic.com.au | www.theiconic.com.au