[one-users] kvm stack traces with hi I/O load

Tue Aug 14 18:03:02 PDT 2012

Right, I have solved this by:

sysctl vm.dirty_ratio=2

mdadm --create --verbose /dev/md2 --level=raid6 -n5 -f /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2

mkfs.xfs -b size=4096 -d sunit=512,swidth=1536 -L data /dev/md2

and fstab:
/dev/md2 /data          xfs      noatime,sunit=512,swidth=1536 0    0
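
The sunit/swidth values above are given in 512-byte sectors, so the arithmetic behind them can be sketched as below. This is only an illustration: the helper name xfs_stripe_geometry is hypothetical, and the chunk sizes (256 KiB and 512 KiB) are assumptions inferred from the sunit values in this thread, not figures stated anywhere in it. The real chunk size can be read from `mdadm --detail /dev/md2`.

```shell
#!/bin/sh
# Hypothetical helper: derive XFS sunit/swidth (in 512-byte sectors)
# from an md chunk size and the RAID layout.
# usage: xfs_stripe_geometry CHUNK_KIB TOTAL_DISKS PARITY_DISKS
xfs_stripe_geometry() {
    chunk_kib=$1
    total_disks=$2
    parity_disks=$3
    # Only the data-bearing disks count towards the stripe width.
    data_disks=$((total_disks - parity_disks))
    # 1 KiB equals two 512-byte sectors.
    sunit=$((chunk_kib * 2))
    swidth=$((sunit * data_disks))
    echo "sunit=${sunit},swidth=${swidth}"
}

# 5-disk RAID 6 (2 parity disks), assumed 256 KiB chunk
xfs_stripe_geometry 256 5 2   # prints sunit=512,swidth=1536

# 5-disk RAID 5 (1 parity disk), assumed 512 KiB chunk
xfs_stripe_geometry 512 5 1   # prints sunit=1024,swidth=4096
```

The first call reproduces the values used for the new RAID 6 array, the second the values reported by mount for the original RAID 5 array.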

I am not sure whether any one of these fixed it or whether it was the 
combination of all of them, but writes no longer stall and all of my 
VMs are running.
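
For a sense of scale on the vm.dirty_ratio change, here is a rough sketch of how much dirty page cache each setting would allow on this host. It is an upper-bound estimate only: the kernel applies the percentage to available (free plus reclaimable) memory, while this sketch uses the MemTotal figure from the original report as a stand-in. The function name dirty_limit_mib is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximate the dirty-page ceiling in MiB for a
# given memory size (kB) and vm.dirty_ratio percentage.
# Assumption: MemTotal stands in for "available memory", so the result
# is an upper bound rather than the exact threshold the kernel uses.
dirty_limit_mib() {
    mem_kb=$1
    ratio=$2
    echo $(( mem_kb * ratio / 100 / 1024 ))
}

MEM_TOTAL_KB=132259720   # MemTotal from /proc/meminfo in the original report

dirty_limit_mib "$MEM_TOTAL_KB" 10   # prints 12915 (about 12.6 GiB)
dirty_limit_mib "$MEM_TOTAL_KB" 2    # prints 2583  (about 2.5 GiB)
```

With 128 GB of RAM, even 10% lets many gigabytes of dirty data accumulate before writers are throttled, which may be why lowering the ratio further changed the behaviour here.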

Thanks

Jurgen
On 15/08/12 02:12, Shankhadeep Shome wrote:
> A hard crash under high I/O can be caused by bad memory modules; I would 
> run a memory burn-in program to make sure your hardware is actually stable.
>
> Shank
>
> On Tue, Aug 14, 2012 at 2:42 AM, Jurgen Weber 
> <jurgen.weber at theiconic.com.au> wrote:
>
>     Hi Guys
>
>     I have a new KVM server running software RAID (mdadm), and the VM
>     disks are held on a RAID 5 array of 5 disks (the system is on SSDs
>     in a mirror).
>
>     So far I have about 10 VMs set up, but they are all unable to
>     function: once a few are up and we start to deploy/resubmit the
>     VMs that have never booted properly, disk I/O stops, the scp
>     process hangs, and everything stalls. You will then find the
>     following error in dmesg:
>
>     [ 1201.890311] INFO: task kworker/1:1:6185 blocked for more than 120 seconds.
>     [ 1201.890430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     [ 1201.890569] kworker/1:1     D ffff88203fc13740     0  6185      2 0x00000000
>     [ 1201.890573]  ffff881ffe510140 0000000000000046 0000000000000000 ffff881039023590
>     [ 1201.890580]  0000000000013740 ffff8820393fffd8 ffff8820393fffd8 ffff881ffe510140
>     [ 1201.890586]  0000000000000000 0000000100000000 0000000000000001 7fffffffffffffff
>     [ 1201.890593] Call Trace:
>     [ 1201.890597]  [<ffffffff81349d2e>] ? schedule_timeout+0x2c/0xdb
>     [ 1201.890605]  [<ffffffff810ebdbf>] ? kmem_cache_alloc+0x86/0xea
>     [ 1201.890610]  [<ffffffff8134a58a>] ? __down_common+0x9b/0xee
>     [ 1201.890631]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
>     [ 1201.890635]  [<ffffffff81063111>] ? down+0x25/0x34
>     [ 1201.890648]  [<ffffffffa041566f>] ? xfs_buf_lock+0x65/0x9d [xfs]
>     [ 1201.890665]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
>     [ 1201.890685]  [<ffffffffa045b957>] ? xfs_trans_getsb+0x64/0xb4 [xfs]
>     [ 1201.890704]  [<ffffffffa0452a40>] ? xfs_mod_sb+0x21/0x77 [xfs]
>     [ 1201.890720]  [<ffffffffa0422736>] ? xfs_reclaim_inode+0x22d/0x22d [xfs]
>     [ 1201.890734]  [<ffffffffa041a43e>] ? xfs_fs_log_dummy+0x61/0x75 [xfs]
>     [ 1201.890754]  [<ffffffffa04573a7>] ? xfs_log_need_covered+0x4d/0x8d [xfs]
>     [ 1201.890769]  [<ffffffffa0422770>] ? xfs_sync_worker+0x3a/0x6a [xfs]
>     [ 1201.890773]  [<ffffffff8105aeaa>] ? process_one_work+0x163/0x284
>     [ 1201.890778]  [<ffffffff8105be72>] ? worker_thread+0xc2/0x145
>     [ 1201.890782]  [<ffffffff8105bdb0>] ? manage_workers.isra.23+0x15b/0x15b
>     [ 1201.890787]  [<ffffffff8105efad>] ? kthread+0x76/0x7e
>     [ 1201.890794]  [<ffffffff81351cf4>] ? kernel_thread_helper+0x4/0x10
>     [ 1201.890799]  [<ffffffff8105ef37>] ? kthread_worker_fn+0x139/0x139
>     [ 1201.890804]  [<ffffffff81351cf0>] ? gs_change+0x13/0x13
>
>     and lots of them. With this stack trace the CPU load just keeps
>     increasing and I have to power cycle the machine to get the system
>     back. I have added the following sysctls:
>
>     fs.file-max = 262144
>     kernel.pid_max = 262144
>     net.ipv4.tcp_rmem = 4096 87380 8388608
>     net.ipv4.tcp_wmem = 4096 87380 8388608
>     net.core.rmem_max = 25165824
>     net.core.rmem_default = 25165824
>     net.core.wmem_max = 25165824
>     net.core.wmem_default = 131072
>     net.core.netdev_max_backlog = 8192
>     net.ipv4.tcp_window_scaling = 1
>     net.core.optmem_max = 25165824
>     net.core.somaxconn = 65536
>     net.ipv4.ip_local_port_range = 1024 65535
>     kernel.shmmax = 4294967296
>     vm.max_map_count = 262144
>
>     but the important part I found was:
>     #http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
>     vm.dirty_ratio=10
>
>     which does not seem to help, though.
>
>     Now some info on the disk:
>     #mount
>     /dev/md2 on /data type xfs (rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota)
>
>     cat /proc/meminfo
>     MemTotal:       132259720 kB
>     MemFree:        122111692 kB
>
>     cat /proc/cpuinfo (32 v cores)
>     processor    : 31
>     vendor_id    : GenuineIntel
>     cpu family    : 6
>     model        : 45
>     model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>
>     Some info on the host:
>     # cat /etc/debian_version
>     wheezy/sid
>     # uname -a
>     Linux chaos 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux
>     ii  kvm                  1:1.1.0+dfsg-3  dummy transitional package from kvm to qemu-kvm
>     ii  qemu-kvm             1.1.0+dfsg-3    Full virtualization on x86 hardware
>     ii  libvirt-bin          0.9.12-4        programs for the libvirt library
>     ii  libvirt0             0.9.12-4        library for interfacing with different virtualization systems
>     ii  python-libvirt       0.9.12-4        libvirt Python bindings
>     ii  opennebula           3.4.1-3+b1      controller which executes the OpenNebula cluster services
>     ii  opennebula-common    3.4.1-3         empty package to create OpenNebula users and directories
>     ii  opennebula-sunstone  3.4.1-3         web interface to which executes the OpenNebula cluster services
>     ii  opennebula-tools     3.4.1-3         Command-line tools for OpenNebula Cloud
>     ii  ruby-opennebula      3.4.1-3         Ruby bindings for OpenNebula Cloud API (OCA)
>
>     Any ideas on how to get this working? Right now the server is a
>     lemon! :0
>
>     -- 
>     Jurgen Weber
>
>     Systems Engineer
>     IT Infrastructure Team Leader
>
>     THE ICONIC | E jurgen.weber at theiconic.com.au | www.theiconic.com.au
>

-- 
Jurgen Weber

Systems Engineer
IT Infrastructure Team Leader

THE ICONIC | E jurgen.weber at theiconic.com.au | www.theiconic.com.au
