[one-users] kvm stack traces with high I/O load

Shankhadeep Shome shank15217 at gmail.com
Tue Aug 14 09:12:24 PDT 2012


A hard crash under high I/O load can be caused by bad memory modules; I would
run a memory burn-in program to make sure your hardware is actually stable.
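
Something along these lines is a reasonable first pass (an untested sketch; the
size and pass count below are only placeholders, adjust them to the machine):

  # userspace burn-in: lock 8 GB of RAM and run 3 passes over it
  apt-get install memtester
  memtester 8G 3

For a more thorough check, boot the host into memtest86+ (available from the
GRUB menu once the memtest86+ package is installed) and let it run a full pass
over all of the installed memory.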

Shank

On Tue, Aug 14, 2012 at 2:42 AM, Jurgen Weber <jurgen.weber at theiconic.com.au> wrote:

> Hi Guys
>
> I have a new KVM server running software RAID (mdadm); the VM disks are held
> in a RAID 5 across 5 disks (the system is on SSDs in a mirror).
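>
> (For reference, the array layout and health can be checked with something
> like the following; /dev/md2 is the data array from the mount output further
> down, and a degraded or resyncing RAID 5 is worth ruling out first:)
>
>   # state of every md array, including resync progress and failed members
>   cat /proc/mdstat
>
>   # chunk size, layout and per-disk status of the data array
>   mdadm --detail /dev/md2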
>
> So far I have about 10 VMs set up, but none of them are usable: once a few
> are up and we start to deploy/resubmit the VMs that have never booted
> properly, disk I/O stops, the scp process hangs and everything grinds to a
> halt. You will then find the following error in dmesg:
>
> [ 1201.890311] INFO: task kworker/1:1:6185 blocked for more than 120 seconds.
> [ 1201.890430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1201.890569] kworker/1:1     D ffff88203fc13740     0  6185      2 0x00000000
> [ 1201.890573]  ffff881ffe510140 0000000000000046 0000000000000000 ffff881039023590
> [ 1201.890580]  0000000000013740 ffff8820393fffd8 ffff8820393fffd8 ffff881ffe510140
> [ 1201.890586]  0000000000000000 0000000100000000 0000000000000001 7fffffffffffffff
> [ 1201.890593] Call Trace:
> [ 1201.890597]  [<ffffffff81349d2e>] ? schedule_timeout+0x2c/0xdb
> [ 1201.890605]  [<ffffffff810ebdbf>] ? kmem_cache_alloc+0x86/0xea
> [ 1201.890610]  [<ffffffff8134a58a>] ? __down_common+0x9b/0xee
> [ 1201.890631]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
> [ 1201.890635]  [<ffffffff81063111>] ? down+0x25/0x34
> [ 1201.890648]  [<ffffffffa041566f>] ? xfs_buf_lock+0x65/0x9d [xfs]
> [ 1201.890665]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
> [ 1201.890685]  [<ffffffffa045b957>] ? xfs_trans_getsb+0x64/0xb4 [xfs]
> [ 1201.890704]  [<ffffffffa0452a40>] ? xfs_mod_sb+0x21/0x77 [xfs]
> [ 1201.890720]  [<ffffffffa0422736>] ? xfs_reclaim_inode+0x22d/0x22d [xfs]
> [ 1201.890734]  [<ffffffffa041a43e>] ? xfs_fs_log_dummy+0x61/0x75 [xfs]
> [ 1201.890754]  [<ffffffffa04573a7>] ? xfs_log_need_covered+0x4d/0x8d [xfs]
> [ 1201.890769]  [<ffffffffa0422770>] ? xfs_sync_worker+0x3a/0x6a [xfs]
> [ 1201.890773]  [<ffffffff8105aeaa>] ? process_one_work+0x163/0x284
> [ 1201.890778]  [<ffffffff8105be72>] ? worker_thread+0xc2/0x145
> [ 1201.890782]  [<ffffffff8105bdb0>] ? manage_workers.isra.23+0x15b/0x15b
> [ 1201.890787]  [<ffffffff8105efad>] ? kthread+0x76/0x7e
> [ 1201.890794]  [<ffffffff81351cf4>] ? kernel_thread_helper+0x4/0x10
> [ 1201.890799]  [<ffffffff8105ef37>] ? kthread_worker_fn+0x139/0x139
> [ 1201.890804]  [<ffffffff81351cf0>] ? gs_change+0x13/0x13
>
> and lots of them. With this stack trace the CPU load just keeps increasing
> and I have to power cycle the machine to get the system back. I have added
> the following sysctls:
>
> fs.file-max = 262144
> kernel.pid_max = 262144
> net.ipv4.tcp_rmem = 4096 87380 8388608
> net.ipv4.tcp_wmem = 4096 87380 8388608
> net.core.rmem_max = 25165824
> net.core.rmem_default = 25165824
> net.core.wmem_max = 25165824
> net.core.wmem_default = 131072
> net.core.netdev_max_backlog = 8192
> net.ipv4.tcp_window_scaling = 1
> net.core.optmem_max = 25165824
> net.core.somaxconn = 65536
> net.ipv4.ip_local_port_range = 1024 65535
> kernel.shmmax = 4294967296
> vm.max_map_count = 262144
>
> but the important part I found was:
> # http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
> vm.dirty_ratio=10
>
> which does not seem to help, though.
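>
> (For what it is worth, a setting like that can be applied and verified at
> runtime instead of waiting for a reboot; a generic sketch:)
>
>   # apply immediately and confirm the kernel picked it up
>   sysctl -w vm.dirty_ratio=10
>   sysctl vm.dirty_ratio
>
>   # after adding it to /etc/sysctl.conf, reload all settings
>   sysctl -p
>
>   # watch how much dirty data is outstanding while a stall is happening
>   grep -E 'Dirty|Writeback' /proc/meminfo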
>
> Now some info on the disk:
> #mount
> /dev/md2 on /data type xfs (rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota)
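>
> (If useful, the filesystem geometry can be cross-checked with xfs_info, which
> ships in xfsprogs; the output is machine-specific:)
>
>   # report sunit/swidth as XFS sees them, for comparison with the md chunk size
>   xfs_info /data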
>
> cat /proc/meminfo
> MemTotal:       132259720 kB
> MemFree:        122111692 kB
>
> cat /proc/cpuinfo (32 virtual cores)
> processor    : 31
> vendor_id    : GenuineIntel
> cpu family    : 6
> model        : 45
> model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>
> Some info on the Host:
> # cat /etc/debian_version
> wheezy/sid
> # uname -a
> Linux chaos 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64
> GNU/Linux
> ii  kvm                  1:1.1.0+dfsg-3  dummy transitional package from kvm to qemu-kvm
> ii  qemu-kvm             1.1.0+dfsg-3    Full virtualization on x86 hardware
> ii  libvirt-bin          0.9.12-4        programs for the libvirt library
> ii  libvirt0             0.9.12-4        library for interfacing with different virtualization systems
> ii  python-libvirt       0.9.12-4        libvirt Python bindings
> ii  opennebula           3.4.1-3+b1      controller which executes the OpenNebula cluster services
> ii  opennebula-common    3.4.1-3         empty package to create OpenNebula users and directories
> ii  opennebula-sunstone  3.4.1-3         web interface to which executes the OpenNebula cluster services
> ii  opennebula-tools     3.4.1-3         Command-line tools for OpenNebula Cloud
> ii  ruby-opennebula      3.4.1-3         Ruby bindings for OpenNebula Cloud API (OCA)
>
> Any ideas on how to get this working? Right now the server is a lemon! :0
>
> --
> Jurgen Weber
>
> Systems Engineer
> IT Infrastructure Team Leader
>
> THE ICONIC | E jurgen.weber at theiconic.com.au | www.theiconic.com.au
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>