Well, setting vm.dirty_ratio to such a low number ensures that any memory issues you might face will be masked, but your core problem will still exist. Keep in mind you should never be having hard crashes on a Linux system with a production kernel and drivers unless there is faulty hardware. I suggest you run a full memtest86+ test.<br>
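If you want to check whether it is writeback pressure (rather than bad RAM) stalling the box, one quick diagnostic is to snapshot the kernel's dirty-page counters while the VMs are writing. This is a generic sketch, not specific to your setup:

```shell
# Show the current writeback tuning (runtime values)
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Snapshot how much dirty data is queued for writeback; rerun this
# (or wrap it in `watch -n1`) while the VMs write -- a Dirty figure
# that keeps growing points at writeback pressure, not bad RAM
grep -E '^(Dirty|Writeback):' /proc/meminfo
```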
<br><div class="gmail_quote">On Tue, Aug 14, 2012 at 9:03 PM, Jurgen Weber <span dir="ltr"><<a href="mailto:jurgen.weber@theiconic.com.au" target="_blank">jurgen.weber@theiconic.com.au</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Right, I have solved this by:<br>
    <br>
    sysctl vm.dirty_ratio=2<br>
    <br>
    mdadm --create --verbose /dev/md2 --level=raid6 -n5 -f /dev/sda2
    /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2<br>
    <br>
    mkfs.xfs -b size=4096 -d sunit=512,swidth=1536 -L data /dev/md2<br>
    <br>
    and fstab:<br>
    /dev/md2 /data          xfs      noatime,sunit=512,swidth=1536   
    0    0<br>
    <br>
    Not sure if it was any one of these that fixed it, or the
    combination of all of the above, but now writing no longer
    stalls and all of my VMs are running.<br>
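One thing worth keeping in mind (my assumption, not something stated in the thread): `sysctl vm.dirty_ratio=2` only changes the running kernel, so the value is lost on reboot. A minimal sketch of persisting it on a Debian system:

```shell
# Persist the writeback setting across reboots (Debian reads
# /etc/sysctl.conf at boot; /etc/sysctl.d/*.conf also works)
echo 'vm.dirty_ratio = 2' >> /etc/sysctl.conf

# Reload the file now and confirm the value took effect
sysctl -p
sysctl vm.dirty_ratio
```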
    <br>
    Thanks<span><font color="#888888"><br>
    <br>
    Jurgen</font></span><div><div><br>
    <div>On 15/08/12 02:12, Shankhadeep Shome
      wrote:<br>
    </div>
    <blockquote type="cite">A hard crash under high I/O can be due to bad memory
      modules; I would run a memory burn program to make sure your
      hardware is actually stable.
      <div><br>
      </div>
      <div>Shank<br>
        <br>
        <div class="gmail_quote">On Tue, Aug 14, 2012 at 2:42 AM, Jurgen
          Weber <span dir="ltr"><<a href="mailto:jurgen.weber@theiconic.com.au" target="_blank">jurgen.weber@theiconic.com.au</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Guys<br>
            <br>
            I have a new KVM server running software RAID (mdadm); the
            VM disks are held in a RAID 5 across 5 disks (the system is
            on SSDs in a mirror).<br>
            <br>
            So far I have about 10 VMs set up, but they are all unable
            to function: once we have a few up and then start to
            deploy/resubmit the VMs which have never booted properly,
            the disk I/O stops, the scp process hangs, and everything
            halts. You will then find the following error in dmesg:<br>
            <br>
            [ 1201.890311] INFO: task kworker/1:1:6185 blocked for more
            than 120 seconds.<br>
            [ 1201.890430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
            disables this message.<br>
            [ 1201.890569] kworker/1:1     D ffff88203fc13740     0
             6185      2 0x00000000<br>
            [ 1201.890573]  ffff881ffe510140 0000000000000046
            0000000000000000 ffff881039023590<br>
            [ 1201.890580]  0000000000013740 ffff8820393fffd8
            ffff8820393fffd8 ffff881ffe510140<br>
            [ 1201.890586]  0000000000000000 0000000100000000
            0000000000000001 7fffffffffffffff<br>
            [ 1201.890593] Call Trace:<br>
            [ 1201.890597]  [<ffffffff81349d2e>] ?
            schedule_timeout+0x2c/0xdb<br>
            [ 1201.890605]  [<ffffffff810ebdbf>] ?
            kmem_cache_alloc+0x86/0xea<br>
            [ 1201.890610]  [<ffffffff8134a58a>] ?
            __down_common+0x9b/0xee<br>
            [ 1201.890631]  [<ffffffffa0452c57>] ?
            xfs_getsb+0x28/0x3b [xfs]<br>
            [ 1201.890635]  [<ffffffff81063111>] ? down+0x25/0x34<br>
            [ 1201.890648]  [<ffffffffa041566f>] ?
            xfs_buf_lock+0x65/0x9d [xfs]<br>
            [ 1201.890665]  [<ffffffffa0452c57>] ?
            xfs_getsb+0x28/0x3b [xfs]<br>
            [ 1201.890685]  [<ffffffffa045b957>] ?
            xfs_trans_getsb+0x64/0xb4 [xfs]<br>
            [ 1201.890704]  [<ffffffffa0452a40>] ?
            xfs_mod_sb+0x21/0x77 [xfs]<br>
            [ 1201.890720]  [<ffffffffa0422736>] ?
            xfs_reclaim_inode+0x22d/0x22d [xfs]<br>
            [ 1201.890734]  [<ffffffffa041a43e>] ?
            xfs_fs_log_dummy+0x61/0x75 [xfs]<br>
            [ 1201.890754]  [<ffffffffa04573a7>] ?
            xfs_log_need_covered+0x4d/0x8d [xfs]<br>
            [ 1201.890769]  [<ffffffffa0422770>] ?
            xfs_sync_worker+0x3a/0x6a [xfs]<br>
            [ 1201.890773]  [<ffffffff8105aeaa>] ?
            process_one_work+0x163/0x284<br>
            [ 1201.890778]  [<ffffffff8105be72>] ?
            worker_thread+0xc2/0x145<br>
            [ 1201.890782]  [<ffffffff8105bdb0>] ?
            manage_workers.isra.23+0x15b/0x15b<br>
            [ 1201.890787]  [<ffffffff8105efad>] ?
            kthread+0x76/0x7e<br>
            [ 1201.890794]  [<ffffffff81351cf4>] ?
            kernel_thread_helper+0x4/0x10<br>
            [ 1201.890799]  [<ffffffff8105ef37>] ?
            kthread_worker_fn+0x139/0x139<br>
            [ 1201.890804]  [<ffffffff81351cf0>] ?
            gs_change+0x13/0x13<br>
            <br>
            and lots of them. With this stack trace the CPU load will
            just keep increasing, and I have to power cycle the machine
            to get the system back. I have added the following sysctls:<br>
            <br>
            fs.file-max = 262144<br>
            kernel.pid_max = 262144<br>
            net.ipv4.tcp_rmem = 4096 87380 8388608<br>
            net.ipv4.tcp_wmem = 4096 87380 8388608<br>
            net.core.rmem_max = 25165824<br>
            net.core.rmem_default = 25165824<br>
            net.core.wmem_max = 25165824<br>
            net.core.wmem_default = 131072<br>
            net.core.netdev_max_backlog = 8192<br>
            net.ipv4.tcp_window_scaling = 1<br>
            net.core.optmem_max = 25165824<br>
            net.core.somaxconn = 65536<br>
            net.ipv4.ip_local_port_range = 1024 65535<br>
            kernel.shmmax = 4294967296<br>
            vm.max_map_count = 262144<br>
            <br>
            but the important part I found out was:<br>
            #<a href="http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/" target="_blank">http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/</a><br>


            vm.dirty_ratio=10<br>
            <br>
            which does not seem to help, though.<br>
            <br>
            Now some info on the disk:<br>
            #mount<br>
            /dev/md2 on /data type xfs (rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota)<br>
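Since the hang happens inside XFS, it may also be worth cross-checking that the filesystem's stripe geometry matches the md array's. A rough sketch (device names taken from this thread; run on the affected host):

```shell
# Report the stripe geometry (sunit/swidth) the filesystem was made with
xfs_info /data

# Compare against the array: sunit should equal the chunk size, and
# swidth should be chunk size * number of data disks
mdadm --detail /dev/md2 | grep -Ei 'level|chunk|raid devices'
```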
            <br>
            cat /proc/meminfo<br>
            MemTotal:       132259720 kB<br>
            MemFree:        122111692 kB<br>
            <br>
            cat /proc/cpuinfo (32 virtual cores)<br>
            processor    : 31<br>
            vendor_id    : GenuineIntel<br>
            cpu family    : 6<br>
            model        : 45<br>
            model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz<br>
            <br>
            Some info on the host:<br>
            # cat /etc/debian_version<br>
            wheezy/sid<br>
            # uname -a<br>
            Linux chaos 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC
            2012 x86_64 GNU/Linux<br>
            ii  kvm 1:1.1.0+dfsg-3                       dummy
            transitional package from kvm to qemu-kvm<br>
            ii  qemu-kvm 1.1.0+dfsg-3                         Full
            virtualization on x86 hardware<br>
            ii  libvirt-bin 0.9.12-4                            
            programs for the libvirt library<br>
            ii  libvirt0 0.9.12-4                             library
            for interfacing with different virtualization systems<br>
            ii  python-libvirt 0.9.12-4                            
            libvirt Python bindings<br>
            ii  opennebula 3.4.1-3+b1                          
            controller which executes the OpenNebula cluster services<br>
            ii  opennebula-common 3.4.1-3                            
             empty package to create OpenNebula users and directories<br>
            ii  opennebula-sunstone 3.4.1-3                            
             web interface to which executes the OpenNebula cluster
            services<br>
            ii  opennebula-tools 3.4.1-3                            
             Command-line tools for OpenNebula Cloud<br>
            ii  ruby-opennebula 3.4.1-3                            
             Ruby bindings for OpenNebula Cloud API (OCA)<br>
            <br>
            Any ideas on how to get this working? Right now the server is
            a lemon! :0<span><font color="#888888"><br>
                <br>
                -- <br>
                Jurgen Weber<br>
                <br>
                Systems Engineer<br>
                IT Infrastructure Team Leader<br>
                <br>
                THE ICONIC | E <a href="mailto:jurgen.weber@theiconic.com.au" target="_blank">jurgen.weber@theiconic.com.au</a> | <a href="http://www.theiconic.com.au" target="_blank">www.theiconic.com.au</a><br>
                <br>
                _______________________________________________<br>
                Users mailing list<br>
                <a href="mailto:Users@lists.opennebula.org" target="_blank">Users@lists.opennebula.org</a><br>
                <a href="http://lists.opennebula.org/listinfo.cgi/users-opennebula.org" target="_blank">http://lists.opennebula.org/listinfo.cgi/users-opennebula.org</a><br>
              </font></span></blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
    <pre cols="72">-- 
Jurgen Weber

Systems Engineer
IT Infrastructure Team Leader 

THE ICONIC | E <a href="mailto:jurgen.weber@theiconic.com.au" target="_blank">jurgen.weber@theiconic.com.au</a> | <a href="http://www.theiconic.com.au" target="_blank">www.theiconic.com.au</a></pre>
  </div></div></div>

</blockquote></div><br>