Interpreting iostat

yu130960

Member
I am getting slowdowns on my pool named stripper. Specifically, it holds an NFS datastore for my all-in-one ESXi host, and the VMs on that datastore take a long time to boot and are generally sluggish. My OmniOS/napp-it VM appliance is showing high CPU usage, and there is no scrub job or anything else running in the background. I am at a loss as to what is going on. The stats page says rel_avr_dsk is pegged. Can anyone shed light on what this means?

 

gea

Well-Known Member
rel_avr_dsk compares the worst disk against the average of all disks. It is only relevant when the two are not similar
(which indicates that a single disk in the pool is giving problems).

CPU load is quite low (11%), so I would check:
- the ESXi release (with 5.5, use at least 5.5u1 + the NFS patch, or 5.5U2)
- trying vmxnet3 vnics
- disabling the sync property on the NFS share (reduces data security); see the sketch below
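
If you want to test with sync writes disabled, it is a per-dataset ZFS property; a minimal sketch, assuming the NFS share is backed by a dataset named stripper/nfs (substitute your actual dataset):

Code:
# disable synchronous writes on the dataset behind the NFS share
# (faster, but data in flight is lost on a power failure or crash)
zfs set sync=disabled stripper/nfs

# revert to the default behaviour afterwards
zfs set sync=standard stripper/nfs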
 

yu130960

Member
Thanks Gea.

I am using the latest build of ESXi 5.5u2, and the napp-it appliance is only modified to use the newest tools and to adjust the network. The napp-it appliance has 4 vCPUs and 16 GB of RAM. Sync is disabled and I am using vmxnet3 NICs for the appliance.

Currently my pools are made up of two-way mirrored vdevs, plus one raidz1 pool.

I am noticing very high CPU usage when I am doing large copies. Here are screen grabs of prstat -avm in the console during a 10 TB pool-to-pool replication, along with iostat taken minutes apart during the same copy. It looks like the perl/1 process is pegging the CPU. Any ideas?

[screenshots: prstat -avm and iostat output during the 10 TB replication]
 

gea

Well-Known Member
The perl processes when using napp-it are
- napp-it web-ui webserver
- replication job monitoring
- background accelerators for ZFS, disk and snap states
- realtime monitoring

The last of these generates the highest load.
You can compare by disabling the realtime monitor and the accelerators (top-level menu items mon, acc).
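
To check which of these a busy perl PID actually is, the standard illumos process tools can be used; a quick sketch (1234 is a placeholder PID taken from prstat, not a value from this thread):

Code:
# show the parent/child chain of the busy perl process
ptree 1234

# show its full command line, i.e. which napp-it script it is running
pargs 1234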

But your overall CPU load seems quite low. (For real values, check CPU usage in ESXi, as Solaris can only see the CPU share that ESXi assigns to the VM.) Low pool performance is then not CPU related.
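
For the host-side view, esxtop on the ESXi shell is the usual tool; a quick sketch (the column names are standard esxtop fields, not values from this thread):

Code:
# on the ESXi host shell or via SSH
esxtop        # press 'c' for the CPU view
# %USED shows the real CPU consumption of the OmniOS VM,
# %RDY shows how long it waits for a physical CPU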
 

J-san

Member
You could try using some command line tools as well:

# zpool iostat 1

# iostat -xnz 1
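
To narrow a slowdown down to a single device, the per-vdev view can also help; a sketch (stripper is the pool name from the first post):

Code:
# per-vdev breakdown, refreshed every second
zpool iostat -v stripper 1

In the iostat -xnz output, a single disk with a much higher asvc_t (average service time) or %b (percent busy) than its siblings usually points at the problem device.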

I also used the following dtrace script called "rw.d" from:
http://blog.delphix.com/ahl/2014/tuning-openzfs-write-throttle/

Just copy and paste into file and name it rw.d
(watch line endings on file if copying from Windows)

Code:
#pragma D option quiet

BEGIN
{
        start = timestamp;
}

/* record the start time of each I/O, keyed by device and block number */
io:::start
{
        ts[args[0]->b_edev, args[0]->b_lblkno] = timestamp;
}

/* on completion, compute the latency in microseconds and aggregate per direction */
io:::done
/ts[args[0]->b_edev, args[0]->b_lblkno]/
{
        this->delta = (timestamp - ts[args[0]->b_edev, args[0]->b_lblkno]) / 1000;
        this->name = (args[0]->b_flags & (B_READ | B_WRITE)) == B_READ ?
            "read " : "write ";

        @q[this->name] = quantize(this->delta);
        @a[this->name] = avg(this->delta);
        @v[this->name] = stddev(this->delta);
        @i[this->name] = count();
        @b[this->name] = sum(args[0]->b_bcount);

        ts[args[0]->b_edev, args[0]->b_lblkno] = 0;
}

END
{
        printa(@q);

        /* convert totals to per-second rates over the elapsed run time */
        normalize(@i, (timestamp - start) / 1000000000);
        normalize(@b, (timestamp - start) / 1000000000 * 1024);

        printf("%-30s %11s %11s %11s %11s\n", "", "avg latency", "stddev",
            "iops", "throughput");
        printa("%-30s %@9uus %@9uus %@9u/s %@8uk/s\n", @a, @v, @i, @b);
}

Then run the dtrace script and start up your virtual machines to get an idea of latency and IOPS:

# dtrace -s rw.d -c 'sleep 60'

(this will collect info for 60 seconds, then print out stats)

You can also interrupt it to show the stats earlier (CTRL-C)
 