Interpreting iostat

yu130960

Member
I am getting slowdowns on my pool named stripper. Specifically, it holds an NFS datastore for my all-in-one ESXi host, and the VMs on that datastore take a long time to boot and are generally sluggish. My OmniOS/napp-it VM appliance is showing high CPU usage, and there is no scrub job or anything else running in the background. I am at a loss as to what is going on. The stats page says rel_avr_dsk is pegged. Can anyone shed light on what this means?

 

gea

Well-Known Member
rel_avr_dsk compares the worst disk against the average of all disks. It is only relevant when the two are not similar
(which indicates that a single disk in the pool is giving problems).

CPU load is quite low (11%), so I would check:
- the ESXi release (with 5.5, use at least 5.5u1 + the NFS patch, or 5.5U2)
- trying vmxnet3 vnics
- disabling the sync property on the NFS share (reduces data security); see the sketch below
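
If you want to test with sync writes disabled, it is a per-dataset ZFS property; a minimal sketch, assuming the NFS share is backed by a dataset named stripper/nfs (substitute your actual dataset):

Code:
# disable synchronous writes on the dataset behind the NFS share
# (faster, but data in flight is lost on a power failure or crash)
zfs set sync=disabled stripper/nfs

# revert to the default behaviour afterwards
zfs set sync=standard stripper/nfs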
 

yu130960

Member
Thanks Gea.

I am using the latest build of ESXi 5.5u2, and the napp-it appliance is only modified to use the newest tools and to adjust the network. The napp-it appliance has 4 vCPUs and 16 GB of RAM. Sync is disabled and I am using vmxnet3 NICs for the appliance.

Currently my pools are made up of two-way mirrored vdevs, plus one raidz1 pool.

I am noticing very high CPU usage when I am doing large copies. Here are screen grabs of prstat -avm in the console during a 10 TB pool-to-pool replication, along with iostat taken minutes apart during the same copy. It looks like the perl/1 process is pegging the CPU. Any ideas?

[screenshots: prstat -avm and iostat output during the 10 TB replication]
 

gea

Well-Known Member
The perl processes when using napp-it are
- napp-it web-ui webserver
- replication job monitoring
- background accelerators for ZFS, disk and snap states
- realtime monitoring

The last of these generates the highest load.
You can compare by disabling the realtime monitor and the accelerators (top-level menu items mon, acc).
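
To check which of these a busy perl PID actually is, the standard illumos process tools can be used; a quick sketch (1234 is a placeholder PID taken from prstat, not a value from this thread):

Code:
# show the parent/child chain of the busy perl process
ptree 1234

# show its full command line, i.e. which napp-it script it is running
pargs 1234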

But your overall CPU load seems quite low. (For real values, check CPU usage in ESXi, as Solaris can only see the CPU share that ESXi assigns to the VM.) Low pool performance is then not CPU related.
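
For the host-side view, esxtop on the ESXi shell is the usual tool; a quick sketch (the column names are standard esxtop fields, not values from this thread):

Code:
# on the ESXi host shell or via SSH
esxtop        # press 'c' for the CPU view
# %USED shows the real CPU consumption of the OmniOS VM,
# %RDY shows how long it waits for a physical CPU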
 

J-san

Member
You could try using some command line tools as well:

# zpool iostat 1

# iostat -xnz 1
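
To narrow a slowdown down to a single device, the per-vdev view can also help; a sketch (stripper is the pool name from the first post):

Code:
# per-vdev breakdown, refreshed every second
zpool iostat -v stripper 1

In the iostat -xnz output, a single disk with a much higher asvc_t (average service time) or %b (percent busy) than its siblings usually points at the problem device.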

I also used the following dtrace script called "rw.d" from:
http://blog.delphix.com/ahl/2014/tuning-openzfs-write-throttle/

Just copy and paste into file and name it rw.d
(watch line endings on file if copying from Windows)

Code:
#pragma D option quiet

BEGIN
{
        start = timestamp;
}

/* record the start time of each I/O, keyed by device and block number */
io:::start
{
        ts[args[0]->b_edev, args[0]->b_lblkno] = timestamp;
}

/* on completion, compute the latency in microseconds and aggregate per direction */
io:::done
/ts[args[0]->b_edev, args[0]->b_lblkno]/
{
        this->delta = (timestamp - ts[args[0]->b_edev, args[0]->b_lblkno]) / 1000;
        this->name = (args[0]->b_flags & (B_READ | B_WRITE)) == B_READ ?
            "read " : "write ";

        @q[this->name] = quantize(this->delta);
        @a[this->name] = avg(this->delta);
        @v[this->name] = stddev(this->delta);
        @i[this->name] = count();
        @b[this->name] = sum(args[0]->b_bcount);

        ts[args[0]->b_edev, args[0]->b_lblkno] = 0;
}

END
{
        printa(@q);

        /* convert totals to per-second rates over the elapsed run time */
        normalize(@i, (timestamp - start) / 1000000000);
        normalize(@b, (timestamp - start) / 1000000000 * 1024);

        printf("%-30s %11s %11s %11s %11s\n", "", "avg latency", "stddev",
            "iops", "throughput");
        printa("%-30s %@9uus %@9uus %@9u/s %@8uk/s\n", @a, @v, @i, @b);
}

Then run the dtrace script and start up your virtual machines to get an idea of latency and IOPS:

# dtrace -s rw.d -c 'sleep 60'

(this will collect info for 60 seconds, then print out stats)

You can also interrupt it to show the stats earlier (CTRL-C)
 