@gea In your benchmark report you write:
"Result compared to 2.) is very different.
As the filesystem recordsize is equal to the special_small_block size, all data land on the Optane. This is why you want this feature, to decide if a filesystem writes to regular vdevs or the special vdev."
Well, not really. If I want all the data of a filesystem to go to a particular drive configuration, I set up a pool with that drive configuration, create the filesystem on it, and call it a day.
The idea behind the special_small_blocks property is to divide the data into one part that fits the performance characteristics of the pool's normal vdev configuration and another part that is better serviced by the pool's special vdev configuration. With recordsize=128K it would therefore be reasonable to set, for example, special_small_blocks=32K. The actual value depends on the amount of data you expect at that block size or smaller in the filesystem. You don't want to fill the special vdev beyond the usual 80%, because block allocation on it would then suffer the usual ZFS problems of metaslab spacemap loading and gang block creation. If the special vdev were filled to 100%, that data would spill over to the normal vdevs anyhow. So depending on the size of the special vdev, your data pattern and the performance characteristics, the value for special_small_blocks could be smaller than 32K or a little bigger, but definitely not 128K for a filesystem with that recordsize.
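To make that concrete, the split is configured per filesystem with two properties (pool and filesystem names here are just placeholders):

    zfs set recordsize=128K tank/data
    zfs set special_small_blocks=32K tank/data
    zpool list -v tank    # per-vdev allocation, to keep an eye on the ~80% limit of the special vdev

With these settings, blocks of 32K and smaller (plus metadata) are allocated on the special vdev, while everything larger goes to the normal vdevs.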
Now for the actual benchmarks. The disk pool you use in the majority of your tests has a normal vdev configuration of 3-way mirrors in a 5-wide stripe. This is a pretty performant disk pool that trades storage efficiency (only 33%) for increased fault tolerance and higher performance. An alternative for those 15 disks would be, for example, a single raidz3 pool with a storage efficiency of 80%. The reason not to use that kind of configuration is in most cases the low random r/w performance of such a pool. This is exactly what allocation classes, and in particular the special_small_blocks setting, promise to change.
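Just to illustrate the two layouts (device names are placeholders), they would be created roughly like this:

    # 5 x 3-way mirror, as in your tests (33% storage efficiency)
    zpool create tank mirror d1 d2 d3 mirror d4 d5 d6 mirror d7 d8 d9 mirror d10 d11 d12 mirror d13 d14 d15

    # single raidz3 over the same 15 disks, 12 data + 3 parity = 80% storage efficiency
    zpool create tank raidz3 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15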
It would be very interesting to see the performance numbers for such a raidz3 pool without a special vdev, compared to the same pool with a special vdev made from a mirror of SSDs and special_small_blocks set to e.g. 16K or 32K.
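Sketched with placeholder device and filesystem names, the second pool in that comparison would look something like:

    zpool add tank special mirror ssd1 ssd2       # mirrored SSD special vdev
    zfs set special_small_blocks=16K tank/data    # or 32K, per the discussion above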
On the issue of measuring random r/w performance: I would say that you should in general disable all caching, because otherwise you will at least partially just measure cache performance. For measuring sequential r/w performance it is sufficient to run the streaming benchmarks long enough that the accumulated data overflows the caches, but for random r/w benchmarking you have to take more deliberate measures.
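One way to do that (filesystem name and fio parameters are only an example, not a recommendation):

    zfs set primarycache=metadata tank/bench    # don't cache file data in the ARC
    zfs set secondarycache=none tank/bench      # no L2ARC for this filesystem
    fio --name=randrw --directory=/tank/bench --rw=randrw --bs=4k --size=20g --runtime=300 --time_based --numjobs=4 --ioengine=libaio --iodepth=8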
However, at the end of the day all these benchmarks are somewhat artificial, and the real benefit (or lack of it) would only show in a real-world multi-user scenario with mixed access patterns, like office applications and video streaming running in parallel on the same pool.