Windows SMB performance issues

creamelectricart

New Member
Feb 5, 2019
In case it helps anyone: Oracle support got back to me and believes this is due to a related bug (Bug 28971493 - 14.9% st_032 iperf204_TCP_IPv4_1M-msg_10 performance regression sparc). The command to apply a temporary fix is:

echo "tcp_reass_max/W 0t1000" | mdb -kw

The setting does not persist across reboots. The bug itself is due to be fixed in SRU9 for 11.4, due out in May.

I'm still not entirely sure why we were seeing it when few others were (someone else over on the Oracle forum was able to reproduce it, though), but hopefully this helps anyone else who runs into it.

Also for what it's worth, we reformatted our server and downgraded back to 11.3, and with exactly the same hardware and the same configuration the problem was gone.
 

DedoBOT

Member
Dec 24, 2018
Hah, I'm in the same boat as you. I've been fighting 11.4 stuttering all of last week.
Still in the testing period. Details when I reach my desktop at home.

So, my setup:
NAS:
Supermicro X11 SPM-TPF [2x Intel X722 on board]
Xeon Silver 4108 1.8 GHz
64 GB [2x32] ECC RAM 2666, running at 2400 due to the CPU limitation.
Samsung EVO Plus M.2 NVMe 250GB system drive.

For the tests: a bunch of spare drives, 4x4TB HGST Ultrastars and 2x250GB Samsung EVO 850, on an LSI HBA 9207.

The final config will replace them with the not yet purchased 2x LSI 9300 and 16x HGST He 12TB SATA, maybe a SLOG on the M.2 and the OS on the chipset SATA ports; at least this was the initial plan.

Client:
Win2012r2 Essentials, LSI 9361 with 8x4TB HGST Ultra SATA in a RAID10 config, old Intel SSD for the OS, Intel 3770 CPU, 32GB RAM, Asus P8Z77, Intel 82599ES 10GbE [Supermicro AOC-STGN-i2S].

Switch:
Netgear GS752TX



Out of the box was the best experience. Pkg update, installed the desktop and napp-it, quick setup with napp-it defaults. iperf3 shows 9.3+ Gb/s in both directions. From the client PC (Win2012r2, hardware RAID10 8x4TB array):
With a 20GB ProRes file it gave me a steady 1GB/s read from a pool of striped 4x4TB HDDs and 630-640 MB/s write with a few short drops. Exactly the same result with the system M.2 SSD. Overall good, but something is fishy: write speed looks capped, and there are drops.
A first look at napp-it shows "soft" iostat errors, an equal number on all SATA drives, no matter where they are connected: Intel C622 (8xSATA channels + 4xeSATA) or the LSI HBA 9207. The M.2 SSD isn't affected. It is not network related in my setup: pool to pool, rsync and cp are extremely slow at 120-150 MB/s, and Gnome's file manager is on par with SMB performance but with drops too. The system M.2 SSD doesn't have a single error; drops/stalls are there, but definitely more rarely than with the SATA drives.
The errors always come in waves, sometimes with a strict 30-40 sec interval between them, sometimes at random; rarely they are completely gone. Examples:


fmdump -e :
...
Mar 22 17:57:44.6266 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 22 17:57:44.6287 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 22 17:58:59.0961 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 22 17:58:59.0966 ereport.io.scsi.cmd.disk.dev.rqs.derr
...
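
If you want to check the "waves" pattern rather than eyeball it, the timestamps from fmdump -e can be turned into gaps. This is only a rough sketch, assuming the `Mon DD HH:MM:SS.ffff class` line layout shown above; the function name `ereport_gaps` and the `year` default are made up for illustration (fmdump -e does not print the year).

```python
from datetime import datetime

# Lines copied from the fmdump -e excerpt above.
SAMPLE = """\
Mar 22 17:57:44.6266 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 22 17:57:44.6287 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 22 17:58:59.0961 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 22 17:58:59.0966 ereport.io.scsi.cmd.disk.dev.rqs.derr
"""


def ereport_gaps(text, min_gap=1.0, year=2019):
    """Return the gaps in seconds between consecutive ereports.

    Gaps shorter than min_gap are dropped, so a burst of reports that
    arrive within the same second counts as one wave.
    The year is an assumption supplied by the caller.
    """
    stamps = []
    for line in text.splitlines():
        parts = line.split()
        # Skip "..." markers, headers and anything that is not an ereport line.
        if len(parts) != 4 or not parts[3].startswith("ereport."):
            continue
        stamp = f"{year} {parts[0]} {parts[1]} {parts[2]}"
        stamps.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f"))
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
    return [round(gap, 4) for gap in gaps if gap >= min_gap]


print(ereport_gaps(SAMPLE))
```

On the four lines above this leaves a single gap of roughly 74.5 seconds between the two bursts. A longer capture would be needed to say anything about a regular interval.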

fmdump -eV :
...

Code:
Mar 22 2019 17:58:59.098274656 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.rqs.derr
    ena = 0x7dd9bdcbcc301c01
    thread-stacks-version = 0x0
    thread-stacks = stack[0] = genunix`fm_report_set+153()|genunix`fm_dev_report_postv+2dc()|scsi`scsi_fm_report_post+2a1()|sd`sd_report_post+d1e()|sd`sd_intr_report_post+19b()|sd`sd_return_command+9c()|sd`sd_return_failed_command+47()|sd`sd_sense_key_illegal_request+c4()|sd`sd_decode_sense+d1()|sd`sd_handle_auto_request_sense+5e()|sd`sdintr+398()|genunix`taskq_d_thread+ca()|unix`thread_start+8()
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        cna_dev = 0x5c9505ec0000008c
        device-path = /pci@0,0/pci15d9,95d@17/disk@2,0
        devid = id1,sd@SATA_____Samsung_SSD_850______S21PNXAG405162Z
    (end detector)

    devid = id1,sd@SATA_____Samsung_SSD_850______S21PNXAG405162Z
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x3 0x0 0x24 0x0
    pkt-reason = 0x0
    pkt-state = 0x37
    pkt-stats = 0x0
    pkt-hrt-dev = 0
    pkt-hrt-hba = 0
    stat-code = 0x2
    key = 0x5
    asc = 0x24
    ascq = 0x0
    sense-data = 0xf0 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x24 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    skaarssa = 0x205240000370004
    __ttl = 0x1
    __tod = 0x5c950643 0x5db8d60
    __hrt = 79017128279235

Mar 22 2019 17:58:59.280736079 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.rqs.derr
    ena = 0x7dda6bce13e03001
    thread-stacks-version = 0x0
    thread-stacks = stack[0] = genunix`fm_report_set+153()|genunix`fm_dev_report_postv+2dc()|scsi`scsi_fm_report_post+2a1()|sd`sd_report_post+d1e()|sd`sd_intr_report_post+19b()|sd`sd_return_command+9c()|sd`sd_return_failed_command+47()|sd`sd_sense_key_illegal_request+c4()|sd`sd_decode_sense+d1()|sd`sd_handle_auto_request_sense+5e()|sd`sdintr+398()|genunix`taskq_thread+3ad()|unix`thread_start+8()
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        cna_dev = 0x5c9505ec0000008d
        device-path = /pci@0,0/pci15d9,95d@17/disk@4,0
        devid = id1,sd@SATA_____HGST_HUS724040AL______PK2331PAJEL0PT
    (end detector)

    devid = id1,sd@SATA_____HGST_HUS724040AL______PK2331PAJEL0PT
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x3 0x0 0x24 0x0
    pkt-reason = 0x0
    pkt-state = 0x37
    pkt-stats = 0x0
    pkt-hrt-dev = 0
    pkt-hrt-hba = 0
    stat-code = 0x2
    key = 0x5
    asc = 0x24
    ascq = 0x0
    sense-data = 0xf0 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x24 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    skaarssa = 0x205240000370004
    __ttl = 0x1
    __tod = 0x5c950643 0x10bbb14f
    __hrt = 79017310740798

Mar 22 2019 17:58:59.280736186 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.rqs.derr
    ena = 0x7dda6bce13a01001
    thread-stacks-version = 0x0
    thread-stacks = stack[0] = genunix`fm_report_set+153()|genunix`fm_dev_report_postv+2dc()|scsi`scsi_fm_report_post+2a1()|sd`sd_report_post+d1e()|sd`sd_intr_report_post+19b()|sd`sd_return_command+9c()|sd`sd_return_failed_command+47()|sd`sd_sense_key_illegal_request+c4()|sd`sd_decode_sense+d1()|sd`sd_handle_auto_request_sense+5e()|sd`sdintr+398()|genunix`taskq_d_thread+ca()|unix`thread_start+8()
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        cna_dev = 0x5c9505ec0000008f
        device-path = /pci@0,0/pci15d9,95d@17/disk@6,0
        devid = id1,sd@SATA_____HGST_HUS724040AL______PK2331PAJEL07T
    (end detector)

    devid = id1,sd@SATA_____HGST_HUS724040AL______PK2331PAJEL07T
...

Once tuning and testing began, things went from overall good to worse and finally to bad :). Now the stalls are 20-30 sec each, like the OP describes. Reinstalling; I will give Solaris a break for a while, but will follow the issue.
 

DedoBOT

Member
Dec 24, 2018
Thanks _Gea for the quick input.
I'm not so sure. I had this in mind and may have tried a pre-napp-it BE where fmdump -eV still produced the same errors, but I can't remember. A chaotic approach like mine usually leads to this.
First thing to check tomorrow.

Edit:
No napp-it, no smartmontools, but the errors are still there:
fmdump -e:
...
Mar 26 14:57:47.4307 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 26 14:57:47.4316 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 26 14:57:47.4401 ereport.io.scsi.cmd.disk.dev.rqs.derr
Mar 26 14:57:47.4410 ereport.io.scsi.cmd.disk.dev.rqs.derr
...



fmdump -eV:
Code:
Mar 26 2019 14:57:47.506014598 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.rqs.derr
        ena = 0xa8bbd8443a00001
        thread-stacks-version = 0x0
        thread-stacks = stack[0] = genunix`fm_report_set+153()|genunix`fm_dev_report_postv+2dc()|scsi`scsi_fm_report_post+2a1()|sd`sd_report_post+d1e()|sd`sd_intr_report_post+19b()|sd`sd_return_command+9c()|sd`sd_return_failed_command+47()|sd`sd_sense_key_illegal_request+c4()|sd`sd_decode_sense+d1()|sd`sd_handle_auto_request_sense+5e()|sd`sdintr+398()|genunix`taskq_thread+3ad()|unix`thread_start+8()
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                cna_dev = 0x5c9a21050000008c
                device-path = /pci@0,0/pci15d9,95d@11,5/disk@2,0
                devid = id1,sd@SATA_____Samsung_SSD_850______S21PNSAG262932V
        (end detector)

        devid = id1,sd@SATA_____Samsung_SSD_850______S21PNSAG262932V
        driver-assessment = fail
        op-code = 0x1a
        cdb = 0x1a 0x0 0x3 0x0 0x24 0x0
        pkt-reason = 0x0
        pkt-state = 0x37
        pkt-stats = 0x0
        pkt-hrt-dev = 0
        pkt-hrt-hba = 0
        stat-code = 0x2
        key = 0x5
        asc = 0x24
        ascq = 0x0
        sense-data = 0xf0 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x24 0x0 0x0 0x0 0x0 0x0 0x0 0x0
        skaarssa = 0x205240000370004
        __ttl = 0x1
        __tod = 0x5c9a21cb 0x1e292b86
        __hrt = 724706018362

Mar 26 2019 14:57:47.506463877 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.rqs.derr
        ena = 0xa8bbdf221602001
        thread-stacks-version = 0x0
        thread-stacks = stack[0] = genunix`fm_report_set+153()|genunix`fm_dev_report_postv+2dc()|scsi`scsi_fm_report_post+2a1()|sd`sd_report_post+d1e()|sd`sd_intr_report_post+19b()|sd`sd_return_command+9c()|sd`sd_return_failed_command+47()|sd`sd_sense_key_illegal_request+c4()|sd`sd_decode_sense+d1()|sd`sd_handle_auto_request_sense+5e()|sd`sdintr+398()|genunix`taskq_thread+3ad()|unix`thread_start+8()
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                cna_dev = 0x5c9a21050000008d
                device-path = /pci@0,0/pci15d9,95d@11,5/disk@3,0
                devid = id1,sd@SATA_____Samsung_SSD_850____
 

TRACKER

Member
Jan 14, 2019
This is from the Oracle support portal:
fmdump -eV reports ereport.io.scsi.cmd.disk.dev.rqs.derr associated with SCSI Mode Select or SCSI Mode Sense commands (Doc ID 1519925.1)

Applies to:
Solaris Operating System - Version 11 and later
Information in this document applies to any platform.
Symptoms

The Fault Manager Daemon (fmd) runs in the background on each Solaris system and receives telemetry information relating to problems detected by the system software, diagnoses these problems, and initiates proactive self-healing activities such as disabling faulty components. Reference the fmd and fmdump man pages for more information.

The following events from the fault management error log are typically triggered by the Fault Manager Daemon disk-transport module:

% sudo fmdump -eV -c ereport.io.scsi.cmd.disk.dev.rqs.derr
TIME CLASS
Dec 26 2012 17:09:27.369690466 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.rqs.derr
ena = 0xc2ded966d3f02c01
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
cna_dev = 0x50db753a0000002d
device-path = /pci@0,0/pci8086,3c04@2/pci1000,3000@0/iport@f/disk@w5000a72b30066ac0,0
devid = id1,sd@n5000a72030066ac0
(end detector)

devid = id1,sd@n5000a72030066ac0
driver-assessment = info
op-code = 0x15
cdb = 0x15 0x11 0x0 0x0 0x20 0x0
pkt-reason = 0x0
pkt-state = 0x3f
pkt-stats = 0x0
stat-code = 0x2
key = 0x5
asc = 0x26
ascq = 0x0
sense-data = 0x70 0x0 0x5 0x0 0x0 0x0 0x0 0x18 0x0 0x0 0x0 0x0 0x26 0x0 0x0 0x0 0x0 0x0 0x0 0x0
__ttl = 0x1
__tod = 0x50db7597 0x16090762
Jan 02 2013 04:44:01.961151613 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.rqs.derr
ena = 0x9a922c06ff304001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
cna_dev = 0x50db753a0000018d
device-path = /pci@0,0/pci8086,3c08@3/pci1000,3020@0/iport@40/disk@w5e83a97000003ef3,0
devid = id1,sd@n5e83a97f0f19a1a0
(end detector)

devid = id1,sd@n5e83a97f0f19a1a0
driver-assessment = fail
op-code = 0x1a
cdb = 0x1a 0x0 0x3 0x0 0x24 0x0
pkt-reason = 0x0
pkt-state = 0x37
pkt-stats = 0x0
stat-code = 0x2
key = 0x0
asc = 0xfe
ascq = 0xca
sense-data = 0x40 0x1 0x0 0x0 0xfe 0xca 0xdd 0xba 0xfe 0xca 0xdd 0xba 0xfe 0xca 0xdd 0xba 0xfe 0xca 0xdd 0xba
__ttl = 0x1
__tod = 0x50e40161 0x394a027d


Changes
The following DTrace script was used to identify the trigger of the SCSI MODE SENSE command resulting in error class ereport.io.scsi.cmd.disk.dev.rqs.derr events:

#!/usr/sbin/dtrace -qs
/*
 * Identify trigger of modesense resulting in ereport.io.scsi.cmd.disk.dev.rqs.derr fma events.
 */

BEGIN
{
        printf("Control+C to interrupt\n");
}

fbt:sd:sd_send_scsi_MODE_SENSE:entry
{
        printf("%s+%x triggered by:\n", probefunc, arg0);
        printf("UID=%d PID=%d PPID=%d CMD=%s\n",
            curpsinfo->pr_euid, pid, curpsinfo->pr_ppid, curpsinfo->pr_psargs);
        ustack();
        stack();
        exit(0);
}

Sample output:

% sudo ./modesense.d
Control+C to interrupt
sd_send_scsi_MODE_SENSE+ffffc1c04a178800 triggered by:
UID=0 PID=898 PPID=1 CMD=/usr/lib/fm/fmd/fmd

libc.so.1`syscall+0x13
libc.so.1`__open+0x29
libc.so.1`open+0xc7
libdiskstatus.so.1`disk_status_open+0x4a
disk-transport.so`dt_test_disk+0xae
disk-transport.so`dt_timeout+0xc9
fmd`fmd_module_dispatch+0x207
fmd`fmd_module_start+0x11b
fmd`fmd_thread_start+0x60
libc.so.1`_thrp_setup+0x9d
libc.so.1`_lwp_start

sd`sd_get_physical_geometry+0xbf
sd`sd_tg_getinfo+0x1d3
cmlb`cmlb_resync_geom_caches+0x140
cmlb`cmlb_validate_geometry+0x9a
cmlb`cmlb_validate+0x5b
sd`sd_ready_and_valid+0x249
sd`sdopen+0x28a
genunix`dev_open+0x55
specfs`spec_open+0x606
genunix`fop_open+0x183
genunix`vn_openat+0x736
genunix`copen+0x493
genunix`openat32+0x27
unix`_sys_sysenter_post_swapgs+0x149
Note the CMD=/usr/lib/fm/fmd/fmd, the associated module disk-transport.so, and function dt_test_disk.

% sudo fmadm config | egrep "MODULE|disk-transport"
MODULE VERSION STATUS DESCRIPTION
disk-transport 2.1 active Disk Transport Agent


Cause
A decode of the first error event above:

op-code = 0x15 -> MODE SELECT(6)
stat-code = 0x2 -> CHECK CONDITION
key = 0x5 -> ILLEGAL REQUEST
asc = 0x26 ascq = 0x0 -> INVALID FIELD IN PARAMETER LIST
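
The key, asc and ascq values can also be read straight out of the raw sense-data bytes. A minimal sketch, assuming fixed-format sense data (response code 0x70 or 0x71 in the low seven bits of byte 0), where the sense key is the low nibble of byte 2, the ASC is byte 12 and the ASCQ is byte 13. The function name `decode_fixed_sense` is hypothetical and not part of any Solaris tool.

```python
def decode_fixed_sense(hex_bytes):
    """Decode key/asc/ascq from a 'sense-data = ...' value in fmdump -eV output.

    Assumes fixed-format sense data; raises ValueError for anything else.
    """
    data = bytes(int(token, 16) for token in hex_bytes.split())
    if len(data) < 14:
        raise ValueError("sense data too short for fixed format")
    response_code = data[0] & 0x7F  # bit 7 is the VALID bit, not part of the code
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02x}")
    return {
        "key": data[2] & 0x0F,  # sense key, e.g. 0x5 = ILLEGAL REQUEST
        "asc": data[12],        # additional sense code
        "ascq": data[13],       # additional sense code qualifier
    }


# sense-data of the first event above (MODE SELECT)
first_event = ("0x70 0x0 0x5 0x0 0x0 0x0 0x0 0x18 0x0 0x0 "
               "0x0 0x0 0x26 0x0 0x0 0x0 0x0 0x0 0x0 0x0")
print(decode_fixed_sense(first_event))
```

For the first event this gives key 0x5, asc 0x26 and ascq 0x0, matching the fields that fmdump prints separately. The sense data of the second event starts with 0x40, so this sketch rejects it instead of guessing.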

A decode of the second error event above:

op-code = 0x1a -> MODE SENSE(6)
cdb = 0x1a 0x0 0x3 0x0 0x24 0x0 -> SCSI Mode Page 0x3 -> Format parameters (direct-access devices)
stat-code = 0x2 -> CHECK CONDITION
key = 0x0 -> NO SENSE
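
The cdb line can be unpacked the same way. A small sketch, assuming the MODE SENSE(6) layout from the SCSI Primary Commands spec: byte 0 is the opcode, bit 3 of byte 1 is DBD, byte 2 carries the page control field (top two bits) and the page code (low six bits), byte 3 is the subpage code and byte 4 the allocation length. The function name `decode_mode_sense6` is made up for illustration.

```python
def decode_mode_sense6(cdb_text):
    """Decode a MODE SENSE(6) CDB as printed in the 'cdb = ...' field."""
    cdb = bytes(int(token, 16) for token in cdb_text.split())
    if len(cdb) != 6 or cdb[0] != 0x1A:
        raise ValueError("not a 6-byte MODE SENSE(6) CDB")
    return {
        "dbd": bool(cdb[1] & 0x08),    # disable block descriptors
        "page_control": cdb[2] >> 6,   # 0 = current values
        "page_code": cdb[2] & 0x3F,    # 0x3 = format parameters page
        "subpage_code": cdb[3],
        "allocation_length": cdb[4],   # bytes the initiator is willing to receive
    }


print(decode_mode_sense6("0x1a 0x0 0x3 0x0 0x24 0x0"))
```

For the CDB in the event this yields page code 0x3 with current values requested and an allocation length of 0x24 (36) bytes, which agrees with the "Mode Page 0x3" reading above.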

As there is no sense, the sense data displayed is a little-endian 0xfe 0xca 0xdd 0xba. Note that a buffer with value of 0xbaddcafe indicates the buffer has been allocated, but is uninitialized.
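
The byte-order point is easy to verify. A short sketch that reads the four bytes as a little-endian 32-bit word and counts how many aligned words of the second event's sense data carry that pattern; the helper name `count_uninitialized_words` is hypothetical.

```python
UNINITIALIZED = 0xBADDCAFE  # pattern described above for allocated but uninitialized buffers

# sense-data of the second event above (MODE SENSE, key = 0x0)
SENSE_DATA = ("0x40 0x1 0x0 0x0 0xfe 0xca 0xdd 0xba 0xfe 0xca 0xdd 0xba "
              "0xfe 0xca 0xdd 0xba 0xfe 0xca 0xdd 0xba")


def count_uninitialized_words(hex_bytes):
    """Count aligned little-endian 32-bit words equal to 0xbaddcafe."""
    data = bytes(int(token, 16) for token in hex_bytes.split())
    words = [int.from_bytes(data[i:i + 4], "little")
             for i in range(0, len(data) - 3, 4)]
    return sum(1 for word in words if word == UNINITIALIZED)


print(hex(int.from_bytes(bytes([0xFE, 0xCA, 0xDD, 0xBA]), "little")))  # 0xbaddcafe
print(count_uninitialized_words(SENSE_DATA))
```

Four of the five words in that buffer are the fill pattern, so apart from the first four bytes nothing in it looks like real sense data.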

Support for SCSI Operation Codes 0x15 MODE SELECT(6) and 0x1a MODE SENSE(6) is optional for everything except SEQUENTIAL ACCESS DEVICES, i.e. tape drives.

The SCSI Mode Select command is used to modify device information contained in mode pages in a SCSI target device.

The SCSI Mode Sense command is used to obtain current device information from mode pages in a SCSI target device.

From the SCSI Primary Commands Specification:

If the logical unit does not implement saved mode pages and the SP bit is set to one, then the command shall be terminated with CHECK CONDITION status, with the sense key set to ILLEGAL REQUEST, and the additional sense code set to INVALID FIELD IN CDB.

If an application client issues a MODE SENSE command with a page code or subpage code value not implemented by the logical unit, the command shall be terminated with CHECK CONDITION status, with the sense key set to ILLEGAL REQUEST, and the additional sense code set to INVALID FIELD IN CDB.

If the following is correctly returned in response to an unsupported MODE SELECT or MODE SENSE command, the Fault Manager Daemon should not create an error event:

stat-code = 0x2 -> CHECK CONDITION
key = 0x5 -> ILLEGAL REQUEST
asc = 0x24 ascq = 0x0 -> INVALID FIELD IN CDB
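
To sort through a long fmdump -eV log, the pattern above can be checked per record. This is only a sketch under assumptions: it expects one record's text with `name = value` lines as in the dumps in this thread, and the function names are invented for illustration. A match only means the record fits the criteria quoted from the Oracle document; it does not prove that the drive is otherwise healthy.

```python
def parse_fields(record):
    """Collect 'name = value' pairs from one fmdump -eV record (last value wins)."""
    fields = {}
    for line in record.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def is_unsupported_mode_command(record):
    """True if the record is a MODE SELECT(6)/MODE SENSE(6) that was rejected with
    CHECK CONDITION, ILLEGAL REQUEST, INVALID FIELD IN CDB."""
    fields = parse_fields(record)
    try:
        op, stat, key, asc, ascq = (
            int(fields[name], 16)
            for name in ("op-code", "stat-code", "key", "asc", "ascq")
        )
    except (KeyError, ValueError):
        return False  # record is incomplete or not in the expected format
    return op in (0x15, 0x1A) and (stat, key, asc, ascq) == (0x2, 0x5, 0x24, 0x0)


# Fields copied from one of the events posted earlier in the thread.
record = """\
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x3 0x0 0x24 0x0
    stat-code = 0x2
    key = 0x5
    asc = 0x24
    ascq = 0x0
"""
print(is_unsupported_mode_command(record))
```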

An enhancement was implemented in Solaris 11.3 and higher such that Fault Management Architecture (FMA) now logs an event where the driver assessment is info as an informational report (ireport), not an error report (ereport), such as an ILLEGAL REQUEST associated with a MODE SELECT command.

For SCSI Operation Codes reference http://www.t10.org/lists/op-num.txt
For SCSI Status Codes reference: SCSI Status Codes
For SCSI Sense Keys reference: SCSI Sense Keys
For SCSI ASC/ASCQ Assignments reference: http://www.t10.org/lists/asc-num.txt

Solution
Contact the disk vendor for updated firmware which complies with the SCSI Primary Commands Specification.
 

DedoBOT

Member
Dec 24, 2018
A hundred thanks, Tracker!
I had been circling around this doc's content for a few days without luck.
 

DedoBOT

Member
Dec 24, 2018
Hmm, UEFI installation and no errors for a few days so far. Default setup without any "exotic" settings in the BIOS. All OPROMs set to UEFI and that's it. Used an IPMI virtual drive for the installation media.