2TB HGST s840 enterprise SAS SED SSD

SF-Rob · Jan 28, 2018

sys-online said:
also they don't work with HBA LSI 2008/2308

O, darn... Any idea why?

Sent from my ONEPLUS A5000 using Tapatalk

sys-online · Jan 31, 2018

I guess they are meant for storage systems, not for servers...

zdude · Feb 1, 2018

Does anybody know if these would work with an LSI SAS3008?

SF-Rob · Feb 1, 2018

Just added the disks to a colocated server last night.
They are detected correctly on a Dell Perc H200. However, the disks are not in use or formatted yet, so I'm not 100% certain, but I'm very hopeful.

Please note that these disks are thicker than usual SSDs (15mm; the label/sticker got scratched when we inserted one) and have a dual SAS connector, so a normal SAS/SATA cable and power connector cannot be attached: you need to use a SAS hot-swap bay or a special combined connector.

RTM · Feb 7, 2018

Sorry it took a little longer than I would have liked to test the disks. Here are some of my results:

First of all, I cannot confirm that there are issues with the disks on SAS2008 controllers (sorry for the double negation): the disks all work fine on my Dell SAS2008-based controller (I forgot the exact model) with P20 IT firmware. I have a couple of extra HBAs inbound, so I will probably also test with a SAS2308 at some point.

Second, I can confirm that the disks are s846's, as that was the model that HGST's support site accepted when I registered the S/N of one of my disks so I could download updated firmwares.

Third, there is an updated firmware (E4Z1) for the disks, that you can download from the HGST support site.
So far I have only installed it on one of my disks, because the firmware on them was C22F, which is not listed on the support site. The upgrade changed the version to C23F rather than E4Z1, but it seems to work just fine so far.

Fourth, while I was not having any issues with the disks like what @sys-online is having, it may be explainable by the firmware notes for E4Z1, where the following is stated:

The firmware has been modified to fix an issue where an incorrect error status would
be returned internally while processing multiple uncorrectable read errors. This issue
would cause either a 0B/44 (timeout) or an assert condition and render the drive
unresponsive to the host.

So perhaps a firmware update might solve the issue? (At your own risk, of course.)

Fifth, I updated the firmware on one of the disks in the hope that it might clear the SMART issue that @Toddh has seen with smartctl, that I am also seeing. It did not make any difference, however my belief is that the error is not correct. From HGST's support site you can download the HGST Device Manager tool (you need it to update firmware anyway), it can also show some SMART data including a wear level indicator, which shows very little use (0% used or 100% remaining - forgot the actual term they used). I realize that this does not guarantee that there is not some issue with the disks, but again it is my personal belief that it is most likely just an issue with smartctl/smartmontools.

Sixth, I get a lot of weird SMART related logs again indicating "predictive failure" (or something like this), again I am not sure if this is just bad SMART parsing or an actual error. In the coming time I will put some data on the system, hopefully that will be a good way to determine whether it is a real issue or not.

Seventh, I have tested one of my firmware updated disks with f3 (Usage — f3 7.0 documentation) and it reported no issues in terms of capacity.

Churchill · Feb 7, 2018

Thank you for your diagnostics and I'm VERY glad I didn't get these drives.

T_Minus · Feb 7, 2018

@RTM -- Do you plan to update, test and roll out, or return?

RTM · Feb 7, 2018

T_Minus said:
@RTM -- Do you plan to update, test and roll out, or return?

Sorry if it wasn't clear from the post: I intend to keep the disks and update the firmware on all of them. I only have 5, though, so I don't know if that is really a trustworthy sample size for results.

Once that is done I will be testing with some kind of data (probably some non-essential VMs).
By the way, I am open to suggestions on a good way to test disks, for example for intermittent dropouts/weirdness and general performance.

As I mentioned I have some SAS controllers coming in, that I am also considering testing, currently I am using a PCIe gen.2 board (A1SRI-2758f) but I think it would be interesting to see if a SAS2308 works better than a SAS2008 based controller even when limited to PCIe gen.2.

fossxplorer · Feb 8, 2018

Those were exactly my thoughts too when reports of different kinds of weirdness started to appear. Paying a bit more for less trouble is something I value highly, since it really saves time.

Churchill said:
Thank you for your diagnostics and I'm VERY glad I didn't get these drives.

DouglasteR · Feb 10, 2018

Could you please share some CDI or Hard Disk Sentinel screens with us RTM ?

Thanks.

MiniKnight · Feb 10, 2018

What's the consensus on these? I see them at $330 and that's getting mighty tempting.

wazzy83 · Feb 10, 2018

Just received my batch of disks for a Ceph cluster.

Code:

root@ubuntu:/home/wazzy# hdm scan
[0000000000000000]
  Device Type         = ATA Device
  Device Path         = /dev/sda
  UID                 = 0000000000000000
  Alias               = @ata0
  Model Name          = NETLIST SSD128GB-002

[5000A72030098033]
  Device Type         = SCSI Device
  UID                 = 5000A72030098033
  Alias               = @scsi0
  Vendor Name         = STEC
  Model Name          = Z16IZF2E-2TBUCZ
  Parent Device Type  = MegaRAID Controller
  RAID Controller ID  = 0
  RAID Device ID      = 12

Results for scan: Operation succeeded on 2 of 2 devices.
root@ubuntu:/home/wazzy# hdm generate-report --uid 5000A72030098033
[5000A72030098033]
  Device Type                                            = SCSI Device
  UID                                                    = 5000A72030098033
  Alias                                                  = @scsi0
  Vendor Name                                            = STEC
  Model Name                                             = Z16IZF2E-2TBUCZ
  Serial Number                                          = STM000191EC9
  Firmware Version                                       = C22F
  Capacity                                               = 2000398934016
  Parent Device Type                                     = MegaRAID Controller
  RAID Controller ID                                     = 0
  RAID Device ID                                         = 12
  Sector Count                                           = 3907029168
  Sector Size                                            = 512
  Metadata Size                                          = 0
  DIF Level                                              = None
  Protection Interval                                    = 1
  Multipath Support                                      = true
  SAS Port 1 Width                                       = Narrow (1x)
  SAS Port 1 Physical Link Rate                          = 6 Gb/s
  Device Status                                          = Device Status Unknown
  Life Gauge                                             = 100
  Data Units Read                                        = 21138439280
  Data Units Written                                     = 5089297789
  Host Read Commands                                     = 148678244
  Host Write Commands                                    = 720755257
  Percentage Used                                        = 0
  Temperature                                            = 59
  Temperature Threshold                                  = 65

Results for generate-report: Operation succeeded.
root@ubuntu:/home/wazzy# hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   17860 MB in  2.00 seconds = 8938.07 MB/sec
 Timing buffered disk reads: 1454 MB in  3.00 seconds = 484.21 MB/sec

root@ubuntu:/home/wazzy# dd if=/dev/zero of=/dev/sdb bs=8k count=100k; 
102400+0 records in
102400+0 records out
838860800 bytes (839 MB, 800 MiB) copied, 1.92176 s, 437 MB/s
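
The figures in the output above can be cross-checked with a few lines of Python. This is only a sketch: the values are copied from the hdm report and the dd line, and the variable names are made up for illustration.

```python
# Values copied from the hdm report and the dd output above.
sector_count = 3907029168
sector_size = 512                  # bytes per sector
reported_capacity = 2000398934016  # "Capacity" field, in bytes

# Capacity should equal sector count times sector size.
capacity = sector_count * sector_size
print("capacity matches report:", capacity == reported_capacity)

# dd reports throughput in decimal megabytes (10**6 bytes) per second.
dd_bytes = 838860800
dd_seconds = 1.92176
mb_per_s = dd_bytes / dd_seconds / 1e6
print(f"dd throughput: {mb_per_s:.0f} MB/s")
```

Both numbers agree with what the tools printed, so the drive reports a consistent 2TB (decimal) capacity.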

Iaroslav · Feb 12, 2018

Got it working with a SAS2308 (LSI 9207-8e) and now carefully increasing the load. After all you guys wrote here I'm worried about data integrity, but as a hot cache drive it already made my day.

Dev_Mgr · Feb 15, 2018

I picked up 2 of these and put them initially in my Dell Precision T7910 (LSI 3008 controller on the motherboard).

Quick formatting them took a long time for some reason. Once formatted copying 35GB of data to them from a regular HDD went at around 170MB/s. Copying from one of these to the other went around 400MB/s.

HD Sentinel 'recommends backing up your data'.

It says the disk is perfect and no problematic or weak sectors were found. It does flag the percentage used endurance indicator as being the smart attribute triggering the flag.

It shows an ASC code of 93 and an ASCQ code of 11. On T10 I cannot find an ASC code of 93 to give me an idea what it is for.
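
One possible explanation, assuming HD Sentinel displays these codes in decimal while smartctl and the T10 tables use hexadecimal (an assumption, not something confirmed in this thread): 93 and 11 in decimal are 5d and b in hex, which matches the asc=5d, ascq=b pair that smartctl prints for these drives. A minimal sketch of the conversion (the helper name is hypothetical):

```python
# Assumption: the ASC/ASCQ values shown by HD Sentinel are decimal.
def to_hex_pair(asc: int, ascq: int) -> str:
    """Format decimal ASC/ASCQ values the way smartctl prints them."""
    return f"asc={asc:02x}, ascq={ascq:x}"

print(to_hex_pair(93, 11))  # compare with the smartctl line quoted in this thread
```

ASC 5Dh is listed by T10 as "FAILURE PREDICTION THRESHOLD EXCEEDED", so under that assumption both tools would be reporting the same condition.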

One drive reports the power on time to be nearly 5000 days (not hours), which is obviously impossible as 13 years ago I don't think there were 2TB SSDs yet and the spec sheet that was linked in this thread is from 2013/2014. The other drive has around 440 days powered on (more realistic).
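
To put the power-on numbers in perspective, here is a rough conversion, assuming continuous power-on and using the date of this post as the reference point (both are simplifications):

```python
from datetime import date, timedelta

POST_DATE = date(2018, 2, 15)  # date of this post

def power_on_summary(days: int) -> tuple[float, date]:
    """Return (years powered on, earliest possible first power-on date)."""
    years = days / 365.25
    first_power_on = POST_DATE - timedelta(days=days)
    return years, first_power_on

for days in (5000, 440):
    years, first = power_on_summary(days)
    print(f"{days} days = {years:.1f} years, powered on since {first.year} at the latest")
```

For the 5000-day drive this lands in 2004, well before the 2013/2014 spec sheet, which supports the suspicion that the counter is wrong or counts something other than days.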

On the drive with 5000 days powered on it says there was 7TB written to it, and on the other it says 12.4TB was written to it.

I'm planning to move these to my Dell PowerEdge T630 (on a PERC H730), but my desktop is more easily accessible for testing them.

jap · Feb 17, 2018

sys-online said:
I got a lot of 6 drives:
=== START OF READ SMART DATA SECTION ===
SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0xb [asc=5d, ascq=b]
Percentage used endurance indicator: 0%
Current Drive Temperature: 42 C
Drive Trip Temperature: 65 C
Elements in grown defect list: 0

We got a batch of 22 drives on Wednesday and I ran some tests:

According to the SMART data, we got only one good drive; the other 21 report a SMART error and predict failure.
According to the CrystalDiskMark tests, the good and bad drives produce the same results.
As already reported in this thread, formatting a drive took a very long time (tested on Linux with XFS and ext4 filesystems). By comparison, formatting an Intel DC S3700 400GB drive finishes instantly.
Power-on time ranges from 2186 to 2265 days (about 6 years, which is curious given that these drives were first introduced by HGST in 2014).
The drives look unused: 0.014% endurance used, with TBW varying from 6 to 15TB. That is next to nothing when the rated TBW of an S846 2TB series drive is about 110PB, at 37 DWPD.
It's still very curious why almost all the drives we got show so little endurance usage and yet report "FAILURE PREDICTION THRESHOLD EXCEEDED".
We used only a 3G 1068E SAS controller, so maximum transfer rates are low; I trust that on a 6G controller they will be as high as stated in the datasheet.
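
The endurance and power-on figures above can be reproduced with a short calculation. This is a sketch that takes the 110PB rated TBW quoted in the post at face value and uses decimal units; the function name is made up for illustration.

```python
TB = 10**12
PB = 10**15
RATED_TBW = 110 * PB  # rated endurance as quoted in the post, not verified here

def endurance_used_pct(written_tb: float) -> float:
    """Percentage of the rated write endurance consumed."""
    return written_tb * TB / RATED_TBW * 100

for written_tb in (6, 15):
    print(f"{written_tb} TB written = {endurance_used_pct(written_tb):.3f}% of rated endurance")

for days in (2186, 2265):
    print(f"{days} power-on days = {days / 365.25:.1f} years")
```

The 15TB case gives the 0.014% mentioned above, and the power-on range works out to roughly 6 years.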

You can see all the reports here: HGST S846 2TB SAS disks from Ebay - reports

Bye

Jan

jap · Feb 17, 2018

Okay, some more tests and info:

A difference in random access can be found between the "good" and "bad" drives.
I think this tells us that the SMART status isn't just a misinterpretation of the values read: the "bad" drive is at least slowed down in random access.
Slower random access could be the reason why formatting a drive takes so long.

"Good" drive:

"Bad" drive:

All tests and reports are available here: HGST/STEC S846 2TB SAS Z16IZF2E-2TBUCZ disks from Ebay

Bye

Jan

jap · Feb 17, 2018

RTM said:
Fifth, I updated the firmware on one of the disks in the hope that it might clear the SMART issue that @Toddh has seen with smartctl, that I am also seeing. It did not make any difference, however my belief is that the error is not correct. From HGST's support site you can download the HGST Device Manager tool (you need it to update firmware anyway), it can also show some SMART data including a wear level indicator, which shows very little use (0% used or 100% remaining - forgot the actual term they used). I realize that this does not guarantee that there is not some issue with the disks, but again it is my personal belief that it is most likely just an issue with smartctl/smartmontools.

Sixth, I get a lot of weird SMART related logs again indicating "predictive failure" (or something like this), again I am not sure if this is just bad SMART parsing or an actual error. In the coming time I will put some data on the system, hopefully that will be a good way to determine whether it is a real issue or not.

I think, according to the hddtune tests (see my posts), that the reported SMART error is a real error with a big impact on the drive's random access performance. I got one drive in my batch which has no SMART issues, and I have done some comparisons between the "good" and "bad" drive results.

Jan

jap · Feb 17, 2018

DouglasteR said:
Could you please share some CDI or Hard Disk Sentinel screens with us RTM ?

Thanks.

I did, see my posts. I even have one good drive, and results from both good and bad ones.

DouglasteR · Feb 17, 2018

Amazing report Jap !

Thanks.

hifijames · Feb 21, 2018

Thanks for the detailed report. Are you planning to return the bad ones?

jap said:
Okay, some more tests and info:

A difference in random access can be found between the "good" and "bad" drives.

I think this tells us that the SMART status isn't just a misinterpretation of the values read: the "bad" drive is at least slowed down in random access.

Slower random access could be the reason why formatting a drive takes so long.

"Good" drive:

"Bad" drive:

All tests and reports are available here: HGST/STEC S846 2TB SAS Z16IZF2E-2TBUCZ disks from Ebay

Bye

Jan
