What does Media and Data Integrity Errors mean?

elag

Member
Dec 1, 2018
78
14
8
In my fallback server I have a Samsung 970 EVO disk that reports the following:
[root@lair ~]# smartctl -a /dev/nvme0
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1062.1.2.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO 500GB
Serial Number: S466NB0K405934V
Firmware Version: 1B2QEXE7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 500,107,862,016 [500 GB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 500,107,862,016 [500 GB]
Namespace 1 Utilization: 229,721,784,320 [229 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5481b24802
Local Time is: Thu Oct 10 13:14:44 2019 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.20W       -        -    0  0  0  0        0       0
 1 +     4.30W       -        -    1  1  1  1        0       0
 2 +     2.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 37 Celsius
Available Spare: 94%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 10,122,289 [5.18 TB]
Data Units Written: 34,031,482 [17.4 TB]
Host Read Commands: 212,554,037
Host Write Commands: 1,166,877,653
Controller Busy Time: 4,563
Power Cycles: 40
Power On Hours: 10,440
Unsafe Shutdowns: 8
Media and Data Integrity Errors: 12
Error Information Log Entries: 14
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
Temperature Sensor 2: 42 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

I know how to read SMART data for SATA disks, but I have no experience with NVMe. I am worried about this line:
Media and Data Integrity Errors: 12
Yesterday I noticed that a few errors had been reported. After I read the whole disk with dd, the error count increased to 11, and another dd read today added one more.

Does this mean that my NVMe SSD is dying? It is little more than a year old and fairly lightly used. Should I get it replaced? Is this a warranty case?
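
To check whether a full read really bumps the counter, the value can be pulled out of the smartctl output before and after the read. This is only a minimal sketch: `media_errors` is a hypothetical helper name, it assumes the field label is spelled exactly as in the smartctl 7.0 output above, and the namespace path `/dev/nvme0n1` in the usage comment is an assumption about this system.

```shell
#!/bin/sh
# Print the "Media and Data Integrity Errors" value found in
# `smartctl -a` output supplied on stdin (hypothetical helper).
media_errors() {
    awk -F': *' '/^Media and Data Integrity Errors:/ {
        gsub(/[^0-9]/, "", $2)   # drop thousands separators and padding
        print $2
    }'
}

# Usage on a real system (needs root; device paths are assumptions):
#   before=$(smartctl -a /dev/nvme0 | media_errors)
#   dd if=/dev/nvme0n1 of=/dev/null bs=1M
#   after=$(smartctl -a /dev/nvme0 | media_errors)
#   echo "new errors during read: $((after - before))"
```

If the difference is non-zero after every full read, the errors are being produced by reads happening now rather than left over from an earlier event.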
 

elag

Bump: is there really nobody who can give me a hint as to how bad this "Media and Data Integrity Errors: 12" thing is? Is it equivalent to the pending errors on SATA, or is it (even) worse?
 

vrod

Active Member
Jan 18, 2015
233
33
28
28
You've had 8 unsafe shutdowns. As far as I remember, those 970 EVO drives get their speed from a cache that all data is written to first. If the cache was handling data at the time of an unsafe shutdown, that data could have been lost, which would cause the Media and Data Integrity Errors you see. If you are getting read errors, though, that could definitely mean your SSD is on the way out.
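
To tell those two cases apart over time, it may help to log the relevant counters side by side after each shutdown or read test. The following is a rough sketch, not a diagnostic tool: `nvme_summary` is a hypothetical helper, and it assumes the field labels match the smartctl 7.0 output posted above.

```shell
#!/bin/sh
# Condense the wear and error fields from `smartctl -a` output (stdin)
# into one line that can be appended to a log file (hypothetical helper).
nvme_summary() {
    awk -F': *' '
        /^Critical Warning:/                { warn = $2 }
        /^Available Spare:/                 { spare = $2 + 0 }   # "94%" -> 94
        /^Available Spare Threshold:/       { thresh = $2 + 0 }
        /^Unsafe Shutdowns:/                { unsafe = $2 }
        /^Media and Data Integrity Errors:/ { media = $2 }
        END {
            printf "critical_warning=%s spare=%d%% threshold=%d%% unsafe_shutdowns=%s media_errors=%s\n", warn, spare, thresh, unsafe, media
            if (spare <= thresh) print "available spare is at or below threshold"
        }'
}

# Usage on a real system (needs root; the device path is an assumption):
#   smartctl -a /dev/nvme0 | nvme_summary >> /var/log/nvme0-health.log
```

If media_errors keeps climbing while unsafe_shutdowns stays at 8, the unsafe shutdowns alone would not explain the new errors.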