Dell TL2000 LTO-7 drives stopped reading any tapes


gargravarr

Member
Jul 1, 2021
39
3
8
Hey all,

Again, no tape subforum so hopefully this is close enough.

I have a TL2000 with 2 working drives, an LTO-5 and an LTO-4. Most of my tapes are LTO-5. I got a pair of LTO-7 drives with it, but the media was too expensive. So after updating the firmware and testing them both last year with LTO-6 media (I have a standalone drive so I could justify buying the media), I figured I would sell one of them. Only one had its correct mounting sled, the dual-SAS-port variant, and its warranty sticker was intact, so that's the one I figured would make the most money. The other had no sled, but I had a spare single-port SAS sled that would work.

Well, it took months to sell, but last month, it finally did. And while I was celebrating (because LTO-7s hold their value), it was short-lived.

The buyer, a tape service company, reports that the drive failed a read test using ITDT and provided the logs. They offered to try a deep clean of the read/write head, but this didn't work; it still fails the test.

I bought an LTO-7 tape and have been desperately trying to get the other one to pass an ITDT test, but that's also faulty! Every time I try a System Test or Health Test with an LTO-7 or LTO-6 tape, the drive Aborts, ejects the tape and reports error EC 6, which seems to be Invalid Media? The JSON output from ITDT is below. The library reports Bad Tape but I guess that's just what the drive is telling it.

JSON:
{
  "ITDT HEADER": {
    "START TIME": "Fri Feb 10 15:29:46.846 2023" ,
    "PROGRAM NAME": "ITDT-SE" ,
    "PROGRAM VERSION": "9.5.0.20210517" ,
    "OPERATING SYSTEM": "Linux" ,
    "OPERATING VERS": " Release: 5.10.0-21-amd64 Version:#1 SMP Debian 5.10.162-1 (2023-01-21)" ,
    "SYSTEM INFO": " Systemname:Linux Nodename: athena MachineType:x86_64" ,
    "TAPE DEVICE NAME": "/dev/sg0" ,
    "TAPE DRIVER VERSION": "LinuxDefault" ,
    "PID NAME": "PID_SYSTEM_TEST" ,
    "SERIAL_NO": "10WT015623" ,
    "MODEL_NAME": "ULT3580-HH7" ,
    "MICROCODE": "J4D1" ,
    "DEVICE TYPE": "TAPE" ,
    "DRIVE INTERFACE TYPE": "SAS" ,
    "HOST ID": 0 ,
    "BUS ID": 0 ,
    "SCSI ID": 3 ,
    "LUN ID": 0
  } ,
  "ITDT DISPOSITION": {
    "START TIME": "Fri Feb 10 15:29:50.563 2023" ,
    "DIAG RESULT": "ABORTED" ,
    "DIAG CODE": "EC 6" ,
    "DIAG STEP ID": "INIT DRIVE" ,
    "COMMENTS": ""
  } ,
  "INIT DRIVE": {
    "START TIME": "Fri Feb 10 15:29:46.853 2023" ,
    "TEST RESULT": "ABORTED" ,
    "RESULT CODE": "EC 6" ,
    "SERIAL_NO": "10WT015623" ,
    "MODEL_NAME": "ULT3580-HH7" ,
    "MICROCODE": "J4D1" ,
    "data": [
      {
        "HBAPropertyText": "Driver" ,
        "HBAPropertyValue": "mpt2sas"
      }    ]
  }
}
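For anyone comparing several of these reports, here is a minimal Python sketch that pulls out the steps that did not pass. It assumes only what is visible in the report above: each test step is a top-level object with "TEST RESULT" and "RESULT CODE" keys. The function name `failed_steps` and the trimmed `REPORT` string are illustrative, not part of ITDT.

```python
import json

# Trimmed copy of the report above; only the fields used here are kept.
REPORT = """
{
  "ITDT DISPOSITION": {
    "DIAG RESULT": "ABORTED" ,
    "DIAG CODE": "EC 6" ,
    "DIAG STEP ID": "INIT DRIVE"
  } ,
  "INIT DRIVE": {
    "TEST RESULT": "ABORTED" ,
    "RESULT CODE": "EC 6"
  }
}
"""


def failed_steps(report_text):
    """Return (step, result, code) for every step whose TEST RESULT is not PASSED."""
    report = json.loads(report_text)
    failures = []
    for step, body in report.items():
        # Sections such as the header carry no TEST RESULT and are skipped.
        if not isinstance(body, dict) or "TEST RESULT" not in body:
            continue
        if body["TEST RESULT"] != "PASSED":
            failures.append((step, body["TEST RESULT"], body.get("RESULT CODE", "")))
    return failures


if __name__ == "__main__":
    for step, result, code in failed_steps(REPORT):
        print(f"{step}: {result} ({code})")
```

This only works on a complete report; a fragment without the outer braces would need wrapping before `json.loads` accepts it.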
I cannot believe both drives are faulty. After I tested them in my TL4000 (which I've also sold), I removed them and left them in a pile in my server room, which doesn't get unusually hot or cold. There were two other faulty drives I got with my TL4000 - I'm honestly starting to think they somehow 'infected' my good drives!

I'm now terrified I will have to refund the buyer a very, very needed amount of money, and that's going to cause me severe problems.

I've tried:
  1. LTO-6 and LTO-7 tapes, the former from a standalone drive which R/W some test data successfully
  2. Downgrading and upgrading the firmware
  3. Different hosts with different HBAs
It's currently attached to a Dell PE210II with a Dell UCS-70 6Gbps SAS HBA (so no chance of incompatibility, everything is Dell!!) running Devuan Linux 4 (Debian 11 without systemd). The library is running the latest F.11 firmware. I've tried the drive on J4D1 (latest) and H9E3 (previous) firmwares (and yes, I made sure to use the 'sas' variant, not 'fc'). I also have an Atto H644 HBA if it helps.

Please tell me there's a simple fix...!

Any help appreciated!

Thanks,
Gargravarr
 

gargravarr

So, I removed the drive from the sled and plugged it into my PV114X chassis. This gives me power, SAS and even ethernet. However, in this guise, I simply cannot get a SAS connection. The Dell card and the drive show no link, even after rebooting both the server and the chassis. I think there's a problem with it. Outside the library, the single-character display shows E, which I assume is Error.

I managed to get into the integrated webserver:
Code:
 system time: 2023/02/11 19:17:32
drive time : 684 seconds

DRIVE INFORMATION
-----------------
Serial Number              YX10WT015623
Model                      ULT3580-HH7     
Code Level                 LTO7_J4D1
Status                     empty
Single Character Display   E
Status Indicators         
Current Time (origin)      684 seconds (Power On)

HOST INTERFACE
--------------
                                        Port 0             Port 1
Status                                  unknown            unknown
WWID                                    5000E1114092B005   5000E1114092B006
SAS Address                                               
Speed                                   unknown            unknown
Transport Layer Retries this Power-On   0                  0

DRIVE STATISTICS
----------------
Drive Mounts                     1314
Drive MB Written                 172818439
Drive MB Read                    743062601
Power On Hours (current / VPD)   36097 / 36097

TAPE STATISTICS
---------------
Volume Serial     
Tape Mounts       
Tape MB Written   
Tape MB Read     

VPD ENCRYPTION SETTINGS
-----------------------
Encryption Method   None
Key Management      Default (by Method)
BOP Encryption      Disabled

ETHERNET SETTINGS
-----------------
                           Port 0
IP Addresses (Current)     fe80::9abe:94ff:fe19:51aa/64
                           169.254.0.3/24
MAC Address (VPD)          98:BE:94:19:51:AA
Drive IP Address 1 (VPD)   not set
Drive IP Address 2 (VPD)   not set
DHCP (VPD)                 disabled

ERROR CODE LOG
--------------
Power Time stamp                    Code              Media    Media
Count YYYY/MM/DD HH:MM:SS FSC  Flag Lvl  SCD VolSer   MFG      Serial
   55     0 DAYS 02:40:23 2E06 0000 J4D1  6   
   55     0 DAYS 02:40:33 2E06 0000 J4D1  6   
   55     0 DAYS 02:40:44 2E06 0000 J4D1  6   
   55     0 DAYS 02:40:54 2E06 0000 J4D1  6   
   55     0 DAYS 02:41:04 2E06 0000 J4D1  6   
   55     0 DAYS 02:41:16 2E06 0000 J4D1  6   
   55     0 DAYS 02:41:27 2E06 0000 J4D1  6   
   55     0 DAYS 02:41:37 2E06 0000 J4D1  6   
   55     0 DAYS 02:41:48 2E06 0000 J4D1  6   
   55     0 DAYS 02:41:57 2E06 0000 J4D1  6   
   55     0 DAYS 02:42:07 2E06 0000 J4D1  6   
   55     0 DAYS 02:42:19 2E06 0000 J4D1  6   
   55     0 DAYS 02:42:30 2E06 0000 J4D1  6   
   55     0 DAYS 02:42:41 2E06 0000 J4D1  6   
   55     0 DAYS 02:42:51 2E06 0000 J4D1  6   
   55     0 DAYS 02:43:02 2E06 0000 J4D1  6   
   68     0 DAYS 00:06:06 2E07 0004 J4D1  6  T6-001L6 HP       50430459
   69     0 DAYS 00:01:22 2E07 0004 J4D1  6   HP       50430459
   69     0 DAYS 00:03:25 2E07 0004 J4D1  6  T6-001L6 HP       50430459
   69     0 DAYS 00:10:23 2E07 0004 J4D1  6  T6-002L6 HP       50430488
   70     0 DAYS 00:01:12 2E07 0004 J4D1  6   HP       50430488
   70     0 DAYS 00:10:35 2E07 0004 J4D1  6  T6-003L6 HP       50430454
   70     0 DAYS 00:12:39 2E07 0004 J4D1  6  T6-004L6 HP       50501012
   72     0 DAYS 00:45:10 2E07 0004 J4D1  6  SLOT0000 HP       C0XVV61P
   72     0 DAYS 00:55:00 2E07 0004 J4D1  6  SLOT0000 HP       C0XVV61P
   72     0 DAYS 01:19:14 2E07 0004 J4D1  6  T7-001L7 HP       C0XVV61P
   72     0 DAYS 01:33:26 2E07 0004 J4D1  6  T6-002L6 HP       50430488
   72     0 DAYS 01:46:10 2E07 0004 J4D1  6  T6-008L6 HP       50430421
   72     0 DAYS 01:57:52 2E07 0004 J4D1  6  T6-008L6 HP       50430421
   73     0 DAYS 00:24:03 2E07 0004 J4D1  6  T7-001L7 HP       C0XVV61P
   73     0 DAYS 00:51:53 2E07 0004 J4D1  6  T7-001L7 HP       C0XVV61P
   73     0 DAYS 01:04:57 2E07 0004 J4D1  6  T6-008L6 HP       50430421
   74     0 DAYS 00:14:15 2E07 0004 H9E3  6  T6-008L6 HP       50430421
   74     0 DAYS 00:17:29 2E07 0004 H9E3  6  T6-008L6 HP       50430421
   74     0 DAYS 01:07:54 2E07 0004 H9E3  6  T7-001L7 HP       C0XVV61P
   75     0 DAYS 00:01:21 2E07 0004 J4D1  6   HP       C0XVV61P
   75     0 DAYS 01:10:23 2E07 0004 J4D1  6  T5-029L5 FUJIFILM 95R5VNAD
   75     0 DAYS 01:16:21 2E07 0004 J4D1  6  T7-001L7 HP       C0XVV61P
   76     0 DAYS 00:01:12 2E07 0004 J4D1  6   HP       C0XVV61P
   77     0 DAYS 00:01:18 2E07 0004 J4D1  6   HP       C0XVV61P

HOST INFORMATION (DETAILS VIEW)
-------------------------------
Initiator   Last access time   Device name   Host name   WWPN (target port id)   Reservation information   Media removal
No SAS connection. What's weird is that it clearly identifies the media, and the library is communicating the barcode to the drive normally, but it still shows Media Error when loaded (SCD 6). The tapes themselves can't (all) be faulty - I've tried the LTO-6s in my standalone (in the same 114X) and they work. I don't have any other drives that can read LTO-7, but this tape was brand new in the wrapper. Unless it was stored near a magnet, I don't think the tape is at fault.
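To back up the "it can't be every tape" argument, the error log can be tallied by fault code and by cartridge. This is a rough Python sketch that assumes the whitespace-separated layout shown in the ERROR CODE LOG above, where the VolSer column is sometimes empty. The names `parse_entry` and `summarise` are made up for illustration, and the sample is just a few rows copied from the log.

```python
from collections import Counter

# A few rows copied from the ERROR CODE LOG above.
SAMPLE_LOG = """\
   55     0 DAYS 02:40:23 2E06 0000 J4D1  6
   68     0 DAYS 00:06:06 2E07 0004 J4D1  6  T6-001L6 HP       50430459
   69     0 DAYS 00:01:22 2E07 0004 J4D1  6   HP       50430459
   72     0 DAYS 01:19:14 2E07 0004 J4D1  6  T7-001L7 HP       C0XVV61P
   75     0 DAYS 01:10:23 2E07 0004 J4D1  6  T5-029L5 FUJIFILM 95R5VNAD
"""


def parse_entry(line):
    """Split one log row into fields; return None for headers and blank lines."""
    parts = line.split()
    if len(parts) < 8 or parts[2] != "DAYS":
        return None
    media = parts[8:]
    # VolSer may be missing, so the media part has 0, 2 or 3 fields.
    volser = media[0] if len(media) == 3 else ""
    mfg, serial = (media[-2], media[-1]) if len(media) >= 2 else ("", "")
    return {
        "power_count": int(parts[0]),
        "fsc": parts[4],
        "code_level": parts[6],
        "scd": parts[7],
        "volser": volser,
        "mfg": mfg,
        "serial": serial,
    }


def summarise(log_text):
    """Count entries per FSC and collect the distinct cartridge serials seen."""
    entries = [e for e in map(parse_entry, log_text.splitlines()) if e]
    by_fsc = Counter(e["fsc"] for e in entries)
    cartridges = {e["serial"] for e in entries if e["serial"]}
    return by_fsc, cartridges


if __name__ == "__main__":
    by_fsc, cartridges = summarise(SAMPLE_LOG)
    print(dict(by_fsc))
    print(sorted(cartridges))
```

If the same fault code shows up against many different cartridge serials, across media generations and manufacturers, the drive is a more likely suspect than any single tape.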
 

gargravarr

Figures, not a simple fix. Top is one of my faulty drives. Bottom is the LTO-7. The pin catcher in the bottom drive has broken.
IMG_20230211_195856.resized.jpg

Wonder if I can transplant the intact one from the top drive...
 

Stephan

Well-Known Member
Apr 21, 2017
943
712
93
Germany
EC 6 is mechanical failure to load the tape. The broken leader pin grabber is probably the fault. If it pulls the pin+tape only half way because it loses the pin, you will face a torn tape at some point. Drive will slowly rewind to try and recover, but that is not good for the tape. Might get leader pin lodged somewhere and rip it off.

The hook part could be the same on earlier drives bought cheaply from ebay, because the form factor of these half height drives hasn't changed that much. Tape head is right below, hopefully nothing has been damaged because then it's really game over. If trying to replace, under no circumstances loosen or remove the head assembly, because I think I remember you need a precision guide tool to get it back into place precisely. Also there is a lot of vibration going on, screws probably need a bit of Loctite in the right color.

Drives from libraries need "Set_Config" over their RS422 port before they come online on SAS. See the post by DzurDzen at https://www.reddit.com/r/DataHoarder/comments/mk93n4 and also the GitHub repository AC7RNsphnHVbyT4/ibm-tape-drive-automatic-standalone ("How to turn an IBM Drive into automatic standalone mode"). Only relevant if you'd want to use the drive standalone.

Drives run Linux, firmware can be analyzed with binwalk. And unpacked with some tools. And since PPC4xx, debugged with BDI 2000 over JTAG. Internally a 1 MiB SCODE.EXE is loaded, I guess, into another chip/processor responsible for the low-level drive mechanics. DCODE.EXE is one huge binary that runs the show on the Linux side. If you google this, you will find nothing. There is supposedly a way to dive further into the firmware than the web interface you saw. Maybe another serial port. Or the RS422 port mentioned. Outside IBM, that knowledge is limited to maybe 3-5 people I would say. I have only seen one shop on YouTube repairing drives with a large binder of proprietary knowledge shown. Nowhere else. I have a stripped LTO-5 PCB for experiments, but keep running out of time to look further into it.
 

gargravarr

Now that's some valuable information, Stephan, thanks.

So far my tapes don't seem to be damaged - I think the broken grabber fails to lock onto the pin, period, so the drive records no tension when it tries to load the tape (I stepped it through by hand and it didn't latch). I swear I tested this drive but maybe I only tested the one I ultimately sold (still doesn't explain why THAT one isn't working, but I don't have it in my hands at present).

I tried to transplant the grabber out of a faulty LTO-3 drive, but it's ever so slightly different in design - the lower guide pin doesn't fit in the track and it doesn't travel around the mechanism properly. The faulty LTO-4 is the same design. Further research required to find one that'll fit. I may have to try to source another of the same drive (but even those are expensive!). Thanks for the warning not to remove the head. So far, the grabber comes out with a bit of wiggling but without disturbing any other components.

At least the SAS connection isn't indicative of another problem.
 

gargravarr

Okay, so the grabbers are not cross-compatible but at least they're easy to tell apart without opening the drive:
IMG_20230212_092039.resized.jpg
Up to LTO-4 uses the left design, with a large black plastic circle protruding from the casing. LTO-5 and on uses the right design, with a small black circle and a metal retaining ring. The two designs are not interchangeable, but within the design, they are compatible.
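Put as a quick lookup, the rule of thumb above can be sketched like this in Python. It encodes only what was observed on these particular half-height drives (LTO-3, LTO-4, LTO-6 and LTO-7 were the ones actually compared), so treat the cut-off between generations as an assumption rather than a documented spec. The function names are hypothetical.

```python
def grabber_design(generation):
    """Map an LTO generation to the grabber design described above.

    Assumption from hands-on comparison only: up to LTO-4 has the large
    plastic circle, LTO-5 and later the small circle with a metal ring.
    """
    if generation < 1:
        raise ValueError("LTO generations start at 1")
    if generation <= 4:
        return "large plastic circle"
    return "small circle with metal retaining ring"


def grabbers_interchangeable(donor_generation, target_generation):
    """True when both drives use the same grabber design."""
    return grabber_design(donor_generation) == grabber_design(target_generation)


if __name__ == "__main__":
    print(grabbers_interchangeable(3, 7))  # LTO-3 donor into an LTO-7
    print(grabbers_interchangeable(6, 7))  # LTO-6 donor into an LTO-7
```

So an LTO-3 or LTO-4 donor is no use for an LTO-7, while an LTO-6 donor should be a candidate.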

Right is my personal LTO-6 drive. I had to cannibalise it for parts, but the grabber is the same design as the LTO-7. I was able to remove the whole arm and swap it without removing the plastic grabber.

Now to test it.
 

gargravarr

Okay, so the replacement grabber allows it to load tapes. However, I then get a write error in ITDT:
JSON:
  "MOUNT TAPE": {
    "START TIME": "Sun Feb 12 09:25:03.292 2023" ,
    "DURATION": 60233 ,
    "TEST RESULT": "PASSED" ,
    "RESULT CODE": "OK" ,
    "TARGET STATUS": "Status Good" ,
    "HOST STATUS": "Status Good" ,
    "EC": 0
  } ,
  "SYSTEM TEST": {
    "START TIME": "Sun Feb 12 09:26:03.527 2023" ,
    "DURATION": 23320 ,
    "TEST RESULT": "ABORTED" ,
    "RESULT CODE": "WRITE FAILURE" ,
    "TOTAL WRITE RETRIES": 0 ,
    "TOTAL WRITE ERRORS": 0 ,
    "TOTAL WRITTEN (KB)": 0 ,
    "REMAINING CAPACITY (MB)": 5721778 ,
    "_SENSEDATA": "" ,
    "TARGET STATUS": "Status Good" ,
    "HOST STATUS": "Unknown Error" ,
    "EC": 5 ,
    "data": [
      {
        "COMPRESSIBLE": "No" ,
        "Block Size (KB)": 1024 ,
        "Data Rate (MB/s)": 0 ,
        "Avg. Data Rate (MB/s)": 13.681 ,
        "Data Size (MB)": 0 ,
        "Elapsed Time (s)": 0
      }    ]
  }
So I tried to read an LTO-6 tape that already has data on it. It started well, showing the correct filenames, but then threw an I/O error that locked up `tar` and meant I had to reboot the machine.

On reboot, I decided to try the LTO-7 tape and write to it. I grabbed a Windows ISO (~5GB), wrote it to the tape, read it back and compared MD5 checksums - they match. I'm now trying to write ~4TB of video files from my media library (which is going to take a while). However, it's written a few dozen GB already without error. If this works and the checksums match on read, I'll try ITDT again.
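The write, read back, compare step can be scripted. This is a minimal Python sketch of the comparison half only, assuming the original file and the copy restored from tape are both sitting on disk. The function names and the file paths in the usage note are placeholders, not anything used in the thread.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so a multi-GB ISO never has to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original_path, restored_path):
    """True when the file restored from tape hashes the same as the original."""
    return md5_of(original_path) == md5_of(restored_path)
```

Usage would be something like `same_content("windows.iso", "restored/windows.iso")`, with both paths being hypothetical. MD5 is fine for spotting corruption on a round trip like this, though it is not a security check.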
 

gargravarr

So, good news, I solved the immediate problem. Writing all 4TB of media to tape and reading it back on a separate machine worked perfectly. I spot-checked some files with MD5 checksums and they matched. Then the ITDT test passed. I was able to exchange the drive with the buyer and they tested it as good too. So at least I no longer have to worry about refunding the drive.

The buyer sent the faulty drive back. I mounted it in a sled and installed it in my TL2000, and on power-up it threw a SCSI error:
Code:
 HE: internal SCSI cmd failed with check condition
Code: FC 02
mtx is unable to do anything with it - even a status command fails. The TL2000's web UI and front panel will let me move tapes, but I get IO errors in tar if I try to write anything to a tape. ITDT takes an enormous amount of time (10x normal, it's usually instant) to scan for a drive.

Now, here's the curious thing - ITDT health and system tests pass with LTO-7 media. I'm currently running a Full Write test which has been going for 12,626 seconds at 281MB/s without issue.
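For a sense of scale, those two numbers work out to roughly 3.5 TB written so far. A small Python sketch of the arithmetic follows, with two assumptions flagged: that 281MB/s is a sustained average, and that the tape holds the 5,721,778 MB that ITDT reported as remaining capacity in the earlier System Test output.

```python
ELAPSED_S = 12_626         # seconds the Full Write test has been running
RATE_MB_S = 281            # reported rate, assumed here to be a steady average
CAPACITY_MB = 5_721_778    # assumption: REMAINING CAPACITY (MB) from the earlier ITDT output


def written_mb(elapsed_s, rate_mb_s):
    """Data written so far, in MB."""
    return elapsed_s * rate_mb_s


def remaining_seconds(capacity_mb, written, rate_mb_s):
    """Time left to fill the tape at the same rate, in seconds."""
    return max(capacity_mb - written, 0) / rate_mb_s


if __name__ == "__main__":
    done = written_mb(ELAPSED_S, RATE_MB_S)
    left = remaining_seconds(CAPACITY_MB, done, RATE_MB_S)
    print(f"written: {done} MB, about {left / 3600:.1f} h to go")
```

If the test really does fill the tape, that leaves a bit over two hours at the same rate. Either way, a drive sustaining this rate for hours is moving data fine, which fits the guess that the fault lies elsewhere.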

At a guess, something is up with the interface between the drive and the library (though that doesn't explain the IO errors in the OS). Both drives (the other is my working LTO-5) have Control Path enabled so mtx should be able to control the library, but it just throws a long list of SCSI command errors.

Any ideas?
 

Asuka17377

New Member
Mar 15, 2023
23
8
3
I have an idea. It seems that Dell's tape drive is actually made by IBM, and IBM has a document that may help you:
IBM® TotalStorage® LTO Ultrium Tape Drive SCSI Reference (LTO-5 through LTO-9)
Well, sir, my English is not good, so I can't read the document; I can only give you this idea.
 

Asuka17377

The following are only guesses:
Maybe LTO-7 is too new for the tape library. Maybe the seller's test was done over a direct connection to the HBA/FC card. Have you tried upgrading the firmware of the tape library?