Adaptec 72405 SH***S Itself after logging into Windows


Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
I don't often have a problem that I can't work out myself, but this one has me stumped. I have an Adaptec 72405 RAID controller, and about 2 months ago it started giving I/O errors and RAID port reset entries in Windows Event Viewer. The thing is, it works fine sometimes (not very often, mind you). Also, if I don't log into Windows, the VMs using the RAID arrays work fine. Last time I left it 2 weeks before I logged into Windows, and it worked fine for another week before I had to reset the system for other reasons. Then, not long after logging into Windows, Task Manager showed 100% disk usage on both arrays but 0 MB/s read/write speeds, and going any more than 1 layer deep into the file structure locks up Windows Explorer.

I have reinstalled Windows a couple of times; every time it seems to work for a bit, then goes back to doing the same thing. I also swapped out the power supply (tried 2 different second-hand PSUs I have); nothing changed. I tried popping a couple of disks out of each array; it started fine, and when I put the disks back in it rebuilt them and worked fine until I reset the system, then went back to the same old problem. So I'm thinking either Windows is installing an update that my RAID card doesn't like, which loads once a user is logged on, or all my power supplies are stuffed, which I think would be unlikely. I'm open to any suggestions as to what my problem could be.
 

DavidWJohnston

Active Member
Sep 30, 2020
242
188
43
When I Google raid port reset windows event I get quite a few articles suggesting various fixes. Have you tried any of them? Like these pages:

https://www.reddit.com/r/techsupport/comments/c9rkak
You may have a drive (or multiple) which is close to failing, and when the system tries to read/write certain regions of the disk in a certain way, it causes the disk(s) to go into panic/recovery mode. Consumer drives not meant for NAS use can do this, and prevent the controller from accessing them for a long time while they attempt to recover. Since they are in a wait state, the controller doesn't know whether to consider them failed, or to wait.

Even when reinstalling Windows, the OS installer and controller may be laying down the same bits in the same places on the disks, setting it up for failure again. Using the system as a hypervisor may not trigger the same issue in the same way, perhaps because the corrupted portion of the filesystem is not being accessed until a user logs in.

Let me know which of the potential fixes you've tried. Good Luck.
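The wait-state behaviour described above is essentially a timeout mismatch between the controller and the drive's internal error recovery. A toy model of that interaction, purely illustrative (the timeout values are assumptions, not Adaptec's actual figures):

```python
# Toy model of the wait-state problem: the controller gives a drive a
# fixed window to answer a command; a consumer drive without bounded
# error recovery (TLER/ERC) may keep retrying a weak sector far longer.
# All numbers here are illustrative assumptions, not Adaptec specifics.

CONTROLLER_TIMEOUT_S = 30  # assumed controller patience per command


def controller_outcome(drive_recovery_s):
    """What the controller sees when a drive hits a weak sector."""
    if drive_recovery_s <= CONTROLLER_TIMEOUT_S:
        return "recovered"   # drive answers (or reports failure) in time
    return "port reset"      # controller gives up and resets the port


print(controller_outcome(7))    # enterprise-style bounded recovery
print(controller_outcome(120))  # consumer-style deep retry
```

This is why enterprise/NAS drives with a short error-recovery limit tend to behave better behind RAID controllers: they report the failure quickly instead of stalling the whole port.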
 

Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
OK, so it's an AMD board, so that rules out Intel storage drivers. I tried sfc; it came up with no integrity violations, although I had previously done that as well as updated the driver from Microsoft's to Adaptec's latest. After the registry tweaks and checking the AHCI settings (it was already set to what the page recommended), I also disabled fast startup. Still the same.

I'm reluctant to blame a failing disk, as I've pulled disks and done rebuilds without problems outside of Windows; then, once it's done and I log into Windows, it just refuses to respond. Shares and file transfers/backups all work flawlessly until I log into Windows.

I'm now having a similar problem on a second system that I thought was fixed with a new power supply, but it is now having a similar issue, though not exactly the same as the first. It was working fine after the power supply replacement until I tried expanding the array with new disks just yesterday, but this one doesn't work fine even if I'm not logged into Windows. It's my backup system; luckily I have a 3rd system that backs up that system, so as long as the 3rd system doesn't have issues, my data should be safe.
 

Moopere

New Member
Mar 19, 2023
26
3
3
Locking up and generally misbehaving only when you log in to Windows, along with device reset errors, sounds to me like either a failing drive or a corrupted volume. The reason it only manifests (for now) when you log in is because the act of logging in is reaching parts of the volume that normally are not used if you are not logged in ... i.e., some files are only going to be referenced during a live session. If these files reside on a problematic part of the filesystem, then that is likely what's causing the bizarre behaviour.

You'd hope that a RAID controller would notice a drive with a serious problem, drop it from the stack and continue to serve up the volume - but I've had many _many_ cases of serious drive problems locking up a RAID volume and thereby locking up the machine. Pull the drive and life is good again.
 
Last edited:
  • Like
Reactions: DavidWJohnston

Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
After all this time and pulling any drive that even remotely seems suspect, I'm still having the same problems. Sometimes, if I don't log into Windows for a while and use everything I need it for, say a week, then log into Windows, it will work fine. Also, sometimes I can log into Windows after a restart and it will work fine. I'm no longer getting port reset errors in Event Viewer, but I'm still having drives show 100% usage in Task Manager with 0 MB/s read or write, and the system locks up if I use Explorer to try to read those drives. What gets me is why sometimes it works and sometimes it doesn't. I'm sure if there were any problems with disconnecting drives or misbehaving drives, something would have presented itself when I destroyed the arrays, did a full format of all the drives, and rebuilt the array.

Looking through the logs of the RAID management software, there is an entry that says:

[ sysmgt:ERROR ][ 07/06/23 11:54:38:739 ]Heartbeat Monitor - Exception in call to CIM for getPollStatus CIM_ERR_FAILED: :: Operation Failed(cimadapter,isDriverAndFirmwareResponding)

In the past I have seen entries along the lines of "kernel panic", but I don't see kernel panic anywhere at the moment, although I've had the machine lock up 3-4 times today without these entries in the log.
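One thing that might help correlate the lockups with the controller software is pulling all the Heartbeat Monitor errors out of that management log and counting them per day. A minimal sketch, assuming the timestamp format matches the line quoted above (the sample log lines below are made up for illustration):

```python
# Sketch: count Heartbeat Monitor errors per day in the Adaptec
# management log, to see whether lockups line up with CIM failures.
# The regex matches the "[ sysmgt:ERROR ][ MM/DD/YY hh:mm:ss:mmm ]"
# format of the line quoted above; the sample entries are invented.
import re
from collections import Counter

LINE_RE = re.compile(
    r"\[ sysmgt:ERROR \]\[ (\d{2}/\d{2}/\d{2}) [\d:]+ \]Heartbeat Monitor"
)


def errors_per_day(lines):
    """Return {date: count} of Heartbeat Monitor error lines."""
    days = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            days[m.group(1)] += 1
    return dict(days)


sample_log = [
    "[ sysmgt:ERROR ][ 07/06/23 11:54:38:739 ]Heartbeat Monitor - Exception in call to CIM for getPollStatus CIM_ERR_FAILED",
    "[ sysmgt:ERROR ][ 07/06/23 13:02:11:101 ]Heartbeat Monitor - Exception in call to CIM for getPollStatus CIM_ERR_FAILED",
    "[ sysmgt:INFO  ][ 07/06/23 13:05:00:000 ]Background consistency check started",
]
print(errors_per_day(sample_log))  # {'07/06/23': 2}
```

If the error counts spike around the times the machine locks up, that points at the driver/firmware layer rather than a single disk; if they don't, the log noise may be unrelated.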
 

Moopere

New Member
Mar 19, 2023
26
3
3
Still sounds like the drives to me. What HDDs are you using? Weak bits will work sometimes and not others, and will cause the drive to retry (100% utilization, 0 MB/s transferred). Consumer drives will often try _really_ hard to recover weak or dead sectors ... enterprise drives are more likely to trigger a drive fail relatively quickly and give a RAID controller half a chance to drop it from the stack.
 

Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
I did destroy all the arrays, do full formats on the hard drives, and rebuild. If there were any bad sectors or problems, surely there would be SMART data problems, or the full format wouldn't finish; you'd think there would be some sign that something's not right. In saying that, the array that used to lock up all the time first has changed: it used to be the 25x 4TB RAID 6 that was the first to lock up; now it's the 8x 4TB RAID 10.

In one array I have SAS drives, Seagate ST4000NM0023, 8 in a RAID 10 array. The other array is a mixture of mostly Seagate IronWolfs, but some of the drives are WD Black, one is a WD Red, and there are a few other Seagate drives. I think there were about 30 drives at one stage; after taking out any with errors (a few WD Blacks developed medium errors or had a high number of aborted commands after trying to rebuild the array multiple times), there are now 25 in a RAID 6. Another array has ST6000NM0034s, 16 in a RAID 6, then a couple of SSD arrays that aren't being used right now.

Is the only way to be 100% sure a disk is working correctly to pull it and do an extended test using the manufacturer's software?
That could be a real pain for 49 disks.
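Sweeping that many disks can at least be scripted rather than done drive by drive. smartmontools' `smartctl` can kick off SMART extended self-tests on both SATA and SAS drives in place, without pulling them. A sketch that just generates the command lines for a batch of disks, assuming a Linux host (or smartmontools on Windows) where the drives appear as `/dev/sdX`; the device names are hypothetical examples:

```python
# Sketch: generate "smartctl -t long" commands for a batch of disks so
# extended self-tests can be started across many drives at once, plus
# the follow-up commands to read the results. Device paths below are
# hypothetical examples; adjust for your system.


def long_test_commands(devices):
    """Return one 'start extended self-test' command per device."""
    return [f"smartctl -t long {dev}" for dev in devices]


def poll_commands(devices):
    """Return commands to read self-test results once tests finish."""
    return [f"smartctl -l selftest {dev}" for dev in devices]


disks = [f"/dev/sd{chr(ord('a') + i)}" for i in range(4)]  # sda..sdd
for cmd in long_test_commands(disks) + poll_commands(disks):
    print(cmd)
```

The tests run inside the drives themselves, so all 49 can run in parallel; the pain is mostly in collecting and reading the results afterwards.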
 

Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
I removed the SSDs as well. The 100% disk usage problem just presented itself on the last array still connected. At the minute the 2 arrays are doing consistency checks as normal and the system hasn't locked up, but this is the same behaviour as when the problem first presented itself 5 months ago. I originally thought it was a faulty disk, so I removed disks that I suspected it could be, and the problem went away when I'd removed enough disks for both of the main arrays to be offline. Then, once rebuilt, it would be fine until the system was restarted and I logged into Windows.

To be clear, I have 2 arrays that were constantly in use, both hard disk: 1 RAID 6 and 1 RAID 10. There was also an SSD array that was connected but not used, as well as 2 arrays that weren't connected (one of hard disks and one of SSDs) that I'd used for cold storage and hadn't touched until recently, when I'd lost so many disks that I needed it to store files on. I haven't had the I/O errors come up in Event Viewer in a long time.
 

Moopere

New Member
Mar 19, 2023
26
3
3
Is the only way to be 100% sure a disk is working correctly to pull it and do an extended test using the manufacturer's software?
That could be a real pain for 49 disks.
Even using the manufacturer's software isn't a guarantee, IMHO, of a 100% good disk. I've had weird fails even after extensive drive testing. It's a good start though. If the drives pass the manufacturer's consumer testing software, at least that's some sort of base to work from.

After that, if you've got a drive misbehaving because of heat or vibration or something then you've just got to weed it out by trial and error. If I get even a sniff of a drive problem in a RAID stack I'll label the drive so I can see patterns of behavior over time.

Just an FYI: with both SATA and SAS drives, I don't recall many cases of SMART giving me a useful heads-up before a catastrophic drive fail. In fact, as I sit here writing this, I can't recall it ever happening. So don't rely too heavily on expectations related to SMART.

As for full formats and platter surface checking ... in the case of a drive struggling because of microfractures or heat related issues - you won't necessarily see these issues emerge at format/surface check. They might, but might not.
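Even with those caveats, if you're sweeping all the drives anyway, a few raw SMART counters are still worth collecting, since nonzero values in them line up with the retry-forever behaviour discussed in this thread. A minimal sketch, assuming SATA-style attribute names (SAS drives like the ST4000NM0023 report grown defects and error counters differently); the thresholds and sample data are illustrative assumptions, not manufacturer limits:

```python
# Sketch: flag suspect drives from a handful of SMART attributes that
# correlate with sector-recovery stalls. Attribute names are SATA-style;
# the threshold ("any nonzero") and the sample data are illustrative
# assumptions, not manufacturer pass/fail limits.

SUSPECT_IF_NONZERO = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def flag_suspects(drives):
    """drives: {name: {attribute: raw_value}} -> sorted list of suspects."""
    suspects = []
    for name, attrs in drives.items():
        if any(attrs.get(a, 0) > 0 for a in SUSPECT_IF_NONZERO):
            suspects.append(name)
    return sorted(suspects)


sample = {
    "sda": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
    "sdb": {"Reallocated_Sector_Ct": 12, "Current_Pending_Sector": 0},
    "sdc": {"Current_Pending_Sector": 8, "Offline_Uncorrectable": 8},
}
print(flag_suspects(sample))  # ['sdb', 'sdc']
```

A clean report here doesn't prove a drive is good (per the point above about SMART rarely warning before a failure), but a dirty one is a cheap reason to pull the drive first.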
 

Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
OK, so I've removed all disks that had medium errors and more than 10 aborted commands in the main storage RAID array, remade the array with the remaining disks (24), and still had the same problem. If I remove the array, the computer works as normal, so I've got 24 disks to find the problem within, and a pile of disks that may not have any issues. I made a virtual machine running TrueNAS SCALE with 4 disks that I removed initially in the first purge of potentially bad disks, and one of them was actually causing problems there too; replaced that and it's been running fine for at least a month. I'm so glad I'm finally getting closer to finding the actual problem; this has been going on for months now. I guess all I can do now is put a few disks in an array, see if the problem comes back, and narrow it down to the exact disk or disks.
 

Brokenbitsagain

New Member
Mar 18, 2023
9
0
1
OK, back on this again. I've tried a lot but have sort of come to a roadblock. I ended up using the disks to build arrays of 4 disks until I found which ones didn't lock up the system, and I ended up with 8 disks out of the lot that would work together without locking up the system. That wasn't enough storage, so I bought bigger 20TB disks, 5 of them, installed them, then built a RAID 5 array, and as soon as it was built I restarted the system and it started happening all over again.

I thought there might be a faulty backplane or cable, so I moved my 8-disk SAS array to different slots and did a speed test with CrystalDiskMark in each of the slots: no issue at all. It seems my RAID card only has problems when SATA disks are configured in a RAID array; if they are just standalone disks, or it's an array of SAS disks, there is no issue. Anything that would explain this?

Sort of related to this, I had a different set of disks on a different controller (same model) on a different system do the same thing, after an issue that left the array in a state. I think the array was optimal, but one of the disks had a triangle with an exclamation mark on it, and it wouldn't rebuild the array. I tried forcing the disk offline then online and initializing it, but the array kept going back to the same state again. Maybe I didn't leave it long enough to realise exactly what was happening; sometimes it takes forever to see on the web GUI what is actually going on. Anyway, I took the disk out, put it into another computer, formatted it, and fitted it back in; then this other computer did exactly the same as the computer I'm having all these problems with, which I don't understand. One failing disk shouldn't be a problem on a RAID 6 array; even with RAID 5 you should still be able to rebuild it. Anyway, I destroyed the array, started from scratch, and haven't had a problem since restoring the data from backup, now currently 57TB in.
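For what it's worth, the 4-disks-at-a-time elimination described above can be tightened into a halving search when there's a single culprit: each round, build a test array from half of the remaining candidates and see whether the lockup appears. A pure-Python sketch of the strategy, assuming exactly one bad disk and that any half-sized group can still be built into a testable array; the `group_locks_up` predicate stands in for the real build-and-reboot test:

```python
# Sketch: isolate a single problem disk by halving the candidate pool,
# instead of testing fixed 4-disk groups. Each "round" stands in for
# building a test array from a subset and checking for the lockup.
# Assumes exactly one bad disk; group_locks_up is a stand-in predicate
# you would answer by actually testing that subset of drives.


def find_bad_disk(disks, group_locks_up):
    """Binary-search for one bad disk; O(log n) test builds instead of n/4."""
    candidates = list(disks)
    rounds = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        rounds += 1
        # If the first half misbehaves, the culprit is in it;
        # otherwise it must be in the other half.
        candidates = half if group_locks_up(half) else candidates[len(half):]
    return candidates[0], rounds


# Simulated run: pretend disk "D17" out of 24 is the bad one.
disks = [f"D{i}" for i in range(24)]
disk_found, rounds = find_bad_disk(disks, lambda grp: "D17" in grp)
print(disk_found, rounds)  # D17 5
```

With 24 candidate disks that is 5 build-and-test rounds instead of 6 fixed groups of 4 plus follow-up testing within the failing group; the saving grows with the pool size, though each rebuild is still slow in wall-clock time.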