TrueNAS SCALE TrueNAS-SCALE-22.02.3 read/write errors

Curggles

New Member
May 4, 2020
16
1
3
Hi, just wanted to get some insight if possible. I'm running TrueNAS SCALE virtualized in Proxmox and have been since around April, maybe a little longer. I'm using a Supermicro X9SRH-7F motherboard with an onboard LSI 2308 SAS controller connected to a BPN-SAS-846EL1 backplane with IBM SSG H0VH600 2.5-inch 600GB SAS drives: 6 drives in one pool, 1 vdev, ZFS RAIDZ2. The data on the server is not mission critical, so if my methods have resulted in bad data it's not a big deal, and I have 3 other copies of all the data on different drives in a PC and on externals.
Anyway, about a week ago TrueNAS emailed me that my pool was degraded, and I found 2 drives faulted and the rest degraded. When I checked the status I had multiple read and write errors but no checksum errors. I swapped the faulted drives for spares of the same make and size, resilvered, then ran a scrub and SMART tests, and both came back fine. In less than 24 hours the same thing happened: 2 faulted drives, different ones this time, and 4 degraded. I ran SMART tests and a scrub again with zero problems. The scrub didn't even list any files with errors. After all the tests finished I cleared the ZFS errors again so the pool went back to ONLINE and OK status, then 3 hours later the same thing happened. At this point 2 different drives showed as faulted again, so I was pretty sure I had a controller, cable, or backplane issue.
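For anyone wanting to check for the same pattern (read/write errors with zero checksum errors), here is a rough Python sketch that pulls the READ/WRITE/CKSUM counters out of `zpool status` text. The sample output, the pool name `tank`, and the device names are made up for illustration, and the helper names `error_rows` and `io_errors_without_cksum` are hypothetical. It only assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout.

```python
import re

# Hypothetical sample in the usual `zpool status` column layout.
# Pool and device names and the counters are invented for illustration.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            sda     FAULTED     12    40     0
            sdb     DEGRADED     3     7     0
            sdc     ONLINE       0     0     0
"""

# One config row: name, state, then the three error counters.
# Counters are kept as strings because zpool may abbreviate large values.
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)

def error_rows(status_text):
    """Return (name, state, read, write, cksum) for every config row."""
    rows = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(match.groups())
    return rows

def io_errors_without_cksum(rows):
    """Names of rows with read or write errors but a zero checksum count."""
    return [
        name
        for name, _state, read, write, cksum in rows
        if (read != "0" or write != "0") and cksum == "0"
    ]

if __name__ == "__main__":
    print(io_errors_without_cksum(error_rows(SAMPLE)))
```

On a real system the text could come from saving the output of `zpool status` to a file and passing its contents to `error_rows`. Seeing several different drives in that list at once is what made me suspect something shared like the controller, cable, or backplane rather than the drives themselves.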
I didn't have a different reverse breakout cable to use, so I installed my LSI 9207-8i, which has the same SAS 2308 chipset, passed it through from Proxmox, and used a mini-SAS to mini-SAS cable. It has run error free for 6 days now, so I'd say the backplane is fine and the problem is down to either the onboard LSI controller or the reverse breakout cable. I just received a new reverse breakout cable from StarTech; I have no clue what brand the other cable was. I have no real idea which particular brand a guy might want to use, but new and different is a good start.
I have now installed the new reverse breakout cable and am waiting to see if I have any issues. I am more or less looking for some thoughts on whether there is anything else I can check. I also wanted an opinion on using this motherboard: if I determine the onboard LSI HBA is the issue, would you trust that it won't cause any other issues even if it isn't in use? Also, yes, I know the virtualized setup is not the most ideal situation, but I haven't had any issues for months, and on the day of the original problem I hadn't done any updates to Proxmox, TrueNAS, or any of the apps I have in TrueNAS.
Sorry for the long-winded post, and thanks for any insight anyone has. This is just my hobby in my down time. I am not in the IT sector, but I enjoy trying to run different servers and troubleshoot things.
 

Curggles

New Member
May 4, 2020
16
1
3
Yes it's in IT mode. So was the PCIe hba I installed.
It's been closer to 12 hours now and the issue hasn't come back using the new cable on the onboard LSI HBA. So it's really looking like I had a cable failure at this point. With the old cable it never took this long before the massive read/write errors started. I find it kind of strange, but I guess it's not impossible.
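The swap-one-part-at-a-time reasoning can be written down as a small Python sketch. It assumes a single faulty part and treats a clean run as clearing every part used in it, which is only evidence, not proof, since a marginal part can pass for a while. The function name `suspects` and the component labels are just illustrative.

```python
def suspects(runs):
    """Parts present in every failing run and in no passing run.

    `runs` is a list of (parts, ok) pairs, where `parts` lists the
    components used in that run and `ok` is True if it ran error free.
    """
    failing = [set(parts) for parts, ok in runs if not ok]
    passing = [set(parts) for parts, ok in runs if ok]
    if not failing:
        return set()
    in_every_failure = set.intersection(*failing)
    cleared = set().union(*passing)
    return in_every_failure - cleared

# The three setups described in this thread.
runs = [
    (["onboard 2308", "old reverse breakout cable", "backplane", "drives"], False),
    (["9207-8i", "mini-SAS cable", "backplane", "drives"], True),
    (["onboard 2308", "new reverse breakout cable", "backplane", "drives"], True),
]

if __name__ == "__main__":
    print(suspects(runs))
```

With those three runs the only part left is the old reverse breakout cable, which matches where I ended up.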
 

Curggles

New Member
May 4, 2020
16
1
3
As I also stated, the unit had been running for 6-8 months before the issue started. So even though I did check everything from the PCIe passthrough setup in Proxmox right through to firmware, I was pretty confident it was on the hardware side.
 

Railgun

Active Member
Jul 28, 2018
148
56
28
Out of curiosity, during writes, do you see similar messages on the console or in the log?


2022 Oct 4 17:53:12 truenas BUG: Bad page state in process smbd pfn:35d65b
2022 Oct 4 17:54:23 truenas BUG: Bad page state in process ksoftirqd/2 pfn:9e6fb3
2022 Oct 4 17:54:38 truenas BUG: Bad page state in process swapper/2 pfn:16125a6
2022 Oct 4 17:54:40 truenas BUG: Bad page state in process spl_kmem_cache pfn:13bd9e3
2022 Oct 4 17:54:43 truenas BUG: Bad page state in process smbd pfn:bdaee6
2022 Oct 4 17:54:48 truenas BUG: Bad page state in process smbd pfn:19884ac
2022 Oct 4 17:54:48 truenas BUG: Bad page state in process smbd pfn:ebbdb2
2022 Oct 4 17:54:48 truenas BUG: Bad page state in process smbd pfn:17b048
2022 Oct 4 17:54:52 truenas BUG: Bad page state in process smbd pfn:155ca7e
2022 Oct 4 17:54:56 truenas BUG: Bad page state in process smbd pfn:15d5ca4
2022 Oct 4 17:55:00 truenas BUG: Bad page state in process smbd pfn:19337fc
2022 Oct 4 17:55:26 truenas BUG: Bad page state in process smbd pfn:210117
2022 Oct 4 17:55:27 truenas BUG: Bad page state in process smbd pfn:1a43198
2022 Oct 4 17:55:42 truenas BUG: Bad page state in process smbd pfn:cd0452
2022 Oct 4 17:55:42 truenas BUG: Bad page state in process swapper/1 pfn:170d917
2022 Oct 4 17:55:54 truenas BUG: Bad page state in process smbd pfn:284eda
2022 Oct 4 17:56:10 truenas BUG: Bad page state in process smbd pfn:12c09c8
2022 Oct 4 17:56:14 truenas BUG: Bad page state in process smbd pfn:b34976
2022 Oct 4 17:56:15 truenas BUG: Bad page state in process smbd pfn:2a783a
2022 Oct 4 17:56:15 truenas BUG: Bad page state in process smbd pfn:cbb60a
2022 Oct 4 17:56:16 truenas BUG: Bad page state in process smbd pfn:1c6043f
2022 Oct 4 17:56:36 truenas BUG: Bad page state in process smbd pfn:a8b0ab
2022 Oct 4 17:56:38 truenas BUG: Bad page state in process smbd pfn:145275a
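If you want to check your own logs for these, a quick Python sketch like the one below counts the "Bad page state" lines per process. The function name `count_bad_page_states` is made up, the regular expression is only matched against the message format shown above, and where the log lines come from (a saved console log or `dmesg` output, for example) is left to you.

```python
import re
from collections import Counter

# Matches the kernel message format shown in the lines above.
PATTERN = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")

def count_bad_page_states(lines):
    """Count 'Bad page state' messages per process name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# First three lines of the log above, used as sample input.
sample = [
    "2022 Oct 4 17:53:12 truenas BUG: Bad page state in process smbd pfn:35d65b",
    "2022 Oct 4 17:54:23 truenas BUG: Bad page state in process ksoftirqd/2 pfn:9e6fb3",
    "2022 Oct 4 17:54:38 truenas BUG: Bad page state in process swapper/2 pfn:16125a6",
]

if __name__ == "__main__":
    for process, count in count_bad_page_states(sample).most_common():
        print(process, count)
```

Any iterable of strings works as input, so an open file object can be passed directly.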
 

Curggles

New Member
May 4, 2020
16
1
3
I don't recall seeing anything in any logs like that.
 

Curggles

New Member
May 4, 2020
16
1
3
Still holding strong, 5 days now. I would say my issue is solved and it was the cable. Before I changed anything, it only took hours to reproduce the failure.