Greetings all,
I'm looking to put mdadm into production for the first time and would like to gather some best practices. I have used it in the past for non-production work, but a production install absolutely requires rock-solid uptime, reliability, and performance.
Background:
I have 4x NFS servers in production - each with 6x 2TB Samsung SSD drives connected to LSI 9300 hardware RAID cards. These servers have been running flawlessly for the past 3-4 years. They are fast and easy to manage (easy to replace a failed drive, etc). The new servers will hold 3x 6.2TB NVMe drives, but unfortunately, they don't have an available PCIe slot for a RAID card. Thus, I am looking at software RAID this time.
As a test, I installed FreeNAS on the server and set up a RAID-Z array with auto-tune enabled. Unfortunately, no amount of tuning produced decent performance (ashift=12, disabling compression/sync-writes/atime, tuning min/max active reads, etc.). Disk reads were capped around 650MB/sec and disk writes around 1.2GB/sec. I spent an entire day poring over the ZFS tuning options to no avail.
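For the curious, this is roughly what I tried (pool and device names are placeholders, and the queue-depth values are from memory):

    # FreeNAS/FreeBSD: force 4K-aligned vdevs before creating the pool
    sysctl vfs.zfs.min_auto_ashift=12
    zpool create tank raidz /dev/nvd0 /dev/nvd1 /dev/nvd2
    zfs set compression=off tank
    zfs set sync=disabled tank
    zfs set atime=off tank
    # raise the per-vdev I/O queue depths (values were experiments)
    sysctl vfs.zfs.vdev.sync_read_max_active=32
    sysctl vfs.zfs.vdev.async_read_max_active=32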
Purpose:
The purpose of the servers is to host ESXi vmdk files, so XFS makes a great filesystem option in this case. My idea is to use mdadm RAID-5 and put an XFS filesystem on top, export the volume via NFS, and call it a day. The servers will have a dedicated (non-RAID) boot drive via a 64GB SATA-DOM.
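In rough terms, I'm picturing something like the following (device names, mount point, and export options are placeholders, not a tested recipe):

    # 3-drive RAID-5 array out of the NVMe devices
    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
    # record the array so it assembles at boot (path varies by distro,
    # e.g. /etc/mdadm/mdadm.conf on Debian-family systems)
    mdadm --detail --scan >> /etc/mdadm.conf
    # mkfs.xfs picks up the md chunk/stripe geometry on its own
    mkfs.xfs /dev/md0
    mkdir -p /export/vmstore
    mount /dev/md0 /export/vmstore
    echo '/export/vmstore *(rw,sync,no_root_squash)' >> /etc/exports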
The issue is day-2 (and beyond) maintenance. Since I have not run mdadm in production, I don't know how easy or hard it is for someone to walk into the data center and replace a failing drive. Or what happens if the server panics and reboots without assembling the RAID volume. Or how reliable mdadm is in general.
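For what it's worth, my understanding of a drive swap is roughly the sequence below (device names are examples); I'd love to hear whether that matches reality on a busy array:

    # mark the failing member faulty and remove it from the array
    mdadm /dev/md0 --fail /dev/nvme1n1
    mdadm /dev/md0 --remove /dev/nvme1n1
    # physically replace the drive, then add the new one back
    mdadm /dev/md0 --add /dev/nvme1n1
    # watch the rebuild progress
    cat /proc/mdstat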
I am wondering if anyone could share their best practices for a real production setup with mdadm. For example, how often do you scrub the arrays, what tuning parameters do you use, etc.?
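I know a manual scrub can be kicked off with something like the lines below (and that Debian-family packages ship a monthly checkarray cron job), but I have no feel for the right cadence on NVMe, or for what monitoring people pair with it:

    # request a consistency check of md0; progress appears in /proc/mdstat
    echo check > /sys/block/md0/md/sync_action
    # run mdadm's monitor so failures generate mail
    mdadm --monitor --scan --daemonise --mail=root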
Thanks for any feedback.
-Ron