RAID for Windows (That's Not Storage Spaces)?


gea

Well-Known Member
Dec 31, 2010
3,578
1,406
113
DE
1. you can extend napp-it menus or reports with your own menus or reports
2. you can display physical disk stats no matter how you use the disks, e.g.
Code:
Get-PhysicalDisk | Select-Object FriendlyName, SerialNumber,
    @{N='HealthStatus';             E={$_.HealthStatus}},
    @{N='HealthStatusNumeric';      E={$_.CimInstanceProperties['HealthStatus'].Value}},
    @{N='OperationalStatus';        E={$_.OperationalStatus -join ', '}},
    @{N='OperationalStatusNumeric'; E={$_.CimInstanceProperties['OperationalStatus'].Value -join ', '}}
Update: napp-it cs v25.10.19 now alerts on and shows the Windows Pool State in Job > Report

You can check installed version in About > Update
To update, just copy over the newer csweb-gui folder
 
Last edited:

Micro

Member
Oct 20, 2019
44
12
8
What hardware are you running and how large an array are you contemplating?
Would VROC work with your hardware?
 

macdaddy2012

Member
Oct 10, 2025
32
0
6
What hardware are you running and how large an array are you contemplating?
Would VROC work with your hardware?
Intel raid? I'm currently on Ryzen. I would like to remain Intel/AMD agnostic, as I will likely end up transitioning this array to a different machine.
 

nabsltd

Well-Known Member
Jan 26, 2022
763
558
93
Sorry, not agreeing. ZFS is dog slow if your pool is north of 50% full. A hardware raid volume has the same performance throughout.
I have found that you can almost "throw together" a RAID5 or RAID6 system using a LSI 9361 and just about any newer 3.5" hard disk and get 400MB/sec sequential reads and writes. If you jump to more spindles, 600MB/sec is easy. This is with absolutely no tuning, no SSD cache, no RAM cache other than what Windows does, etc.

Add an SSD cache that the RAID controller manages, or a delayed write cache stored in RAM, and you can see 30-second bursts of 2-3GB/sec.

All for about $65 (including cache card and supercap "battery"). You will more than make up that money in time not spent tuning ZFS.
 
  • Like
Reactions: kapone

nabsltd

Well-Known Member
Jan 26, 2022
763
558
93
1) I presume you mean Battery Backup?
No modern hardware RAID card uses an actual battery...they use a very large capacitor (supercap) that will last far longer than a rechargeable battery.

I'm looking that up but not seeing what you're talking about. I've heard of deploying a SLOG to speed up writes which I might do.
SLOG does not change the speed of async writes at all...they are all cached in RAM.

SLOG does not significantly speed up sync writes if you are trying to write much faster than the raw speed of the pool. You get a short burst of high speed, but it is still limited (by default) by the "must write everything within 5 seconds of it being requested" logic. This is why you see recommendations of 50GB of SLOG regardless of the size of your pool.
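In OpenZFS terms, that "within 5 seconds" logic is the transaction group (txg) timeout. A read-only way to inspect it, assuming OpenZFS on Linux (changing it is a separate tuning decision):

```shell
# Open transaction groups are committed to disk at least every
# zfs_txg_timeout seconds (default 5); this caps how long a burst into
# RAM or SLOG can run ahead of the raw pool speed.
cat /sys/module/zfs/parameters/zfs_txg_timeout
```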
 
Last edited:
  • Like
Reactions: kapone

kapone

Well-Known Member
May 23, 2015
1,799
1,189
113
I have found that you can almost "throw together" a RAID5 or RAID6 system using a LSI 9361 and just about any newer 3.5" hard disk and get 400MB/sec sequential reads and writes. If you jump to more spindles, 600MB/sec is easy. This is with absolutely no tuning, no SSD cache, no RAM cache other than what Windows does, etc.

Add an SSD cache that the RAID controller manages, or a delayed write cache stored in RAM, and you can see 30-second bursts of 2-3GB/sec.

All for about $65 (including cache card and supercap "battery"). You will more than make up that money in time not spent tuning ZFS.
I use a RAID60 setup in each of my SAN nodes (12 wide RAID6 x2 in each 24 port expander) and 8x SSDs in RAID10 for maxCache r/w cache. 3GB/s is easy, it'll saturate the pcie3.0x8 bus.

Edit: BUT...I'm (right now) playing with ZFS on top of these arrays, because I need compression. I don't care much about other features in ZFS (although snapshots and send/receive is really handy). My datasets are growing quite rapidly (almost 1TB per week) and they compress real good with ZFS (almost a factor of 5x). More testing needed...
 
Last edited:

nabsltd

Well-Known Member
Jan 26, 2022
763
558
93
I use a RAID60 setup in each of my SAN nodes (12 wide RAID6 x2 in each 24 port expander) and 8x SSDs in RAID10 for maxCache r/w cache. 3GB/s is easy, it'll saturate the pcie3.0x8 bus.
That is a lot more spindles than I use, and a lot more SSD than I use.

Do you use the 8x SSDs for the total cache volume? Assuming the "hot" data fits in a single RAID10 pair of SSDs, how large a performance drop would you expect?
 

kapone

Well-Known Member
May 23, 2015
1,799
1,189
113
Do you use the 8x SSDs for total cache volume?
Yup. Gives me a superfast ~4TB SSD cache for the entire array, and it is completely transparent to any filesystem on top. I can't use NVMe drives for this because the SSDs have to be on the Adaptec card, otherwise I'd have used NVMe.

Assuming the "hot" fits in a single RAID10 pair of SSDs, how large of performance drop would you expect?
So, there's no good answer to this. The primary motivation for the SSD cache is fast writes (I'm ingesting the entire NBBO stock quotes feed from all US exchanges). The way the application is architected, the quotes need to be persisted as such (and quickly), but they're not read by the worker nodes in real time (the proxy multicasts the quotes both to the worker nodes and the DB write queue simultaneously).

The worker nodes do however do heavy (and I mean heavy) processing overnight/after markets close. That's when the database cluster (which sits on top of these arrays) is read from.

So, there's really not much "hot" data so to speak.
 

gea

Well-Known Member
Dec 31, 2010
3,578
1,406
113
DE
I use a RAID60 setup in each of my SAN nodes (12 wide RAID6 x2 in each 24 port expander) and 8x SSDs in RAID10 for maxCache r/w cache. 3GB/s is easy, it'll saturate the pcie3.0x8 bus.

Edit: BUT...I'm (right now) playing with ZFS on top of these arrays, because I need compression. I don't care much about other features in ZFS (although snapshots and send/receive is really handy). My datasets are growing quite rapidly (almost 1TB per week) and they compress real good with ZFS (almost a factor of 5x). More testing needed...
ZFS compression is nice, and even realtime dedup can now be an option with the new fast dedup, as long as your data allows higher dedup rates. But the real killer features of ZFS are Copy on Write and checksums on all datablocks and metadata, down to the single disk. Copy on Write means you never modify datablocks of data already on disk; every modified datablock is written anew. On success the new datablock becomes valid and the former block can be kept by snaps; otherwise the former block remains active. On a crash during write there is no "half finished" write: atomic operations like writing a datablock and updating its metadata, or writing a raid stripe sequentially over several disks, are done completely or discarded. Say goodbye to offline fsck or chkdsk; no need for these tools, as ZFS remains intact. You need a real disaster (a software bug or bad hardware) to damage ZFS.
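The copy-on-write idea can be sketched at file level with nothing but a POSIX shell: write the new version under a separate name first, then swap it in with an atomic rename. A crash before the rename leaves the old version fully intact; there is never a half-written file.

```shell
# Minimal copy-on-write-style update (file-level analogy, not ZFS itself)
cd "$(mktemp -d)"
printf 'old data' > data.txt
printf 'new data' > data.txt.tmp   # new "block" written elsewhere first
mv data.txt.tmp data.txt           # atomic pointer flip via rename(2)
cat data.txt                       # -> new data
```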

This is a very important detail. If you pull the AC plug during writes, there is an ultra-low chance of corrupted data, a corrupted ZFS filesystem or a damaged ZFS raid. If you do the same with a Raid 5/6, even with ZFS on top, there is a quite high chance of corrupted data or a corrupted raid. A hardware raid with BBU (supercap/flash) protection can reduce the risk, but cannot avoid it the way ZFS can.

Even when something happens, e.g. a disk error in a degraded Raid-5, the effect is different. With ZFS checksums per disk and datablock, ZFS knows whether a datablock is good or not, so you will usually just see a bad file in such a case, while a read error due to a bad sector in a degraded Raid 5 mostly means the array is lost.

A newer ZFS killer feature is a hybrid pool with a special vdev in a disk pool. This allows you to force smaller files (e.g. <128K), all datablocks of a certain filesystem, metadata, fast dedup data and now even slog data onto a fast NVMe mirror, with the other data on cheaper disks.
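As a sketch with hypothetical pool and device names, a hybrid pool with a special vdev might be created like this (OpenZFS 2.x syntax):

```shell
# Data on cheap disks, metadata + small blocks on a fast NVMe mirror
zpool create tank raidz2 sda sdb sdc sdd sde sdf \
    special mirror nvme0n1 nvme1n1
# Route datablocks <=128K of this filesystem to the special vdev too
zfs set special_small_blocks=128K tank/data
```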
 
Last edited:

Micro

Member
Oct 20, 2019
44
12
8
I have found that you can almost "throw together" a RAID5 or RAID6 system using a LSI 9361 and just about any newer 3.5" hard disk and get 400MB/sec sequential reads and writes. If you jump to more spindles, 600MB/sec is easy. This is with absolutely no tuning, no SSD cache, no RAM cache other than what Windows does, etc.

Add an SSD cache that the RAID controller manages, or a delayed write cache stored in RAM, and you can see 30-second bursts of 2-3GB/sec.

All for about $65 (including cache card and supercap "battery"). You will more than make up that money in time not spent tuning ZFS.
Just curious, but what are you using to measure those reads/writes? My numbers differ.
 

kapone

Well-Known Member
May 23, 2015
1,799
1,189
113
@gea - I'm quite familiar with what ZFS is and what/how it does what it does.

The challenge as always is evaluating a technology to see if it'll solve the problem at hand...or not. My biggest issue with ZFS is...performance. With the amount and nature of data I'm dealing with, filesystem performance is absolutely at the top.

The "best practices" that get thrown around in relation to ZFS, simply don't work in this case. I can't give ZFS direct access to drives because:

1. It's too slow that way.
2. That takes away all the management and ops aspect of running a large storage infra. The production infra lives in an Equinix DC and I need to be able to tell a remote hands person "Go replace the disk in that bay where the red light is lit". No fumbling around, no running OS commands to find out which disk should be replaced etc. I can't do this with ZFS with native HDD access.
3. No, moving to all flash is not the answer either. There's just way too much data and I maintain replication in the DC itself (two copies), with a full replication to a DR site (my home) and a cold backup in Iron Mountain. This is simply not doable within a reasonable cost envelope with all flash.

On a crash during write there is no "half finished" write: atomic operations like writing a datablock and updating its metadata, or writing a raid stripe sequentially over several disks, are done completely or discarded. Say goodbye to offline fsck or chkdsk; no need for these tools, as ZFS remains intact. You need a real disaster (a software bug or bad hardware) to damage ZFS.
This is both a feature and a problem. The way ZFS works, with write coalescing (in RAM) and an optional ZIL, still leaves me with potential data loss in case of a power crash. Whatever device ZFS is given for a ZIL can be no better protected than the DRAM cache on a raid controller (with supercap power-loss protection). And a raid card does similar things: if it was unable to write out the full stripe from the DRAM cache to the underlying disks (due to a power event), the array will come up dirty and will need to be rebuilt, but the full stripe (to be written) is alive and well in the DRAM cache.

To me, this is a lower probability of data loss compared to ZFS with native disks.
This is a very important detail. If you pull the AC plug during writes, there is an ultra-low chance of corrupted data, a corrupted ZFS filesystem or a damaged ZFS raid. If you do the same with a Raid 5/6, even with ZFS on top, there is a quite high chance of corrupted data or a corrupted raid. A hardware raid with BBU (supercap/flash) protection can reduce the risk, but cannot avoid it the way ZFS can.
I'm not entirely convinced. Any data in flight before a filesystem gets it is certainly at risk in case of a power event. The only advantage of copy-on-write is that the original blocks remain unchanged in such a case, hence no potential corruption. The problem, of course, is that there's data that did need to get written, and I can't lose it. (Well, I can per se...but fixing that hole in the dataset is painful.)

So, the choice is what design can prevent that loss of data (to the extent that it's humanly possible), still maintain data integrity, AND maintain performance.

I do agree, there's lots of good things about ZFS, which is precisely why I'm working on this updated design to include ZFS, that will hopefully last for a few years before anything needs to change.
 
Last edited:
  • Like
Reactions: name stolen

gea

Well-Known Member
Dec 31, 2010
3,578
1,406
113
DE
In many ways a hardware raid with cache protection can do similar things to ZFS with sync enabled (you must enable sync to protect the content of the write cache) at the datablock level. It cannot fully offer what a transactional system like ZFS can do at the filesystem and raid level. Due to the lack of end-to-end checksums, it also cannot guarantee data validity the way ZFS can.

But you are right regarding performance. ZFS with sync enabled must write all data twice, which reduces performance. ZFS must also write more data due to the checksums. With mechanical disks you also see lower performance due to higher fragmentation. This is the price for write cache protection in software, whereas a hardware raid can protect the cache in hardware.
 

kapone

Well-Known Member
May 23, 2015
1,799
1,189
113
ZFS with sync enabled
I'm leaning in that direction as well. (Need to run more testing).

Sync enabled (Always), No ZIL, and let ZFS write to the RAID controller directly. The DRAM cache in the controller will offer crash protection in case of a power event.

This way, there's no additional overhead of a ZIL and ZFS is only writing (twice) to RAM and the DRAM cache on the controller. The underlying hardware raid arrays should solve the performance issues vs giving individual disk access to ZFS.

So, essentially, the options are:

1. xx RAID6 hardware raid arrays in the controller, and let ZFS simply stripe over them - But this has a problem, where ZFS cannot do any self-healing as it does not have another copy of the data.

2. Two RAID60 hardware raid arrays in the controller, and let ZFS mirror over them. This way, ZFS is happy that it has another copy of the data and can self heal, performance is not a bottleneck, and...my storage size goes up, due to mirroring. That's acceptable, and I think that's where the benefits of ZFS compression negate the additional storage overhead of mirroring.
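Option 2 might look roughly like this; sdx and sdy are hypothetical names for the two RAID60 virtual disks the controller exports:

```shell
# ZFS mirrors across the two controller-managed RAID60 volumes, so it
# holds a second copy of every block and can self-heal from checksum
# errors; the controller handles the spindle-level redundancy.
zpool create -o ashift=12 tank mirror /dev/sdx /dev/sdy
zfs set compression=lz4 tank
zfs set sync=always tank
```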

Thoughts?
 

gea

Well-Known Member
Dec 31, 2010
3,578
1,406
113
DE
- The DRAM controller cache and the ZFS write cache have different write, size and commit behaviours (no CoW), so better than nothing but far from perfect.
- Sync enabled always logs all committed writes to the ZIL or an Slog. You can use sync=disabled and hope for the best with your DRAM cache.
- A mirror does not increase size; it only offers redundancy, with the same write performance as a single "disk" and twice the read performance.

Maybe you can use ZFS software raid with sync disabled and a UPS, to be at least protected against power outages. If the system crashes, ZFS remains intact, with a few seconds of last writes lost that would otherwise be protected by an Slog (or a special vdev NVMe mirror in newest OpenZFS).
 

kapone

Well-Known Member
May 23, 2015
1,799
1,189
113
The dram controller cache and the ZFS writecache have a different write behaviour, size and commit behaviours (no CoW) so better than nothing but far away from perfect.
I understand that.

Sync enabled always logs all committed writes to ZIL or Slog.
What happens if you enable sync with no ZIL or SLOG? It'll try to write to the disk(s) and RAM concurrently, right?

You can use sync=disabled and hope the best for your dram cache
sync disabled doesn't work for this use case. I can't have ZFS holding data in RAM only.

Mirror does not increase size, only offers redundancy with same writeperformance than a single "disk" and twice the readperformance.
I may not have phrased my statement correctly. Mirroring implies I need more raw storage to store the same amount of data, right? Yes, it's redundant now, but needs twice the amount of space.
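Back-of-envelope, with illustrative numbers only: mirroring halves raw capacity, but the ~5x compressratio reported earlier more than recovers it.

```shell
raw_tb=100
usable_tb=$((raw_tb / 2))        # mirror keeps two copies of everything
effective_tb=$((usable_tb * 5))  # ~5x compression on this dataset
echo "usable=${usable_tb}TB effective=${effective_tb}TB"
# -> usable=50TB effective=250TB
```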

Maybe you can use ZFS software raid with sync disabled and a UPS to be at least protected against power outages.
The infrastructure is in a data center with redundant power already. But that doesn't mean a DC can't die, so you plan for a blast radius around the DC. I have a second line of thought going as well: get a second DC presence >50 miles from the first and use it for DR, including synchronous replication between the two. The costs of course go up due to the need for high-speed network links between the two DCs (most likely 100G).

If the system crashes, ZFS remains intact with a few seconds of last writes lost that are otherwise protected by an Slog (or a special vdev nvme mirror in newest OpenZFS)
That's what I was saying. Why use a SLOG if the controller cache (power protected) is much faster than any other device? And that device will need to be power protected anyway.

What am I missing?

Edit: In "What happens if you enable sync with no ZIL or SLOG?" I meant to say just "no SLOG". Sorry.
 
Last edited:

gea

Well-Known Member
Dec 31, 2010
3,578
1,406
113
DE
What happens if you enable sync with no ZIL or SLOG? It'll try to write to the disk(s) and RAM concurrently, right?
The ZIL is a special, faster part of a ZFS pool and is always there. If you enable sync, ZFS logs committed writes to this ZIL area. You can use a faster Slog instead for this logging. If you do not want the logging at all, sync=disabled is the setting.
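In command form, with tank as a hypothetical pool and hypothetical NVMe log devices:

```shell
zfs set sync=always tank/db        # every committed write hits the ZIL before ack
zpool add tank log mirror /dev/nvme2n1 /dev/nvme3n1   # relocate the ZIL onto a fast Slog
zfs set sync=disabled tank/scratch # skip ZIL logging entirely
```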

What you can do is use a fast Slog, e.g. a DRAM-based one (with power protection), or buy an Intel Optane (it has PLP) if you can get one.
 

kapone

Well-Known Member
May 23, 2015
1,799
1,189
113
The ZIL is a special, faster part of a ZFS pool and is always there. If you enable sync, ZFS logs committed writes to this ZIL area. You can use a faster Slog instead for this logging. If you do not want the logging at all, sync=disabled is the setting.
I know that. I edited my post above to say "No SLOG", sorry, was writing too fast. :)

So, if there's no SLOG, and sync=always, and the pool is all on the raid controller(s), it'll try to write there concurrently with RAM. Correct?