Intel VROC, 2023

This was prompted by some discussion in the comments section of the Dell 760 server review - basically, that there's not enough data available about VROC in a production (i.e., not home-user) setting.

I have some experience with VROC in a production setting. Windows Server 2019. Only 1 box (we're a small shop). Purchase was late 2021.

See here for the Intel landing page:

Intel® Virtual RAID on CPU (Intel® VROC) Enterprise RAID Solution

Note the significant limitations, including:

Only specific drive models supported, RAID 10 limited to 4 drives.

My goal was a RAID 5 setup, as that level of expected performance (roughly equal to one NVMe drive) should be adequate. RAID 5 is advertised up to 24 drives. Cool, I thought, I can start with 4 drives and expand as needed. That's what I did, ultimately expanding to 5 and then 6 drives after months of problem-free operation on 4 drives.

The performance was about as expected and was sufficient for my application.

The unexpected part is that for RAID 5 arrays larger than 4 drives, VROC defers at least some of the parity calculations rather than performing them on the fly. This is undocumented (as far as I could find). To run these calculations, you have to execute a function in the VROC driver GUI, “verify and repair”. The larger the array, the more “errors” are found and fixed with each run of this function - we're talking hundreds to thousands on 5- and 6-drive arrays, and the process takes ~12 hours to complete. That's after writing only a few TB (1-2) to the array.

How to Verify Data and Repair Inconsistencies on Existing RAID Volume...

RAID Volume Data Verify and Repair
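
For anyone unfamiliar with what that verify pass is actually checking, here is a minimal, purely illustrative Python sketch (not VROC's implementation) of the consistency test a RAID 5 verify performs: the XOR of the data chunks in every stripe must equal the stored parity chunk, and each stripe where it doesn't is counted as an error.

```python
# Toy illustration only (not VROC code): the consistency check behind a RAID 5 verify.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5 parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def stripe_is_consistent(data_chunks, parity_chunk):
    """A 'parity error' is any stripe where recomputed parity != stored parity."""
    return xor_blocks(data_chunks) == parity_chunk

# Example: one stripe of a 4-drive RAID 5 array = 3 data chunks + 1 parity chunk.
data = [bytes([v]) * 4096 for v in (1, 2, 4)]
parity = xor_blocks(data)                       # what should sit on the parity drive
print(stripe_is_consistent(data, parity))       # True  - clean stripe
print(stripe_is_consistent(data, bytes(4096)))  # False - stale parity, flagged on verify
```

If parity updates are deferred on writes, a verify pass will naturally keep finding stripes like the second case until it rewrites the parity.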

I was disturbed by this and assumed it was indicative of a malfunction. I backed up the data and rebuilt the array. I updated firmware, BIOS, and drivers. I added and removed drives to try to isolate the problem. The problem persisted - parity errors (but no bad blocks) whenever I tested with 5 or more drives. Since this is my only such hardware and it is a production backup server, this was unpleasantly stressful.

Finally, I opened a ticket and worked through a support process with Supermicro and Intel. Intel ultimately replied that it was “expected behavior”. These calculations were deferred for performance reasons. This was news to me and apparently to Supermicro.

This was unacceptable to me, so I ended up splitting the array into two separate 4-drive RAID 5 arrays, costing me some additional complexity plus another drive lost to parity. Since there's no shrinking a ReFS volume, this was a pain. All told, I probably wasted 60 hours on this, given the quantity of data that was already on the array and the time to move it around.

This problem was hinted at in some old Intel VROC forum posts. See the last post in this thread: VROC - RAID5 - Parity Errors

If I had a do-over I would not have gone with VROC. Since we do not use *nix in our shop, I probably would have backed off to SAS or (edit: SATA) SSD Hardware RAID.

Additionally, as we saw here: https://www.servethehome.com/intel-...ng-vroc-around-sapphire-rapids-server-launch/ I strongly suspect VROC is not long for this world. If so, this implies it will no longer be developed and we can expect support to suffer.

VROC is now up to version 8.0, but good luck finding much about it on Intel’s website. They seem to have pushed primary support to the OEMs?

This appears to be the most useful landing page I have found: Resources for Intel® Virtual RAID on CPU (Intel® VROC)

Notice how no new drives were added in the move from 7.8 to 8.0 (and really, drives added since 7.5 are paltry):

https://www.intel.com/content/dam/s...ware/Intel_VROC_VMD_Supported_Configs_8_0.pdf

Apologies if this comes off as rambling, but it's a long story. I am not an expert, but happy to answer any questions I can.
 

nasbdh9

Hardware RAID is still predominantly used with SATA SSDs.

If RAID is not required, a single NVMe drive is usually the best solution.

If you must have hardware RAID, and RAID 5/6 on SATA SSDs can't meet your performance needs, only then would you consider hardware RAID with NVMe (but trust me, the difference is not that big for most use cases).

SATA/SAS drives can also use a SAS expander to add ports (expanders are very inexpensive in 2023), while expanding ports for NVMe hardware RAID requires a PCIe switch or one of those drive cages with dedicated x1/x2 ports.

SANs are similar: in general SATA just becomes SAS (dual-linked to support primary and secondary controllers), while NVMe requires drives that support x2/x2 dual-port mode.

If the environment does not strictly require hardware RAID, a software solution should be preferred.
 

DevereauxA

Hardware RAID is still predominantly used with SATA SSDs.

If RAID is not required, a single NVMe drive is usually the best solution.

If you must have hardware RAID, and RAID 5/6 on SATA SSDs can't meet your performance needs, only then would you consider hardware RAID with NVMe (but trust me, the difference is not that big for most use cases).
A statement often expressed in the past - however, it would seem that SSD manufacturers are moving past this quickly. The majority of their development focus appears to be on NVMe, with SAS a third-string afterthought. Some of the most cost-effective drives per TB at 1 DWPD or 3 DWPD are in the NVMe space, and certainly so if you factor in performance. Two Micron 9400 Pro SSDs in RAID 1 would provide effective redundancy and stellar performance for database/OLTP workloads.

Certainly, Dell has put a decent amount of PR into their PERC12 controllers, sponsoring performance papers such as the ones from Principled Technologies. I understand SAS (not even going to talk about SATA or IDE for that matter) providing sufficient performance for many use cases, such as a basic file server, but that's not my experience in the database world; NVMe can make an absolute world of difference in practical performance.

What I don't like seeing is vendors not wanting to upset the apple cart and avoiding these potentially disruptive technologies. You mention SAN vendors, and they are a perfect example of trying to hold back technology for their own profit. Their artificial fragmentation of performance tiers and great reluctance to adopt SSDs at all, or in any meaningful way, is reminiscent of what we are seeing here again.

Drive lock-in is another black eye for some vendors. Again, I understand and support OEM drive premiums if warranted; in the enterprise HDD space, vendors could get away with justifying a 20-50% premium to cover their validation and warranty costs. But right now, in some cases Dell has a 4-6x multiplier on NVMe drives! Charging $12k for a $3k Kioxia CM6, or $8,960 for a $1,300 (at most) Solidigm D7-P5520.

They have to see current enterprise NVMe drive pricing as disruptive to much of their overall stack. A single R6615 with an EPYC 9554P, 384-768GB of RAM, and two Micron 9400 MAX drives in RAID 1 is a killer box that covers a massive amount of SMB and edge computing needs with very low power and space requirements (1U). That model built out with just two 15.36TB Kioxia drives and twelve 32GB DIMMs: they currently want $70k with a 3-year warranty. I'm historically a big proponent of Dell for business use, but this is ridiculously uncompetitive and essentially insulting.

So, that leads me to the question: if I buy an R6615 with small NVMe drives and swap in a couple of Micron 9400 MAX/Pro drives, will it accept them and allow RAID to function?
 
The reason I went NVMe on the Supermicro backup box was exactly what @DevereauxA mentioned - the price was about a wash with other SSD interfaces/form factors, and the performance capability was multiples better.

In my case I was unsure how much storage we needed, so I needed to be able to expand it. At the time I think 15TB was the max supported drive size - too small (just checked - yes, even today there is only one 15TB model on the approved list). Today I could get by with multiple 30TB RAID 1 setups (that's enough space per volume to get good dedupe from Veeam/ReFS). Anyway, 7.68TB per drive ended up making sense with a RAID 5 approach.
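
For context, the capacity math I was working with (my own numbers, nothing vendor-specific) - RAID 5 usable space is roughly (n - 1) x drive size:

```python
# Rough usable-capacity comparison for the options discussed above (illustrative only).
drive_tb = 7.68
for n in (4, 5, 6):
    print(f"RAID 5, {n} x {drive_tb} TB drives -> ~{(n - 1) * drive_tb:.2f} TB usable")

# vs. a mirrored pair of large drives: usable space is one drive's capacity.
print("RAID 1, 2 x 30.72 TB drives -> ~30.72 TB usable")
```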

Definitely considering the RAID 1 approach for our next general-purpose VM server. The current one is 4 years old, a Dell PERC H740 with SATA SSDs.
30TB in RAID 1 would be enough, but not at those drive prices... Is MS software RAID 1 OK in production? MS software RAID is generally maligned for non-clustered setups (no Storage Spaces Direct), but that was always in the context of parity RAID.
 

DevereauxA

I did some quick testing of Windows software RAID with two SK Hynix P41 Platinums. This was on a Precision T5820, as it was readily available to me, though it is unfortunately limited to PCIe Gen 3. Overall performance was lower than expected for these drives, but I'll retest on a PC with PCIe Gen 4 when I have time.

Attached are 4 CrystalDiskMark runs:

  1. Single drive
  2. Mirrored drives with 64K format (using Disk Management)
  3. Mirrored drives with default format (using Disk Management)
  4. Mirrored drives with default format utilizing Storage Spaces with a 2 way mirror
Watching CPU usage, it maxes out one core (i7-9800X) when running the Random 4K Q32T1 test. I'll re-run on a newer PC and use the NVMe profile as well.
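
For anyone wanting to reproduce configuration 4 without clicking through the GUI, here is a rough sketch of the equivalent Storage Spaces setup driven from Python via PowerShell. Treat it as an outline under assumptions - the pool/volume names are placeholders, it assumes the two test drives are the only poolable disks, and cmdlet defaults can differ between Windows versions - not a tested recipe.

```python
# Sketch: script the Storage Spaces two-way mirror (configuration 4 above) instead of
# using the GUI. Run from an elevated prompt; "TestPool"/"Mirror" are placeholder names.
import subprocess

def ps(command: str) -> None:
    """Run a PowerShell command and fail loudly if it errors."""
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

# Pool every disk Windows reports as poolable - make sure only the test drives qualify!
ps('New-StoragePool -FriendlyName "TestPool" '
   '-StorageSubSystemFriendlyName "Windows Storage*" '
   '-PhysicalDisks (Get-PhysicalDisk -CanPool $true)')

# Two-way mirrored volume; the 64K allocation unit mirrors the "64K format" run above,
# and the size is just a placeholder.
ps('New-Volume -StoragePoolFriendlyName "TestPool" -FriendlyName "Mirror" '
   '-ResiliencySettingName Mirror -NumberOfDataCopies 2 '
   '-FileSystem NTFS -AllocationUnitSize 65536 -Size 500GB')
```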
 



nasbdh9

So, that leads me to the question: if I buy an R6615 with small NVMe drives and swap in a couple of Micron 9400 MAX/Pro drives, will it accept them and allow RAID to function?
You need an NVMe backplane, and AMD doesn't have anything like VROC on EPYC (although VROC doesn't really count as HW RAID compared to the Broadcom and Microsemi offerings).

HW RAID would require an H965i/H755 or something like that; for direct CPU access you'd use SW RAID.

For drives that are not Dell OEM, the health won't even show up in iDRAC.

But HPE iLO 5 (and maybe iLO 6? I have not yet touched Gen11 devices) already has good support for non-HPE OEM drives (temperature and health values are displayed correctly); Dell lags behind on this.
 

DevereauxA

You need an NVMe backplane, and AMD doesn't have anything like VROC on EPYC (although VROC doesn't really count as HW RAID compared to the Broadcom and Microsemi offerings).

HW RAID would require an H965i/H755 or something like that; for direct CPU access you'd use SW RAID.

For drives that are not Dell OEM, the health won't even show up in iDRAC.

But HPE iLO 5 (and maybe iLO 6? I have not yet touched Gen11 devices) already has good support for non-HPE OEM drives (temperature and health values are displayed correctly); Dell lags behind on this.
So in your experience non-Dell NVMe drives don't show health, but it will let you create a RAID volume with them that "works"?

This is what Dell does provide as "drive requirements":

Dell PowerEdge RAID Controller 11 User’s Guide PERC H755 adapter, H755 front SAS, H755N front NVMe, H755 MX adapter, H750 adapter SAS, H355 adapter SAS, H355 front SAS, H350 adapter SAS, H350 Mini Mo | Dell UK

Conditions under which a PERC supports an NVMe drive
  • In NVMe devices, a namespace with namespace identifier 1 (NSID=1) must be present.
  • In NVMe devices with multiple namespace(s), you can use the drive capacity of the namespace with NSID=1.
  • The namespace with NSID=1 must be formatted without protection information and cannot have the metadata enabled.
  • PERC supports 512-bytes or 4 KB sector disk drives for NVMe devices.
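
Not a Dell-sanctioned procedure, but if you can temporarily attach a candidate drive to a Linux box with nvme-cli, you can at least eyeball those conditions before putting it behind a PERC. A rough sketch (device paths are placeholders; it just surfaces the fields to read manually rather than parsing them, since output formats vary between nvme-cli versions):

```python
# Rough pre-flight check of a candidate NVMe drive against the PERC conditions above,
# using nvme-cli on a Linux host. /dev/nvme0 and /dev/nvme0n1 are placeholders.
import subprocess

def run(cmd):
    print("$ " + " ".join(cmd))
    subprocess.run(cmd, check=True)

# A namespace with NSID=1 must exist - it should appear in this list.
run(["nvme", "list-ns", "/dev/nvme0"])

# Inspect namespace 1: the in-use LBA format should show a data size of 512 or 4096
# bytes, metadata size 0, and protection information disabled.
run(["nvme", "id-ns", "/dev/nvme0n1", "--human-readable"])

# If needed, the namespace can usually be reformatted to a plain 512B/4K format with no
# protection info (destructive!), e.g. "nvme format /dev/nvme0n1 --lbaf=0 --pi=0".
```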
 

nasbdh9

but it will let you create a RAID volume with them that "works"?
Non-Dell OEM drives can indeed be used to create a RAID volume.
But iDRAC may be unable to get the drives' status, and will then spit out error messages or ramp up fan speeds, increasing noise.
 

DevereauxA

Non-Dell OEM drives can indeed be used to create a RAID volume.
But iDRAC may be unable to get the drives' status, and will then spit out error messages or ramp up fan speeds, increasing noise.

Ah yes- the errors are the critical part and where this would fall apart for my purposes. Thank you!