SSD NAS - Broadwell/Haswell vs Rome - Sanity check?


bryan_v

Active Member
Nov 5, 2021
Toronto, Ontario
Hey everyone,

Need a quick sanity check, or maybe a math check:

Which would be the more cost-effective platform for a SOHO SSD NAS/SAN attached to a 10G/40G switch via a Mellanox CX3: a used Broadwell platform with PCIe Gen3 drives (e.g. Intel P4510 or 905P, or Samsung 970 Evo Plus), or a used Rome platform with Gen3 or Gen4 drives (e.g. Intel P5510 or Samsung 980 Pro)?

I'd probably have 2-8 SSDs depending on price and size, with the aim of getting the lowest latency and maxing out bandwidth at the lowest cost.

By my math it makes sense to go with Broadwell because:
  1. Unless I'm willing to bump up to 100G, two P4510 drives will saturate a 40G link on reads when in RAID 0 (or four when in RAID 10); see the quick math after this list. Any new consumer PCIe Gen3 drive will be similar.
  2. A single Gen4 drive can saturate a 40G link.
  3. Rome is still prime tech because no Gen5 server platform is out yet; Rome has more PCIe Gen4 lanes than a cost-comparable Intel platform; and Milan doesn't really offer a big value prop outside of compute. However, with Sapphire Rapids launching this year (2022) and Genoa early next year (2023), both with PCIe Gen5/DDR5, Rome servers will start moving to retirement in significant quantities in late 2023/early 2024, and hence should come down significantly in price in 20-24 months.
  4. Skylake and Cascade Lake, just like Milan, only add value in compute, not in storage; all are PCIe Gen3, albeit only 2x x16 with 1 CPU, or 3x x16 with 2 CPUs.
  5. Ice Lake has PCIe Gen4; however, cost-wise, a storage-focused Rome (e.g. 8c or 12c) will be cheaper and more available second-hand.
  6. Haswell and Broadwell DDR4 platforms are effectively the same cost second-hand depending on the day (~$600 US).
  7. Most VMs either run on my network appliance, an M720q 8c/16t i5-8700 with a CX3, or on a Raspberry Pi node for network segmentation reasons (i.e. IoT, management network controllers, etc.), though I do need a VM host for surveillance video analytics.
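For points 1 and 2, this is the back-of-the-envelope math I'm working from. The per-drive sequential-read figures are round numbers I'm assuming from the spec sheets (~3 GB/s for a P4510, ~7 GB/s for a Gen4 drive like a 980 Pro), and I'm treating RAID 10 reads as scaling with the stripe width only:

import math

# 40GbE is 5 GB/s before protocol overhead; drive figures are assumed
# round numbers for sequential reads.
LINK_40G = 40 / 8   # GB/s
P4510    = 3.0      # GB/s, PCIe Gen3 x4 NVMe (assumed spec-sheet figure)
GEN4     = 7.0      # GB/s, PCIe Gen4 x4 NVMe (assumed spec-sheet figure)

def drives_to_saturate(link_gbps, per_drive_gbps, raid10=False):
    """Smallest drive count whose aggregate read exceeds the link.
    RAID 10 is assumed to read at stripe width, i.e. half the drives."""
    n = math.ceil(link_gbps / per_drive_gbps)
    return 2 * n if raid10 else n

print(drives_to_saturate(LINK_40G, P4510))               # 2 drives, RAID 0
print(drives_to_saturate(LINK_40G, P4510, raid10=True))  # 4 drives, RAID 10
print(drives_to_saturate(LINK_40G, GEN4))                # 1 Gen4 drive saturates 40G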
Is Broadwell the correct platform, or does Rome or Skylake make more sense? Or would a Ryzen 5000/Alder Lake build be the better route?

The only reason I said 10G/40G is that I'm not sure whether I'm going the 40G QSFP+ MSX6036 route (and can successfully swap out the fans), or will have to settle for a cheap Brocade or TP-Link SFP+ switch that runs quiet. I'm running Cat6 and OM3 through new conduits in the house, so both will be fine.

Cheers,
Bryan
 

Spartus

Active Member
Mar 28, 2012
Toronto, Canada
"Which would be more cost effective platform"

Probably an X299 or X99 board, honestly. So yeah, Haswell through Ice Lake. Strictly speaking you could use X79 too; they are all Gen3 with enough PCIe lanes. Not sure I'd recommend buying X79 + DDR3 now, though, for the meager savings.


Ryzen X570 could easily do it too. It won't have lots of room for SSD expansion, so be sure to get a board with PCIe bifurcation options.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Most cost-effective and only 2-8 drives... and lowest latency... that seems like an easy choice... put an NVMe drive in each host and be done :D
 

bryan_v

Active Member
Nov 5, 2021
Toronto, Ontario
"Which would be more cost effective platform"

Probably an X299 or X99 board, honestly. So yeah, Haswell through Ice Lake. Strictly speaking you could use X79 too; they are all Gen3 with enough PCIe lanes. Not sure I'd recommend buying X79 + DDR3 now, though, for the meager savings.
Yeah, I figured compromising down to DDR3 wouldn't offer that many savings but would seriously handicap I/O performance when exposing the volumes via RoCE or NFS. Good to know that math makes sense.

I do like the idea of bumping up to something like Skylake, as on paper there are significant I/O improvements that might have a measurable impact on any software RAID or shared file server: namely 4- vs. 6-channel DDR4, and the PCIe I/O blocks moving from x8 to x16, delivering up to 50+% higher aggregate I/O bandwidth. However, when the FD.io group tested Broadwell vs. Skylake in a network context, the packet-routing numbers under DPDK/VPP showed minimal improvement except when doing VPP L2 cross-connect and routing. FD.io seemed to indicate that because VPP was doing more packet processing, it was benefiting from the IPC uplift.
From: Benchmarking Software Data Planes, Intel® Xeon® Skylake vs. Broadwell. Cisco and Intel. https://www.lfnetworking.org/wp-content/uploads/sites/55/2019/03/benchmarking_sw_data_planes_skx_bdx_mar07_2019.pdf
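To put rough numbers on that 50+% figure, here's a quick sketch. The DIMM speeds are my assumptions (DDR4-2400 on Broadwell-EP vs. DDR4-2666 on Skylake-SP), and ~0.985 GB/s per lane is the approximate usable PCIe 3.0 bandwidth after encoding overhead:

def mem_bw(channels, mts):
    # Theoretical DDR4 bandwidth: 8 bytes per transfer per channel.
    return channels * mts * 8 / 1000  # GB/s

GEN3_LANE = 0.985  # GB/s per PCIe 3.0 lane, approximate usable bandwidth

broadwell = {"memory": mem_bw(4, 2400), "pcie_block": 8 * GEN3_LANE}
skylake   = {"memory": mem_bw(6, 2666), "pcie_block": 16 * GEN3_LANE}

for key in ("memory", "pcie_block"):
    gain = skylake[key] / broadwell[key] - 1
    print(f"{key:11s} BDW {broadwell[key]:6.1f} GB/s  SKX {skylake[key]:6.1f} GB/s  (+{gain:.0%})")

# memory:     76.8 vs 128.0 GB/s (+67%)
# pcie_block:  7.9 vs  15.8 GB/s (+100%)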

It's almost as though I should grab a Broadwell, Skylake, and Rome platform and run regression tests for a storage context. Outside of that FD.io paper I haven't found any substantial research on the subject.

Also if anyone has any experience in layering DPDK or RoCE on top of SPDK, or knows of any research along those lines, can you shoot me a note?
The conclusion I drew from the paper was that if you could achieve a zero-copy, lock-less path between the network and storage subsystems, then the IPC improvements would have a negligible impact.

Oh also:
Ryzen X570 could easily do it too. It won't have lots of room for SSD expansion, so be sure to get a board with PCIe bifurcation options.
Oh yeah, totally agree. I've spent two weeks combing through both Z690 and X570 offerings; they all require high-end configs, and you always get capped at 4 drives. To make the I/O work you would have to use the following config:
  • The GPU x16 CPU lanes have to be broken out into x8/x8 across two slots (i.e. a CrossFire arrangement).
  • One x8 has to go to the CX3. Since it's Gen3, it needs x8 or it'll get bottlenecked, so Gen4/Gen5 is no help here.
  • A PCIe switch on the other CrossFire slot to break the x8 out into x4/x4 to support 2 drives. Gen4/Gen5 is also no help here.
  • An M.2 drive, or an M.2-to-U.2 adapter, in the M.2 slot that connects to the CPU.
  • One M.2 connected to the X570 chipset. Connecting any more doesn't work if you use mirroring, as the chipset link is saturated at that point. Z690 can support two M.2 drives since its DMI is x8-equivalent.
No matter which way I cut it, a high-end config like that ends up very close in price to a Skylake/Cascade Lake setup while offering far fewer PCIe lanes, and is within striking distance of a modest Rome setup; either of those would also get you ECC (which helps a lot in a zero-copy world). Quick lane tally below.
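Here's the lane tally behind that, as I read the config above (not a validated board layout); per-lane bandwidth is approximate PCIe 3.0:

# Lane budget for the hypothetical X570 config described above.
GEN3_LANE = 0.985  # GB/s per PCIe 3.0 lane, approximate usable bandwidth

allocations = {
    "CX3 NIC (CPU x16 split -> x8)":       8,
    "PCIe switch -> 2x U.2 (other x8)":    8,
    "CPU-attached M.2 (x4)":               4,
    "Chipset M.2 (x4, shares DMI uplink)": 4,
}

drives = 2 + 1 + 1  # switch-attached U.2 pair + CPU M.2 + chipset M.2

for name, lanes in allocations.items():
    print(f"{name:38s} {lanes:2d} lanes  ~{lanes * GEN3_LANE:.1f} GB/s")
print(f"Drives supported: {drives}")

# The x8 Gen3 NIC slot (~7.9 GB/s) comfortably covers 40GbE (~5 GB/s),
# but the drive side is already out of room at 4 drives.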
 

Spartus

Active Member
Mar 28, 2012
Toronto, Canada
Actually, just FYI, I do multi-GPU virtualization on my X570, and I can't mix my Gen4 and Gen3 GPUs unless I downgrade both to Gen3. I can't even run the slot 3 GPU at Gen4 if I have a Gen3 card in slot 1/2.
 

Spartus

Active Member
Mar 28, 2012
Toronto, Canada
What I'm trying to say is, the CX4 card might bring your whole system down to PCIe Gen3 speeds, or at least major parts of it (maybe not the NVMe 1 and 2 slots).
 

bryan_v

Active Member
Nov 5, 2021
Toronto, Ontario
That makes sense. When you're doing multi-GPU, you're bifurcating the x16 PCIe I/O block on the CPU link, which is why both slots go down to Gen3.

I'm now torn between an Asus X99-WS/IPMI + Xeon E5-2683 (16c/32t) at ~$600 USD and an ASRock EPC621D8A + Xeon 8124M (18c/36t) at ~$600. They're both the same price with the same PCIe generation, but the Skylake AWS CPU has 6-channel RAM and better IPC.

Am I crazy or is the Xeon 8124M AWS CPU a way better deal?

Does anyone know if any of the Broadwell or Skylake CPU/chipset BIOSes can bifurcate the x16 PCIe slots down to x4/x4/x4/x4 so they can support a U.2 breakout card? (Trying to determine if I need a RAID card or a PCIe switch.)
 

ericloewe

Active Member
Apr 24, 2017
Does anyone know if any of the Broadwell or Skylake CPU/chipset BIOSes can bifurcate the x16 PCIe slots down to x4/x4/x4/x4 so they can support a U.2 breakout card? (Trying to determine if I need a RAID card or a PCIe switch.)
The LGA2011/2066 parts can, though firmware support on Haswell/Broadwell is uneven (in the sense that it may require a custom BIOS image with the CPU configuration edited). I know that Dell added the bifurcation options menu in later BIOS revisions and that ASRock did not.
LGA115x parts cannot; they only do x8/x4/x4.
 

bryan_v

Active Member
Nov 5, 2021
Toronto, Ontario
The LGA2011/2066 parts can, though firmware support on Haswell/Broadwell is uneven (in the sense that it may require a custom BIOS image with the CPU configuration edited). I know that Dell added the bifurcation options menu in later BIOS revisions and that ASRock did not.
LGA115x parts cannot; they only do x8/x4/x4.
Well, that's problematic. Do you know if Skylake + C621 (LGA3647) has platform-wide support for x4/x4/x4/x4 slot bifurcation?
 

ericloewe

Active Member
Apr 24, 2017
I suspect most C621 boards will have extensive bifurcation options, but there's nothing like checking the specific models you come across.
 

nemaddux

New Member
May 29, 2020
That makes sense. When you're doing multi-GPU, you're bifurcating the x16 PCIe I/O block on the CPU link, which is why both slots go down to Gen3.

I'm now torn between an Asus X99-WS/IPMI + Xeon E5-2683 (16c/32t) at ~$600 USD and an ASRock EPC621D8A + Xeon 8124M (18c/36t) at ~$600. They're both the same price with the same PCIe generation, but the Skylake AWS CPU has 6-channel RAM and better IPC.

Am I crazy or is the Xeon 8124M AWS CPU a way better deal?

Does anyone know if any of the Broadwell or Skylake CPU/chipset BIOSes can bifurcate the x16 PCIe slots down to x4/x4/x4/x4 so they can support a U.2 breakout card? (Trying to determine if I need a RAID card or a PCIe switch.)
My answer is the ASRock EPC621D8A + Xeon 8124M all day long. I just ditched my old HP Gen9 Broadwell server for this same setup and have not looked back. I think these are an amazing deal, and I would not be surprised if we see prices rise on the 8124M chips.

Asrock says "
SLOT6: PCIe3.0 x16, auto-switch to PCIe3.0 x8 when SLOT5 occupied
SLOT4: PCIe3.0 x16, auto-switch to PCIe3.0 x8 when SLOT3 occupied "

So I would think the PCIe slots could run at x8/x8/x8/x8/x4, but I would not swear by it and don't have a means of testing that configuration.