Best Approach to Cheap Distributed NVMe?


rich0

New Member
Mar 17, 2023
15
2
3
I'm running Ceph and trying to move towards NVMe for future expansion. I'm not super-concerned with performance, and at least initially they'll probably be mixed with enterprise SATA SSDs - it just seems like the trend is towards NVMe, and the cost of a U.2 SSD isn't really any higher than for any other format.

Since I'm running Ceph my goal is to minimize the per-host overhead, and to a point more hosts are better than fewer. Servers actually marketed for hosting NVMe drives still seem to be pretty pricey.

Main requirements are at least 4-8GB RAM per NVMe, SFP+ support (can be via a NIC), and then as many enterprise NVMe drives (PLP, decent write endurance, used is fine) as possible to amortize the host overhead. CPU isn't much of a concern for these, and integrated graphics would be preferred just to keep the PCIe slots free.
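For what it's worth, the 4-8GB figure roughly tracks BlueStore's default osd_memory_target of 4 GiB per OSD; here's the rough per-host math I'm working from (the headroom numbers are just my own assumptions):

Code:
# Rough per-host RAM budget for NVMe OSDs (a sketch; tune to your own cluster).
# BlueStore's osd_memory_target defaults to 4 GiB per OSD; leave headroom for
# the OS, any colocated mons/mgrs, and recovery/backfill spikes.

OSD_MEMORY_TARGET_GIB = 4      # BlueStore default memory target per OSD
OS_AND_SERVICES_GIB = 8        # assumption: OS plus any colocated daemons
HEADROOM_FACTOR = 1.5          # assumption: slack for recovery and backfill

def host_ram_needed(num_osds: int) -> float:
    """Conservative RAM estimate (GiB) for a host running num_osds NVMe OSDs."""
    return num_osds * OSD_MEMORY_TARGET_GIB * HEADROOM_FACTOR + OS_AND_SERVICES_GIB

for osds in (4, 8, 12):
    print(f"{osds} OSDs -> ~{host_ram_needed(osds):.0f} GiB RAM")
# 4 OSDs -> ~32 GiB, 8 OSDs -> ~56 GiB, 12 OSDs -> ~80 GiB

So a 64GB box comfortably covers 4-8 NVMe OSDs.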

I notice I can get used 64GB workstations like the Z640 for maybe $200 or so with enough PCIe to handle an SFP+ NIC and at least one HBA, and there are a couple of options for putting NVMe drives in an x16 slot. These include U.2 adapters, M.2 adapters, and so on - typically 4 per x16 slot. Getting four large U.2 SSDs into a single host would be quite a bit of capacity. There are also proper tri-mode HBAs that will do U.2, though really I just care about NVMe (the SAS drives don't seem to be any cheaper). The Z640 at least seems to support bifurcation, which is probably something to keep an eye on for the cheaper adapter options.

Is this the best way to go for something like this? It seems like the systems targeted at U.2 storage are still new enough to be very expensive, even used, though some of those solutions can fit a very large number of drives and of course tons of RAM.
 

nabsltd

Well-Known Member
Jan 26, 2022
428
291
63
Since you are looking at systems/chassis that are taller than 2U, and are likely modern enough to support bifurcation in the BIOS, then something like this installed in an x8 slot (set to x4x4 in the BIOS) will likely be the cheapest way to get it working. If the motherboard layout works, install two of these cards separated by one slot, and point a 120mm fan at the four drives.

HBAs that support connecting NVMe drives have very specific cabling/backplane requirements to get more than one drive per four PCIe lanes, and if you don't get that kind of density the price of the HBA is hard to justify.
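Once it's installed, a quick way to confirm the slot actually split into x4/x4 and every drive enumerated at full width is to read it out of sysfs; a small sketch (Linux, standard sysfs attributes, but verify on your own system):

Code:
# Count NVMe controllers and check their negotiated PCIe link width (Linux sysfs).
# A sketch only; run it as a sanity check after setting bifurcation in the BIOS.
import glob
import os

controllers = sorted(glob.glob("/sys/class/nvme/nvme*"))
print(f"Found {len(controllers)} NVMe controllers")
for ctrl in controllers:
    pci_dev = os.path.realpath(os.path.join(ctrl, "device"))  # PCI device behind the controller
    model = open(os.path.join(ctrl, "model")).read().strip()
    width_file = os.path.join(pci_dev, "current_link_width")
    width = open(width_file).read().strip() if os.path.exists(width_file) else "?"
    # Each drive behind a properly bifurcated slot should report x4.
    print(f"{os.path.basename(ctrl)}  {os.path.basename(pci_dev)}  x{width}  {model}")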
 

Sean Ho

seanho.com
Nov 19, 2019
774
357
63
Vancouver, BC
seanho.com
Only 4x NVMe per node isn't too bad, but don't underestimate how much CPU the NVMe OSDs will use. Another option if you need more capacity is to use spinners for OSDs, with DB/WAL on a partition of NVMe.
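If you go that route, provisioning is just ceph-volume with --block.db pointed at an NVMe partition (or LV); a minimal sketch with placeholder device names - adapt to your own layout:

Code:
# Minimal sketch: spinner OSDs with RocksDB/WAL on NVMe partitions via ceph-volume.
# Device names are placeholders; this consumes the devices, so test carefully.
import subprocess

# One spinner per OSD, each paired with its own NVMe partition for block.db.
layout = {
    "/dev/sda": "/dev/nvme0n1p1",
    "/dev/sdb": "/dev/nvme0n1p2",
    "/dev/sdc": "/dev/nvme0n1p3",
}

for data_dev, db_part in layout.items():
    # --block.db holds RocksDB (and the WAL too, unless --block.wal is given separately).
    cmd = ["ceph-volume", "lvm", "create", "--bluestore",
           "--data", data_dev, "--block.db", db_part]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)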
 

ano

Well-Known Member
Nov 7, 2022
654
272
63
budget? usage? data needs? use case?

2U servers with 4x NVMe U.2 can be had quite cheap.
 

ano

Well-Known Member
Nov 7, 2022
654
272
63
2x fast Gen4 NVMe can use as much as 50% of a 7402 CPU, with 25G NICs! Quite interesting really; it loves CPU.

We are doing 7313 CPUs with 4x NVMe per HCI host now, due to licensing... so yeah :|
 

rich0

New Member
Mar 17, 2023
15
2
3
budget? usage? data needs? use case?

2U servers with 4x NVMe U.2 can be had quite cheap.
Mostly CephFS, so performance isn't super-critical. Some block storage, but it's relatively light.

Can you make suggestions on 2U servers with 4x NVMe? It's actually kinda hard to find this stuff - nobody has filters for U.2 ports or free PCIe slots (granted, an actual server won't have trouble with the PCIe slot bit, but it does matter for consumer CPUs).
 

rich0

New Member
Mar 17, 2023
15
2
3
The Supermicro 826 has some NVMe backplane options; the one with 4x 3.0 slots (the other 8 are 12G SAS) is around $100 used, and there's one with 12x 5.0 for $230 new: Supermicro BPN-NVME5-LA26A-S12 Hybrid Backplane for X13 Generation Servers. Some 826 variants also support dual rear 2.5" bays that can be had with their own NVMe backplane, but for another $100 (new) it's not a fantastic value.
That's just the cost of a backplane though? I'm guessing a used functional 826 would cost a fair bit more on top of that. However, that backplane would definitely handle a large number of drives, which is a good sign. At larger scale that would make sense, but I couldn't put more than a few drives in one host until I have enough hosts to balance them. The overhead of the host would definitely go down if I actually put 12 drives in it...
 

nexox

Well-Known Member
May 3, 2023
692
283
63
The 826 pricing varies based on age and features; I got a reasonable deal on one with a Broadwell system and the rear SAS bays for around $300 shipped, though lately they seem to go for a bit more. Obviously if you want PCIe 4 or 5 the board and CPU(s) will cost somewhat more.
 

rich0

New Member
Mar 17, 2023
15
2
3
The 826 pricing varies based on age and features; I got a reasonable deal on one with a Broadwell system and the rear SAS bays for around $300 shipped, though lately they seem to go for a bit more. Obviously if you want PCIe 4 or 5 the board and CPU(s) will cost somewhat more.
So, that's another concern I have about trying to modify a server - none of this stuff seems to be standard. I had never heard of an "826" until I read your post, and it appears to be the model number of a chassis, which apparently could have different motherboards in it. If I were to buy something used that had exactly what I needed the risk would be low, but if I wanted to start buying parts I'd obviously need to find compatible ones, and finding a deal would be harder. It is of course easier to find older used servers that are fitted for HDDs than for NVMe. I definitely wasn't expecting to get PCIe 5, though I don't mind mixing and matching hardware as long as it operates at the lowest common version. Getting 4x v3 speeds would already be a significant step up over what I have now.

I'm mainly interested in U.2 because it is pretty hard to find higher capacity drives in anything else, and of course the additional performance is nice too.
 

nexox

Well-Known Member
May 3, 2023
692
283
63
Anything suitably useful is going to be proprietary to some degree (backplanes, PSUs), but Supermicro at least keeps their stuff compatible for a long time, and they have 3.5" trays with extra holes that fit 2.5" drives with inexpensive (or 3D-printed) bay adapters.

Only one of these is available, and it's a proprietary form factor, but that may be a bonus because replacement/upgrade motherboards are cheaper. It already has the quad NVMe backplane and probably the 2.5"-compatible trays: SuperMicro 2U Barebone Server w/ X10DDW-IN w/ NVME Support Dual 1620W PWS | eBay
 

ano

Well-Known Member
Nov 7, 2022
654
272
63
The Supermicro 826 and 829 come with N4 backplanes, can be had for $200-300, and can fit an H12 or H11 motherboard or similar, so they end up decently cheap: under $1k with an H11 and just over $1k with an H12 (motherboard, chassis, CPU, RAM).

Stay away from the models requiring risers; they are hard to get parts/swaps for.

The cheapest nodes from Dell would be the R7515, which can be had for ~$2k each.

For HPE you can get the backplanes for Gen10 and Gen10 Plus quite easily and affordably as well, but not as cheap.


Also, are you going for U.2 or M.2? U.2 can usually be had much more easily.


We don't use much CephFS yet, but we use RGW/S3, and RBD for VM images.

For RGW and S3 we use either 1x 15.36TB for 11 or 15 spinners, or 2x 7.68TB for 10 or 14 spinners.
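Rough math on that ratio for anyone copying it (a sketch, reading those flash drives as shared DB/WAL devices; exact partitioning differs per box):

Code:
# Back-of-the-envelope flash-per-spinner ratio for the configs above, read as
# one shared flash device (or pair) serving DB/WAL for a set of spinner OSDs.
configs = [
    ("1x 15.36TB", 15.36, 11),
    ("1x 15.36TB", 15.36, 15),
    ("2x 7.68TB", 15.36, 10),
    ("2x 7.68TB", 15.36, 14),
]
for label, flash_tb, spinners in configs:
    per_osd_gb = flash_tb / spinners * 1000
    print(f"{label} for {spinners} spinners -> ~{per_osd_gb:.0f} GB of flash per OSD")
# Works out to roughly 1.0-1.5 TB per spinner; compare against your spinner size
# (older BlueStore guidance suggested block.db of roughly 4% of the data device
# for RGW-heavy use).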

For RBD it's U.2 NVMe or SAS SSD only.

We have tested new enterprise M.2 as well, and it's turning out very well! You get high-TBW/DWPD M.2 now, like the Micron 7450 series and PM9A3, so we have some 60-drive + 4x 3.84TB M.2 systems running, all of the M.2 on a quad adapter. We've also tested as high as 20x M.2 7450 Pro per system; the fio numbers are fun, at least. iostat turns out funny with 82 drives per system (boot included).

Mixing slower SAS SSDs or even SATA SSDs with fast U.2 will decrease results, but it's possible if you don't require that much IO.

And yes, U.2 NVMe has become cheap! But we now pay 2-3x as much for a Kioxia CD8 as when it came out - what's up with that?
 
Last edited:

Chriggel

Member
Mar 30, 2024
64
22
8
Since I'm transitioning from SATA and SAS to NVMe as well, I've looked into possible options for this. A 2U chassis with 24x U.2 bays is something I'm considering further down the road, but for now the system doesn't need to be rackmounted and I'll be using a Meshify 2 XL, so I was looking for solutions with and without cables to distribute drives between the PCIe area of the board and the drive bays of the case.

The JEYI 4x U.2 PCIe x16 adapters work well and are very cost effective: https://www.aliexpress.com/item/1005005574030013.html
Prices change from day to day; I've bought three, all below 40 EUR. It's at 45 EUR today, but that's still reasonable. Locally available alternatives are about 3x the price, like this Delock part: Delock Produkte 90169 Delock PCI Express 4.0 x16 Karte zu 4 x intern U.2 NVMe SFF-8639 - Bifurcation (LxB: 288 x 122 mm)

For a solution with MiniSAS HD cables, I got this: https://www.aliexpress.com/item/1005005653918833.html
The first cables I tried, by FOVORE, were pretty bad: they had MiniSAS HD connectors that didn't really fit, and when I made them fit they still wouldn't work. So either there still wasn't a proper physical connection because of the poor fit, or the cables had a faulty pinout on top of it. As of today, those cables are no longer available after I reported the problem.
These cables worked: https://www.aliexpress.com/item/32819343916.html
So far I've only tried the 50cm version; I still need to figure out if I need the 100cm ones. The adapter (obviously) doesn't come with a redriver, so I want to keep the cables as short as possible.
Again, this is much more cost effective than buying for example this: Delock Produkte 90077 Delock PCI Express 4.0 x16 Karte zu 4 x SFF-8639 NVMe - Bifurcation - Low Profile Formfaktor
Granted, the Delock part uses a redriver and MCIO connectors, but the set for 4 drives is close to 400 EUR while the solution with the parts from Aliexpress is under 100 EUR and seems to be working just fine (with 50cm cables).

With the Asus WS C621E that I'll be using, I think I can fit 26 U.2 drives. I don't necessarily expect the drives to operate at full PCIe 4.0 speeds even if the platform supported it (which it doesn't), because of the cheap parts I'm using, but I'm not really concerned about bandwidth at these levels. That's more than enough drives and bandwidth for me at the moment. I could even fit additional HDD storage if I wanted to do some storage tiering or whatever use case I could come up with. The board has the connectivity and the case has all the space in the world.
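An easy way to see whether the cheap adapters and longer cables make a link train down is to read the negotiated PCIe speed from sysfs; a small sketch (standard Linux sysfs attributes, verify on your own system):

Code:
# Check whether any NVMe drive has trained down from its maximum PCIe link speed,
# e.g. due to marginal cabling. A sketch using standard Linux sysfs attributes.
import glob
import os

for ctrl in sorted(glob.glob("/sys/class/nvme/nvme*")):
    pci_dev = os.path.realpath(os.path.join(ctrl, "device"))
    current = open(os.path.join(pci_dev, "current_link_speed")).read().strip()
    maximum = open(os.path.join(pci_dev, "max_link_speed")).read().strip()
    flag = "OK" if current == maximum else "check cabling?"
    print(f"{os.path.basename(ctrl)}: {current} (max {maximum}) - {flag}")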

So, maybe that's an option for you as well.
 

rich0

New Member
Mar 17, 2023
15
2
3
This is admittedly getting a bit outside the bounds of "cheap", but you can fit 10xU.2 drives in 1U and you don't need to add any more cables or adapter cards, which do start to add up in price when you're looking at lots of drives: Supermicro SuperServer 1029U-TN10RT 2x LGA3647 w/10x 2.5" NVMe Bays 2x M.2 Slots | eBay
Yeah - something like that was what I was looking at in terms of more "off the shelf" solutions, but they're still pretty pricey. I do look forward to when that hardware starts dropping in price, as it eventually will.

I'm also at the point where I want to scale horizontally more than vertically with erasure coding/etc, so a cheaper node that can handle 4 U.2 drives is a better option for me right now than an expensive node that can handle 10+.


The overhead isn't that bad as you point out if you really do use all those slots.
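Rough numbers on why the host count matters for EC (a sketch; the 4+2 profile and drive size are just example figures):

Code:
# With failure domain = host, an EC pool needs at least k + m hosts, and usable
# capacity scales with k / (k + m). Example profile and drive size only.
def ec_usable_tb(k: int, m: int, hosts: int, drives_per_host: int, drive_tb: float):
    if hosts < k + m:
        return None  # not enough hosts to place each shard on its own host
    raw = hosts * drives_per_host * drive_tb
    return raw * k / (k + m)

for hosts in (4, 6, 8, 10):
    usable = ec_usable_tb(k=4, m=2, hosts=hosts, drives_per_host=4, drive_tb=7.68)
    if usable is None:
        print(f"{hosts} hosts x 4x 7.68TB: needs at least 6 hosts for EC 4+2")
    else:
        print(f"{hosts} hosts x 4x 7.68TB: ~{usable:.0f} TB usable")
# So adding small 4-drive hosts grows both capacity and the failure domains EC needs.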
 

dooferorg

New Member
Jul 7, 2020
15
4
3
What I've been working on is putting together 5 systems of the SYS-1027GR-TRT type, since there are 3 PCIe x16 slots in each. Unfortunately the system does not support bifurcation, but it will at least allow me to get a Ceph/Proxmox cluster off the ground. I found the systems for around $100 on eBay and the processors were like $7 each. It's DDR3, so again, not very expensive. A U.2 drive or two per system will give me enough SSD-backed storage to run the VMs I wish to run. 10GbE fiber cards are relatively cheap as well.

This is my work in progress at least, and it's coming along well in the basement :D I may make a post on /r/ceph once it's done and I can benchmark it :D