Server for 2-4 NVMe for Ceph


rich0

New Member
Mar 17, 2023
What is the best option these days for an OSD server for Ceph to host 2-4 NVMes for minimal cost? This would be running Rook.

Things I would want:
2-4 NVMes (M.2 or U.2 I guess, though it seems hard to find large enterprise M.2 format SSDs)
32-64GB RAM
At least 1 SFP+ port, though SFP28 would be better. That is already going to be a bottleneck, but I don't mind a bit of sacrifice to save money here.
I'd prefer something x86_64-based

I'd rather have more, smaller nodes than one giant one, since I'll need multiple nodes anyway for redundancy. More NVMe drives are of course always nice if they don't drive up the price, but Ceph would need more RAM to use them. I don't care if some of this is accomplished with PCIe cards/switches/etc. as long as it can saturate the NIC.

On a side note, I'm interested in where you can even buy larger enterprise NVMe drives in M.2 format - they seem easier to find in U.2. I realize you can adapt between the two form factors.

The new NAS appliance that was just reviewed seems like a possible option, though it can only hold 2 NVMe drives and it sounds like the NIC is fussy with recent Linux versions. It also sounds like setting it up would be a bit of a pain with only a VGA output. I wouldn't mind something cheaper per M.2 slot.
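For the RAM figure I'm just going off a rough sketch like the one below, assuming Ceph's default osd_memory_target of about 4 GiB per OSD (the per-node overhead number is my own guess, and actual usage can spike above the target during recovery/backfill):

```python
# Rough RAM sizing for a small Ceph OSD node (a sketch, not a benchmark).
GIB = 1024 ** 3

osd_memory_target = 4 * GIB   # Ceph's default per-OSD memory target (~4 GiB)
os_and_overhead = 8 * GIB     # OS, Rook/k8s daemons, headroom (my own guess)

for osds_per_node in (2, 3, 4):
    total = osds_per_node * osd_memory_target + os_and_overhead
    print(f"{osds_per_node} OSDs -> ~{total / GIB:.0f} GiB RAM minimum")
```

So 2-4 OSDs lands around 16-24 GiB before any headroom, which is why I'm aiming at 32-64GB.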
 

nexox

Well-Known Member
May 3, 2023
There are lots of potential options out there. To help narrow it down, can you tell us how you prioritize physical size, energy consumption, noise, and hot swap for U.2?

Unfortunately M.2 doesn't have a lot of utility in enterprise hardware beyond boot drives, so those drives don't really reach high capacities. Also, a lot of the very affordable options are going to involve old server hardware, which means a VGA display, though if you get a modern enough board with IPMI, the remote display access will probably work well enough.
 

rich0

New Member
Mar 17, 2023
There are lots of potential options out there. To help narrow it down, can you tell us how you prioritize physical size, energy consumption, noise, and hot swap for U.2?

Unfortunately M.2 doesn't have a lot of utility in enterprise hardware beyond boot drives, so those drives don't really reach high capacities. Also, a lot of the very affordable options are going to involve old server hardware, which means a VGA display, though if you get a modern enough board with IPMI, the remote display access will probably work well enough.
I care mostly about cost, which includes energy consumption. I don't really need hot swap - if I need to add or replace a drive I can just shut the node down; it will be running Ceph/k8s after all. I'm fine with U.2, whether natively or via adapters, though obviously the drives need to fit in the case (which would probably be a problem with that NAS appliance).

If it has IPMI I don't really care what the video output is like. I don't mind buying used server hardware as long as I can find it - eBay is an endless list of model numbers that generally give no indication of how many U.2 slots or what networking they have.
 

rich0

New Member
Mar 17, 2023
Dell R630 supports NVMe on some of the 2.5" drive slots.
Hmm, looks like it can take 4x U.2 drives? If I could get one with enough RAM/etc. for a few hundred dollars, that might make sense. It would obviously be a bit large, but I guess I can stack them. It looks like it idles at over 80W, which isn't ideal, but 4 SFF desktops would most likely pull that much and probably take up at least as much space.
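Quick back-of-envelope on that idle draw (the electricity price below is just an assumed placeholder, not my actual rate):

```python
# What ~80 W of continuous idle draw costs per year (sketch).
idle_watts = 80
price_per_kwh = 0.15  # assumed USD/kWh - plug in your local rate

kwh_per_year = idle_watts * 24 * 365 / 1000
print(f"~{kwh_per_year:.0f} kWh/year -> ~${kwh_per_year * price_per_kwh:.0f}/year")
```

Roughly 700 kWh/year, so it's real money but not a dealbreaker compared to running several desktops.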
 

nexox

Well-Known Member
May 3, 2023
I think you could do better on power with a lower-spec server board in an ATX case, but the wiring is obnoxious unless you get one of those oversized quad-U.2 PCIe cards. Something like an X10SRL-F with a low-core-count CPU (the X9 series is much cheaper but mostly doesn't do bifurcation, and its IPMI remote display is the terrible kind you want to avoid).
 

NPS

Active Member
Jan 14, 2021
I still don't get what you're actually trying to achieve with this. Is it for learning? Do you have an actual workload (with performance requirements) this system should be capable of handling? How much usable disk space do you want (or need) to have?

Enterprise M.2 tops out at 3.84TB per drive. That will probably never change.
 
  • Like
Reactions: Sean Ho

Sean Ho

seanho.com
Nov 19, 2019
Vancouver, BC
^^^ the above are the real questions that should be answered first before we dive into hardware recommendations. What's the use-case? Is it currently running, and if so on what hardware, and with what visible pain-points or measured benchmarks?
 

rich0

New Member
Mar 17, 2023
^^^ the above are the real questions that should be answered first before we dive into hardware recommendations. What's the use-case? Is it currently running, and if so on what hardware, and with what visible pain-points or measured benchmarks?
Sure, this is for a homelab Rook cluster - random stuff around the house. Right now I'm using 5400RPM HDDs, which work but obviously don't perform great, especially during recovery.

Long-term I'm thinking about migrating to NVMe. I'm not sure I'd ever want to 100% migrate to NVMe due to the high cost, but I figured I'd try to start with something expandable and a few TB of storage.

It would be hard for it not to outperform the current HDDs, of course. The most demanding workload would be things like photo editing, I guess, though I might move more onto it if it performs well. This is a mix of block storage for k8s PVCs and CephFS file storage - mostly media (though the media would be the last thing I'd migrate to NVMe).

If this were for any kind of serious application I wouldn't be here... :)
 

nexox

Well-Known Member
May 3, 2023
You may want to consider SATA SSDs rather than NVMe; they're going to be a whole lot simpler and lower power, and while there are some deals to be found on NVMe drives, the price of adapters and cables adds up quickly.
 
  • Like
Reactions: Sean Ho

rich0

New Member
Mar 17, 2023
You may want to consider SATA SSDs rather than NVMe; they're going to be a whole lot simpler and lower power, and while there are some deals to be found on NVMe drives, the price of adapters and cables adds up quickly.
That's a fair point I hadn't considered. Sure, they don't perform as well, but a system with 4 SATA SSDs would easily saturate an SFP+ or two.
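Back-of-envelope on that claim, assuming a typical ~550 MB/s per SATA drive (sequential only, and ignoring the replication traffic that would eat into the client-facing number):

```python
# Aggregate sequential throughput of a few SATA SSDs vs. common NIC line rates.
sata_ssd_mb_s = 550  # rough sequential MB/s for a SATA 6 Gb/s SSD (assumption)
nics_mb_s = {"SFP+ (10GbE)": 10_000 / 8, "SFP28 (25GbE)": 25_000 / 8}

for drives in (2, 4):
    aggregate = drives * sata_ssd_mb_s
    for name, nic in nics_mb_s.items():
        limit = "NIC-bound" if aggregate > nic else "drive-bound"
        print(f"{drives} drives ({aggregate} MB/s) vs {name} ({nic:.0f} MB/s): {limit}")
```

So 4 drives is already past 10GbE line rate, which is the point.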

I was focused on the fact that NVMe drives aren't much more expensive per TB, but it's the interfaces that are the expensive part. Pretty much any old SFF desktop bought used will accommodate a few SATA SSDs.

I suspect it will get hard to find higher-capacity enterprise drives with a SATA interface, so they're not super future-proof. But by the time the price of higher-capacity U.2/U.3 drives comes down, I suspect there will be more used server hardware available with those interfaces, or maybe even more solutions for consumer hardware. Since I don't need bleeding edge, it would be fine if somebody made a switch card that took an x16 PCIe Gen5 uplink and fanned it out into 64 Gen3 lanes worth of U.2 ports. I guess we're a bit of a niche in that regard, though.
 

ano

Well-Known Member
Nov 7, 2022
You probably want the SM863a, not the SM863, if you go SATA of that generation.

But... NVMe is NVMe, drastically faster.
 

rich0

New Member
Mar 17, 2023
You probably want the SM863a, not the SM863, if you go SATA of that generation.

But... NVMe is NVMe, drastically faster.
To be fair though, this is still a distributed filesystem, so the blazing IOPS of directly-attached NVMe is probably not going to happen either way. I think the question is how they compare in that context, and I honestly don't know the answer there.

What is the difference between the SM863a and the SM863? I see one is marked for datacenter use, but I didn't see specs on write endurance. Really, the main things I care about are power loss protection and, to a lesser degree, write endurance (especially if buying used). Of course performance matters, but if I'm putting 2-3 of these in a PC with an SFP+ NIC and DDR4, that is only going to be so high anyway.
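For reference, endurance is usually quoted either as TBW or DWPD, and converting between them is straightforward; a sketch with made-up numbers (not quoted from any Samsung datasheet):

```python
def dwpd(tbw: float, capacity_tb: float, warranty_years: float = 5) -> float:
    """Drive writes per day implied by a TBW rating over the warranty period."""
    return tbw / (capacity_tb * 365 * warranty_years)

# Example with illustrative numbers: a 1.92 TB drive rated for 10,000 TBW over 5 years
print(f"~{dwpd(10_000, 1.92):.1f} DWPD")
```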
 

ano

Well-Known Member
Nov 7, 2022
To be fair though, this is still a distributed filesystem, so the blazing IOPS of directly-attached NVMe is probably not going to happen either way. I think the question is how they compare in that context, and I honestly don't know the answer there.

What is the difference between the SM863a and the SM863? I see one is marked for datacenter use, but I didn't see specs on write endurance. Really, the main things I care about are power loss protection and, to a lesser degree, write endurance (especially if buying used). Of course performance matters, but if I'm putting 2-3 of these in a PC with an SFP+ NIC and DDR4, that is only going to be so high anyway.
Then in general the SM863/SM863a are both great.

I would usually agree, but try the latest Reef versions - it's FAST, SO MUCH FASTER!! (yes, it has earned multiple exclamation marks)

Ceph in general, and Ceph on NVMe in particular, has passed the magic marker.

The SM863a has fewer firmware bugs in later versions (and they can be fixed), plus a bit more oomph - we have hundreds of both still in prod, though; they're workhorses. Write endurance is 5 or 3 DWPD, from the top of my head.

I have a small lab running with 4x PM883 in each node, and I'm able to get them to sustain about 450-500 MB/s continuous! I'm running out of CPU! I did the 10k IOPS challenge for less than 2k USD (my take on it), and it passed and hit 70k/30k read/write just using those fio options and librbd on 5 images!

The same lab machines have PM5s as well - again, much faster, maxing out 40G links and CPU easily. I have an NVMe lab as well, and you can achieve

Thanks to the forums here I also have 16 Micron 7450 M.2s going into the same lab machines! I think the CPU will limit me there (I know it will) - they have a single 6132, and it's impossible to find CPU coolers for Cisco UCS M5 without paying more than I paid for the servers...
 
  • Like
Reactions: Sean Ho

rich0

New Member
Mar 17, 2023
I would usually agree, but try the latest Reef versions - it's FAST, SO MUCH FASTER!! (yes, it has earned multiple exclamation marks)
I'm actually running Reef right now. However, almost all my storage is on 5400RPM USB3 HDDs, which don't seem to perform any better in this release.

I definitely would prefer NVMe, but I have to consider whether SATA SSDs get me to the point where I'm actually running a significant amount of storage on flash, versus waiting years for the prices of U.2 drives and systems that can host them to come down. Most of my existing nodes could accommodate a few SATA SSDs right now, so the marginal cost of adding them is just the drives. Plus, I have enough of those hosts that I could actually use EC and get more mileage out of them, versus buying 3 expensive used servers and having to run size=3 to start.
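The raw-space math is what makes EC attractive here; a quick sketch (the k+m profiles are just examples I'd consider, not a recommendation, and EC with a host failure domain needs at least k+m hosts):

```python
# Usable fraction of raw capacity: replica-3 vs. a couple of EC profiles.
profiles = {
    "replica size=3": 1 / 3,
    "EC 4+2": 4 / (4 + 2),
    "EC 2+2": 2 / (2 + 2),
}
for name, efficiency in profiles.items():
    print(f"{name}: {efficiency:.0%} of raw capacity usable")
```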

I'll think about it. The SATA option is one I had immediately dismissed, but when the alternative is sticking with HDDs right now, it makes sense to consider a compromise. Actually, the stuff that is most performance-sensitive right now is already running on SSD - a mix of M.2 and SATA - and the biggest risk there is that these are consumer units without PLP. Anything I buy for this going forward is going to be enterprise grade.
 

ano

Well-Known Member
Nov 7, 2022
Funny thing is, even a PM863 or SM863a or whatever is drastically faster than a "1 million IOPS" consumer drive - by a mile.

Enterprise grade, even the old stuff, is magic! And I can vouch for those; we have run them for years and years.
 
  • Like
Reactions: rich0

rich0

New Member
Mar 17, 2023
Damn those are becoming cheap. I paid $180 for three 1.92 TB SM863s last October.
Yeah, I just picked up half a dozen and now all my SSD storage has power loss protection, and I'm expanding my use of it. I'm not sure how soon, if ever, I'll be 100% SSD due to the cost, but they're much more reasonable.

The only issue I see with SATA is that it's a bit legacy, and I suspect higher capacities might never become available on this interface. Needing a SATA port per 2TB of storage isn't ideal. That said, this gets me up and running with more cheap nodes, and I can look to expand the hardware in the future - U.2/PCIe lanes will hopefully come down in price at some point...