Or you can implement simple job scheduling and fine-tune on customer data during off-peak hours. Keep the GPUs earning $ and enjoy a cost-of-money capital base instead of paying hyperscalers.
We (stardog.ai) use both, but L40S for production and 6000 Ada for developers.
But for our workload the GH200 is 5x better than the L40S on $ per 1K tokens per second. That's on an unoptimized inference stack, so I expect that to creep up to 6x or even 7x.
YMMV.
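The off-peak scheduling idea above can be sketched in a few lines. This is a minimal illustration, not our actual scheduler: the overnight window (10pm-6am) and the job-queue shape are assumptions I'm making up for the example.

```python
from datetime import datetime, time

# Hypothetical off-peak window, 10pm-6am local time (assumption, not from the post).
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)

def is_off_peak(now: datetime) -> bool:
    """True if `now` falls inside the overnight off-peak window."""
    t = now.time()
    # The window wraps midnight, so it's the union of two intervals.
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def drain_queue(queue: list, now: datetime) -> list:
    """Pop queued fine-tune jobs while the GPUs would otherwise sit idle."""
    ran = []
    while queue and is_off_peak(now):
        job = queue.pop(0)
        ran.append(job)  # here you'd actually launch the training run on the idle GPU
    return ran
```

In practice you'd run a check like this from cron or a long-lived daemon and hand the popped jobs to whatever launches your fine-tune runs.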
Got a new GH200 with a CX7, in a rack with an older switch temporarily. But I can't figure out how to cable this…
I can't find an OSFP to 4-port QSFP28 breakout except one at an absurd $2500.
I need 1.5m, passive/DAC, etc.
Help?
I’ve been building GPU workstations for my AI developers and using Tailscale to form an R&D mesh. Good times!
Most of these systems have BMC headers, and I realized having IPMI/BMC/Redfish could be very useful.
But for the life of me I can't find an expansion card or chip to add to these systems...
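For anyone who does get a BMC working, the Redfish side is just HTTP+JSON. Here's a minimal stdlib-only sketch of hitting a BMC's Redfish endpoint; the host, credentials, and `/Systems/1` path are placeholders (resource paths vary by vendor, discover them from `/redfish/v1`).

```python
import base64
import json
import urllib.request

# Placeholder BMC address (assumption, not a real host).
BMC_HOST = "https://bmc.example.local"

def redfish_url(resource: str) -> str:
    """Build a full Redfish resource URL, e.g. '/Systems/1'."""
    return f"{BMC_HOST}/redfish/v1{resource}"

def auth_header(user: str, password: str) -> dict:
    """HTTP Basic auth header, which Redfish BMCs commonly accept."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def power_state(user: str, password: str) -> str:
    """Query the system's power state; needs a reachable BMC on the network."""
    req = urllib.request.Request(
        redfish_url("/Systems/1"),
        headers=auth_header(user, password),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["PowerState"]
```

With Tailscale in the mix you could put the BMC ports on the same mesh and poll every workstation's power/thermal state from one place.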