Co-location for high power consumption GPU server?


larrysb

Active Member
Nov 7, 2018
108
49
28
My work in GPU compute has hit the point where I can't do any more in my office or home without popping breakers, short of running 220V service and upgrading the electrical utility entrance to my house, not to mention installing the cooling to deal with the heat produced.

The choices now are to pay by the hour (and for the data flow) at a cloud-scale GPU provider, or perhaps to build up and co-lo my own hardware somewhere. I've not shopped for co-lo services, but most of them seem to be predicated on a low-power 1U slot. I'm contemplating a 4U box that draws a couple of kilowatts.

Thoughts on shopping for a space for my box? What the heck do I look for?
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
2,059
1,478
113
Most datacenters are not equipped to handle that kind of density and will likely not offer a single-server colo for anywhere near that kind of power draw. I have 30A for a full rack, for example.
 

Blinky 42

Active Member
Aug 6, 2015
615
232
43
48
PA, USA
Do you only have a single server, or are you going for several?

If it is just the one, a 30A 208V circuit is commonly available at most colos - it will just be a whole rack (or at least half of one) that you need to lease. It should still be cheaper than paying for 24x7 power & cooling in your home at the level a colo can provide.
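
For a rough sense of what that buys you, here is a quick sketch, assuming unity power factor and the usual 80% continuous-load derating on the breaker (the facility's power contract is what actually governs):

Code:
# Rough sanity check: usable power on a 208V 30A colo circuit.
# Assumes unity power factor and an 80% continuous-load derating.
volts = 208
amps = 30
derating = 0.8  # continuous loads are typically held to 80% of breaker rating

usable_watts = volts * amps * derating
print(f"Usable continuous power: {usable_watts:.0f} W")  # ~4992 W, plenty for a ~2 kW box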

If you want several servers, then look for a higher-power-density colo. Newish rooms built out to support 20 kVA+ per rack exist in major metros, and you can go higher as well, but at a price. If you are starting out small and growing slowly, start somewhere less expensive and just have a lot of extra space, or lease the extra space to friends with low-powered servers to offset the expense of hosting your own.
 
  • Like
Reactions: Patrick

kapone

Well-Known Member
May 23, 2015
1,095
642
113
The one question nobody asked...

@larrysb - You said "My work..." - Are you paying for this yourself (self employed/business) or is a client/employer going to pay for this?
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
What @Blinky 42 said. The other benefit is that you then have space for more build-out whether that is for a VPN/ firewall box, control server, storage server, or anything like that.

Also, something to think about is how much direct access you want/ need. We have data centers we use in Silicon Valley, but I have been thinking about putting some of the older nodes we use for regression testing outside of Chicago or Virginia to get to lower-cost power and space.

A lot of GPU clusters actually look like mining installations in their requirements.
 
  • Like
Reactions: T_Minus

larrysb

Active Member
Nov 7, 2018
108
49
28
The one question nobody asked...

@larrysb - You said "My work..." - Are you paying for this yourself (self employed/business) or is a client/employer going to pay for this?
Self-funded, at least for now.

I've essentially worked myself into a corner and simply can't scale the machines up in a normal office (or home office, thanks to COVID) environment. There isn't enough power or cooling available outside of dedicated server space. As noted, I've discovered that most server locations are not focused on co-lo for machines that draw that kind of power.

What @Blinky 42 said. The other benefit is that you then have space for more build-out whether that is for a VPN/ firewall box, control server, storage server, or anything like that.

Also, something to think about is how much direct access you want/ need. We have data centers we use in Silicon Valley, but I have been thinking about putting some of the older nodes we use for regression testing outside of Chicago or Virginia to get to lower-cost power and space.

A lot of GPU clusters actually look like mining installations in their requirements.
Yeah, I hadn't really gotten that far into the whole picture of what I need next to the GPUs. Certainly ample data storage and a high-speed path between storage and compute. I don't need crazy CPU capability in cores or clock speed, just lots of PCIe lanes and slots; plenty of memory is good.

I hit a wall on the power question. It's not hard to hit 2000 watts with a handful of GPUs running.
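
Rough back-of-envelope with made-up but plausible numbers (roughly 350 W per card under sustained load and a Platinum-ish PSU):

Code:
# Back-of-envelope wall draw for a multi-GPU box (illustrative numbers only).
gpu_watts = 350            # per-card sustained draw for a high-end RTX card
num_gpus = 4
cpu_platform_watts = 300   # CPU, RAM, drives, fans
psu_efficiency = 0.92      # Platinum-ish PSU at typical load

wall_watts = (gpu_watts * num_gpus + cpu_platform_watts) / psu_efficiency
print(f"Estimated draw at the wall: {wall_watts:.0f} W")  # ~1848 W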

I thought the miners went ASIC. Most of the mining farms I've seen pictures of look pretty improvised.

Never shopped for this kind of thing.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
@larrysb - you are correct on the actual mining machines, but a lot of the facilities you find miners in, you also find GPU compute servers in. Low cost for power. With bandwidth and power, you are not typically shopping for the lowest latency or the most redundancy.

Perhaps it should be ex-miners, since a lot of the mining folks moved to places fed by hydro plants to get inexpensive power and cooling.
 

Blinky 42

Active Member
Aug 6, 2015
615
232
43
48
PA, USA
You could start by planning out what you would want next to the GPU-heavy server(s) and spec out that hardware's needs to scope your colo requirements. And have a ballpark budget in mind, because you will need to haggle with everyone and be willing to enter into multi-year contracts to get decent pricing.

Patrick touched on some of them:
- router / firewall / vpn server?
- 1G managed switch for mgmt ports on servers, colo uplink, etc.
- switch for high-speed interconnect
- file server for training and model data etc
- utility / VM server for non-GPU things?
- PDUs (unmanaged ones can often be included in the NRC & MRC of your cabinet, but managed ones you can source on your own)

If you end up with a colo facility near you, much of that you can build up over time, adding hardware as you go. If you plan your base needs so that almost all GPU-related data stays within the rack, you won't need to provision a lot of excess Internet bandwidth to start. You can direct-connect the GPU server and file server high-speed network ports, then add a switch later, for example.

Then with your numbers for:
- power needs A+B redundant 208V 30A
- bandwidth (50Mb committed on a burstable connection, or a 100M fixed port, etc.)
- Size of IPv4 block (and IPv6 if you can handle it at home)
- If fractional cabinets are offered that meet your power needs

Keep in mind that if you do go with less than a full cabinet, you will probably need horizontal PDUs, so expect to use up 2U+ for each (4U for an A+B redundant power feed).

For bandwidth, with what you describe I would try to stick to facilities that can offer you bandwidth directly from a blend of providers they manage. The other style of site is "carrier neutral", where the colo will sell you a cross connect to a transit provider with whom you have to do a separate contract for IP & bandwidth. That is more hassle than you probably want to deal with for your first colo setup.

Also, if they start asking you about transfer (metered GB per month) or even advertise things that way on their site, don't bother with them. You should be able to get a price for a 50Mb commit on a 100M or 1G port and know how much overages cost if it is a burstable connection. Find out the providers they use to make up their blend, or look the provider up in PeeringDB to find out yourself. You can also ask who is on-net in the building, for the future, in case you want to get bandwidth directly from a provider.
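
For reference, burstable billing is normally done on the 95th percentile of 5-minute utilization samples. Roughly how the math works, with made-up rates (real contracts vary, but the mechanics are similar):

Code:
# Rough sketch of 95th-percentile burstable billing (made-up rates).
def monthly_bandwidth_cost(samples_mbps, commit_mbps, commit_price, overage_per_mbps):
    """samples_mbps: one utilization sample per 5-minute interval for the month."""
    ranked = sorted(samples_mbps)
    p95 = ranked[int(len(ranked) * 0.95) - 1]  # top 5% of samples are discarded
    overage = max(0, p95 - commit_mbps)
    return commit_price + overage * overage_per_mbps

# A 30-day month is 8640 five-minute samples; the top 5% (~36 hours) is "free" burst.
quiet_month = [20] * 8240 + [900] * 400   # big dataset pulls, but under 36h total
busy_month  = [20] * 7600 + [900] * 1040  # sustained pulls well past the 5% window
print(monthly_bandwidth_cost(quiet_month, 50, 150, 3))  # 150  (p95 stays at 20 Mbps)
print(monthly_bandwidth_cost(busy_month, 50, 150, 3))   # 2700 (p95 jumps to 900 Mbps)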

You won't need a lot of IPs if you are basically extending your current home lab to the colo - you only need one to set up a tunnel between your current lab and the colo, and you can keep everything else in the rack on private IPs. A $50 EdgeRouter-X on each end would get you started. You can set up a tunnel between the 2 points, and on the colo side offer up VPN connections so you can "dial in" to the rack and work from anywhere, plus provide DHCP and DNS services for the equipment in the rack.

Then just Google for colocation and the city you want to start looking in, and call around with your power and bandwidth needs. You will quickly run into the 3 or 4 colo directory sites that aggregate info by location (DataCenterMap, CloudAndColo, etc.). Talk to some folks to get numbers and play them off each other, then see if you can get a tour of one or two (they should have skeleton staff 24x7 even in the COVID era).
If you also get a price from a provider 1-2 hours away in a less popular city or across a state line, you can get an idea of the bulk power costs they have to pay, and see whether putting your equipment a bit farther from home would save a few thousand per year and be worth the drive.

If you have a target area you are looking in that you can share, people may have suggestions on sites to try or to avoid.

Good luck and have fun with it! Now is a great time of year to do this if you are quick - sales guys want to add another client to their books before EOY, so take advantage of that.
 

larrysb

Active Member
Nov 7, 2018
108
49
28
I was looking around in the south end of $ilicon Valley where I'm located. Another possibility might be the Las Vegas area, where I have a second home.

Certainly, a server can go anywhere on the planet in theory, but being a newbie to this I'd be more comfortable being able to get to it in person in a reasonable time frame, in case I screw up or hardware needs tending.

My storage right now is in the range of 18TB of data sets, model checkpoints, and results. That's really just for development purposes. Fortunately, with today's very large disk drives, it isn't too hard to keep that going. I'd really like to look into NVMe-oF at some point.

The local setup is two substantial workstations, each with a pair of linked RTX cards, 25GbE with RDMA between them for distributed GPU compute, and a 10GbE link to another machine acting as a file server and job hub. With only a 15A circuit to the home office, shared with some other outlets, I need to keep the GPUs power-capped in software; otherwise the peak current demands teeter on tripping the breaker. Heat removal is also an issue, as you might imagine.
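
The capping itself is nothing exotic - something along these lines does it via NVML (a minimal sketch; assumes the nvidia-ml-py bindings and root privileges, and the 200 W figure is just an example), or the equivalent nvidia-smi -pl command:

Code:
# Minimal sketch: cap every GPU's power limit through NVML.
# Assumes the nvidia-ml-py (pynvml) package and root privileges;
# the 200 W cap is only an example value.
import pynvml

CAP_WATTS = 200

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # NVML works in milliwatts
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, CAP_WATTS * 1000)
finally:
    pynvml.nvmlShutdown()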

I want to scale up, and I think I've hit the practical limits of power/heat/space/noise of what can run in my home or office space.
 

mackle

Active Member
Nov 13, 2013
221
40
28
With only a 15A circuit to the home office, shared with some other outlets, I need to keep the GPUs power-capped in software; otherwise the peak current demands teeter on tripping the breaker. Heat removal is also an issue, as you might imagine.
I’d just appropriate the laundry and the dryer outlet... (a clothes rack in the hot aisle would take care of the laundry too)
 
  • Like
Reactions: Marsh

larrysb

Active Member
Nov 7, 2018
108
49
28
I’d just appropriate the laundry and the dryer outlet... (a clothes rack in the hot aisle would take care of the laundry too)
LOL, yeah, a scene right out of HBO's "Silicon Valley". More truth to that than one might think.