Lab planning (statistics and machine learning heavy)


gsk3

New Member
Sep 13, 2017
Budget: $2K
Rackspace: This will be hosted in a university rack; space will be a tighter constraint than power consumption. Will have at least 4-5U, maybe as much as 10U. 1x 1GbE internet uplink.

Immediate use: Single-person, high-memory computations
Future use: Would be nice if it would scale to a 4-5 user cluster for similar uses

Basically I need to be able to VPN in and run calculations that require hours to days of computing time but fit within ~100GB of RAM. VMs or containers are required due to very specialized dependencies.
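For the container side, this is roughly the shape of what I'd launch per job. Just a sketch; the image name, mount path, and script are placeholders for whatever the real dependency stack ends up being:

```python
# Launch one long-running job in a container with a hard memory cap, so a
# runaway computation can't starve the rest of the host. All names below
# (image, volume path, script) are placeholders.
import subprocess

subprocess.run(
    [
        "docker", "run", "--rm", "--detach",
        "--memory", "100g",            # hard cap near the ~100GB working set
        "--cpus", "16",                # leave headroom for storage/VPN services
        "-v", "/tank/projects:/data",  # bulk storage mounted into the container
        "stats-env:latest",            # hypothetical image with the specialized deps baked in
        "Rscript", "/data/run_model.R",
    ],
    check=True,
)
```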

Things I will definitely need and approximate cost:

1U UPS: $100-200

R210 II Router/VPN Server: $200

1U top-of-rack switch (L2 managed, either with a separate small/inexpensive 10GbE switch for the data network or just one big GbE/10GbE switch): $300? Maybe?

Storage server: $200 for an older DDR3 server, $200 for RAM, $400 for LFF bulk storage drives, $50 for 2x small SSD boot drives, $150 for L2ARC and SLOG drives, $300 for 2x RAID1 drives for VMs. I have some appropriate SSDs already, so this will not be quite as expensive as it looks. (A rough sketch of the intended pool layout is below this list.)

Compute/VM server: Here's where it gets tricky. I need >=144GB RAM, but also need a box that will take a machine learning-appropriate consumer graphics card like the GeForce GTX 1080 Founders Edition. Ideally this would be one box, something like the HP DL380p Gen8, if that works? That would put me at $1000 for the machine and RAM, never mind the GPU. Power isn't really an issue within reason, so if I can go with the generation before this (G7) and still get the GPU working, that would be ideal for budgetary purposes. Otherwise I have to start thinking about further consolidation: moving the storage server into the VM server, getting rid of the switch and the R210 II, and virtualizing the VPN appliance as well.
That makes this more of a single-box solution, which would need $1300+ for the machine and RAM (256GB) plus the cost of the hard drives above, but saves the cost of the switch and the extra box to house the storage server. If I go this route I'd still probably buy an R210 (the older, cheaper one that doesn't support AES-NI) to be a remote access server for the IPMI port. The major downsides of the single-box solution that I see are: more heat in one box (lots of HDDs plus a hot GPU), significantly less scalability for the future, and a lot more setup complexity, since everything has to be virtualized in such a way that resources are prioritized and shared appropriately.
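For reference, the rough ZFS layout I have in mind for the storage side: mirrored bulk vdevs, separate SLOG and L2ARC devices, and a mirrored SSD pool for VM images. Just a sketch; pool and device names are placeholders, and a real build would use /dev/disk/by-id paths:

```python
# Sketch of the intended ZFS layout, wrapping the zpool CLI.
# All device names are placeholders.
import subprocess

def zpool(*args):
    subprocess.run(["zpool", *args], check=True)

# Bulk storage: two mirrored pairs of LFF drives
zpool("create", "tank", "mirror", "sda", "sdb", "mirror", "sdc", "sdd")

# Dedicated intent log (SLOG) and read cache (L2ARC) on SSDs
zpool("add", "tank", "log", "nvme0n1")
zpool("add", "tank", "cache", "nvme1n1")

# Separate mirrored SSD pool for VM images
zpool("create", "vmpool", "mirror", "sde", "sdf")
```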

Network diagram of the full eventual setup in my head attached.

Criticism welcome from the group. You guys are far more experienced than I am. What are the pitfalls of either approach? What am I not budgeting or planning for? Is there a good 2U box that more easily supports the GeForce GTX 1080?
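On the GPU question: whichever box I end up with, the main thing I'd want to verify before committing to passthrough is that the 1080 lands in its own IOMMU group. A minimal check I'd run on a Linux KVM host, assuming intel_iommu=on (or amd_iommu=on) is already set on the kernel command line:

```python
# List IOMMU groups and the devices in each, to confirm the GPU (and its audio
# function) can be passed through cleanly. Purely illustrative.
import pathlib
import subprocess

for group in sorted(pathlib.Path("/sys/kernel/iommu_groups").iterdir(),
                    key=lambda p: int(p.name)):
    print(f"IOMMU group {group.name}:")
    for dev in (group / "devices").iterdir():
        # lspci -s <address> prints a one-line description of the device
        desc = subprocess.run(["lspci", "-s", dev.name],
                              capture_output=True, text=True).stdout.strip()
        print(f"  {desc}")
```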

gsk3

New Member
Sep 13, 2017
How much storage do you need? Does it need to be 10GbE backend or will 1GbE work?
In general, 100MB to 20GB of stuff gets loaded into RAM at the start, computed on for a few hours or more in RAM, and then a few GB is written to disk. I had a huge speedup in my workflow when I went from spinning disks to SSD on my desktop, so 10GbE would be very nice. I can save on the switch by just direct-connecting dual-port 10GbE cards to each other, since I think I can get away with 3 or fewer nodes on the storage network.
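Roughly what I mean by direct-connecting: each end of a dual-port 10GbE link gets a static address on its own little subnet, no switch involved. Just a sketch; interface names and addresses are placeholders:

```python
# Bring up one side of a direct-attached 10GbE link with a static /30 and
# jumbo frames; the other node mirrors this with the .2 address.
import subprocess

def ip(*args):
    subprocess.run(["ip", *args], check=True)

ip("addr", "add", "10.10.10.1/30", "dev", "ens1f0")
ip("link", "set", "dev", "ens1f0", "up", "mtu", "9000")
```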