A few bits on this:
- Welcome to STH!
- On the Mellanox adapters, get ConnectX-3 as a minimum, CX-3 Pro if you can. With the right VPI cards you can have one port direct-connect via 56Gb InfiniBand while the other serves a 40GbE network (a rough port-mode sketch follows this list). On my phone, so I do not have the part numbers handy.
- You can get adapters to step down the second port to SFP+ if you only have 10GbE.
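For when you have a VPI card in hand: the per-port personality (InfiniBand vs. Ethernet) is set in software. Here is a minimal sketch of one way to flip it on Linux, assuming the ConnectX-3 mlx4_core driver (which exposes mlx4_portN files in sysfs) and root; the PCI address is a placeholder, and newer tooling does the same thing via mlxconfig:

```cpp
// portmode.cpp -- set a ConnectX-3 VPI card to IB on port 1, Ethernet on port 2,
// by writing the mlx4 sysfs port files (equivalent to `echo ib > .../mlx4_port1`).
// Build: g++ -o portmode portmode.cpp   (run as root)
#include <fstream>
#include <iostream>
#include <string>

// Write "ib", "eth", or "auto" to one port's sysfs control file.
static bool set_port(const std::string& dev, int port, const std::string& mode) {
    std::ofstream f(dev + "/mlx4_port" + std::to_string(port));
    f << mode;
    f.close();  // flush now so a failed write shows up in the stream state
    return !f.fail();
}

int main() {
    // Placeholder PCI slot; find yours with `lspci | grep Mellanox`.
    const std::string dev = "/sys/bus/pci/devices/0000:03:00.0";
    bool ok = set_port(dev, 1, "ib")    // port 1: 56Gb InfiniBand direct connect
           && set_port(dev, 2, "eth");  // port 2: 40GbE side
    if (!ok) std::cerr << "write failed; check driver, path, and permissions\n";
    return ok ? 0 : 1;
}
```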
Thank you. Yes, if possible I'd greatly appreciate it if you could quickly detail which InfiniBand-capable ConnectX-3 dual-port card you grabbed, the one I see wedged in between the video cards here:
Also, is there a reason you went with the Pro vs. the non-Pro?
For the non-Pro, I see:
MCX313A-BCBT Single 40/56GbE QSFP
MCX314A-BCBT Dual 40/56GbE QSFP
To lighten your load: correct me if I'm wrong, but it seems as though GPUDirect doesn't work on GTX cards.
Also, for the QSFP-to-SFP+ step-down, can I use a 40GbE-to-4x10GbE breakout DAC?
So it seems I need to look specifically for VPI Mellanox cards in order to get InfiniBand capability?
- You can use DACs.
- On 3, we use a bunch of different configurations, not just GTX cards. Double-check on the NVIDIA forums that the cards you want to use will do this feature; I know it is an area they are very picky on (a quick probe for the in-box peer-to-peer part is sketched after this list). Apparently, some of the big guys have made their own feature that functions similarly, but I am sparse on details there since it is their IP.
- If you need CX-3 adapters inexpensively, ask for help in the networking forum here after checking Great Deals. They can be had for well under $200, and CX-4 100Gb cards for under $500. Sometimes you need to hunt for inventory, but people here use them often.
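On the probe mentioned above: here is a minimal host-side sketch using the CUDA runtime API, assuming CUDA is installed and at least two GPUs are visible; device indices are whatever the runtime enumerates. Note this only answers the in-box P2P question, not the GPUDirect RDMA (NIC-to-GPU) one, which you still have to confirm against NVIDIA's support matrix for the exact card:

```cpp
// p2pcheck.cu -- probe GPUDirect P2P support between every pair of GPUs
// in one box. Build: nvcc -o p2pcheck p2pcheck.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("%d CUDA device(s) found\n", n);
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            if (a == b) continue;
            int can = 0;
            // 1 if device a can directly map device b's memory (P2P)
            cudaDeviceCanAccessPeer(&can, a, b);
            printf("GPU %d -> GPU %d : P2P %s\n", a, b, can ? "yes" : "no");
        }
    }
    return 0;
}
```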
Excellent. I spent a good amount of time learning about DAC vs. fiber cables.
As for the different cards supporting GPUDirect, it seems as though NVIDIA gimped the consumer line. It is only supported on the professional cards, not the GTX line:
> NVIDIA GPUDirect™ for Video

> NVIDIA GPUDirect
So it doesn't seem InfiniBand will provide any GPUDirect benefit for a GTX-powered stack.
Indeed, it appears to be a software/driver lockout to drive sales of the more expensive cards:
> GPUDirect is only enabled on Fermi/Kepler

> GeForce GPUs do not support GPU-Direct RDMA. Although the MPI calls will still return successfully, the transfers will be performed through the standard memory-copy paths. The only form of GPU-Direct which is supported on the GeForce cards is GPU Direct Peer-to-Peer (P2P). This allows for fast transfers within a single computer, but does nothing for applications which run across multiple servers/compute nodes.
>
> Tesla GPUs have full support for GPU Direct RDMA and the various other GPU Direct capabilities. They are the primary target for these capabilities and thus have the most testing and use in the field.
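For what it's worth, the one path the quote says GeForce does keep (in-box P2P) looks like this with the CUDA runtime. A minimal sketch, assuming two GPUs; the buffer size is arbitrary:

```cpp
// p2pcopy.cu -- the GPU Direct P2P path that GeForce does support:
// a direct device-to-device copy inside one box.
// Build: nvcc -o p2pcopy p2pcopy.cu   (assumes at least 2 GPUs)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // arbitrary 64 MiB test buffer
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);  // let GPU 0 reach GPU 1 directly

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);

    // Direct GPU 0 -> GPU 1 copy; with P2P enabled this never bounces
    // through host memory.
    cudaError_t err = cudaMemcpyPeer(dst, 1, src, 0, bytes);
    printf("peer copy: %s\n", cudaGetErrorString(err));

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

If peer access cannot be enabled, cudaMemcpyPeer still works but stages through host memory, which is exactly the "standard memory-copy paths" behavior the quote describes.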
So, since GTX cards apparently can't take advantage of the ConnectX-3 feature set, it seems I will work with the cheaper, less-featured Mellanox cards for now.
> Apparently, some of the big guys have made their own feature that functions similarly, but I am sparse on details there since it is their IP.
Since this looks like a software/driver disable and likely not a missing hardware feature, I am going to have to get creative and do some re-plumbing. I looked at the competition, which seems to market a similar tech:
GitHub - RadeonOpenCompute/ROCnRDMA: ROCm Driver RDMA Peer to Peer Support
But then I find:
AMD DirectGMA | BitFlow
which they likewise enable only on their professional line of cards.
So it seems like AMD and NVIDIA have both decided to make this a little fun for me.
Offhand, what do you think the savings will be from piping across InfiniBand vs. Ethernet?
And who might some of these big guys be who wrote custom workarounds?
I'm assuming you mean folks like this:
News?
And lastly, am I shut out of InfiniBand if I went with the MNPA19-XTR below?
Cheap 10Gb SFP+ $19 Mellanox ConnectX-2 EN Cards
Thank you so very much in advance!