4P board with 8 double-width card backplane (unobtainium?)

Discussion in 'Processors and Motherboards' started by RobertFontaine, Dec 22, 2015.

  1. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    The SM X9-DRG-Qf or X10-DRG-Qf boards seem like a perfect setup for a four-Xeon-Phi workstation/server, and I have managed to grab an X9-DRG for cheap.

    Which led me to looking at 4P boards and thinking of 8 Xeon Phis, because more.

    All the GPGPU boards with an 8-card backplane that I am seeing are 2P. The Xeon Phi's strength is that it can be used in both distributed and SMP models, as a general processor as well as an FP64 coprocessor. For offloading that has a lot of chatter between the mothership and the Phi card, you want all the PCIe bandwidth you can get. A 2P board is probably optimal for a server with five or six cards, or a workstation with four plus a video card for Quake (or a RAID card).

    I've never really looked at commercial boards with backplanes before. Did/does SM make an X9/X10 board that does this kind of HPC (I don't see one on their site)? Am I even going to find such a beast on the junk market, or is this stepping into custom-board territory?

    Thanks,
    Robert
     
    #1
  2. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,291
    Likes Received:
    673
    Why are y'all wanting Phis so bad? The GPGPU performance is abysmal compared to graphics cards. Next gen should be much better... but currently they suck.
     
    #2
  3. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    If you want to use a Phi to calculate a matrix operation, then they suck compared to a $4k Tesla (although at $125 each, a video card like the 7990 would still be better).

    If you want a teraflop of general-purpose compute that you can either offload to or distribute across, then the Phi architecture crushes video cards.

    Knights Landing is also NOT a video card, and at $4k it will also suck compared to the next generation of Teslas, if and only if you are rendering pictures.

    But then again I am not rendering video. :)
     
    #3
    mstone likes this.
  4. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,291
    Likes Received:
    673
    I am not rendering videos either... I am scaling out a neural network. Still not competitive, lol.
    I have 3 5110Ps, btw. lol.

    Neural nets don't require double precision... so even the older K10s I have do 4.5 TFLOPS at 200 W.
    But yes, if you want double precision, which not many things need, then I guess the Phis are not the worst choice... They are just deprecated; Knights Landing will be nothing like the current ones.
     
    #4
  5. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    Picking an algorithm that works in FP32 and does not require general compute, and then saying "look, my 980 Ti is faster," is great advertising for NVIDIA. You should take note that the only parallel compute they pushed was DNNs. FP64 has been crippled since Fermi, and a CUDA core is not capable of general computation. I digress. GPGPU is absolutely the fastest at what it does, but it is a one-trick pony.
     
    #5
  6. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    When I can afford Knights Landing for my basement lab I might jump in (likely not). I keep hoping they will have a developer program, or that the US government will block them from shipping to China (Tianhe) after they have filled their warehouse with them again. Till then the Phi is perfect for my pile of purposes: Kaggle box, work on making GnuBg a little smarterer, data mining, machine learning, general math box, studying OpenMP/OpenACC programming, and a couple of other interesting ideas that I have wanted to explore but am not ready to chat about. C++/R/Python (Anaconda)/MATLAB, etc.

    The programming model for the Phi and Knights Landing is the same. Knights Landing will use a more powerful set of CPU cores and will add a nice collection of 512-bit-wide instructions, but from a development perspective, code that works well on my Phi box will scale well on a Knights Landing cluster. I think that when I have a model I am happy with on my little box, the best thing to do is buy AWS time to run the job, or find a sponsor with heavy metal.

    Three nodes is actually a pretty good number. It isn't 1 or 2, so it is effectively many. With more than 3 Phis, the smart lab thing to do is probably to create a second node and put a 40Gb fibre cable between them. So much of optimizing these programs is latency: PCIe, QPI, cache. Amdahl's law is a pain in the butt, but in practice it's tough to get past all the wait times.
     
    #6
  7. Patriot

    Patriot Moderator

    Joined:
    Apr 18, 2011
    Messages:
    1,291
    Likes Received:
    673
    ivankreso/caffe-xeon-phi · GitHub. Memory is generally the biggest limitation for those things... you want to stay on the card as much as possible, or use DMA over the fiber to mirror the card set's memory.
     
    #7
  8. tjsgifan

    tjsgifan New Member

    Joined:
    Nov 20, 2015
    Messages:
    10
    Likes Received:
    1
    These solutions do have PCIe switching in place due to the limited number of PCIe lanes available, so in theory your bottleneck could be two offloads going across the same PCIe switch port at the same time, but with PCIe 3.0 x16 that's still a fast bus. If you pair it with some 18-core E5 v3 CPUs, that still gives you a lot of power.

    The Supermicro 8-way GPU system, such as the Supermicro | Products | SuperServer | 4U | 4028GR-TRT, is still probably the fastest single-image system you can get off the shelf at a reasonable cost.
    You could talk to someone like SGI for a single-image system that scales to 32 GPUs, but then your budget needs to be much bigger. Look at the SGI UV series for such a product.

    Or you could look at working outside the box: get your 4-way server with lots of PCIe 3.0 x16 slots, and then use an external PCIe expansion cabinet, such as the
    Quantum EXR3600 - Exxact Corp, to connect all your cards to your server.

    :)

    We have done some interesting solutions using this type of product and can always find something that will fit the job...
     
    #8
  9. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    Don't get too excited; my rack is built with plywood. :)

    I'm thinking risers with a PLX chip for models where offload isn't the constraint. The Phis are only PCIe 2.0, so the constraint would be the switching overhead of the PLX chip.

    Sometimes that could be perfectly fine.
    The v3s should drop in price (hopefully in early '17). 16-core v3s, however, probably won't be in the bargain bin till '19.

    Will probably build 3 2P, 4-Phi compute nodes and stop, as that gives me n nodes in a networked environment to write code against. The 8-core v1s are a bargain. 1,400 watts per node may annoy my wife if I don't pay the power bill, though. ;)
     
    #9
  10. William

    William Active Member

    Joined:
    May 7, 2015
    Messages:
    766
    Likes Received:
    237
    I ran a review on a Supermicro 7048GR-TR about a year ago, impressive machine.
    Supermicro 7048GR-TR (Intel C612) Workstation Tower System Review

    You can see that when running LinPack on everything it pulled about 1,600 watts; running 8x Phis would put you close to 30 amps. Not many homes can handle that.

    I only had those cards for a very short time and had to do a crash course on operations, right in the middle of Christmas/New Year's, so as far as I got was getting LinPack to run on them. It's not an easy task, for me anyway. But if you know what you are doing you should have no problems.
     
    #10
  11. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    I have no idea what I'm doing but it's entirely a skunkworks project and I am perfectly happy to learn how to build it out as well as write the code.

    If it turns into something then it would be nice but if not it's a hobby that keeps me properly engaged.
     
    #11
  12. William

    William Active Member

    Joined:
    May 7, 2015
    Messages:
    766
    Likes Received:
    237
    In that link I posted, I went over how to get communication with a card started under Windows; from there you can upload programs and run them. It should be rather simple to do, but if you have any questions, fire away.

    Basically, each Phi is a separate Linux-based computer on a PCIe slot that "can" communicate with the host, with other cards, or over the network. It can get rather complex pretty fast.

    They were fun to mess around with, and I would have liked to get a few other benchmarks running on them, but I just ran out of time. Intel only allowed us to have the 4x cards for 30 days, and I had to deep-dive very fast on how to get them up and running; it was no easy task for me, LOL.
     
    #12
    Chuntzu likes this.
  13. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    Thanks, with a little luck the power supply in my backpack will get me to first boot this evening. KVM/CentOS for this one... it will be an all-purpose machine for a couple of months while I build a cheap workstation and a file/data server.
     
    #13
    William likes this.
  14. Deslok

    Deslok Well-Known Member

    Joined:
    Jul 15, 2015
    Messages:
    1,087
    Likes Received:
    119
    What about Dell's R9xx series systems? Taking a look at them, an R930 could hold 7 double-width cards; I'm not sure about the older models offhand.
     
    #14
  15. William

    William Active Member

    Joined:
    May 7, 2015
    Messages:
    766
    Likes Received:
    237
    Well, those are beasts, that's for sure. I can picture lots of noise and a 30-amp line needed.
    If you can budget the systems and power use sure ;)
     
    #15
  16. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    Alphacool out of Germany makes a water-cooling block for the 31S1P, so it is entirely possible to build a wife-safe rig. I have two of them and the fit is adequate.

    With 3 compute nodes running I should be able to make espresso. A Xeon Phi draws about the same power as a GTX 580, 270 watts-ish per card; essentially the same as running 4 video cards.

    220V and a proper power system will probably be an issue, but I will burn that bridge when I come to it. I might find a victim, I mean sponsor, with rack space before the divorce papers are filed.

    For this week I will be happy if I can get KVM/CentOS/Phis booting and a development VM running. Purely version 0.1.
     
    #16
  17. Deslok

    Deslok Well-Known Member

    Joined:
    Jul 15, 2015
    Messages:
    1,087
    Likes Received:
    119
    Does that water cooler actually reduce it to a single-slot card? If so, that should certainly help with 8 of them in a 4P system.
     
    #17
  18. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    No, it's very double-wide.
    The X9DRG-QF is set up nicely for 4 double-wide cards, 2 on each CPU...

    Except the RAM blocks the 5th PCIe slot. I'm going to need a PCIe extension cable to get my InfiniBand card onto the node with 4 Phis.
     
    #18
  19. RobertFontaine

    RobertFontaine Active Member

    Joined:
    Dec 17, 2015
    Messages:
    666
    Likes Received:
    148
    I have BIOS :)
     
    #19
    Quasduco likes this.