Some critical updates for those technically minded individuals (all the members of this forum)
Supermicro motherboard update
New BIOS fixes so many little issues - my compliments to Supermicro
The motherboard itself: same board, new unit. I noticed some different components and a few minor revisions, but it's still labeled as rev 1.02
Current build:
Supermicro X10DRG-Q
2x E5-2699 v4 (44 cores / 88 threads, 3.7GHz turbo)
512GB DDR4-2400 ECC registered RAM
(QUAD SLI) 4x Titan X Pascal
2x Samsung 961 NVMe PCIe 3.0 (OS drives)
10x Samsung 850 Pro SSD RAID (apps drive)
LG 31MU97z, 4096x2160 true 4K, rev C
Modded P5 case, Noctua heatsinks
1600W digital power supply
Windows Server 2012 R2 Datacenter, Ubuntu 15
I have also moved to the larger, faster Samsung 961 NVMe drives
THE BIGGEST UPDATE IS:
I have fully functional quad SLI using the new Titan X Pascal cards.
Point 1
I've had the cards since day 1; all 4 cards have sequential serial numbers.
The cards came with a new back plate which acted as a thermal insulator. I removed the back plate and heat dissipation is so much better. If you buy these cards, remove the back plate immediately. The back plate is paper-thin aluminum with a plastic layer on the inside. Obviously it was just for looks, at the price of cooking the cards.
Point 2
Nvidia started off by disabling 4-way SLI and making folks buy a new high-bandwidth bridge to get the most out of the maximum supported 2-way SLI.
Sub point a.
The new driver 372.54 re-enables 3- and 4-way SLI, possibly due to relentless flaming and heckling from myself, demanding compensation for the lie they communicated: that 3- and 4-way SLI would be enabled with a free enthusiast key. They dropped the key, and dropped everything higher than 2-way SLI along with it.
Sub point b.
The regular stiff metal-plug 3- and 4-way SLI bridges, such as the EVGA v2, work the same without any performance impact.
The plastic-plug and ribbon bridges, such as the plastic-plug ASUS ROG and the EVGA v1, are low bandwidth and will impact performance.
Sub point c - and the most complicated
The SLI bridges for 4-way SLI are not populated with 8 connectors; they are populated with only 6.
The monitor-connected video card in a 4-way SLI build has only one connection to the other three cards.
I've made a hybrid bridge layout connecting all of the video cards' SLI pathways through the bridge, and performance is higher than with any other SLI bridge configuration.
This means a bridge with full connectivity could boost SLI performance on the order of 20-25%.
Maybe that is in Nvidia's plans, maybe not. But as soon as I refine it, I'll post the findings and a schematic.
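To picture sub point c, here's a toy model of the bridge as a graph of card-to-card links. The connector counts match what I described above (each link uses 2 connectors), but the exact wiring of the stock bridge and the "full" layout are my assumptions for illustration, not measured schematics.

```python
# Hypothetical model of 4-way SLI bridge connectivity (illustration only;
# the actual Titan X Pascal bridge wiring is an assumption here).
# Each card has 2 SLI fingers, so a fully populated bridge has 8 connectors.
from collections import defaultdict

def links_per_card(edges):
    """Count the direct bridge links touching each card."""
    count = defaultdict(int)
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return dict(count)

# Assumed stock 6-connector layout: a daisy chain, so the monitor-connected
# primary card (0) ends up with a single link to the other three cards.
stock = [(0, 1), (1, 2), (2, 3)]          # 3 links = 6 populated connectors
# A fully connected layout adds a fourth link back to the primary card.
full  = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4 links = 8 populated connectors

print(links_per_card(stock))  # primary card 0 has only 1 link
print(links_per_card(full))   # every card has 2 links
```

The point of the model: in the stock daisy chain the primary card is a bandwidth bottleneck, which is what the hybrid layout works around.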
Point 3
If you are going to SLI, you must make your own custom SLI profile for your applications/games.
Most of Nvidia's SLI profiles are crap and use very little of the power from cards 3 and 4. SLI scaling with the default Nvidia profiles is at best 1.2-1.8x. Build the custom profile yourself and, as an example, I'm getting 3.2-3.3x scaling.
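For anyone unsure what those scaling numbers mean: it's just multi-GPU FPS divided by single-GPU FPS. The FPS figures in this quick sketch are illustrative placeholders, not my benchmark results.

```python
# SLI scaling factor = multi-GPU FPS / single-GPU FPS.
def sli_scaling(multi_gpu_fps, single_gpu_fps):
    return multi_gpu_fps / single_gpu_fps

single = 100.0                      # hypothetical single-card FPS
print(sli_scaling(180.0, single))   # 1.8x: about the ceiling of a default profile
print(sli_scaling(330.0, single))   # 3.3x: what a good custom 4-way profile can hit
```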
Last point - and the MOST IMPORTANT POINT
You cannot get SLI scaling higher than 1.8 UNLESS you build your SLI system on a multi-CPU, multi-socket system with alternating CPU PCIe lanes per video card or Tesla card.
And if you NUMA-optimize your app, you have to take that layout into consideration
And
You have to have COD (Cluster-on-Die) enabled for your CPU snoop mode, and...
...maybe I shouldn't give out too much secret sauce just yet!
If you are not multi-socket, YOU CANNOT get real 3- and 4-way SLI performance.
I have lots of details on this.
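To give a taste of what NUMA optimizing looks like in practice, here's a minimal Python sketch of pinning a worker process to the CPUs local to a given GPU's socket. It assumes a Linux box (like my Ubuntu install), and the GPU-to-node map and the 22-cores-per-node layout are hypothetical placeholders for a dual-socket board, not values read from hardware.

```python
# Minimal sketch: keep each GPU's feeder threads on the CPU socket that
# owns that GPU's PCIe lanes, so frame data doesn't cross the QPI link.
import os

# Hypothetical layout: GPUs alternate between the two sockets' PCIe lanes.
GPU_TO_NUMA_NODE = {0: 0, 1: 1, 2: 0, 3: 1}

# Hypothetical: 22 cores per socket; node 0 = cpus 0-21, node 1 = cpus 22-43.
CORES_PER_NODE = 22

def cpus_for_node(node):
    """Return the set of CPU ids local to the given NUMA node."""
    start = node * CORES_PER_NODE
    return set(range(start, start + CORES_PER_NODE))

def pin_worker_for_gpu(gpu):
    """Pin the current process to the CPUs local to the GPU's NUMA node
    (Linux-only; os.sched_setaffinity is not available on Windows)."""
    node = GPU_TO_NUMA_NODE[gpu]
    os.sched_setaffinity(0, cpus_for_node(node))

print(sorted(cpus_for_node(GPU_TO_NUMA_NODE[1]))[:3])  # first CPUs of node 1
```

On the Windows side the equivalent idea uses the processor-group/NUMA affinity APIs, but the principle is the same: match each card's workload to its local socket.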
Anyhow
I hope you have found this info helpful
Detailed pictures and explanation to follow next:
I am only using the game's built-in benchmark as a reproducible, standard benchmark that is devoid of paid-for optimization, unlike Fire Strike, 3DMark, Catzilla, Steam, etc.
So I'm using the built-in "benchmark c5l1" and "benchmark c5l2"
So here is the setup
Driver 372.54
Test material, apples to apples:
Painkiller Black
Havok physics enhanced mod - 24 threads
Texture mod, running at 32xAF (yes, I said 32x AF)
FXAA - more than that is wasted at that resolution
Full light mod
Full shadows mod
Ragdoll mod
Occlusion
High Q setting for optimization
Resolution - VERY IMPORTANT - 4096x2160
Again true 4K, 4096x2160
Running the full eye candy SLI 4-GPU AFR 2 profile
Lane config alternates by AFR 2 order, and all the custom profile builds match it, selecting the appropriate GPU-to-CPU AFR 2 layout.
From the BIOS shot you can see the layout I selected; please notice which CPU goes to which PCIe slot
Here is the view with the glass side off
Here are the bridge choices
Here are some of my selected manual bridge builds
So
This is the best score you can get with the default bridge - 303FPS is not bad at 4096x2160 with the fullest eye candy. The custom profile is the same for all tests running benchmark c5l1
This is with the higher-bandwidth v2 bridge (a 2-year-old bridge): a 50FPS increase, not bad
Same bridge installed upside down so that the primary card has both connections; the score only went up a few FPS, but look at the minimum frame count
And the secret sauce
Manual build layout running the more intensive benchmark c5l2 with greater physics
This is a custom SLI bridge layout (not the one in any of the pics; I'm trying to decide if Nvidia would consider it a slap in the face)
Important - my motherboard, the X10DRG-Q, is the only board type in the world with reverse-order slots. The primary video card goes in the first slot at the bottom of the board, NOT the slot closest to the CPU(s) like on all other motherboards. Yes, it comes from Supermicro that way.
390FPS average with peaks bouncing around 1024FPS (game engine limitation)
So
Thought you might appreciate what is to come; these config schemes may also help those running Tesla cards and optimizing everything from CUDA to games.
So, just on bridge config at 4096x2160, I went from 302FPS average, to 390FPS average.
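For the record, that bridge-only gain works out like this:

```python
# Percentage gain from bridge config alone, using the averages quoted above.
before, after = 302, 390   # FPS averages: default bridge vs custom layout
gain = (after - before) / before * 100
print(f"{gain:.1f}% faster from bridge config alone")  # prints 29.1% faster...
```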
...and honestly, with full eye candy and physics, almost 400FPS at 4096x2160 isn't all that bad...
Thx
J