Solitaire must play really well
.... and checkers too ....
Still looking for a GPU update to
MS Decathlon
Seriously,
the system "needs" more I/O bandwidth and a Xeon CPU (for one specific reason).
One Titan connected to an x16 slot can transfer approx. 11 GB/sec over the PCI Express connector. If connected to an x8 slot, the speed goes down to 5.5 GB/sec.
With a 5.5 GB/sec transfer speed, one full transfer of the 6 GB of GDDR RAM takes approx. 1 sec - an eternity when a lot of CUDA cores are waiting for new data.
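To put numbers on that, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (11 GB/sec for x16, 5.5 GB/sec for x8, 6 GB of card memory). The function name and constants are made up for illustration, and real transfers will see some extra overhead.

```python
# Back-of-the-envelope PCIe transfer times, using the figures from the post.
# All names here are hypothetical; the bandwidths are approximate.

X16_GB_PER_S = 11.0   # approx. usable bandwidth of an x16 slot
X8_GB_PER_S = 5.5     # approx. usable bandwidth of an x8 slot
TITAN_RAM_GB = 6.0    # GDDR RAM on one Titan


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Time to move size_gb over a link with the given bandwidth."""
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    for slot, bandwidth in (("x16", X16_GB_PER_S), ("x8", X8_GB_PER_S)):
        seconds = transfer_seconds(TITAN_RAM_GB, bandwidth)
        print(f"{slot}: {seconds:.2f} sec to fill {TITAN_RAM_GB:.0f} GB")
```

That works out to roughly 0.55 sec on x16 and 1.09 sec on x8 for one full refill of the card's memory.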
Another "limitation" is the aggregated bandwidth. All 4 cards together have an aggregated bandwidth of 22 GB/sec on the X79 platform. While this looks like only 50-60% of the total memory bandwidth available on the LGA2011 socket, the i7-3930K CPU is missing a critical capability of the Xeons: it doesn't have the Data Direct functionality.
Basically, all I/O operations that write to main memory must preserve memory consistency, which is done by writing through the cache. Writing 1 GB of I/O data through a 10-20 MB cache basically invalidates all the cached data the CPU thought might be useful later. Bad luck. The CPU has to get its data from main memory again, which is significantly slower than reading from the cache. The performance of the i7 tanks because it doesn't get data fast enough. The SB Xeons circumvent this bottleneck by splitting the cache. Only a part of the cache is used for this fast I/O into main memory, and the remaining part of the cache can still serve the CPU's desire for data.
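The effect can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: a plain LRU cache counted in lines, with the I/O stream either sharing the whole cache or confined to its own slice. It is not how the real hardware is implemented, and the class and function names are invented for the example.

```python
from collections import OrderedDict


class LRU:
    """Tiny least-recently-used cache that tracks only which lines are present."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)          # refresh an existing line
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)       # evict the oldest line
        self.lines[tag] = True


def surviving_lines(cache_lines, working_set, io_lines, io_partition=0):
    """Count CPU working-set lines still cached after an I/O stream.

    io_partition=0 means I/O shares the whole cache (the i7 case in this model);
    io_partition>0 confines I/O to that many lines (the split-cache case).
    """
    if io_partition:
        cpu_cache = LRU(cache_lines - io_partition)
        io_cache = LRU(io_partition)
    else:
        cpu_cache = io_cache = LRU(cache_lines)

    for i in range(working_set):                 # CPU loads its hot data
        cpu_cache.touch(("cpu", i))
    for i in range(io_lines):                    # device streams data in
        io_cache.touch(("io", i))

    return sum(("cpu", i) in cpu_cache.lines for i in range(working_set))


if __name__ == "__main__":
    print("shared cache:", surviving_lines(1024, 512, 16384))
    print("split cache: ", surviving_lines(1024, 512, 16384, io_partition=128))
```

In the shared case the stream pushes every CPU line out of the cache, while in the split case the CPU's working set stays put as long as it fits in the part that is left over.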
Andy