Intro & Build notes


Andreas

Member
Aug 21, 2012
127
1
18
That was my thinking in going with the 240GB SanDisk drives. I only require ~3TB of "hot" data, and 24 × 240GB provides plenty of room for OP (~50%). Even though the SanDisks do not show the same "penalty" as other drives as they fill, the $/GB on the SanDisks was compelling enough to move to the larger capacity and take the benefits of large OP.
The reason I didn't take the SanDisk wasn't the issue I had with the Vertex 4; there were two different ones:

The write performance of all 3 SanDisk SSDs I have (120, 240, 480GB) is below the write performance of the Samsung 128GB (320 MB/sec) for incompressible data.
The SanDisk drives take quite a long time for their GC. In your setup with 50% OP it wouldn't be a problem though.

For a set budget, both approaches (over-provisioned Vertex and SanDisk) are more cost-efficient than the Samsung in terms of storage space, but they still lag in IOPS and IO bandwidth.
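
A rough back-of-the-envelope for the over-provisioning mentioned above; capacities are nominal, so treat the percentage as approximate:

```python
# Over-provisioning estimate for the 24 x 240 GB SanDisk layout described above.
drives = 24
capacity_gb = 240        # nominal per-drive capacity
hot_data_tb = 3.0        # "hot" working set from the post

raw_tb = drives * capacity_gb / 1000
op_fraction = 1 - hot_data_tb / raw_tb
print(f"raw capacity : {raw_tb:.2f} TB")       # 5.76 TB
print(f"hot data     : {hot_data_tb:.2f} TB")
print(f"effective OP : {op_fraction:.0%}")     # ~48%, roughly the ~50% quoted
```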

Andy
 

ehorn

Active Member
Jun 21, 2012
342
52
28
Agreed Andy... I also observe that the Samsungs perform quite well at deeper QDs, where they separate themselves in IOPS...

peace,
 

Andreas

Member
Aug 21, 2012
127
1
18
A quick update on what happened over the weekend.

Built the second workstation, the Dual Xeon "mothership"
2 x E5-2687W, Asus Z9PE-D16 MB, 128 GB regECC 1600 MHz, dual Corsair H100 watercoolers, normal PC case
I haven't made up my mind yet on the new storage subsystem, so I used the "old" setup from the i7-3930K rig with 32 SSDs connected via 4 x LSI 9207-8i.
Performance is VERY good:
It is about 35% faster than a system with E7-7560 Xeons (32 cores with HT),
about 30% faster than a 48-core AMD Opteron system (4 × 6168),
and twice as fast as a previous-generation dual Xeon 5650.
All in all, nothing to complain about. :)

Memory bandwidth is ca. 76 GByte/sec, roughly twice the 40 GB/sec I had on the single-CPU LGA2011 machine; potentially ECC and timings make up the difference (a rough comparison against the theoretical peak is sketched below).
One SAS port on one LSI HBA died :-( Have to send the card back ....
The issue with the power supply is resolved. The new PSU delivers 40A on the 5V rail. No issue writing with all 31 SSDs. (10 GB/sec)
The Asus MB would not boot Windows Server 2012; Windows Server 2008 R2 is fine (as is Win7, though unsupported). The Asus helpdesk could not say when the BIOS update for WS2012 will arrive.
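
For reference, a quick estimate of where the measured ~76 GB/sec sits against the theoretical peak of this platform; the DDR3-1600, quad-channel-per-socket figures are platform assumptions, not measurements from this build:

```python
# Rough theoretical peak for 2 sockets of quad-channel DDR3-1600,
# compared with the ~76 GB/s measured on this machine. Illustrative only.
transfers_per_s = 1600e6     # DDR3-1600
bytes_per_transfer = 8       # 64-bit channel
channels_per_socket = 4
sockets = 2

peak = transfers_per_s * bytes_per_transfer * channels_per_socket * sockets / 1e9
measured = 76.0
print(f"theoretical peak : {peak:.1f} GB/s")                          # 102.4 GB/s
print(f"measured         : {measured:.1f} GB/s ({measured/peak:.0%} of peak)")
```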

more to come,
Andy
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,517
5,812
113
Andy... dba and I have been corresponding via e-mail over the last day or two. He has some really interesting results on the 2P system and almost hit 6GB/s off of a single controller.
 

Andreas

Member
Aug 21, 2012
127
1
18
Andy... dba and I have been corresponding via e-mail over the last day or two. He has some really interesting results on the 2P system and almost hit 6GB/s off of a single controller.
Is the 2P system a SB/Romley platform? 6 GB/sec per HBA sounds very, very good. I assume this is off the LSI 9202-16e adapters with 16 SSDs connected. My 9202-16e adapters arrived last Friday (I got 4 of them), but I still have to wait for cables to connect them to the SSDs.

4 × 6 GB = 24 GB/sec seems to be the upper limit for a balanced machine with 2 SB CPUs. I/O at this rate consumes ca. 30% of the available memory bandwidth. That is less of an issue for systems tuned purely for storage, but if processing the 24 GB/sec data stream requires more than 2 bytes of memory access by the CPU for each byte read in from the SSDs, the application quickly becomes CPU bound. Unless the app is well optimized (a high percentage of memory accesses covered by the cache hierarchy), the 8 memory channels quickly become the next bottleneck. Funny times after so many years of I/O "starvation" ...
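
A small sketch of that bandwidth budget; all figures are the round numbers from this post, so it only shows the proportions:

```python
# Why 24 GB/s of SSD input can quickly make an application memory/CPU bound.
io_rate = 24.0            # GB/s streaming in from 4 x LSI 9202-16e
mem_bw = 2 * 40.0         # GB/s, ca. 40 GB/s per socket over 4 channels

print(f"DMA alone uses ~{io_rate / mem_bw:.0%} of memory bandwidth")  # ~30%

# If the CPU generates b extra bytes of memory traffic per byte read in,
# total demand on the 8 memory channels grows quickly:
for b in (1, 2, 3):
    total = io_rate + b * io_rate
    print(f"{b} byte(s) per byte read -> {total:.0f} GB/s "
          f"({total / mem_bw:.0%} of available bandwidth)")
```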

For the type of app currently running on the 2-socket workstation, the CPUs are really maxed out by the computational workload (almost constantly at 150 watt over 24 hrs), including ca. 300 TB of I/O in one day. To preserve as much CPU capacity as possible for the compute part of the job, my current setup is a bit different from the past. Reflecting on the "bottleneck" of the LSI 9207-8i past 5 drives, and on the inevitably higher CPU load of a pure HBA setup with 32 SSDs passed through the HBAs into a software RAID0, I configured the 4 LSI HBAs with 2 RAID0 each (4 SSDs per RAID0), and 2 software RAID0 on top of that, each taking one 4-SSD RAID per LSI HBA. In my case this arrangement is faster than the "classic" arrangement of connecting all 16 SSDs of 2 LSI adapters into one RAID0 (2 LSI-based 8-SSD RAID0, with one software RAID0 on top).
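
To make the nesting easier to follow, a minimal sketch of the two layouts described above; the labels are hypothetical, not real device names:

```python
# Two ways to stack 32 SSDs across 4 LSI HBAs, as described in the post.
# "hw" sets are RAID0 built by the controller firmware, "sw" is OS software RAID0.

# Current arrangement: each HBA exposes two 4-SSD hardware RAID0 sets;
# two software RAID0 volumes each take one 4-SSD set from every HBA.
current = {
    f"sw_raid0_{i}": [f"hba{h}/hw_raid0_{i} (4 SSDs)" for h in range(4)]
    for i in range(2)
}

# "Classic" arrangement: each HBA exposes one 8-SSD hardware RAID0;
# one software RAID0 spans the two sets of an adapter pair.
classic = {
    f"sw_raid0_{i}": [f"hba{2 * i + h}/hw_raid0 (8 SSDs)" for h in range(2)]
    for i in range(2)
}

for name, layout in (("current", current), ("classic", classic)):
    print(name)
    for volume, members in layout.items():
        print(f"  {volume}: {members}")
```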

On power consumption: according to HWiNFO, the ECC DIMMs consume 50 watt per CPU under high load, giving 100 watt just for memory across the 16 modules. Add the consumption of fans, the LSI controllers, the SSDs, etc., and the 2 high-powered CPUs with 150 watt TDP each are responsible for a tad above 50% of the total energy consumption under high load (ca. 550 watt peak measured at the wall outlet).
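
The same figures as a rough power budget; everything besides CPUs and RAM is lumped into "rest", so the split is only indicative:

```python
# Rough power budget at high load, using the numbers from the post above.
wall_peak = 550                 # W, measured at the wall outlet
cpu = 2 * 150                   # W, 2 x E5-2687W at TDP
ram = 2 * 50                    # W, per-CPU DIMM power reported by HWiNFO
rest = wall_peak - cpu - ram    # fans, HBAs, SSDs, PSU losses, ...

for label, watts in (("CPUs", cpu), ("RAM", ram), ("rest", rest)):
    print(f"{label:4}: {watts} W ({watts / wall_peak:.0%})")
```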

On temperature: the 2 Corsair H100 keep the Xeons below 60°C, allowing Intel's Turbo Boost to push all 8 cores to a sustained 3.4 GHz (nominal speed is 3.1 GHz), with single cores occasionally peaking at 3.8 GHz.

On NUMA: CoreInfo reports a factor of 1.6 between a CPU thread accessing memory attached to the second CPU and a thread accessing its local memory. Better make sure that CPU-intensive apps are NUMA aware to maintain high workloads; databases usually are, other server apps less often. There is one good thing about the 2-socket SB boards vs. their larger siblings: Sandy Bridge CPUs for dual-socket motherboards have 2 QPI interconnects connecting the 2 CPUs, whereas in a 4-socket motherboard the CPUs are only connected via single QPI links. One QPI can transfer ca. 16 GB/sec, so the 2-socket CPUs can utilize a higher share of the second CPU's memory bandwidth than a 4-socket machine can.

Let's take the 38-40 GB/sec of memory bandwidth the 4 channels deliver per CPU socket. In a dual-CPU system the 2 QPI links can theoretically transfer 32 GB/sec; the single QPI link in a 4-socket machine, 16 GB/sec. As long as the OS can ensure that process/memory affinity stays local, the impact is negligible. If not, the 4-socket machine is more exposed in this domain than the 2-socket machine.
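
The same comparison in numbers, using the approximate figures from this post:

```python
# Remote-memory exposure: 2-socket vs 4-socket, rough numbers from the post.
local_bw = 40.0      # GB/s a socket gets from its own 4 channels
qpi_link = 16.0      # GB/s per QPI link (approximate)

two_socket_remote = 2 * qpi_link     # two links between the two CPUs
four_socket_remote = 1 * qpi_link    # single link between a pair of CPUs

for name, remote in (("2-socket", two_socket_remote),
                     ("4-socket", four_socket_remote)):
    print(f"{name}: remote path {remote:.0f} GB/s "
          f"= {remote / local_bw:.0%} of a socket's local bandwidth")
```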

cheers,
Andy
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,517
5,812
113
Great info! Yes this is with the LSI 9202-16e and only a bit under 6GB/s. What voltage memory are you using? Guessing 16GB * 8 DIMMs.
 

Andreas

Member
Aug 21, 2012
127
1
18
you are welcome, Patrick.

re memory: standard 1.5 volt regECC DDR3 (Kingston)
I am using the Z9PE-D16 mainboard, which has 16 DIMM slots. I use 16 × 8GB DIMMs for 128 GB.
More memory would have been nice, but the per-GB price is currently 2.5× higher.
 

odditory

Moderator
Dec 23, 2010
384
69
28
Isn't that a pity? I get the new CPUs from the parcel service today, look forward to putting them to use, and here comes the news that I've got old, outdated and basically inefficient technology..... ;);)
You're sweating Haswell when it won't even see the light of day 'til Q3 '13? :)

That's a SICK # of sata ports on that DP board, btw.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,517
5,812
113
you are welcome, Patrick.

re memory: standard 1.5 volt regECC DDR3 (Kingston)
I am using the Z9PE-D16 mainboard, which has 16 DIMM slots. I use 16 × 8GB DIMMs for 128 GB.
More memory would have been nice, but the per-GB price is currently 2.5× higher.
Interesting notes there. Come to think of it, seems a bit higher than I remember the single processor boards using with 4x UDIMMs.

BTW odditory... it is crazy these days. dba is doing his testing on the Supermicro X9DR7-LN4F ("only" 18 ports plus four LAN (and one IPMI)):

And I thought the X8ST3-F was awesome not that many years ago.
 

Andreas

Member
Aug 21, 2012
127
1
18
Got a mail from ASUS support today that a new BIOS for the Z9PE-D16 will be available in the next few days, with support for Windows Server 2012.
 

Andreas

Member
Aug 21, 2012
127
1
18
That's good to know... Any other changes coming apart from the Windows support?
I don't know about other changes. I contacted them as the MB doesn't boot Windows Server 2012 with the current BIOS, while it did with WS08R2 and Win7.
 

Jeggs101

Well-Known Member
Dec 29, 2010
1,529
241
63
I don't know about other changes. I contacted them as the MB doesn't boot Windows Server 2012 with the current BIOS, while it did with WS08R2 and Win7.
This is crazy. For that price it should boot. Windows saw your storage specs - got scared - and refused to show up :p
 

Andreas

Member
Aug 21, 2012
127
1
18
Yet another weekend update:
1) Flashed the new BIOS (3107) - besides the WS2012 compatibility fix, boot time is significantly faster
2) Expanded the I/O capabilities of the dual Xeon workstation: could not achieve linear bandwidth and IOPS growth versus the previous configuration.
Possible reasons:
a) Architectural limit of IOMeter reached
b) QPI overloaded
c) Threading model of the LSI driver
3) Silverstone ST-1200 PSU keeps running with 48 SSDs in max write perf mode (I do not dare to increase the 5V load further)
4) Out of 6 new LSI 9207-8i HBAs, 2 are faulty (interestingly, same error: port 0 on the upper plug does not work)
5) Initially reached 19.9 GB/sec (out of 24.4 GB/sec max possible) - see the quick sanity check below the list
6) IOPS a bit short of 2.2 million. CPU load increases super-linearly vs. a smaller number of SSDs. Probably a NUMA issue.
7) Stability of the system is good. Last week's average on compute bound jobs with large IO components was above 600 MB/sec (360 TB over the full week, peaks were above 8 GB/sec).
8) Power consumption in idle is now approx 200 Watt. Fully loaded (really full): 620 Watt
9) Number of "CPUs" in the system: 166. (16 Intel, 6 Power, 144 ARM)
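
A quick sanity check on point 5, using only the numbers above; the per-SSD figure is simply the ceiling divided by the drive count, which happens to land near the Samsung 830's rated sequential read:

```python
# How the measured 19.9 GB/s relates to the stated 24.4 GB/s ceiling,
# and what that ceiling implies per SSD. Figures from the list above.
ssds = 48            # installed drives (2 ports are dead, so slightly fewer usable)
ceiling = 24.4       # GB/s, "max possible"
measured = 19.9      # GB/s, initially reached

print(f"achieved        : {measured / ceiling:.0%} of the stated ceiling")   # ~82%
print(f"implied per SSD : {ceiling / ssds * 1000:.0f} MB/s")                 # ~508 MB/s
```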

rgds,
Andy
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,517
5,812
113
The new Samsung 840 Pro looks like it will become a good performing SSD - to be released in October.
http://www.custompcreview.com/reviews/samsung-840-pro-series-256gb-ssd-review/15768/

Andy
Yea, I think we will be seeing a lot more of these announcements in the near future. Given how much firmware changes, my plan was to wait for shipping firmware.

Very interesting on the LSI HBA failure rates. That looks very high to me.

One other thing to consider is that as OCZ and others release their new drives/ controllers, we are going to start seeing more price competition. IIRC the Vertex 4 256GB went from $300 to sub $200 in 2-3 months. Also would be good to allow some shakeout of new offerings to happen.
 

Andreas

Member
Aug 21, 2012
127
1
18
There will be a wave of new products before the Christmas season - good anyway :)

In the meantime, my "old" Samsung 830s are working well ....

Here is a snip from perfmon.

Current configuration is 6 × LSI HBA, 48 × Samsung 128 GB (2 ports not working).
All SSDs are grouped into 4-SSD hardware RAID0 sets by the IR firmware in the controllers, showing up as 12 visible drives in the OS. This setup makes reconfiguration in the OS quite fast.

This screenshot is taken from perfmon, displaying 10 drives in the OS (one 4-SSD block each). The remaining 6 SSDs (2 blocks of 3 SSDs each, due to the 2 dead ports) are not used in this phase. Given that the max write speed for a 4-SSD block is 1280 MB/sec, net write performance is currently nothing to complain about.
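
For context on that last remark, here is the arithmetic behind it, using only the figures in this post:

```python
# Aggregate write ceiling of the 10 active 4-SSD blocks shown in perfmon.
active_blocks = 10
per_block_mb = 1280                  # MB/s max write per 4-SSD block
per_ssd_mb = per_block_mb / 4        # 320 MB/s, the Samsung write rate quoted earlier

total_gb = active_blocks * per_block_mb / 1000
print(f"per SSD   : {per_ssd_mb:.0f} MB/s")
print(f"10 blocks : {total_gb:.1f} GB/s aggregate write ceiling")
```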



This level of IO performance is not delivered on an otherwise idle machine, but on a fully loaded system. Here is a second snippet from the captured desktop, task manager.


rgds,
Andy
 