Server 2012 R2, Storage Spaces and Tiering


PigLover

Moderator
Jan 26, 2011
3,213
1,570
113
Same basic disk configuration and 100GB write cache, but this time with 3 SSDs in the pool.

New-StorageTier -StoragePoolFriendlyName TieredPool -FriendlyName SSD_Tier -MediaType SSD
New-StorageTier -StoragePoolFriendlyName TieredPool -FriendlyName HDD_Tier -MediaType HDD
New-VirtualDisk -StoragePoolFriendlyName TieredPool -FriendlyName HDD_Parity -UseMaximumSize -ResiliencySettingName Parity -ProvisioningType Fixed -NumberOfColumns 4 -WriteCacheSize 100GB


100% read IOPS: 32,967.



The output is pretty much identical to the case with 2 SSDs. I did some more checking. It looks like when you force the drives to "journal" and create a virtual disk, it uses exactly the number of journal drives it needs. Since this is single parity, we get 2 journal drives assigned to the virtual disk, no more, no less.

I re-ran the test without explicitly setting the SSD drives to journal. Same result.
 

PigLover

OK. So no impact going from 2 SSDs to 3 SSDs. Just to check whether it will stripe the journal across multiples of 2, let's give it a try with four SSDs.

Add the 4th SSD to the pool from the wizard, then run these commands in PowerShell (they are exactly the same commands as before!):

Get-StoragePool -FriendlyName TieredPool | Get-PhysicalDisk | ? MediaType -eq SSD | Set-PhysicalDisk -Usage Journal
New-StorageTier -StoragePoolFriendlyName TieredPool -FriendlyName SSD_Tier -MediaType SSD
New-StorageTier -StoragePoolFriendlyName TieredPool -FriendlyName HDD_Tier -MediaType HDD
New-VirtualDisk -StoragePoolFriendlyName TieredPool -FriendlyName HDD_Parity -UseMaximumSize -ResiliencySettingName Parity -ProvisioningType Fixed -NumberOfColumns 4 -WriteCacheSize 100GB


100% read IOPS: 34,408.



So going to 4 SSDs resulted in a jump in performance. Not too bad, and amazing compared to Storage Spaces (SS) without the cache. I think this is the configuration I will use for the bench-off with ZFS.
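The pattern in the last few posts (2 and 3 SSDs benchmark the same, 4 SSDs do better, and single parity appears to use exactly 2 journal drives) fits a simple model: each journal write is kept in redundancy + 1 copies, and the journal is only striped across whole groups of that size. The Python sketch below encodes that guess. It is an inference from these benchmarks, not documented Storage Spaces behavior; the dual-parity case is an extrapolation, and the function names are hypothetical.

```python
# Toy model of how many SSDs a parity space seems to use for its journal,
# inferred from the benchmarks in this thread (NOT documented behavior).


def journal_copies(physical_disk_redundancy: int) -> int:
    """Assumed number of copies of each journal write: redundancy + 1
    (2 for single parity, 3 for dual parity)."""
    return physical_disk_redundancy + 1


def journal_drives_used(ssd_count: int, physical_disk_redundancy: int) -> int:
    """Assume the journal is only striped across whole groups of copies,
    so SSDs left over after the last full group add nothing."""
    copies = journal_copies(physical_disk_redundancy)
    return (ssd_count // copies) * copies


if __name__ == "__main__":
    for ssds, redundancy in [(2, 1), (3, 1), (4, 1), (3, 2), (4, 2)]:
        used = journal_drives_used(ssds, redundancy)
        print(f"{ssds} SSDs, redundancy {redundancy}: {used} used for the journal")
```

Under this model a 3rd SSD sits idle for single parity, and a 4th would sit idle for dual parity until there are 6.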
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,531
5,850
113
Hey PigLover - any interest in turning this into a main-site post/summary? Something like this should get more visibility.
 

PigLover

Patrick said:
Hey PigLover - any interest in turning this into a main-site post/summary? Something like this should get more visibility.
Let me get to the end and see if I can summarize. I haven't even gotten to the really interesting part: comparing a faster disk subsystem on ZFS constrained by older Samba/CIFS against a slower disk subsystem with SS/Server 2012 R2 but with SMB3 file sharing. That's the part I'm really after, and I have no idea how it will turn out.

I also still have the problem that I don't know the right way to bench that. I have reason to believe that CrystalDiskMark won't give accurate results in that configuration.
 

PigLover

One last test before I move on to the next phase: a 12-drive dual-parity space (RAID-6/RAIDZ2 equivalent) with an SSD journal and a large write cache.

Build the pool in the wizard with 12 HDDs and 3 SSDs (a 4th SSD wouldn't help and I don't have 6). PowerShell to build the disk:

Get-StoragePool -FriendlyName TieredPool | Get-PhysicalDisk | ? MediaType -eq SSD | Set-PhysicalDisk -Usage Journal
New-StorageTier -StoragePoolFriendlyName TieredPool -FriendlyName SSD_Tier -MediaType SSD
New-StorageTier -StoragePoolFriendlyName TieredPool -FriendlyName HDD_Tier -MediaType HDD
New-VirtualDisk -StoragePoolFriendlyName TieredPool -FriendlyName HDD_Parity -UseMaximumSize -ResiliencySettingName Parity -ProvisioningType Fixed -PhysicalDiskRedundancy 2 -WriteCacheSize 100GB


Note that when building the virtual disk I didn't specify "dual parity" anywhere. Apparently you get dual parity from the "-PhysicalDiskRedundancy 2" option. Also note that I removed the column-count specification.

Interesting note: the disk seems to be short by about one disk's worth of capacity. I expected ~18GB but it configured ~16GB (the same as the prior single-parity test with 4 columns). Odd. I'll look into that later.
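One possible explanation for the missing capacity, offered as a guess rather than a diagnosis: a parity space keeps (columns - redundancy) data strips out of every stripe that is `columns` disks wide, and the post does not show how many columns Storage Spaces chose once -NumberOfColumns was dropped. If it picked a stripe narrower than all 12 disks (8 columns is used below purely as a hypothetical), dual-parity efficiency falls from 10/12 to 6/8, which scales ~18 down to about 16.2 in the post's units and would match the earlier 4-column single-parity space (3/4) if that pool had the same raw capacity. The real value can be read from the NumberOfColumns property reported by Get-VirtualDisk.

```python
# Capacity efficiency of a parity space: every stripe spans `columns` disks,
# `redundancy` of which hold parity, so (columns - redundancy) / columns is usable.
from fractions import Fraction


def parity_efficiency(columns: int, redundancy: int) -> Fraction:
    if columns <= redundancy:
        raise ValueError("need more columns than parity disks")
    return Fraction(columns - redundancy, columns)


def usable_capacity(raw: float, columns: int, redundancy: int) -> float:
    """Usable space for a given raw pool size (any unit), ignoring metadata."""
    return raw * float(parity_efficiency(columns, redundancy))


if __name__ == "__main__":
    full_width = parity_efficiency(12, 2)  # presumably what the ~18 estimate assumed
    narrower = parity_efficiency(8, 2)     # HYPOTHETICAL 8-column dual-parity stripe
    print(full_width, narrower, parity_efficiency(4, 1))
    print(f"~18 scaled to the narrower stripe: {float(18 * narrower / full_width):.1f}")
```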

100% read IOPS: 43,268.



So performance is good. Throughput is slightly lower than the last setup except for 512K reads, and read IOPS are roughly 26% higher (43,268 vs. 34,408). Resiliency is also better: the space can lose any 2 drives and survive. But for the next round I think I'll stick with the last configuration.
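For reference, the relative changes between the three "100% read" IOPS results in this thread work out to about +4.4% (3 to 4 SSDs, single parity) and about +25.7% (single parity with 4 SSDs to dual parity with 3 SSDs). A small Python helper for the arithmetic; the labels are descriptive, taken from the posts.

```python
# Relative change between the "100% read" IOPS figures quoted in this thread.
RESULTS = {
    "single parity, 3 SSDs": 32_967,
    "single parity, 4 SSDs": 34_408,
    "dual parity, 3 SSDs": 43_268,
}


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    names = list(RESULTS)
    for before, after in zip(names, names[1:]):
        change = pct_change(RESULTS[before], RESULTS[after])
        print(f"{before} -> {after}: {change:+.1f}%")
```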
 

awedio

Active Member
Feb 24, 2012
777
227
43
PigLover said:
..real interesting part...comparing a faster disk subsystem on ZFS constrained by older Samba/CIFS vs a slower disk subsystem with SS/Server 2012 R2 but with SMB3 file sharing. That's the part I'm really after...and I don't have any idea how it will turn out.
Really looking forward to this part
 

PigLover

awedio said:
Really looking forward to this part
Well, I'm looking for good ways to bench network disk performance. Crystal is obviously all confused. ATTO insists on only testing "real" disks. Large file copies to/from a RAM disk are most promising, but somehow not very satisfying in terms of rigor.
 

PigLover

I've been playing with various ways to bench NAS performance as seen from a Windows 8-based client. After googling for quite a while I've come to the conclusion that there isn't any good tool for this and no generally accepted method. Sucks.

I decided not to tear down the Server 2012 R2 system to load napp-it/ZFS; it's just too much hassle to put it back. I do have two other high-quality ZFS servers on my 10GbE LAN to test against.

Server 1:
- Solaris based ZFS file server
- Bare-metal install of Solaris 11.1 and napp-it
- Xeon X3460 2.8GHz, 32GB 1333 ECC
- 20x Hitachi 2TB 5900 RPM (coolspin) set up as a single pool with 2x 10-drive RaidZ2 vdevs
- Drives connected by 3 separate M1015s flashed to IT mode (no bottlenecks)
- 2x 10GbE links active on Intel 82599-based NIC

Server 2:
- ZFS on Linux running on Proxmox 3.0 (Debian Wheezy)
- Dual Xeon E5-2667 on SM X9DRL-3f MB, 128GB 1666 ECC
- 8x Seagate 4TB 5900 RPM drives in a single RaidZ2
- Drives connected by on-board SAS ports (no bottlenecks)
- 2x 10GbE on Mellanox ConnectX-3 EN

The pools on both systems bench in Bonnie at above 1,000MB/s sequential read and above 500MB/s sequential write, so they should provide a good point of comparison.

So I built a RAM disk on the client machine. I've got a 4.68GB folder containing the ripped files from a DVD (a good mix of small and big files), and I'll look at speeds copying it to/from each server. Since there are no reliable benchmarks available, this will have to do for now.
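If you want a number rather than a screenshot of the copy dialog, the same RAM-disk copy can be timed in a few lines of Python. This is a sketch, not the method used in this thread: it reports only the average rate (slow starts and end-of-copy bursts get averaged away), repeated runs may be served from the server's cache, and the paths in the usage line are hypothetical.

```python
# Time a directory copy and report average throughput, in the spirit of the
# RAM-disk copy tests in this thread. Sketch only.
import shutil
import time
from pathlib import Path


def tree_size_bytes(root: Path) -> int:
    """Total size of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def timed_copy(src: Path, dst: Path) -> tuple[int, float, float]:
    """Copy the tree at src to dst (which must not exist yet).
    Returns (bytes copied, seconds, average MB/s in decimal MB)."""
    total = tree_size_bytes(src)
    start = time.perf_counter()
    shutil.copytree(src, dst)
    elapsed = time.perf_counter() - start
    rate = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate
```

For example, `timed_copy(Path(r"R:\DVD_rip"), Path(r"\\server1\share\DVD_rip"))` for the upload direction and the reverse for the download, where both paths are placeholders.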

Here's a quick look at the RamDisk performance.
It ought to be fast enough to be a source or sink of file copies. :)


Here are copies to/from Server 1 (Solaris ZFS, 20x Hitachi 2TB, local speeds >1,000MB/s read, >500MB/s write):

Copying from the RAM disk to the Solaris server showed a slow start, mostly around 200MB/s, and a burst of speed at the end to >250MB/s.


Copying the same directory back hit a nice, solid 300MB/s for the whole transfer.


Here are the same copies to/from Server 2 (Debian Wheezy ZoL, 8x Seagate 4TB, local speeds >1,000MB/s read, >500MB/s write):

Copying from the RamDisk to the ZoL server showed a slow start, mostly around 200MB/s, and a speed burst at the end to >270MB/s
Looks like a carbon copy of the Solaris based server, just a bit smoother.


Copying the same directory back hit a nice, solid 300MB/s for the whole transfer.
This one is shockingly similar to the Solaris machine.


So how did the Server 2012 R2 machine do? Did SMB3 give it a huge advantage?

No - but it didn't do too badly either.

Here's the copy from the workstation RAM disk to the Server 2012 R2 machine. It's interesting: a long, slow start period and then a ramp to over 400MB/s!


The return path (server back to client) came in at a nice, smooth 220-230MB/s, only about 75% of the speed of the ZFS servers.


Conclusion

So what's the bottom line? I'd say that Server 2012 R2 and its derivatives are finally showing some moxie for file services vs ZFS. Storage Spaces still struggles with software parity RAID, but Microsoft has built a strong SSD-based journal/cache approach that makes it usable, if still a bit slower on the local machine than a comparable ZFS build.

Also, SMB3 does bring some advantages. Neither the ZFS systems nor the Server 2012 R2 system could deliver anything close to saturating the 10GbE links, which is highly disappointing. But the Server 2012 R2 build did get a larger percentage of its filesystem performance onto the link, leaving the two options with almost exactly the same performance as seen from the client.
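To put "anything close to saturating the 10GbE links" in numbers: 10Gb/s is 1,250MB/s of raw line rate, before Ethernet, TCP and SMB overhead, so the client-side copy speeds reported here used roughly a fifth to a third of it. A small Python helper for the arithmetic; the function names are hypothetical and decimal units are assumed.

```python
# How much of a network link the observed client-side copy speeds used.
def line_rate_mb_s(gbit_per_s: float) -> float:
    """Raw line rate in MB/s (decimal units), ignoring all protocol overhead."""
    return gbit_per_s * 1e9 / 8 / 1e6


def link_utilization(observed_mb_s: float, gbit_per_s: float = 10.0) -> float:
    """Fraction of the raw line rate represented by an observed speed."""
    return observed_mb_s / line_rate_mb_s(gbit_per_s)


if __name__ == "__main__":
    observed = {
        "ZFS servers, copy back to client": 300,
        "Server 2012 R2, copy back to client": 230,
        "Server 2012 R2, peak copy to server": 400,
    }
    for label, speed in observed.items():
        print(f"{label}: {speed} MB/s = {link_utilization(speed):.0%} of 10GbE")
```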

Note that this was all done with a single-client workload, which is not realistic in most cases: most production NAS builds expect to serve dozens to hundreds of workstations. I have no way to predict how either build would behave in that environment.

Next steps

Over the next several weeks I'll keep tuning and see if I can squeeze out just a bit more. For now - I think that's it.

I still need to finish setting up a 2nd Server 2012 R2 machine and test with SMB Direct (RDMA over Ethernet). That part will have to wait a few weeks until a couple of parts arrive.
 

pgh5278

Active Member
Oct 25, 2012
478
130
43
Australia
PigLover, the high-density SSD storage is superb. Ghetto, agricultural, presidential solution/engineering or otherwise, it is fit for purpose and obviously robust enough. Cheers, P
 

dba

Moderator
Feb 20, 2012
1,477
184
63
San Francisco Bay Area, California, USA
PigLover said:
Well - I'm looking for good ways to bench network disk performance. Crystal is obviously all confused. ATTO insists on only testing "real" disks. Large file copies to/from a Ramdisk are most promising - but somehow not very satisfying as being rigorous.
IOMeter is a great tool for this, and it will benchmark network-mounted disks. Your IOPS tests used 4KB random reads; you can set up different profiles to test different types of I/O, including "mixed" workloads that combine reads and writes of different sizes.
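This is not IOMeter, but the idea of a mixed profile is easy to illustrate. The Python sketch below issues block-aligned random reads and writes against an existing test file in a chosen ratio. It is a hypothetical helper, single-threaded at queue depth 1, and it does not bypass the client's file cache, so results on a small file will mostly reflect RAM rather than the NAS.

```python
# Minimal mixed random-I/O loop: a chosen fraction of 4K reads vs. writes at
# random block-aligned offsets. Single worker, QD1, OS cache NOT bypassed.
import os
import random
import time


def mixed_random_io(path: str, ops: int, block_size: int = 4096,
                    read_fraction: float = 0.7, seed: int = 0) -> dict:
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("test file is smaller than one block")
    payload = os.urandom(block_size)
    reads = writes = 0
    start = time.perf_counter()
    with open(path, "r+b", buffering=0) as f:
        for _ in range(ops):
            f.seek(rng.randrange(blocks) * block_size)  # random aligned offset
            if rng.random() < read_fraction:
                f.read(block_size)
                reads += 1
            else:
                f.write(payload)  # overwrites existing data, never extends the file
                writes += 1
        os.fsync(f.fileno())  # flush writes before stopping the clock
    elapsed = time.perf_counter() - start
    return {"reads": reads, "writes": writes, "seconds": elapsed,
            "iops": ops / elapsed if elapsed > 0 else float("inf")}
```

For example, `mixed_random_io(r"Z:\iotest.dat", ops=10_000, read_fraction=0.7)` against a large, pre-created file on a mapped share (the path is a placeholder). The file's contents are overwritten.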

Also, SMB Direct (RDMA) is absolutely amazing. I get 3GB/s transfers on a daily basis, without even trying!
 

PigLover

dba said:
IOMeter is a great tool for this. It will benchmark network mounted disks. Your IOPS tests used 4kb random reads. You can set up different profiles to test different types of IO, including "mixed" IO workloads that have reads and writes of different sizes combined.
I'll take a look at what I can get with IOmeter.

dba said:
Also, SMB Direct (RDMA) is absolutely amazing. I get 3GB/s transfers on a daily basis, without even trying!
Yeah, I was very hopeful when I got a deal on the ConnectX-3 EN cards. Then I discovered that MS decided to disable SMB Direct on Windows 8/8.1. "Servers only," say the license-revenue mavens in Redmond. Kinda stinks...
 

33_viper_33

Member
Aug 3, 2013
204
3
18
Another interesting data point would be the RAM/processor resources required by both systems. Is one any more resource-intensive than the other?
 

MiniKnight

Well-Known Member
Mar 30, 2012
3,073
976
113
NYC
Ha, and they are saying on TechNet that Hyper-V is better because they don't make you pay for the networking features like VMware does.
 

PigLover

I copied about 9TB of data from one of my other servers overnight. A few notes:

I used simple File Manager drag/drop (see note below) to pick up two folders full of video files (one 4.3TB and one 5.1TB). I ran them as separate copies to see whether the speeds would be better (due to two network streams) or worse (due to filesystem contention). The first copy ran at about 250MB/s, nice and smooth. It slowed a bit when I started the second one, but not much. The second copy chugged along at about 120MB/s, so net I got over 300MB/s for the duration of the copy.
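A quick sanity check on the overnight figures: at a net rate of just over 300MB/s, moving 4.3TB + 5.1TB takes under nine hours, and at the combined ~370MB/s of the two streams it would take about seven, which fits an overnight run. A minimal Python helper, assuming decimal units throughout (TB = 10^12 bytes, MB = 10^6 bytes):

```python
# Wall-clock time to move a given amount of data at a steady rate.
def copy_hours(total_tb: float, mb_per_s: float) -> float:
    """Hours to move total_tb (decimal TB) at a steady mb_per_s (decimal MB/s)."""
    return total_tb * 1e12 / (mb_per_s * 1e6) / 3600


if __name__ == "__main__":
    total = 4.3 + 5.1  # the two video folders, in TB
    for rate in (300, 370):
        print(f"{total:.1f} TB at {rate} MB/s: {copy_hours(total, rate):.1f} h")
```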

I was especially interested to see what would happen when the 100GB cache filled up. To my surprise, it apparently never did (though I haven't found any tools to watch it). I did notice one interesting thing throughout the copy: I watched the activity lights on the hard disks, and they stayed completely off for the first few minutes of the copy, then there was a flurry of disk activity for a few minutes, then they went off again. This pattern continued throughout the copy. It appears, though I have no way to prove it, that MS actually did a good job with the cache/journal design: it acts like the cache/journal collects data for a while and then does a highly optimized flush to the hard disks behind it. It almost causes me pain, but I have to say "well done, Microsoft".
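The on/off pattern of the disk lights is what you would expect from a write-back cache that destages in bursts between two fill watermarks. The Python toy model below reproduces that shape. It is an illustration of the behavior described above, NOT Microsoft's algorithm: the watermarks and the 600MB/s HDD destage rate are made-up parameters, and the function name is hypothetical.

```python
# Toy model of a write-back cache (WBC): client writes land on SSD, and the
# HDDs are only woken for bursts of destaging between two watermarks.
# Illustrative only; every rate, size and watermark is a made-up parameter.


def simulate_wbc(seconds, ingest_mb_s, flush_mb_s, cache_mb, high=0.5, low=0.1):
    """Return one (cache_fill_mb, hdds_busy) sample per simulated second."""
    fill = 0.0
    flushing = False
    timeline = []
    for _ in range(seconds):
        fill += ingest_mb_s  # this second's client writes go into the cache
        if not flushing and fill >= high * cache_mb:
            flushing = True  # high watermark reached: start a destage burst
        busy = flushing
        if flushing:
            fill = max(0.0, fill - flush_mb_s)  # HDDs drain the cache
            if fill <= low * cache_mb:
                flushing = False  # low watermark reached: HDDs go quiet
        if fill > cache_mb:
            raise RuntimeError("cache full: client writes would stall")
        timeline.append((fill, busy))
    return timeline


if __name__ == "__main__":
    # 100GB cache, ~300MB/s of incoming copies, HDDs destaging at a guessed 600MB/s.
    samples = simulate_wbc(1800, 300, 600, 100_000)
    pattern = "".join("#" if busy else "." for _, busy in samples[::30])
    print(pattern)  # '#' = HDD lights on, '.' = off, one character per 30 s
```

With these guesses the HDDs stay idle for roughly the first three minutes, then alternate between busy and idle periods of a couple of minutes each, and the cache never fills as long as the destage rate exceeds the incoming rate.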

There was hardly any noticeable memory use on the server during the copies, and the CPUs sat near idle.

Both copies ran smoothly at a fairly constant speed for all 4.3TB/5.1TB respectively. I last checked them about 2am and they looked good. One finished sometime before I got up, and the other just finished an hour or so ago.

Just to check non-cached reads, I copied a DVD directory that I KNOW could not still be in the 100GB SSD cache (it was copied last night, and several more TBs have been copied since it finished). Copying it from the server to the RAM disk on my client looked like this, with the same slow start but eventually ramping up to something not too bad:


Note (OT): I discovered, to my chagrin, that my favorite folder-sync program charges $995(!!!) to activate it if you run it on a "Server OS". That's just shit (sorry Patrick for swearing).
 

PigLover

This is what happens when you try to enable SMB Direct on a Windows 8 workstation:



Sucks. Thank you, Microsoft.
 

donedeal19

Member
Jul 10, 2013
47
17
8
Could you run an iperf test? I would like to see a single-thread run and maybe a four-thread run. I'm curious because that is what I have been using, and I'd like to compare my results with yours.
I use something like iperf -c 10.11.12.1 -fg -P 4 -t 60. I've tried it on Windows 7, 8, and 2012 so far.
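For anyone repeating this comparison, here is a small Python wrapper that builds the single-stream and four-stream variants of the command above. It assumes iperf 2 is on the client's PATH and `iperf -s` is running on the server; the flags are the ones from the post (-c server, -f g for Gbit/s output, -P parallel streams, -t test length in seconds), and the helper names are hypothetical.

```python
# Build and run iperf 2 client commands for single- vs multi-stream tests.
import subprocess


def iperf_client_args(server: str, streams: int = 1, seconds: int = 60) -> list[str]:
    """Same flags as the command quoted above, with a configurable stream count."""
    return ["iperf", "-c", server, "-f", "g", "-P", str(streams), "-t", str(seconds)]


def run_iperf(server: str, streams: int, seconds: int = 60) -> str:
    """Run one test and return iperf's text report (raises if iperf fails)."""
    result = subprocess.run(iperf_client_args(server, streams, seconds),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run_iperf("10.11.12.1", 1)` and then `run_iperf("10.11.12.1", 4)` gives the two runs requested.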
I also just tried Hyper-V and Remote Desktop Connection for the first time, and I've enjoyed it so far. I don't have RemoteFX, but being able to pull great speeds while remoted into a VM is amazing.
 

nickveldrin

New Member
Sep 4, 2013
23
3
3
Hi guys, I wanted to personally thank everyone for the work put in so far on tiered Storage Spaces. I was researching how the technology compares to other solutions and how much better it has gotten since Microsoft's first revision, and based on the numbers shown, it definitely looks like MS has made SMB3 a real contender against inexpensive storage systems.

I wanted to ask PigLover whether it would be worthwhile to try other benchmark tests, like SQLIO with varied block sizes and a queue depth (QD) of 4 or 8. While the other tools are good from a throughput perspective, I think running SQLIO from a script might be more realistic. Granted, it isn't a real, representative workload, but the way the test works makes more sense to me, and I think better data comes out of it. Another good thing about the test is the histogram it produces: you can see your MB/s rate, your IOPS at your parameters, and how much of the test fell between 1ms and 25ms, and what was over. Decent hardware like your system could make it interesting, since I have no RAID cards to test mine, and my latency is always shameful :)
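If you collect per-I/O latencies yourself (for example from any timing loop around reads and writes), the same kind of latency summary is easy to produce. A minimal Python sketch of the bucketing idea; the function names are hypothetical and it does not reproduce SQLIO's output format.

```python
# Bucket per-I/O latencies into one bucket per whole millisecond from 0 up to
# max_ms, plus an overflow bucket for everything slower.


def latency_histogram(latencies_ms, max_ms=25):
    buckets = [0] * (max_ms + 1)  # buckets[max_ms] counts everything >= max_ms
    for lat in latencies_ms:
        if lat < 0:
            raise ValueError("latency cannot be negative")
        buckets[min(int(lat), max_ms)] += 1  # int() truncates 1.7ms into bucket 1
    return buckets


def percent_below(buckets, ms):
    """Share (in %) of I/Os that completed in under `ms` milliseconds."""
    total = sum(buckets)
    return 100.0 * sum(buckets[:ms]) / total if total else 0.0
```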

I was also going to ask about the impact of leaving the write cache at its default size (1GB or 2GB) versus larger values, but now that I think about it, that was already done.

The last request has to do with file management at the tier level. According to Jose's blog, you can pin files to tiers seamlessly, or let the drive manage the tiering on its own. Since the block reorganization happens around 3 AM, you'd have to get your test file in place, trigger the scheduled task, and then begin testing. Jose's example, while not the clearest, is so far the only one I've found that goes into these commands. But a 1-HDD/1-SSD pool carved up and presented to a virtual machine doesn't sound as good as what real hardware could potentially show.

http://www.aidanfinn.com/?p=15439
Aidan Finn also has some testing on write-back caching, and here are his sample data sets:

Disk1.vhdx: This is on SCSI 0 0 and is placed on \\SOFS1\CSV1. The share is stored on a tiered storage space (50 GB SSD + 150 GB HDD) with 1 column and a write cache of 5 GB. This is the D drive in the VM.

Disk2.vhdx: This is on SCSI 0 1 and is placed on \\SOFS1\CSV2. The share is stored on a non-tiered storage space (200 GB HDD) with 4 columns. There is no write cache. This is the E: drive in the VM.




So it looks as though even a small 5GB WBC can make a massive difference. But what isn't clear in the example above is how much of the performance comes from the SSD tier itself and its caching versus just the WBC, and this is where I'm confused. If I want to build a RAID 10-equivalent storage space (which I still need to validate how to do) and use it like a ZFS system with an SSD cache for reads and a separate SSD for writes, does that make sense? And how big should the WBC be before it hits negative returns and takes too much away from the cache pool?
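One rough way to think about WBC sizing: the cache only needs to absorb bursts in which writes arrive faster than the HDD tier can destage them, so the time until it fills is its size divided by that difference. The rates below are hypothetical placeholders, not measurements from this thread, and the function name is illustrative.

```python
# How long a write-back cache can absorb a burst before it fills.
def wbc_fill_seconds(cache_gb, ingest_mb_s, destage_mb_s):
    """Seconds until a WBC of cache_gb (decimal GB) fills, given incoming
    writes at ingest_mb_s and HDD destaging at destage_mb_s.
    Returns infinity when destaging keeps up with the incoming writes."""
    net_mb_s = ingest_mb_s - destage_mb_s
    if net_mb_s <= 0:
        return float("inf")
    return cache_gb * 1000 / net_mb_s


if __name__ == "__main__":
    for cache_gb in (1, 5, 100):
        secs = wbc_fill_seconds(cache_gb, ingest_mb_s=300, destage_mb_s=100)
        print(f"{cache_gb:>3} GB WBC absorbs a {secs:.0f} s burst at +200 MB/s")
```

Under this model, a WBC larger than your longest expected write burst buys little, and that SSD capacity is presumably then unavailable to the SSD tier.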

Sorry for all the typing - this is definitely pretty interesting stuff.
 

PigLover

Thanks for your kind words and suggestions.

Unfortunately, due to work, travel and family needs I won't be able to spend any time doing more benches on this for at least two weeks.

I am interested in doing some other benches: IOmeter, as suggested by dba, and SQLIO are both interesting. But it will have to be done later. As I noted, my focus going forward will be benches from the client workstation rather than raw speeds at the server.

On pinning files: note that although the pool was defined as 'tiered', the actual storage space was built as 'parity'. Because of this, tiering is actually disabled. The speed gains are all due to the WBC and journaling being locked into the SSDs. Unfortunately, true tiered storage only works for simple and mirror spaces.