Slow mirrored tiered storage spaces


Chuntzu

Active Member
Jun 30, 2013
383
98
28
Please don't take this as a shot at you, as you clearly know plenty about SS. The above are nice benchmarks, but really it screams poor use of money. If you had enough coin for all those NVMe drives and SSDs, which both have the same space limits compared to spinners, then a hardware RAID card would have found its way in there long ago. SS still sucks; it is not suited to production environments, especially where VMs are concerned.

I use it because I have around 50TB hanging off it. It suits me well, and the average files I am moving are around 1-10GB, which move fine, even when it fills the RAM and then falls back to drive write speeds of 60-odd MB/sec (8-drive, double-parity ReFS) for one array; the same style of array with just single parity is 15MB/sec faster. The speeds when these were NTFS were less than 5MB/sec different.

I won't bother with the arguments around how much of a pain in the arse SS is to use when it decides to have a shit in its pants. If I had coins for either a HW RAID card or more drives (need transfer space) then I would go HW or ZFS.
100% no insult taken. My use case was to find the fastest setup possible for the best price. It definitely didn't start with me looking at Windows; it truly started with ZoL and OpenStack, and then progressed extremely slowly for almost a year and a half while I figured out how file systems, storage stacks, and networking all worked together. Then one day Storage Spaces and RDMA SMB 3.0 were brought to my attention, and as it turned out it was easier to use and its networking was much faster. Once I understood how it worked and what parts were the weak links, I could account for them and create a setup in which the bottleneck was the CPU (imagine that: not a RAID card, not RAM, not drives, not network, but the CPU!).

I will say that with parity RAID types you may be right: a RAID card may be faster. But I currently have a 24-disk, 17-column storage space (17 columns is the maximum for a dual-parity space) with 3x 400GB SSDs as journals, and it works great. Not nearly as fast as the mirrored storage spaces arrays, but good enough for the 100TB+ of space I am using, and sequential speeds are extremely fast.
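For anyone who wants to replicate that kind of layout, here's a minimal sketch of the PowerShell involved (the pool name, SSD size filter, and vdisk name are my placeholders, not the exact setup described above):

# Hypothetical names/sizes - adjust to your own pool and disks
# Dedicate the 400GB-class SSDs (assumed to already be in the pool) to the write journal
Get-PhysicalDisk | Where-Object { $_.MediaType -eq "SSD" -and $_.Size -lt 500GB } |
    Set-PhysicalDisk -Usage Journal

# 17-column dual-parity space across the HDDs
New-VirtualDisk -StoragePoolFriendlyName "ArchivePool" -FriendlyName "Archive" `
    -ResiliencySettingName Parity -PhysicalDiskRedundancy 2 `
    -NumberOfColumns 17 -UseMaximumSize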

As for poor use of money, I disagree. Storage Spaces has its drawbacks for sure, but as far as performance goes it scales well beyond what a single RAID controller can handle in certain configurations, and that's where the catch is. Yes, dual-parity and single-parity spaces may be slower than a RAID 6 or 5 on a proper RAID card once Storage Spaces exhausts its write-back cache. But those layouts are not meant for production VM workloads; they're for backups, digital media archives, or most home server tasks (Microsoft expresses this sentiment many times when describing Storage Spaces).

With that said, find me a RAID card that can hit 2 million IOPS in any configuration. I used 3x 9300-16e HBAs with 48x 256GB SSDs, and the total cost was ~$5200. This was a little over a year ago and prices have changed, but I couldn't hit those numbers any cheaper and couldn't touch those performance numbers using RAID cards, let alone at that price.

I have moved over to NVMe drives for fun after selling off some of this equipment and am very impressed with the technology. It actually cost me a lot less to hit these performance numbers using NVMe drives. It was a win-win changing over to NVMe: cheaper and faster.


The fact is I really like ZFS and would prefer to use it (I hope ReFS can rise to that level at some point, but I am using NTFS since ReFS was dramatically slower in Server 2012 R2 when I was benchmarking with all solid-state storage). But the networking stacks for both Linux and BSD are far too slow to utilize the full speed of those file systems over file-based shares. If my use case leaned more on block-based storage than file-based, I could spend the time compiling RDMA network storage tools for Linux and BSD and get the best of both worlds. As it stands right now, Windows has the best of both worlds, as long as I am willing to use mirrored spaces for my production workloads and don't mind that my dual-parity spaces cannot sustain multiple hundreds of thousands of IOPS (though their sequential speeds are those of 17 columns of HDDs).

Along with that, I would prefer not to spend my time compiling and tweaking those RDMA-based tools when I can spend $20 on a ConnectX-2 VPI card, hit 3.2 gigabytes per second per card, and have SMB 3.0 RDMA scale every time I add a card. So I believe it comes down to this: if performance and cost are the top concerns, then Storage Spaces is the best way to go.
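If anyone wants to sanity-check that SMB Direct and multichannel are actually doing the work, the stock cmdlets on Server 2012 R2 and later make it easy to verify (run these on the client while a transfer is active):

# RDMA-capable NICs and whether RDMA is enabled on them
Get-NetAdapterRdma

# Interfaces SMB can see, with their RSS/RDMA capability flags
Get-SmbClientNetworkInterface

# Active SMB connections per interface while a copy is running
Get-SmbMultichannelConnection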
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Chuntzu, Thanks for that. All understood.

Just a straight question, WTF are you putting on that SSD array? Farken good porn server!
Bahahah, no, not that! Truth is, the dirt-cheap data warehouse build from @dba inspired me when I first read it, and I wanted to replicate the experience. It took me close to two years of trial and error to mimic it on a different platform (he was using a quad-socket AMD setup, and as it turns out socket 1366 can't reach those speeds but dual-socket 2011 can), but I was able to achieve it.

I have to be honest: other than VM traffic, it was mainly to experiment with the technology. I don't have a business use case for it at this point, and I would have to re-evaluate the types and quantity of drives if I were to use it in production.

The next step with this technology for me is twofold and involves Storage Spaces Direct. First, how well does Storage Spaces Direct scale using NVMe and HDDs, and in what arrangement are they optimal? Then, figure out how to scale it down to very low-power HA home use: think 4x 8- or 12-disk NAS boxes running Xeon D with 10Gbit NICs for a very low-power hyper-converged cluster, very similar to @Patrick's Proxmox Ceph setup, but hopefully with better performance (@Patrick, I read the numbers and they look really good with the Proxmox setup; just curious whether S2D can do better, and to that extent whether ScaleIO would be better as well). It will scale up and out, very useful for "home server" and beyond workloads.
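For anyone following along, the cluster side of that experiment is only a few cmdlets. This is a rough sketch with made-up node names, assuming a Server 2016 preview build with the Failover Clustering tools installed:

# Hypothetical node names - validate, build the cluster, then let S2D claim the drives
Test-Cluster -Node Node1,Node2,Node3,Node4 -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"
New-Cluster -Name S2DCluster -Node Node1,Node2,Node3,Node4 -NoStorage
Enable-ClusterStorageSpacesDirect   # pools the eligible NVMe/SSD/HDD devices and sets up the cache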
 
  • Like
Reactions: ultradense

Chuntzu

Active Member
Jun 30, 2013
383
98
28
ScaleIO is cool. However, the fault domain has to be at the server level and not at the disk level; that is why it requires a minimum of 3 servers.

Chris
Darn you for putting this on my radar. ScaleIO looks really cool, and its bare-minimum number of servers is one less than S2D's (3 vs. 4 nodes).
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
The difference is that S2D is part of the (future) server OS with no added cost, whereas with ScaleIO you have to pay for the license and support.

Both S2D and ScaleIO use the server as the fault domain. S2D supports RDMA; ScaleIO does not support it yet...

Chris
 
  • Like
Reactions: Chuntzu

gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
You might also be seeing Storage Spaces attempting to optimise the use of the SSDs. As I understand it, SS detects streaming writes and sends the data direct to the HDD tier while the SSD tier and your 1GB write cache/journal is principally used for smaller random writes (e.g. metadata updates, normal random IO).
Whoa, I had no idea SS was this smart. I might have to take a look at it again.
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
The other thing to do is to make the WBC larger on the volume that you are writing to. That helps a lot for home use, as writes are infrequent. For enterprise use you need a lot of spindles to write to.
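As a rough sketch of what that looks like in practice (the WBC size is set when the virtual disk is created; the pool and vdisk names here are placeholders):

# Hypothetical names - the key parameter is -WriteCacheSize, specified at vdisk creation
New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "BulkData" `
    -ResiliencySettingName Mirror -UseMaximumSize -WriteCacheSize 20GB

# Confirm what an existing vdisk ended up with
Get-VirtualDisk -FriendlyName "BulkData" | Select-Object FriendlyName, WriteCacheSize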

Chris
 

Morgan Simmons

Active Member
Feb 18, 2015
134
25
28
44
The other thing to do is to make the WBC larger on the volume that you are writing to. That helps a lot for home use, as writes are infrequent. For enterprise use you need a lot of spindles to write to.

Chris
I made my WBC 20GB, and it didn't really make any difference at all. I personally don't understand why anyone would want sequential writes to go directly to the HDDs when the SSDs are dramatically faster.

Since my last post, I've gone back and just made a 6-column HDD array, and it is performing fine. It's actually way more consistent than having tiering involved.

I'd love to find a way to use the SSDs I have as a cache. At some point, maybe I'll mess around with Btrfs and bcache.
 

DavidRa

Infrastructure Architect
Aug 3, 2015
329
152
43
Central Coast of NSW
www.pdconsec.net
I personally don't understand why anyone would want sequential writes to go directly to the HDDs when the SSDs are dramatically faster.
While a pair of SSDs are indeed faster than a pair of spinning rust drives, the equation shifts dramatically once you start hitting more than a few spindles. At only 100MBps sequential for the rust, 20 of them in RAID 10 can push over 1GBps - which is significantly faster than the 500MBps you'll get on the SSDs. With larger numbers of rust drives, say 120 in a larger deployment, you'd need 6-8 NVMe SSDs (3-4 RAID 1 pairs) or 20+ SAS SSDs (10+ RAID 1 pairs) to achieve the same write speeds. And even then you're using all the IO in the SSDs for writing, and none for random read or random write IO.

It's not quite as simple as "SSDs are faster than HDDs so everything goes there".
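A quick back-of-the-envelope in PowerShell, using the same assumed per-device numbers as above (100 MBps per HDD, 500 MBps per SATA SSD, mirrored so only half the devices contribute write bandwidth):

# Assumed per-device sequential write speeds in MBps - adjust for your hardware
$hddWrite = 100
$ssdWrite = 500

$hddCount = 20
# In RAID 10 every write hits both copies, so usable write bandwidth ~ half the devices
$hddArrayWrite = ($hddCount / 2) * $hddWrite                   # 1000 MBps for 20 HDDs
$ssdPairsNeeded = [math]::Ceiling($hddArrayWrite / $ssdWrite)  # SSD mirror pairs to match it
"{0} HDDs in RAID 10 ~ {1} MBps writes; matching that takes about {2} SSD mirror pairs" -f $hddCount, $hddArrayWrite, $ssdPairsNeeded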
 

Morgan Simmons

Active Member
Feb 18, 2015
134
25
28
44
While a pair of SSDs are indeed faster than a pair of spinning rust drives, the equation shifts dramatically once you start hitting more than a few spindles. At only 100MBps sequential for the rust, 20 of them in RAID 10 can push over 1GBps - which is significantly faster than the 500MBps you'll get on the SSDs. With larger numbers of rust drives, say 120 in a larger deployment, you'd need 6-8 NVMe SSDs (3-4 RAID 1 pairs) or 20+ SAS SSDs (10+ RAID 1 pairs) to achieve the same write speeds. And even then you're using all the IO in the SSDs for writing, and none for random read or random write IO.

It's not quite as simple as "SSDs are faster than HDDs so everything goes there".

That makes complete sense; I'm not thinking on a big enough scale. But you would think that Windows would be able to detect the performance of both tiers and direct traffic to the appropriate one.
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
A new storage feature showed up yesterday in the Win10 update: Multi-Resiliency Tiered Spaces.

This is something that you may want to try.

Now the column count does not have to match between SSD and spinning disks.

I do not know if this changes if you use parity, but now you can have 2 SSDs (3 if you want a triple-mirror config) in a single column, and then have an 8-column mirror on 16 disks.

I do not know if this is in TP3; still waiting on Server 2016 to ship...

Chris
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
here is an example of how to do this (now).

# get the disks to create the pool over
$disks = Get-PhysicalDisk | where {$_.CanPool}

# Create the storage pool
New-StoragePool -PhysicalDisks $disks -StorageSubSystemFriendlyName "Windows Storage*" -FriendlyName "StoragePool01"

# Create a one-column, mirrored SSD tier
$ssdTier = New-StorageTier -StoragePoolFriendlyName "StoragePool01" -MediaType SSD -FriendlyName "SSDTier" -NumberOfColumns 1 -ResiliencySettingName Mirror

# Create a four-column, mirrored HDD tier
$hddTier = New-StorageTier -StoragePoolFriendlyName "StoragePool01" -MediaType HDD -FriendlyName "HDDTier" -NumberOfColumns 4 -ResiliencySettingName Mirror

# Figure out how big to make the tiers
$ssdTierSize = Get-StorageTierSupportedSize "SSDTier" | select -ExpandProperty TierSizeMax
$hddTierSize = Get-StorageTierSupportedSize "HDDTier" | select -ExpandProperty TierSizeMax

# Subtract a few gigs for padding and whatnot just in case, we can use Resize-StorageTier later to fill up any leftover space if there is any
$ssdTierSize -= 4GB
$hddTierSize -= 4GB

# Create the virtual disk
# Note: ResiliencySettingName, WriteCacheSize, and ReadCacheSize are only allowed if all tiers are the same resiliency. If not, omit these parameters. Read Cache is only available with REFS as the file system. There is no Thin provisioning with this configuration.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool01" -FriendlyName "VDisk" -StorageTiers $ssdTier,$hddTier -StorageTierSizes $ssdTierSize,$hddTierSize -ResiliencySettingName Mirror -WriteCacheSize 4GB -ReadCacheSize 8GB

# Now format your disk as usual
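# (Not from the original post - an illustrative way to initialize and format, assuming ReFS
#  and drive letter E; adjust the file system and label to taste)
Get-VirtualDisk -FriendlyName "VDisk" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter E -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "TieredVolume"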


Chris
 
Last edited:

gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
here is an example of how to do this (now).
…
Chris
nice to see code in use here. +1
 
  • Like
Reactions: Morgan Simmons

Chuntzu

Active Member
Jun 30, 2013
383
98
28
here is an example of how to do this (now).
…
Chris
Great piece of info. I was trying to learn more about the new additions to PowerShell and Storage Spaces, but my google-fu is failing me right now; I don't suppose you could share where you found this information?

Wish I didn't have 45ish hours of work over the next 4 days. Definitely going to play around with this on my next day off! Thank you!
 

kathampy

New Member
Oct 25, 2017
17
11
3
CrystalDiskMark seems to ignore the write-back cache in a tiered storage space even for random writes, and thus measures the HDD tier alone. This is the correct approach in most cases where the WBC is RAM, but I was hoping the WBC would be treated as part of the disk in a tiered storage space.

I have a 2x SSD, 6x HDD tiered two-way mirror space with 3 columns on the HDD tier. The random write benchmark measures only 2-3 MB/s, but in actual usage it's easily 100+ MB/s.
 
Last edited:

i386

Well-Known Member
Mar 18, 2016
4,221
1,540
113
34
Germany
In Storage Spaces, tiering means moving hot data manually or automatically to the faster tier; it doesn't happen in real time. (If I remember correctly, with the default settings this process kicks off at around midnight.)
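If you want to see or trigger that job yourself, something like the following should work; the task path below is what recent builds use, so treat the exact names as an assumption and check your own system:

# Scheduled task that performs the tier optimization (path/name may vary by build)
Get-ScheduledTask -TaskPath "\Microsoft\Windows\Storage Tiers Management\" |
    Select-Object TaskName, State

# Run a tier optimization manually on a tiered volume (assumed to be E: here)
Optimize-Volume -DriveLetter E -TierOptimize -Verbose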