Slow mirrored tiered storage spaces


Chuntzu

Active Member
Jun 30, 2013
383
98
28
Please don't take this as a shot at you, as you clearly know plenty about SS. The above are nice benchmarks, but really it screams poor use of money. If you had enough coin for all those NVMe drives and SSDs, which both have the same space limits compared to spinners, then a hardware RAID card would have found its way in there long ago. SS still sucks; it is not suited to production environments, especially where VMs are concerned.

I use it because I have around 50TB hanging off it. It suits me well, and the average files I am moving are around 1-10GB, which move fine, even when it fills the RAM and then falls back to drive write speeds of 60-odd MB/sec (8-drive, double-parity ReFS) for one array; the same style of array with just single parity is 15MB/sec faster. The speeds when these were NTFS were less than 5MB/sec different.

I won't bother with the arguments around how much of a pain in the arse SS is to use when it decides to have a shit in its pants. If I had coins for either a HW RAID card or more drives (need transfer space) then I would go HW or ZFS.
100% no insult taken. My use case was to find the fastest setup possible for the best price. It definitely didn't start with me looking at Windows; it truly started with ZoL and OpenStack, and then progressed extremely slowly for almost a year and a half while I figured out how file systems, storage stacks, and networking all worked together. Then one day Storage Spaces and RDMA SMB 3.0 were brought to my attention, and as it turned out it was easier to use and its networking was much faster. Once I understood how it worked and what parts were the weak links, I could account for them and create a setup in which the bottleneck was the CPU (imagine that: not a RAID card, not RAM, not drives, not network, but the CPU!).

I will say that with parity RAID types you may be right: a RAID card may be faster. But I currently have a 24-disk, 17-column storage space (17 columns is the maximum for a dual-parity space) with 3x 400GB SSDs as journals, and it works great. Not nearly as fast as the mirrored storage spaces arrays, but good enough for the 100TB+ of space I am using, and sequential speeds are extremely fast.
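For anyone who wants to replicate that kind of layout, here's a minimal sketch of the PowerShell involved (the pool name, SSD size filter, and vdisk name are my placeholders, not the exact setup described above):

# Hypothetical names/sizes - adjust to your own pool and disks
# Dedicate the 400GB-class SSDs (assumed to already be in the pool) to the write journal
Get-PhysicalDisk | Where-Object { $_.MediaType -eq "SSD" -and $_.Size -lt 500GB } |
    Set-PhysicalDisk -Usage Journal

# 17-column dual-parity space across the HDDs
New-VirtualDisk -StoragePoolFriendlyName "ArchivePool" -FriendlyName "Archive" `
    -ResiliencySettingName Parity -PhysicalDiskRedundancy 2 `
    -NumberOfColumns 17 -UseMaximumSize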

As for poor use of money, I disagree. Storage Spaces has its drawbacks for sure, but as far as performance goes it scales well beyond what a single RAID controller can handle in certain configurations, and that's where the catch is. Yes, dual-parity and single-parity spaces may be slower than a RAID 6 or 5 on a proper RAID card once Storage Spaces exhausts its write-back cache. But those layouts are not meant for production VM workloads; they're for backups, digital media archives, or most home server tasks (Microsoft expresses this sentiment many times when describing Storage Spaces).

With that said, find me a RAID card that can hit 2 million IOPS in any configuration. I used 3x 9300-16e HBAs with 48x 256GB SSDs, and the total cost was ~$5200. This was a little over a year ago and prices have changed, but I couldn't hit those numbers any cheaper and couldn't touch those performance numbers using RAID cards, let alone at that price.

I have moved over to NVMe drives for fun after selling off some of this equipment and am very impressed with the technology. It actually cost me a lot less to hit these performance numbers using NVMe drives. It was a win-win changing over to NVMe: cheaper and faster.


The fact is I really like ZFS and would prefer to use it (I hope ReFS can rise to that level at some point, but I am using NTFS since ReFS was dramatically slower in Server 2012 R2 when I was benchmarking with all solid-state storage). But the networking stacks for both Linux and BSD are far too slow to utilize the full speed of those file systems over file-based shares. If my use case leaned more on block-based storage than file-based, I could spend the time compiling RDMA network storage tools for Linux and BSD and get the best of both worlds. As it stands right now, Windows has the best of both worlds, as long as I am willing to use mirrored spaces for my production workloads and don't mind that my dual-parity spaces cannot sustain multiple hundreds of thousands of IOPS (though their sequential speeds are those of 17 columns of HDDs).

Along with that, I would prefer not to spend my time compiling and tweaking those RDMA-based tools when I can spend $20 on a ConnectX-2 VPI card, hit 3.2 gigabytes per second per card, and have SMB 3.0 RDMA scale every time I add a card. So I believe it comes down to this: if performance and cost are the top concerns, then Storage Spaces is the best way to go.
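If anyone wants to sanity-check that SMB Direct and multichannel are actually doing the work, the stock cmdlets on Server 2012 R2 and later make it easy to verify (run these on the client while a transfer is active):

# RDMA-capable NICs and whether RDMA is enabled on them
Get-NetAdapterRdma

# Interfaces SMB can see, with their RSS/RDMA capability flags
Get-SmbClientNetworkInterface

# Active SMB connections per interface while a copy is running
Get-SmbMultichannelConnection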
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Chuntzu, Thanks for that. All understood.

Just a straight question, WTF are you putting on that SSD array? Farken good porn server!
Bahahah, no, not that! Truth is, the dirt-cheap data warehouse build from @dba inspired me when I first read it, and I wanted to replicate the experience. It took me close to two years of trial and error to mimic it on a different platform (he was using a quad-socket AMD setup, and as it turns out socket 1366 can't reach those speeds but dual-socket 2011 can), but I was able to achieve it.

I have to be honest: other than VM traffic, it was mainly to experiment with the technology. I don't have a business use case for it at this point, and I would have to re-evaluate the types and quantity of drives if I were to use it in production.

The next step with this technology for me is twofold and involves Storage Spaces Direct. First, how well does Storage Spaces Direct scale using NVMe and HDDs, and in what arrangement are they optimal? Then, figure out how to scale it down to very low-power HA home use: think 4x 8- or 12-disk NAS boxes running Xeon D with 10Gbit NICs for a very low-power hyper-converged cluster, very similar to @Patrick's Proxmox Ceph setup, but hopefully with better performance (@Patrick, I read the numbers and they look really good with the Proxmox setup; just curious whether S2D can do better, and to that extent whether ScaleIO would be better as well). It will scale up and out, very useful for "home server" and beyond workloads.
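For anyone following along, the cluster side of that experiment is only a few cmdlets. This is a rough sketch with made-up node names, assuming a Server 2016 preview build with the Failover Clustering tools installed:

# Hypothetical node names - validate, build the cluster, then let S2D claim the drives
Test-Cluster -Node Node1,Node2,Node3,Node4 -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"
New-Cluster -Name S2DCluster -Node Node1,Node2,Node3,Node4 -NoStorage
Enable-ClusterStorageSpacesDirect   # pools the eligible NVMe/SSD/HDD devices and sets up the cache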
 
  • Like
Reactions: ultradense

Chuntzu

Active Member
Jun 30, 2013
383
98
28
ScaleIO is cool. However, the fault domain has to be at the server level and not at the disk level; that is why it requires a minimum of 3 servers.

Chris
Darn you for putting this on my radar. ScaleIO looks really cool, and its bare-minimum number of servers is one less than S2D's (3 vs. 4 nodes).
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
The difference is that S2D is part of the (future) server OS with no added cost, whereas with ScaleIO you have to pay for the license and support.

Both S2D and ScaleIO use the server as the fault domain. S2D supports RDMA; ScaleIO does not support it yet...

Chris
 
  • Like
Reactions: Chuntzu

gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
You might also be seeing Storage Spaces attempting to optimise the use of the SSDs. As I understand it, SS detects streaming writes and sends the data direct to the HDD tier while the SSD tier and your 1GB write cache/journal is principally used for smaller random writes (e.g. metadata updates, normal random IO).
Whoa, I had no idea SS was this smart. I might have to take a look at it again.
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
The other thing to do is to make the WBC larger on the volume that you are writing to. That helps a lot for home use, as writes are infrequent. For enterprise use you need a lot of spindles to write to.
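As a rough sketch of what that looks like in practice (the WBC size is set when the virtual disk is created; the pool and vdisk names here are placeholders):

# Hypothetical names - the key parameter is -WriteCacheSize, specified at vdisk creation
New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "BulkData" `
    -ResiliencySettingName Mirror -UseMaximumSize -WriteCacheSize 20GB

# Confirm what an existing vdisk ended up with
Get-VirtualDisk -FriendlyName "BulkData" | Select-Object FriendlyName, WriteCacheSize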

Chris
 

Morgan Simmons

Active Member
Feb 18, 2015
134
25
28
44
The other thing to do is to make the WBC larger on the volume that you are writing to. That helps a lot for home use, as writes are infrequent. For enterprise use you need a lot of spindles to write to.

Chris
I made my WBC 20GB, and it didn't really make any difference at all. I personally don't understand why anyone would want sequential writes to go directly to the HDDs when the SSDs are dramatically faster.

Since my last post, I've gone back and just made a 6-column HDD array, and it is performing fine. It's actually way more consistent than having tiering involved.

I'd love to find a way to use the SSDs I have as a cache. At some point, maybe I'll mess around with Btrfs and bcache.
 

DavidRa

Infrastructure Architect
Aug 3, 2015
329
152
43
Central Coast of NSW
www.pdconsec.net
I personally don't understand why anyone would want sequential writes to go directly to the HDDs when the SSDs are dramatically faster.
While a pair of SSDs are indeed faster than a pair of spinning rust drives, the equation shifts dramatically once you start hitting more than a few spindles. At only 100MBps sequential for the rust, 20 of them in RAID 10 can push over 1GBps - which is significantly faster than the 500MBps you'll get on the SSDs. With larger numbers of rust drives, say 120 in a larger deployment, you'd need 6-8 NVMe SSDs (3-4 RAID 1 pairs) or 20+ SAS SSDs (10+ RAID 1 pairs) to achieve the same write speeds. And even then you're using all the IO in the SSDs for writing, and none for random read or random write IO.

It's not quite as simple as "SSDs are faster than HDDs so everything goes there".
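A quick back-of-the-envelope in PowerShell, using the same assumed per-device numbers as above (100 MBps per HDD, 500 MBps per SATA SSD, mirrored so only half the devices contribute write bandwidth):

# Assumed per-device sequential write speeds in MBps - adjust for your hardware
$hddWrite = 100
$ssdWrite = 500

$hddCount = 20
# In RAID 10 every write hits both copies, so usable write bandwidth ~ half the devices
$hddArrayWrite = ($hddCount / 2) * $hddWrite                   # 1000 MBps for 20 HDDs
$ssdPairsNeeded = [math]::Ceiling($hddArrayWrite / $ssdWrite)  # SSD mirror pairs to match it
"{0} HDDs in RAID 10 ~ {1} MBps writes; matching that takes about {2} SSD mirror pairs" -f $hddCount, $hddArrayWrite, $ssdPairsNeeded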
 

Morgan Simmons

Active Member
Feb 18, 2015
134
25
28
44
While a pair of SSDs are indeed faster than a pair of spinning rust drives, the equation shifts dramatically once you start hitting more than a few spindles. At only 100MBps sequential for the rust, 20 of them in RAID 10 can push over 1GBps - which is significantly faster than the 500MBps you'll get on the SSDs. With larger numbers of rust drives, say 120 in a larger deployment, you'd need 6-8 NVMe SSDs (3-4 RAID 1 pairs) or 20+ SAS SSDs (10+ RAID 1 pairs) to achieve the same write speeds. And even then you're using all the IO in the SSDs for writing, and none for random read or random write IO.

It's not quite as simple as "SSDs are faster than HDDs so everything goes there".

That makes complete sense; I'm not thinking on a big enough scale. But you would think that Windows would be able to detect the performance of both tiers and direct traffic to the appropriate one.
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
A new storage feature showed up yesterday in the Win10 update: Multi-Resiliency Tiered Spaces.

This is something that you may want to try.

Now the column count does not have to match between SSD and spinning disks.

I do not know if this changes if you use parity, but now you can have 2 SSDs (3 if you want a triple-mirror config) in a single column, and then have an 8-column mirror on 16 disks.

I do not know if this is in TP3; still waiting on Server 2016 to ship...

Chris
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
here is an example of how to do this (now).

# get the disks to create the pool over
$disks = Get-PhysicalDisk | where {$_.CanPool}

# Create the storage pool
New-StoragePool -PhysicalDisks $disks -StorageSubSystemFriendlyName "Windows Storage*" -FriendlyName "StoragePool01"

# Create a one-column, mirrored SSD tier
$ssdTier = New-StorageTier -StoragePoolFriendlyName "StoragePool01" -MediaType SSD -FriendlyName "SSDTier" -NumberOfColumns 1 -ResiliencySettingName Mirror

# Create a four-column, mirrored HDD tier
$hddTier = New-StorageTier -StoragePoolFriendlyName "StoragePool01" -MediaType HDD -FriendlyName "HDDTier" -NumberOfColumns 4 -ResiliencySettingName Mirror

# Figure out how big to make the tiers
$ssdTierSize = Get-StorageTierSupportedSize "SSDTier" | select -ExpandProperty TierSizeMax
$hddTierSize = Get-StorageTierSupportedSize "HDDTier" | select -ExpandProperty TierSizeMax

# Subtract a few gigs for padding and whatnot just in case, we can use Resize-StorageTier later to fill up any leftover space if there is any
$ssdTierSize -= 4GB
$hddTierSize -= 4GB

# Create the virtual disk
# Note: ResiliencySettingName, WriteCacheSize, and ReadCacheSize are only allowed if all tiers are the same resiliency. If not, omit these parameters. Read Cache is only available with REFS as the file system. There is no Thin provisioning with this configuration.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool01" -FriendlyName "VDisk" -StorageTiers $ssdTier,$hddTier -StorageTierSizes $ssdTierSize,$hddTierSize -ResiliencySettingName Mirror -WriteCacheSize 4GB -ReadCacheSize 8GB

# Now format your disk as usual
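# (Not from the original post - an illustrative way to initialize and format, assuming ReFS
#  and drive letter E; adjust the file system and label to taste)
Get-VirtualDisk -FriendlyName "VDisk" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter E -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "TieredVolume"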


Chris
 
Last edited:

gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
here is an example of how to do this (now).
…
Chris
nice to see code in use here. +1
 
  • Like
Reactions: Morgan Simmons

Chuntzu

Active Member
Jun 30, 2013
383
98
28
here is an example of how to do this (now).
…
Chris
Great piece of info. I was trying to learn more about the new additions to PowerShell and Storage Spaces, but my google-fu is failing me right now; I don't suppose you could share where you found this information?

Wish I didn't have 45ish hours of work over the next 4 days. Definitely going to play around with this on my next day off! Thank you!
 

kathampy

New Member
Oct 25, 2017
17
11
3
CrystalDiskMark seems to ignore the write-back cache in a tiered storage space even for random writes, and thus measures the HDD tier alone. This is the correct approach in most cases where the WBC is RAM, but I was hoping the WBC would be treated as part of the disk in a tiered storage space.

I have a 2x SSD, 6x HDD tiered two-way mirror space with 3 columns on the HDD tier. The random write benchmark measures only 2-3 MB/s, but in actual usage it's easily 100+ MB/s.
 
Last edited:

i386

Well-Known Member
Mar 18, 2016
4,221
1,540
113
34
Germany
In Storage Spaces, tiering means moving hot data manually or automatically to the faster tier; it doesn't happen in real time. (If I remember correctly, with the default settings this process kicks off at around midnight.)
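If you want to see or trigger that job yourself, something like the following should work; the task path below is what recent builds use, so treat the exact names as an assumption and check your own system:

# Scheduled task that performs the tier optimization (path/name may vary by build)
Get-ScheduledTask -TaskPath "\Microsoft\Windows\Storage Tiers Management\" |
    Select-Object TaskName, State

# Run a tier optimization manually on a tiered volume (assumed to be E: here)
Optimize-Volume -DriveLetter E -TierOptimize -Verbose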