HA SAN build recommendations


legen

Active Member
Mar 6, 2013
213
39
28
Sweden
Hello everyone!

I am looking for advice regarding a new SAN build. Currently we have a custom Supermicro build running OmniOS with 8x M500 SSDs in RAID-Z2 and 32GB of ECC RAM.
The SAN offers storage over NFS on 1 Gbit Ethernet to 10 XenServer machines running ~20 VMs.

We now have a dedicated budget of about $15,000 to build a better SAN infrastructure for our current and future needs.

I am still trying to estimate our current and future requirements, but something like this:
  • 800 MB/s sequential write
  • 1000 MB/s sequential read
  • 24 TB usable storage
  • ?? 4K random read
  • ?? 4K random write
  • ?? IOPS
  • HA: surviving a minimum of one failure should be OK
I wonder what STH users would recommend. I see a few alternatives:
  1. Commercial solutions (NetApp, others?)
  2. OmniOS with RSF-1, or NexentaStor
  3. Windows with Storage Spaces
  4. Linux with Ceph/GlusterFS
Which of the above would you recommend I look into first, with regard to:
  • Support
  • The possibility of bringing in remote help for performance issues (consultancy hours?)
  • Help with understanding our current workload so we know our complete requirements
  • Preferably EU based (we are located in Sweden)

This will probably be the hardest task I have taken on (yet), so any input on this is very welcome :)!
 

gea

Well-Known Member
Dec 31, 2010
3,175
1,198
113
DE
Your primary goals are performance and availability.

For your budget of $15k you will not get both from NetApp or Nexenta.
Even with a free OS like OmniOS plus RSF-1, you can afford the HA software but not fast hardware.

So you will have to compromise on either performance or availability (no high availability in an active/active setup).
You must also consider the complexity of a real HA solution. I would not do it without in-house knowledge
AND a support contract.

If you can tolerate, say, up to half an hour or more of service interruption, you may look at solutions
with redundancy and/or a fast manual switchover of storage or services on performant hardware.

For example, as an extension to your current setup:
- two 10G storage servers, either with iSCSI to provide a mirror for your XenServers,
and/or with short-interval ZFS replication between them and a remote backup
- use your current system as a remote backup system (ideally in another building)

You may need:
- about $1,500 for one 10G switch, e.g. the Netgear XS712 (use another 1G switch for redundant cabling),
or something better like an HP 5820 (HP Renew units are quite affordable)
- about $3,600 for two 10G servers, e.g. the
Supermicro SC216BE16-R920LPB 2U chassis
with a board like the Supermicro X9SRH-7TF

Add 10G adapters to your XenServers:
10 x $350 (e.g. Intel X540 or X520) = $3,500

This leaves about $6-7k for disks (rough tally below).
I would build two pools, one pure SSD for high performance and one from spindles for the rest.
Prefer enterprise SSDs like the Intel S3500..S3710 (up to 1.6 TB per SSD), or reuse your current SSDs plus HGST/Toshiba disks.
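
As a rough tally of the numbers above (ballpark prices, not quotes):

```python
# Rough tally of the suggested budget split (ballpark prices from the list above).
budget = 15_000  # total budget in USD

items = {
    "10G switch (e.g. Netgear XS712)": 1_500,
    "2x 10G storage servers (chassis + X9SRH-7TF board)": 3_600,
    "10x 10G adapters for the XenServers (~$350 each)": 10 * 350,
}

spent = sum(items.values())
for name, cost in items.items():
    print(f"{name}: ${cost:,}")
print(f"Subtotal: ${spent:,}")
print(f"Left for disks: ${budget - spent:,}")  # ~$6,400, i.e. the '$6-7k for disks' above
```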
 

markpower28

Active Member
Apr 9, 2013
413
104
43
For disks, just make sure to get SAS HDDs/SSDs, otherwise HA will not work.

Have you considered building another SAN and then letting the application/OS replicate? Exchange DAG, SQL AlwaysOn, DFS: that way you have the data in two different places.
 

voodooFX

Active Member
Jan 26, 2014
247
52
28
When you say you would like 1 GB/s / 0.8 GB/s (R/W), do you mean over IP?
If so, you will need 10G connectivity (and a matching 10G switch), or a lot of 1G interfaces in active/active multipath.
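
As a rough sanity check on the link math (line-rate arithmetic with an assumed ~92% protocol efficiency):

```python
# Approximate usable throughput per link, assuming ~92% of line rate after protocol overhead.
def usable_mb_per_s(link_gbit, efficiency=0.92):
    return link_gbit * 1000 / 8 * efficiency  # Gbit/s -> MB/s

target_read = 1000  # MB/s, from the requirements in the first post
print(f"1 GbE : ~{usable_mb_per_s(1):.0f} MB/s per link")   # ~115 MB/s
print(f"10 GbE: ~{usable_mb_per_s(10):.0f} MB/s per link")  # ~1150 MB/s, roughly covers 1000 MB/s
print(f"1 GbE links needed for {target_read} MB/s: {target_read / usable_mb_per_s(1):.1f}")
```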

That said, if you want HA with some sort of support, I think you could consider 2x high-end Synology RackStations.

One idea could be

2x Synology RS3614xs+
4x Intel SSD DC S3500 Series 480GB
20x WD RE 3TB
2x 10G network adapters (Intel X520?)

This setup will give you:

- HA ( RS3614xs+ - Products - Synology - Network Attached Storage (NAS) )
- 480 GB of read/write cache
- ~21.6 TiB (24 TB) of usable space in an 8+2 (or 2x 4+1) configuration (rough math below)
- 5 years of warranty on the drives, SSDs and servers
- your performance targets (more or less)
- all the ease of use of the Synology DSM

and based on prices here in Switzerland it should fit exactly within your budget (I don't know the prices in Sweden, but they should not be more expensive than in Switzerland, probably very close :D )
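
Rough math behind the capacity figure above (assuming 10 of the 20 WD RE drives go into each RackStation, with the second unit acting as the HA mirror):

```python
# Usable capacity of one RS3614xs+ loaded with 10x 3 TB drives (the HA partner mirrors it).
drive_tb = 3  # WD RE 3000GB, decimal terabytes

def usable(data_drives):
    tb = data_drives * drive_tb
    return tb, tb * 1e12 / 2**40  # decimal TB and binary TiB

for label, data_drives in [("8+2 (RAID 6)", 8), ("2x 4+1 (RAID 5)", 2 * 4)]:
    tb, tib = usable(data_drives)
    print(f"{label}: {tb} TB usable ≈ {tib:.1f} TiB")  # ~21.8 TiB before filesystem overhead
```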

The other option is a DIY solution, but you will need a lot of skill and time, and skill, and time... and.. a lot of them to get something stable (in my opinion)
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Clustered Storage Spaces/SOFS and SAS drives (too bad Storage Spaces Direct is still so early in development). It works best with SMB Direct, but can also be shared via iSCSI and NFS. Relatively speaking, I have hit all your performance numbers using L5520s, a C6100, a 9202-16e, and dual-port ConnectX-2 NICs, and this is a pretty old setup. The truth is that upgrading to even a single-processor E5-16xx/26xx setup with PCIe 3.0 removed a ton of bottlenecks, and then the sky is the limit. It was a long road figuring out all the little tricks, but I have to say that once I figured them out it works very well.

For example:
1. Make your storage space with the correct number of columns, i.e. the number of disks writes are striped across (see the sketch after this list). For example, a mirrored pool would need 16 disks to have 8 columns, so write performance would be that of 8 drives striped: 850-1000 MB/s from HDDs and ~4000 MB/s from SSDs. Columns x redundancy = number of disks. The nice thing with reads is that they are serviced from all 16 disks, so reads are typically roughly twice as fast as writes in a mirrored setup.

2. If possible, use tiering and write-back cache. If you can pull it off, this 8-column mirror would then require 16 HDDs and 16 SSDs. You can also use 8 or 4 SSDs instead, but that decreases performance because the stripe/column count of the SSD tier drops. On top of tiering, you can also dedicate a portion of the SSDs (up to 100 GB per virtual disk created from the storage pool, though Microsoft recommends up to 10 GB) as write-back cache.

3. I have found NTFS to be faster than ReFS with all-SSD pools, so I use that instead of ReFS. Also, online file system checks (chkdsk) in Server 2012 R2 narrow the extra benefits of a CoW file system like ReFS.

There are better-written and more detailed guides out there for setting up and optimizing both the drive layout and the back-end networking (which is a whole other can of worms), but this should get you started.
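
To make the column arithmetic in point 1 concrete, here is a rough model (a sketch with assumed per-disk throughput figures, not a benchmark of any particular hardware):

```python
# Mirrored storage space: columns x copies = disks needed; writes stripe across 'columns'
# disks, while reads can be serviced from every copy, so reads land near 2x the write rate.
def mirrored_space(columns, per_disk_mb_s, copies=2):
    disks = columns * copies
    write = columns * per_disk_mb_s
    read = disks * per_disk_mb_s  # optimistic upper bound: reads hit every disk
    return disks, write, read

for media, per_disk in [("HDD", 120), ("SSD", 500)]:  # assumed sequential MB/s per disk
    disks, w, r = mirrored_space(columns=8, per_disk_mb_s=per_disk)
    print(f"{media}: {disks} disks, ~{w} MB/s write, up to ~{r} MB/s read")
```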
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Just as an added note, the SOFS role is meant for hypervisor/SQL workloads, not for regular file sharing. By that I mean that plain file shares (e.g. for media files) would use a different kind of HA file share, one that does not specifically need to be continuously available. Continuous availability means absolutely zero interruption, i.e. writes from the workloads on the share (VMs and database workloads) must be acknowledged durably right away, so regular file-share performance is diminished on an SOFS share, while Hyper-V VMs, or VMs in general, which require their writes to be acked that way, perform really, really well. I will post a link that describes this in better detail in a moment, i.e. once I get home from work.


Long story short, your drive setup won't really change either way, but the role you deploy in Server 2012 R2 will be different depending on your workload.
 

dswartz

Active Member
Jul 14, 2011
610
79
28
Note that the Linux solution can still be ZFS. If you go with SAS disks and two HBAs to talk to them, you can do a pacemaker/corosync active/passive approach, where failover consists of: disable the virtual NFS IP, export the pool, import the pool (on the other host), enable the virtual NFS IP. I have actually done this (not doing it right now, for other reasons...). It worked great, with no ceph/gluster/drbd replication overhead.
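
For illustration, a minimal sketch of that failover sequence as a hand-run script (the pool name 'tank', the VIP 192.168.1.50/24 and the interface eth0 are placeholders; in practice pacemaker/corosync resource agents, plus proper fencing, drive these steps rather than a script like this):

```python
#!/usr/bin/env python3
# Sketch of the manual failover steps: drop the NFS VIP, export the pool on the old head,
# import it on the new head, bring the VIP back up. Placeholder names throughout.
import subprocess
import sys

POOL = "tank"
VIP = "192.168.1.50/24"
IFACE = "eth0"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def release():
    """Run on the node giving up the storage (if it is still alive)."""
    run(["ip", "addr", "del", VIP, "dev", IFACE])
    run(["zpool", "export", POOL])

def takeover():
    """Run on the node taking over the storage."""
    run(["zpool", "import", "-f", POOL])
    run(["ip", "addr", "add", VIP, "dev", IFACE])

if __name__ == "__main__":
    # Usage: failover.py release | failover.py takeover
    {"release": release, "takeover": takeover}[sys.argv[1]]()
```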
 

legen

Active Member
Mar 6, 2013
213
39
28
Sweden
First, thanks to all of you for the input on this; greatly appreciated! I will try to answer each of you as best I can.


First, a clarification: I was thinking of an active/passive HA setup with shared storage between the two SAN servers. Does this change things cost- and complexity-wise?


Nice blog. There is a lot of good information on there (I have yet to read the whole thing, but I will!)

gea said: "Your primary goals are performance and availability. For your budget of $15k you will not get both from NetApp or Nexenta. [...]"

Hi gea. I hoped you would respond in this thread since I know you are very experienced with zfs setups.


I was guessing that NetApp was out of our budget. I might contact a reseller just to check what 15k $ would give us going that route.

Is NexentaStor really that expensive? I did not know that :(. I see that you have banners for zstor.de on your webpage. Would you recommend them?

I would like to contact a NexentaStor reseller to get an estimate of how much the performance + HA would cost when buying NexentaStor + hardware.


gea said: "So you will have to compromise on either performance or availability. [...] I would not do it without in-house knowledge AND a support contract."

I agree that I currently do not have 100% of the knowledge needed for HA setups, so we will require a support contract with whatever solution we choose in the end. With the right support and documentation I am confident I can keep it up and running :)

gea said: "If you can tolerate, say, up to half an hour or more of service interruption, you may look at solutions with redundancy and/or a fast manual switchover of storage or services on performant hardware. [...] This leaves about $6-7k for disks. [...]"

I would love an active/passive system so we could switch over almost instantly if the active system went offline or if we need to take down one node for maintenance.


30 minutes of downtime might be OK, but if another route can avoid that with an active/passive setup, that would be preferable.


In your scenario above you have one set of disks in each server, right? Why not use SAS disks instead? Then the two servers could share the storage, avoiding the need for replication and for buying two identical sets of disks.


My first idea for this build was to keep the current SAN and buy another one, then buy an external disk enclosure, RSF-1, some RAID cards and disks, and connect the two SAN boxes to the external disk enclosure in an active/passive setup.


What I started fearing with this was,

- Lack of support

- Complexity


Would you advise against this :)?



markpower28 said: "For disks, just make sure to get SAS HDDs/SSDs, otherwise HA will not work. Have you considered building another SAN and then letting the application/OS replicate? Exchange DAG, SQL AlwaysOn, DFS: that way you have the data in two different places."

In an active/passive setup with an external disk enclosure shared by two SAN servers I would need SAS disks.


My current estimates show that SAS disks would be more cost effective than using one set of SATA disks in each server.


Too bad SAS SSDs are so nastily expensive. We would probably have to use ordinary SAS spindles because of that.



I'm not sure replication is the way to go. I think it might be simpler and cheaper to go the SAS route with one shared storage enclosure? (rough comparison below)
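
For what it's worth, here is a toy version of that comparison (every price below is a made-up placeholder, just to show the shape of the trade-off):

```python
# Toy cost comparison: shared SAS enclosure (one set of disks) vs. replicated SATA (two sets).
# All prices are made-up placeholders for illustration, not real estimates or quotes.
n_drives = 12          # drives needed for one copy of the pool
sata_price = 250       # assumed price per SATA drive
sas_premium = 1.3      # assumed SAS price as a multiple of the SATA price
jbod_price = 1_500     # assumed dual-expander SAS JBOD enclosure

shared_sas = n_drives * sata_price * sas_premium + jbod_price
replicated_sata = 2 * n_drives * sata_price  # two full disk sets, one per server

print(f"Shared SAS enclosure : ${shared_sas:,.0f}")
print(f"Replicated SATA sets : ${replicated_sata:,.0f}")
# With these placeholders the shared-SAS route wins as long as the SAS premium plus the
# enclosure cost stays below the price of a second full set of SATA drives.
```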


voodooFX said: "When you say you would like 1 GB/s / 0.8 GB/s (R/W), do you mean over IP? [...] That said, if you want HA with some sort of support, I think you could consider 2x high-end Synology RackStations. [...]"

Yes, we want to access the storage via IP. We will invest in new network infrastructure too (10GbE, or IPoIB with InfiniBand), but that will come from another budget and I have not yet started reading up on what to buy there.



The Synology alternative does indeed look interesting. They are Citrix Ready, too!


I checked the specifications of those machines. I would say one gets more bang for the buck with a DIY build, BUT if their HA solution works and it keeps me from having to learn/understand/fix everything myself, that would be awesome.


I have never tried their DSM environment before, but I checked the live demo on their website. It does contain a lot of functionality. However, currently we don't need much more than an NFS/iSCSI share and some performance monitoring. It's quite a change going from our current terminal-based OmniOS to that fancy GUI :)


Are you using Synology storage? Some questions if you are:

1. Do they support an active/passive setup so we can avoid buying two identical sets of disks (i.e. with their expansion units)?
2. Can they really deliver that speed for sync writes? Our ZFS alternative will probably require a ZeusRAM for that.
3. Is the support any good (i.e. for technical questions and for finding our bottlenecks)?
 

legen

Active Member
Mar 6, 2013
213
39
28
Sweden
dswartz said: "Note that the Linux solution can still be ZFS. If you go with SAS disks and two HBAs, you can do a pacemaker/corosync active/passive approach [...]"
Hi!

I do not doubt that this is possible or that it would give good results. The problem with all these setups is the complexity involved. I would love something simpler and supported, to avoid future problems with performance, compatibility and support :)

Chuntzu said: "Clustered Storage Spaces/SOFS and SAS drives (too bad Storage Spaces Direct is still so early in development). [...] this should get you started."
Thanks for the suggestion regarding Storage Spaces. I honestly don't know much about it yet. I think it would work better with a Hyper-V environment (SMB 3.0 and RDMA), but sadly I cannot use either of those with our XenServers :(

I would think this build would match a DIY OmniOS + RSF-1 + Supermicro build both performance- and cost-wise? I would guess the downsides are the complexity, the knowledge required to maintain such a solution, and the know-how needed to find performance bottlenecks.

Some questions:

1. Would you say it is production ready?
2. Is support available?

I will check your links to get a better understanding of how it works!
 

PigLover

Moderator
Jan 26, 2011
3,186
1,546
113
Have you looked into using a fully replicated filesystem? Perhaps GlusterFS over ZFS? Should be ok for serving VMs to Xen.

I just don't know if you'd hit your write performance objective with synchronous replication. Read performance should actually improve a bit
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
legen said: "Thanks for the suggestion regarding Storage Spaces. I honestly don't know much about it yet. [...] 1. Would you say it is production ready? 2. Is support available?"
Complexity may be there, but Cluster Manager and the GUI work great, and PowerShell is super easy as well, maybe even easier than the GUI for setting up SOFS and clustered Storage Spaces. In regards to performance, I will not knock ZFS, since that is where my journey towards crazy-fast redundant storage started, but I have found Server 2012 R2 and Storage Spaces to be faster and easier to set up and manage. Shared iSCSI (non-RDMA) from this SOFS setup would be very fast (though it would use more CPU cycles on the storage nodes than RDMA) and would work perfectly with your Xen hypervisors. Heck, use 40Gb NICs if you are so inclined. The information on proper configuration is out there and not hard to find. If you are like me, a couple of days of searching and reading is all you will need to really understand how to set up and configure all of this. You do have access to customer support, but there are other people here who may know the Microsoft support side better than me; I didn't find it necessary. Good luck with the setup!
 

legen

Active Member
Mar 6, 2013
213
39
28
Sweden
PigLover said: "Have you looked into using a fully replicated filesystem? Perhaps GlusterFS over ZFS? Should be OK for serving VMs to Xen. I just don't know if you'd hit your write performance objective with synchronous replication. Read performance should actually improve a bit."
I have looked a bit at GlusterFS. My understanding is that I would have to use ZoL in that case. Performance-wise I "think" replication or active/passive might be better here.

But I will do some more reading about it!
 

legen

Active Member
Mar 6, 2013
213
39
28
Sweden
Chuntzu said: "Complexity may be there, but Cluster Manager and the GUI work great, and PowerShell is super easy as well [...] Good luck with the setup!"
Thanks for the info. I will do some reading about Storage Spaces, and when I know more about it I might come back with some questions. So far it does look like quite a good alternative, I have to say (even though I'm a little allergic to Microsoft and would prefer to base this on open software :))

Edit: I guess InfiniBand would be easier to get going in this setup too!
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
I was in the same boat as you, open-source storage and compute, but after a lot of trial and error I couldn't get the performance I wanted from the tools available at the time. Then RDMA was integrated into Windows with Server 2012. I gave it a shot, and within 45 minutes I was configured and hitting performance numbers I hadn't been even remotely close to after over a year of dicking around with OpenStack, Ceph, ZFS, ZFS on Linux, iSCSI, etc. It just worked! My network and storage performance was fantastic and I just had to plug it in and go. It is also nice that the only licenses I had to acquire were for the storage nodes; all the hypervisor/compute nodes were free-ish, since Hyper-V Server is free to use. Though I did spend the next 6-8 months trying to figure out why I could not break through the 4-5 GB/s, 500,000 4K IOPS ceiling I was hitting. It turned out to be a limitation of the socket 1366 systems I was using, so boom, problem solved by moving to socket 2011 platforms.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,519
5,828
113
legen said: "I was guessing that NetApp was out of our budget. I might contact a reseller just to check what $15k would give us going that route."
A few years ago I knew NetApp pricing like the back of my hand (I built their deal management system). Even at discounted pricing, you are probably well below what they would sell at.
 

legen

Active Member
Mar 6, 2013
213
39
28
Sweden
Patrick said: "A few years ago I knew NetApp pricing like the back of my hand (I built their deal management system). Even at discounted pricing, you are probably well below what they would sell at."
I suspected that. I guess it would have to be closer to $500k to get that kind of performance and HA from NetApp?

When I have a clear understanding of our requirements (IOPS, 4K Q32, etc.) I will check what they can offer and how much it would cost to get what we want. It will take some time, but I will post back when I have some figures :)