I need a SAN...or vSAN..or Shared Storage...or....WTF is this so complex?

Discussion in 'DIY Server and Workstation Builds' started by kapone, Aug 6, 2018.

  1. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    The title is a bit misleading; it's not that I don't know the differences, but I'm at a point where I need additional thoughts, and I needed to get your attention... :)

    Anyway, I'm in the planning stages of a production (for my own, not for a client) build, where I need "storage". My criteria (in no particular order) are:
    - Highly available. Must survive _______ (Fill in the blank)
    - Fast. Not blazingly fast (like @Patrick does with his Optane and what not), but enough to host database(s) among other things.
    - Redundant (goes with #1, HA).

    I have enough hardware (chassis, motherboards, CPUs, memory, etc.) to put something(s) together, but...the million $$$ question in the end is...actually I'm not even sure. Hear me out.

    The primary purpose of this "storage" is to host VMs, database(s) (the OLTP kind), and digital assets (files). Simple enough? My baseline is to saturate at least a 40Gb connection, if not more. The main "storage" will be all flash/SSD, with a second tier of spinners, and a third tier of offsite backup/disaster recovery. The "compute" side of things will probably end up being 6-10 eight-core servers, each accessing this storage over 10Gbps.
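    To put a number on "saturate a 40Gb connection": the IOPS that implies depends entirely on block size. A quick napkin-math sketch (Python, ignoring protocol and replication overhead):

    # Rough IOPS needed to fill a 40Gb link at various block sizes.
    # Napkin math only: ignores protocol, replication and mirroring overhead.
    LINK_GBIT = 40
    link_bytes_per_sec = LINK_GBIT / 8 * 1e9   # ~5 GB/s

    for block_kib in (4, 8, 64):
        iops = link_bytes_per_sec / (block_kib * 1024)
        print(f"{block_kib:>2} KiB blocks: ~{iops / 1000:,.0f}k IOPS")

    At OLTP-style 4-8 KiB blocks, filling 40Gb works out to roughly 0.6-1.2 million IOPS; at larger sequential block sizes (files, backups) it is far fewer.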

    I can go out and buy a SAN, but...scalability is a bit of an issue with SANs, and so are performance, redundancy, and cost.

    After countless hours of analysis/paralysis, I'm leaning towards StarWind vSAN using Hyper-V as the hypervisor. If VMware vSAN licensing were not so expensive, or if Storage Spaces Direct did not require a Datacenter license, I'd have considered both of those as well.

    I'm not wedded to any particular virtualization environment per se, and can easily pick whatever makes sense.

    What would you do?
     
    #1
  2. Marsh

    Marsh Moderator

    Joined:
    May 12, 2013
    Messages:
    1,829
    Likes Received:
    822
    You have my attention. Please fill in the blank and define "highly available".
    The cost difference between "three nines" and "four nines" is not trivial.
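    To make that concrete, a quick sketch of the downtime budget each level of "nines" allows per year:

    # Downtime budget per year for each level of "nines".
    HOURS_PER_YEAR = 24 * 365

    for label, availability in (("three nines", 0.999),
                                ("four nines", 0.9999),
                                ("five nines", 0.99999)):
        minutes_down = HOURS_PER_YEAR * (1 - availability) * 60
        print(f"{label:<12} ({availability:.3%}): ~{minutes_down:,.0f} min/year")

    Roughly 8.8 hours a year at three nines, under an hour at four nines, and about five minutes at five nines.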
     
    #2
  3. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,232
    Likes Received:
    307
    What kind of DB? If MS SQL, I would use Always On availability groups and have no shared storage for that; then put the rest on shared storage using whatever (Starwind, for example), which can be a bit lower performance.

    Hyper-V plus Starwind is a solid option.
    Also, in Windows Server 2019, Storage Replica will be included in Standard edition up to 2TB of capacity; a bit less HA than S2D, but it may work.
     
    #3
  4. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,392
    Likes Received:
    318
    40GbE for an OLTP database? That's >500k IOPS AND high availability???
     
    #4
  5. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    Here's the thing. If we design it correctly, xx.yy% uptime shouldn't matter. You should be able to bring down a "node" (or whatever) and do what you need to do. At the block level, I'd say a two-disk failure should be survivable.
     
    #5
  6. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    Chances are, it'll be PostgreSQL, not SQL Server, but the principles should be the same. When it comes to the database, redundancy is why I'm separating "compute" from "storage". Microsoft's SQL licensing kills you if you go to a hyper-converged environment. I'd much rather keep it separate.
     
    #6
  7. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    Yes sir.
     
    #7
  8. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,392
    Likes Received:
    318
    ~70k 4K random write IOPS per HGST SSD.
    Per host: 7 striped SSDs for ~500k IOPS, so 14 SSDs in a mirrored config.
    At least 3 hosts, so 42+ HGST SSDs just for the cache layer :D

    I'm not sure if I missed something...
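    Spelling that napkin math out (assuming ~70k 4K random-write IOPS per SSD, 2-way mirroring, and a 3-host minimum; strict rounding up lands a little above the 7/14/42 figures above):

    import math

    # Napkin math for the SSD count in the cache/write tier.
    # Assumptions: ~70k 4K random-write IOPS per SSD, 2-way mirror
    # (halves usable write IOPS), at least 3 hosts, ~500k IOPS target.
    TARGET_IOPS = 500_000
    IOPS_PER_SSD = 70_000
    MIRROR_FACTOR = 2
    MIN_HOSTS = 3

    striped_per_host = math.ceil(TARGET_IOPS / IOPS_PER_SSD)   # 8 (7 if you round down)
    mirrored_per_host = striped_per_host * MIRROR_FACTOR       # 14-16
    cluster_total = mirrored_per_host * MIN_HOSTS              # 42+
    print(striped_per_host, mirrored_per_host, cluster_total)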
     
    #8
  9. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    Nope, you didn't. And that was close to my back-of-the-napkin calculation as well. The question is: which platform will provide better resiliency/availability and, most importantly, scalability (in the future)? Getting raw speed out of storage is not that difficult; turning it into a production setup is much harder.

    I've even looked at pre-built systems from Dell, HP, Nutanix, etc., but my personal belief is that they are robbing you blind. I can have my team build something for far less.
     
    #9
  10. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,392
    Likes Received:
    318
    List prices are always higher :D
    But considering your requirements, I wouldn't expect a "cheap" solution.
    Are you just looking at the hardware price or also costs like keeping enough spare parts in stock?
     
    #10
  11. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    Keeping spares as well. My typical strategy over the years has been to keep a 20% overflow buffer for hardware. For every 5 pieces, I keep one spare.

    I don't need absolute cutting-edge hardware, and these requirements, while not bottom of the barrel, are not exactly rocket science. A good, stable motherboard platform (even if it is a gen or two old) will work. Where the latest comes in is probably HBAs, SSDs, and disks. Even the networking requirement of 40Gb is old stuff by now.

    I've got a stack of Brocade 6610s and Arista 7050s that I can use.
     
    #11
  12. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,232
    Likes Received:
    307
    Yes, running SQL separately is usually cheaper. Sorry, I don't know anything about high-availability PostgreSQL or whether anything is possible with a shared-nothing cluster.
     
    #12
  13. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    1,859
    Likes Received:
    613
    If you weigh performance, data security, ease of use, availability, and price, you cannot optimize them all together; you have to find the optimum for your use case.

    In my setups I mostly put data security first. This means a crash during a write must not affect filesystem or RAID consistency, and you must be able to trust all data that you read. This demands a new-generation filesystem with copy-on-write and data + metadata checksums: mainly ZFS, though btrfs or ReFS are other candidates with a reduced feature set.

    Performance and data security are hard to achieve at the same time. This is why Sun put so much of its ZFS development effort into advanced caches and secure but fast sync writes via SLOG logging on dedicated log devices, nowadays something like Intel Optane.

    Ease of use is also a plus for ZFS. No other system offers similar features while being easier to maintain. This is the keep-it-simple aspect of increasing availability.

    Availability can mean either that your service is back up within some acceptable time, or that you can survive a disk, controller, server, service, or whole-storage failure in near real time with an automated failover. Going from disk failover up to whole-storage failover, complexity and price increase massively. If you can allow 30 minutes of service downtime, your system has maybe 20% of the cost and complexity of one that must survive a JBOD failure automatically. If you can allow a short service interruption, look at ZFS plus replication to a backup/failover system; you can keep them in sync down to a minute of delay even on a high-load petabyte system.

    Price scales with complexity and with whether you can use open source.
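    For the "a short service interruption is acceptable" option, the replication loop itself is conceptually simple. A minimal sketch (Python driving zfs snapshot/send/receive over SSH; the dataset name, standby host, and interval are placeholders, and real tools such as znapzend or syncoid add retention and error handling on top of this):

    import subprocess
    import time

    # Minimal incremental ZFS replication loop: snapshot the source dataset
    # and send the delta to a standby box over SSH roughly once a minute.
    # DATASET, TARGET_HOST and the snapshot naming scheme are placeholders.
    DATASET = "tank/vmstore"
    TARGET_HOST = "standby-host"
    INTERVAL_SECONDS = 60

    prev_snap = None
    while True:
        snap = f"{DATASET}@repl-{int(time.time())}"
        subprocess.run(["zfs", "snapshot", snap], check=True)

        # Full send on the first pass, incremental (-i) afterwards.
        send_cmd = (["zfs", "send", snap] if prev_snap is None
                    else ["zfs", "send", "-i", prev_snap, snap])
        recv_cmd = ["ssh", TARGET_HOST, "zfs", "receive", "-F", DATASET]

        sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
        subprocess.run(recv_cmd, stdin=sender.stdout, check=True)
        sender.wait()

        prev_snap = snap
        time.sleep(INTERVAL_SECONDS)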
     
    #13
  14. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    @gea - Sorry, this is a production system and must be continuously available. If that means the cost goes up, so be it.
     
    #14
  15. Evan

    Evan Well-Known Member

    Joined:
    Jan 6, 2016
    Messages:
    2,232
    Likes Received:
    307
    @gea makes the point correctly.
    It's probably 1/5th the cost for slightly less HA.
    Are you sure you really need 100k IOPS? That's a super serious disk array you're looking for. At that level, unless you're building your own GPFS cluster for a general HPC workload, it's probably easier and more reliable to just go buy an all-flash VPLEX from Dell EMC at the lower end or an A9000R from IBM at the higher end... (just examples, there are others of course)
    (On that topic, the A9000R is an example of the kind of HA system that can be built. Each disk shelf has 2 controllers, an InfiniBand backend, and the various FC and network ports out the front.)
     
    #15
  16. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,392
    Likes Received:
    318
    #16
  17. Markus

    Markus Member

    Joined:
    Oct 25, 2015
    Messages:
    76
    Likes Received:
    19
    A distributed storage system in combination with a capable hypervisor is probably suitable for this case.

    Gluster + X
    Ceph + X

    Regarding PostgreSQL, you probably want to look into replication with Pacemaker or the Patroni project (from Zalando).
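    Whatever ends up doing the failover (Patroni, Pacemaker, or something hand-rolled), the underlying question is always "who is the primary and how far behind are the standbys". A small sketch of checking that directly (Python with psycopg2; hostnames and credentials are placeholders, PostgreSQL 10+ assumed):

    import psycopg2

    # Ask each node whether it is the primary; on the primary, report how far
    # each standby lags. Hostnames and credentials below are placeholders.
    NODES = ["pg-node1", "pg-node2", "pg-node3"]

    for host in NODES:
        try:
            conn = psycopg2.connect(host=host, dbname="postgres",
                                    user="monitor", connect_timeout=3)
            with conn, conn.cursor() as cur:
                cur.execute("SELECT pg_is_in_recovery()")
                role = "standby" if cur.fetchone()[0] else "primary"
                print(f"{host}: {role}")
                if role == "primary":
                    # Byte lag of each connected standby (PostgreSQL 10+ names).
                    cur.execute("""SELECT application_name,
                                          pg_wal_lsn_diff(pg_current_wal_lsn(),
                                                          replay_lsn)
                                   FROM pg_stat_replication""")
                    for name, lag in cur.fetchall():
                        print(f"  {name}: {lag} bytes behind")
        except Exception as exc:
            print(f"{host}: unreachable ({exc})")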

    Regards
    Markus
     
    #17
    NISMO1968 likes this.
  18. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    The budget is not exactly infinite, but I want to do this right. It'll be used for running multiple production systems, so if it costs a bit, that's fine. I want to do this (hopefully) once, and not have to do it again for a few years, or until a radical change in storage tech comes along.
     
    #18
  19. kapone

    kapone Active Member

    Joined:
    May 23, 2015
    Messages:
    394
    Likes Received:
    117
    Spun up a test environment with the various bits and pieces, and everything seems to work fine. I haven't run any benchmarks, because this is on test hardware and it would be pointless.

    However...something's been nagging me throughout, as I was doing this.

    Starwind runs in the Hyper-V parent partition, i.e. directly on the "host", not as a VM. Isn't that a complete no-no for virtualization?? Shouldn't the host run NOTHING but the hypervisor?
     
    #19
  20. cesmith9999

    cesmith9999 Well-Known Member

    Joined:
    Mar 26, 2013
    Messages:
    1,010
    Likes Received:
    308
    It is storage software, so that is OK. It is not itself a virtualized component.

    Chris
     
    #20
    NISMO1968 likes this.