Can a cluster do everything a SAN can?

So I'm trying to sort this out - the question is mostly self-explanatory. I'm researching "enterprise class storage" - more for the future than immediately (trying to come up with strategies to start small and migrate into high-availability/fault-tolerant setups) - and on one side, especially reading up on VMware-type virtual machine use, everything seems to really focus on using SANs. I'm curious whether clusters are every bit as good, or better at some things and worse at others - what the strengths and weaknesses are either way.
 

sno.cn

Active Member
Sep 23, 2016
There are so many different answers to your question, depending on your use case. I'm running SAN-style setups for things like serving files over a corporate network, but taking an "everything on everything" approach to database/web servers for replication/HA/load balancing with dead-simple maintenance and minimal cost.
 
I guess mostly I'm trying to figure out what the killer app is that really requires a SAN. Reading VMware documents, they seem to really push people onto SANs. I'm wondering if it's something like this: a NAS is just as good provided you are working with entire files at a time, but if you are reading/writing huge monoliths of data - like massive databases you aren't going to load entirely into memory, just modify some segment of - then a file-based NAS no longer makes sense. Applications like Google Earth, or CFD or FEA-type calculations maybe. Weather modeling. Unless even that can work on a NAS with monster multi-terabyte files.

I'm mostly planning on working with large media files, up to 8K source movie content - large, but mostly breakable into frames, clips and such, because every segment has to be processed in multiple stages. Only the final master file might end up a multi-terabyte monolith, and even that could be broken into chunks that just link together if necessary.

My priorities are storage space first, performance second, and then finally reliability and high availability (to prevent interruption of work when it matters most to keep working on the film, while keeping hardware costs low). I'm currently expecting to use SAS expander chassis and just keep a built-up spare server that can plug into them if there are any problems. Automatic failover across multiple servers is more graceful, but not dramatically better if someone can push power buttons and swap a plug even when I'm not there.
 

cesmith9999

Well-Known Member
Mar 26, 2013
I guess mostly I'm trying to figure out what the killer app is that really requires a SAN.
It is really an uptime requirement. It used to be a shared-storage requirement (for HA clusters). Software-defined storage and faster Ethernet connections have left the old SAN arrays with only a couple of (expensive) features that are still compelling. You cannot scale out SAN storage effectively in the new world of storage.

Chris
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
It is really an uptime requirement. It used to be a shared-storage requirement (for HA clusters). Software-defined storage and faster Ethernet connections have left the old SAN arrays with only a couple of (expensive) features that are still compelling. You cannot scale out SAN storage effectively in the new world of storage.

Chris
I was just checking Proxmox HA docs again last night... they still state no HA with local storage, must be shared :(
 
It is really an uptime requirement. It used to be a shared-storage requirement (for HA clusters). Software-defined storage and faster Ethernet connections have left the old SAN arrays with only a couple of (expensive) features that are still compelling. You cannot scale out SAN storage effectively in the new world of storage.

Chris
I assumed you could set up clusters of redundant load-balancing servers, with mutual heartbeat signals to prove they're still up, which access the same downstream fileservers and such. My assumption is that if a cluster node fails, users don't even know it; it's just replaced silently while everything routes around it.

Do you have an example of a SAN system delivering even better uptime than that?

For what it's worth, I'm still not against deploying a SAN at home, especially with decade-old Fibre Channel equipment, though I admit I was under the impression "that's just what you did" at some higher level, without fully understanding why. I just want the redundancy to be where I decide it should be, and every part to be fault tolerant, so I can power down and replace something without it affecting everything else. I also don't want to create problems just to solve them if far simpler systems, involving less learning, are equally valid for my needs and the downtime differences are negligible compared to the cost and learning overhead. Nothing I'm doing (at this point) is mission critical.
 

cesmith9999

Well-Known Member
Mar 26, 2013
1,417
468
83
I have a SAN array that did not have any customer-impacting (non-scheduled) downtime for 8 years. The built-in redundancy kept the system running for the customers the whole time.

There were two times that it went down; both were due to facilities downtime requirements (UPS and GenBack maintenance). The array hit six nines of availability. Not bad for spending over $1M on an array over its 9-year lifetime for 223TB of space (600 * 400GB FC disks - 3PAR T400).
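For scale, six nines works out to only about half a minute of unplanned downtime per year. Quick back-of-the-envelope arithmetic (illustrative numbers only, not measured figures from the array):

```python
# Back-of-the-envelope: what "six nines" allows in unplanned downtime.
# Illustrative arithmetic only -- not measured data from the array above.

availability = 0.999999                      # six nines
seconds_per_year = 365.25 * 24 * 3600

downtime_per_year = (1 - availability) * seconds_per_year
print(f"~{downtime_per_year:.0f} seconds of downtime per year")        # ~32 s
print(f"~{downtime_per_year * 9 / 60:.1f} minutes over a 9-year life") # ~4.7 min
```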

There was a fair share of hardware replacements (disks, controllers, DIMMs), but no downtime due to multiple concurrent failures.

For 223 TB of space, I could now buy it for less than a third of what we paid for the SAN array, get more than the 100K IOPS this unit capped out at, and have more flexibility.

Chris
 
I guess the big question comes down to: what's the advantage of presenting a block storage unit as a SAN to an upstream server, instead of just a cluster of two redundant servers accessing the disks? Some part still has to go offline for service eventually, whether a DIMM or a PSU. I'm trying to figure out what the killer app for a SAN is that has no real comparable "software workaround on a cluster". :)

And 9 years uptime is pretty dang impressive!
 

mstone

Active Member
Mar 11, 2015
A cluster and a SAN do two different things. The cluster will get you much higher performance and lower cost; the SAN will get you byte-level atomic writes and lower latency. If you have an application that expects POSIX/Windows filesystem semantics you generally can't use a cluster. These days clustered storage approaches are so common that many apps are written to target that storage model, for example by handling the synchronization issues at a higher level, and don't require a SAN. So the driver is the application; you'd be nuts to spend money on a SAN if you don't need it, but if your application isn't written to account for how cluster filesystems work, it may be cheaper to buy a SAN than to rewrite the application.

And note that while I'm using your distinction of "cluster" and "SAN" as I think you're using them, the distinction is really between "shared storage" and "non-shared storage". As an example, VMware's VMFS is a storage cluster that depends on shared storage, while Hadoop HDFS is a storage cluster built on non-shared storage.
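To make that concrete, here's a toy sketch in Python (the `ObjectStore` class is made up purely for illustration, it's not any real product's API): an app written for POSIX semantics updates bytes in place and trusts the shared filesystem for consistency, while an app written for non-shared cluster storage works in whole objects and handles synchronization itself, e.g. with a version check.

```python
import os

# Sketch only: contrasting the two programming models described above.
# "ObjectStore" is a made-up stand-in for any non-shared, replicated store.

# POSIX / shared-storage style: update a few bytes in place and assume
# every other node immediately sees a consistent result.
def posix_style_update(path: str, offset: int, payload: bytes) -> None:
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        os.fsync(f.fileno())

# Cluster / non-shared style: whole objects in, whole objects out, with the
# application handling synchronization itself (naive optimistic versioning).
class ObjectStore:
    def __init__(self):
        self._objects = {}                      # key -> (version, bytes)

    def get(self, key):
        return self._objects.get(key, (0, b""))

    def put(self, key, data: bytes, expected_version: int) -> bool:
        version, _ = self._objects.get(key, (0, b""))
        if version != expected_version:
            return False                        # someone else wrote first; caller retries or merges
        self._objects[key] = (version + 1, data)
        return True
```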
 

Jon Massey

Active Member
Nov 11, 2015
I have a SAN array that did not have any customer-impacting (non-scheduled) downtime for 8 years. The built-in redundancy kept the system running for the customers the whole time.
Had a £750k EVA8400 backing all (bare metal) customer DB servers at an old job, which managed 6 years nonstop until everything was migrated to $newcompanyowners' unified private cloud just after I left. From what I heard, max and average latency (and the standard deviation thereof) went through the roof; glad I didn't have to deal with the fallout from that.

Grumpy prematurely-old-fart DBA me still says you can't beat a few racks of 15k spinners in a FC SAN for really consistent predictable performance!
 

mstone

Active Member
Mar 11, 2015
Had a £750k EVA8400 backing all (bare metal) customer DB servers at an old job, which managed 6 years nonstop until everything was migrated to $newcompanyowners' unified private cloud just after I left. From what I heard, max and average latency (and the standard deviation thereof) went through the roof; glad I didn't have to deal with the fallout from that.

Grumpy prematurely-old-fart DBA me still says you can't beat a few racks of 15k spinners in a FC SAN for really consistent predictable performance!
Where you're saying "SAN" here you're really saying "non-volatile memory". SANs are a fundamentally terrible choice for DB applications from a performance standpoint because you're paying a latency penalty over direct-attached storage, and the bandwidth to the SAN is either really expensive because you're buying a ton of FC paths or limited because you're not. What used to make it an attractive prospect is that you could put a giant non-volatile memory in the SAN head unit to get your IOPS up, in a manner somewhat easier to maintain than a bunch of per-server NV memory units, and you could amortize the cost a bit if all of your DB servers weren't busy at the same time. Most deployments I saw were essentially using the SAN as a big honkin' expensive direct attach storage array, with resources dedicated to a small number of DB servers (not using the big SAN as a general storage pool). With the advent of NVMe SSDs with really amazing IOPS rates and really low latency, the SAN as DB backend model has seen the handwriting on the wall. SANs still have life in the DB space as a shared-storage backend for database clusters, where other advantages outweigh the costs.
 

Jon Massey

Active Member
Nov 11, 2015
Most deployments I saw were essentially using the SAN as a big honkin' expensive direct attach storage array, with resources dedicated to a small number of DB servers
Yes, this was very much the case in this instance.
 

TuxDude

Well-Known Member
Sep 17, 2011
Where you're saying "SAN" here you're really saying "non-volatile memory". SANs are a fundamentally terrible choice for DB applications from a performance standpoint because you're paying a latency penalty over direct-attached storage, and the bandwidth to the SAN is either really expensive because you're buying a ton of FC paths or limited because you're not. What used to make it an attractive prospect is that you could put a giant non-volatile memory in the SAN head unit to get your IOPS up, in a manner somewhat easier to maintain than a bunch of per-server NV memory units, and you could amortize the cost a bit if all of your DB servers weren't busy at the same time. Most deployments I saw were essentially using the SAN as a big honkin' expensive direct attach storage array, with resources dedicated to a small number of DB servers (not using the big SAN as a general storage pool). With the advent of NVMe SSDs with really amazing IOPS rates and really low latency, the SAN as DB backend model has seen the handwriting on the wall. SANs still have life in the DB space as a shared-storage backend for database clusters, where other advantages outweigh the costs.
Well, I suppose you could consider all storage to be "non-volatile memory", and sure, an externally attached array qualifies just as much as a local NVMe device.

But SANs are not a terrible choice for databases, and in many respects have performance that far exceeds direct-attached storage. Most SAN architectures are also quite old in comparison to things like NVMe - it's not really a fair comparison to put a 5-year-old EVA8400 up against a 1-month-old NVMe drive. (FYI - I've had 5 EVAs here at work over the years.) I've been running FC SANs since the days of 1G-FC, when SSDs didn't exist and the top-end direct-attached storage (which was also used as the back-end interconnect for the array controllers) was SCSI at 80MB/s and could only support 15 devices per channel. The only way to get lots of IOPS was huge numbers of spindles, and the only way to make that cost-effective was to share it across multiple systems.

And there are still plenty of reasons why centralized shared block storage (aka SAN) has value. There is still lots of R&D going into that area, and though the back-end architectures seem to be moving away from active-active dual-controller proprietary RAID systems toward more software-defined scale-out on generic hardware, it's still a SAN. Our new Compellent array that replaced some EVAs is really just a pair of R720 servers with fancy software on top and a whole lot of SAS JBOD shelves attached. And our Nutanix hyper-converged cluster also has the capability to present its storage as iSCSI LUNs if we wanted it to. The concept of shared block storage is definitely not going away any time soon - just how it is implemented is evolving, just like everything else in this industry.
 

mstone

Active Member
Mar 11, 2015
Well, I suppose you could consider all storage to be "non-volatile memory", and sure, an externally attached array qualifies just as much as a local NVMe device.

But SANs are not a terrible choice for databases, and in many respects have performance that far exceeds direct-attached storage. Most SAN architectures are also quite old in comparison to things like NVMe - it's not really a fair comparison to put a 5-year-old EVA8400 up against a 1-month-old NVMe drive.
Context is important, and I was responding to "you can't beat a few racks of 15k spinners in a FC SAN for really consistent predictable performance". You can beat that, with a modern architecture. Implicit in what I was saying was that years ago, buying the big storage device in order to get the big battery-backed memory (which is where the IOPS came from--not from some magic in the fabric) was a rational choice. It's just not ideal anymore. And beyond NVMe, NVDIMM is the next nail in the coffin for high-latency storage in the DB market.

The concept of shared block storage is definitely not going away any time soon - just how it is implemented is evolving, just like everything else in this industry.
I agree that it's not going away soon; I just see it as something to support legacy applications, not a technology to support the next big thing.
 
A cluster and a SAN do two different things. The cluster will get you much higher performance and lower cost; the SAN will get you byte-level atomic writes and lower latency. If you have an application that expects POSIX/Windows filesystem semantics you generally can't use a cluster.

And note that while I'm using your distinction of "cluster" and "SAN" as I think you're using them, the distinction is really between "shared storage" and "non-shared storage".
Taking those in reverse order: yes, I am tackling storage virtualization - whenever all the drives are in one place in a server, to be used by multiple other workstations or clients instead of being DAS.

So clusters are more for asynchronous applications, and a SAN might be needed in some super timing-critical application, or something where everything is sequential so you're waiting on just one byte of data before you can move to the next step.

Clustering generally has to be custom written FOR the application. "Yes you can do it with a cluster - but you have to write an app for it!" If I'm trying to scale out or accelerate my VMware or Adobe CC setup, a SAN is probably all I can use. It's probably not set up to use a plain network drive through a NAS via files. (Though I've never tried either.)

A SAN is the only thing that simulates DAS - because the OS doesn't know the difference. Again, the app forces it; it's app dependent. (Trying to say this back in slightly different words to see if I have the right idea.)

Assuming what I'm saying is right so far... then what apps, if written from the ground up, would still REQUIRE a SAN? Anything? Or is it just an older solution to the problem of needing to use existing apps with a LOT of storage, performance, and reliability?

For example, Adobe CC wants to use a local hard disk or SSD for scratch space. This is an existing app. If I have a fast RAID of SSDs on a SAS expander chassis, it could either be presented to Adobe as a SAN if it's a remote box, or as DAS if the SAS card is in the same workstation I process on.
 
Well, I suppose you could consider all storage to be "non-volatile memory", and sure, an externally attached array qualifies just as much as a local NVMe device.

And there are still plenty of reasons why centralized shared block storage (aka SAN) has value. There is still lots of R&D going into that area, and though the back-end architectures seem to be moving away from active-active dual-controller proprietary RAID systems toward more software-defined scale-out on generic hardware, it's still a SAN.
Well, for my particular situation I'm trying to figure out "when do I need to look at a SAN in my future server planning?" :) I know it sounds bad to say "I'm shopping for something I don't even know if I need!" but that's because I'm being forced into the role by circumstance. I'll know what I need once I see it; the only question is when it becomes critical to implement.

Like right now, my assumptions are basically the following:
- I need to start with a NAS. I've chosen to use SnapRAID. That isn't perfectly suited to things like virtualization or serving as Adobe CC scratch drives, but it lets me get my feet wet, because I'm still going to need "big dumb cheap but RELIABLE bit buckets" that won't corrupt over the years.

- When I start processing video, an NVMe SSD is going to be the next upgrade for applications.

- Yet at some point, if my work files get larger than the 1.2TB Intel 750, or I need even more performance, I start looking at things like InfiniBand QDR or FDR (32-40 gigabit) running RAID stripes of SSDs on SAS expanders with Openfiler as a custom SAN. Even though there is a bit more latency than NVMe, there isn't much (I see InfiniBand Openfiler numbers of 90-100K IOPS). My media processing workloads (where the CPU is bottlenecked by both speed and storage space) need raw bandwidth and size, and that's probably the only way to get there.

I'm not sure what the virtualization sessions might need to run on (NAS or SAN or what), but the sequential media processing, if it needs more bandwidth and available space than NVMe, might finally give me a use case justifying a custom SAN. Does that seem about right? It's about the application being run, and only a build that extreme would probably even outperform NVMe (in total storage space and bandwidth, not necessarily latency) to begin with. Alternately, a latency-bottlenecked app that also wants more space will have to wait for larger NVMe drives, or run more than one if you can.
 

mstone

Active Member
Mar 11, 2015
- Yet at some point, if my work files get larger than the 1.2TB Intel 750, or I need even more performance, I start looking at things like InfiniBand QDR or FDR (32-40 gigabit) running RAID stripes of SSDs on SAS expanders with Openfiler as a custom SAN. Even though there is a bit more latency than NVMe, there isn't much (I see InfiniBand Openfiler numbers of 90-100K IOPS). My media processing workloads (where the CPU is bottlenecked by both speed and storage space) need raw bandwidth and size, and that's probably the only way to get there.

I'm not sure what the virtualization sessions might need to run on (NAS or SAN or what), but the sequential media processing, if it needs more bandwidth and available space than NVMe, might finally give me a use case justifying a custom SAN. Does that seem about right? It's about the application being run, and only a build that extreme would probably even outperform NVMe (in total storage space and bandwidth, not necessarily latency) to begin with. Alternately, a latency-bottlenecked app that also wants more space will have to wait for larger NVMe drives, or run more than one if you can.
You understand that InfiniBand is measured in bits and drives are measured in bytes, right? So if you can max out a 40Gbit/s FDR link, that's the bandwidth of only two Intel 750s (at 2.5GByte/s each). Needing more bandwidth generally isn't a good reason to move to a SAN.
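To put rough numbers on that (illustrative figures only, using the same ballpark rates as above):

```python
# Back-of-the-envelope on the bits-vs-bytes point (rough figures only).
link_gbit_per_s = 40                        # ~40 Gbit/s InfiniBand link
link_gbyte_per_s = link_gbit_per_s / 8      # = 5 GByte/s on the wire, before protocol overhead

nvme_gbyte_per_s = 2.5                      # ballpark sequential rate of one Intel 750
print(link_gbyte_per_s / nvme_gbyte_per_s)  # 2.0 -- two local NVMe drives already match the link
```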
 

mstone

Active Member
Mar 11, 2015
So clusters are more for asynchronous applications, and a SAN might be needed in some super timing-critical application, or something where everything is sequential so you're waiting on just one byte of data before you can move to the next step.
It's more that with a SAN you can pretend, to some extent, that you're using a simple local disk. If you write out a byte to a SAN, you can assume it will immediately be available and that the write is in a consistent state. But there's a cost to that, and clusters built around non-shared storage make the tradeoffs more transparent so you can decide what's appropriate for your application. For example, you may only want file-level consistency (so a whole file is either available or not, with no partial writes), or maybe you're willing to have different states on different replicas with versioning, etc. In those systems, if you need to send a message from one system to another immediately you can do so, but via a message passed directly over the network rather than via the disk. (Even cluster filesystems built on shared storage still generally use explicit message passing for things like locking and metadata synchronization because it performs better--the applications just don't know that's happening.)
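As a concrete example of the "whole file is either available or not" model, here's the usual write-then-rename pattern (a minimal Python sketch, not tied to any particular cluster filesystem): readers only ever see the old complete file or the new complete file, never a partial write.

```python
import os
import tempfile

# Minimal sketch of file-level (all-or-nothing) publication: write the new
# contents to a temp file in the same directory, then atomically swap it in.
def publish_whole_file(path: str, contents: bytes) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(contents)
            tmp.flush()
            os.fsync(tmp.fileno())      # make the new contents durable first
        os.replace(tmp_path, path)      # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```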

Assuming what I'm saying is right so far... then what apps, if written from the ground up, would still REQUIRE a SAN? Anything? Or is it just an older solution to the problem of needing to use existing apps with a LOT of storage, performance, and reliability?

For example, Adobe CC wants to use a local hard disk or SSD for scratch space. This is an existing app. If I have a fast RAID of SSDs on a SAS expander chassis, it could either be presented to Adobe as a SAN if it's a remote box, or as DAS if the SAS card is in the same workstation I process on.
That's basically it. If you've got an app on a legacy OS that just knows how to use disks, you can plug the SAN in to centrally manage the disks without having to tell the app that's what you're doing. It's not going to scale as well as something that's cluster-aware, but for a single-instance app that doesn't matter. VMware has made a ton of money by then virtualizing the hardware the app is running on, so not only does the app not have to know it's running on SAN storage, it doesn't need to know it's running on virtual hardware--so you can move the virtual machine from one physical host to another on the same SAN, seamlessly (because the SAN guarantees that the data the VM wrote on one physical host is available immediately on the other physical host). Again, all of that scales much less well than an app that is aware it is on a cluster and running on multiple physical machines simultaneously, but for a lot of applications the convenience of using a legacy OS and not having to write a cluster-aware app far outweighs the desire to scale better--especially if the app doesn't need to scale to meet its requirements.
 

TuxDude

Well-Known Member
Sep 17, 2011
Twice_Shy - I don't know if you will ever need a SAN. For the types of workloads you keep referencing, there is nothing to be gained by having shared access to block-level storage. All your video-type apps will be better handled by file-level (aka NAS) shared storage. Maybe some distant day in the future, if you want a cluster of NAS heads for redundancy/HA purposes, you would back them with a small SAN, but it's just as likely that a replicated/distributed cluster of NASes would be a better fit.

You do know that you can't just give two regular desktop OSes shared access to the same block device, right? If you carve out a chunk of space on a SAN and give it to a Windows box, which formats it with NTFS, and then you give a second Windows box access to that same NTFS volume, the best case is that the second one will refuse to do anything with it; the worst case is total corruption of the disk and full data loss. Server editions of Windows can handle such a thing with cluster services (whatever MS is calling it these days), but they still mostly don't share it - one node gets control of it at a time. Unless you're doing failover clusters (doesn't sound like it), or running a clustered filesystem like VMFS (doesn't sound like it), you have no need for shared block storage.

Also - while SnapRAID is a great solution for cheap/reliable storage of bulk data, it should REALLY be data that doesn't change. It's not just "poorly suited" to things like VMs or scratch disks - it's a really horrible idea that will reduce or eliminate SnapRAID's ability to recover from a failed disk. All of the data stored in a SnapRAID array is used to calculate/update the parity each time a sync is run, typically once a day at night. If a disk dies, you can re-calculate the data on the failed drive using the parity and the remaining data. But to SnapRAID, modified data is basically the same as if the disk that data was on had died - whether the data is gone or just different, in either case it can no longer be used in the parity recovery calculations. So SnapRAID is great for storage where you are just adding additional files (a new file doesn't affect parity calculations of existing data - the new data just remains unprotected until the next sync), but really horrible for any kind of data that is constantly changing. That's why I run a combination of SnapRAID and MD-RAID in my home NAS.
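A toy single-parity example makes the failure mode concrete (plain XOR in Python for illustration - SnapRAID's real parity math is more sophisticated, but the principle is the same):

```python
# Toy XOR parity, for illustration only (SnapRAID's actual parity math is
# more elaborate, but the failure mode is the same).

def xor_blocks(*blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

disk1, disk2, disk3 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(disk1, disk2, disk3)        # computed at the nightly sync

# Case 1: disk2 dies and nothing else changed since the sync -> clean recovery.
assert xor_blocks(parity, disk1, disk3) == disk2

# Case 2: disk3 is modified after the sync, THEN disk2 dies -> parity is stale
# and the "recovered" data is silently wrong.
disk3 = b"CCCD"
assert xor_blocks(parity, disk1, disk3) != disk2
```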
 
I'll freely admit I'm still wrapping my head around some things I don't understand yet and trying to get myself up to speed. When I first read about ZFS, for instance, all of the concepts were so utterly new to me that it took me a while to even think that way; I was so used to just saving files to fixed disks and running out of space.

Actually, I didn't know that bit about block devices, but I wasn't really planning it to be like that anyway. When I wanted to connect one SAN to, say, three workstations, it would be like formatting one SSD into three partitions and giving each workstation its own slice, which could be resized based on needed space. Though it's starting to look like the introduced latency would defeat part of the purpose.

For SnapRAID, even though I plan to play with some "virtualization", there still won't be much changing of what's virtualized. What I wanted to do was create a "perfect install" of a workstation with all the desired tools in place and then freeze it: remote boot it over the network from any workstation I wanted, but rarely save back any change to the VM - normally discarding it. Data would save to the NAS, but the VM itself would just discard changes every day, so I'm not worried about viruses, corruption and similar. Why? Because people who come over for LAN parties also have nicer hardware, and part of the future arrangement is "you let us hang out for the LAN party on the weekend and you can periodically borrow our hardware to help process stuff". So I can load up a fixed set of tools to 3D render or anything else and set it to work, because the available hardware will change on a weekly basis. I assumed a need like that would work as well on SnapRAID as anything, because it wouldn't even be a daily change - only when adding a must-have software package to an existing install.

That said, I also accepted that at some point I would need something in addition to SnapRAID (which also doesn't like many small files - it should be fine for the big array of large chunks of 8K RED Weapon raw video and multistream mocap, but it's not going to like the foley station with probably millions of sound samples), but I figured I'd decide that later. You are of course free to tell me about MD-RAID anyway, since you brought it up. :)

To the previous point, I was aware of the bits vs. bytes issue - I was just struggling to come up with some case that ever seems to be a win over DAS. "It would at least have to be faster than an NVMe drive" was a minimum, even if it would only match two of them together. It was more assuming you could have a much larger SSD array, and for some workloads it would be faster to keep everything available at once, though I'll admit it was a bit of a reach.