Storage Advice Needed for iSCSI


BSDguy

Member
Sep 22, 2014
Hi All

Currently I have two ESXi 6.5 hosts, both of which connect to a Windows Server 2016 machine that runs StarWind Virtual SAN and provides shared iSCSI storage to the two hosts.

The problem is, whenever I reboot the SAN (the StarWind server) I end up with the following issues:

  • An empty datastore - literally all VM files and folders are gone after the reboot
  • Sometimes one host won't mount one of the datastores
  • Sometimes the service needs to be restarted for the console to connect/work
This is despite me putting the hosts into maintenance mode and shutting down ALL the VMs before rebooting the SAN. I am on the latest version of StarWind Virtual SAN.

All I can do after these issues occur is restore the VMs that were on the problematic datastore(s), which is painful every time the SAN needs a reboot for updates (at least monthly). These issues also break my Citrix environment.
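When a datastore does come back but one host refuses to mount it, the only thing that has sometimes saved me a full restore is checking whether ESXi is treating the volume as an unresolved/snapshot copy after the target reappears. Roughly what I run on the affected host (ESXi includes a Python interpreter; the datastore label is just an example, so adjust to taste):

import subprocess

def esxcli(*args):
    # Thin wrapper around the esxcli binary that ships with ESXi.
    subprocess.run(["esxcli", *args], check=True)

# Re-scan the storage adapters so the host picks the iSCSI LUNs up again.
esxcli("storage", "core", "adapter", "rescan", "--all")

# List VMFS volumes that ESXi sees as snapshot/unresolved copies
# (this is what happens when the backing LUN "changes" from the host's view).
esxcli("storage", "vmfs", "snapshot", "list")

# If the missing datastore shows up in that list, mount it by label
# ("SSD-DS1" is a placeholder for whatever the datastore is called).
esxcli("storage", "vmfs", "snapshot", "mount", "-l", "SSD-DS1")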

Anyway, moving forward I want to change my storage, since these issues are driving me crazy, and my StarWind NFR license expires in a few months and I'm not sure whether it will be renewed.

Currently I have 4 Samsung SM863 480GB drives in the StarWind/SAN server which I use for the iSCSI storage. No RAID is used; each drive is its own datastore.

So I've been thinking of replacing the StarWind Virtual SAN server with a Synology DiskStation DS1817+ (the 8GB memory model) and adding a dual-port 10Gb SFP+ NIC in its PCIe slot. I'll be direct-attaching each host to one SFP+ port on the Synology (each host has only one SFP+ port).

I was thinking of either using the 4 Samsung SM863 drives as read/write cache and adding 4 spinning HDDs for VM storage, OR adding 4 more SM863 SSDs and using SSD-only storage for the VMs.

I currently run 25 VMs which use about 1.5TB of space, but with StarWind Virtual SAN I use dedupe so they consume much less than that...maybe 800GB. Not sure if the Synology does dedupe.
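For what it's worth, the rough numbers I'm working from, treating my 1.5TB used / ~800GB deduped figures as given (the mirrored RAID10-style layout in the all-SSD option is just an assumption for the maths):

# Back-of-the-envelope sizing for the two options above.
logical_tb = 1.5      # space the 25 VMs consume before dedupe
deduped_tb = 0.8      # what StarWind actually stores with dedupe on
print(f"dedupe ratio ~{logical_tb / deduped_tb:.1f}:1")

ssd_gb = 480
# Option 2: 8 x 480GB SM863 in a mirrored (RAID10-style) layout, no dedupe assumed.
usable_tb = 8 * ssd_gb / 2 / 1000
print(f"8 x {ssd_gb}GB mirrored -> ~{usable_tb:.2f} TB usable vs {logical_tb} TB needed")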

So would this be a better choice for storage? I don't want to run a Linux/Windows SAN anymore. I just want something I can forget about that will run for months at a time and not need (frequent) updates, reboots etc.

I did consider VSAN with a Witness appliance but have yet to find an affordable cache tier SSD for lab use.

Is the Synology a good choice for what I want to achieve? I'm not sure how good the performance would be running the VMs off HDDs with SSD read/write cache, or whether 8GB of memory in the unit would be enough.
 

gea

Well-Known Member
Dec 31, 2010
DE
There are many reasons to prefer one solution over another.
What I use and prefer with ESXi:

NFS instead of iSCSI
Reason: much easier handling, plus additional SMB access for versioning/snaps via Windows Previous Versions and for copy/clone/backup - with performance similar to iSCSI.

All-In-One: each ESXi server gets its own (virtualised) NFS storage server (SSD-only preferred).
This allows each host to work from its own storage with all SAN features. You can use my preconfigured VM,

or two dedicated storage servers with replication for backup/failover.

ZFS instead of NTFS or ext4
This gives you a much higher level of data security: crash resistance due to copy-on-write, no write-hole problem of traditional RAID (a corrupt filesystem after a crash during writes), snapshots, superior cache options and - especially important with ESXi - secure sync-write behaviour.

Regarding the OS for ZFS
I prefer Solaris-based systems (where ZFS originated and is native). They come with iSCSI/FC, NFS and SMB included in the OS, and ZFS and all services are maintained by the OS supplier itself.

For these systems I offer a web UI (napp-it) to make storage management easier, see
iSCSI, Configuring Storage Devices With COMSTAR - Oracle Solaris Administration: Devices and File Systems
Hardware: http://www.napp-it.org/doc/downloads/napp-it_build_examples.pdf

All-In-One: http://napp-it.org/doc/downloads/napp-in-one.pdf
Manual Setup: http://www.napp-it.org/doc/downloads/setup_napp-it_os.pdf
Web-UI: http://napp-it.org/doc/downloads/napp-it.pdf
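For reference, a bare-bones CLI sketch of the NFS-on-ZFS setup described above (OpenZFS/Linux syntax; the pool name, device paths and IP address are placeholders - the appliance and web UI do the same steps for you):

import subprocess

def sh(cmd):
    # Run a storage command and fail loudly if it errors.
    subprocess.run(cmd, shell=True, check=True)

# Mirrored SSD pool (two mirror vdevs, RAID10-style); /dev/sd[b-e] are placeholders.
sh("zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde")

# Dataset for the VMs: honour ESXi sync writes and export it over NFS.
sh("zfs create tank/vmstore")
sh("zfs set sync=always tank/vmstore")
sh("zfs set sharenfs=on tank/vmstore")   # needs an NFS server installed on Linux

# On each ESXi host, mount the export as a datastore (IP is a placeholder):
# esxcli storage nfs add -H 10.0.0.10 -s /tank/vmstore -v vmstore-nfs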
 

Net-Runner

Member
Feb 25, 2016
I am using the StarWind free version in almost the same scenario as yours without any glitches, so most probably there are misconfigurations or hardware issues in your setup. I would strongly recommend you contact their support, since the guys are very helpful and friendly. As for the problems you've mentioned:
• Are you using a thin-provisioned device? (it's called a log-structured device, if I am not mistaken)
• Is the automatic datastore rescan script set up and running properly? (see the sketch at the end of this post)
• That one is completely weird - probably some issue with the storage controller VM's configuration
Since no RAID is used, how did you manage to combine all 4 SSDs into a single storage pool? Or are you using them separately? I have a RAID5 of 5 SSDs in my hosts, presented directly to the StarWind VMs via pass-through.
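For reference, the rescan script does not need to be anything fancy - a pyVmomi loop over the hosts along these lines works (the vCenter address and credentials are placeholders; the same thing can be done with PowerCLI):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab only - skips cert checks
si = SmartConnect(host="vcenter.lab.local",
                  user="administrator@vsphere.local",
                  pwd="CHANGE_ME", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
for host in view.view:
    storage = host.configManager.storageSystem
    storage.RescanAllHba()         # pick up re-appeared iSCSI LUNs
    storage.RescanVmfs()           # re-detect the VMFS volumes on them
    storage.RefreshStorageSystem()
Disconnect(si)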
 

nev_neo

Active Member
Jul 31, 2013
I've provisioned a new (old) system just to test the viability of StarWind's VSAN:
a Xeon D-1521 with 64GB of RAM on Server 2016 - 4 x 1TB drives in RAID5 and 2 x 200GB HGST SAS SSDs in RAID0 for the SSD cache.
Initially I was impressed by how easy it was to set up, configure and deploy. I've used it for about a day so far. At first it was awesome - fast data transfers over 10G iSCSI, not close to line speed though, but fast enough for a test lab.
Eventually, though, things are not looking so good. I've tested with both ESXi and Windows hosts (identical HP ML10 v2) and I've noticed that sometimes there are network hiccups and the iSCSI targets disappear. The only fix is to reboot the StarWind host.
Also, comparing speeds to my FreeNAS host (C2558 with 32GB RAM, no SLOG/ZIL, 6 x 4TB REDs) over iSCSI, the FreeNAS IOPS numbers are more than double what StarWind can achieve. StarWind's memory cache should easily be winning all these tests, but it's losing...badly. Random write tests are HORRIBLE.
I'm considering scrapping this and just moving on to FreeNAS.
 

NISMO1968

Oct 19, 2013
San Antonio, TX
www.vmware.com
1) Synology (QNAP, Netgear etc.) make nice SOHO backup units, but I wouldn't run my VMs from them. Just because you can doesn't mean you should :) I really think it's your "last resort", and before going that route you should try hard with commodity hardware (servers). IMHO, of course :)

2) If you're OK with your shared storage being a single point of failure (why not? It's SOHO or lab use from what I gather...) I'd re-provision your "storage" server as Ubuntu with ZoL or FreeBSD with ZFS.

ZFS - Ubuntu Wiki

Chapter 19. The Z File System (ZFS)

I'm not a big fan of FreeNAS (official downgrades on a super-stable FreeBSD?!? WTF?!?) but you might want to explore this option as well. Some hints (take them with a grain of salt - it's FreeNAS hosting the page, so some FUD is to be expected):

FreeNAS® vs Ubuntu Server with ZFS on Linux - FreeNAS - Open Source Storage Operating System

(I love Ubuntu with Ceph and ZFS on ARM but that's another story)

3) I'd still try to fix your existing setup. From what I see there are a few issues to take care of:

- You can't use a non-NV DRAM cache on a single controller. It's not a NetApp, there's no NV memory, so when you power off / reboot / crash / stop the service, all the write-back cache is GONE! Production setups replicate the cache between multiple controllers, but you have only ONE.

Solution: DISABLE the write-back cache and use either write-through or no cache at all. Heck, you run SSDs, so you should be fine with NO caching and still get super-low latency.

- Don't use the log-structured file system: it's a bad idea to place a log on a log, and SSDs already run a flash translation layer, which is another log-structured FS in firmware.

Don’t Stack Your Log On My Log | USENIX

Solution: with SSDs, stick with simple image files and Windows deduplication (a quick sketch of enabling it is at the end of this post).

4) Or get the Linux VSA from them, re-mount the SSDs into the hypervisor nodes and run a full HA setup. I don't know much about the NFR and time-bombed stuff you mention, but last time I checked everything was open, free and all keys were perpetual. Replicated storage will give you better uptime for sure. You might still want to use a third server as a backup target.
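For the dedup part, a minimal sketch, assuming the Data Deduplication feature is already installed and the image files sit on D: (the drive letter is just an example):

import subprocess

def ps(command):
    # Drive the built-in Windows dedup cmdlets from Python.
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)

ps('Enable-DedupVolume -Volume "D:" -UsageType Default')   # turn dedup on for the volume
ps('Start-DedupJob -Volume "D:" -Type Optimization')       # kick off the first optimization pass
ps('Get-DedupStatus -Volume "D:"')                         # check the savings afterwards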

 

NISMO1968

Oct 19, 2013
San Antonio, TX
www.vmware.com
You basically answered your own question: slow 1TB spinners in RAID5 (you understand it's an absolutely crazy idea, don't you? Google RAID5 and URE if you don't...) will give you the "write hole" typical of all parity RAIDs. With FreeNAS you get ZFS and RAIDZ1, which is "kind of" the equivalent of RAID5 but with one very important difference: ZFS gives you variable-width stripes, so there's NO write hole by design! OF COURSE your RAID5 vs RAIDZ1 comparison gets you horrible write performance, just because you're comparing apples to oranges :) If you want to compare comparables (sic!) you have to rebuild your RAID5 -> RAID10 and do the equivalent on FreeNAS as well. Well... kind of :) Because S/W can do log-structuring on top of parity RAIDs, while there's no permanent log with ZFS (the ZIL doesn't count; its journal fills up eventually), so you'll get crazy high numbers from S/W that have very little to do with your real-world workloads.

"Memory cache" see my reply to OP, that's another crazy idea :) TL;DR: You don't want any DRAM write-back cache on a single controller setup! :)

Can't comment on the network issues you're facing, except that a properly configured S/W box and FreeBSD & ZFS should both give you wire speed with your config. BOTH :) <-- assuming you don't run 10Gb+; to fill that you need a few more SSDs.
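And if you want the write numbers from the two boxes to mean anything, run the identical job against both targets - a minimal fio sketch (the test file paths, size and queue depth are just examples):

import subprocess

# Same 4k random-write job against each target; only the path changes.
def run_randwrite(test_path):
    subprocess.run([
        "fio",
        "--name=randwrite-4k",
        f"--filename={test_path}",   # file on the datastore/LUN under test
        "--rw=randwrite",
        "--bs=4k",
        "--iodepth=32",
        "--numjobs=4",
        "--direct=1",                # bypass the page cache
        "--size=10G",
        "--runtime=60",
        "--time_based",
        "--group_reporting",
    ], check=True)

run_randwrite("/mnt/starwind-lun/fio.test")   # placeholder paths
run_randwrite("/mnt/freenas-lun/fio.test")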

 

nev_neo

Active Member
Jul 31, 2013
Valid points; however, I did forget to mention that the FreeNAS box is running Z2 with 32GB of RAM.
Both boxes are on 10G: the Xeon-D is on the built-in Intel X552 and the ZFS box has dual ConnectX-2s.
I did try RAID10 on the StarWind box, but that didn't make much of a difference.
 

LaMerk

Member
Jun 13, 2017
Hey Guys,

StarWind rep here!

May I ask you to contact our Support Team for the configuration review?

You can either submit a direct request to support@starwind.com or just fill in the form on the StarWind website: StarWind Software Support Center | Log a Support Case

Eventually, you'll get an engineer who will verify your configuration and go through all the possible issues.

We will keep the community informed as to how the investigation goes.

P.S. Please include the link to this thread in your support query.

Thank you.

 

NISMO1968

Oct 19, 2013
San Antonio, TX
www.vmware.com
It doesn't matter: there's no "write hole" with whatever parity mode ZFS is using - Z1, Z2 or Z3.

 

BSDguy

Member
Sep 22, 2014
Hi All

Thanks for all the posts to my questions (and sorry for the delay in replying).

So some bad news first. This afternoon while I was at work my entire lab environment went down. Literally all my storage vanished from my two ESXi hosts so I'm not a happy bunny at all.

I think there may be a deeper issue at play here. Yesterday one of my VMs had a corrupt disk and wouldn't boot, which I thought odd, but I did a Veeam restore and life went on. Then today ALL my LUNs in StarWind vanished. When I went into Event Viewer on the server that has StarWind installed, I had errors that said: "Disk x has been surprise removed."

Here's the odd part: it was only my SSD drives connected to the LSI SAS ports that vanished from within Windows (and StarWind). My one HDD was still connected, as were the mirrored boot SSDs connected to the Intel SATA ports, and they could still be browsed in File Explorer. Huh?!
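In case anyone wants to see exactly what I'm looking at, this is roughly how I'm pulling those events out of the System log (event ID 157 from the "disk" source; the count is arbitrary):

import subprocess

# Last 20 "Disk x has been surprise removed" events, newest first.
query = "*[System[Provider[@Name='disk'] and (EventID=157)]]"
subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{query}", "/f:text", "/c:20", "/rd:true"],
    check=True,
)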

My SAN has the following hardware/software:
Supermicro X10SL7-F motherboard
4 x Samsung SM863 Enterprise SSD drives connected to the onboard LSI SAS ports (used for shared iSCSI storage to the hosts)
Mirrored Samsung Pro 840 128GB boot drives connected to the Intel SATA ports
2 x Samsung Pro 850 SSD drives connected to the onboard LSI SAS ports (used for Veeam backups)
1 x Western Digital Red 4TB drive connected to a SATA port
Windows Server 2016 Datacenter (patched with June's updates)
StarWind Virtual SAN build 10927

Before setting up this server I updated all BIOS/firmware on the Supermicro board in October last year.

So, before giving up on this server and buying a QNAP/Synology, is there anything I can try to make the storage work correctly? I reckon all the issues I've had with my setup (datastores vanishing, VMs getting corrupted, empty folders in the datastores etc.) are related to the storage on this server. Maybe I should have used Windows Server 2012 R2 as the OS rather than 2016, since 2016 isn't listed as a supported OS on Supermicro's website?

Failing the above, I was thinking of getting a Synology DiskStation DS1817+ with 8GB RAM and dual 10Gb ports direct-attached to the hosts. I would consider QNAP as well. I just want a stable/reliable SAN I can use with my setup. And no, I'm not interested in rolling my own SAN with Linux/UNIX/Windows again!

Any ideas? Thoughts? Suggestions?

Thanks for reading :)
 

BSDguy

Member
Sep 22, 2014
Maybe your LSI controller is overheating?
Possibly. It is unusually warm this time of year, but I do have the following fans in the SAN:

3 x 120mm BeQuiet BL030 SilentWing2
1 x BeQuiet ShadowRock Pro
1 x 140mm BeQuiet BL063 SilentWings fan

It also doesn't explain the missing datastores, disappearing files in the VM folders and other odd behaviour during the cooler weather/winter. Currently I can't even launch the StarWind Virtual SAN console (it just hangs and/or eventually crashes).
 

BSDguy

Member
Sep 22, 2014
So, after thinking about this for a bit, I'm currently considering two options:

Option 1 (the preferred one): set up a 2-node vSAN cluster and use my old SAN box as the witness server. Do we have any vSAN experts here? I was going to use my 4 Samsung SM863 drives as the capacity tier (two in each node) and then buy two Intel DC P3600 400GB PCIe add-in cards for the cache tier, so this would give me an all-flash vSAN setup. vSAN is something I've been wanting to get into for ages now. I was going to use the onboard dual 10Gb RJ45 NICs for vMotion and vSAN traffic over a crossover cable. I think this is supported with vSphere/vSAN 6.5/6.6 (ie: vSAN traffic goes over the crossover 10Gb connection and witness traffic goes over the management 1Gb NIC - correct me if I am wrong as I am rusty on this; a rough sketch of the port tagging is at the end of this post).

Option 2: Buy a Synology/QNAP/or some other brand and continue using shared iSCSI storage.

I prefer Option 1, as I want to get into vSAN, and an all-flash setup should give amazing performance!!
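For reference, the port tagging I mentioned above would be roughly this on each node - the vmk numbering is just an example and I'd double-check it against the 2-node direct-connect documentation:

import subprocess

def esxcli(*args):
    # Run on each ESXi node (ESXi ships with a Python interpreter).
    subprocess.run(["esxcli", *args], check=True)

# vmk1 = the 10Gb crossover link: carry vSAN data traffic over it.
esxcli("vsan", "network", "ip", "add", "-i", "vmk1")

# vmk0 = the 1Gb management NIC: tag it for witness traffic so the
# witness appliance is reached over the management network instead.
esxcli("vsan", "network", "ip", "add", "-i", "vmk0", "-T=witness")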
 

nev_neo

Active Member
Jul 31, 2013
Go with VSAN, especially since you've always wanted to get into it.
I was planning on eventually doing something similar, except without the PCIe cache tier.
That's going to be a bit expensive for me - unless you know of any deals :)
 

BSDguy

Member
Sep 22, 2014
Thanks, but after a bit more research it sounds like using the onboard controller with an all-flash VSAN setup isn't ideal and may result in poor (or at least not great) performance. Unfortunately the servers I have don't have the space or slots to add more controllers :(

So I'm back to considering a Synology or QNAP with 8 bays and dual 10Gb for the lab. Can anyone comment on whether the Synology DiskStation DS1817+ with 8GB of RAM and dual SFP+ 10Gb ports would cope with 25-30 VMs on Samsung SM863 enterprise SATA SSDs? Maybe I should consider QNAP?

My drives disconnected again last night, and this time things didn't recover as nicely as last time, so once again I am wasting hours trying to get everything restored and back to how it was.

Needless to say I am keen (desperate!!) to purchase a suitable storage solution that I can use in my lab! ;)
 

gea

Well-Known Member
Dec 31, 2010
DE
Maybe you will be happy with a Linux-based Synology with ext4 or btrfs and its cheap desktop hardware (Atom CPU, no ECC, max 8GB RAM, filesystem features not comparable with the current state of technology like ZFS), but with VMs you want:
- high IOPS
- low latency
- powerloss protection, crash resistance
- versioning/snaps

- professional storage hardware (fast CPU, ECC RAM in the range of 32GB-128GB)

A ZFS filer - no matter whether it's BSD- or Solaris-based - running on professional hardware can give you this at a fraction of the price, with much higher performance and data security. If you virtualise the SAN filer it's even free, provided you have enough RAM and CPU power on ESXi.

Use an SSD-only pool. Your Samsungs are perfect for a ZFS pool for ESXi. Think about NFS.
 

BSDguy

Member
Sep 22, 2014
Thanks for the comment. You got me thinking again (hah).

I'm baffled as to why the onboard LSI SAS ports are disconnecting all the SSD drives connected to them. To be honest, I've had odd storage issues since building this server. I'm not sure if it's Windows Server 2016 or an issue with the LSI ports running in IT mode.

I've never had drives disconnect like this before in a server, but maybe I should/could consider an add-in card with 8 SATA ports?

I'm just thinking out loud here of all the options.
 

gea

Well-Known Member
Dec 31, 2010
DE
I would call LSI HBAs the "best of all" option for a professional filer on any OS.
But there are also bugs - for example, LSI 2008-based controllers with firmware 20.00.00.00 to 20.00.04.00 have known problems, especially with fast SSDs.

In such a case you must update the firmware to the current 20.00.07.00.
If you use software RAID (btrfs, ReFS, ZFS), prefer the IT firmware.
 

BSDguy

Member
Sep 22, 2014
I just had a look on Supermicro's FTP site and I see there is a firmware update for the LSI 2308 (version 20.00.07.00). Without rebooting I'm not sure what firmware I'm on, but it would have been from October 2016 or earlier.

So maybe I need to update the LSI 2308 firmware first, before trying anything else...
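If it helps anyone following along, this is roughly how I plan to confirm the current firmware before flashing, assuming the sas2flash utility from the LSI/Supermicro package (the image file names in the comment are placeholders):

import subprocess

# Show the controller(s) and their current firmware/BIOS versions.
subprocess.run(["sas2flash", "-listall"], check=True)
subprocess.run(["sas2flash", "-list"], check=True)

# The actual IT-mode flash is typically done from the EFI shell / DOS, e.g.:
#   sas2flash -o -f 2308IT.bin -b mptsas2.rom
# (firmware and BIOS image names are placeholders from the Supermicro download)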