Hello,
I would like to ask for some advice on the following setup.
We have a small all-HDD Ceph cluster (8 × 2U nodes, each with 10 OSDs, plus 3 monitor nodes) and an 8-node (1U) compute cluster. All servers are HP.
Currently every server has a dual-port 10GbE network card, and these are connected to two LB6M switches as follows:
There are 4 VLANs: one each for the Ceph public and private (cluster) networks, one for management traffic (configured as a tagged VLAN on top of the Ceph public network), and one for the public network (internet).
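For reference, the public/private split above maps onto ceph.conf roughly like this (the subnets are made-up placeholders, not our real addressing):

```ini
# /etc/ceph/ceph.conf (relevant fragment; subnets are example values)
[global]
public_network  = 10.0.10.0/24   # Ceph public VLAN (clients, MONs)
cluster_network = 10.0.20.0/24   # Ceph private VLAN (OSD replication/recovery)
```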
The Ceph private network and the compute nodes connect to switch "a", which requires 16 ports (8 in the Ceph private VLAN and 8 in the internet VLAN), plus 2 ports as uplinks to the internet and 2 more interconnecting the two LB6Ms, so 20 ports in total. (A MikroTik router is also connected to a gigabit port; it does NAT for the management network, acts as a VPN server, etc. Since it uses a 1G copper port, it doesn't really matter here.)
Switch "b" is also connected to all servers; there the ports' native VLAN is the Ceph public network, with the management VLAN tagged on top. (The management VLAN carries very little traffic; it is used to reach the internet for server upgrades, SSH connections, etc.) This switch also carries the two interconnect ports and one 10G iSCSI storage box, so 19 ports are in use in total. The monitor nodes are connected to this switch's gigabit ports.
This setup has been working for almost 6 months without any issues or downtime.
However, as you can see, the network setup is not redundant: if one switch goes down, the entire storage cluster stops (redundant internet connectivity doesn't really matter to us).
I'm thinking about the best way forward. I could simply add an extra dual-port 10GbE card to the Ceph nodes and mirror the existing layout, but then I would need two new 48-port 10GbE switches, which are not exactly cheap.
My other plan is to create a bond between the two interfaces and run the Ceph networks as tagged VLANs on top of it. The documentation doesn't recommend this (although the network capacity should be enough even during recovery, given the HDDs' speed). In that case I wouldn't need to buy new switches, but I would lose the 10Gbit public/internet network (I don't really want to mix internet and storage traffic on the same 10Gbit link). I can live with that, though; 2×1Gbit internet would also be fine, but 10G is better.
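As a rough sketch, the bonded variant could look like this on a Debian-style node (interface names, VLAN IDs, and addresses are assumptions; the ifenslave and vlan packages are assumed installed):

```text
# /etc/network/interfaces (sketch; names, VLAN IDs and subnets are examples)
auto bond0
iface bond0 inet manual
    bond-slaves ens1f0 ens1f1      # the two 10GbE ports
    bond-mode active-backup        # works with plain, independent switches
    bond-miimon 100                # link monitoring interval in ms

auto bond0.10                      # Ceph public VLAN (tagged)
iface bond0.10 inet static
    address 10.0.10.11/24

auto bond0.20                      # Ceph private/cluster VLAN (tagged)
iface bond0.20 inet static
    address 10.0.20.11/24
```

Note that 802.3ad (LACP) across two separate switches only works if the switches support multi-chassis link aggregation; active-backup needs no switch-side support at all, at the cost of using only one 10G link at a time.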
I'm open to any idea, case study, or anything else that points to the best or optimal solution.