Hi,
Did you manage to solve your slow write issue? We're facing similar performance problems and looking for a possible solution.
Hi,
For the moment I've moved journaling to the SanDisk and performance improved a lot. In two weeks' time I will install Intel S3710 and Samsung SM863 drives as journals on the LSI 2008, plus 10Gb for the public and cluster networks. I'll keep you posted about the results.
Hi,
Coming back with some bad news.
After a difficult but, I'd say, successful Ceph implementation, in which this forum helped very much - thanks for all your answers, guys - I've just dropped from those 12,000 IOPS down to ~3,000, and this time it seems I'm stuck here...
How did I achieve that:
- upgraded to Luminous
- installed 1x Dell C6100 chassis into the rack (4 nodes, each with: 48 GB RAM, 24 cores, 1x 120 GB SSD for the OS, 1x Intel S3610 for LevelDB/WAL, 1x 1 TB SanDisk SSD for the SSD cache-tier pool, 3x 5 TB spinning disks for the SATA pool, 1x 520 DP SFP+ card, 1x LSI 2008 IT-mode HBA)
- installed one Dell 8024F 24-port SFP+ switch
- all "A" ports of each node in VLAN 6 for the public network
- all "B" ports of each node in VLAN 60 for the cluster network - haven't configured jumbo frames yet
Installed minimal CentOS 7 and configured sysctl.conf per the Ceph recommendations:
tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments
The only things left to do were enabling jemalloc on the OS and jumbo frames on the cluster network.
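For when I get to it, roughly what enabling and verifying jumbo frames should look like; the interface name and peer address below are placeholders for illustration, not my actual config:

```shell
# Placeholder interface and peer - adjust to your setup.
IFACE=eth1            # cluster-network ("B") port
PEER=10.0.60.12       # another node on the cluster VLAN
MTU=9000
# Largest ICMP payload that still fits a 9000-byte MTU:
# 9000 - 20 (IP header) - 8 (ICMP header) = 8972 bytes
PAYLOAD=$((MTU - 20 - 8))
echo "ping payload for MTU $MTU: $PAYLOAD bytes"
# Commented out since they need the real hardware:
# ip link set dev "$IFACE" mtu "$MTU"
# ping -M do -s "$PAYLOAD" -c 3 "$PEER"   # -M do = don't fragment
```

The switch ports need a matching (or larger) MTU as well, otherwise the no-fragmentation ping will fail even with the NICs set correctly.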
So basically, I added the new nodes to the cluster (with Luminous), rebalanced the objects, removed the HP nodes, and rebalanced again. The HP nodes were reinstalled with Proxmox 5 to provide compute, with storage on the new Ceph nodes.
Now I'm only getting 500-3,000 IOPS: around 500 from the Proxmox VMs, and up to 3,000 (sometimes 4,000) from an externally mapped RBD image tested with fio. That's 3-4 times worse than before, even though the hardware seems more capable.
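For reference, the external fio run was along these lines; the pool/image names and the /dev/rbd0 device path are placeholders, not my exact command:

```shell
# Placeholder pool/image names - the mapped device path may differ.
POOL=rbd
IMAGE=bench
DEV=/dev/rbd0
echo "would benchmark $DEV"
# Commented out since they need a live cluster:
# rbd map "$POOL/$IMAGE"   # prints the mapped device, e.g. /dev/rbd0
# fio --name=randwrite --filename="$DEV" --ioengine=libaio --direct=1 \
#     --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
#     --runtime=60 --time_based --group_reporting
```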
All the OSDs are BlueStore now. The 5 TB spinning disks are Seagate 5 TB 2.5" SMR drives - I know SMR drives are not recommended, but they worked pretty much OK with Jewel/FileStore.
I assume networking is fine; I can confirm 10 Gb/s between any of the nodes involved, tested with iperf.
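The iperf check itself was straightforward; the server address here is a placeholder:

```shell
# Placeholder address on the cluster VLAN - adjust to your nodes.
SERVER=10.0.60.11
# Nominal line rate for a single 10 GbE link, in Gbit/s:
LINE_RATE=10
echo "expecting close to $LINE_RATE Gbit/s between nodes"
# Commented out since they need two live nodes:
# (on the server node)  iperf3 -s
# (on the client node)  iperf3 -c "$SERVER" -P 4 -t 30
```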
All the pool configurations are the same as before.
What do you think could be the issue?
Pulling my hair out again...