3x vdev ZFS mirror array for Postgres - thoughts on U.2 NVMe drives


iotapi322

Member
Sep 8, 2017
Hi All,
Looking to move my crypto mining pool out of AWS and into colocation: the master server and the active/active Postgres database. The DB has always been the most costly piece and the one we have had to scale up the most. EBS performance for the database is not optimal and has caused issues.

It will take about three months of running in colocation to recover the cost of the hardware purchase. My question is this: does anyone have real-world Postgres experience running a heavy read/write workload on the Micron 9300 PRO series and/or the Samsung PM1735 series?


Thanks in advance,
Matt
vipor.net
 

jode

Member
Jul 27, 2021
3x vdev ZFS mirror array for Postgres

My question is this: does anyone have real-world Postgres experience running a heavy read/write workload on the Micron 9300 PRO series and/or the Samsung PM1735 series?
Both PostgreSQL (fsync) and ZFS (the ZIL/SLOG) by default use mechanisms for synchronous writes. Using both at the same time causes significant write overhead. Consider turning off fsync (if you feel comfortable relying on ZFS) and using a sufficiently fast SLOG device for your workload. A quick Google search brings up recommendations for adjusting recordsize for PostgreSQL.
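To make that concrete, here is a minimal sketch of that kind of tuning. The pool/dataset name tank/pgdata, the 16K recordsize and the postgresql.conf lines are placeholders to adapt, not a recommendation:

Code:
# Dataset for the Postgres data directory; a recordsize closer to Postgres'
# 8 KiB pages avoids read-modify-write amplification on the default 128K.
zfs create -o recordsize=16K -o compression=lz4 -o logbias=latency tank/pgdata

# A dedicated SLOG only helps synchronous writes (the ZIL), e.g.:
# zpool add tank log mirror /dev/nvme4n1 /dev/nvme5n1

# postgresql.conf -- only if you accept relying on ZFS for integrity:
# fsync = off                # removes the double-sync overhead described above, but risky
# full_page_writes = off     # ZFS copy-on-write already prevents torn pages
# synchronous_commit = off   # optional latency win; can lose the last few commits on a crash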

Using ZFS as the storage filesystem for databases (including PostgreSQL) is very desirable, as it allows super efficient online backups by sending/receiving snapshots.
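For reference, that workflow is roughly the following. Dataset and host names are made up; if data and WAL live on separate datasets you would need a recursive/atomic snapshot to keep them consistent:

Code:
# Crash-consistent point-in-time snapshot of the Postgres dataset
SNAP="tank/pgdata@$(date +%F-%H%M)"
zfs snapshot "$SNAP"

# First a full copy to a backup host, afterwards incrementals against the previous snapshot
zfs send "$SNAP" | ssh backup-host zfs receive -u backup/pgdata
# zfs send -i tank/pgdata@previous "$SNAP" | ssh backup-host zfs receive -u backup/pgdata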

OTOH, all my software RAID attempts with NVMe drives caused so much overhead that I prefer creating (and managing) separate tablespaces for each NVMe drive.
I understand good NVMe RAID controllers exist, but not in my home lab. So this is not a suggestion for a production system, just something to be aware of, especially since you're moving off AWS partly for EBS (= storage) performance reasons.
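On the Postgres side, the per-drive tablespace approach looks roughly like this (mount points and object names are invented for the example):

Code:
# One filesystem per NVMe drive, no RAID layer underneath
sudo mkdir -p /nvme0/pg_ts /nvme1/pg_ts
sudo chown postgres:postgres /nvme0/pg_ts /nvme1/pg_ts

psql -U postgres <<'SQL'
CREATE TABLESPACE nvme0 LOCATION '/nvme0/pg_ts';
CREATE TABLESPACE nvme1 LOCATION '/nvme1/pg_ts';
-- then place hot objects explicitly, e.g.:
-- CREATE TABLE shares (...) TABLESPACE nvme0;
-- ALTER INDEX shares_pkey SET TABLESPACE nvme1;
SQL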
 

ca3y6

Member
Apr 3, 2021
Out of curiosity, which software RAID solutions did you try that showed overhead with NVMe (and was it parity, mirror, or RAID0)?
 

iotapi322

Member
Sep 8, 2017
Both PostgreSQL (fsync) and ZFS (the ZIL/SLOG) by default use mechanisms for synchronous writes. Using both at the same time causes significant write overhead. Consider turning off fsync (if you feel comfortable relying on ZFS) and using a sufficiently fast SLOG device for your workload. A quick Google search brings up recommendations for adjusting recordsize for PostgreSQL.

Using ZFS as the storage filesystem for databases (including PostgreSQL) is very desirable, as it allows super efficient online backups by sending/receiving snapshots.

OTOH, all my software RAID attempts with NVMe drives caused so much overhead that I prefer creating (and managing) separate tablespaces for each NVMe drive.
I understand good NVMe RAID controllers exist, but not in my home lab. So this is not a suggestion for a production system, just something to be aware of, especially since you're moving off AWS partly for EBS (= storage) performance reasons.
This is great information, thank you so much. Every single device of every single miner sends a share every 5 to 15 seconds, so that is a lot of write traffic. Then every 10 minutes we do payouts, which means we look at every coin and every share and calculate how much we owe each miner. That is a huge amount of database work, and I have been running it on a t3a.2xlarge with everything on a single EBS volume. That was a huge mistake. I need to move the WAL directory to a local drive.
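From what I can tell the move itself is mostly a stop/move/symlink exercise, something like the sketch below (paths assume a Debian-style layout and Postgres 16, so adjust; on a fresh cluster initdb --waldir does the same thing up front):

Code:
sudo systemctl stop postgresql

# Move the WAL directory to a dedicated local NVMe and leave a symlink behind
sudo mv /var/lib/postgresql/16/main/pg_wal /nvme_wal/pg_wal
sudo ln -s /nvme_wal/pg_wal /var/lib/postgresql/16/main/pg_wal
sudo chown -h postgres:postgres /var/lib/postgresql/16/main/pg_wal

sudo systemctl start postgresql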
 

Tech Junky

Active Member
Oct 26, 2023
I am open to any drive recommendations from any manufacturer. If you have a specific model I should look at, I'd be more than happy to do the research on it.
I'm using the Kioxia CD6, but the right model depends on capacity and budget/speed. I picked mine up on Amazon for about $1,200 for the 15.36 TB size, and it does 6.5 GB/s.
 

jode

Member
Jul 27, 2021
Out of curiosity, which software RAID solutions did you try that showed overhead with NVMe (and was it parity, mirror, or RAID0)?
All software designed to make multiple drives appear logically as one introduces overhead. While often not noticeable in light use, it is easily measurable in benchmarks.

RAID0 configurations are rarely recommendable for real use and IMHO are mainly beneficial for testing or niche scenarios.

Since I stopped using Windows many years ago (with small exceptions), I prefer ZFS as a modern CoW filesystem. Like other CoW filesystems such as btrfs (not recommended for RAID5/6 setups), it is quite complex, and its elaborate default features (inline compression, complex caching algorithms, etc.) slow down fast NVMe drives that have very low latencies. Careful tuning is required to achieve good performance that scales linearly with the number of drives.

mdraid, by contrast, is much simpler, but in my tests a few years back it still showed a (to me meaningful) performance overhead. I don't have numbers to share and only made the comment so you watch out for it. Tested were RAID10 and RAID5 configurations.
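If anyone wants to reproduce the comparison, a roughly Postgres-shaped fio run (8 KiB random writes) against a single drive and then against the md device built from the same drives makes the gap visible. Device paths are examples, and fio writes to the raw devices here, so scratch disks only:

Code:
# Single NVMe namespace (destructive! scratch device only)
fio --name=pg-like --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=8k --iodepth=32 --numjobs=4 --runtime=60 \
    --time_based --group_reporting

# Same workload against the md array built from the same drives
fio --name=pg-like --filename=/dev/md0 --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=8k --iodepth=32 --numjobs=4 --runtime=60 \
    --time_based --group_reporting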
 

iotapi322

Member
Sep 8, 2017
I'm using the Kioxia CD6, but the right model depends on capacity and budget/speed. I picked mine up on Amazon for about $1,200 for the 15.36 TB size, and it does 6.5 GB/s.
The Micron 9300 PRO series and/or the Samsung PM1735 series are dirt cheap used... like $200 to $300.
 

Tech Junky

Active Member
Oct 26, 2023
The Micron 9300 PRO series and/or the Samsung PM1735 series are dirt cheap used... like $200 to $300.
There's a reason for that. When I stepped into using U.2 I was tempted by the Micron pricing being ~$800 for the same size. I lost the data on one within a week and on the second within 4 hours. Not to mention they run hot, like mid-70s hot.

The old adage of "you get what you pay for" comes into play. While the Kioxia runs a bit more in price, it hasn't given me any issues and runs in the 40s for temps.

Make sure those drives hit the specs you're looking for as well. Some might give lopsided performance on either reads or writes.
 

iotapi322

Member
Sep 8, 2017
There's a reason for that. When I stepped into using U.2 I was tempted by the Micron pricing being ~$800 for the same size. I lost the data on one within a week and on the second within 4 hours. Not to mention they run hot, like mid-70s hot.

The old adage of "you get what you pay for" comes into play. While the Kioxia runs a bit more in price, it hasn't given me any issues and runs in the 40s for temps.

Make sure those drives hit the specs you're looking for as well. Some might give lopsided performance on either reads or writes.
Great advice. Which models do you run? I'd prefer something near the 2 TB size.
 

Tech Junky

Active Member
Oct 26, 2023
Great advice. Which models do you run? I'd prefer something near the 2 TB size.
CD6. The pricing is more appealing at higher capacities; the 8 TB level is where you start to see better cost per TB. M.2 drives run $800 where the U.2 would be about half that. The other thing is that U.2 drives can go up to 60 TB, where the max for M.2 is 8 TB.
 

nexox

Well-Known Member
May 3, 2023
Isn't the CD6 a read-intensive SSD, with something like 30k random write IOPS? That would not be great for a database.

Edit: I see they're TLC, definitely not suited for database use.
 

Tech Junky

Active Member
Oct 26, 2023
Isn't the CD6 a read-intensive SSD, with something like 30k random write IOPS? That would not be great for a database.

Edit: I see they're TLC, definitely not suited for database use.
They have other options and great data sheets for comparing the different models. Performance also gets better with higher capacities, which is kind of a gotcha with these U.2 drives depending on the use case.
 

nexox

Well-Known Member
May 3, 2023
Performance also gets better with higher capacities, which is kind of a gotcha with these U.2 drives depending on the use case.
That's not unique to U.2 drives or NVMe; that's pretty much how all SSDs work. But TLC is still no good for database performance.
 

gb00s

Well-Known Member
Jul 25, 2018
Poland
Since I can't scale up like I can in AWS, I have to think long term, but right now even with a lot of dead space I can fit in under 1 TB, so if I have 6 TB I'm doing great.
Would 6 TB of Optane PMem in App Direct mode be an option, using 512 GB modules? You could even do a 2-node DRBD cluster for redundancy with the modules as storage. No disks etc. involved. You still go through the kernel, but it's still superior to any NVMe storage solution mentioned here. No slow ZFS pool required. Just 2x X11DPU units with 12x 512 GB modules each. Did you test PostgreSQL on local MinIO storage built from the modules? You could go with superior XFS performance as JBOD and let MinIO take care of data integrity. Performance-wise it's far superior to ZFS.

ADD: Or build distributed storage, since read ops seem to dominate. Something like MooseFS with XFS and atomic snapshots. You will feel like you're in a different world compared with ZFS performance.
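The PMem setup itself is short, roughly like this (assuming the modules are already provisioned as an interleaved App Direct region, e.g. with ipmctl; device and mount paths are examples):

Code:
# Expose the App Direct region as an fsdax namespace
ndctl create-namespace --mode=fsdax --region=region0

# XFS with DAX so Postgres I/O bypasses the page cache
mkfs.xfs -f /dev/pmem0
mount -o dax /dev/pmem0 /var/lib/postgresql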