Hi
I've done some more testing on Burstcoin plotting.
The interesting bit this time around is Vega plotting performance (TL;DR -- it's quite good)
Some general results from previous testing:
==========
xplotter on dual e5-2660v1 -- 25k - 30k nonce / minute cpu plotting
Pros:
Creates optimized plots on the first pass without much ram
Easy -- No messing with GPU settings, or trading off between i/o and ram
CPUs are easy to get -- Any good GPUs are mostly sold out or overpriced these days
Cons:
Not super fast
==========
gpuPlotGenerator -- GTX 1070 on dual e5-2660v1 -- 45k nonce / minute (1 plotting process and 1 GPU), or 60-70k nonce / minute (1 plotting process with 2 GPUs), or 90k nonce / minute (2 plotting processes and 2 GPUs)
Pros:
More power efficient than CPU plotting
Much faster per-server, especially when using two GPUs
GTX 1070 is not normally obscenely expensive (but all GPUs are right now)
Cons:
Need to trade off between using large amounts of ram, or creating un-optimized plots
Single-process plotting speed is limited by single threaded CPU performance -- plotter seems to only be able to use 2-3 CPU cores.
Need to carefully select settings to keep the GPU busy / plot at maximum speed
GTX 1070 hard to buy right now
----------
GTX 1070 tested settings:
+100mhz core / +400mhz memory / 100% power limit / 100% fan speed
Good devices.txt settings:
Single process: (not recommended with more than 1 GPU)
Recommended devices.txt:
1 0 1024 16 8192
expect around 40-45k nonces / minute with 1 GPU
Two processes:
1 0 960 8 8192
1 1 960 8 8192
Tested at 73k nonce / minute with 2 GTX 1070's, 48k nonce / minute with 1 GPU
Tested commands (one per instance):
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_0_65536_4096
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_100000_65536_4096
Four processes: (not recommended with less than 2 GPUs)
1 0 768 6 8192
1 1 768 6 8192
Tested at 78k nonce / minute with 2 GTX 1070's
Tested commands (one per instance):
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_0_65536_4096
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_100000_65536_4096
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_200000_65536_4096
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_300000_65536_4096
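For reference, the plot filenames above follow the usual Burst convention of <numeric account ID>_<start nonce>_<nonce count>_<stagger>, and the start nonces just need to be spaced far enough apart that the files don't overlap. A small sketch (placeholder path and address) that generates non-overlapping commands like the ones above:

```python
# Sketch: build gpuPlotGenerator commands for N parallel instances with
# non-overlapping nonce ranges. Filename format:
#   <numeric account ID>_<start nonce>_<nonce count>_<stagger>
# ADDRESS and the e:\plots path are placeholders, as in the post.
ADDRESS = "numeric-burst-address"
NONCES = 65536    # nonces per plot file
STAGGER = 4096    # stagger size
SPACING = 100000  # gap between start nonces (must be >= NONCES)

def commands(n_instances, start=0):
    return [
        f"gpuPlotGenerator generate buffer "
        f"e:\\plots\\{ADDRESS}_{start + i * SPACING}_{NONCES}_{STAGGER}"
        for i in range(n_instances)
    ]

for cmd in commands(4):
    print(cmd)
```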
----------
gpuPlotGenerator Settings I used for actual plotting with 2x GTX 1070:
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_0_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_715296_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_1430592_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_2145888_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_2861184_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_3576480_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_4291776_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_5007072_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_5722368_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_6437664_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_7152960_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_7868256_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_8583552_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_9298848_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_10014144_715296_715296
gpuPlotGenerator generate direct d:\plots\numeric-burst-address_10729440_715296_715296
Each simultaneously running copy with the above settings requires 175GB of available RAM. You can run 2 copies at once on a server with 24 x 16GB DIMMs.
The above creates 16 x 175GB optimized plot files, which fills a 3TB HGST drive with a couple hundred megs free.
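Those RAM and drive-fill figures check out from the nonce size alone: a Burst nonce is 256 KiB (4096 scoops x 64 bytes), and with stagger equal to the full nonce count, direct mode holds an entire plot in RAM. A quick arithmetic check:

```python
# Sanity check on the direct-mode figures: 715296-nonce plots with
# stagger == nonce count, so RAM per process ~= one whole plot file.
NONCE_BYTES = 4096 * 64   # 262144 bytes = 256 KiB per nonce
nonces = 715296

plot_bytes = nonces * NONCE_BYTES
print(f"plot size / RAM per process: {plot_bytes / 2**30:.1f} GiB")  # ~174.6
print(f"16 plots: {16 * plot_bytes / 1e12:.3f} TB")  # ~a full 3TB drive
```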
==========
Vega 64 results:
Same base hardware (384GB RAM / dual e5-2660v1 / 1.6TB Intel NVMe SSD (DC P3605))
GTX 1060 for display only -- Vega for plotting.
Vega has powerplay tables set up
Vega wattman settings:
Core State 6:
950mv / 1402mhz
Core State 7:
1025mv / 1602mhz
HBM
1000mv / 1100mhz
Fan:
max (4600rpm)
Temps:
43c at peak GPU load
First test:
Running two instances at once, creating 500GB plot files
Devices.txt settings:
1 0 4024 34 8048
Instance 1:
gpuPlotGenerator generate buffer d:\plots\numeric-burst-address_137000000000_1907376_8048
Instance 2:
gpuPlotGenerator generate buffer d:\plots\numeric-burst-address_138000000000_1907376_8048
Requires about 2GB of RAM per process -- much easier requirements there.
Need to create 6 plots to fill a 3TB HGST drive with < 1GB free space
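Both of those numbers follow from the 256 KiB nonce size: buffer mode only holds roughly one stagger's worth of nonces in RAM rather than the whole plot, and each 1907376-nonce file comes out at about 500GB. A quick check:

```python
# Buffer-mode RAM is roughly one stagger of 256 KiB nonces, not the
# whole plot -- hence ~2GB per process with an 8048 stagger.
NONCE_BYTES = 4096 * 64  # 256 KiB per nonce
stagger = 8048
print(f"stagger buffer: {stagger * NONCE_BYTES / 2**30:.2f} GiB")  # ~1.96

# And the 6-plots-per-3TB figure: each 1907376-nonce plot is ~500GB.
plot = 1907376 * NONCE_BYTES
print(f"plot: {plot / 1e9:.1f} GB, 6 plots: {6 * plot / 1e12:.3f} TB")
```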
Results:
starts very fast -- 50k nonce / minute for each process, 100k nonce / minute total -- 43c gpu, always active
starts to slow down noticeably 1/3 of the way in -- by 33%, the average speed is 46k nonce / minute each, 92k nonce / minute total -- 39c gpu, not always active
disk i/o is fairly consistent -- writing happens most of the time at a good speed -- up to 500MB/s -- PCIE SSD needed to maintain performance
cpu use seems consistently "high" at 12-18% -- presumably CPU becomes a bottleneck as the process continues, as gpuPlotGenerator is not well threaded
by 54% complete, average speeds have dropped to 43k nonce / minute per process -- 86k nonce/minute total -- and continuing to fall.
At this point I cancelled plotting with these settings.
----------
I tried out some other settings with smaller plot files and more simultaneous processes. Performance was terrible, and the system crashed at one point while running 3 processes.
I settled on creating smaller plot files, and only running 2 instances at one time. Also, creating 2x 500GB plot files at once, temporarily buffered onto a 1.6TB SSD, doesn't leave enough free space to start creating the second batch of files while the first batch is being moved from SSD to HDD. So smaller plot files were needed anyway.
I did have better results with smaller plot files:
Instance 1:
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_137000000000_635792_8048
Instance 2:
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_138000000000_635792_8048
kept the same devices.txt settings:
1 0 4024 34 8048
Plot Size: 155.22 GiB / 166.67 GB
Plots per drive:
1tb: 6
2tb: 12
3tb: 18
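The plot-size and plots-per-drive figures above fall straight out of the 256 KiB nonce size; note that drives usually ship with slightly more than their nominal decimal capacity, which is why 6 / 12 / 18 plots just squeeze on:

```python
# Check the plot-size figures: 635792 nonces at 256 KiB per nonce.
NONCE_BYTES = 4096 * 64
plot = 635792 * NONCE_BYTES
print(f"{plot / 2**30:.2f} GiB / {plot / 1e9:.2f} GB")  # 155.22 GiB / 166.67 GB

# 6 / 12 / 18 plots come in at almost exactly 1 / 2 / 3 decimal TB:
for n, tb in ((6, 1), (12, 2), (18, 3)):
    print(f"{tb}tb: {n} plots = {n * plot / 1e12:.4f} TB")
```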
About halfway done, each process had averaged 48,500 nonce / minute (97k nonce / minute total)
As before, it started out at around 100k total -- 48,500 nonce / minute is not a huge dropoff by the 50% mark
Final results:
instance 1:
635,792 nonces, 46,351 nonces/minute -- 13m 43s
instance 2:
635,792 nonces, 46,351 nonces/minute -- 13m 43s
total:
1,271,584 nonces, 92,702 nonces/minute -- 13m 43s
317,896 MiB in 13m 43s
386 MiB / second
1,357 GiB / hour
31.8 TiB / day
12x 3TB drives would be 32.6 TiB -- assuming no slowdowns, just over 24 hours to plot 12 drives with 1 Vega GPU -- Not Bad!
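The throughput arithmetic in the final results can be reproduced from the nonce count and the 256 KiB nonce size (the GiB/hour figure lands within 1 of the post's value depending on where you round):

```python
# Reproduce the final-results throughput arithmetic.
NONCE_BYTES = 4096 * 64             # 256 KiB per nonce
nonces = 2 * 635792                 # both instances combined
seconds = 13 * 60 + 43              # 13m 43s

mib = nonces * NONCE_BYTES / 2**20  # 317,896 MiB
rate = mib / seconds
print(f"{mib:,.0f} MiB in {seconds} s")
print(f"{rate:.0f} MiB / second")               # 386
print(f"{rate * 3600 / 1024:,.0f} GiB / hour")
print(f"{rate * 86400 / 2**20:.1f} TiB / day")  # 31.8
```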
With the above settings (8048 stagger, 635792 nonces per plot) you will have very un-optimized plots, and you'll want to run a plot optimizer afterwards. Or, if you have 300+GB of RAM to play with, try these settings instead:
Instance 1:
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_137000000000_635792_635792
Instance 2:
gpuPlotGenerator generate buffer e:\plots\numeric-burst-address_138000000000_635792_635792
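That "300+GB" figure follows from stagger being equal to the full nonce count, so each buffer-mode process holds an entire plot's worth of 256 KiB nonces in RAM:

```python
# Rough check of the RAM needed for stagger == nonce count.
NONCE_BYTES = 4096 * 64
per_process = 635792 * NONCE_BYTES
print(f"per process: {per_process / 2**30:.0f} GiB")     # 155
print(f"two instances: {2 * per_process / 1e9:.0f} GB")  # 333
```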
I did notice that buffer mode was quite a bit faster overall: every time it finishes computing one stagger, it writes that data to the drive in the background and keeps calculating. With direct mode, calculating one huge stagger in one go, there is a long delay at the end of the process while all of the data is written to disk, during which no nonces are computed for several minutes. So even with loads of RAM in this server, I'm inclined to do buffer + optimize rather than direct + move to HDD.