STH AEON xmrig 1MB L3 cache Docker miner testing


Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
There are a few architectures, such as the 16-core Intel Atom C3000 series parts, that have only 1MB of cache per core. Likewise, Xeon D has 1.5MB of L3 cache per core, short of the full 2MB that the standard miner requires. For those architectures, AEON with cryptonight-lite and xmrig can be set to use less cache.

Here is the testing container:
Code:
servethehome/aeon_xmrig:av1
It is based on the private pool container, but you can point it at another pool. The xmrig dev fee defaults to 0%, but you can adjust it upwards as in the other image.
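A minimal launch sketch, assuming the image accepts the same -e numthreads= override discussed later in the thread (the pool/wallet wiring follows the private pool container, so it is omitted here, and the thread count shown is purely illustrative):

```shell
# Sketch only: numthreads is the per-container miner thread count;
# 12 is an illustrative value, not a recommendation for your CPU.
docker run -d --restart=always \
  -e numthreads=12 \
  servethehome/aeon_xmrig:av1
```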

Generating some data to see how well it works. From initial testing, it looks like this will be added to the main image.

The first figure is the legacy xmrig av=2 number, the second is av=1 (this image):

Atom C3000
Intel Atom C3955, 1443, 1677, +15%
Intel Atom C3958, 1132, 1306, +15%

Xeon D-1500
Intel Xeon D-1587, 1120, 1700, +51% (using 23 threads; 24 threads = 1600, 20 threads = 1552)
Intel Xeon D-1540, 630, 938, +48% (using 12 threads; 8 threads = 707, 11 threads = 898)

Xeon D-2100
Intel Xeon D-2123IT, 798 H/s (using 4 threads)
Intel Xeon D-2183IT, 3301 H/s (using 16 threads)

Skylake-SP
2x 8180, 10400, 13100, +25% (note using 28 threads per CPU, likely more available)
4x 8180, 21000, 26000, +23% (using 28 threads per CPU)
2x 6152, 6895, 8803, +27% (using 22 threads per CPU)
2x 6138, 6331, 7408, +17% (using 20 threads per CPU)
2x 8158, 6365 (using 12 threads per CPU)
2x 6136, 6363 (using 12 threads per CPU)
1x Intel QLF1, 4544 (using 28 threads per CPU)
1x 6132, 2545, 3387, +33% (using 14 threads per CPU)
1x 8158, 3186H/s (using 12 threads per CPU)
1x 6136, 3185H/s (using 12 threads per CPU)
2x 4110, 2570, 2829, +10% (using 8 threads per CPU)
1x 5119T, 2364 (using 14 threads per CPU)
1x 6134, 2178 (using 8 threads per CPU)
1x 4116, 2123 (using 12 threads per CPU)

AMD EPYC
1x AMD EPYC 7551, 6583, 6892, +4% (using 16 threads per CPU)
1x AMD EPYC 7601, 6951, 7296, +5% (using 16 threads per CPU)

AMD Ryzen
AMD Ryzen 5 1400, 910, 950, +4%

Intel Xeon E5
Use av=2

Intel Xeon E3
Use av=2
 

Marsh

Moderator
May 12, 2013
2,642
1,496
113
Using av=1, my experience is that E5 CPU hashrate did not increase.

Lower-end CPUs with larger L3 caches respond well to av=1 and a higher number of threads.

I do not use the xmrig auto-config; I just pass av=0 or av=1 and the number of threads on the command line based on the CPU model. Some CPUs behave badly with the xmrig auto-config.
 

nkw

Active Member
Aug 28, 2017
136
48
28
Here is the difference I saw on my Xeon-D systems:

old - servethehome/aeon_xmrig:priv
new - servethehome/aeon_xmrig:av1

D-1537
6 threads / old: 805
12 threads / new: 851
+ 5.7%

D-1521 #1
3 threads / old: 501
6 threads / new: 543
+ 8.4%

D-1521 #2
3 threads / old: 502
6 threads / new: 544
+ 8.4%

D-1541
6 threads / old: 960
12 threads / new: 1033
+ 7.6%
 

Patrick
Added over 10KH/s today from switching out a few nodes.

Seems like on Skylake-SP, setting threads = physical CPU cores gives a big boost and actually lowers power. The large 1MB per-core L2 cache does well here.

For the C3000 series, using $(nproc) threads with AV1 is better due to the 1MB L2 per core. You need roughly twice as many threads as with AV2.

For Xeon D, AV1 is better.

Working on Xeon E5 and EPYC next.
 

Patrick
Using the 1P 6132 as a test case here:

av=2, 14 threads = 2545
av=1, 14 threads = 3387
av=1, 15 threads = 3059
av=1, 16 threads = 2974
av=1, 17 threads = 3036
av=1, 18 threads = 3165
av=1, 19 threads = 3139
av=1, 20 threads = 3165
av=1, 21 threads = 3158
av=1, 22 threads = 3173
av=1, 23 threads = 3185
av=1, 24 threads = 3218
av=1, 25 threads = 3220
av=1, 26 threads = 3150
av=1, 27 threads = 3102
av=1, 28 threads = 2940
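Reducing the sweep above to its best point with a quick shell pass over the measured numbers (all values are the threads:H/s pairs from this post):

```shell
# Find the thread count with the highest measured hashrate (threads:H/s pairs from the sweep above).
best=0; best_threads=0
for pair in 14:3387 15:3059 16:2974 17:3036 18:3165 19:3139 20:3165 \
            21:3158 22:3173 23:3185 24:3218 25:3220 26:3150 27:3102 28:2940; do
  t=${pair%%:*}; h=${pair##*:}
  if [ "$h" -gt "$best" ]; then best=$h; best_threads=$t; fi
done
echo "${best_threads} threads -> ${best} H/s"
# -> 14 threads -> 3387 H/s
```

14 threads, i.e. one per physical core, beats every higher thread count on this 14-core chip.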

It seems that staying within the per-core L2 cache has a huge impact on Skylake-SP. So with av=1 on Skylake-SP, the trick is to make a Docker container, set the number of threads equal to the number of physical cores, and constrain the miner to the physical cores.
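As a sketch of that recipe on a single 14-core chip, assuming logical CPUs 0-13 map to the physical cores (the mapping varies by system, so verify it with lscpu -e before pinning):

```shell
# Pin the container to the 14 physical cores and run one miner thread per core.
# The 0-13 range is an assumption; check your core/sibling layout with: lscpu -e
docker run -d --cpuset-cpus=0-13 -e numthreads=14 servethehome/aeon_xmrig:av1
```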
 

nkw
I did notice that the change to AV1 on the Xeon Ds raised the CPU utilization (as reported by Proxmox, which I think is load average divided by hyper-threaded cores) from about 50% to 75%. Power consumption went from 139 W to 148 W (+6.4%) on the D-1537, from 75 W to 78 W (+4%) on one D-1521, and from 58 W to 56 W on the other D-1521 (?). These are whole-system numbers, so the absolute values are not comparable between machines, but the only change made was the switch to the AV1 image.

On the Xeon Silver 4110, the standard image with numthreads=8 (the number of physical cores) gave the best result for me, running at 50% CPU utilization (an 8.0 load average) and getting ~1285 H/s. EDIT: Switching to the AV1 image increased the 4110 to ~1412 H/s (+9.8%).

Last week I enabled huge pages which gave a +10-15% boost across the board on my systems. This may be common knowledge but I didn't realize it gave such a big boost.
 

Patrick
hugepages is a big boost!

EPYC looks less impacted by the change, around 4.5-5%.

If you are using EPYC, use the number of threads on the die for -e numthreads=X. For --cpuset-cpus= you can use the ranges lscpu reports for each NUMA node:

NUMA node0 CPU(s): 0-7,32-39
NUMA node1 CPU(s): 8-15,40-47
NUMA node2 CPU(s): 16-23,48-55
NUMA node3 CPU(s): 24-31,56-63
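One way to apply that, sketched for the layout above: one container per NUMA node, with the cpuset ranges taken straight from the lscpu output and 16 threads per die (the thread count the EPYC results above used):

```shell
# One miner container per NUMA node. Each range pairs a node's physical
# cores with their SMT siblings, exactly as lscpu reports them above.
for cpus in 0-7,32-39 8-15,40-47 16-23,48-55 24-31,56-63; do
  docker run -d --cpuset-cpus="$cpus" -e numthreads=16 servethehome/aeon_xmrig:av1
done
```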

2P EPYC 7601 power went from 381 W to 393 W, so about 3% more.

Power on Skylake-SP is down and hashes are up from staying within the L2 cache. On the 4P 8180: 23% more hashing at 10% lower power consumption.
 

Patrick
I have been trying to get the Xeon E5 series to do more with av=1 but it is not happening.

My conjecture is that if the scratchpad lives in L3 cache and you have more than 2MB of L3 per core, you would need more threads than the CPU actually has to keep that cache occupied.

For now, E5 = use the original av=2 image.
 

dwright1542

Active Member
Dec 26, 2015
377
73
28
50
My conjecture is that if the scratchpad lives in L3 cache and you have more than 2MB of L3 per core, you would need more threads than the CPU actually has to keep that cache occupied.
Opterons are great examples of that.
 

nkw
hugepages is a big boost!
If it helps anyone out, the top couple of hits when you google "hugepages debian mining" or "hugepages xmrig" lead you to pages that tell you to enable hugepages with either:

echo 128 > /proc/sys/vm/nr_hugepages
or
sysctl -w vm.nr_hugepages=128
or some pages say both; however, the two commands do the same thing.

This will work, but will not persist across reboots. You need to add the line
vm.nr_hugepages=128
to /etc/sysctl.conf for the setting to persist.

Again, might be something everybody knows but I had a "huh?" moment after I upgraded/rebooted all my servers yesterday and my total hashrate was down 25%.
 

Marsh
Add these two lines for miners as well, in /etc/security/limits.conf:
* soft memlock 626688
* hard memlock 626688
 

Patrick
@Marsh have you seen a big jump with that? I still have not seen a meaningful improvement. Sub 0.5%.

@nkw if you want to do it once:
Code:
sudo sysctl vm.nr_hugepages=128
That will not persist.
 

RBE

Member
Sep 5, 2017
60
34
18
@Marsh The values used for the hard and soft limits should match both the huge page size and the number of huge pages allocated. Use the following command to determine the huge page size:

Code:
$ grep Hugepagesize /proc/meminfo
In my case this shows that the Hugepagesize is set to 2048kB.

The next step is to edit /etc/sysctl.conf, adding the following to the bottom of the file:

Code:
vm.nr_hugepages=128
The soft and hard limits in /etc/security/limits.conf therefore need to be set to 128 x 2048 = 262144 (limits.conf memlock values are in kB, the same unit as the huge page size):

Code:
* soft memlock 262144
* hard memlock 262144
Having done all this, all that is needed is a reboot and huge pages should be enabled.
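The arithmetic above, written out generically; the 2048 kB page size is the value from my system, so substitute whatever your /proc/meminfo reports:

```shell
# memlock limit (kB) = number of huge pages x huge page size in kB.
# 2048 kB is typical on x86 Linux; confirm with: grep Hugepagesize /proc/meminfo
nr_hugepages=128
hugepagesize_kb=2048
echo $((nr_hugepages * hugepagesize_kb))
# -> 262144
```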
 

Xeppo

New Member
Nov 13, 2013
28
11
3
Thanks guys! Huge increases with these new optimizations - I just added 40 KH/s to bring me to 220.95 KH/s.
 

Klee

Well-Known Member
Jun 2, 2016
1,289
396
83
I don't want to sound like I'm tooting my own horn (well, maybe I am, LOL), but I covered this a while back in my dedicated mining open compute server thread.

I used "* soft memlock unlimited" and "* hard memlock unlimited" on all servers and my desktop. That might not be the best option in all cases, but for what I am using those machines for, mining and my main desktop, it works fine.

Also, xmrig seems to run a little faster on Ubuntu 17.10 server than on older Ubuntu releases, probably because of the newer gcc version. 17.10 seems to be faster with every miner I have tried.

The major disadvantage of running a version of Ubuntu newer than 16.04 is compatibility issues with some coins' core programs and wallets.

So my main desktop runs Ubuntu 16.04.3, and of my dedicated miners, three run 17.04 and four run 17.10. I have not gotten around to updating the two 17.04 machines to 17.10.

There is also one Windows 7 PC running my GTX 1060 P106 dedicated mining GPUs to test them; I still need risers, and to decide on a power supply, to run all six of the mining GPUs in my mining case.
 

neggles

is 34 Xeons too many?
Sep 2, 2017
62
37
18
Melbourne, AU
omnom.net
This runs about twice as fast on my E5-1620 V2 as the AV2 image: a solid 1060 H/s (with hugepages enabled) where I was getting about 560 with AV2. Awesome! This box is otherwise sitting fairly idle, so might as well!