STH AEON xmrig 1MB L3 cache Docker miner testing

Discussion in 'Cryptocurrency Mining and Markets' started by Patrick, Dec 9, 2017.

  1. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    10,964
    Likes Received:
    3,920
    There are a few architectures, such as the Intel Atom C3000 series 16-core parts, that have 1MB of cache per core. Likewise, Xeon D has 1.5MB of L3 cache per core, short of the full 2MB that the standard miner requires. For those architectures, AEON with cryptonight-lite and xmrig can be set to use less cache.

    Here is the testing container:
    Code:
    servethehome/aeon_xmrig:av1
    It is based on the private pool container, but you can use another pool. The xmrig dev fee defaults to 0%, but you can adjust it upwards as in the other image.

    I am generating some data to see how well it works. From initial testing, it looks like this will be added to the main image.

    Each entry lists the xmrig av2 number (legacy) first, then the av1 number (this image), both in H/s.

    Atom C3000
    Intel Atom C3955, 1443, 1677, +15%
    Intel Atom C3958, 1132, 1306, +15%

    Xeon D-1500
    Intel Xeon D-1587, 1120, 1700, +51% (note using 23 threads for this. 24 threads = 1600, 20 threads = 1552)
    Intel Xeon D-1540, 630, 938, +48% (note using 12 threads for this. 8 threads = 707, 11 threads = 898)

    Xeon D-2100
    Intel Xeon D-2123IT 798H/s (note using 4 threads)
    Intel Xeon D-2183IT 3301H/s (note using 16 threads)

    Skylake-SP
    2x 8180, 10400, 13100, +25% (note using 28 threads per CPU, likely more available)
    4x 8180, 21000, 26000, +23% (using 28 threads per CPU)
    2x 6152, 6895, 8803, +27% (using 22 threads per CPU)
    2x 6138, 6331, 7408, +17% (using 20 threads per CPU)
    2x 8158, 6365 (using 12 threads per CPU)
    2x 6136, 6363 (using 12 threads per CPU)
    1x Intel QLF1, 4544 (using 28 threads per CPU)
    1x 6132, 2545, 3387, +33% (using 14 threads per CPU)
    1x 8158, 3186H/s (using 12 threads per CPU)
    1x 6136, 3185H/s (using 12 threads per CPU)
    2x 4110, 2570, 2829, +10% (using 8 threads per CPU)
    1x 5119T, 2364 (using 14 threads per CPU)
    1x 6134, 2178 (using 8 threads per CPU)
    1x 4116, 2123 (using 12 threads per CPU)

    AMD EPYC
    1x AMD EPYC 7551, 6583, 6892, +4% (using 16 threads per CPU)
    1x AMD EPYC 7601, 6951, 7296, +5% (using 16 threads per CPU)

    AMD Ryzen
    AMD Ryzen 5 1400, 910, 950, +4%

    Intel Xeon E5
    Use av=2

    Intel Xeon E3
    Use av=2
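
    For anyone checking their own numbers against the list above, here is a minimal sketch of how the gain column can be computed. The function name is just for illustration, and the thread's percentages appear to be rounded down, so the decimals will not match exactly.

```python
def gain_percent(av2_hs, av1_hs):
    """Relative hashrate change when moving from av2 to av1, in percent."""
    return (av1_hs - av2_hs) / av2_hs * 100.0


# (av2 H/s, av1 H/s) pairs copied from the list above.
results = {
    "Xeon D-1587": (1120, 1700),
    "1x 6132": (2545, 3387),
    "1x EPYC 7551": (6583, 6892),
}

for name, (av2, av1) in results.items():
    print(f"{name}: {gain_percent(av2, av1):+.1f}%")
```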
     
    #1
    Last edited: Mar 21, 2018
    Klee likes this.
  2. Patrick

    Impressive results thus far.
     
    #2
  3. Marsh

    Marsh Moderator

    Joined:
    May 12, 2013
    Messages:
    1,800
    Likes Received:
    795
    Using av=1, my experience is that E5 CPU hashrate did not increase.

    Lower-end CPUs with larger L3 cache respond well to av=1 and a higher number of threads.

    I do not use xmrig auto-config; I just pass av=0 or av=1 and the number of threads on the command line, based on CPU model.
    Some CPUs behave badly with xmrig auto-config.
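
    A rough sketch of the "pass av and threads explicitly" approach, assuming a small per-model lookup table. The table, the function name, the pool address and the wallet string are all placeholders made up for illustration; the thread counts are ones reported in this thread, and `--av` / `--threads` are the xmrig options for algorithm variation and thread count.

```python
import shlex

# Hypothetical per-model settings; thread counts are ones reported in this thread.
SETTINGS = {
    "Xeon D-1541": {"av": 1, "threads": 12},
    "Xeon Gold 6132": {"av": 1, "threads": 14},
    "Xeon Silver 4110": {"av": 1, "threads": 8},
}


def xmrig_command(model, base_args):
    """Build an xmrig argv with explicit --av and --threads instead of auto-config."""
    s = SETTINGS[model]
    return ["xmrig", *base_args, f"--av={s['av']}", f"--threads={s['threads']}"]


# Placeholder pool and wallet; substitute your own.
base = ["-a", "cryptonight-lite", "-o", "pool.example.com:3333", "-u", "YOUR_WALLET"]
print(" ".join(shlex.quote(arg) for arg in xmrig_command("Xeon D-1541", base)))
```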
     
    #3
  4. nkw

    nkw Active Member

    Joined:
    Aug 28, 2017
    Messages:
    130
    Likes Received:
    44
    Here is the difference I saw on my Xeon-D systems:

    old - servethehome/aeon_xmrig:priv
    new - servethehome/aeon_xmrig:av1

    D-1537
    6 threads / old: 805
    12 threads / new: 851
    + 5.7%

    D-1521 #1
    3 threads / old: 501
    6 threads / new: 543
    + 8.4%

    D-1521 #2
    3 threads / old: 502
    6 threads / new: 544
    + 8.4%

    D-1541
    6 threads / old: 960
    12 threads / new: 1033
    + 7.6%
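
    The thread counts above line up with a simple "how many scratchpads fit in L3" estimate. This is only a sketch under assumptions not stated in the post: roughly 2MB per thread for the old av2 image and 1MB per thread for av1, and L3 sizes of 6MB (D-1521) and 12MB (D-1537, D-1541) taken from Intel's published specs.

```python
def cache_limited_threads(l3_mb, mb_per_thread, hw_threads):
    """Threads whose working set fits in L3, capped at the hardware thread count."""
    return min(int(l3_mb // mb_per_thread), hw_threads)


# Assumed per-thread cache footprint in MB (not stated in the post).
MB_PER_THREAD = {"av2": 2, "av1": 1}

# model: (L3 cache in MB, hardware threads)
cpus = {"D-1521": (6, 8), "D-1537": (12, 16), "D-1541": (12, 16)}

for name, (l3_mb, hw_threads) in cpus.items():
    old = cache_limited_threads(l3_mb, MB_PER_THREAD["av2"], hw_threads)
    new = cache_limited_threads(l3_mb, MB_PER_THREAD["av1"], hw_threads)
    print(f"{name}: av2 -> {old} threads, av1 -> {new} threads")
```

    With these assumptions the estimate gives 3 and 6 threads for the D-1521 and 6 and 12 threads for the D-1537 and D-1541, the same counts used in the runs above.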
     
    #4
    Xeppo likes this.
  5. Patrick

    Added over 10KH/s today from switching out a few nodes.

    On Skylake-SP, setting threads = physical CPU cores seems to give a big boost and actually lowers power. The larger L2 cache sizes do well here.

    For the C3000 series, using $(nproc) with AV1 is better due to the 1MB of L2 per core. It needs roughly twice as many threads as AV2.

    For Xeon D, AV1 is better.

    Working on Xeon E5 and EPYC next.
     
    #5
    nkw likes this.
  6. Patrick

    Using the 1P 6132 as a test case here:

    av=2, 14 threads = 2545
    av=1, 14 threads = 3387
    av=1, 15 threads = 3059
    av=1, 16 threads = 2974
    av=1, 17 threads = 3036
    av=1, 18 threads = 3165
    av=1, 19 threads = 3139
    av=1, 20 threads = 3165
    av=1, 21 threads = 3158
    av=1, 22 threads = 3173
    av=1, 23 threads = 3185
    av=1, 24 threads = 3218
    av=1, 25 threads = 3220
    av=1, 26 threads = 3150
    av=1, 27 threads = 3102
    av=1, 28 threads = 2940
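
    A small sketch for picking the winner out of a sweep like the one above. The dictionary is simply the av=1 rows copied from this post; the variable names are illustrative.

```python
# av=1 sweep on the 1P 6132: thread count -> H/s, copied from the list above.
sweep = {
    14: 3387, 15: 3059, 16: 2974, 17: 3036, 18: 3165,
    19: 3139, 20: 3165, 21: 3158, 22: 3173, 23: 3185,
    24: 3218, 25: 3220, 26: 3150, 27: 3102, 28: 2940,
}

PHYSICAL_CORES = 14

# Best overall setting.
best = max(sweep, key=sweep.get)

# Best setting that uses more threads than there are physical cores.
best_oversubscribed = max((t for t in sweep if t > PHYSICAL_CORES), key=sweep.get)

print(f"best: {best} threads = {sweep[best]} H/s")
print(f"best above core count: {best_oversubscribed} threads = {sweep[best_oversubscribed]} H/s")
```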

    Staying in L2 on Skylake-SP seems to have a huge impact. So with av=1 and Skylake-SP, the trick is to make a docker container, use number of threads = number of cores, and constrain the miner to physical cores.
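
    One possible way to work out "one logical CPU per physical core" on Linux is to read the sibling lists in sysfs and keep the first CPU of each pair. The sketch below assumes that approach; the helper names are made up, and the docker command only uses the `--cpuset-cpus` and `-e numthreads=` settings mentioned in this thread, so any pool or wallet settings the image needs are left out.

```python
import glob
import re


def read_thread_siblings():
    """Read each logical CPU's sibling list from Linux sysfs."""
    siblings = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"/cpu(\d+)/", path).group(1))
        with open(path) as f:
            siblings[cpu] = f.read().strip()
    return siblings


def one_cpu_per_core(siblings):
    """Keep the lowest-numbered logical CPU of every physical core."""
    firsts = set()
    for text in siblings.values():
        # Lists look like "0,14" or "0-1"; the first number is the lowest id.
        firsts.add(int(re.split(r"[,-]", text)[0]))
    return sorted(firsts)


def docker_command(cores, image="servethehome/aeon_xmrig:av1"):
    """Build a docker run argv pinned to the given logical CPUs."""
    cpuset = ",".join(str(c) for c in cores)
    return ["docker", "run", "-d", f"--cpuset-cpus={cpuset}",
            "-e", f"numthreads={len(cores)}", image]


# Sample topology for a 4-core / 8-thread CPU.
# On a real host, use read_thread_siblings() instead.
sample = {0: "0,4", 1: "1,5", 2: "2,6", 3: "3,7",
          4: "0,4", 5: "1,5", 6: "2,6", 7: "3,7"}
cores = one_cpu_per_core(sample)
print(" ".join(docker_command(cores)))
```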
     
    #6
  7. nkw

    I did notice that the change to AV1 on the Xeon Ds raised the CPU utilization % (as reported by Proxmox, which I think is load average divided by hyper-threaded cores) from about 50% to 75%. Power consumption went from 139 W to 148 W (+ 6.4%) on the D-1537, 75 W to 78 W (+ 4%) on one D-1521, and 58 W to 56 W on the other D-1521 (?). These are for the whole systems, so the absolute values aren't meaningful to compare between them, but the only change made to them was the switch to the AV1 image.

    On the Xeon Silver 4110, using the standard image with numthreads=8 (# of physical cores) gave the best result for me which runs at 50% CPU utilization (so 8.0 load average). These are getting ~1285 H/s for me. EDIT: Switching to the AV1 image increased the 4110 to ~1412H/s (+ 9.8%).

    Last week I enabled huge pages which gave a +10-15% boost across the board on my systems. This may be common knowledge but I didn't realize it gave such a big boost.
     
    #7
    Last edited: Dec 9, 2017
  8. Patrick

    hugepages is a big boost!

    EPYC looks less impacted by the change, at 4.5-5%.

    If you are using EPYC, set -e numthreads=X to the number of threads on one die. For --cpuset-cpus=, use the CPU list that lscpu shows for each NUMA node:

    NUMA node0 CPU(s): 0-7,32-39
    NUMA node1 CPU(s): 8-15,40-47
    NUMA node2 CPU(s): 16-23,48-55
    NUMA node3 CPU(s): 24-31,56-63
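
    As a sketch of how those lscpu lines map to container settings, the snippet below parses the NUMA lines and prints a `--cpuset-cpus` value and a matching `numthreads` count per node. The parsing helpers are illustrative, not part of the image, and the sample text is the output quoted above.

```python
import re

LSCPU = """\
NUMA node0 CPU(s): 0-7,32-39
NUMA node1 CPU(s): 8-15,40-47
NUMA node2 CPU(s): 16-23,48-55
NUMA node3 CPU(s): 24-31,56-63
"""


def numa_nodes(lscpu_text):
    """Map NUMA node number -> CPU list string as printed by lscpu."""
    nodes = {}
    for m in re.finditer(r"^NUMA node(\d+) CPU\(s\):\s*(\S+)", lscpu_text, re.MULTILINE):
        nodes[int(m.group(1))] = m.group(2)
    return nodes


def count_cpus(cpu_list):
    """Count logical CPUs in a list such as '0-7,32-39'."""
    total = 0
    for part in cpu_list.split(","):
        lo, _, hi = part.partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total


for node, cpus in numa_nodes(LSCPU).items():
    print(f"node{node}: --cpuset-cpus={cpus} -e numthreads={count_cpus(cpus)}")
```

    Each node here holds 16 logical CPUs, so one container per node would get numthreads=16.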

    2P EPYC 7601 power went from 381 to 393 so about 3% more.

    Power on Skylake-SP is down and hashes are up by not going out to the L3 cache. On the 4P 8180, that is 23% more hashing at 10% lower power consumption.
     
    #8
    Last edited: Dec 9, 2017
  9. Patrick

    I have been trying to get the Xeon E5 series to do more with av=1 but it is not happening.

    My conjecture is that if you are using L3 cache, and have more than 2MB L3 cache per core, you require more threads than you would have on your CPU.

    For now, E5 = use the original av=2 image.
     
    #9
    gigatexal, archangel.dmitry and Xeppo like this.
  10. Marsh

    Same observation for me. E5 is av=2
     
    #10
    Patrick likes this.
  11. dwright1542

    dwright1542 Active Member

    Joined:
    Dec 26, 2015
    Messages:
    319
    Likes Received:
    67
    Opterons are great examples of that.
     
    #11
    Patrick likes this.
  12. nkw

    If it helps anyone out, the top couple of hits when you google "hugepages debian mining" or "hugepages xmrig" lead you to pages that tell you to enable hugepages with either:

    echo 128 > /proc/sys/vm/nr_hugepages
    or
    sysctl -w vm.nr_hugepages=128
    or some pages say both, however the commands do the same thing.

    This will work, but will not persist across reboots. You need to add the line
    vm.nr_hugepages=128
    to /etc/sysctl.conf for the setting to persist.

    Again, might be something everybody knows but I had a "huh?" moment after I upgraded/rebooted all my servers yesterday and my total hashrate was down 25%.
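
    To avoid the same surprise after a reboot, a quick check along these lines could confirm that the setting is actually in the config file. This is a minimal sketch; the function name is made up, and it only looks at text in /etc/sysctl.conf format, not at drop-in files.

```python
def persistent_hugepages(sysctl_conf_text):
    """Return the last vm.nr_hugepages value set in sysctl.conf-style text, or None."""
    value = None
    for line in sysctl_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.nr_hugepages":
            value = int(val.strip())
    return value


sample_conf = """\
# Example /etc/sysctl.conf contents
#vm.nr_hugepages=64
vm.swappiness = 10
vm.nr_hugepages=128
"""

print(persistent_hugepages(sample_conf))
```

    On a real system you would pass in the contents of /etc/sysctl.conf and compare the result with the live value in /proc/sys/vm/nr_hugepages.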
     
    #12
    pgh5278 likes this.
  13. Marsh

    Add these two lines to /etc/security/limits.conf for miners as well:
    * soft memlock 626688
    * hard memlock 626688
     
    #13
    nkw likes this.
  14. Patrick

    @Marsh have you seen a big jump with that? I still have not seen a meaningful improvement. Sub 0.5%.

    @nkw if you want to do it once:
    Code:
    sudo sysctl vm.nr_hugepages=128
    That will not persist.
     
    #14
    Xeppo likes this.
  15. RBE

    RBE Member

    Joined:
    Sep 5, 2017
    Messages:
    60
    Likes Received:
    34
    @Marsh The values used for the hard and soft limits should match both the huge page size and the number of huge pages allocated. Use the following command to determine the huge page size:

    Code:
    $ grep Hugepagesize /proc/meminfo
    In my case this shows that the Hugepagesize is set to 2048kB.

    The next step is to edit /etc/sysctl.conf, adding the following to the bottom of the file:

    Code:
    vm.nr_hugepages=128
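    The soft and hard limits in /etc/security/limits.conf therefore need to be set to 128 x 2048, or 262144. Note that memlock limits are specified in kB, the same unit /proc/meminfo uses for Hugepagesize, not in bytes: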

    Code:
    * soft memlock 262144
    * hard memlock 262144
    Having done all this, all that is needed is a reboot and huge pages should be enabled.
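
    The arithmetic above can be sketched as follows, assuming you feed in the text of /proc/meminfo and the number of huge pages you configured. The helper names are illustrative only.

```python
import re


def hugepage_size_kb(meminfo_text):
    """Extract the Hugepagesize value (in kB) from /proc/meminfo-style text."""
    m = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("Hugepagesize not found")
    return int(m.group(1))


def memlock_lines(nr_hugepages, page_kb):
    """Build limits.conf lines sized to cover all allocated huge pages (value in kB)."""
    limit_kb = nr_hugepages * page_kb
    return [f"* soft memlock {limit_kb}", f"* hard memlock {limit_kb}"]


# Sample text; on a real host read /proc/meminfo instead.
sample_meminfo = """\
HugePages_Total:     128
HugePages_Free:      128
Hugepagesize:       2048 kB
"""

for line in memlock_lines(128, hugepage_size_kb(sample_meminfo)):
    print(line)
```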
     
    #15
    archangel.dmitry likes this.
  16. archangel.dmitry

    archangel.dmitry Active Member

    Joined:
    Sep 11, 2015
    Messages:
    222
    Likes Received:
    38
    Thanks! I was trying to figure out how to do it.
     
    #16
  17. Xeppo

    Xeppo New Member

    Joined:
    Nov 13, 2013
    Messages:
    26
    Likes Received:
    9
    Thanks guys! Huge increases with these new optimizations - I just added 40KH/s to bring me to 220.95 KH/s
     
    #17
    Patrick likes this.
  18. Klee

    Klee Well-Known Member

    Joined:
    Jun 2, 2016
    Messages:
    1,063
    Likes Received:
    325
    I don't want to sound like I'm tooting my own horn, well maybe I am LOL, but I covered this a while back in my dedicated mining open compute server thread.

    I used "* soft memlock unlimited" and "* hard memlock unlimited" on all servers and my desktop. It might not be the best option in all cases, but for what I am using those servers for (mining) and for my main desktop, it works fine.

    Also, xmrig seems to run a little faster on Ubuntu 17.10 server than on older Ubuntu releases, probably because of the newer gcc version.

    The major disadvantage of running a version of Ubuntu newer than 16.04 is some issues running some coins' core programs and wallets.

    17.10 seems to be faster with all miners I have tried.

    So on my main desktop I run Ubuntu 16.04.03, and on my dedicated miners I run three with 17.04 and four with 17.10. I have not got around to updating the two 17.04's to 17.10.

    Also one Windows 7 PC running my GTX 1060 P106 dedicated mining GPUs to test them. I still need risers and need to decide on a power supply to run all 6 of the mining GPUs in my mining case.
     
    #18
  19. neggles

    neggles Member

    Joined:
    Sep 2, 2017
    Messages:
    33
    Likes Received:
    2
    This runs about twice as fast on my E5-1620V2 as the AV2 image. Getting a solid 1060H/sec (with hugepages enabled) where I was getting about 560 with AV2 - Awesome! This box is otherwise sitting fairly idle, so might as well!
     
    #19
  20. Patrick

    Tagging @lostmind here since you may want to use this one.
     
    #20
    lostmind likes this.