Help! Abysmal Hashrate on Epyc 7351P


x3haloed

New Member
Mar 25, 2018
Hi,

I decided to buy an Epyc 7351P for CryptoNight mining based on Patrick's article comparing its performance with the Threadripper 1950X: https://www.servethehome.com/amd-epyc-obliterates-threadripper-monero-cryptonight-mining/

I have it up and running now, and I'm only getting up to 160 H/s.

I've tried everything I can think of or find online, and I'm not able to improve the hashrate. I do have a Threadripper 1950x up and running, and it's performing much closer to expectations.

Does anyone with experience have any advice?

My system:
AMD Epyc 7351P
Supermicro H11SSL-i
1 x 16 GB DDR4 RDIMM
Windows 10 Pro 64-bit
XMR-Stak 2.2.0 (all-in-one)

My config:
Code:
"cpu_threads_conf" :
[
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 0 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 2 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 4 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 6 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 8 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 10 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 12 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 14 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 16 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 18 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 20 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 22 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 24 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 26 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 28 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 30 },
],
What I've tried so far:
  • Large Page Support / Lock Pages is set up correctly
  • xmr-stak is running as administrator
  • I tried mining on Debian 9 before I tried Windows, and I was getting very similar results
  • I've tried a variety of different CPU affinities, and this is the best combination I could achieve. It's the exact same configuration that I'm using successfully on my Threadripper 1950x.
  • I've read that I may need to add more DIMMs to make use of dual or quad channels. I haven't been able to test that, because I only own one. I'll buy more if I need to, but I want to be sure about it first, because they're not cheap. I'm guessing that this is not the problem though, because I'm experiencing about 10x less performance than expected -- not 2x or 4x.
Does anyone see anything that I'm missing?
Is there any more helpful data I can pull from my system?
 

pyro_

Active Member
Oct 4, 2013
RAM could be a big issue for your performance. See this article for what happened on a Threadripper system:
Monero/XMR Mining On Threadripper With Multi-Channel Memory - Phoronix

They more than doubled their performance by going from single to dual channel.

This would matter even more on an Epyc system, since it has four NUMA nodes, each with its own memory controller, instead of Threadripper's two dies.
 

pyro_

Active Member
Oct 4, 2013
One other thing you might try is running multiple instances of xmr-stak, so that there is one per NUMA node.
 

alex_stief

Well-Known Member
May 31, 2016
Exactly. You need at least 4 DIMMs correctly populated and one instance per NUMA-node.
The latter is free, so before buying more RAM you can give it a try. But with only one DIMM and 16GB of RAM, I can't think of any useful application for an Epyc 7351P anyway.
 

x3haloed

New Member
Mar 25, 2018
OK, thank you two! I appreciate your time and help.

I tried running two instances, but that brought my total hashrate down to about 120 H/s. Four instances run at about 124 H/s in total. Looks like I'm going to have to buy more DIMMs. :(

Since AMD's documentation specifies 8 memory channels, should I really be using 8 DIMMs? Or should 4 take care of it?
AMD EPYC™ 7351P | AMD

I'm weeping for my checking account.
 

Patrick

Administrator
Staff member
Dec 21, 2010
What about one instance on NUMA0 running 8x 2MB threads?

I do think it is a RAM issue. When you only have one stick installed, three of the four NUMA nodes are constantly hitting Infinity Fabric whenever they need RAM access.
 

alex_stief

Well-Known Member
May 31, 2016
4 DIMMs will be enough for mining.
Did you make sure to affine the cores correctly to each NUMA-Node? One of them should output hashrates with the expected performance.
 

x3haloed

New Member
Mar 25, 2018
Sorry, I think I'm in over my head here. Thanks again for the help.

Do I just need to play around with the affinities until I find the right cores, or is there a way to look up which cores belong to which NUMA nodes?

Patrick - for 2MB threads, I would set "low_power_mode" to true, right?
 

x3haloed

New Member
Mar 25, 2018
I found CoreInfo, which I was able to use to map out my threads, cores, and NUMA nodes. See below:

Code:
Logical to Physical Processor Map:
**------------------------------  Physical Processor 0 (Hyperthreaded)
--**----------------------------  Physical Processor 1 (Hyperthreaded)
----**--------------------------  Physical Processor 2 (Hyperthreaded)
------**------------------------  Physical Processor 3 (Hyperthreaded)
--------**----------------------  Physical Processor 4 (Hyperthreaded)
----------**--------------------  Physical Processor 5 (Hyperthreaded)
------------**------------------  Physical Processor 6 (Hyperthreaded)
--------------**----------------  Physical Processor 7 (Hyperthreaded)
----------------**--------------  Physical Processor 8 (Hyperthreaded)
------------------**------------  Physical Processor 9 (Hyperthreaded)
--------------------**----------  Physical Processor 10 (Hyperthreaded)
----------------------**--------  Physical Processor 11 (Hyperthreaded)
------------------------**------  Physical Processor 12 (Hyperthreaded)
--------------------------**----  Physical Processor 13 (Hyperthreaded)
----------------------------**--  Physical Processor 14 (Hyperthreaded)
------------------------------**  Physical Processor 15 (Hyperthreaded)

Logical Processor to Socket Map:
********************************  Socket 0

Logical Processor to NUMA Node Map:
********------------------------  NUMA Node 0
--------********----------------  NUMA Node 1
----------------********--------  NUMA Node 2
------------------------********  NUMA Node 3
So I see the 4 nodes and how they map to processors. When you say they should be affined to each NUMA node, do you mean that I'm searching for the one node that hashes at the expected rate, and that I should try thread numbers matching each node as I search through them?
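
For anyone who wants to script this step, here is a minimal Python sketch that turns the NUMA rows of the CoreInfo output above into lists of logical CPU numbers. The function names (`cpus_from_mask`, `parse_numa_map`) are made up for illustration, and it assumes each row has exactly the `<mask>  NUMA Node <n>` shape shown in this post.

```python
def cpus_from_mask(mask: str) -> list[int]:
    """Return the logical CPU indices marked with '*' in a CoreInfo mask."""
    return [i for i, ch in enumerate(mask) if ch == "*"]


def parse_numa_map(lines: list[str]) -> dict[int, list[int]]:
    """Map NUMA node number -> logical CPU indices from 'NUMA Node N' rows."""
    nodes = {}
    for line in lines:
        parts = line.split()
        # Expected shape: <mask> NUMA Node <n>
        if len(parts) == 4 and parts[1:3] == ["NUMA", "Node"]:
            nodes[int(parts[3])] = cpus_from_mask(parts[0])
    return nodes


# Rows copied from the CoreInfo output in this post.
numa_lines = [
    "********------------------------  NUMA Node 0",
    "--------********----------------  NUMA Node 1",
    "----------------********--------  NUMA Node 2",
    "------------------------********  NUMA Node 3",
]

numa_map = parse_numa_map(numa_lines)
for node, cpus in numa_map.items():
    print(f"node {node}: logical CPUs {cpus[0]}-{cpus[-1]}")
```

On this output it reports logical CPUs 0-7 for node 0, which is the range to draw `affine_to_cpu` values from when testing that node.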
 

alex_stief

Well-Known Member
May 31, 2016
With a 7351P and SMT enabled, the NUMA nodes should be
Node 0: 0-3, 16-19
Node 1: 4-7, 20-23
Node 2: 8-11, 24-27
Node 3: 12-15, 28-31
No idea how to look this up in Windows though :confused:
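
The listing above follows a simple pattern, which a short Python sketch can reproduce. It assumes the Linux-style numbering given in this post (the first hardware thread of every core numbered 0-15, SMT siblings numbered 16-31); `node_cpus_linux_style` is a hypothetical helper, not part of any tool. The CoreInfo output earlier in the thread shows Windows placing SMT siblings next to each other instead, so these indices should not be copied into a Windows config unchecked. On Linux, `lscpu` or `numactl --hardware` should show the actual layout.

```python
def node_cpus_linux_style(node: int, cores_per_node: int = 4,
                          total_cores: int = 16) -> list[int]:
    """Logical CPUs of one NUMA node under the numbering listed in the post.

    Assumption: core threads are numbered 0..total_cores-1 and each SMT
    sibling is numbered core + total_cores.
    """
    first = node * cores_per_node
    primary = list(range(first, first + cores_per_node))
    siblings = [cpu + total_cores for cpu in primary]
    return primary + siblings


for node in range(4):
    print(f"Node {node}: {node_cpus_linux_style(node)}")
```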
 

x3haloed

New Member
Mar 25, 2018
I'm getting closer!

I was able to achieve ~312 H/s using the following configuration:
Code:
"cpu_threads_conf": [
    { "low_power_mode" : true, "no_prefetch" : false, "affine_to_cpu" : 0 },
    { "low_power_mode" : true, "no_prefetch" : false, "affine_to_cpu" : 2 },
    { "low_power_mode" : true, "no_prefetch" : false, "affine_to_cpu" : 4 },
    { "low_power_mode" : true, "no_prefetch" : false, "affine_to_cpu" : 6 },
]
That should be one thread per core, running at 2MB each, all on the first NUMA node.

I've got four matching DIMMs on order. If that manages to quadruple my hashrate, then we're looking at ~1.248 KH/s, which sounds about right.

Thanks, everybody!
 

pyro_

Active Member
Oct 4, 2013
FYI, you might be able to get a higher hash rate than that: there is enough cache on the CPU to run a thread on each of the SMT (hyperthreaded) logical cores as well, so you should be able to run 8 threads per NUMA node.
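
A rough way to check that suggestion is to budget L3 cache per NUMA node. The Python sketch below rests on assumptions not stated in the thread: 64 MB of L3 on the 7351P split evenly across four dies (16 MB per node), and `low_power_mode` set to false meaning one 2 MB hash per thread. The logical CPU numbers 0-7 for node 0 come from the CoreInfo output earlier in the thread, and `no_prefetch` is simply carried over from the first config and may need tuning. The helper names are hypothetical.

```python
import json

SCRATCHPAD_MB = 2      # the "2MB threads" mentioned earlier in the thread
L3_MB_PER_NODE = 16    # assumption: 64 MB L3 / 4 dies


def threads_that_fit(l3_mb: int, hashes_per_thread: int = 1) -> int:
    """How many miner threads fit in the given amount of L3 cache."""
    return l3_mb // (SCRATCHPAD_MB * hashes_per_thread)


def thread_entries(cpus: list[int]) -> list[dict]:
    """Build one cpu_threads_conf entry per logical CPU (single-hash mode)."""
    return [
        {"low_power_mode": False, "no_prefetch": True, "affine_to_cpu": cpu}
        for cpu in cpus
    ]


node0_cpus = list(range(8))  # logical CPUs 0-7, per the CoreInfo NUMA map
count = min(threads_that_fit(L3_MB_PER_NODE), len(node0_cpus))
entries = thread_entries(node0_cpus[:count])

print(f"{count} threads fit on node 0")
print(json.dumps(entries, indent=4))
```

The printed array can be pasted in as the value of `cpu_threads_conf`; for the other nodes, swap in logical CPUs 8-15, 16-23, or 24-31.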