Monero Mining Performance

Joel

Active Member
Jan 30, 2015
811
162
43
39
I have huge pages enabled and am running 6 cores per cpu on XMRIG. How many cores do you have enabled?

Thanks,

Jeff
6 cores, same as you (note I actually went into BIOS and disabled unused cores). I do run two separate Docker images pinned to each NUMA node, though when I was testing it didn't make much difference for me.
 

jetbird

New Member
Dec 28, 2017
19
4
3
49
For comparison, what address are you using to mine? I am currently using:

us.monero.hashvault.pro:3333

Thanks,

Jeff
 

Joel

Nicehash. The pool won't matter at all for what your miner software reports. Actual effective hash rate is a different story of course.
 

sno.cn

Active Member
Sep 23, 2016
206
73
28
35
My new Ryzen 7 1700 is up and running.

3.8 GHz @ 1.29 V, running nice and cool at 45 °C under a cheap AC Freezer 33. I haven't touched the memory yet, so it's running low and slow right now.

xmrig with huge pages and affinity 0x5555 (logical CPUs 0,2,4,6,8,10,12,14) in Windows 10 is doing 608 H/s.
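For reference, an affinity mask like 0x5555 is a bitmap with one bit per logical CPU. A minimal sketch in POSIX shell shows the conversion in both directions. The helper names cpus_to_mask and mask_to_cpus are hypothetical (they are not part of xmrig), and the sketch assumes the mask fits into the shell's 64-bit arithmetic. Whether the even-numbered logical CPUs really are one SMT thread per physical core is an assumption worth checking on your own system.

```shell
#!/bin/sh
# Hypothetical helpers, not part of xmrig.

# Build a hex affinity mask from a list of logical CPU indices.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set the bit for this logical CPU
    done
    printf '0x%X\n' "$mask"
)

# List the logical CPU indices selected by a hex (or decimal) mask.
mask_to_cpus() (
    mask=$(( $1 ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$cpu"     # append with a comma separator
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
)
```

Calling cpus_to_mask 0 2 4 6 8 10 12 14 prints 0x5555, and mask_to_cpus 0x5555 prints the same list back.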

My 2 x E5-2670 v2 are doing around 870 H/s on 20 threads, without huge pages and with no affinity set.
 

ari2asem

Active Member
Dec 26, 2018
536
85
28
The Netherlands, Groningen
A question about a note in the OP:

Using (MB L3 cache / 2) for threads

What does this mean? Suppose you have 2 CPUs, each with 32 cores/64 threads and 64 MB of L3 cache.
Does it mean running 64 MB / 2 = 32 threads inside the mining software? I don't mean 32 CPU threads.

Can someone explain this to me?
 

alex_stief

Active Member
May 31, 2016
648
202
43
35
IIRC, the default algorithm needs 2 MB of last-level cache per thread. So in your case, 32 threads per CPU.
It's been a long time, but I think there is a different algorithm that only needs 1 MB of cache per thread. This would allow you to use all 64 hardware threads of each CPU. I could be wrong though.
My guess is that you are talking about AMD Epyc CPUs? In order to get decent performance out of them, you need to run one worker per NUMA node. Spinning up one worker on all cores/threads simultaneously will lead to poor performance.
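The rule of thumb above is plain integer arithmetic, sketched here in POSIX shell. The function name threads_for_cache is hypothetical. The 2048 KB default matches the 2 MB per thread mentioned above; the 1024 KB case corresponds to the lighter algorithm, which is remembered rather than confirmed.

```shell
#!/bin/sh
# Hypothetical helper: how many mining threads fit into the L3 cache.
# Usage: threads_for_cache L3_MB HW_THREADS [SCRATCHPAD_KB]
threads_for_cache() (
    l3_mb=$1
    hw_threads=$2
    scratchpad_kb=${3:-2048}            # 2 MB per thread unless told otherwise
    n=$(( l3_mb * 1024 / scratchpad_kb ))
    # Never suggest more threads than the CPU has hardware threads.
    if [ "$n" -gt "$hw_threads" ]; then
        n=$hw_threads
    fi
    echo "$n"
)
```

For the example in question, threads_for_cache 64 64 prints 32, and threads_for_cache 64 64 1024 prints 64.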
 

ari2asem

Thanks for your answer.

I am running xmrig on a 2-socket EPYC 7551 system (2x 32 cores, 2x 64 hardware threads).

Does this mean that I have to run 4 instances of xmrig (on Windows 10, 64-bit) to fully load all my CPUs with mining?
 

alex_stief

Each of your CPUs has 4 NUMA nodes, at least if you did not fiddle with the memory interleaving options in the BIOS. In Linux you can check the NUMA topology with lscpu and/or lstopo. In Windows a quick look at the Task Manager should suffice: right-click on the CPU utilization graph and change the view to NUMA nodes.
That makes 8 workers in total. I have no idea how to get the workers pinned to the correct cores in Windows; I only did this in Linux.
Or if you really want to squeeze the last percent of performance out of it: The L3 cache on each NUMA node is segmented again into 2 chunks. Running one worker for each chunk of L3 (consisting of 4 cores in your case) would be ideal. But then again, you would probably want to run Linux if the last bit of performance was important ;)
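On the Linux side, the node count and the CPU list of each node can be pulled out of the lscpu output with a little awk. This is only a sketch: the function names are hypothetical, and it assumes the English "NUMA node(s):" and "NUMA nodeN CPU(s):" lines that lscpu prints, so run lscpu with LC_ALL=C.

```shell
#!/bin/sh
# Hypothetical helpers that read `lscpu` output on stdin.

# Print the number of NUMA nodes.
numa_node_count() {
    awk -F: '/^NUMA node\(s\)/ { gsub(/[ \t]+/, "", $2); print $2 }'
}

# Print one "node cpulist" pair per NUMA node.
numa_node_cpus() {
    awk -F: '/^NUMA node[0-9]+ CPU\(s\)/ {
        node = $1
        sub(/^NUMA node/, "", node)     # strip the prefix, leaving "N CPU(s)"
        sub(/ CPU\(s\)$/, "", node)     # strip the suffix, leaving "N"
        gsub(/[ \t]+/, "", $2)          # drop the padding before the CPU list
        print node, $2
    }'
}
```

Typical use would be LC_ALL=C lscpu | numa_node_count, followed by LC_ALL=C lscpu | numa_node_cpus to see which hardware threads belong to which node.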
 

ari2asem

upload_2019-10-22_18-21-42.png

I have a total of 8 NUMA nodes. I use Bitsum Process Lasso to set CPU affinity when I run BOINC/WCG and F@H, but with xmrig I can't get Process Lasso to set CPU affinity for multiple instances of xmrig.exe.

Another question: by L3 in 2 chunks, do you mean the 16-way L3 shown in CPU-Z?

If I switch to Linux, can you help with binding miner workers to NUMA nodes? And which Linux distro? The last time I used Linux was in 2000, 19 years ago :)
 

jims2321

Active Member
Jul 7, 2013
185
44
28

You're running cn/r, so you might need to install numactl first. Then run something like this:

seq 0 1 | xargs -P 0 -I node numactl -N node '/.../bin/randomx-benchmark' --mine --largePages --jit --nonces 100000 --init 8 --threads 8

Substitute the appropriate app for the randomx-benchmark app. The critical part is: seq 0 1 | xargs -P 0 -I node numactl -N node

Here is a link to how to do it.

Managing Process Affinity in Linux
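As a rough illustration of the same idea for xmrig, the sketch below only prints one numactl command line per NUMA node so it can be reviewed before anything is started. The function name numa_worker_cmds is hypothetical. Adding --membind next to --cpunodebind (the long form of -N) is my assumption that each worker's memory should stay on its own node; the thread count per node is something you have to choose for your hardware.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one command per NUMA node.
# Usage: numa_worker_cmds NODE_COUNT THREADS_PER_NODE [MINER]
numa_worker_cmds() (
    nodes=$1
    threads=$2
    miner=${3:-xmrig}                   # assumed to be on PATH
    node=0
    while [ "$node" -lt "$nodes" ]; do
        # Bind both the CPUs and the memory of this worker to one node.
        echo "numactl --cpunodebind=$node --membind=$node $miner -t $threads"
        node=$(( node + 1 ))
    done
)
```

For an 8-node system with 8 threads per node, numa_worker_cmds 8 8 prints eight lines, one per node. Each line still needs the usual pool options appended before it is of any use.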
 

ari2asem

Any clue how to do it in Windows 10?
 

alex_stief

Another question: by L3 in 2 chunks, do you mean the 16-way L3 shown in CPU-Z?
No, each die in the Zen microarchitecture consists of 2 compute complexes, each with its own L3 Cache. See e.g. Sizing Up Servers: Intel's Skylake-SP Xeon versus AMD's EPYC 7000 - The Server CPU Battle of the Decade?

Pinning workers to a certain range of cores is easy in Linux. I had several "worker" scripts that started an instance of xmrig with taskset. E.g.
taskset -c 0-3,32-34 xmrig -t 7 ...
This started xmrig with 7 threads, pinned to hardware threads 0,1,2,3,32,33,34. These threads belong to the first die on the first Epyc 7301 CPU in a dual-socket system, SMT enabled.

And then a "supervisor" script that starts all workers at the same time. It is a bit of work to set it up, but the performance and efficiency increase is worth it.
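A minimal sketch of such a worker/supervisor pair, in POSIX shell, could look like the following. The first entry in the worker table is the one from the taskset example above. The second entry (4-7,36-38) is only my guess at the next die's numbering and must be checked with lscpu or lstopo. The names WORKERS, DRY_RUN and start_workers are made up for this sketch.

```shell
#!/bin/sh
# Worker table: "cpulist threads", one worker per line.
# Only the first line comes from the example above; the second is an assumption.
WORKERS='0-3,32-34 7
4-7,36-38 7'

# Usage: start_workers MINER [extra miner options...]
# With DRY_RUN set, the commands are printed instead of started.
start_workers() (
    miner=$1
    shift
    while read -r cpus threads; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "taskset -c $cpus $miner -t $threads" "$@"
        else
            # Start each pinned worker in the background.
            taskset -c "$cpus" "$miner" -t "$threads" "$@" &
        fi
    done <<EOF
$WORKERS
EOF
    wait    # keep the supervisor alive until all workers exit
)
```

Setting DRY_RUN=1 and calling start_workers xmrig prints the two taskset lines without launching anything, which is a cheap way to check the table before a real run.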
 

Klee

Well-Known Member
Jun 2, 2016
1,285
393
83
I don't use Windows 10 for mining, but this article details how to set thread affinity to cores in Windows 10:

Improve CPU Monero (XMR) mining by disabling Hyperthreading (HT) — Steemit

I only use Windows 10 if I mine with GPUs, and only on a dedicated GPU miner.

I played around a bit with my Open Compute servers with Hyper-Threading disabled, and yes, they do mine at a slightly higher hash rate. I don't remember the details, but it was not a huge difference.

But since it's a pain to shut down and add a video card and keyboard just to go into the BIOS to turn it off or back on on the OC servers, and Hyper-Threading is so useful for other things, I just leave it enabled.
 

ari2asem

My system is 2x EPYC 7551, which means 2x 32 cores, or 64 cores in total. Process Lasso (from Bitsum) also shows 64 real hardware cores.

I can set the CPU affinity for XMRIG-01.EXE to 0-31 inside Process Lasso; the hash rate is around 3000 H/s.

Running a second instance of xmrig (XMRIG-02.EXE) with CPU affinity 32-63 gives around 450 H/s, while the hash rate of XMRIG-01 drops to around 2500-2600 H/s.

When I run xmrig, it says 64 threads are available.

logs of xmrig

Code:
 * ABOUT        XMRig/4.3.1-beta MSVC/2017
 * LIBS         libuv/1.31.0 OpenSSL/1.1.1c hwloc/2.0.4
 * HUGE PAGES   permission granted
 * CPU          AMD EPYC 7551 32-Core Processor (2) x64 AES
                L2:32.0 MB L3:128.0 MB 64C/128T NUMA:8
 * DONATE       1%
 * ASSEMBLY     auto:ryzen
 * POOL #1      pool.supportxmr.com:7777 coin monero
 * COMMANDS     hashrate, pause, resume
 * OPENCL       disabled
[2019-10-23 00:04:57.787] use pool pool.supportxmr.com:7777  94.23.247.226
[2019-10-23 00:04:57.788] new job from pool.supportxmr.com:7777 diff 40000 algo cn/r height 1950534
[2019-10-23 00:04:57.789]  cpu  use profile  cn  (64 threads) scratchpad 2048 KB
[2019-10-23 00:04:58.443]  cpu  READY threads 64/64 (64) huge pages 100% 64/64 memory 131072 KB (654 ms)
My total L3 cache is 128 MB (for the dual-socket EPYC 7551). At 2 MB of L3 cache per thread, this means I can run xmrig on 64 threads in total.

So this is a hardware limitation, am I correct? I have 64 cores but am mining on 32 cores (or 64 threads), half of my total cores.

Or can I fully load all my cores (or threads) with mining under Linux? Or is Linux also limited by the L3 cache size per mining thread?
 

alex_stief

Things don't quite add up here.
First things first: do you have SMT enabled?
Running a second instance of xmrig on the second CPU should not affect the hash rate of the first CPU. And you should see about the same hash rate on each of your CPUs. Provided pinning worked correctly.
And you need to tell each instance of xmrig how many threads it should use.
 

ari2asem

SMT is enabled. I made no other changes to the memory or CPU settings in the BIOS; everything is at default.

upload_2019-10-23_18-0-31.png

Each green bar in the upper-right corner is at around 100% core load. Under RULE you can see that I assigned different cores to different EXE files from different locations.

xmrig.exe is version 4.4.0 beta
xmrig-01.exe is version 4.3.1 beta

upload_2019-10-23_18-2-8.png

......................................................................

The screenshots below were taken after changing the CPU affinity to:
xmrig.exe ==> 0-31 (xmrig version 4.4.0)
xmrig-01.exe ==> 32-63 (xmrig version 4.3.1)
The hash rate is around 3000 H/s for the 2 instances together.
upload_2019-10-23_18-11-50.png

upload_2019-10-23_18-12-20.png


I still get a maximum hash rate of around 3000 H/s for both instances together.

I think my limiting factor here is the L3 cache size. I expect the same under Linux.

@alex_stief, can you post screenshots of your hash rates and CPU usage with that many miner instances running?
 


alex_stief

Sorry, it has been more than a year since I mined my last coins. Not sure if I will get back to it any time soon.
What you could do is disable SMT. You won't use more than one thread per core anyway, and this makes finding the correct threads to pin easier.
Isn't there a tool similar to lstopo in Windows?
 