Monero Mining Performance


Marsh

Moderator
May 12, 2013
2,645
1,496
113
i7-4960X 3.60GHz, or Xeon E5-2697 v2 2.70GHz, or i7-4930K 3.40GHz.
The Xeon E5-2697 v2 2.70GHz has the best hashrate of the bunch.
But it is one power-hungry CPU, so it depends on how much you pay for power.
Maybe another CPU produces a lower hashrate but is more profitable.

I'm mining with an E5-2696 v2 temporarily to stress the system; the machine's future use is to run Windows 10 Creators Update.
 

Algeroth

New Member
Sep 29, 2017
15
3
3
54
Just wondering, which miner? Is 3646 sustained or does it drop off over time?
It does not drop
3636.6 right now

and I'm using stak, which on every CPU model I tested gives 7-18% more hashrate than wolf's cpu miner
(whenever both are compiled with gcc 5, 6, 7, or Intel's compiler)

2x Intel Xeon Gold 6138 = 1857.6H/s (38 threads)
which shows that the non-inclusive cache gives a huge boost, as it hashes almost identically to the E5-2699 v4

2x Intel E5-2699 v4 = 1985.6H/s
2x Intel E5-2683 v4 = 1584.6H/s

I know stak is a little more difficult because it lacks command-line parameters, or even environment variables, and needs a config file,
but building that config is just a matter of a simple bash script with a few loops and ifs that will launch optimal settings for almost all Intel/AMD platforms.
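
A generator of the kind described above might look roughly like this. It is only a sketch, written in Python rather than bash, and it is not the script mentioned in the post. The `cpu_threads_conf` layout (`low_power_mode`, `no_prefetch`, `affine_to_cpu`) is assumed from xmr-stak-cpu's sample config.txt as it looked in 2017, so check it against your version; the function name and defaults are made up for illustration.

```python
def cpu_threads_conf(core_ids, low_power_mode=False, no_prefetch=True):
    """Render a cpu_threads_conf fragment with one entry per core id.

    The key names are an assumption based on xmr-stak-cpu's sample
    config.txt; verify them against the config shipped with your miner.
    """
    def flag(value):
        # The config uses lowercase true/false, not Python's True/False.
        return "true" if value else "false"

    entries = [
        '    {{ "low_power_mode" : {}, "no_prefetch" : {}, "affine_to_cpu" : {} }},'.format(
            flag(low_power_mode), flag(no_prefetch), core
        )
        for core in core_ids
    ]
    return '"cpu_threads_conf" :\n[\n' + "\n".join(entries) + "\n],"


if __name__ == "__main__":
    # Hypothetical example: four threads pinned to cores 1, 3, 5 and 7.
    print(cpu_threads_conf([1, 3, 5, 7]))
```

The printed fragment would replace the `cpu_threads_conf` section of config.txt; picking the right core list per platform is the part the loops and ifs would have to handle.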
 

Edward Ronquillo

New Member
Oct 2, 2017
4
0
1
Hey, Patrick,

Those measures grossly under-represent the Xeon Phi's potential: using my own -- phi-optimized -- miner I'm getting roughly 2,700H/s, which is 4 1/2 times what you're reporting. And, BTW, that puts the 7210 in a quite different place in the performance list :)

- Luk
Hi Luk, I'm interested in using lukminer. I have checked eBay for some Xeon Phi 7210 cards, but I'm not sure which ones I'm supposed to get. What would be the guidelines for a proper setup to use lukminer in this case?
 

Algeroth

Luk's miner does not work on the 71xx series, as is clearly stated on his website:

"IMPORTANT:
The Phi version of this miner is designed for - and compiled for - the 72xx "Knights Landing" series. It will not work on older, 31xx, 51xx, or 71xx "Knights Corner" series co-processors. I may or may not add a Knights Corner compatible version, too, but that will have to wait a bit."
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,514
5,805
113
If you are buying Xeon Phi 7200 series, even at ~2,650 H/s it is going to be slower than stak + 2x EPYC 7301s, cost more, and use similar if not more power. Plus, EPYC 7301s will have a bigger resale market.

You could also buy UP boards + EPYC 7351P/ EPYC 7401P's and come out ahead using stak. Wolf's is slower and xmrig is doing something strange at the moment with EPYC.
 

jims2321

Active Member
Jul 7, 2013
184
44
28
Al,

How much performance difference are you seeing between xmr-stak-cpu (I am guessing that is what you mean by 'stak') and wolf's cpu miner?

Would you mind sharing your bash script that generates the config.txt?

Jim
 

Nero24

Member
Jul 26, 2017
32
5
8
48
www.planet3dnw.de
AMD's old Bulldozer arch seems to perform quite well in Monero, perhaps due to its massive caches: 4x 2 MB L2 plus an 8 MB victim L3 cache :eek:

410 H/s : 1x AMD FX-8350 (4M/8T, 4.0 GHz, Bulldozer v2 cores, 125 W, 8T with Claymore's)
305 H/s : 1x AMD Opteron 4284 (4M/8T, 3.0 GHz, Bulldozer v1 cores, 95 W, 8T with xmrig)

And even the office APUs without L3 cache are not so bad:

177 H/s : 1x AMD A8-7600 (2M/4T, 3.1 GHz, Bulldozer v3 cores, 65 W, 2T with xmrig)
153 H/s : 1x AMD A8-5600K (2M/4T, 3.6 GHz, Bulldozer v2 cores, 95 W, 2T with xmrig)
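
For a rough efficiency comparison of the four results above, the hashrates can be divided by the listed TDP. This is only a sketch: TDP is used as a stand-in for power draw, which is an assumption, since none of these systems was measured at the wall. The function and variable names are illustrative.

```python
# (CPU, hashrate in H/s, TDP in watts), taken from the list above.
results = [
    ("AMD FX-8350", 410, 125),
    ("AMD Opteron 4284", 305, 95),
    ("AMD A8-7600", 177, 65),
    ("AMD A8-5600K", 153, 95),
]


def hashes_per_watt(hashrate, tdp_watts):
    """H/s per watt of TDP; a rough proxy, not measured wall power."""
    return hashrate / tdp_watts


def ranked(rows):
    """Sort rows from most to least H/s per TDP watt."""
    return sorted(rows, key=lambda row: hashes_per_watt(row[1], row[2]), reverse=True)


if __name__ == "__main__":
    for name, hashrate, tdp in ranked(results):
        print("{:<18} {:.2f} H/s per TDP watt".format(name, hashes_per_watt(hashrate, tdp)))
```

Real consumption depends on the rest of the system, so actual figures may look quite different.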
 

Nero24

@Biren78
I haven't measured it, because the systems are very different: they have power-hungry gaming cards or disks, different PSUs and so on. I just wanted to say that Bulldozer-based CPUs might be an alternative for Monero, since there are none in Patrick's list. Here are some:
Hashrate Mining Hardware – Monero
Some data centers in Germany sold their used Bulldozer based Opteron server barebones for under 100 € on eBay :eek: So if somebody had low-grade work for them (vServer, webhosting, fileserving, etc.) it might be useful to buy them cheaply – instead of scrapping them :eek: – and fill the idle times with Monero. :D
 

Algeroth


xmrig has problems above 64 threads because it lacks thread pinning to specific cores;
a lot of context switching is _VERY_ bad in such a NUMA configuration (as the EPYC architecture is basically 4 NUMA nodes in P... and 8 in DP)
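
To pin threads per NUMA node you first need to know which cores belong to which node. A minimal sketch, assuming a Linux system that exposes the topology under `/sys/devices/system/node/node*/cpulist`; the helper names are made up for illustration and nothing here is part of xmrig or stak.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids; empty dict if sysfs has no nodes."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpulist")) as handle:
            nodes[node_id] = parse_cpulist(handle.read())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print("node", node_id, "->", cpus)
```

Each node's core list could then be used for the affinity settings of the threads meant to run on that node.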
 

Patrick

Yea, I know a bit about EPYC architecture :)

You can pin Docker containers to NUMA nodes. That did not help very much.
 

Klee

Well-Known Member
Jun 2, 2016
1,289
396
83
@Nero24 I see you're running 8 threads. Change it to 4 using only the odd-numbered cores: your hash rate drops a little, but you use WAY less power and will run cooler.

I found that was the optimum as far as hashes per unit of power go.

From an earlier post I made about my FX 8320 at 4.0 GHz:

"If I set the affinity to even numbered threads 0,2,4,6 it slows down to ~339 H/s.

If I change it back to use threads 1,3,5,7 it runs at ~385 H/s."
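
The odd-vs-even affinity test above can be scripted. A minimal sketch, assuming Linux, where Python exposes `os.sched_setaffinity`; it pins the calling process, so it is a stand-in for the miner's own affinity settings rather than a replacement for them. The helper names are illustrative.

```python
import os


def odd_cores(logical_cpus):
    """Core ids 1, 3, 5, ... below logical_cpus."""
    return list(range(1, logical_cpus, 2))


def even_cores(logical_cpus):
    """Core ids 0, 2, 4, ... below logical_cpus."""
    return list(range(0, logical_cpus, 2))


def pin_current_process(cores):
    """Restrict this process to the given cores (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, set(cores))


if __name__ == "__main__":
    # For an 8-thread FX this selects threads 1,3,5,7 and 0,2,4,6.
    print("odd: ", odd_cores(8))
    print("even:", even_cores(8))
```

On Linux the same effect for an already-built miner can be had from the shell with `taskset -c 1,3,5,7 <miner command>`.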

FX CPUs, with their large L2 (2 MB per module), mine way more on 4 threads than you would think, since I think it uses the faster L2 cache rather than the L3.

410 H/s / 8 = 51.25 H/s per thread.
vs
385 H/s / 4 = 96.25 H/s per thread.


EDIT: If only those 12-16 core Opterons were unlocked, they would be the price-to-hashing bargain.

What if the Opterons were unlocked and could do the same speeds as an FX?

Yeah, wishful thinking, but fun to think about.

Let's see: a 16-core Opteron at 4.0 GHz running 8 threads at 96.25 H/s = 770 H/s.

A four-CPU motherboard running four of the above CPUs would be 3080 H/s......
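
The arithmetic behind that what-if, as a small sketch. It just reuses the FX per-thread figure measured above; whether an Opteron could actually reach it is pure assumption, and the function names are illustrative.

```python
def per_thread(hashrate, threads):
    """Average H/s contributed by each thread."""
    return hashrate / threads


def projected(threads_per_cpu, rate_per_thread, cpus=1):
    """Hypothetical total H/s if every thread ran at rate_per_thread."""
    return threads_per_cpu * rate_per_thread * cpus


if __name__ == "__main__":
    fx_rate = per_thread(385, 4)  # FX at 4.0 GHz, 4 threads
    print("FX, 8 threads:", per_thread(410, 8), "H/s per thread")
    print("FX, 4 threads:", fx_rate, "H/s per thread")
    # Hypothetical 16-core (8-module) Opteron, one thread per module.
    print("1 CPU: ", projected(8, fx_rate), "H/s")
    print("4 CPUs:", projected(8, fx_rate, cpus=4), "H/s")
```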

I still would like to play around with a quad opteron motherboard if I could find the right one at the right price.

The right one for me would have standard atx style power connectors.
 

Patrick

I have an Opteron mobo but only 2P. There are chips in there of unknown vintage.
 

Nero24

@Klee That's very interesting :oops: But I don't understand where the difference between even and odd core numbers comes from?! Each Bulldozer v1 and v2 module has a shared L2, a shared L1 instruction cache, a shared decoder and a shared FMAC FPU on one hand, and two separate INT/ALU units and L1 data caches on the other hand, with no logical units as in CPUs with SMT. I can't see a single reason why pinning threads to odd core numbers should be faster than pinning them to even ones. They should be equal. But it's very interesting :eek:
 

Klee

Well-Known Member
Jun 2, 2016
1,289
396
83
@Klee That's very interesting :oops: But I don't understand where the difference between even and odd core numbers do come from?! Each Bulldozer v1 and v2 module has a shared L2, a shared L1 instruction cache, a shared decoder, a shared FMAC FPU on one hand, and two separate INT/ALU units and L1 data caches on the other hand, no logical units as in CPUs with SMT. I can't see a single reason why pinning threads to odd core numbers should be faster than pinning them to even ones. They should be equal. But it's very interesting :eek:


I don't have a clue why, it's just my observation.

Here are also the results of the first test I did, with zero tweaking and just playing around with the number of cores used.

AMD FX 8320 at 4.4 GHz

XMR-STAK-CPU (only the max is listed; after a few minutes it would settle on a slightly lower hash rate):
8 threads= 305 H/s
7 threads= 435 H/s
6 threads= 329 H/s
5 threads= 324 H/s
4 threads= 219 H/s
3 threads= 217 H/s
2 threads= 111 H/s
1 thread = 107 H/s

Notice anything odd?

First of all, I used both odd and even cores at the same time.

One thread at 107 H/s is the highest single-thread mining speed I have had on my hardware, and I don't think I have seen any higher. It is WAY, WAY higher than on any of the V1, V2 and V3 Xeon CPUs that I have or had.

Two threads, both on the same module, give 111 H/s.

That is hardly any improvement at all. What is happening is that with one thread per module it stays in L2 cache, and with two threads per module it has to use L3, causing the performance hit: 55.5 H/s per thread vs 107 H/s per thread (111 divided by 2 = 55.5 H/s).

Three threads give 217 H/s, a BIG jump, because one module has two threads running in L2 and L3 and the other has only one running in the faster L2.

That trend continues until you get to 8 threads, where it no longer applies because the OS needs the CPU for other tasks, not just mining.
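
The pattern is easier to see as the gain from each added thread. A small sketch using the numbers from the table above; the helper names are illustrative. A thread that opens a new module adds roughly 105 H/s, a thread that doubles up on an already-busy module adds almost nothing, and the 8th thread costs hashrate.

```python
# Max hashrate in H/s per thread count, FX 8320 at 4.4 GHz (from the table above).
measured = {1: 107, 2: 111, 3: 217, 4: 219, 5: 324, 6: 329, 7: 435, 8: 305}


def marginal_gains(rates):
    """H/s gained (or lost) by going from n-1 to n threads."""
    return {n: rates[n] - rates[n - 1] for n in sorted(rates) if n - 1 in rates}


def per_thread(rates):
    """Average H/s per thread at each thread count."""
    return {n: rates[n] / n for n in sorted(rates)}


if __name__ == "__main__":
    gains = marginal_gains(measured)
    averages = per_thread(measured)
    for n in sorted(measured):
        gain = gains.get(n, measured[n])  # first thread: gain from zero
        print("{} threads: {:>4} H/s, {:+5d} vs previous, {:.1f} per thread".format(
            n, measured[n], gain, averages[n]))
```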

That's when I realized what is going on, and that one thread per module is the most efficient way to use this CPU architecture, with a large saving in power usage.

I stumbled upon the odd vs even thing during my testing, when I wanted to use my GTX 650 card to mine with xmr-stak-nvidia and was playing around with which core to pin the GPU miner program to.

I would really like to play around with a 16-core Opteron and a cheap motherboard.

EDIT: V1 Xeons also have 2 MB of L2 cache, as do other Intel CPUs, but they seem to work best using the formula of 1/2 the L3 cache (one thread per 2 MB of L3), not the 1 thread per L2 cache that seems to work best for the FX CPUs. It has to be the CPU architecture difference.
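
The two rules of thumb can be written down like this. It is a sketch only: the 2 MB figure is the CryptoNight scratchpad each mining thread works in, the cap at the number of logical CPUs is my own assumption, and the example inputs are made-up cache sizes, not measurements.

```python
SCRATCHPAD_MB = 2  # CryptoNight keeps a 2 MB scratchpad per mining thread


def threads_from_l3(l3_mb, logical_cpus):
    """'1/2 L3' rule: one thread per 2 MB of L3, capped at the logical CPU count."""
    return max(1, min(logical_cpus, l3_mb // SCRATCHPAD_MB))


def threads_from_modules(modules):
    """FX rule: one thread per module, i.e. one per 2 MB L2 cache."""
    return modules


if __name__ == "__main__":
    # Illustrative inputs only.
    print(threads_from_l3(20, 16))   # 20 MB L3, 16 logical CPUs
    print(threads_from_l3(8, 4))     # 8 MB L3, 4 logical CPUs
    print(threads_from_modules(4))   # 4-module FX
```

Treat the result as a starting point and then test one thread more and one thread less, since the numbers above show the real optimum can differ.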

Also, I have never played around with turning hyperthreading off on my Xeons; I may have to give that a try to see what happens.
 

Nero24

@Klee Those are very interesting and in-depth observations. I do understand that it doesn't make much sense to load one module with two threads, given the shared resources and the 2 MB L2 cache per module. But the even vs. odd thing is still very strange.

The mining software and/or the OS scheduler seem to play a role as well, though. You observed the max hashrate at 7 threads with xmr-stak-cpu. With xmrig, by contrast, I got the highest hashrate with 8 threads; with 7 threads it was not much lower (305 vs. 297 H/s with the Opteron 4284), and there was not the drop you observed.

But you are right: Bulldozer has an extraordinary hashrate per module. Not even the AMD Zen arch, which performs very well with Monero, has such high hashrates per core. On the other hand, AMD clocked Bulldozer far above its efficiency sweet spot. To get max hashrate per watt, Bulldozer would have to be run at a much lower clock speed and Vcore.

If you don't have to care about wattage and acquisition costs are your primary criterion, the Bulldozers on eBay might be an insider tip...