Monero Mining Performance

onsit

Member
Jan 5, 2018
98
26
18
30
Latency doesn't have any bearing on hashrate. It does have a relationship with the pool's estimate of your hashrate, but that's calculated from difficulty and share submission interval, and is never going to be accurate. Based on the description of hashrate going down a mere 4H/s it's not the pool variance here, that's a local reading from the software.
This is dependent on how hashrate is timed in the mining software. The transit time between submitting to a proxy / pool and getting the next batch will result in a zig zag pattern for total hashrate. Hashrate is never flat. XMRig, for example, reports on a 5-second interval; if you happen to take 1 second to round trip to your proxy / pool, your hashrate will be 0 for that second, resulting in a lower 5-second average.

Your workers' ability to do work is completely based on being able to talk over stratum. That's why it's better to tailor your own difficulty, so you spend more time hashing and less time round tripping to your upstream. A 10KH/s machine hashing at 1000 difficulty will have a lower hashrate, due to latency and submission frequency, than the same machine at a difficulty of 30000.
 

eureka

New Member
Jan 30, 2018
8
9
3
28
Las Vegas, NV
astr.al
Interesting technique, I'll have to try it when I get home. I'm sure Nicehash will be less likely to drop the connection too (they set difficulty really high). I notice you're actually overcommitting the cache too.
NH just has unstable servers, it's typical to see drops even with a high hashrate being submitted unfortunately. The cache isn't overcommitted, there's actually 1MB spare per socket here (2667 v2 being a 25MB SKU) which leaves a little bit of room for OS tasks. Not entirely sure which core the cache ends up being free on since Ivy doesn't have the CQM features, but it seems to work out as expected.
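The cache arithmetic in this exchange can be sketched as follows. This is only an illustration: it assumes the figures quoted in the thread (2 MB of L3 per normal thread, 4 MB per low power thread, a 25 MB L3 per socket), and the thread mixes shown are hypothetical, not the poster's actual configuration.

```python
# Sketch of the per-socket L3 budget discussed above.
# Assumption: each hash in flight needs a 2 MB scratchpad, so a normal
# thread uses 2 MB and a low power (double hash) thread uses 4 MB.
SCRATCHPAD_MB = 2

def l3_budget(l3_mb, normal_threads, low_power_threads=0):
    """Return (committed_mb, spare_mb) for one socket."""
    committed = SCRATCHPAD_MB * (normal_threads + 2 * low_power_threads)
    return committed, l3_mb - committed

# Hypothetical thread mixes on a 25 MB part:
print(l3_budget(25, 12))     # 24 MB committed, 1 MB spare
print(l3_budget(25, 4, 4))   # same 24 MB using four low power threads
print(l3_budget(25, 13))     # negative spare means the cache is overcommitted
```

A negative second value is the overcommitted case raised in the quoted post.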

Just had a lightbulb moment, because I hadn't thought about that. Makes total sense though, and explains why my stacking all the low power threads at the beginning didn't work too well. I'll also check to see if I can get any gains on my dual 2680v2 nodes (those are 580/600h/s using 9 threads with 0-3 low power under linux, 2 docker images due to NUMA).
Surprisingly enough, I've had a very difficult time getting ideal rates on Linux. The only distro kernel I've found that works optimally out of the box is CentOS 7 - older variants are gimped, and other distros like Ubuntu are also far from efficient. I'm building my own kernels for machines that are mining at this point and making sure that all the CONFIG_*_PERF* and CONFIG_NUMA related options are built in. There's no need to run two copies of stak in this environment either, as long as CONFIG_NUMA is enabled. The perf related settings are just for ease of configuration rather than poking MSRs directly.
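One way to see which of these options a given kernel was built with is to scan its config file. The sketch below is an assumption-laden illustration, not the poster's tooling: the pattern is a loose reading of "CONFIG_*_PERF* and CONFIG_NUMA related options", and the sample text is made up for the example.

```python
import re

def parse_kernel_config(text):
    """Map option name -> value ('y', 'm', a raw value, or 'n' for 'is not set')."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.match(r"# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options

def not_built_in(options, pattern=r"CONFIG_(NUMA\w*|\w*PERF\w*)"):
    """List matching options that are not compiled in (anything other than 'y')."""
    rx = re.compile(pattern)
    return sorted(n for n, v in options.items() if rx.fullmatch(n) and v != "y")

# Illustrative sample only; real values vary by distro and kernel version.
sample = """\
CONFIG_NUMA=y
CONFIG_NUMA_BALANCING=y
CONFIG_PERF_EVENTS=y
# CONFIG_PERF_EVENTS_INTEL_RAPL is not set
CONFIG_HZ=250
"""
print(not_built_in(parse_kernel_config(sample)))
```

On a real machine the text would typically come from a file such as /boot/config-<kernel release>, where the distro ships one.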

If nothing else, your philosophy definitely shows that there's reasons to read up on the architecture.
I've had my face stuffed in the Xeon datasheets and Hex-Rays for ages now - been working on optimizing cryptonight workloads since early last year. There's a whole pile of other tweaks that you can do with a fully unlocked UEFI image, probably should write some of this info up at some point. I also have a bunch of other interesting leads on optimizations that apply to later architectures - I plan to implement some of them into my Xeon miner in the near future.
 

eureka

This is dependent on how hash rate is timed on the mining software. The transit time between submitting to a proxy / pool and getting the next batch will result in a zig zag pattern for total hashrate. Hashrate is never flat.
Sorry, this is totally incorrect. The mining software should be reporting your TRUE hashrate, which will be extremely consistent if mining optimally. The reported hashrate on the pool is going to fluctuate because it's an estimate based on the number of shares submitted at a certain difficulty level. The calculation is pretty simple and naive, which is why it's so inaccurate:

sum(valid_shares_diff[]) / duration

Example: ~1300H/s fixed diff of 40k with target of 30 seconds, over 1 hour:

(40000 * 120) / 3600 = ~1333H/s
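
A minimal sketch of that estimate, using the same numbers. The function name is made up for illustration, and the share counts of 118 and 122 are hypothetical, added only to show how a couple of lucky or unlucky shares move the figure while the rig itself does nothing different.

```python
def pool_hashrate_estimate(share_difficulties, duration_s):
    """Pool-side estimate: total difficulty of accepted shares over the window."""
    return sum(share_difficulties) / duration_s

# One share every 30 seconds at a fixed difficulty of 40k, for one hour.
on_target = [40_000] * 120
print(round(pool_hashrate_estimate(on_target, 3600), 1))   # 1333.3

# Two shares fewer or more in the same hour shifts the estimate noticeably.
print(round(pool_hashrate_estimate([40_000] * 118, 3600), 1))
print(round(pool_hashrate_estimate([40_000] * 122, 3600), 1))
```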

Real world example: the 15 minute hashrate is within 0.2 H/s of the immediate rate on each core: https://astr.al/u/c3734b70_399x184.png

On top of the nonce range issue, a 1000ms round trip is completely unheard of. Even 56Kbps dialup can do better than that. If you're mining on a 1 second delay, you should probably consider fixing that before anything else.

Xmrig for example will report on a 5sec interval, if you happen to take 1 second to round trip to your proxy / pool your hashrate will be 0 for that second - resulting in a lower 5sec average.
That isn't really a good way to describe how it works. A standalone single rig has very little chance to exhaust the given nonce range. Block time is 120 seconds and the nonce is 32 bits. With stak at least, the upper 10 bits are reserved for flags (thread ID and other fields), leaving you with 22 bits of nonce. Using my 1300H/s example again:

nonce_space / hashrate = nonce_lifetime

e.g. (2**22) / 1300 = 3226 seconds, or in other words about 53.8 minutes per job before you request additional work. Even your example of 10 KH/s would require 7 minutes to exhaust the nonce space, and that would have to be 10 KH/s per thread, as each thread is given its own nonce range to work on.
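
The same arithmetic as a short sketch. The 22 usable bits follow the description above (32-bit nonce minus 10 reserved bits); the function name is made up for the example.

```python
def nonce_lifetime_s(hashrate, nonce_bits=22):
    """Seconds until one thread exhausts its nonce range at the given H/s."""
    return (2 ** nonce_bits) / hashrate

for rate in (1300, 10_000):
    seconds = nonce_lifetime_s(rate)
    print(f"{rate} H/s: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Both figures are far longer than the 120 second block time, which is the point being made: a new job normally arrives long before the range runs out.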

Your workers ability to do work is completely based on being able to talk over stratum. Hence why it's better to tailor your own difficulty so you spend more time hashing, and not round tripping to your upstream. A 10KH/s machine hashing at 1000 difficulty will have a lower hash rate due to latency and frequency, than a difficulty of 30000.
The only time it's suitable to use a fixed difficulty is when you're connecting to a braindead stratum server, like the NiceHash one. Any pool running nodejs-pool (i.e. not node-cryptonight-pool) will have vardiff working as it should, and for the ideal payment you should be using vardiff.
 

Joel

Active Member
Jan 30, 2015
816
168
43
39
NH just has unstable servers, it's typical to see drops even with a high hashrate being submitted unfortunately. The cache isn't overcommitted, there's actually 1MB spare per socket here (2667 v2 being a 25MB SKU) which leaves a little bit of room for OS tasks. Not entirely sure which core the cache ends up being free on since Ivy doesn't have the CQM features, but it seems to work out as expected.
Whoops, I counted 26MB originally, looked again and I see now that it's 24. My mistake.

Surprisingly enough, I've had a very difficult time getting ideal rates on Linux. The only distro kernel I've found that works optimally out of the box is CentOS 7 - older variants are gimped, and other distros like Ubuntu are also far from efficient. I'm building my own kernels for machines that are mining at this point and making sure that all the CONFIG_*_PERF* and CONFIG_NUMA related options are built in. There's no need to run two copies of stak in this environment either, as long as CONFIG_NUMA is enabled. The perf related settings are just for ease of configuration rather than poking MSRs directly.
I tested a single image again and found out you were correct. I think I misattributed the gains from properly setting hugepages to using two docker instances. No gains found, but no losses either. So I'll migrate to a single image for simplicity and hopefully better stability on Nicehash.
 

onsit

The only time it's suitable to use a fixed difficulty is when you're connecting
to a braindead stratum server, like the NiceHash one. Any pool running
nodejs-pool (i.e. not node-cryptonight-pool) will have vardiff working as it
should, and for the ideal payment you should be using vardiff.
Sure, if you only run 3-4 machines. I run 12-30 machines (based on idle schedules) connecting to an EC2 nano instance running xmrig-proxy. I use a small app I made to scrape cryptunit and whattomine, figure out the flavor of the day to mine, and adjust the wallet and pool host to that currency. Setting a static difficulty is a must in this scenario because not all workers behind the proxy have the same hashrate. You end up with sockets dropping, as vardiff will set the difficulty based on my proxy's total hashrate.

Coincidentally, nodejs-pool only works for Monero, and it has large compatibility issues with the cryptoutil library for other currencies. A few pools seem to have figured it out, but the majority have not moved to nodejs-pool because of double payment issues.

So for coins like KRB, TRTL, ITNS you get left with crap node-cryptonight-pool forks.
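
A rough sketch of the mismatch described in this post. It assumes the same relationship as the pool estimate quoted earlier in the thread, where difficulty is roughly hashrate times the target share interval, so the expected time per share is difficulty divided by hashrate. The worker hashrates and the 30 second target are hypothetical.

```python
def difficulty_for(hashrate, target_s=30):
    """Difficulty that yields about one share per target_s at this hashrate."""
    return int(hashrate * target_s)

def expected_share_interval_s(difficulty, hashrate):
    """Average seconds between shares for a worker at the given difficulty."""
    return difficulty / hashrate

# Hypothetical workers behind one proxy, in H/s.
workers = {"slow": 500, "mid": 1300, "fast": 10_000}

# Difficulty a pool might pick if it only sees the proxy's combined hashrate.
proxy_diff = difficulty_for(sum(workers.values()))

for name, rate in workers.items():
    own = difficulty_for(rate)
    wait = expected_share_interval_s(proxy_diff, rate)
    print(f"{name}: own diff {own}, {wait:.0f} s per share at proxy diff {proxy_diff}")
```

Under these assumptions the slowest worker would wait many minutes between shares at the proxy-wide difficulty, while a per-worker static difficulty keeps every machine near the target interval.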
 

mantis

Member
Nov 17, 2017
38
6
8
49
xmr-stak has a low power flag that can be used to allocate more cache (4mb per thread vs. 2mb for "false"). I found with 2680v2s that low power mode results in a higher overall hash vs affining to hyperthreaded cores.

What kind of hashrate did you get this way? And how many threads were you running?

I don't think this works. At least, I'm getting under 600 H/s regardless of what settings I use in stak.

The low power threads also give a lower hashrate than two normal threads.
 

cafcwest

Member
Feb 15, 2013
136
14
18
Richmond, VA
Surprisingly enough, I've had a very difficult time getting ideal rates on Linux. The only distro kernel I've found that works optimally out of the box is CentOS 7 - older variants are gimped, and other distros like Ubuntu are also far from efficient. I'm building my own kernels for machines that are mining at this point and making sure that all the CONFIG_*_PERF* and CONFIG_NUMA related options are built in. There's no need to run two copies of stak in this environment either, as long as CONFIG_NUMA is enabled. The perf related settings are just for ease of configuration rather than poking MSRs directly.
I've got 16 mining systems on Ubuntu 16.04 currently that inconsistently do 60-80% of what the same hardware does under Windows Server 2016. Being a bit of a Linux novice, anything you can put together would be massively appreciated.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,138
5,173
113

eureka

Sure if you only run 3-4 machines. I run 12-30 machines (based on idle schedules) connecting to a ec2 nano instance running xmrig-proxy. I use a small app I made to scrape cryptunit and whattomine and figure out the flavor of the day to mine, and adjust the wallet and pool host to the currency. Setting a static difficulty is a must in this scenario because not all workers in the proxy-pool have the same hashrate. You end up with sockets dropping as vardiff will set the difficulty based on my proxies total hashing rate.

Coincidentally nodejs-pool only works for monero, and it has large compatibility issues with the cryptoutil library for other currencies. I've seen a few pools figure it out it seems, but not the majority of the pools have moved to nodejs-pool due to double payment issues.

So for coins like KRB, TRTL, ITNS you get left with crap node-cryptonight-pool forks.
Yeah, that's problematic; your static diff use case is definitely valid. I know the nodejs-pool developer, and he has added support for ETN at the very least, as well as the basic infrastructure required to add other coins. However, without an actual request or bounty for it, the coin-specific changes aren't happening any time soon. It is theoretically possible to adapt nodejs-proxy to work with these other coins, but I don't know the exact route; it'd take some testing. It does work much better with vardiff compared to other proxy implementations.

I haven't heard anything about double payment issues, and the largest Monero pools run his code - if you have some more details on that problem I'd like to know so I can bring it to his attention.

You are following this https://forums.servethehome.com/index.php?resources/how-to-start-mining-monero-in-docker.34/ and getting 60-80% CPU mining speed versus Windows?
I've seen the same results on Ubuntu, while CentOS on the same hardware (both LiveCDs for testing) will work on-par with Windows. The issue stems from the default kernel config.

@cafcwest - I'll be releasing my new Linux miner in the near future, which should help with some of the efficiency problems compared to just doing a plain xmr-stak compile. As for the kernel changes, I suggest trying a CentOS 7 LiveCD first to see if the issue persists. You're going to need a precompiled version of xmr-stak, however, since the default version of g++ on CentOS 7 is too old. DM me if you need some assistance with that.
 

cafcwest

You are following this https://forums.servethehome.com/index.php?resources/how-to-start-mining-monero-in-docker.34/ and getting 60-80% CPU mining speed versus Windows?
No, I'm bare metal, Ubuntu 16.04, combined XMR-Stak. I have configured huge pages and run the miner with a privileged account. Across the 16 blades I have over 250 data points, and the 14 Ubuntu machines have averaged 23% lower H/s than the two Windows 2016 machines. Identical hardware, BIOS versions, etc.
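When comparing machines like this, it can help to confirm that huge pages are actually reserved and in use while the miner runs. The sketch below parses text in the format of /proc/meminfo; the sample values are invented for illustration, and "in use" is approximated as total minus free.

```python
def hugepage_summary(meminfo_text):
    """Summarise huge page counters from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    total = fields.get("HugePages_Total", 0)
    free = fields.get("HugePages_Free", 0)
    size_kb = fields.get("Hugepagesize", 0)
    return {
        "total": total,
        "in_use": total - free,              # approximation: ignores reserved pages
        "reserved_mb": total * size_kb // 1024,
    }

# Invented sample; on a real host, read the text from /proc/meminfo instead.
sample = """\
MemTotal:       65843088 kB
HugePages_Total:     128
HugePages_Free:      104
Hugepagesize:       2048 kB
"""
print(hugepage_summary(sample))
```

If "in_use" stays at zero while mining, the miner is probably not getting huge pages despite the reservation.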

I'll be releasing my new Linux miner in the near future, which should help with some of the efficiency problems compared to just doing a plain xmr-stak compile. As for the kernel changes, I suggest trying a CentOS 7 LiveCD first to see if the issue persists - you're going to need a precompiled version of xmr-stak however, the default version of g++ on CentOS 7 is too old. DM me if you need some assistance with that.
Thank you sir. I'm spinning up a CentOS machine now to do some testing, and may take you up on that offer if I run into any walls.
 

onsit

Picked up some R520 and a Compellent SC8000.

The R520s have dual E5-2420s.
And the SC8000 is basically an R720 and came with dual E5-2640s.

Under Ubuntu Server 16.04 minimal with xmrig:
  1. dual E5-2420: 515 H/s @ 120 watts (iDRAC reported), affinity set to all physical cores plus 2 logical (THREADS: 14, cryptonight, av=1, donate=1%, affinity=0x3FFF)
  2. dual E5-2640: 640 H/s @ 200 watts (iDRAC reported), similar setup with 14 threads, 2 of them affined to HT threads.
Haven't bothered to try xmr-stak. But in my experience xmrig seems to do better under linux (XMRig/2.4.4 libuv/1.8.0 gcc/7.1.0)
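
For reference, the affinity value in the list above is a bitmask with one bit per logical CPU. A small sketch of decoding it follows; the helper name is made up, and which logical CPU numbers correspond to physical cores versus hyperthreads depends on how the OS enumerates them, which this does not determine.

```python
def cpus_in_mask(mask):
    """Return the logical CPU indices whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

cpus = cpus_in_mask(0x3FFF)
print(len(cpus), cpus)   # 14 CPUs, indices 0 through 13
```

Fourteen set bits matches the THREADS: 14 line in the xmrig banner quoted above.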

Pretty surprised actually at how well the E5-2420 does; it's a $10 processor. I'm still going to upgrade these to the E5-2450L V1, for purposes other than mining. But the E5-2450L should be able to do 650 H/s @ 120 watts.
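
Putting the figures from this post side by side as hashes per watt, as a quick sketch. The numbers are the ones reported above (the E5-2450L entry is the poster's projection, not a measurement), and the labels are only for display.

```python
def hashes_per_watt(hashrate, watts):
    """Efficiency in H/s per watt of wall power."""
    return hashrate / watts

# (H/s, watts) as reported in the post; the last entry is a projection.
readings = {
    "dual E5-2420": (515, 120),
    "dual E5-2640": (640, 200),
    "dual E5-2450L (projected)": (650, 120),
}

for name, (rate, watts) in readings.items():
    print(f"{name}: {hashes_per_watt(rate, watts):.2f} H/s per watt")
```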
 

eureka

Pretty surprised actually at how good the E5-2420 does, it's a $10 processor. I'm still going to upgrade these to E5-2450L V1 - for other than mining purposes. But E5-2450L should be able to do 650 H/s @ 120 watts.
Can confirm on the 2450L which go for about $25-30. 105-110W is achievable with DDR3L and a decent PSU. Friend of mine has a small collection of em: https://astr.al/u/6150e2c8_2592x1458.jpg
 

Marsh

Moderator
May 12, 2013
2,440
1,269
113
@k0ste

My E3-1220v2 xmrig
[2018-02-04 08:20:39] speed 2.5s/60s/15m 930.6 930.4 930.3 H/s max: 931.6 H/s
[2018-02-04 08:21:39] speed 2.5s/60s/15m 930.5 930.3 930.3 H/s max: 931.6 H/s
[2018-02-04 08:22:39] speed 2.5s/60s/15m 930.1 930.4 930.3 H/s max: 931.6 H/s
[2018-02-04 08:23:39] speed 2.5s/60s/15m 930.9 930.3 930.3 H/s max: 931.6 H/s
[2018-02-04 08:24:39] speed 2.5s/60s/15m 930.7 930.4 930.3 H/s max: 931.6 H/s

cpu threads 4
mining threads 4
algorithm variation 0
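
Log lines like the ones above are easy to pull apart if you want to track how steady the local reading is. This is a sketch against the exact line format shown in this post; other xmrig versions may print the line differently, and the field names in the returned dict are my own.

```python
import re

SPEED_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] speed 2\.5s/60s/15m "
    r"(?P<short>[\d.]+) (?P<mid>[\d.]+) (?P<long>[\d.]+) H/s "
    r"max: (?P<peak>[\d.]+) H/s"
)

def parse_speed_line(line):
    """Return the timestamp and hashrates from one speed line, or None."""
    m = SPEED_RE.search(line)
    if m is None:
        return None
    return {
        "time": m.group("ts"),
        "2.5s": float(m.group("short")),
        "60s": float(m.group("mid")),
        "15m": float(m.group("long")),
        "max": float(m.group("peak")),
    }

log = """\
[2018-02-04 08:20:39] speed 2.5s/60s/15m 930.6 930.4 930.3 H/s max: 931.6 H/s
[2018-02-04 08:21:39] speed 2.5s/60s/15m 930.5 930.3 930.3 H/s max: 931.6 H/s
[2018-02-04 08:22:39] speed 2.5s/60s/15m 930.1 930.4 930.3 H/s max: 931.6 H/s
[2018-02-04 08:23:39] speed 2.5s/60s/15m 930.9 930.3 930.3 H/s max: 931.6 H/s
[2018-02-04 08:24:39] speed 2.5s/60s/15m 930.7 930.4 930.3 H/s max: 931.6 H/s
"""
rows = [r for r in map(parse_speed_line, log.splitlines()) if r]
spread = max(r["2.5s"] for r in rows) - min(r["2.5s"] for r in rows)
print(len(rows), round(spread, 1))   # 5 lines, 0.8 H/s spread in the 2.5s reading
```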
 

Marsh

Sorry, it was mining cryptonight-lite.

Added:
E3 and i5 CPUs produce a better (about 2.5x) Aeon hashrate compared to XMR.
 

onsit

Can confirm on the 2450L which go for about $25-30. 105-110W is achievable with DDR3L and a decent PSU. Friend of mine has a small collection of em: https://astr.al/u/6150e2c8_2592x1458.jpg
Yeah, I have briefly considered picking up a small number of 1U servers ($160 Quanta 1U units with dual-socket E5-2400s).

What server motherboards are those you have pictured? They don't look like Dell or HP motherboards. Intel or Supermicro maybe? The ones to the left look like OEM Intel C602 Socket B2.

My personal favorite is this guy on reddit that put them in military pelican racks. Runs them with Dual E5-2450Ls, and a few graphics cards that will fit. They look to be 1U DL360 G8s.

 

eureka

They're actually no-name OEM boards from Inventec, you can find em on Taobao or Aliexpress and occasionally on eBay. The only types on eBay right now have integrated 10G (no copper ports however) but they're also LGA2011 -

Inventec B800G2/10G A9Drpf-10G Dual Lga2011 Eatx Zt System Board B800GG0 | eBay

Doesn't seem like they support V2 chips unfortunately, but still a good price on a dual socket 2011 board, never mind the 10G port.

The 1356 variant is the B600_1G. I've got a modified BIOS for those too, opens up a handful of power tuning options as well as configures optimized defaults good for mining.

 