Issues with ESXi 6.7/EPYC and NUMA

AlexG

New Member
Sep 21, 2015
Has anyone seen an issue with AMD EPYC CPUs on ESXi where the scheduler never uses the 3rd and 4th NUMA nodes? I've created several VMs with 4 or 8 cores and 10-12GB of RAM (slightly below the size of one NUMA node, which would be 4 cores / 16GB RAM). However, when I run esxtop, the NHN (NUMA home node) column only ever shows 1, 2, or both (both in the case of the FreeNAS VM, which has 24GB of RAM assigned to it).

I've tried to make a VM run on a specific NUMA node by setting these advanced parameters on the VM:

  • cpuid.coresPerSocket = 1
  • numa.vcpu.preferHT = TRUE
  • numa.nodeAffinity = 0 (I also tried 1, 2, and 3, and the esxtop NHN column still always shows 1 or 2)
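
For anyone who wants to reproduce this, here is a minimal sketch of how the three parameters above could be rendered as .vmx-style `key = "value"` lines. The helper names (`numa_pinning_settings`, `render_vmx_settings`) are made up for illustration and are not part of any VMware tool. Whether ESXi actually honours the affinity on EPYC is exactly the open question in this thread.

```python
def numa_pinning_settings(node):
    """Return the advanced parameters from the post for a given NUMA node.

    `node` is the value tried for numa.nodeAffinity (0, 1, 2 or 3 above).
    This is a hypothetical helper, not a VMware API.
    """
    return {
        "cpuid.coresPerSocket": 1,
        "numa.vcpu.preferHT": True,
        "numa.nodeAffinity": node,
    }


def render_vmx_settings(settings):
    """Render settings as .vmx-style lines: key = "value"."""
    lines = []
    for key, value in settings.items():
        # Booleans are written as TRUE/FALSE, matching the list above.
        if isinstance(value, bool):
            value = "TRUE" if value else "FALSE"
        lines.append(f'{key} = "{value}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_vmx_settings(numa_pinning_settings(2)))
```

The generated lines would still have to be added to the VM by hand (advanced configuration parameters or the .vmx file) while the VM is powered off.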

The BIOS seems to be configured correctly, or at least it exposes the NUMA nodes to ESXi 6.7:

[root@esxi:~] esxcli hardware memory get | grep NUMA
NUMA Node Count: 4

The main thing I want to accomplish is to run two VMs with the XMR miner, one on each of the unused nodes, without any performance hit on the VMs running on the other two NUMA nodes.

Current setup:
CPU AMD Epyc 7351P
Mobo Supermicro H11SSL-i
4 x 16GB DDR4 ECC RAM
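
As a rough sanity check on the "4 cores / 16GB per NUMA node" figure, here is a small sketch of the arithmetic. It assumes the 7351P's 16 physical cores and the 64GB of RAM are split evenly across the 4 reported nodes, and it ignores hypervisor overhead and hyperthreads. `min_nodes_for` is a hypothetical helper, not something ESXi exposes, and it does not model how the ESXi scheduler really places VMs.

```python
import math

# Figures from the setup above: EPYC 7351P (16 physical cores),
# 4 x 16GB DIMMs, and 4 NUMA nodes as reported by esxcli.
TOTAL_CORES = 16
TOTAL_RAM_GB = 4 * 16
NUMA_NODES = 4

# Assumption: cores and memory are spread evenly over the nodes.
cores_per_node = TOTAL_CORES // NUMA_NODES
ram_per_node_gb = TOTAL_RAM_GB // NUMA_NODES


def min_nodes_for(vcpus, ram_gb):
    """Fewest nodes whose combined cores and RAM cover the VM."""
    by_cpu = math.ceil(vcpus / cores_per_node)
    by_ram = math.ceil(ram_gb / ram_per_node_gb)
    return max(by_cpu, by_ram)


if __name__ == "__main__":
    print("per node:", cores_per_node, "cores,", ram_per_node_gb, "GB")
    print("4 vCPU / 12GB VM needs nodes:", min_nodes_for(4, 12))
    print("4 vCPU / 24GB VM needs nodes:", min_nodes_for(4, 24))
```

Under these assumptions a 24GB VM cannot fit in one node's memory, which would be consistent with the FreeNAS VM showing two home nodes in esxtop.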


I hope someone can share some info on whether I'm doing something wrong or missing something.
Thanks!
 

sirsquishy

New Member
Aug 6, 2018
I know this is a month-old thread with no replies, but I am seeing the same thing. What I have discovered is that there are two kinds of NUMA being applied here: the CCX compute and the memory controllers. You can scale RAM bandwidth out by giving the VM more RAM than a single NUMA node holds at full allocation (with no swapping), or by adding more vCPUs than a single NUMA node has. A third way is to force NUMA in the VMX and to have enough vCPUs to fill the NUMA requirement. What I have not been able to solve yet is that a VM configured (core-wise) as a single NUMA node will never go beyond one node's worth of memory bandwidth. IMHO, under VMX you should be able to define NUMA based on the vSocket configuration, but that doesn't work either.

On my Dell R7425s, here is the testing I did, which shows scaling at 32GB of RAM (each of my NUMA nodes has 64GB of RAM).

I have posted this info on Reddit and opened cases with Dell and VMware to see if there is any supported way we can:
1. Lower the RAM latency
2. Expand even a 4-core VM to 4 NUMA nodes
r/vmware - ESXi, EPYC, and Memory Scaling.
r/Amd - EPYC and Memory Scaling on ESXi


Testing setup:

Win2012R2 VM with 32GB of ram

AIDA Memory and Cache benchmark 5.92.4300 (using RAM Copy Results and Latency)

Cinebench R15.038_RC184115 multicore test

Both tests are run 8 times; every other result is recorded.

..

Channel Interleaving (32GB ram on VMs)(RAM Copy)

NUMA Forced via VMX

24c/1s/4numa RAM@119/120/123/122 CB-MC2236/2110/2203/2078 (97ns/97ns/100ns/99ns)

24c/1s/0numa RAM@85/85/89/86 CB-MC2238/2340/1863/1997 (139ns/147ns/141ns/141ns)

12c/4s/4numa RAM@66/66/66/66 CB-MC1222/1269/1143/1342 (95ns/96ns/96ns/96ns)

12c/1s/4numa RAM@67/67/66/67 CB-MC1281/1345/1338/1212 (96ns/97ns/96ns/97ns)

12c/1s/0numa RAM@43/45/43/42 CB-MC1224/1329/1021/928 (138ns/126ns/127ns/131ns)

8c/1s/4numa RAM@42/44/42/42 CB-MC837/809/850/854 (96ns/96ns/96ns/95ns)

8c/1s/0numa RAM@40/43/42/46 CB-MC777/775/844/774 (127ns/125ns/131ns/129ns)

..

DIE Interleaving (32GB ram on VMs)(RAM Copy)

NUMA Forced via VMX

24c/1s/4numa RAM@100/84/86/86 CB-MC2274/2280/2041/2102 (143ns/141ns/140ns)

24c/1s/0numa RAM@86/88/85/85 CB-MC2070/2220/2226/2317 (144ns/143ns/142ns/141ns)

12c/1s/4numa RAM@92/93/88/89 CB-MC1320/1258/1331/1333 (139ns/140ns/138ns/140ns)

12c/1s/0numa RAM@52/70/52/72 CB-MC1328/1205/1319/1121 (140ns/139ns/140ns/139ns)

8c/1s/4numa RAM@88/92/52/92 CB-MC894/865/902/834 (139ns/140ns/140ns/139ns)

8c/1s/0numa RAM@42/54/46/51 CB-MC838/769/766/703 (143ns/140ns/142ns/143ns)
..
Testing done on a 4-core VM before this weekend:
4c/1s/0n RAM@28GB/s CB-MC398 CB-SC117 (138.2ns)
4c/1s/2n RAM@33GB/s CB-MC440 CB-SC117 (93.2ns)
4c/2s/4n RAM@37GB/s CB-MC444 CB-SC120
4c/4s/4n RAM@37GB/s CB-MC458 CB-SC120
..

Socket Interleaving (32GB ram on VMs)(RAM Copy)

NUMA Forced via VMX

24c/1s/4numa RAM@88/96/97/95 CB-MC2408/2483/2563/2313 (197ns/198ns/198ns/199ns)

24c/1s/0numa RAM@98/102/95/106 CB-MC2234/2315/2399/2146 (200ns/205ns/200ns/198ns)
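
To make the runs above easier to compare, here is a minimal sketch that parses one result line in the `cores/sockets/numa RAM@... CB-MC... (latency)` layout and averages the recorded values. The function name `parse_result` and the returned field names are made up for illustration. The RAM copy unit is assumed to be GB/s, as in the 4-core results, and the shorter 4-core lines (`0n`, `CB-SC`) use a different layout that this sketch does not handle.

```python
import re
from statistics import mean

# Matches lines such as:
# 24c/1s/4numa RAM@119/120/123/122 CB-MC2236/2110/2203/2078 (97ns/97ns/100ns/99ns)
LINE_RE = re.compile(
    r"(?P<cores>\d+)c/(?P<sockets>\d+)s/(?P<numa>\d+)numa\s+"
    r"RAM@(?P<ram>[\d/]+)\s+"
    r"CB-MC(?P<cb>[\d/]+)\s+"
    r"\((?P<lat>[\dns/]+)\)"
)


def parse_result(line):
    """Parse one benchmark line and return its settings and mean values."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    ram = [int(x) for x in m["ram"].split("/")]
    cb = [int(x) for x in m["cb"].split("/")]
    lat = [int(x) for x in re.findall(r"\d+", m["lat"])]
    return {
        "cores": int(m["cores"]),
        "sockets": int(m["sockets"]),
        "numa": int(m["numa"]),
        "ram_copy_mean": mean(ram),      # assumed GB/s
        "cinebench_mc_mean": mean(cb),   # Cinebench multicore score
        "latency_ns_mean": mean(lat),
    }


if __name__ == "__main__":
    forced = parse_result(
        "24c/1s/4numa RAM@119/120/123/122 "
        "CB-MC2236/2110/2203/2078 (97ns/97ns/100ns/99ns)"
    )
    default = parse_result(
        "24c/1s/0numa RAM@85/85/89/86 "
        "CB-MC2238/2340/1863/1997 (139ns/147ns/141ns/141ns)"
    )
    print(forced)
    print(default)
```

The means are taken over however many values a line contains, so a line with only three latency readings still parses.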
 