[Solved] Missing instruction sets on SPR 8461v E3 EVQS (on X13SEM/X13DEI vs. X13SEI)

Syr

Member
Sep 10, 2017
55
20
8
Resolution/solution edit:

Most of my problems were solved by the known-good replacement chip that Kizune sent.
Two issues remained:
  • The Intel AMX-TMUL sample code appears to need initialization on Windows 10, but there is no documentation on how to do this on Windows 10. Presumably it can be reverse-engineered from the ONNX codebase, which supports AMX, but exploring that was outside the scope of resolving the issues I was encountering. The sample code only has initialization for Linux (which requires kernel version 5.16 or later); on Windows 11 the sample just works (after removing the Linux-specific initialization code) without any problems.
  • Officially, according to Intel, AMX is disabled in VMs. Theoretically (at least according to VMware), it is possible to get it working in guest OSes with specific hypervisors (e.g. ESXi 8.0u1, apparently), but this doesn't seem to be officially supported by Intel.

Original Post:

This is a continuation of this discussion from the Intel ES discussion thread: https://forums.servethehome.com/index.php?threads/es-xeon-discussion.5031/post-394775

Summary of findings so far:
AMX (specifically AMX-Base and AMX-TMUL), VMX, SMX, SGX, and TME instructions should be available on SPR chips. However, support for these supposedly available instructions seems to diverge between 3 users with similar chips and platforms.

Using the same model of SPR stepping E3 EVQS chip sold by Kizune, sam55todd and Kizune were able to get the instructions showing up in HWiNFO64, but I was unable to do so on my chip (they also did not show up in the /proc/cpuinfo flags in Linux). All of us were using Supermicro boards, although sam55todd's X13DEI board was running older microcode. My X13SEM board ran with both older and newer microcode, and neither enabled the missing instructions on my chip.
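For anyone who wants to check the same flags outside of HWiNFO64, here is a minimal C sketch (GCC/Clang on x86 only, since it relies on `<cpuid.h>`) that reads CPUID directly and decodes the bits the Intel SDM assigns to VMX, SMX, SGX, TME and the three AMX features. The struct and function names are my own, not from any library. Also note that under a hypervisor you only see the CPUID values it chooses to expose, so treat a missing bit as "not visible to this OS" rather than proof that the silicon lacks it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <cpuid.h> /* GCC/Clang helper header, x86 only */

typedef struct {
    bool vmx, smx, sgx, tme, amx_bf16, amx_tile, amx_int8;
} cpu_features;

/* Decode raw CPUID register values (bit positions per the Intel SDM, Vol. 2A). */
static cpu_features decode_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx,
                                    uint32_t leaf7_ecx, uint32_t leaf7_edx)
{
    cpu_features f;
    f.vmx      = (leaf1_ecx >> 5) & 1u;   /* CPUID.1:ECX[5]      */
    f.smx      = (leaf1_ecx >> 6) & 1u;   /* CPUID.1:ECX[6]      */
    f.sgx      = (leaf7_ebx >> 2) & 1u;   /* CPUID.(7,0):EBX[2]  */
    f.tme      = (leaf7_ecx >> 13) & 1u;  /* CPUID.(7,0):ECX[13] */
    f.amx_bf16 = (leaf7_edx >> 22) & 1u;  /* CPUID.(7,0):EDX[22] */
    f.amx_tile = (leaf7_edx >> 24) & 1u;  /* CPUID.(7,0):EDX[24] */
    f.amx_int8 = (leaf7_edx >> 25) & 1u;  /* CPUID.(7,0):EDX[25] */
    return f;
}

/* Read the flags as this OS sees them (a hypervisor may filter them). */
static bool read_features(cpu_features *out)
{
    unsigned int a, b, c1, d, b7, c7, d7;
    if (!__get_cpuid(1, &a, &b, &c1, &d))
        return false;
    if (!__get_cpuid_count(7, 0, &a, &b7, &c7, &d7))
        return false;
    *out = decode_features(c1, b7, c7, d7);
    return true;
}

static void print_features(const cpu_features *f)
{
    printf("VMX=%d SMX=%d SGX=%d TME=%d AMX-BF16=%d AMX-TILE=%d AMX-INT8=%d\n",
           f->vmx, f->smx, f->sgx, f->tme, f->amx_bf16, f->amx_tile, f->amx_int8);
}
```

On Linux, the result should roughly line up with the `vmx`, `smx`, `sgx`, `tme`, `amx_bf16`, `amx_tile` and `amx_int8` entries in /proc/cpuinfo.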

[Image: annotated image compilation from sam55todd]

Despite missing the flag for the AMX instructions, my chip did have the AMX-TMUL clocks fused, and they were fused identically to the chips that both sam55todd and Kizune have.

I was eventually able to get one of the 5 instructions to show up, but unlike sam55todd, who had AMX showing up, mine was SGX, which I enabled through the BIOS. This also suggests that HWiNFO64 has a bug: it should display 'supported but disabled' instructions in red, but it seems to just display them greyed out (unsupported) instead. Despite also enabling TME and SMX through their respective options in the BIOS on the latest firmware, they still showed as unsupported. There were no visible options in my BIOS to enable either VMX or AMX.

Things I have tried:
* Rebooting
* Updating the BIOS from 1.1 to 1.4 (current latest)
* Resetting the BIOS settings (on both versions 1.1 and 1.4)
* Installing the chipset drivers
* Updating the OSes (Ubuntu 22.04 LTS and Windows 10 22H2)
* Enabling options in the BIOS (this got SGX working, but not VMX (option enabled by default, but not working), SMX (enabled but not working), TME (enabled but not working) or AMX (no option))

Things I am trying/will be trying over the next few days:
* A known-working CPU from Kizune (to confirm that this is not a board problem)
* Making sure that the CPU is evenly mounted in the socket
* Windows 11 (if I can get it to install. For some reason, during the 'preparing files for installation' step it bugs out and drops all the NVMe and SATA drives off their buses. This doesn't cause the system to reboot or crash, but it's impossible to continue the installation from that point. The Windows 10 installer did not have this problem, and I had not observed it in normal use in either Linux or Windows. I haven't really had a chance to debug this yet, since I've been busy today with work and other things.)
 

Syr

Oh, oops, I missed the option for Intel Virtualization Technology (Supermicro's label for VT-x) because it was already set to [Enabled] by default. So the option is there; it's just not working. In case there was a UI bug, I tried toggling it, but that changed nothing. (Updated the main post.)

I also double-checked the other options that are [Enabled] by default, but there was still no mention of AMX or TMUL.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,426
1,639
113
Weird. This is the earliest SPR-SP ES I have, with BIOS settings at default.
[Image: HWiNFO screenshot (QY06_HWinfo.jpg)]
 

Syr

Quote: "and you can't disable AVX512 functions. there is no option."
Yeah, so I was thinking that there shouldn't be a selectable option for AMX either. Like AVX512, I would expect it to just be always-on (as it is for everyone else's chips), except for some reason it's not for my chip. I was just checking for one to be sure.
 

sam55todd

Active Member
May 11, 2023
115
28
28
My take on this:

VMX (Virtual Machine Extensions) - IMO this is better to have on a Windows 11 system, given the way the OS operates (containers, etc.) and developer-level user needs.
SMX (Safer Mode Extensions) - can survive without it as a home user, so don't care.
SGX (Software Guard Extensions) - don't care, home test computer anyway (single user), no need for extra protection.
TME (Total Memory Encryption) - don't care, home test computer anyway (single user), no need for extra protection.
AMX (Advanced Matrix Extensions) - must have for the performance boost, which was the reason to upgrade to LGA4677 in the first place (at least for my analytics/development needs).
SST (Speed Shift Technology) - nice to have, but my 8461V already has acceptably low-power P-states (and my PC isn't vPro-level). RolloZ170 has an Asus motherboard, so it being enabled there is not unusual.

Since I'm running CPUs on a single-user machine and not a multi-tenant enterprise/cloud system, it would be great to even have an option for microcode without all those security/isolation mitigations and CVE/CWE advisory "fixes" (a trade-off leading to degraded performance and higher power usage).


I do need at least one of the QAT/IAA accelerators as a mandatory part of my work requirements, and DSA is desirable, while DLB is just nice to have.

DSA - Data Streaming Accelerator
QAT - QuickAssist Technology Accelerator
DLB - Dynamic Load Balancer
IAA (or IAX) - In-Memory Analytics Accelerator
 

Syr

Ok, I believe I've figured out what is going on with VMX.

sam55todd, are you running any hypervisors (e.g. Hyper-V, VirtualBox)? It sounds like you have already set one up.

It turns out that once a hypervisor has already 'grabbed' VT-x/AMD-V, the VMX/AMD-V flag is hidden and other hypervisors are locked out of the instructions; the first hypervisor to grab it functions correctly. (With Hyper-V, Windows itself runs on top of the hypervisor, so the OS and tools like HWiNFO only see the CPUID flags that Hyper-V chooses to expose.) On Windows, Hyper-V was running (not sure whether by default or from a setup script I wrote years ago), and I'm pretty sure my equivalent Linux script was installing the KVM hypervisor, but I'd need to go back and check. I checked Hyper-V and it was working just fine despite VMX being greyed out, so I'm pretty sure this is the case.

Since SMX, SGX, and TME all have VM-related functionality, I would not be surprised if they were also "disabled" by a running hypervisor in the same way as VMX. I'll test this later, since doing so means disabling Hyper-V and the other Windows virtualization features.

Unresolved mysteries:
* AMX is missing, even though that has nothing to do with VMs and it's working (presumably with a hypervisor) for sam55todd
* Windows 11 installer causing my drives to fall off the bus
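To check the "a hypervisor is already running" theory without going through HWiNFO, a small sketch like the following looks at CPUID.1:ECX bit 31 (the bit hypervisors set to announce themselves) and, if it is set, reads the 12-byte vendor signature from leaf 0x40000000 (e.g. "Microsoft Hv" for Hyper-V, "KVMKVMKVM" for KVM). The function names are illustrative, and the byte unpacking assumes x86's little-endian layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <cpuid.h> /* GCC/Clang, x86 only */

/* CPUID.1:ECX[31] is reserved for hypervisors to signal that they are present. */
static bool hypervisor_bit(uint32_t leaf1_ecx)
{
    return (leaf1_ecx >> 31) & 1u;
}

/* Leaf 0x40000000 returns a 12-byte vendor signature in EBX, ECX, EDX. */
static void hypervisor_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    memcpy(out + 0, &ebx, 4); /* little-endian: first 4 characters */
    memcpy(out + 4, &ecx, 4);
    memcpy(out + 8, &edx, 4);
    out[12] = '\0';
}

/* Returns true and fills vendor[] if this OS is running under a hypervisor. */
static bool detect_hypervisor(char vendor[13])
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !hypervisor_bit(c))
        return false;
    /* __get_cpuid() rejects leaves above the basic maximum, so use the raw macro. */
    __cpuid(0x40000000, a, b, c, d);
    hypervisor_vendor(b, c, d, vendor);
    return true;
}
```

If this reports "Microsoft Hv" on a bare-metal Windows install, Hyper-V (or something that pulls it in) is active, which fits VMX being greyed out.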
 

RolloZ170

You have NOT done a fresh Windows install with the current Intel INF (chipset driver) installation?
No wonder.
Quote: "* Windows 11 installer causing my drives to fall off the bus"
That's not normal. RMA the board?
 

Syr

No, the Windows 10 and Ubuntu 22.04 installs were fresh on this hardware, but I ran some setup scripts on them immediately after installing, which automate a lot of first-time setup.

In fact, the drives falling off the bus with Windows 11 took me by surprise, because the Windows 10 installer worked just fine. It could be an issue the installer has with one or more drives, the board itself, or the CPU. I haven't investigated it further, but I'll look into it tonight or tomorrow.
 

bayleyw

Active Member
Jan 8, 2014
306
102
43
What's your Ubuntu setup? I'd be willing to believe that Hyper-V on Windows botched something: enabling Hyper-V triggers some rather deep changes to the OS to convert it to a Type-1 hypervisor setup similar to Xen. In particular, it's unclear whether Hyper-V supports AMX, so maybe the kernel hides the instruction availability from programs.

If you have a Xen kernel or similar installed on Ubuntu, I'd also bet it is obscuring the extensions.
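Related to that: whether the CPU reports AMX and whether the OS has actually enabled it are two separate things. The OS opts in by setting bits 17 (XTILECFG) and 18 (XTILEDATA) of XCR0, which user code can read with XGETBV once CPUID reports OSXSAVE; a hypervisor that doesn't expose AMX would leave its guest unable to set them. A minimal sketch (GCC/Clang inline asm, x86-64; the names are my own):

```c
#include <stdbool.h>
#include <stdint.h>
#include <cpuid.h> /* GCC/Clang, x86 only */

#define XCR0_XTILECFG  (1ull << 17) /* AMX tile configuration state */
#define XCR0_XTILEDATA (1ull << 18) /* AMX tile data state          */

/* Pure check on an XCR0 value: both AMX state components must be enabled. */
static bool xcr0_allows_amx(uint64_t xcr0)
{
    const uint64_t need = XCR0_XTILECFG | XCR0_XTILEDATA;
    return (xcr0 & need) == need;
}

/* XGETBV is only legal when CPUID.1:ECX.OSXSAVE[bit 27] is set. */
static bool os_enabled_amx(void)
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !((c >> 27) & 1u))
        return false;
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return xcr0_allows_amx(((uint64_t)hi << 32) | lo);
}
```

Caveat: on Linux, the kernel additionally traps the first use of tile data until the process requests permission (the `set_tiledata_use()` call in the Intel sample), so XCR0 alone isn't the whole story. I don't know what Windows does at that step.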
 

Syr

Oh, excellent points, bayleyw. To be honest, I'm not 100% certain what I updated (and didn't) when I last updated the install scripts, and some of what's in there is untouched and dates back to 2016. Those parts contain a bunch of 'build & install stuff manually from source off GitHub' steps, due to Ubuntu 16.04 LTS missing packages I use that were present in 14.04, or installing different versions of programs than I wanted. So in retrospect, maybe not the best setup to be debugging this in, but on the other hand, I didn't expect to run into this problem. Also in retrospect, I need to get these scripts into source control rather than leaving them sitting on my Nextcloud server...

So given that uncertainty, and RolloZ170's comment about fresh installs, I've decided it's probably worth trying completely fresh installs without my old setup scripts (I have nothing of value on these drives, so I have no qualms about nuking and paving them), and instead doing only the minimum installation needed for my testing.



Would it be possible for someone with a known-good (displays AMX as supported) Sapphire Rapids chip running Windows to do an experiment for me? When I do this, I get an illegal instruction crash, but I'm not sure whether that's because my attempt to make a Windows executable of the test-amxtile program failed, or because the chip actually has a problem.

What you need to do is the following (feel free to correct this if you find errors):
  1. Install MSYS2 (MinGW64) and a text editor/IDE of your choice
  2. Add the mingw64\bin dir to the %PATH% environment variable (e.g. C:\msys64\mingw64\bin in the default install location)
  3. Start up the MSYS2 MinGW64 terminal
  4. Run: pacman -S gcc git make
  5. Run: gcc --version
    1. Just make sure it is at least 12.x for definite AMX compilation compatibility; for me it installed 13.x
  6. Run: git clone https://github.com/intel/AMX-TMUL-Code-Samples.git
  7. Run: cd AMX-TMUL-Code-Samples/src
  8. Run: explorer .
    1. Copy the directory path from Explorer
    2. Open up cmd and cd to that path
    3. Open up Makefile with your editor; you need to force the sapphirerapids architecture target
      1. Change -march=native to -march=sapphirerapids
    4. Open up test-amxtile.c; you need to remove everything related to syscall.h, since it doesn't work on Windows
      1. Remove the line: #include <sys/syscall.h>
      2. Remove the entire function: set_tiledata_use()
        1. Instead of relying on the reported flags and print statements to indicate whether it works, we will simply rely on whether the program crashes
      3. Remove the if block: if (!set_tiledata_use())
  9. Run: make
  10. Switch to the cmd window opened in step 8.2
  11. In cmd*, run: test-amxtile.exe
    1. In theory, if AMX is working correctly the program should just execute and exit with no errors [Correction: I double-checked the code and there are some print statements of the input and result buffers]
    2. Otherwise you will get an illegal instruction exception, which is what I am getting
    3. It's possible that if there's a problem with this method, you might get the illegal instruction crash anyway, which is why I'm asking if anyone with a known-good chip can give this a shot. Even better if you can test it with and without Hyper-V (or another hypervisor) and tell me whether there is any difference in the instruction sets HWiNFO reports between the hypervisor being on and off.
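Instead of deleting `#include <sys/syscall.h>`, `set_tiledata_use()` and the `if (!set_tiledata_use())` check, an alternative is to keep them behind a platform guard so the same file builds on both Linux and Windows. This is a sketch, not Intel's code: the Linux branch requests permission with `arch_prctl(ARCH_REQ_XCOMP_PERM, 18)` (kernel 5.16+) and reads the permitted mask back with `ARCH_GET_XCOMP_PERM`; the Windows branch is a hypothetical no-op that simply assumes the OS manages the tile state itself, which is exactly the undocumented part.

```c
#define _GNU_SOURCE /* for syscall() on glibc */
#include <stdbool.h>
#include <stdio.h>

#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the Linux kernel's x86 XSTATE documentation. */
#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18

/* Ask the kernel for permission to use AMX tile data in this process. */
static bool set_tiledata_use(void)
{
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0) {
        perror("arch_prctl(ARCH_REQ_XCOMP_PERM)");
        return false;
    }
    /* Read back the permitted feature mask and confirm bit 18 is set. */
    unsigned long mask = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &mask) != 0)
        return false;
    return (mask >> XFEATURE_XTILEDATA) & 1ul;
}
#else
/* Hypothetical Windows stub: no documented opt-in call was found, so this
   assumes the OS allocates and saves the AMX state on its own. */
static bool set_tiledata_use(void)
{
    return true;
}
#endif
```

With this, the existing `if (!set_tiledata_use())` check in the sample can stay as-is on both platforms.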
If nobody gets a chance to, that's probably fine: the known-good chip from Kizune should arrive at some point tomorrow, so hopefully I can put the whole question of whether this is a chip, board, or software problem to rest soon.


Anyway, I'm going to try a few more things to see if I can reproduce this on a clean Win10 install, then get Win11 installed, and then run this test on it if I succeed before I need to get some sleep. I probably won't have a chance to look at the Linux install tonight, since it's getting rather late.

*[Edited note]: If it does not run in cmd, try it in MSYS2. I eventually got Win11 running (more details in a new post) and found that it complains about a missing MSYS DLL if the executable is run in cmd, whereas on Win10 it was having issues running in MSYS2.
 

Syr

Ok so some updates:

I got Win11 installed, but I had to disable all drives in the BIOS except the one I was installing to. Not sure why, but this got it to install without dropping the drives off the PCIe bus.

I added some amendments to the instructions due to differences in the process on Win10 vs. Win11.

On Win11 (so far only tested without Hyper-V) I was able to get it to execute the Int8 TMUL instruction (it only worked when running in MSYS2; otherwise it refused to run due to a missing MSYS DLL), but the first time I got a corrupt output. I ran it several more times to see if it would happen again but only got the correct result, except once where it just exited. Not sure what's going on there, but I suspect either the board has flaky power delivery and the AMX multiplication draws too much power at once, or one or two cores have flaky units. I guess I'll see tomorrow with the comparison CPU. I'll also need to try it with Hyper-V enabled tomorrow, but I'm off to sleep.
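Eyeballing the printed buffers makes it easy to miss a corrupt run. A scalar reference for the signed INT8 tile dot-product (TDPBSSD) makes the check automatic: compute the expected result with ordinary integer code and count mismatches against the AMX output. This sketch assumes dense row-major buffers, with B already in the packed 4-bytes-per-dword layout AMX expects; the function names are hypothetical and not part of the Intel sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for TDPBSSD (signed x signed INT8):
   C[m][n] += sum over k and i of A[m][4k+i] * B[k][4n+i]
   A: M x (4*K) int8, B: K x (4*N) int8 (packed), C: M x N int32. */
static void ref_tdpbssd(size_t M, size_t N, size_t K,
                        const int8_t *A, const int8_t *B, int32_t *C)
{
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n) {
            int32_t acc = C[m * N + n];
            for (size_t k = 0; k < K; ++k)
                for (size_t i = 0; i < 4; ++i)
                    acc += (int32_t)A[m * 4 * K + 4 * k + i] *
                           (int32_t)B[k * 4 * N + 4 * n + i];
            C[m * N + n] = acc;
        }
}

/* Number of elements where the AMX result differs from the reference. */
static size_t count_mismatches(const int32_t *got, const int32_t *want, size_t count)
{
    size_t bad = 0;
    for (size_t j = 0; j < count; ++j)
        bad += (got[j] != want[j]);
    return bad;
}
```

With int8 inputs and K up to 16 (a full 64-byte tile row), each sum stays far inside the int32 range, so the reference itself cannot overflow. Running the comparison after every run would show whether the corruption is always the first run or something less predictable.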
 

bayleyw

Quote: "set_tiledata_use()
but this function requests permission to use AMX. If you remove it, AMX is not allowed, though."
That would explain the incorrect results, I think. There is zero documentation for AMX on Windows, but on Linux the permission request tells the kernel to reserve a huge buffer for the processor's internal state. If Windows isn't allocating the right amount of state, the program will fail depending on whether it was preempted partway through by a different thread.
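For a sense of how big that buffer is: palette 1 defines 8 tiles of up to 16 rows × 64 bytes each, so the tile data alone is 8 × 16 × 64 = 8192 bytes (8 KiB) of extra register state per thread that the OS has to save and restore. The CPU also reports the size directly via CPUID leaf 0xD, sub-leaf 18 (the XTILEDATA state component). A small sketch comparing the two (helper names are illustrative):

```c
#include <stdint.h>
#include <cpuid.h> /* GCC/Clang, x86 only */

/* Palette 1 geometry: 8 tiles, each up to 16 rows x 64 bytes. */
enum { AMX_TILES = 8, AMX_MAX_ROWS = 16, AMX_MAX_COLSB = 64 };

/* Tile register file size implied by that geometry (8192 bytes). */
static uint32_t tiledata_bytes_from_palette(void)
{
    return (uint32_t)AMX_TILES * AMX_MAX_ROWS * AMX_MAX_COLSB;
}

/* CPUID.(EAX=0xD, ECX=18):EAX is the XSAVE size of state component 18
   (XTILEDATA). Returns 0 if the leaf or component is unsupported. */
static uint32_t tiledata_bytes_from_cpuid(void)
{
    unsigned int a, b, c, d;
    if (!__get_cpuid_count(0xD, 18, &a, &b, &c, &d))
        return 0;
    return a;
}
```

On a chip (and OS/hypervisor) that exposes AMX, the CPUID value should match the 8192 computed from the palette; a 0 means the component isn't visible at all.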
 

Syr

For Linux, I could find documentation saying that you need kernel version 5.16 or later to support the large buffer, but for Windows there was nothing beyond vague statements from Microsoft and Intel saying "it's supported" on Win10 and Win11, plus code that works (ONNX, OpenVINO). It would take significantly more effort to strip out just the AMX code from ONNX or OpenVINO, without all the AVX fallbacks, and create a working demo.
What's still interesting, though, is that at least on Win11 it "mostly worked" (at least without Hyper-V), and that was with each run being its own process.
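Since the Linux side hinges on the 5.16 cut-off, a quick check of the running kernel can rule that out before debugging anything else. A minimal sketch using POSIX `uname()`; the parsing helper is my own and only looks at the major.minor prefix of the release string. (For what it's worth, Ubuntu 22.04 LTS originally shipped with a 5.15 kernel, so it depends on whether the install picked up a newer HWE kernel.)

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h> /* POSIX: Linux/macOS, not Windows */

/* Compare the "major.minor" prefix of a release string such as
   "5.15.0-86-generic" against a required minimum version. */
static bool kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return false;
    return major > want_major || (major == want_major && minor >= want_minor);
}

/* AMX permission requests (ARCH_REQ_XCOMP_PERM) need Linux 5.16 or later. */
static bool running_kernel_supports_amx_perm(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return false;
    return kernel_at_least(u.release, 5, 16);
}
```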

The other chip should be in soon, possibly before I get out of a bunch of meetings this morning, so I might be able to swap it in once I have the time and see whether it behaves any differently.
 

RolloZ170

Quote: "The other chip should be in soon, possibly before I get out of a bunch of meetings this morning, so I might be able to just swap that in once I have the time and can see if it behaves any differently."
Defective cores should be automatically disabled. I doubt it is the CPU.
 

Syr

Quick update from a low-effort test I'm running during meetings:
Rebooting the computer and rerunning the program reliably reproduces a corrupted output on the first attempt every time for me, so I think it is an issue with first-time buffer initialization in the cut-down demo code, at least when running in the MSYS2 MinGW64 environment on Win11. It also only happens once per reboot (even if I spin up separate MSYS2 terminals), so the Win11 kernel seems to handle it appropriately afterwards.
No explanation yet for the run that just exited during my first round of testing.