Problem with GA-MZ31-AR0, the first single-socket AMD EPYC motherboard

Bytales

New Member
Jan 31, 2018
24
5
3
35
I have decided to register here and post my issue, since I have been struggling with it for a couple of days and I can't seem to pinpoint the culprit.
I decided to change my desktop PC from an ASRock EP2C612 WS with dual E5-2690 v3 Xeons (two 12-core CPUs) to a single 32-core AMD EPYC CPU on the Gigabyte MZ31-AR0 motherboard.
The link for the Motherboard description page is here:
MZ31-AR0 (rev. 1.0) | Server Motherboard - GIGABYTE B2B Service

Together with this motherboard and CPU, I have two watercooled Radeon Vega Frontier Edition cards, 4x 16 GB Samsung memory sticks at 2133 MHz, and the following drives and PCI Express expansion cards:
1) Aquacomputer PCI Express to M.2 card, holding a 512 GB Samsung 950 Pro
2) 256 GB Samsung 960 EVO installed in the motherboard's M.2 slot; both drives are NVMe, PCI Express 3.0 x4
3) A Sonnet USB 3.0 card with Fresco chips, each of its 4 USB ports tied to its own PCI Express x1 link; the card itself is PCI Express 2.0 x4
4) A Gigabyte WB867D-I WLAN + Bluetooth 4.0 adapter, 802.11ac, PCIe x1

Also connected are two ASUS ROG Horus keyboards and two Roccat Nyth mice.
In addition, four Aquacomputer Hubby hubs are connected, through a series of links, to a USB 2.0 port on the motherboard.

For the purpose of trying to boot successfully, I connected only one mouse, one keyboard, and one monitor, on the first video card.

The problem I am having is that whether I try to boot from a Windows 10 USB installation stick or into the Windows installed from that stick, I get "WHEA INTERNAL ERROR".
Sometimes I manage to boot from the stick and install Windows on the 256 GB SSD, and I get into Windows, but after installing the video drivers and restarting I end up in a never-ending loop of "WHEA INTERNAL ERROR" and can never get into Windows again.

I tried asking Gigabyte for an answer to the problem, but I still have no idea why it won't work.

BIOS settings I have tried:
1) Setting the BIOS to Legacy instead of UEFI gave me no image at all, and I had to take out the battery (the image was probably being output on the onboard video card), so I'm not keen on trying this again.
2) Enabling/disabling the USB settings (Legacy, XHCI hand-off, etc.) doesn't help; sometimes it boots, sometimes it doesn't.
3) 4G decoding enabled or disabled: no difference.
4) Setting the onboard video card to "external" probably disables the internal video card, but if and when I do get into Windows, the internal video chip is still shown in Devices.

Further ideas I still need to try:
1) Removing the USB 2.0 connector from the motherboard and trying without it
2) Trying without the PCI Express USB card
3) Trying with only the video cards inserted in the PCI Express slots
4) Trying to install Windows Server 2016, since the support/drivers download page does not list Windows 10 as a supported operating system
5) Trying the regular Adrenalin driver or the professional driver for the video cards once Windows does start, instead of the blockchain driver, though I don't see how that would help; I don't see how the AMD blockchain driver could cause such errors.

Seeing as this website is where people with server board knowledge gather, I am posting my problem here in the hope of getting some hints and ideas on what I might be doing wrong. I would really like to keep my brand spanking new 32-core CPU, and to go with it there is no other motherboard except this Gigabyte model.

I will probably also try installing Windows Server 2016 from a CD, but I am still waiting to receive the SlimSAS to SATA cable, since the motherboard doesn't have normal SATA connectors.

I have never had such big problems trying to make a system work, and I have been through a lot of computer tinkering in my life.
 

pyro_

Active Member
Oct 4, 2013
742
162
43
Try to strip it down to the bare minimum you need to boot it and install Windows on it: minimum RAM, at most one video card, one M.2 drive, no extra PCIe cards if you can avoid it, etc. If you can get things set up like that, then start adding the extra cards, drives and memory back in; that should at least help you narrow down the issue.

Edit: just noticed that this board has IPMI, so don't even bother installing a video card; use the IPMI to get Windows set up.
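
The add-one-thing-at-a-time approach can be sketched as a small loop. This is only an illustration: `boots_ok` stands in for the manual step of assembling the parts and trying to boot, and the function name and part names are placeholders, not a tested procedure for this board.

```python
from typing import Callable, List, Optional


def first_failing_part(base: List[str], extras: List[str],
                       boots_ok: Callable[[List[str]], bool]) -> Optional[str]:
    """Add extras to a known-good base one at a time; return the first part
    whose addition makes the boot check fail, or None if all of them pass."""
    installed = list(base)
    if not boots_ok(installed):
        raise ValueError("the minimal configuration itself does not boot")
    for part in extras:
        installed.append(part)
        if not boots_ok(installed):
            return part
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a real boot attempt: pretend the USB card is the culprit.
    def fake_boot_check(parts: List[str]) -> bool:
        return "usb card" not in parts

    base = ["cpu", "one dimm", "nvme ssd"]
    extras = ["second gpu", "usb card", "wlan card"]
    print(first_failing_part(base, extras, fake_boot_check))
```

Adding parts back one by one is slower than halving the set each time, but it matches how hardware actually gets reinstalled and keeps every step reversible.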
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,966
4,931
113
Our test MZ31-AR0 is in a 1U server so we have not been able to test with additional GPUs yet.

If I saw this, I would do what @pyro_ says and try without the GPUs.

My next step is that I would try a different video driver. One possible option that we had to do in the early days of EPYC and Denverton was to install OSes on other systems, then swap the drive to their final machines.

I agree that this should not happen, but it would not be the first AMD GPU driver bug of this type.

It is also why I generally suggest Threadripper over EPYC for desktop users.
 

Bytales

Hmmm. I could try to install Windows with nothing but the CPU, RAM, mouse, keyboard and the NVMe SSD for Windows connected, then gradually add things and see what happens.
What do you mean by IPMI? I know that with IPMI one can access the motherboard if its IPMI network port is connected to the network, but I never knew one could install Windows through it.

Also, I would like to get Windows 10 working if possible. I have zero experience with Windows Server 2016, and I would need the Standard edition, since the Essentials edition supports only 64 GB of RAM and I was planning on filling up the RAM slots. Besides, Windows Server 2016 Standard has a per-core licence structure, and for 32 cores it would cost a ton just to have the operating system going.

I don't plan on doing any "server" stuff, just regular Windows desktop use: gaming, mining, editing, AutoCAD, internet, etc.
And I would need the video cards to work.
What I would like to set up is a "hot seat" scenario: I have two mice, two keyboards, two video cards and two monitors, and I would like my friend and me to play the same game over the network on the same PC, with half of the resources given to each player. Sort of like splitting the PC in two. I was hoping to set this up in Windows 10.

I have no idea whether such a scenario is possible using Windows virtualization, or whether some additional Microsoft program might be needed for it.

In the end I just want the damn system to work as it should. I don't get these WHEA internal errors. Let's hope it's something trivial, like the connection to the Aquacomputer USB 2.0 hubs. Damn, so much pain in the ass just to make it work.

 

Bytales

I have made a breakthrough.
I have noticed that hot restarts trigger the WHEA internal error, whereas cold restarts, meaning restarts done by shutting the computer down and powering it back on, don't trigger the error.
So basically, if I need to restart after some install, I do a shutdown instead.

What could the difference be between a hot and a cold restart, at the hardware level, that would cause such a nasty error to appear?
This is a most interesting find.

For instance, if I need to restart into safe mode, I do a Shift+Restart and select safe mode, but that is a hot restart and it won't work. So after issuing the restart-into-safe-mode command, I force the PC off at the boot screen. Now we have a cold restart in between, and after starting the PC it goes into safe mode as planned, without triggering the error.

If we aren't able to find the cause of this error, I guess doing a cold restart instead of a hot restart is a small price to pay for everything to work as intended.
 

KC@Gadgetblues

New Member
Sep 4, 2017
29
15
3
43
There is an issue with AMD IOMMU compatibility with Windows which was fixed in a Microsoft cumulative update. This generally causes boot failures with stop 0x5C but it could also be causing your 0x122 through general memory corruption. Normally you would work around this by disabling IOMMU in the BIOS, but the Gigabyte board doesn't seem to have a BIOS option to disable it. Therefore you need to either A) add the latest WS2016 cumulative update to your install media with DISM (both install.wim and boot.wim) or B) install on another machine, use Windows Update to update to the latest patch level, then move the drive to the Gigabyte.

The latest CU as of this post is KB4057142 (January) but you can use any CU June 2017 or later. Microsoft Update Catalog
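
Option A can be sketched roughly as follows. This is a minimal sketch rather than a tested procedure: it only builds the DISM command lines (mount, add package, unmount with commit) for one image index and prints them instead of running them. The paths, the .msu file name, the index list and the helper name `dism_commands` are hypothetical placeholders; the indexes actually present in boot.wim and install.wim depend on the media.

```python
from typing import List


def dism_commands(wim_path: str, index: int, mount_dir: str,
                  package_path: str) -> List[List[str]]:
    """Build the three DISM calls that service one index of a .wim offline."""
    return [
        # 1. Mount the chosen image index into an empty directory.
        ["dism", "/Mount-Image", f"/ImageFile:{wim_path}",
         f"/Index:{index}", f"/MountDir:{mount_dir}"],
        # 2. Add the update package to the mounted (offline) image.
        ["dism", f"/Image:{mount_dir}", "/Add-Package",
         f"/PackagePath:{package_path}"],
        # 3. Unmount and write the changes back into the .wim.
        ["dism", "/Unmount-Image", f"/MountDir:{mount_dir}", "/Commit"],
    ]


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever the media and update were copied.
    package = r"C:\updates\cumulative-update.msu"
    targets = [
        (r"C:\media\sources\boot.wim", 1),
        (r"C:\media\sources\boot.wim", 2),
        (r"C:\media\sources\install.wim", 1),
    ]
    for wim, index in targets:
        for cmd in dism_commands(wim, index, r"C:\mount", package):
            print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before running anything from an elevated prompt against a writable copy of the install media.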
 

Bytales

What in God's name is IOMMU?
I have Windows 10 now. Can I install this update on Windows 10, or is it a Windows Server update?
Later edit: I clicked the link, and there is a package for Windows 10.

It is worth trying out.
How can I see the stop code I am getting? By scanning the QR code displayed on the crash screen?
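
For reference, the numeric stop codes that come up in this thread correspond to the following bug check names. A minimal sketch: the table covers only these few codes, and the helper name `describe_stop_code` is made up for illustration.

```python
# Bug check names for the stop codes mentioned in this thread.
STOP_CODES = {
    0x5C: "HAL_INITIALIZATION_FAILED",
    0x9C: "MACHINE_CHECK_EXCEPTION",
    0x122: "WHEA_INTERNAL_ERROR",
}


def describe_stop_code(code: int) -> str:
    """Format a stop code as hex together with its name, if it is in the table."""
    name = STOP_CODES.get(code, "not in this small table")
    return f"0x{code:X}: {name}"


if __name__ == "__main__":
    for code in (0x5C, 0x9C, 0x122):
        print(describe_stop_code(code))
```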
 

Bytales

What I could easily do is install the update package from the link provided when I get home. However, I do not know what you mean by point A):
adding the cumulative update to my install media with DISM (both install.wim and boot.wim). Can you please explain what this really means?
 

KC@Gadgetblues

Windows 10 is not officially supported by Gigabyte so I don't recommend it, but it will run unofficially. If you want to use Windows 10, you can just get the current version from the Microsoft Media Creation Tool (Download Windows 10) which will include a recent cumulative update pre-installed.

As for Server 2016, if you don't know what DISM is, then you'd probably be better off installing on another system and moving the drive over to the Gigabyte. Otherwise if you want to learn the complexities of DISM you can read here (replace "ZDP" mentally with "latest cumulative update KB" and "Windows 10" with "Windows Server 2016" since the procedure is the same): Add updates to customized Windows images

What are you doing with this PC? It's a terrible desktop unless you want to run Blender or similar.
 

Bytales

I am contacting a Gigabyte representative and pointing them to this forum thread; perhaps we can get some answers.
For now I have updated to the latest BIOS, and I haven't gotten to test the updates yet. Is there a way to see whether the said update is already installed?
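
One way to check, sketched under assumptions: list the installed hotfixes (for example with PowerShell's `Get-HotFix`, or `wmic qfe get HotFixID`) and look for the KB number. The helper below only parses text that has already been captured; the function names and the sample listing are illustrative, not real output from this machine. Cumulative updates supersede each other, so a later CU would show up under a different KB number than KB4057142.

```python
import re
from typing import Set


def installed_kbs(hotfix_listing: str) -> Set[str]:
    """Extract KB identifiers (e.g. 'KB4057142') from a hotfix listing."""
    matches = re.findall(r"\bKB\d+\b", hotfix_listing, flags=re.IGNORECASE)
    return {m.upper() for m in matches}


def has_update(hotfix_listing: str, kb: str) -> bool:
    """True if the given KB number appears in the listing."""
    return kb.upper() in installed_kbs(hotfix_listing)


if __name__ == "__main__":
    # Made-up sample listing, one hotfix ID per line.
    sample = "HotFixID\nKB4057142\nKB0000001\n"
    print(has_update(sample, "KB4057142"))
```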
 

Bytales

Can I ask, what happened in the end to this build? Did it get resolved and working? I am thinking of getting this motherboard and using it with Windows 10, hence I just wanted a heads up on any potential problems/issues.

I previously wrote a posting asking about this Mobo:

https://forums.servethehome.com/ind...p-gigabyte-mz31-ar0-7-gpus.17962/#post-173057
I updated to the latest BIOS a couple of months ago, and the problem got fixed.
I am not getting any BSOD errors at all. Everything works perfectly. You can get this motherboard without problems.

The next step I'm planning is getting more parts to make two PCs out of this single one, like Linus did, with that software... damn, the name escapes me right now.
 

Bytales

OK, so the plan is to install Unraid and use it to create two PCs from my single server tower. I've got a 32-core CPU, two video cards, two monitors, 64 GB of RAM, two mice and two keyboards.
I am using a 4-port PCI Express USB 3.0 card, to which I have connected three USB 3.0 hubs, and the problem is that I need to take two hubs out, otherwise the motherboard doesn't see the USB memory stick it needs to boot from.
I did boot Unraid, however. The bad part is that the system information shows IOMMU as disabled, and I need it enabled to make two systems from this single motherboard, so I can pass a GPU through to each independent virtual machine.
However, I haven't seen any setting in the BIOS to enable it, and I have installed the latest BIOS, F09.

Does anyone have any idea how I can enable this? Surely the 32-core AMD EPYC CPU supports IOMMU, right? Otherwise it's all for nothing.
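
A quick way to confirm what the hypervisor's Linux kernel actually sees, as a rough sketch: when the IOMMU is active, the kernel exposes one directory per IOMMU group under `/sys/kernel/iommu_groups`; if that directory is missing or empty, the IOMMU is not in use. The function name is made up, and the path is a parameter so the check can be tried against any directory.

```python
import os
from typing import List


def iommu_groups(sysfs_path: str = "/sys/kernel/iommu_groups") -> List[str]:
    """Return the IOMMU group names the kernel exposes, or [] if there are none."""
    try:
        entries = os.listdir(sysfs_path)
    except FileNotFoundError:
        return []
    # Group names are numbers; sort them numerically ("2" before "10").
    return sorted(entries, key=lambda name: (len(name), name))


if __name__ == "__main__":
    groups = iommu_groups()
    state = "enabled" if groups else "disabled"
    print(f"IOMMU appears {state} ({len(groups)} groups)")
```

For GPU passthrough it is also worth looking at which devices share a group with each GPU, since a group is normally handed to a virtual machine as a whole.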
 

VincentdeG

New Member
Oct 12, 2018
16
0
1
Hello everybody,


We are trying to build a deep learning machine but having some problems and need some help.
First we had a signal, but only with the onboard VGA; the video card didn't give any signal. We installed Windows using the onboard VGA.

After a BIOS update (F6 to F10) the motherboard doesn't give any signal, neither onboard nor with a video card.

We have:

- Gigabyte MZ31-AR0
- AMD EPYC 7351
- 4x Crucial 32GB LRDIMM
- Samsung 970 EVO / 2TB / M.2
- 4x Asus GeForce RTX2080
- 1500 Watt PSU

Does anyone have any idea how to solve this problem?


Kind Regards

Vincent
 

VincentdeG

New Update

Updated the BIOS; the system now gives a signal. With a 1080 video card installed, Windows boots, but with a 2080 video card we get a signal and then error 9C.
We have already disconnected all the USB plugs and tried different slots.
 

Bytales

You might want to stick to the F06 BIOS. I have noticed that F06 is the last BIOS where you can select, in the "legacy VGA" menu, whether the system should use the onboard VGA or the external one.
In all newer BIOS versions this menu is empty.
That is, if you want or need to boot with the onboard VGA as primary video.

You also need to check both the Legacy and UEFI options in the BIOS boot menu.
For instance, I have found that ESXi only installs when Legacy is selected, and if I want to use the onboard VGA explicitly I need the F06 BIOS.

Luckily I can switch between BIOS versions very easily, since I can update the BIOS through the BMC web console. You need to go to Update, select BIOS, and choose the Image.RBU file.

Personally, I'm now battling to set up two VMs with GPU passthrough. I have tried Unraid, ESXi, and now Proxmox, and I'm still having trouble making the GPU passthrough work.
 