[FS][US-MO] AMD EPYC 7543 Milan / Asrock Romed8-2T / 256GB ECC DDR4-3200

slidermike

Active Member
May 7, 2023
116
45
28
Long post but hopefully it explains most of the questions (that I would have if I were a potential purchaser).


Selling:
  1. AMD EPYC 7543 (Milan), 32 cores / 64 threads, 2.8 GHz base, 3.7 GHz turbo, 225 W
  2. 8 x Samsung 32GB ECC DDR4-3200 (256GB total), M393A4K40EB3-CWE
  3. Asrock Rack Romed8-2T motherboard - currently out for RMA. Will be for sale upon return.
  4. ASUS Hyper M.2 X16 PCIe 4.0

Payment: PayPal, cash, Venmo. (PayPal offers buyer protection, I believe.)
~ Local pickup if you are in the St. Charles, MO area. My zip code is 63389.

I bought the CPU/RAM (new) and cooler (used) from tugm4470 via eBay in July 2023. Great seller; I would purchase from him again when in the market.
I bought the motherboard (new) from VPCI through Amazon in July 2023.

I paid (excluding tax):
CPU $1400.00
DRAM $560.00
CPU Cooler $39.00
MOBO $667.98
ASUS Hyper M.2 X16 PCIe 4.0 - $74.99


Your price: $2430 (complete set)
CPU $1200
DRAM $480 ($60 per piece)
CPU COOLER $30
ASUS Hyper M.2 X16 PCIe 4.0 $35
MOBO $ ? (this is not currently in my possession so it cannot technically be considered for sale. Will update when I get it back. The remaining items together can be considered a complete set until then for pricing consideration)
Shipping: I am willing to negotiate a shared cost.


Your price: (pieced out)
CPU $1250
DRAM $500 ($62.50 per piece)
CPU COOLER $35
ASUS Hyper M.2 X16 PCIe 4.0 $40
MOBO $ ? (this is not currently in my possession so it cannot technically be considered for sale. Will update when I get it back.)
Shipping: Buyer pays

Pictures: gDrive_Pictures [the system said the images were too large so I added a link to my google drive]

Ideally it all goes as one set, but I would consider parting it out.

I can provide any documentation I have on the purchases or the motherboard support case (emails, additional photos, etc.).
I have the boxes the items came in, and they would go out in those boxes.

**update 01**
The system is currently down because Asrock refuses to do a cross-ship RMA for the motherboard. It is on its way to them for deeper analysis and RMA processing. I have spent over a month troubleshooting the weird corrected memory error with William to no avail.
They seem stumped by it. The weirdest part is that the message is being reported by a fan sensor. Fan sensor = memory error???
There has been zero user/system impact, but a new (and expensive) server motherboard should not be having issues out of the box.
William said he will prioritize the return but is unable to give me an EFD. I will update when I know more.

I have also adjusted the price of the CPU & RAM and added one of the ASUS PCIe M.2 cards for sale.
Additional photos have been added since the system is now apart. FYI, the single M.2 drive in the ASUS card is not going with the card. It is just in there for now so I do not lose it.
** end update 01**

When I tried to upload the images, the site kept saying they were too large, so hopefully linking to my Google Drive works.
I included the pin pictures Asrock tech support wanted as part of the troubleshooting described in the lengthy backstory below.

Now to the backstory that should fill in most of the blanks.
I am a long-time unRaid user, and until the Epyc build it was running on an AMD 5900X.
I saved up for a "real" server because home systems kept me PCIe-lane bound. I got approval from the home boss (wife) to spend up to $x, which led me to downgrade from the Threadripper I really wanted to a nice Epyc.
Migrating storage is easy with unRaid.
I have been learning the ins and outs of Epyc on the STH and LevelOne forums.
I noticed in the motherboard's IPMI that there were a couple of corrected-error notifications for slot A1. Since this was a brand-new build with new components, I assumed it was either user install error or faulty DRAM.
I reseated the DRAM but got the same infrequent corrected memory notifications.
I then contacted Asrock to verify which physical slot it was. They confirmed it was the slot directly below the CPU (next to the PCI port).
They suggested the logical steps: move the memory around, clean the sticks, and verify that they are firmly seated and the slots are free of debris. Done, and the error stayed with the slot. The next step was to remove the CPU, check for bent pins or debris in the pin area, and send pictures to support so they could validate.
Done, and they also confirmed no bent pins or debris in the socket. The next step they proposed was for me to send the board in so they could examine it further, check whether a standoff was grounding, and try to determine the cause. I countered that I was 100% certain I had followed the correct standoff locations per the manual and that, since this was a high-cost, brand-new board with no user-caused issue, I would like to do an RMA. They said OK, but that required another department. They gave me the link.
I initiated the RMA and asked them if they would do a cross-ship or take my credit card as security so they would pre-ship me a replacement as I did not want to be without my server for 1+ weeks.
The response was a curt "Unfortunately, we are not able to provide an advancement for your model. The RMA will be fulfilled in a first come first serve basis."
Needless to say, I am unhappy with Asrock when it comes to standing behind the product. I paid a lot of money for the board, and since it is a real server-grade product I expected them to offer some form of exchange.
I do not intend to send the board in to Asrock at this time (there is zero impact). It would actually be more impactful to the family and friends if I took my server down for a long duration than to let the counters slowly increment.
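For readers who would rather watch those counters from the OS than through IPMI, here is a minimal sketch. It assumes a Linux host (unRaid is Linux-based) with an EDAC driver loaded, which exposes corrected-error counts under /sys/devices/system/edac/mc. The function names are made up for illustration, and whether per-DIMM entries and labels appear depends on the kernel and platform, so treat this as a starting point rather than a definitive tool.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location (only present when an EDAC driver is loaded).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def corrected_error_counts(root=EDAC_ROOT):
    """Collect corrected-error counts per memory controller and, where exposed, per DIMM."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        total = read_int(mc / "ce_count")
        if total is not None:
            counts[mc.name] = total
        for dimm in sorted(mc.glob("dimm[0-9]*")):
            ce = read_int(dimm / "dimm_ce_count")
            if ce is None:
                continue
            label_file = dimm / "dimm_label"
            label = label_file.read_text().strip() if label_file.exists() else ""
            key = f"{mc.name}/{dimm.name}"
            if label:
                key += f" ({label})"
            counts[key] = ce
    return counts


if __name__ == "__main__":
    for name, count in corrected_error_counts().items():
        print(f"{name}: {count} corrected errors")
```

Running it periodically (for example from cron) and comparing the numbers over time would show whether the count for one slot keeps climbing while the others stay at zero.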
Now we come to why I am posting.
I have spender's remorse. I love the performance and capabilities, and the massive number of PCIe lanes is a godsend. What nags me is spending so much money when I did not really need to. I am doing nothing more with the new hardware than I was doing on the old hardware: no new VMs, etc. When the server guts sell, I will simply move the storage array back into the 5900X and give the wife the money.
I can share the email details (tech support and RMA) as required.
If a serious buyer wants to inspect the parts in better lighting then I can take the server down and either do a video or pictures (your pref).


I am sharing my eBay account to demonstrate my selling rating. I am not posting/selling my server on eBay.
I use the same username on STH and LevelOne, feel free to check me out. My account on the forums is relatively new as I have only recently gotten into the server market but I have tried to be active where I thought I could add value to the discussion or had unanswered questions.

If you have made it this far,
Thank you
Mike

redeamon

Active Member
Jun 10, 2018
291
207
43
Hey Mike,

Thanks for your wonderful post. I just wanted to share my experience with memory errors on EPYC boards: in the vast majority of cases, the cause has been not enough torque on the CPU (some boards just need more torque) or a torque tool that is inaccurate (they all have +/- error) or has gone out of calibration. Try going another 1/8 or 1/4 turn on each screw and see if that resolves the memory issue.

Sometimes after months or even years of 24/7 use, random memory errors or PCI errors (especially with NVMe drives) would suddenly appear, and we found that tightening the CPU frame resolved the problem nearly 100% of the time. We now tighten to 16 in-lb on all EPYC boards excluding Genoa and haven't had an issue since.

This is one of the tools we use (notice it has an error of plus or minus 2 in-lb!): https://www.amazon.com/gp/product/B0012AXR4S/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1
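
For anyone whose driver is marked in N·m, a quick sketch of what that works out to. The only figures taken from this post are the 16 in-lb target and the plus or minus 2 in-lb tool error; the function name and output layout are illustrative assumptions, and the conversion factor is the standard 1 in-lb = 0.1129848 N·m.

```python
# Standard conversion: 1 inch-pound (force) = 0.1129848 newton-metres.
IN_LB_TO_NM = 0.1129848


def torque_window(target_in_lb, tool_error_in_lb):
    """Return the range of torque actually applied, given the tool's +/- error."""
    low = target_in_lb - tool_error_in_lb
    high = target_in_lb + tool_error_in_lb
    return {
        "low_in_lb": low,
        "high_in_lb": high,
        "target_nm": round(target_in_lb * IN_LB_TO_NM, 2),
        "low_nm": round(low * IN_LB_TO_NM, 2),
        "high_nm": round(high * IN_LB_TO_NM, 2),
    }


if __name__ == "__main__":
    # 16 in-lb target with a tool that can be off by 2 in-lb either way.
    print(torque_window(16, 2))
```

In other words, a tool set to 16 in-lb (about 1.81 N·m) could really be applying anywhere from 14 to 18 in-lb, which is why two people using the "same" setting can get different results.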

slidermike

Thank you @redeamon, I will give that a try. There has been no impact from the error, and mostly I have overspent for my needs. I envisioned building many VMs etc., but it turns out my environment is pretty static and adequately served by the 5900X for a lot less cost. Doh!

slidermike

**update 01**
@redeamon thank you again for the helpful information. Unfortunately in this case it did not seem to help. I am guessing a faulty sensor since it says one of the fan sensors is reporting the memory correction. That threw Asrock for a loop. In fact, I had to point out to them the sensor reporting the issue. :rolleyes:
** end update 01**
 

shpitz461

Member
Sep 29, 2017
109
19
18
50
Try going another 1/8 or 1/4 turn on each screw and see if that resolves the memory issue.

Sometimes after months or even years of 24/7 use, suddenly random memory errors would appear or PCI errors (especially with NVMe drives) and we found that tightening the cpu frame resolved the problem nearly 100% of the time. We now tighten to 16 in/lb on all EPYC boards excluding Genoa and haven't had an issue since.
That is very interesting. I'm running a 7742 QS on a ROMED8-2T and lately have been getting NVMe errors (which I was able to suppress in the BIOS, and I don't see any effect on performance). I never had memory issues though; I had 128GB in the past and a few months back upgraded to 256GB of RAM from NemixRam.
And just like @slidermike, I too had to RMA my original board: one of the 10GbE ports stopped working. I dealt with William as well; he was very helpful.
I did wait a month and a half for the replacement. They didn't have any of the Intel variants in stock, only the Broadcom (BCM), which it turns out is missing the extra USB-C port the Intel variant has, so I refused the BCM variant and waited for the Intel one.
Have you tried flashing the latest BIOS (they released 3.70, which has Resizable BAR support), then resetting the BIOS to defaults and booting with bare components? Start with the CPU and a single DIMM, see if the issue goes away, and then build on that, adding more components until you hit the issue.
 

slidermike

Yes to the firmware/IPMI updates and reflash - no change in behavior.
No to CPU & single DIMM. William had me move the DIMM to another slot (I have all 8 populated) and the error remains with the slot, not the DIMM.
The other "odd" behavior is that the reporting sensor is a fan sensor, which stumped William/Asrock support. For the curious, you can see the IPMI in the images I linked on my Google Drive. (I am still not sure what size pictures STH wants, just that it tells me the ones I try uploading are too large.)
It is in his hands now (RMA). No idea how long it will take, but thank you for giving me an estimate based on your experience. It is a shame that for server-grade components they take so long and there is no advance replacement. You get faster hardware swaps with lower-tier components.

slidermike

*Update*
After several months of troubleshooting and getting the Asrock board RMA'd, I reached back out to the original seller of the CPU (tugm on eBay, someone others here have bought from), because the only part left that could be the cause is the CPU. He was willing to swap the CPU after I did some testing under his direction, mostly redundant steps Asrock had already had me do. Same results. I then bought a Tyan board and ran the CPU in it for a little under two weeks: same ECC issue reported in the slot closest to the CPU, and the Tyan board shut the slot down. I shared the data with tugm and sent the defective Epyc back to China via DHL. Ever since it arrived 3 days ago, he has stopped responding. Not going to lie, I am getting a bit frustrated and concerned. I am sitting on hardware I purchased in July that I now have no Epyc for, and a formerly very responsive vendor has a working, albeit faulty, CPU I paid a large amount of $ for.
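
The process of elimination that played out over this thread can be summarised in a short sketch. It assumes a single faulty component, and the function and labels are purely illustrative: each test swaps or relocates one component, and if the error is still there afterwards, that component is cleared.

```python
def remaining_suspects(suspects, swap_tests):
    """Drop every component whose swap or relocation did not clear the error."""
    remaining = set(suspects)
    for component, error_persisted in swap_tests:
        if error_persisted:
            # The error survived changing this part, so this part is not the cause.
            remaining.discard(component)
    return remaining


if __name__ == "__main__":
    tests = [
        ("DIMM", True),         # DIMMs moved around; the error stayed with the slot
        ("motherboard", True),  # same error with the CPU in a different (Tyan) board
    ]
    print(remaining_suspects({"DIMM", "motherboard", "CPU"}, tests))  # {'CPU'}
```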