Overheating problems with 2x EPYC 7742 and 1TB/2TB RAM in a Define 7 XL


T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Rosewill rackmount... no, no. That's not a real server chassis; it's a home version of one.
Do I have experience with them? Yes: I have three different Rosewill 4U models, and I don't use them for anything beyond basic storage, because they're basically a home variant of a real server chassis ;) with minimal airflow, even with better 120mm home-style fans.

You do NOT need a rack. You can sit it on a desk, on wood raised off the floor (just no carpet if low), etc...

Your motherboard is only E-ATX, nothing special-sized, so it will fit in many chassis.
I would buy the chassis Supermicro suggests for that motherboard; they even sell it as a kit. It's also a 4U/workstation hybrid.


You get power supplies made for your motherboard, airflow made for it, etc, etc...

" 3x Middle 8cm (9400 rpm) PWM Fans & 2x 8cm (9400 rpm) PWM Fans "

If you're running this beast at near full utilization it's going to be LOUD.


I'll also be the person who says that for such a badass build I would have run 1.5TB Intel Optane drives: they're already PCIe, so no adapter is needed, and they won't throttle like M.2. Unless you need the space you have from the three decent drives, you could run 2x 1.5TB 905P Optane and absolutely smoke the performance of your other drives.
 

Dreece

Active Member
Jan 22, 2019
Having been round the block with custom 'power' workstation builds for nearly 40 years now, all I'll say here is I'm loving this thread! :)
Giggling a fair bit here. Let's throw four turbos into an engine bay designed for a 1200cc engine while we're at it, hey?! Oh don't worry, just add more airflow, lol. It doesn't quite work that way: for a board that is engineered to be air-cooled a certain way, the design of the case is the most crucial part.

On a more serious note, it's a stonking piece of tech there, very very nice! If I were going to do a workstation build with that tech, I personally would have popped into a milling shop and shored up for a custom block for the board, feeding a multi-pump water loop plumbed into an external radiator arrangement. A few colleagues have done that with server boards over the years; I only did it once, years ago, and never again (silly money). Indeed, quiet and cool isn't impossible; the budget is the enabler/disabler. What you desire is not impossible, but the way you're going about it is just going to keep causing you grief, sir.

On a budget? Rack it, my friend, but don't expect quiet; that ain't happening with that setup on air. Rack it in another room, run a USB extension to your desk along with display cables and whatnot, and call it a day: enjoy raw power in peace and quiet at your desk. I did much the same for my other half's and kid's workstations; everything sits in the rack in the attic.
 

i386

Well-Known Member
Mar 18, 2016
Germany
T_Minus said: "I would buy the chassis Supermicro suggests for that motherboard; they even sell it as a kit. It's also a 4U/workstation hybrid."
The 745BAC is a great chassis.
 

Dreece

Active Member
Jan 22, 2019
[Attached image]

@gabe-a, you'll end up with something like this. Honestly, with the cheap prices of 4U rack cases and small racks, you can't argue. If you shop with beady eyes, you could probably even find a whole setup where you just rip out the board and throw your gear in ;)

Definitely go with 4U so you can throw in large fans that push more air in the right direction with less noise. If done right, you could get away with a setup you can keep at your desk. It won't be dead quiet, but at idle/regular use with ambient around 20°C it will be like a low-tone breeze in the background... but do expect things to go crazy when you start pushing that tech into the next dimension of performance (hence my suggestion of racking it in another room if at all possible, or doing a me and throwing it in the attic/basement/garage).
 

msg7086

Active Member
May 2, 2017
A bit off topic here.

I'm wondering, if you are doing HPC-related work, would it be easier to build multiple single-socket workstations and run the work distributed? Sorry if this sounds dumb, as I'm not familiar with HPC stuff, but I'm thinking that multiple single-socket EPYC workstations with regular RDIMMs could be more cost-effective?
 

Dreece

Active Member
Jan 22, 2019
msg7086 said: "I'm wondering, if you are doing HPC-related work, would it be easier to build multiple single-socket workstations and run the work distributed? [...] I'm thinking that multiple single-socket EPYC workstations with regular RDIMMs could be more cost-effective?"
Cost/power/noise/space... the devil's in the details, I guess. If it's being done at home, it's usually being done with an all-in-one view. The fact that the chap went the server-parts route in a Fractal case already kind of suggests a budgeted setup... distributed setups can very quickly jump into the tens of thousands.
 

Stevecam

New Member
Oct 1, 2020
I noticed a few caps were placed on the back of the board; would airflow back there be helpful?
 

gabe-a

New Member
Sep 10, 2020
Dreece said: "The fact that the chap went the server-parts route in a Fractal case already kind of suggests a budgeted setup... distributed setups can very quickly jump into the tens of thousands."
Hehe, funny you should mention this. The reality is, I paid multiple tens of thousands of my own savings for a company to build this for me. This Fractal business is their rebuild. The first was in a glass case and literally wouldn't boot without an alarm. After so much sunk cost, I am willing to do almost anything at this point. (Almost, because I have no hardware skills myself; I haven't even built a normal desktop on my own.)
Stevecam said: "I noticed a few caps were placed on the back of the board; would airflow back there be helpful?"
Caps? (Apologies, I'm new to some of the lingo.)

Dreece said: "Indeed, quiet and cool isn't impossible; the budget is the enabler/disabler. What you desire is not impossible, but the way you're going about it is just going to keep causing you grief, sir."
I would like to do the impossible. :) (Cool + quiet + dual server CPUs.) How come HP managed to do it with my dual Xeon Platinum 8180 workstation? Absolutely silent at idle, quiet under load... but loudish under crazy load.

For now I'm going to target the hotspots specifically, as alex_stief mentioned earlier; one thing at a time is a reasonable philosophy as long as I'm tinkering with fans. That means a 40mm fan for the one overheating RAM VRM, and an 80mm fan at an angle over the CPU VRM.

As for why one rig -- I want unified memory access for a number of jobs. With 2TB of RAM in there, it can hold large clustering jobs, in-RAM low-latency DB lookups, and more that do not work as well in a distributed system. Plus distributed systems take up more space, make more noise, etc.
 

Dreece

Active Member
Jan 22, 2019
gabe-a said: "How come HP managed to do it with my dual Xeon Platinum 8180 workstation? Absolutely silent at idle, quiet under load... but loudish under crazy load."
HPE are magicians, plus they master virtually every stage of the build, from the motherboard through the BIOS right down to the materials they use for every inch of the box they put together. Off-the-shelf-parts guys don't have those kinds of magical powers, but we can achieve similar results, albeit with a lot of testing/trialling... keep us posted on your journey!!!
 

williedee

New Member
Jan 25, 2016
I'll echo the others: you need to get this into a server case with high-pressure fans. It'll probably be loud, but it should be stable. That company should refund you; they sold you something that doesn't work. Crazy!
 

TXAG26

Active Member
Aug 2, 2016
I did my EPYC 7302P build with the case linked below. The case comes standard with two 80mm fans in the fan wall, but I added two more to fill it up completely. All four 80mm fans are 2,200 RPM fans. Supermicro also makes two other 80mm fan models for this case that spin at around 5,000 RPM and 7,000 RPM if you need even more front-to-back airflow. This case is nice and tight from an airflow standpoint and seems to cool my RAM and VRMs well. I have just bare RAM sticks and have added nothing to the motherboard besides the standard CPU heatsink/fan. Granted, this is a single-CPU system with four 32GB sticks of RAM, but still, it seems to cool well. I use this system for Folding@Home, and it's been running 100% on the CPU and 100% on the Nvidia GTX 1660 Super since March 2020. Below are my temperatures from a 73°F (about 23°C) room.


Sensor            Status                      Reading
CPU Temp          Normal                      62 °C
System Temp       Normal                      51 °C
Peripheral Temp   Normal                      48 °C
M2NVMeSSD Temp1   N/A                         Not Present!
VRMCpu Temp       Normal                      53 °C
VRMSoc Temp       Normal                      51 °C
VRMABCD Temp      Normal                      56 °C
VRMEFGH Temp      Normal                      50 °C
DIMMA1 Temp       N/A                         Not Present!
DIMMB1 Temp       N/A                         Not Present!
DIMMC1 Temp       Normal                      55 °C
DIMMD1 Temp       Normal                      56 °C
DIMME1 Temp       N/A                         Not Present!
DIMMF1 Temp       N/A                         Not Present!
DIMMG1 Temp       Normal                      49 °C
DIMMH1 Temp       Normal                      51 °C
FAN1              Normal                      2700 RPM
FAN2              Normal                      2200 RPM
FAN3              N/A                         2100 RPM
FAN4              N/A                         2100 RPM
FAN5              N/A                         2100 RPM
FANA              Normal                      2100 RPM
FANB              Normal                      2000 RPM
12V               Normal                      12.176 V
5VCC              Normal                      5.02 V
3.3VCC            Normal                      3.31 V
VBAT              Battery presence detected.
VDDCR             Normal                      1.192 V
VMEMABCD          Normal                      1.237 V
VMEMEFGH          Normal                      1.221 V
VDD_5_DUAL        Normal                      5.159 V
VDD_33_DUAL       Normal                      3.327 V
SOCRUN            Normal                      0.873 V
SOCDUAL           Normal                      0.893 V
Chassis Intru     OK
PS1 Status        Presence detected.
AOC_NIC_Temp      Normal                      51 °C
 

TXAG26

Active Member
Aug 2, 2016
I purchased mine from Wiredzone, and they're great to work with. Most stuff also drop-ships straight from Supermicro, so you get the latest revision!

 

trazanka

New Member
Jan 18, 2021
I am running dual 7742s on a Supermicro H11DSi-NT board with 512GB of RAM and an Nvidia Quadro RTX 8000.

Our case is a Thermaltake W200.
Our CPU coolers are Noctua NH-U9 TR4-SP3s (premium-grade, dual 92mm fans each); we use a separate fan controller and run all four fans at 2000 RPM.
Our seven case fans are Noctua NF-P14s redux-1500 PWM (140mm, 4-pin, 1500 RPM, grey).
We're running a HighPoint RocketRAID 3740 controller with six 2TB Samsung EVO drives in a RAID 10 configuration.

We do a lot of plastics mold flow, casting mold flow and FEA.

The screenshot shows us running a KeyShot render, with all 128 cores and 256 threads running.

The only time we ran into an overheating issue was when the case was pushed under the desk for a weekend with no airflow out the back.

Our simulations and renders can run for days at a time at 100%, with small breaks when the simulation goes to a single-threaded operation before ramping back up.

I would highly vouch for the Noctua CPU fans, we have had no RAM or CPU overheats except for that one weekend.

No heatsinks or fans on the RAM.

Any questions let me know.
 


maxermaxer

Active Member
Oct 28, 2016
trazanka said: "I am running dual 7742s on a Supermicro H11DSi-NT board with 512GB of RAM and an Nvidia Quadro RTX 8000. [...]"
Hi trazanka, is there a Rev 1 and a Rev 2 of the Supermicro motherboard you have? Which revision are you using?
 

mirrormax

Active Member
Apr 10, 2020
It's not the CPUs overheating but the VRMs; maybe have a look at your VRM temps in IPMI.
I've had both Rev 1 and Rev 2 of the H11DSi, and I'm pretty sure only the BIOS chip is different.
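To follow that suggestion from a shell, one option is to read the BMC's sensor table with `ipmitool sdr` and filter it for VRM sensors. This is a minimal sketch, assuming ipmitool is installed, that `ipmitool sdr` prints `name | reading | status` lines, and that VRM sensors carry "VRM" in their names (as in the Supermicro listing TXAG26 posted, e.g. `VRMCpu Temp`). The 90 °C `HOT_LIMIT_C` is a hypothetical alert level, not a Supermicro spec. Without the `--run` flag the script only defines the helpers.

```python
"""Sketch: pull VRM temperatures out of `ipmitool sdr` (assumed "name | reading | status" lines)."""
import subprocess
import sys

HOT_LIMIT_C = 90.0  # hypothetical alert threshold, not a vendor spec


def read_sdr() -> str:
    """Run `ipmitool sdr` against the local BMC and return its text output."""
    result = subprocess.run(["ipmitool", "sdr"], capture_output=True, text=True, check=True)
    return result.stdout


def parse_sdr(text: str) -> dict:
    """Map sensor name -> numeric reading, or None when there is no numeric reading."""
    readings = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # not a "name | reading | status" line
        name, reading, _status = parts
        try:
            readings[name] = float(reading.split()[0])
        except (ValueError, IndexError):
            readings[name] = None  # e.g. "no reading", "disabled", or an empty field
    return readings


def vrm_temps(readings: dict) -> dict:
    """Keep only sensors whose name mentions VRM and that report a number."""
    return {n: v for n, v in readings.items() if "VRM" in n.upper() and v is not None}


def hot_vrms(readings: dict, limit: float = HOT_LIMIT_C) -> dict:
    """VRM sensors at or above the limit."""
    return {n: v for n, v in vrm_temps(readings).items() if v >= limit}


if __name__ == "__main__" and "--run" in sys.argv:
    current = parse_sdr(read_sdr())
    for name, temp in sorted(vrm_temps(current).items()):
        flag = "  <-- HOT" if temp >= HOT_LIMIT_C else ""
        print(f"{name:20s} {temp:5.1f}{flag}")
```

Keeping the parser separate from the `ipmitool` call means the filtering can be checked against a saved sensor dump without touching the BMC.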
 

trazanka

New Member
Jan 18, 2021
mirrormax said: "It's not the CPUs overheating but the VRMs; maybe have a look at your VRM temps in IPMI. I've had both Rev 1 and Rev 2 of the H11DSi, and I'm pretty sure only the BIOS chip is different."
Attached are the VRM temps. We have not had an issue, but we are only running 16 sticks of 32GB. Here are the specs for our RAM: 3D-1546R23424-32G 32GB module, DDR4 PC4-25600 CL22 Registered ECC DDR4-3200, dual rank, x4, 1.2V, 4096Meg x 72, for the Supermicro H11DSi motherboard.

One more change we made: after reviewing our logs, we found the CPU temps would bump up to 72 to 75°C and could start throttling, so we swapped out the Noctua NH-U9 TR4-SP3 for the Noctua NH-U14S TR4-SP3. Since then we have not reached 60°C under full load.

You may want to check your RAM and make sure it is good quality. We source most of our RAM from Memory4less and have had no issues with any of our servers or workstations.

If there is anything else you would like me to check or any tests you would want us to do let me know, I know how frustrating it can be to have a powerful piece of hardware not working correctly.
[Attached images: 2021-01-27_17-23-30.png, IMG_3279.jpg]
 

gabe-a

New Member
Sep 10, 2020
To address the RAM temp question: the technology in sub-1TB RAM configurations fundamentally differs from that in 1TB+ configurations.

The difference is due to something called LRDIMM. The "load reduced" (LR) aspect of these modules places much more load on the memory VRMs than typical RAM, where the load is mostly in the sticks themselves. So really, "LRDIMM" is a misnomer: the overall system load is not reduced (the "LR" in LRDIMM) but redistributed to the motherboard, which is already hot as blazes.

When combining terabyte-scale clustering operations with high-VPU HPC workloads, the memory VRMs shoot up to 95°C (and beyond) within minutes, particularly the hot one near the power intake. The board is poorly designed in that the incoming current heats it right where the memory VRM is already heating up.

By the way, an update on my rig. With a lot of tinkering with IPMI and fan speed control in Linux, plus the 40mm fan zip-tied over the "hottest" memory VRM as suggested by @alex_stief, I have been able to run this at nearly full load without shutdowns (only heavy throttling at times). To lessen the impact of these heavy throttle events, I wrote a daemon to detect them. Here's how it works: monitor all core speeds and check whether any core is below 500MHz. If one is, flag it and wait for it to come back up. The moment it does, immediately call cpupower to set the speed to 1500 or 2000MHz (otherwise the affected processor will keep jumping to 3.4GHz for a few seconds before shooting back down below 500MHz for a few minutes, which is much less efficient than a steady 1500 to 2000MHz). Once CPU load settles, set the frequencies back to normal (3400MHz) and clear all existing flags.
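The detect/cap/restore logic described above can be sketched in Python roughly as follows. This is not gabe-a's daemon, and every name in it (`ThrottleGovernor`, `SETTLE_SECONDS`, and so on) is hypothetical. Assumptions: per-core frequencies come from the Linux cpufreq sysfs files (`scaling_cur_freq`, in kHz), the cap is applied with `cpupower -c all frequency-set -u`, which needs root, and "once CPU load settles" is approximated by a fixed quiet period with no sub-500MHz samples. The 2000MHz cap is picked from the 1500 to 2000MHz range in the post.

```python
"""Sketch of a throttle-detection daemon (hypothetical; not the original author's code)."""
import glob
import subprocess
import sys
import time

THROTTLE_KHZ = 500_000   # a core below 500 MHz counts as a throttle event
CAP_MHZ = 2000           # steady cap applied once a throttle event clears
NORMAL_MHZ = 3400        # normal max frequency restored afterwards
SETTLE_SECONDS = 300     # assumption: quiet period standing in for "load settles"
POLL_SECONDS = 1.0


class ThrottleGovernor:
    """Pure state machine: feed it per-core frequencies (kHz), get back an action or None."""

    def __init__(self, settle_seconds: float = SETTLE_SECONDS):
        self.settle_seconds = settle_seconds
        self.flagged = False     # some core is currently below THROTTLE_KHZ
        self.capped = False      # the CAP_MHZ limit is currently applied
        self.last_event = None   # time of the most recent throttled sample

    def step(self, freqs_khz, now):
        if any(f < THROTTLE_KHZ for f in freqs_khz):
            self.flagged = True
            self.last_event = now
            return None  # wait for the cores to come back up
        if self.flagged:
            self.flagged = False
            if not self.capped:
                self.capped = True
                return "cap"  # just recovered: cap now instead of letting it boost
            return None
        if self.capped and now - self.last_event >= self.settle_seconds:
            self.capped = False
            self.last_event = None
            return "restore"  # quiet long enough: back to normal, flags cleared
        return None


def read_core_freqs_khz():
    """Current frequency of every core from the cpufreq sysfs interface, in kHz."""
    freqs = []
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"):
        with open(path) as f:
            freqs.append(int(f.read().strip()))
    return freqs


def set_max_mhz(mhz):
    """Set the maximum frequency on all cores via cpupower (requires root)."""
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-u", f"{mhz}MHz"], check=True)


def main():
    gov = ThrottleGovernor()
    while True:
        action = gov.step(read_core_freqs_khz(), time.monotonic())
        if action == "cap":
            set_max_mhz(CAP_MHZ)
        elif action == "restore":
            set_max_mhz(NORMAL_MHZ)
        time.sleep(POLL_SECONDS)


if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

The state machine takes plain lists and timestamps and does no I/O, so it can be exercised without root or real hardware; only `main()` touches sysfs and cpupower, and it runs only when the script is started with `--run`.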

I am considering the Supermicro chassis (with the 9400RPM fans!) just because it seems built for purpose, not slapped together by kiddies at "mediaworkstations" who point the fans the wrong way! Of course, if/once I get it, it may be problematic for me to get it assembled, as I'm not a hardware expert. Will it even accommodate my 6 spinny HDDs?
 

Stephan

Well-Known Member
Apr 21, 2017
Germany
Wow, what a fscking nightmare thread. Pains me just looking at it. Some dumbo back-alley company selling a novice an overheating Formula 1 car and wishing him good luck while waving goodbye out the door.

From looking at the pictures, I would say those CPU coolers still look fishy. Are those original Supermicro? Those are designed for a case with forced airflow and shrouds. I also have doubts that your VRMs get enough airflow, especially those under the heatsink between the CPUs.

To save this build, I would suggest getting an original Supermicro case (empty, just PSU and fans, or whatever accessories you need) and planting the board, CPUs, RAM, controllers, SSDs, and GPU in there. Only get a case that is sold with this very board (!) and specifically designed and certified (!) for these powerful CPUs. This will not be 100% silent, but at this power level that is next to impossible. I know it sucks, but this will at least lead to a path of stability, or give you a method to diagnose a problem if something broke along the way. Run extended prime95, Linpack, or similar for 24h straight to see if it holds up without tripping temps or crashing. The reason they shipped you this machine with fans at full throttle ("debug mode", haha...) is that they realized they botched this build, not thinking the cooling system fully through. Fans at full blast was their only hope to get temperatures under control. I doubt watercooling would have helped much, because the case still would have had no provision to mount fans to cool hot board components or RAM. It's the wrong case.

If you are not comfortable with reworking a system like this, you need to find and pay a reputable shop to do it. There are a lot of expensive things in there that need to be unscrewed and unmounted the right way.

Another option would be to strip the system into parts and sell those off, take the 10%-30% hit and move on.

Worry about the hard disks later, even if they don't all fit into the new case. The HDDs could stay in the Fractal case with a cheap Core i3 ECC Intel board running something advanced like ZFS, if you feel like trying something new. Boot from a small NVMe drive and connect the six disks to onboard SATA, done. Or put them into a Synology NAS (needs 10 Gbps Ethernet). RAID0 is asking for trouble, because any bad drive will ruin the full array. Every time. Are you trying to mine this new Chia coin with it? I run an 8-drive SC530 RAIDZ2 of 14TB drives in one box, and it delivers 1 GByte/s sustained read anyhow. For drives this large, my recommendation is always to use a filesystem like ZFS that checksums everything, data and metadata.
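The RAID0 warning can be made concrete with some back-of-envelope arithmetic. This sketch assumes every drive fails independently with the same probability p over a given period, and it ignores rebuild windows and correlated failures, so the numbers are illustrative only. The p = 0.02 value and the 14TB size used for every layout are hypothetical; RAID10 is modelled as striped two-way mirrors.

```python
"""Back-of-envelope: usable capacity and chance of losing the whole array (illustrative only)."""
from math import comb


def usable_tb(layout: str, drives: int, size_tb: float) -> float:
    if layout == "raid0":
        return drives * size_tb            # no redundancy at all
    if layout == "raid10":
        return drives // 2 * size_tb       # striped two-way mirrors
    if layout == "raidz2":
        return (drives - 2) * size_tb      # two drives' worth of parity
    raise ValueError(f"unknown layout: {layout}")


def loss_probability(layout: str, drives: int, p: float) -> float:
    """P(array lost), assuming each drive fails independently with probability p."""
    if layout == "raid0":
        return 1 - (1 - p) ** drives                   # any single failure kills it
    if layout == "raid10":
        return 1 - (1 - p * p) ** (drives // 2)        # lost if both halves of some mirror fail
    if layout == "raidz2":
        # lost only if three or more drives fail
        return sum(comb(drives, k) * p**k * (1 - p) ** (drives - k) for k in range(3, drives + 1))
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    p = 0.02  # hypothetical per-drive failure probability
    for layout, n, size in [("raid0", 6, 14), ("raid10", 6, 14), ("raidz2", 8, 14)]:
        print(f"{layout:7s} {n} x {size}TB: {usable_tb(layout, n, size):5.0f} TB usable, "
              f"P(array lost) = {loss_probability(layout, n, p):.5f}")
```

With six drives and p = 0.02, RAID0 comes out at roughly an 11% chance of losing everything, while an 8-drive RAIDZ2 needs three simultaneous failures and lands orders of magnitude lower, at the cost of two drives' capacity.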
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
gabe-a said: "I am considering the Supermicro chassis (with the 9400RPM fans!) [...] Will it even accommodate my 6 spinny HDDs?"
I wouldn't go for the 745, but rather the 747 :) It's a lot bigger and has enough space for future upgrades.

Have you considered watercooling?