Strange problem. Help me understand.


Fritz

Well-Known Member
Apr 6, 2015
Just upgraded the motherboard in one of my servers. The old board was a Supermicro X8DT6-F; the new board is a Supermicro X9DRD-7LN4F. I bought the new board about a month ago and fired it up on the bench with a couple of RDIMMs, just to verify that it worked. It did.

The final config consisted of a pair of E5-2690 v1s and 128GB (16x 8GB). I installed the board in the SC846 and got no POST, just low-speed fans. I started removing the RDIMMs one at a time and reached the point where it would POST but hung at code BA, which is memory initialization. Using the process of elimination, I zeroed in on one slot: with that RDIMM removed, the system successfully completed POST; with it replaced, it hung at BA again. I almost accepted that the slot was bad, even though visually I saw nothing different.

Then I vaguely remembered reading on here that the CPU can cause memory slot problems. I had a set of E5-2670s, so I decided to install them and see what happened. Everything was then normal: all memory slots worked properly and there was no POST hang. I spent the rest of the day running diags and benchmarks with zero problems, so apparently the E5-2690 set is bad, either one CPU or both. I visually inspected them and the CPU sockets with a magnifying glass and a strong light and saw nothing out of the ordinary. Does anybody know what might be wrong with the E5-2790s? I don't have a single-CPU v1/v2 motherboard to test each CPU individually.

Anybody ever deal with something like this?

TIA
 

ari2asem

Active Member
Dec 26, 2018
The Netherlands, Groningen
You are talking about 3 different CPUs (2690, 2760, 2790).

Is that correct?

Did you try reseating your first set of CPUs (2690 v1) on your newer board (X9 series)?
Sometimes a memory problem is solved just by reseating the CPUs.
 

Fritz

Well-Known Member
Apr 6, 2015
Sorry, typo. 2670 and 2690 only.
 

Fritz

Well-Known Member
Apr 6, 2015
Quote: "I'm curious. Is there a reason why you can't test them by running just one at a time in Socket 0? That motherboard should work with one CPU, shouldn't it?"
Yeah, I could have if I had thought of it, but it's all buttoned up and back in the rack now. I had to remove and reinstall it in the rack by myself. Even with all 24 drives and the top removed, it was a helluva job. I've got v3/v4 boards but no v1/v2 boards. Might pick one up if a deal comes along.
 

itronin

Well-Known Member
Nov 24, 2018
Denver, Colorado
Quote: "Anybody ever deal with something like this?"
Yes, quite recently, with an X10SRL-F and 4 x 32GB sticks.

I had installed an E5-2678 v3, and the BIOS kept getting hung up on the memory check. At first I thought I had bad memory. I swapped in a 2673 v3 and still had no joy. I pulled the CPU, took a look at the socket with my illuminated magnifying glass, and found Whoville had taken up residence on one of the pins. I used the tip of a Pentel 0.5mm mechanical pencil to get it off.

After the 2670s worked, did you try the 2690s again, or did you just leave the 2670s in there?

It is possible Whoville came to visit you, and in the process of swapping the CPUs you unknowingly removed it from the pin it was resting on.

It is also possible you have a bad CPU or two.

Worth noting: a bent CPU pin can do the same thing. The hollow tip of a Pentel 0.5mm pencil also works extremely well for gently straightening bent pins.
 

Fritz

Well-Known Member
Apr 6, 2015
LOL :p I didn't try the 2690s again after the 2670s worked. I decided to leave well enough alone. At some point in the future I'll revisit them. For now Fatty is back up and running, so I'll leave it be. As for the socket pins, I scrutinized them with a magnifying glass and saw nothing.
 

pcmoore

Active Member
Apr 14, 2018
New England, USA
Since you changed the motherboard, did you verify that there isn't a rivet, screw, etc. on the case that is causing a short? I might suggest temporarily placing the fully loaded motherboard on top of a piece of cardboard, plastic, etc. and seeing if the problem resolves itself.
 

Fritz

Well-Known Member
Apr 6, 2015
3,371
1,375
113
69
Yep. That's the first thing I checked. No rogue standoffs. And since replacing the CPUs fixed the problem, that also eliminated all other possibilities, except an intermittent problem, of course. It ran all night without any issues, so I'm calling it a day. For curiosity's sake I'd like to know what's wrong with the CPUs. Maybe they took a static electricity hit, who knows.
 

Mwilliamson

New Member
Aug 15, 2020
Fenton, Michigan, USA
I've seen on-die memory controllers have this kind of issue before, though I don't know why it happens. It's not a common thing, as an issue like this is usually caused by a bent or missing pin, or by dust in either the DIMM slot or the processor socket, though you've already ruled those out.

One other possibility is that the problem was caused by a QPI error. Sometimes a QPI link between processors fails, and one of the symptoms is memory errors. In-house, we usually just swap the two processors around and the error corrects itself, which may be what happened here when you switched to a different set of processors.
 