H8DCL-iF


lpallard

Member
Aug 17, 2013
276
11
18
So the motherboard in the title has given me weird issues since I purchased it about a year ago.

First there was a single occurrence where Linux reported ECC errors, but that did not come back AFAIK.

Then about 8 months ago I started having serious issues with the SATA drives connected to the motherboard SATA ports (see "[SOLVED] Slack64-14.0 server crash entries in dmesg after a while running").

Since then I haven't used the SATA ports at all (didn't need them).

Today, I reinstalled Proxmox and noticed that the BIOS is now reporting only 48GB of RAM instead of the 64GB installed. This worked until yesterday AFAIK. I also noticed during POST:

Node 0 DCT0=1600MHz
Node 0 DCT1=1600MHz
Node 1 DCT0=N/A
Node 1 DCT1=1600MHz

No POST beeps.

I am using 4x16GB sticks, so the N/A makes me think RAM slot #3 on the motherboard (or the Northbridge) is damaged.
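
The arithmetic behind that guess can be checked with a short script. This is only a sketch: it assumes one 16GB stick per DCT channel (four sticks, four channels in the POST output above), and the names `POST_LINES`, `STICK_GB` and `parse_dct` are made up for illustration.

```python
import re

# POST lines copied from the post above.
POST_LINES = """\
Node 0 DCT0=1600MHz
Node 0 DCT1=1600MHz
Node 1 DCT0=N/A
Node 1 DCT1=1600MHz
"""

# Assumption: one 16GB stick sits on each DCT channel.
STICK_GB = 16


def parse_dct(text):
    """Return {(node, dct): speed string, or None when the channel reports N/A}."""
    channels = {}
    for m in re.finditer(r"Node (\d+) DCT(\d+)=(\S+)", text):
        node, dct, speed = int(m.group(1)), int(m.group(2)), m.group(3)
        channels[(node, dct)] = None if speed == "N/A" else speed
    return channels


channels = parse_dct(POST_LINES)
missing = sorted(key for key, speed in channels.items() if speed is None)
detected_gb = STICK_GB * sum(1 for speed in channels.values() if speed is not None)

print("channels without memory:", missing)
print("detected GB:", detected_gb)
```

Under that assumption, the single N/A channel (Node 1, DCT0) accounts for exactly the missing 16GB, which matches the 48GB the BIOS reports.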

I am thinking of RMA'ing this motherboard before the 1-year warranty expires in 3 weeks.

Has Supermicro's quality dropped so much? It seems I am overwhelmed with hardware failures these days...
 

lpallard

Ran dmidecode on the server before and after swapping the sticks around:

Before swapping the sticks around:

# dmidecode 2.11
SMBIOS 2.6 present.

Handle 0x0017, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P1_DIMM1B
Bank Locator: BANK0
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer00
Serial Number: SerNum00
Asset Tag: AssetTagNum0
Part Number: ModulePartNumber00
Rank: Unknown

Handle 0x0019, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: P1_DIMM1A
Bank Locator: BANK1
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: 2D1A584D
Asset Tag: AssetTagNum1
Part Number: 9965516-071.A00LF
Rank: 2

Handle 0x001B, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P1_DIMM2B
Bank Locator: BANK2
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer02
Serial Number: SerNum02
Asset Tag: AssetTagNum2
Part Number: ModulePartNumber02
Rank: Unknown

Handle 0x001D, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: P1_DIMM2A
Bank Locator: BANK3
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: 2D1A654D
Asset Tag: AssetTagNum3
Part Number: 9965516-071.A00LF
Rank: 2

Handle 0x0021, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P2_DIMM1B
Bank Locator: BANK4
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer00
Serial Number: SerNum00
Asset Tag: AssetTagNum4
Part Number: ModulePartNumber04
Rank: Unknown

Handle 0x0023, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P2_DIMM1A
Bank Locator: BANK5
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer01
Serial Number: SerNum01
Asset Tag: AssetTagNum5
Part Number: ModulePartNumber05
Rank: Unknown

Handle 0x0025, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P2_DIMM2B
Bank Locator: BANK6
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer02
Serial Number: SerNum02
Asset Tag: AssetTagNum6
Part Number: ModulePartNumber06
Rank: Unknown

Handle 0x0027, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: P2_DIMM2A
Bank Locator: BANK7
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: 2F1B3A44
Asset Tag: AssetTagNum7
Part Number: 9965516-071.A00LF
Rank: 2

After swapping the sticks around:

# dmidecode 2.11
SMBIOS 2.6 present.

Handle 0x0017, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P1_DIMM1B
Bank Locator: BANK0
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer00
Serial Number: SerNum00
Asset Tag: AssetTagNum0
Part Number: ModulePartNumber00
Rank: Unknown

Handle 0x0019, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: P1_DIMM1A
Bank Locator: BANK1
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: 2F1B3A44
Asset Tag: AssetTagNum1
Part Number: 9965516-071.A00LF
Rank: 2

Handle 0x001B, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P1_DIMM2B
Bank Locator: BANK2
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer02
Serial Number: SerNum02
Asset Tag: AssetTagNum2
Part Number: ModulePartNumber02
Rank: Unknown

Handle 0x001D, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0015
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: P1_DIMM2A
Bank Locator: BANK3
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: 2F1A5E4D
Asset Tag: AssetTagNum3
Part Number: 9965516-071.A00LF
Rank: 2

Handle 0x0021, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P2_DIMM1B
Bank Locator: BANK4
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer00
Serial Number: SerNum00
Asset Tag: AssetTagNum4
Part Number: ModulePartNumber04
Rank: Unknown

Handle 0x0023, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P2_DIMM1A
Bank Locator: BANK5
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer01
Serial Number: SerNum01
Asset Tag: AssetTagNum5
Part Number: ModulePartNumber05
Rank: Unknown

Handle 0x0025, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: <OUT OF SPEC>
Set: None
Locator: P2_DIMM2B
Bank Locator: BANK6
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Manufacturer02
Serial Number: SerNum02
Asset Tag: AssetTagNum6
Part Number: ModulePartNumber06
Rank: Unknown

Handle 0x0027, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x001F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: P2_DIMM2A
Bank Locator: BANK7
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: 2D1A584D
Asset Tag: AssetTagNum7
Part Number: 9965516-071.A00LF
Rank: 2

IMO the motherboard has a problem. Can anyone confirm?

If I am not mistaken, this corresponds to the same memory slot which reported ECC errors a while back...
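
To make the comparison easier to read, here is a rough sketch that pulls the slot locator and serial number out of dmidecode type 17 records and diffs the two runs. The sample text is trimmed to the "A" slots and to three fields per record, with values copied from the dumps above; the function name `parse_memory_devices` is hypothetical, and the empty "B" slots are left out for brevity.

```python
def parse_memory_devices(text):
    """Map each Locator to its Serial Number, or None when no module is detected."""
    slots = {}
    for record in text.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:  # skip lines without a "key: value" shape
                fields[key.strip()] = value.strip()
        if "Locator" not in fields:
            continue
        installed = fields.get("Size") != "No Module Installed"
        slots[fields["Locator"]] = fields.get("Serial Number") if installed else None
    return slots


# Trimmed records (A slots only), values taken from the two dumps above.
BEFORE = """\
Size: 16384 MB
Locator: P1_DIMM1A
Serial Number: 2D1A584D

Size: 16384 MB
Locator: P1_DIMM2A
Serial Number: 2D1A654D

Size: No Module Installed
Locator: P2_DIMM1A
Serial Number: SerNum01

Size: 16384 MB
Locator: P2_DIMM2A
Serial Number: 2F1B3A44
"""

AFTER = """\
Size: 16384 MB
Locator: P1_DIMM1A
Serial Number: 2F1B3A44

Size: 16384 MB
Locator: P1_DIMM2A
Serial Number: 2F1A5E4D

Size: No Module Installed
Locator: P2_DIMM1A
Serial Number: SerNum01

Size: 16384 MB
Locator: P2_DIMM2A
Serial Number: 2D1A584D
"""

before = parse_memory_devices(BEFORE)
after = parse_memory_devices(AFTER)

# Slots that show no module in both runs.
empty_both = {s for s in before if before[s] is None and after.get(s) is None}
serials_before = {v for v in before.values() if v}
serials_after = {v for v in after.values() if v}

print("empty in both runs:", sorted(empty_both))
print("serials no longer visible:", sorted(serials_before - serials_after))
print("serials newly visible:", sorted(serials_after - serials_before))
```

On this data, P2_DIMM1A is empty in both runs, serial 2D1A654D drops out after the swap and 2F1A5E4D shows up for the first time. In other words, a stick that went undetected before is detected once moved to another slot, so the fault stays with the slot rather than following a particular stick.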
 

lpallard

So here's an update which needs input from you guys....

The motherboard was deemed "defective" by Supermicro and sent in for RMA a few weeks back. They just informed me that they have been testing the board with identical processors and 8x Samsung RAM sticks, and all was fine. They couldn't reproduce the disappearing RAM problem on the 3rd slot of CPU2.

They are concluding that the CPUs or RAM are bad and the motherboard is fine. They offered to test the system with my existing CPUs and RAM if I send them in. Fine, but IMO if the motherboard is fine, and swapping the RAM sticks did not help (whatever I put in slot 3 of CPU2 is NOT detected), then I assume the RAM is also fine, which leaves one of the CPUs being defective...

Are my problems usually symptoms of a dying CPU?
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,516
5,830
113
I cannot remember the last time I saw a CPU die except from contact pad issues.
 

Entz

Active Member
Apr 25, 2013
269
62
28
Canada Eh?
Possible issue with the chassis or something in your environment (though I would tend to agree that it is a motherboard issue). Flex / grounding issue?
 

lpallard

I agree this is 100% weird, but I have nevertheless sent the CPUs and RAM sticks to Supermicro so they can perform some testing.

Entz, now that you mention chassis issues: a week or so before the 16GB stick of RAM disappeared, I installed the system in a Norco RPC-470 chassis. Everything else stayed the same (same PSU, etc.); I only upgraded the case from a desktop tower to a rackmount case.

A few days before the RAM disappeared, I blew out the server with compressed air to eliminate dust. I am starting to wonder whether some dust moved somewhere it shouldn't be but was gone by the time the board made it to Supermicro. Far-fetched IMO, but possible.

Supermicro's tech, who is very pleasant to deal with, responded when asked about the RAM issue: "The system is testing auto on/off script with 2*4334 and 8*Samsung 1600mhz. So far I cannot reproduce memory lost issue."

He then replied: "It might be your memory or your CPU. CPU has memory controller. You can send in your CPUs and memory so I can test them."

Next is to wait for the parts to get there and be tested...

Weird issue, no?
 

lpallard

OK, the CPUs and RAM sticks made it to Supermicro and their tech installed them on the server motherboard. No problems were detected at all. He couldn't reproduce the problem with the missing RAM. I mentioned the grounding issue possibly caused by the Norco case, as Entz suggested a few posts back, and the Supermicro tech agreed that it could cause such issues.

I will remove all standoffs and install brass ones from another build, only around the edges of the board and two or three in the center to support it. Grounding issues are the only reason I can think of for the disappearing RAM stick. As for the SATA ports problem, all I can think of is that the combination of consumer SATA drives, conventional SATA ports and Linux RAID somehow does not work well together (hence why I now use M1015's and a M5016).

I should get the board, CPUs and RAM back in about a week's time. Once everything is assembled in the Norco case, I will post my findings.

What do you guys think?
 

Patrick

That makes sense I guess. Maybe double check and get out a bit of electrical tape when you re-install.
 

Entz

Yeah definitely worth a shot. Maybe try building it outside the case as well.

When you were doing your tests did you have the RAID cards installed or no?
 

lpallard

When you guys talk about electrical tape, do you mean putting a piece of tape at the end of the standoffs?

Doesn't a motherboard need to make contact with the standoffs for proper grounding?

I did not do any tests, Supermicro did and no, the RAID cards were not installed in the system. Nothing has changed with the system in months (10-12) except the Norco case.

Quick timeline to "clear up the fog":

  • August 2013: Purchased the Supermicro mobo, RAM and CPUs, installed them in a desktop case. All was working fine.
  • November 2013: Purchased a M1015 and everything was fine.
  • February 2014: Purchased a M5016 and everything was fine.
  • July 2014: Purchased a Norco RPC-470 case and transferred the system into it. All was fine.
  • 2 weeks later: Blew out the server with compressed air to remove dust.
  • 1 week later: 16GB of RAM disappeared.
  • August 2014: Sent the motherboard to Supermicro for testing along with the actual CPUs and RAM sticks.
  • Mid-August 2014: Supermicro couldn't reproduce the problem.
  • End of August 2014: Supermicro is sending the equipment back to me untouched.

:)