Alright, so the good news is that the machine is up and running and delivers a little over 1.2M/1.1M IOPS read/write at 4K Q32, with no BIOS performance tweaking and the fans running in "acoustic" mode.
The bad news is that I've uncovered a rather significant error which I've not been able to sort out on my own. Ideally, someone who has more experience with Intel boards will be able to nail this down pretty quickly.
Symptoms:
-Front panel "status light" flashes green continuously at a steady interval.
-Intel's Active System Console software reports a sensor critical voltage fault on one of the "discrete" sensors in the power subsection.
-Every time the server is rebooted, I lose two to four drives, which the operating system suddenly reports as 1GB in capacity. Initially I thought I had actually bricked the drives, but after several attempts to revive them via various methods (Windows, Parted Magic/secure erase, Samsung DC Toolkit, etc.), I discovered that running the following in a terminal from Parted Magic seems to temporarily correct the status of the drives (shown for drive #3, namespace 1):
root@PartedMagic:~# nvme format /dev/nvme3n1
root@PartedMagic:~# nvme subsystem-reset /dev/nvme3
root@PartedMagic:~# nvme reset /dev/nvme3
Prior to the steps above, both DC Toolkit and nvme-cli run from the terminal report the drive capacity as 1GB (I'm assuming this is the cache), the firmware version as FAILMOD, and some really odd SMART values.
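To make it quicker to see which drives have dropped into this state after a reboot, here is a minimal sketch that scans the controllers and prints any whose firmware revision reads FAILMOD. It assumes a Linux environment (such as Parted Magic) where the kernel exposes each controller's `firmware_rev` attribute under `/sys/class/nvme`; the function name `list_failmode` is hypothetical, and I have not verified this on the affected machine.

```shell
#!/bin/sh
# Print NVMe controllers whose firmware revision reads FAILMOD.
# Argument 1 (optional): directory to scan; defaults to /sys/class/nvme.
# Assumption: each controller directory contains a firmware_rev file.
list_failmode() {
    root="${1:-/sys/class/nvme}"
    for ctrl in "$root"/nvme*; do
        # Skip when the glob matched nothing or the attribute is missing.
        [ -r "$ctrl/firmware_rev" ] || continue
        # The field may be space-padded, so strip all whitespace first.
        fw=$(tr -d '[:space:]' < "$ctrl/firmware_rev")
        if [ "$fw" = "FAILMOD" ]; then
            basename "$ctrl"
        fi
    done
}
```

Calling `list_failmode` with no arguments scans the live system; passing a directory lets the same function run against a copied or mocked sysfs tree.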
Is there a way to determine which discrete sensor is reporting the undervoltage?
Is it possible that I've underpowered the system by installing only a single 750W PSU? Right now any unnecessary items have been pulled, so we're talking about the motherboard, 128GB (8 sticks) of DDR4-2400, 2x 2667 v4 CPUs, 8x PM963s, 1x Intel P3700, 2x 8-drive 2.5" backplanes (only 4 drives installed in each), stock fans, and three riser cards. I'm not getting any system events originating from the PSU.
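For the PSU question, the arithmetic I have in mind is just a sum of per-component draw compared against the PSU rating. The sketch below shows that calculation only; the function name `power_budget` is hypothetical, and the wattages in the usage line are placeholder assumptions, not measured or datasheet values for this build.

```shell
#!/bin/sh
# Rough power budget: sum component wattages and compare to the PSU rating.
# Argument 1: PSU rating in watts. Remaining arguments: component wattages.
# All component figures are caller-supplied estimates, not measurements.
power_budget() {
    psu_w=$1
    shift
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    echo "load=${total}W headroom=$((psu_w - total))W"
}
```

A call such as `power_budget 750 270 40 64 25 120` (placeholder figures for CPUs, memory, the eight data drives, the P3700, and everything else) prints the estimated load and the remaining headroom. A negative headroom would point at the PSU; a comfortable positive one would suggest looking elsewhere.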
System summary:
Any commentary, advice, or possible solution will be much appreciated!