I'm having a strange issue tied to one specific MS-01. I'm running a hyperconverged Proxmox cluster with Ceph on 3x MS-01 nodes; call them Node1, Node2, and Node3. The problem only occurs on Node1.
Node1 went down the other day. I restarted it and it crashed as it was loading VMs...
I have 2x NetApp DS4246 LFF shelves with IOM6 modules, and I swapped out the PSUs for more efficient Delta PSUs.
My main shelf went offline, and I found the fans running at 100%. Both PSUs (I only use 2 of the 4) were indicating a DC power output fault, and the HBA was not detecting the chassis or anything in it. I swapped...
How is downclocking the RAM and CPU an acceptable solution? The system is advertised with a certain spec, and then, whoopsie, you have to run it below its advertised capability for it to be stable.
Update on my last post: PVE2 was mostly the node having unexplained Ceph mon/mgr/OSD crashes and unexplained hard lockups, which were happening pretty much daily.
I had a spare MS-01, with new RAM, that I bought for another project and never set up, and it's been over 24 hours without a crash...
I just ordered 2 of those E-key converters. My workaround so far has been using external NVMe drives in USB enclosures as boot drives in a RAID1 pair, but the cables are jank as hell and very sensitive to being jostled around.
@JaxJiang, or anyone else who might know more about the Thunderbolt ports:
As I understand it, the 13900H has on-die Thunderbolt controllers. What I don't understand is how much bandwidth they have. Do the controllers use PCIe lanes? If so, do they share those lanes?
I am using...
I've been having some strange lockups that started randomly, and I can't track them down. I'm noticing that my 22110 enterprise NVMe drives are running warm; I'm unsure whether they always ran this warm, since I didn't record the thermals last year. I've seen some 3D-printed modified cases...
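To avoid the "didn't record the thermals" problem going forward, a small logger could sample drive temperatures on a schedule. This is a minimal sketch, assuming a Linux host (such as a Proxmox node) whose kernel exposes each NVMe drive as a hwmon device whose `name` file reads `nvme` and whose `temp1_input` holds the composite temperature in millidegrees Celsius. The function names and the CSV output are hypothetical, chosen for illustration.

```python
import csv
import time
from pathlib import Path

# Assumption: NVMe drives appear under /sys/class/hwmon with name "nvme"
# and report the composite temperature in temp1_input (millidegrees C).
HWMON_ROOT = Path("/sys/class/hwmon")


def read_nvme_temps(hwmon_root: Path = HWMON_ROOT) -> dict[str, float]:
    """Return {hwmon entry: composite temperature in degrees C} for NVMe sensors."""
    temps: dict[str, float] = {}
    if not hwmon_root.is_dir():
        return temps
    for entry in sorted(hwmon_root.iterdir()):
        name_file = entry / "name"
        temp_file = entry / "temp1_input"
        if not name_file.is_file() or not temp_file.is_file():
            continue
        if name_file.read_text().strip() != "nvme":
            continue  # skip CPU, chipset, and other non-NVMe sensors
        temps[entry.name] = int(temp_file.read_text().strip()) / 1000.0
    return temps


def log_temps(csv_path: Path, hwmon_root: Path = HWMON_ROOT) -> None:
    """Append one timestamped row per NVMe sensor to a CSV file."""
    now = time.strftime("%Y-%m-%dT%H:%M:%S")
    with open(csv_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for sensor, temp_c in read_nvme_temps(hwmon_root).items():
            writer.writerow([now, sensor, f"{temp_c:.1f}"])
```

Calling something like `log_temps(Path("/root/nvme_temps.csv"))` from a cron job or systemd timer every few minutes would build a history that can be lined up against the timestamps of the lockups.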
Not yet, but I think it might be worth a try; it's just that rebuilding Exchange is annoying. I'm trying a few other things first: I've been seeing some repeating messages in my host logs that I've been working on isolating and cleaning up.
I noticed some OSD, ceph-mon, and ceph-mgr crashes...
Hopefully someone here has some thoughts that will save me from reading 99 pages of forum posts; I know there have been some issues with mysterious instability.
I have a 3-node 13900H MS-01 Proxmox cluster with 96 GB of RAM per node. Each node has dual 4 TB Samsung P983 M.2 NVMe drives and a...
I've been running a Thunderbolt backhaul for a few weeks now, and it's great. I found that I had to pin the Thunderbolt IRQs to the big (P-)cores; otherwise I'd get inconsistent throughput.
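For anyone wanting to try the same IRQ pinning, here is a minimal sketch of one way to do it on Linux: find the Thunderbolt IRQ numbers in `/proc/interrupts`, then write a CPU list to `/proc/irq/<n>/smp_affinity_list`. Two assumptions are baked in and should be checked on your own host. First, the Thunderbolt IRQs are assumed to carry "thunderbolt" in the device-name column. Second, CPUs 0-11 are assumed to be the P-core threads on a 13900H; on hybrid Intel CPUs the kernel typically lists the P-core CPUs in `/sys/devices/cpu_core/cpus`. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: Thunderbolt IRQ lines mention "thunderbolt" in /proc/interrupts.
# Assumption: CPUs 0-11 are the P-core threads on a 13900H; verify on your host.
P_CORES = "0-11"


def find_irqs(interrupts_text: str, keyword: str = "thunderbolt") -> list[int]:
    """Return numeric IRQs whose /proc/interrupts line mentions `keyword`."""
    irqs = []
    for line in interrupts_text.splitlines()[1:]:  # skip the CPU0 CPU1 ... header
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Named rows such as NMI or LOC are skipped because their label is not numeric.
        if sep and label.isdigit() and keyword.lower() in rest.lower():
            irqs.append(int(label))
    return irqs


def pin_irqs(irqs: list[int], cpu_list: str = P_CORES,
             proc_root: Path = Path("/proc")) -> None:
    """Write `cpu_list` to /proc/irq/<n>/smp_affinity_list for each IRQ (needs root)."""
    for irq in irqs:
        (proc_root / "irq" / str(irq) / "smp_affinity_list").write_text(cpu_list + "\n")
```

Usage would be along the lines of `pin_irqs(find_irqs(Path("/proc/interrupts").read_text()))`, run as root. IRQ affinity does not survive a reboot, so it would need to run from a boot-time unit, and if irqbalance is running it may move the IRQs again unless it is told to leave them alone.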
I also see inconsistent behavior with the Thunderbolt network interface and the fabric network coming up automatically after a...
MeshCommander works fine for me, except that I also find mounting an ISO over the network through AMT is janky AF: it doesn't work right half the time, or it's painfully slow.
An OCuLink eGPU would be a much better long-term solution than trying to cram some SFF GPU with a low power budget in there. That's brilliant; I may just have to test this!
The paste on my units seemed fine, but I finally had a chance to install PTM7950. I'm not noticing a big change in temps during a stress test, but Geekbench 6 and PassMark both showed improved single- and multi-core performance: multi-core was about 10% better, but single-core improved by 25%, and...