You've come a long way from when you lost those Micron SSDs in the early days.
STH is rather unique in your willingness to share openly what is happening behind the scenes. Even the ugly. Which is a big part of why I find this place so interesting to hang around. TY.
Much different days for sure!
In comparison, this was a bit of a pain, but most of that was due to how slow servers boot rather than anything else.
I asked about the last time I was servicing the hosting racks, and it was ~18 months ago (mid-Dec 2018). A big part of that is a change in hosting strategy.
Another difference is that instead of rushing to get a piece up on this today, I am trying to make a better piece with more useful thoughts. Certainly going in with a well-defined triage plan helps, since it becomes just executing rather than the emotional void of not knowing how to remedy the problem. It also meant that while I was working on this, I was taking notes on what could have been better.
We are big enough now that I did at least pause and think about whether Intel might be unhappy if I post pictures of their drives and identify them as a point of failure. At the same time, I think you are right. It was a pretty quick decision that we were going to share this. It would be weird to talk about storage, redundancy, and backups and never talk about failures. If these machines never failed, infrastructure would look a lot different.