Scrutiny - Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds


AnalogJ

New Member
Jan 2, 2019
13
1
3
Hey STH,

I've been working on a project that I think you'll find interesting -- Scrutiny.

If you run a server with more than a couple of hard drives, you're probably already familiar with S.M.A.R.T and the `smartd` daemon. If not, it's an incredible open source project described as the following:

smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.
These S.M.A.R.T hard drive self-tests can help you detect and replace failing hard drives before they cause permanent data loss. However, there are a couple of issues with `smartd`:

  • There are more than a hundred S.M.A.R.T attributes, but `smartd` does not differentiate between critical and informational metrics.
  • `smartd` does not record S.M.A.R.T attribute history, so it can be hard to determine if an attribute is degrading slowly over time.
  • S.M.A.R.T attribute thresholds are set by the manufacturer. In some cases these thresholds are unset, or are so high that they can only be used to confirm a failed drive, rather than detecting a drive about to fail.
  • `smartd` is a command-line-only tool. For headless servers with multiple drives, a web UI would be more valuable.
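
To illustrate the point about attribute history, here is a minimal Python sketch of how stored samples can expose slow degradation. It is not Scrutiny's code: the `Sample` type and the `raw_value_growth` function are hypothetical names, and it assumes the raw value of an attribute (for example a reallocated sector count) has been recorded with a timestamp on each collection run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    """One recorded reading of a single S.M.A.R.T attribute (hypothetical type)."""
    taken_at: datetime
    raw_value: int


def raw_value_growth(history: list[Sample], window: timedelta) -> int:
    """Return how much the raw value grew within the most recent `window`.

    The window ends at the newest sample, so the result does not depend on
    when the function is called.
    """
    if not history:
        return 0
    ordered = sorted(history, key=lambda s: s.taken_at)
    latest = ordered[-1]
    cutoff = latest.taken_at - window
    # The oldest sample that still falls inside the window is the baseline.
    in_window = [s for s in ordered if s.taken_at >= cutoff]
    return latest.raw_value - in_window[0].raw_value
```

A single reading of 8 reallocated sectors says little on its own; the same reading after a growth of 6 in two weeks is a much stronger signal, and that difference is only visible when history is kept.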

Scrutiny is a Hard Drive Health Dashboard & Monitoring solution, merging manufacturer provided S.M.A.R.T metrics with real-world failure rates.

Here's a couple of screenshots that'll give you an idea of what it looks like:

dashboard.png

More Scrutiny Screenshots

Scrutiny is a simple but focused application, with a couple of core features:

  • Web UI Dashboard - focused on Critical metrics
  • smartd integration (no re-inventing the wheel)
  • Auto-detection of all connected hard-drives
  • S.M.A.R.T metric tracking for historical trends
  • Customized thresholds using real world failure rates
  • Temperature tracking
  • Provided as an all-in-one Docker image (but can be installed manually)
  • (Future) Configurable Alerting/Notifications via Webhooks
  • (Future) Hard Drive performance testing & tracking

---

So where can you download and try out Scrutiny?

That's where this gets a bit complicated, so please bear with me.

I've been involved with Open Source for almost 10 years, and it's been unbelievably rewarding -- giving me the opportunity to work on interesting projects with supremely talented developers.
I'm trying to determine if it's viable for me to take on more professional Open Source work, and that's where you come in.
Scrutiny is designed (and destined) to be open source, however I'd like to gauge whether the community thinks my work on self-hosted & devops tools is valuable as well.

I was recently accepted to the GitHub Sponsors program, and my goal is to reach 25 sponsors (at any contribution tier -- $1 is the minimum).
Each sponsor will receive immediate access to the Scrutiny source code, binaries and Docker images. Once I reach 25 sponsors, Scrutiny will be immediately open sourced with an MIT license (and I'll make an announcement here).

I appreciate your interest, questions and feedback. I'm happy to answer any questions about this monetization experiment as well (I'll definitely be writing a blog post on it later).

Sponsor @AnalogJ on GitHub Sponsors


Currently at 16/25 sponsors
 

RTM

Well-Known Member
Jan 26, 2014
956
359
63
Sounds like an interesting project, and I hope it succeeds.

That all said I do have a couple of comments, that I hope you will see as constructive criticism:

The first is also a question: is the full application intended to run on *every* server you have? Or do you support a lightweight client + web server (client/server) architecture, where information is streamed from clients (servers themselves, but in this regard they are clients) to a server that may also host the web interface?

I do not believe many server admins want to have full webserver applications running on all their servers, for resource usage/security implications/simplicity reasons.

The second comment: It seems like you are building something that overlaps in functionality with monitoring platforms like Nagios/Icinga/Zabbix/etc. (especially if you were to implement the client/server setup as above), granted you may do more specifically towards disk monitoring, but at the very least your solution would need to coexist with software like that if used in "a typical server environment", and honestly I do not see admins wanting more monitoring solutions on their systems (you don't want more components that can fail).

On a somewhat unrelated note, I am hoping to see a solution that merges the functionality that you get through configuration management software (like Puppet/Ansible/Saltstack/etc) with monitoring functionality (found in software like Nagios/Icinga/Zabbix/etc). It seems to me to be a good match, and again fewer agents on servers is a good thing. Of course a solution like that may exist already, I just haven't found it/searched around for it enough.
 

AnalogJ

New Member
Jan 2, 2019
13
1
3
Hey @RTM

Thanks for the interest!

To answer your first question: yes, Scrutiny is designed to run as a client/server model, with a central API/frontend server and one or more collector agents running on your servers. Currently the easiest way to get started with Scrutiny is via the provided "all-in-one" Docker image (which has both the collector and API server running in the same container), but it's also available as separate Docker images or even just the raw binaries if you'd like to install Scrutiny manually.
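
As a rough illustration of the collector/API split described above, a collector agent could look something like the following Python sketch. It is not Scrutiny's actual collector: the function names, the payload shape and the API URL are assumptions made for the example. The only real interface it relies on is `smartctl --json --all <device>` from smartmontools 7.0 or later, whose JSON output includes `serial_number`, `model_name`, `temperature.current` and `ata_smart_attributes.table` for ATA drives.

```python
import json
import subprocess
import urllib.request


def read_smart(device: str) -> dict:
    """Run smartctl against one device and return its parsed JSON output."""
    # smartctl uses its exit status as a bitmask of drive conditions, so a
    # non-zero status does not by itself mean the JSON output is unusable.
    result = subprocess.run(
        ["smartctl", "--json", "--all", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


def build_payload(smart: dict) -> dict:
    """Reduce smartctl's JSON to the fields a central server might store.

    The payload layout is hypothetical, chosen only for this example.
    """
    table = smart.get("ata_smart_attributes", {}).get("table", [])
    return {
        "serial_number": smart.get("serial_number"),
        "model_name": smart.get("model_name"),
        "temperature_c": smart.get("temperature", {}).get("current"),
        "attributes": [
            {
                "id": attr["id"],
                "name": attr["name"],
                "value": attr["value"],
                "thresh": attr.get("thresh"),
                "raw": attr["raw"]["value"],
            }
            for attr in table
        ],
    }


def send(payload: dict, api_url: str) -> int:
    """POST the payload as JSON to the central server; return the HTTP status."""
    request = urllib.request.Request(
        api_url,  # hypothetical endpoint, supplied by the caller
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

On each server, a scheduled job would call `send(build_payload(read_smart("/dev/sda")), api_url)` for every detected drive, so the only thing the agent needs is read access to the drives and a route to the central server.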

In regards to your second comment, I understand your concerns; others have brought up similar points: "Why try to build a dedicated application for functionality that already exists in a general purpose monitoring & alerting tool?"
My response to that is broken up into 2 use cases:

  • So let's say you already have a metrics/monitoring system running on your home server, and you already have it checking for SMART health, and you've already designed and created dashboards.
    • In that case, Scrutiny's value is in the "real world thresholds". Basically, I've integrated Backblaze's SMART data (and in the future, I'd like to use data provided by Scrutiny users -- with their permission) to highlight critical SMART attributes, as well as provide real-world failure rates for applicable attributes.
    • In addition, my goal is to eventually write "collectors" that integrate with your existing monitoring tools (Prometheus/Nagios/Icinga/Zabbix/etc) and just pull the SMART data from their APIs directly, meaning that you won't need to run multiple agents on the same server.
  • For users who don't have a monitoring solution running already, it provides immediate value.
    • Scrutiny is a turn-key solution that requires no complicated setup, configuration, deployment or dashboard creation, and has a minimal infrastructure footprint. It's designed to just work out of the box.
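
The "real world thresholds" idea can be sketched in a few lines of Python. This is only an illustration of the general approach, not Scrutiny's implementation: the function names, the bucket format and the 10% warning cut-off are assumptions, and the failure-rate table has to be supplied by the caller from real observed data.

```python
from bisect import bisect_right
from typing import Optional


def observed_failure_rate(raw_value: int, buckets: list) -> Optional[float]:
    """Look up the observed failure rate for a raw attribute value.

    `buckets` is a list of (lower_bound, failure_rate) pairs sorted by
    lower_bound; each bucket covers values up to the next lower bound.
    """
    bounds = [lower for lower, _ in buckets]
    index = bisect_right(bounds, raw_value) - 1
    if index < 0:
        return None  # value is below the first bucket, no data available
    return buckets[index][1]


def classify(value: int, thresh: int, raw_value: int, buckets: list,
             warn_rate: float = 0.10) -> str:
    """Combine the manufacturer threshold with observed failure rates."""
    # Manufacturer check: the normalized value has dropped to or below the
    # threshold. A threshold of 0 means the manufacturer never set one.
    if thresh > 0 and value <= thresh:
        return "failed"
    # Real-world check: drives with this raw value failed often enough in
    # the observed data to deserve attention (warn_rate is an assumed cut-off).
    rate = observed_failure_rate(raw_value, buckets)
    if rate is not None and rate >= warn_rate:
        return "warning"
    return "passed"
```

The useful property is the second check: an attribute whose manufacturer threshold is unset (`thresh == 0`) can still be flagged as a warning when drives with the same raw value have failed frequently in the observed data.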

While I like your idea about agent-less monitoring, it would make a super focused tool like Scrutiny a lot more complicated, and give it a much larger attack surface. Scrutiny is basically a "read-only" application, and has no need for access to sensitive ssh keys or even an understanding of network topology. I do agree that for general purpose monitoring tools, an agent-less system would be preferable.

I hope that answers some of your concerns/questions.

Thanks again for your support & interest!
 

chinesestunna

Active Member
Jan 23, 2015
621
191
43
56
@AnalogJ This looks awesome and really well done - I suggest posting it to reddit with r/datahoarder as well, I'm sure you'll get tons of supporters and users there.
 

GabyBoyn

New Member
Oct 18, 2020
3
0
1
US
I don't believe that many server administrators want to have full web server applications running on all their servers, for resource usage / security / simplicity reasons.