Fishing... any TIG stack gurus around?

whitey

Moderator
Jun 30, 2014
2,770
865
113
37
Flailing around here trying to get what I thought would be a fairly simple Telegraf/InfluxDB/Grafana stack going. Before I go into too much detail, do we have any 'in the trenches' gurus around here, prior to me writing a novel? It's late, or rather early morning, here and I'm about to flog something and need a sanity check. I have the basic stack up and running on RHEL 7.7, with all TIG components installed and validated as healthy, but when I go to enable a vSphere dashboard, telegraf seemingly 'goes to the sad place' and I cannot for the life of me sort it out.

/fark

More details in the morning after I get some sleep, this is really pissing me off though.

TIA, whitey
 

whitey

Basically following these guides, which 'should' be simple enough:

Install Grafana and InfluxDB on CentOS 7 - Computing for Geeks
How To Monitor VMware ESXi with Grafana and Telegraf - Computing for Geeks
VMware vSphere - Overview dashboard for Grafana

Whether I follow the computingforgeeks guides and just use the built-in vSphere plugin, or follow the Grafana guide for setting up the vSphere plugin, both end with telegraf barfing and going into a failed systemd state. journalctl does not tell me much more. Another strange thing is that the DB never seems to be auto-created, so I create it manually once the import fails, but even after manual DB creation and dashboard import, no luck. I've tried with both authenticated DB access and wide open. The STRANGE thing is that if I leave the vSphere plugin disabled and do not set it up, telegraf stays running, but as soon as I configure the vSphere plugin it goes BOOM.
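When the journal is not saying much, one thing that may help is running telegraf once in the foreground, so whatever it chokes on in the vSphere config gets printed straight to the terminal. A rough sketch follows, not a definitive recipe. It assumes the config paths from the unit's ExecStart line, a `telegraf` service user (as the RPM normally creates), and the InfluxDB 1.x `influx` CLI. The function names are made up for illustration.

Code:
```shell
# Paths as used by the telegraf systemd unit; adjust if yours differ.
CONF=/etc/telegraf/telegraf.conf
CONF_DIR=/etc/telegraf/telegraf.d

# Run Telegraf once in the foreground as the service user (assumed to be
# "telegraf"). --test loads the config, gathers once, prints, and exits,
# so a config error shows up on the terminal instead of in a restart loop.
telegraf_foreground_test() {
    sudo -u telegraf telegraf --config "$CONF" --config-directory "$CONF_DIR" --test
}

# List databases to confirm whether the output database really exists
# (InfluxDB 1.x CLI; add -username/-password if auth is enabled).
influx_list_databases() {
    influx -execute 'SHOW DATABASES'
}
```

Run `telegraf_foreground_test` first, then `influx_list_databases`. Note that `--test` does not write to any outputs, so it will not create the database by itself; it is only there to surface the config or plugin error.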


Status of healthy TIG stack prior to vSphere plugin enablement:

Code:
[kevin@grafana grafana]$ sudo systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-11-29 07:10:46 UTC; 14h ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 6400 (telegraf)
   CGroup: /system.slice/telegraf.service
           └─6400 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Nov 29 07:10:46 grafana.undercovernerds.com systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! Starting Telegraf 1.12.6
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! Loaded inputs: kernel mem processes swap system cpu disk diskio
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! Loaded aggregators:
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! Loaded processors:
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! Loaded outputs: influxdb
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! Tags enabled: host=grafana.undercovernerds.com
Nov 29 07:10:46 grafana.undercovernerds.com telegraf[6400]: 2019-11-29T07:10:46Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"grafana.undercovernerds.com...erval:10s
Hint: Some lines were ellipsized, use -l to show in full.

Here's the systemctl status of the failed service w/ the vSphere plugin enabled:

Code:
[kevin@grafana telegraf.d]$ sudo systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2019-11-29 21:59:25 UTC; 9s ago
     Docs: https://github.com/influxdata/telegraf
  Process: 17544 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=1/FAILURE)
 Main PID: 17544 (code=exited, status=1/FAILURE)

Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: telegraf.service: main process exited, code=exited, status=1/FAILURE
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: Unit telegraf.service entered failed state.
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: telegraf.service failed.
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: telegraf.service holdoff time over, scheduling restart.
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: start request repeated too quickly for telegraf.service
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: Unit telegraf.service entered failed state.
Nov 29 21:59:25 grafana.undercovernerds.com systemd[1]: telegraf.service failed.
HALP! :-D

End goal is getting this to work and then moving on to a dashboard for my Enphase solar system metrics/reporting, as I am not digging the Enlighten app.
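
A note on the `Result: start-limit` in the failed status: that only means systemd gave up after telegraf exited several times in quick succession, so the status view shows the last restart attempt rather than the original error. A minimal sketch for digging out the earlier log lines and clearing the failed state is below. The function names are hypothetical, and 50 lines is an arbitrary choice.

Code:
```shell
# Show recent Telegraf journal lines in full (no pager, no ellipsizing);
# the real error is likely logged before the start-limit messages.
telegraf_recent_logs() {
    sudo journalctl -u telegraf --no-pager -n 50
}

# Clear the failed/start-limit state, then try one fresh start.
telegraf_retry_start() {
    sudo systemctl reset-failed telegraf
    sudo systemctl start telegraf
}
```

Call `telegraf_recent_logs` right after a failed start, fix whatever it reports, then use `telegraf_retry_start`.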