-
Hi all,
From one day to the next, I'm suddenly unable to get graphs when clicking "Show performance chart" on my services; every service shows "no data".
Restarting OMD does not help, and neither does rebooting the whole system.
What can I do? Is there a guide somewhere to follow in such cases?
I am on OMD 5.30-labs-edition with Thruk 3.12; the host is OK (no load, enough free disk space).
Thanks for any help!
Joe
-
There are many moving parts involved here. Which graphing and database components do you use?
omd config show | grep -E "(PNP4NAGIOS|NAGFLUX|INFLUXDB|VICTORIAMETRICS|GEARMAND):"
-
Hello jstark1,
Thank you for your help!
The output of the above command is:
INFLUXDB: on
NAGFLUX: on
PNP4NAGIOS: off
VICTORIAMETRICS: off
-
Now I would look for recent performance data with the following (replace the hostname and command):
influx
Connected to http://127.0.0.1:8086 version 1.8.10
InfluxDB shell version: 1.8.10
> use nagflux;
Using database nagflux
> SELECT * FROM "metrics" WHERE "host" = 'localhost' and command = 'check_ping' and time > '2024-11-30T14:25:00Z' LIMIT 10
-
If I execute the command, I get results up to the date on which the graphs stopped (2024-11-21); after that I get no results:
`
SELECT * FROM "metrics" WHERE "host" = 'win1' and command = 'check-host-alive' and time > '2024-11-21T14:25:00Z' LIMIT 10
name: metrics
time                 command          crit crit-fill host max min performanceLabel service   unit value warn warn-fill
2024-11-21T14:26:04Z check-host-alive 100  none      win1 100 0   pl               hostcheck %    0     80   none
2024-11-21T14:26:04Z check-host-alive                win1         rtmin            hostcheck ms   0
2024-11-21T14:26:04Z check-host-alive 5000 none      win1     0   rta              hostcheck ms   0.315 3000 none
2024-11-21T14:26:04Z check-host-alive                win1         rtmax            hostcheck ms   0.486
2024-11-21T14:31:04Z check-host-alive 5000 none      win1     0   rta              hostcheck ms   0.301 3000 none
2024-11-21T14:31:04Z check-host-alive                win1         rtmax            hostcheck ms   0.539
2024-11-21T14:31:04Z check-host-alive 100  none      win1 100 0   pl               hostcheck %    0     80   none
2024-11-21T14:31:04Z check-host-alive                win1         rtmin            hostcheck ms   0
2024-11-21T14:36:04Z check-host-alive 100  none      win1 100 0   pl               hostcheck %    0     80   none
2024-11-21T14:36:04Z check-host-alive                win1         rtmin            hostcheck ms   0
SELECT * FROM "metrics" WHERE "host" = 'win1' and command = 'check-host-alive' and time > '2024-11-22T14:25:00Z' LIMIT 10
`
Additionally, I noticed today that the partition's free space has been decreasing rapidly for the past 10 days!
It seems there is a problem somewhere with "cleaning up" data.
This is remarkable because the system has been running for 3 years now and there has never been such a problem.
-
OK, nagflux is not writing data to InfluxDB. I would check etc/mod-gearman/server.cfg for "perfdata=grafana_data", and the ModGearman section in etc/nagflux/config.gcfg.
See https://omd.consol.de/docs/omd/howtos/grafana/
Some troubleshooting commands:
omd diff etc/mod-gearman/server.cfg
omd diff etc/nagflux/config.gcfg
gearman_top -b | grep grafana_data
-
i don't think so:
root@nus:# df -h
Filesystem Size Used Avail Use% Mounted on
udev 7,8G 0 7,8G 0% /dev
tmpfs 1,6G 948K 1,6G 1% /run
/dev/sda2 98G 48G 46G 52% /
...
-
But InfluxDB thinks so. Same error message here: https://community.influxdata.com/t/writing-on-influxdb-stopped-no-space-on-device/2328
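One common reason for "no space left on device" while `df -h` still shows free space is that the filesystem has run out of inodes rather than blocks (the numbers `df -i` reports). Whether that is what happened here is not established in this thread. The sketch below only shows how both figures could be compared for a given path; the function name and the path "/" are placeholders.

```python
import os


def usage_percent(path="/"):
    """Return (block_use_pct, inode_use_pct) for the filesystem holding path."""
    st = os.statvfs(path)
    # Blocks: root-reserved blocks count as used here, so this may read
    # slightly higher than the Use% column of df.
    block_pct = 100.0 * (st.f_blocks - st.f_bavail) / st.f_blocks if st.f_blocks else 0.0
    # Inodes: some filesystems report 0 total inodes; treat that as "no fixed limit".
    inode_pct = 100.0 * (st.f_files - st.f_favail) / st.f_files if st.f_files else None
    return block_pct, inode_pct


if __name__ == "__main__":
    blocks, inodes = usage_percent("/")
    print(f"blocks used: {blocks:.1f}%")
    print("inodes used: n/a" if inodes is None else f"inodes used: {inodes:.1f}%")
```

If the inode figure is close to 100% while the block figure is not, a directory holding a very large number of small files would be the place to look.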
-
OK, understood.
I have now rebooted the host and the "no space left on device" error has disappeared, which is good.
But the data is still not available; the log var/log/nagflux/nagflux.log now says:
...
[2024-12-03 08:11:10.493][Warn][nagiosSpoolfileWorker.go:124] NagiosSpoolfileWorker: Could not write to buffer
[2024-12-03 08:11:10.495][Warn][dumpfileCollector.go:81] DumpfileCollector: Could not write to buffer
[2024-12-03 08:11:10.808][Warn][Worker.go:238] Post "http://127.0.0.1:8086/write?precision=ms&db=nagflux&u=omdadmin&p=XXX": net/http: HTTP/1.x transport connection broken: malformed MIME header line: X-Influxdb-Error: partial write: unable to parse 'L?J61Q1rK??8?8????L?
N
??G
*?r)?!!?"?w?wL? J /#1 }If;????=]=1M?JX??Agb-': invalid field format unable to parse '[httpd] 127.0.0.1 - omdadmin [12/Feb/2024:16:05:10 +0100] "POST /write?db=nagflux&p=%5BREDACTED%5D&precision=ms&u=omdadmin HTTP/1.1 " 204 0 "-" "Nagflux" 1db4e7b4-c9b8-11ee-92a7-00155db338da 10716 metrics,host=HV03,service=IPMI,command=check-nrpe,performanceLabel=P2-DIMMB2\ Temp,warn-fill=none value=31.00,warn=10.0 1727771266000 metrics,host=Sophos-Central-LN,service=hostcheck,command=check-host-alive,performanceLabel=rtmax,unit=ms value=31.770 1727771267000 metrics,host=ES-Switch02,service=hostcheck,command=check-host-alive2,performanceLabel=pl,warn-fill=none,crit-fill=none,unit=% crit=100.0,value=0.0,warn=80.0 1727771268000 metrics,host=lef-uc01,service=hostcheck,command=check-host-alive,performanceLabel=rta,warn-fill=none,crit-fill=none,unit=ms min=0.0,value=12.944,warn=3000.000,crit=5000.0
[2024-12-03 08:11:10.808][Info][Worker.go:192] Dumping queries which couldn't be sent to: -nagflux.influx
[2024-12-03 08:11:10.821][Warn][Worker.go:238] Post "http://127.0.0.1:8086/write?precision=ms&db=nagflux&u=omdadmin&p=XXX": net/http: HTTP/1.x transport connection broken: malformed MIME header line: X-Influxdb-Error: partial write: unable to parse 'L?J61Q1rK??8?8????L?
N
??G
*?r)?!!?"?w?wL? J /#1 }If;????=]=1M?JX??Agb-': invalid field format unable to parse '?': missing fields unable to parse '?': missing fields unable to parse '[httpd] 127.0.0.1 - omdadmin [12/Feb/2024:16:05:10 +0100] "POST /write?db=nagflux&p=%5BREDACTED%5D&precision=ms&u=omdadmin HTTP/1.1 " 204 0 "-" "Nagflux" 1db4e7b4-c9b8-11ee-92a7-00155db338da 10716 metrics,host=EX01,service=hostcheck,command=check-host-alive2,performanceLabel=pl,warn-fill=none,crit-fill=none,unit=% value=0.0,warn=80.0,crit=100.0 1725972534000 metrics,host=Raum1,service=hostcheck,command=check-host-alive2,performanceLabel=rtmin,unit=ms value=1.645 1727771260000 metrics,host=DC01,service=AD\ Schemasync,command=check-nrpe,performanceLabel=DRASyncFailuresonSchemaMismatch,warn-fill=none,crit-fill=none value=0.0,warn=50.0,crit=100.0 1716874747000 metrics,host=HV03,service=IPMI,command=check-nrpe,performanc
[2024-12-03 08:11:10.823][Info][Worker.go:192] Dumping queries which couldn't be sent to: -nagflux.influx
[2024-12-03 08:11:10.834][Warn][Worker.go:238] Post "http://127.0.0.1:8086/write?precision=ms&db=nagflux&u=omdadmin&p=XXX": net/http: HTTP/1.x transport connection broken: malformed MIME header line: X-Influxdb-Error: partial write: unable to parse 'L?J61Q1rK??8?8????L?
N
??G
*?r)?!!?"?w?wL? J /#1 }If;????=]=1M?JX??Agb-': invalid field format unable to parse '[httpd] 127.0.0.1 - omdadmin [12/Feb/2024:16:05:10 +0100] "POST /write?db=nagflux&p=%5BREDACTED%5D&precision=ms&u=omdadmin HTTP/1.1 " 204 0 "-" "Nagflux" 1db4e7b4-c9b8-11ee-92a7-00155db338da 10716 metrics,host=HV03,service=IPMI,command=check-nrpe,performanceLabel=P2-DIMMB2\ Temp,warn-fill=none value=31.00,warn=10.0 1727771266000 metrics,host=Sophos-Central-LN,service=hostcheck,command=check-host-alive,performanceLabel=rtmax,unit=ms value=31.770 1727771267000 metrics,host=ES-Switch02,service=hostcheck,command=check-host-alive2,performanceLabel=pl,warn-fill=none,crit-fill=none,unit=% crit=100.0,value=0.0,warn=80.0 1727771268000 metrics,host=lef-uc01,service=hostcheck,command=check-host-alive,performanceLabel=rta,warn-fill=none,crit-fill=none,unit=ms min=0.0,value=12.944,warn=3000.000,crit=5000.0
[2024-12-03 08:11:10.834][Info][Connector.go:204] Is InfluxDB(nagflux) running: true
[2024-12-03 08:11:10.839][Warn][Worker.go:238] Post "http://127.0.0.1:8086/write?precision=ms&db=nagflux&u=omdadmin&p=XXX": net/http: HTTP/1.x transport connection broken: malformed MIME header line: X-Influxdb-Error: partial write: unable to parse 'L?J61Q1rK??8?8????L?
N
??G
*?r)?!!?"?w?wL? J /#1 }If;????=]=1M?JX??Agb-': invalid field format unable to parse '?': missing fields unable to parse '?': missing fields unable to parse '[httpd] 127.0.0.1 - omdadmin [12/Feb/2024:16:05:10 +0100] "POST /write?db=nagflux&p=%5BREDACTED%5D&precision=ms&u=omdadmin HTTP/1.1 " 204 0 "-" "Nagflux" 1db4e7b4-c9b8-11ee-92a7-00155db338da 10716 metrics,host=Raum1,service=hostcheck,command=check-host-alive2,performanceLabel=rtmin,unit=ms value=1.645 1727771260000 metrics,host=SERVICES01,service=hostcheck,command=check-host-alive2,performanceLabel=rtmax,unit=ms value=0.431 1727771267000 metrics,host=Cloud1,service=hostcheck,command=check-host-alive2,performanceLabel=rta,warn-fill=none,crit-fill=none,unit=ms value=0.000,warn=3000.000,crit=5000.000,min=0.0 1727771270000 metrics,host=le-stor02,service=hostcheck,command=check-host-alive,performanceLabel=rtmax,unit=ms value=1.231 17295951
[2024-12-03 08:11:10.839][Info][Connector.go:204] Is InfluxDB(nagflux) running: true
[2024-12-03 08:11:20.825][Warn][nagiosSpoolfileWorker.go:124] NagiosSpoolfileWorker: Could not write to buffer
[2024-12-03 08:11:20.853][Warn][Worker.go:238] Post "http://127.0.0.1:8086/write?precision=ms&db=nagflux&u=omdadmin&p=XXX": net/http: HTTP/1.x transport connection broken: malformed MIME header line: X-Influxdb-Error: partial write: unable to parse 'L?J61Q1rK??8?8????L?
N
??G
*?r)?!!?"?w?wL? J /#1 }If;????=]=1M?JX??Agb-': invalid field format unable to parse '[httpd] 127.0.0.1 - omdadmin [12/Feb/2024:16:05:10 +0100] "POST /write?db=nagflux&p=%5BREDACTED%5D&precision=ms&u=omdadmin HTTP/1.1 " 204 0 "-" "Nagflux" 1db4e7b4-c9b8-11ee-92a7-00155db338da 10716 metrics,host=HV03,service=IPMI,command=check-nrpe,performanceLabel=P2-DIMMB2\ Temp,warn-fill=none value=31.00,warn=10.0 1727771266000 metrics,host=Sophos-Central-LN,service=hostcheck,command=check-host-alive,performanceLabel=rtmax,unit=ms value=31.770 1727771267000 metrics,host=ES-Switch02,service=hostcheck,command=check-host-alive2,performanceLabel=pl,warn-fill=none,crit-fill=none,unit=% crit=100.0,value=0.0,warn=80.0 1727771268000 metrics,host=lef-uc01,service=hostcheck,command=check-host-alive,performanceLabel=rta,warn-fill=none,crit-fill=none,unit=ms min=0.0,value=12.944,warn=3000.000,crit=5000.0
[2024-12-03 08:11:20.860][Warn][Worker.go:238] Post "http://127.0.0.1:8086/write?precision=ms&db=nagflux&u=omdadmin&p=XXX": net/http: HTTP/1.x transport connection broken: malformed MIME header line: X-Influxdb-Error: partial write: unable to parse 'L?J61Q1rK??8?8????L?
N
??G
*?r)?!!?"?w?wL? J /#1 }If;????=]=1M?JX??Agb-': invalid field format unable to parse '?': missing fields unable to parse '?': missing fields unable to parse '[httpd] 127.0.0.1 - omdadmin [12/Feb/2024:16:05:10 +0100] "POST /write?db=nagflux&p=%5BREDACTED%5D&precision=ms&u=omdadmin HTTP/1.1 " 204 0 "-" "Nagflux" 1db4e7b4-c9b8-11ee-92a7-00155db338da 10716 metrics,host=Raum1,service=hostcheck,command=check-host-alive2,performanceLabel=rtmin,unit=ms value=1.645 1727771260000 metrics,host=SERVICES01,service=hostcheck,command=check-host-alive2,performanceLabel=rtmax,unit=ms value=0.431 1727771267000 metrics,host=Cloud1,service=hostcheck,command=check-host-alive2,performanceLabel=rta,warn-fill=none,crit-fill=none,unit=ms value=0.000,warn=3000.000,crit=5000.000,min=0.0 1727771270000 metrics,host=le-stor02,service=hostcheck,command=check-host-alive,performanceLabel=rtmax,unit=ms value=1.231 17295951
...
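The rejected payloads in this excerpt carry their own millisecond timestamps (the write URL uses `precision=ms`), so it can help to see how old the points are that nagflux is trying to send. This is a minimal sketch, assuming that any 13-digit number surrounded by whitespace in the log text is such a timestamp. The sample string is abridged from the log above and the function name is illustrative.

```python
import re
from datetime import datetime, timezone

# At precision=ms, current timestamps are 13-digit integers starting with 1.
TS_RE = re.compile(r"(?<=\s)(1\d{12})(?=\s|$)")


def payload_times(log_text):
    """Return the distinct payload timestamps in log_text as sorted UTC datetimes."""
    stamps = {int(m) for m in TS_RE.findall(log_text)}
    return [datetime.fromtimestamp(ms / 1000, tz=timezone.utc) for ms in sorted(stamps)]


# Abridged from the nagflux.log excerpt; tags shortened for readability.
SAMPLE = (
    "performanceLabel=P2-DIMMB2\\ Temp,warn-fill=none value=31.00,warn=10.0 1727771266000 "
    "metrics,host=EX01,service=hostcheck value=0.0,warn=80.0,crit=100.0 1725972534000 "
    "metrics,host=DC01,service=AD\\ Schemasync value=0.0,warn=50.0,crit=100.0 1716874747000 "
)

if __name__ == "__main__":
    for when in payload_times(SAMPLE):
        print(when.isoformat())
```

For these three values the dates fall between late May and early October 2024, well before the 2024-12-03 log time, which would suggest that the rejected writes are old buffered data rather than fresh check results. The payload also contains what looks like an InfluxDB access-log line ("[httpd] ... 12/Feb/2024 ..."), which is not line protocol at all.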
-
Does no one have an idea how I can resolve my problem?
-
Hi again,
My graphs have been missing for over 3 weeks now, and in the meantime this has become a real problem for me.
What can I do?
Is there someone who can help me?
Any answer is appreciated!
Joe
-
I am sorry, but it is not possible to solve your problem based on a few log snippets and a round of "try this, try that".
The nagflux.log is full of error messages, and to me it looks like the InfluxDB is damaged (provided the performance data are correct and not the reason for the errors).
You should be able to find some help from the InfluxDB developers; we (as authors of OMD) are only users ourselves and can't help here.
Gerhard
-
I agree with Gerhard, your InfluxDB is broken. Perhaps influx_inspect can help.
A check like:
influx_inspect verify -dir ~/var/influxdb
gives me
...
Broken Blocks: 0 / 1660061, in 1.033226842s
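When the verify run prints thousands of per-file lines, only the summary matters at first glance. This is a small sketch that pulls the counts out of a summary line in the format shown above. The function name is illustrative, and the format is assumed from this single example rather than taken from the influx_inspect documentation.

```python
import re

SUMMARY_RE = re.compile(r"Broken Blocks:\s*(\d+)\s*/\s*(\d+)")


def broken_blocks(output):
    """Return (broken, total) from verify output, or None if no summary line is found."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "Broken Blocks: 0 / 1660061, in 1.033226842s"
    broken, total = broken_blocks(sample)
    print("ok" if broken == 0 else f"{broken} of {total} blocks are broken")
```

A count of zero only says that the stored TSM files pass verification; it says nothing about the data nagflux is currently trying to write.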
-
Hello Jens,
thank you very much for your help!
If I run the command, I only get lines ending in "healthy", followed by this summary:
...
var/influxdb/data/nagflux/autogen/984/000000074-000000003.tsm: healthy
var/influxdb/data/nagflux/autogen/992/000000027-000000002.tsm: healthy
Broken Blocks: 0 / 8139454, in 53.975394778s
...
So it seems to me that the database is not broken, right?
Would it help if you took a look at my system directly?
Joe
-
OK, after 4 weeks without graphs and no real help from the InfluxDB side, I am giving up on the idea of repairing anything.
Last question:
How can I reset InfluxDB (or OMD) and carry on without historical data?
-
Stopping InfluxDB and removing everything in var/influxdb should reset the database.
-
OK. I did that and stopped OMD completely to be sure, then deleted the whole var/influxdb directory.
After starting OMD, it tells me:
Doing 'start' on site mysite:
Starting influxdb...first run, waiting for initial _internal database..............OK
Running naemon configuration check... OK
Starting naemon...OK
Running apache configuration check... OK
Starting apache...OK
Starting nagflux...OK
Starting grafana...OK
Starting crontab...OK
After a few seconds I can see exactly the same errors in var/log/nagflux/nagflux.log as before!
Frustrating...
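Since wiping var/influxdb did not change the errors, the bad data may be coming from the sending side. The log mentions "Dumping queries which couldn't be sent to: -nagflux.influx" and a DumpfileCollector, so one thing that might be worth checking (an assumption, not something confirmed in this thread) is whether a file with that name exists in the site and how large it has grown. This is a minimal sketch that only lists candidates and deletes nothing; the function name and the starting directory are placeholders.

```python
from pathlib import Path


def find_dump_files(site_root, suffix="nagflux.influx"):
    """Return (path, size_in_bytes) for each regular file under site_root whose name ends with suffix."""
    found = []
    for path in Path(site_root).rglob("*" + suffix):
        if path.is_file():
            found.append((path, path.stat().st_size))
    # Largest first: a steadily growing dump file could also eat free disk space.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # "." is a placeholder; run this from the OMD site's home directory.
    for path, size in find_dump_files("."):
        print(f"{size:>14,d}  {path}")
```

If such a file turns up, looking at its first few lines would show whether it contains plain line protocol or the kind of binary and log text seen in the error messages.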