For the last few years, I have built a centralized monitoring system based on Prometheus that gathers various metrics across my whole private fleet of servers.
Since writing Prometheus exporters is rather simple, I have written some of them myself:
Additionally I use the following pre-made exporters:
As you can see, this is quite a lot of different exporters running on different hosts.
A few months ago I decided to rebuild the centralized metrics server on top of VictoriaMetrics and with proper access control.
Why VictoriaMetrics? I tried it for a bit and it seems to use less RAM and less storage while supporting long term storage nicely. It also has better mechanisms for importing and exporting data than Prometheus.
Setting up victoria-metrics
is very easy. I run it like this:
victoria-metrics -enableTCP6 \
-storageDataPath=/srv/victoria-metrics \
-retentionPeriod=99y \
-httpListenAddr=127.0.0.1:8428 \
-selfScrapeInterval=20s \
-promscrape.config /usr/local/etc/prometheus/prometheus.yaml
Note that you need to enable IPv6 manually all the time.
The prometheus.yaml
file is compatible with stock Prometheus.
I then use Grafana to connect to it, using the Prometheus protocol.
I don’t consider most of above metrics to be super private, but they certainly leak metadata (e.g. am I at home or not, how much mail do I get) so I don’t want to publish them on the net accessible to everyone that finds them.
Since Prometheus mainly favors a pull based model, we need to figure out ways to protect the data.
"Obvious" solutions like using mTLS or a maintenance VPN would require reconfiguring many machines and were deemded too much effort.
Essentially, I found three solutions that I will describe in detail:
This is the easiest mechanism, when your host already runs a web server: simply use it as a proxy for the metrics, and filter access by IP address or Basic Auth. Since most webservers have HTTPS today already, you get encryption for free.
A simple nginx configuration to do this would be:
location /metrics {
proxy_http_version 1.1;
proxy_pass http://127.0.0.1:9100/metrics;
access_log off;
allow 127.0.0.1;
allow ...;
deny all;
}
You need to configure the metrics exporter to only listen on localhost.
This is a quite elegant solution that provides encryption, flexible
configuration, and can be used when the scrape target doesn’t have a
public IP address. OpenSSH provides the -R
flag to do reverse port
forwarding, but most people don’t know it also can be used to run a
reverse SOCKS proxy!
For this, I create a separate Unix user on scrape target and server, and assign it a SSH key. Then, the target runs:
ssh -o ServerAliveInterval=15 -o ExitOnForwardFailure=yes -R8083 server.example.com -NT
You should run this using service supervision so it tries to reconnect on network failures.
On the server side, you restrict access to only open a port
using /etc/ssh/authorized_keys/scrape-user
:
restrict,port-forwarding,permitlisten="8083" ssh-ed25519 ....
Then, the server can use port 8083 as a SOCKS proxy to access the network of the scrape target directly! So you can write a scrape config like:
- job_name: 'nano-exporter-hecate'
proxy_url: 'socks5://127.0.0.1:8083'
static_configs:
- targets: ['127.0.0.1:9100']
labels:
instance: 'hecate.home.vuxu.org:9100'
- targets: ['10.0.0.119:9100']
labels:
instance: 'leto.home.vuxu.org:9100'
Here, we use a host in my home network that is always on, and can also
safely scrape other hosts in the same LAN.
(Note that the IP addresses in targets
are resolved relative to the SSH client.)
I used the SSH approach for my notebook as well, but there’s the
problem that we lose data when there’s no Internet connection
available. I have thus moved my notebook to a solution using
vmagent
, which is
included with VictoriaMetrics.
vmagent
does scrape metrics just like VictoriaMetrics (and also
supports all other metrics protocols, but I don’t use them), but it
simply forwards everything via the Prometheus remote write protocol,
and locally buffers data if it can’t forward the metrics currently.
On the server side, we need to provide access to the remote write
protocol. Since VictoriaMetrics operates without internal access
control, we can use the
vmauth
gateway to
implement Basic Auth over TLS. (Again, you can use an existing HTTPS
server and proxy it, but in this case I don’t have a HTTPS server on
the metrics host.)
vmauth
needs some configuration. First, we create a self-signed
certificate (Let’s Encrypt support is limited to the commercial
version of VictoriaMetrics unfortunately):
openssl req -x509 -newkey ed25519 \
-keyout /usr/local/etc/vmauth/key.pem \
-out /usr/local/etc/vmauth/cert.pem \
-sha256 -days 3650 -nodes -subj "/CN=server.example.org" \
-addext "subjectAltName = DNS:server.example.org"
I then run it as:
vmauth -enableTCP6 \
-tls \
-tlsCertFile=/usr/local/etc/vmauth/cert.pem \
-tlsKeyFile=/usr/local/etc/vmauth/key.pem \
-reloadAuthKey=secret \
-flagsAuthKey=secret \
-metricsAuthKey=secret \
-pprofAuthKey=secret \
-auth.config=/usr/local/etc/vmauth/vmauth.yaml
(I think it’s unfortunate that we need to add auth-keys now, as internal and forwarded API are exposed on the same port...)
The vmauth.yaml
configures who can access:
users:
- username: "client"
password: "evenmoresecret"
url_prefix: "http://localhost:8428/"
Here, localhost:8428
is the VictoriaMetrics instance.
Finally, on the scrape target we can now run vmagent
:
vmagent -enableTCP6 \
-promscrape.config=/etc/vmagent/promscrape.yml \
-httpListenAddr=127.0.0.1:8429 \
-remoteWrite.url=https://server.example.org:8427/api/v1/write \
-remoteWrite.label=vmagent=myhostname \
-remoteWrite.retryMinInterval=30s \
-remoteWrite.basicAuth.username=client \
-remoteWrite.basicAuth.passwordFile=/etc/vmagent/passwd \
-remoteWrite.tlsCAFile=/etc/vmagent/cert.pem
The cert.pem
is copied from the server, the password is stored in /etc/vmagent/passwd
.
Note that the vmagent
instance is configured locally, so we can again
scrape targets that are only reachable from it. We also can adjust
the scrape targets without having to touch the metrics server itself.
NP: Godspeed You! Black Emperor—Broken Spires At Dead Kapital
A few months ago I found an article on how to organize your training plan using logic programming, which used Haskell to implement a logic language.
At first, I thought this is a good problem to solve in Prolog, but
there’s a difficulty: Prolog makes it quite hard to specify models
that have multiple possible outcomes (yes, you can work with backtracking, but
it gets tricky when you start to combine multiple predicates or fiddle
around with bagof
).
An alternative to classic Prolog is the concept of Answer Set Programming, based on the Stable Model Semantics by Gelfond and Lifschitz (1988). Here, the idea is that the logical formulas specify models, and the goal of Answer Set Programming is to compute the Answer Sets, i.e. a set of models fulfilling the formulas.
I can’t do a full introduction to Answer Set Programming here, but I can recommend the overview article Answer set programming at a glance by Brewka et. al., as well as the short article What Is Answer Set Programming? and the book Answer Set Programming by Lifschitz.
Essentially, we can think of a stable model as a maximal set of atoms that are true (derivable from the rules) without being in conflict to other propositions. If we don’t use negation, that pretty boring
However, consider this logic program:
q :- not p.
p :- not q.
Contrary to what you may expect, it has two models: {p} and {q}. This is because ASP uses default negation, which means "not A" is assumed to hold unless A is derived. Both cannot be true at the same time, but likewise the empty set is not a model because it can be extended to {p} or {q}.
The program p :- not not p.
has two models, {} and {p}.
Let’s do a small problem first so we get a feel for the tooling. I chose to use clingo, which is the most advanced open source implementation of Answer Set Programming.
If your distribution doesn’t include it, you can run it from Nix:
nix shell nixpkgs#clingo
Let’s model a directed graph with four nodes:
edge(a,b).
edge(a,c).
edge(b,d).
edge(c,d).
Our problem is to find paths from a
to d
. We can define a
relation step
, which either means "we can step from X to d
"
or "we can step from X to Y when there’s an edge from X to Y and
another step from Y":
0 { step(X,E) } 1 :- edge(X,E), E = d.
0 { step(X,Y) } 1 :- edge(X,Y), step(Y,_).
The 0 { ... } 1
decoration means that each step can be taken at most once.
Finally, we need to specify our goal, which in tradition of logic programming, is written as a negation:
:- not step(a,_).
#show step/2.
This means "it is not the case that there’s no step starting from a
".
The #show
instructions limits clingo to only output the binary
step
relation. Let’s run it:
% gringo graph.pl | clasp
clasp version 3.3.10
Reading from stdin
Solving...
Answer: 1
step(b,d) step(a,b)
SATISFIABLE
Models : 1+
Calls : 1
Time : 0.001s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.000s
Clingo has found a solution. We can go from a
to b
and from b
to d
.
We can also ask for all solutions, by passing -n0
:
% gringo graph.pl | clasp -n0
clasp version 3.3.10
Reading from stdin
Solving...
Answer: 1
step(b,d) step(a,b)
Answer: 2
step(c,d) step(b,d) step(a,b)
Answer: 3
step(c,d) step(a,c)
Answer: 4
step(c,d) step(b,d) step(a,c)
Answer: 5
step(c,d) step(b,d) step(a,b) step(a,c)
SATISFIABLE
Models : 5
Calls : 1
Time : 0.000s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.000s
Here we see there are five possible models of this system (but every model except the first and the third has superflous steps).
To see why the 0 { ... } 1
matters, here’s what happens without it:
Answer: 1
step(b,d) step(c,d) step(a,b) step(a,c)
SATISFIABLE
Now there is only one model, but it contains all paths. Cardinality bounds are implemented using negation internally.
Back to the original task: Planning Weekly Workouts in 100 lines of Haskell.
The goal is to create a weekly plan of training exercises according to some specific rules.
First, we need some definitions related to weekdays:
weekday(1..7).
n_weekday(D, DD) :- weekday(D), weekday(DD), (D+1) \ 7 == DD \ 7.
nn_weekday(D, DD) :- weekday(D), weekday(DD), (D+2) \ 7 == DD \ 7.
two_weekday(D, DD) :- n_weekday(D, DD).
two_weekday(D, DD) :- nn_weekday(D, DD).
two_weekday(D, DD) :- two_weekday(DD, D).
This is a bit more complicated because we later need "day after"
and "within two days". (\
means modulo.)
Now, let’s define workout and running exercises:
workout(push; pull; leg; none).
running(long; short; none).
To the plan: each weekday has one workout and one running exercise:
{ plan(D,W,R) : workout(W), running(R) } = 1 :- weekday(D).
We then add the constraints:
% No running on leg day
plan(D, leg, none) :- plan(D, leg, _).
% Short intervals run is after an outdoor pull/push workout
:- plan(_, none, short).
% Workout on Monday outdoors always, not legs
:- plan(1, none, _).
:- plan(1, leg, _).
% Pull day during the week?
:- plan(6..7, pull, _).
% One long run, one short run
{ plan(D,W,long) : weekday(D), workout(W) } = 1.
{ plan(D,W,short) : weekday(D), workout(W) } = 1.
% Two push, two pull, two leg
:- not { plan(D,push,R) } = 2.
:- not { plan(D,pull,R) } = 2.
:- not { plan(D,leg,R) } = 2.
% Long run on weekend
{ plan(6..7,W,long) : workout(W) } = 1.
% Run spaced out at least 2 days
:- plan(D,_,short), plan(DD,_,long), two_weekday(D, DD).
% Space out workouts at least 2 days
:- plan(D,W,_), plan(DD,W,_), W != none, two_weekday(D, DD).
% No leg day before short run
% No leg day before a long run
{ plan(D,W,R) : running(R), R != none, workout(W), plan(DD,leg,_), n_weekday(D, DD) } = 1.
#show plan/3.
clingo generates the same three plans as the Haskell program:
Solving...
Answer: 1
plan(5,leg,none) plan(2,leg,none) plan(1,pull,none) plan(3,push,none) plan(4,pull,short) plan(6,push,none) plan(7,none,long)
Answer: 2
plan(5,leg,none) plan(2,leg,none) plan(1,pull,none) plan(3,push,none) plan(4,pull,short) plan(6,none,none) plan(7,push,long)
Answer: 3
plan(7,leg,none) plan(4,leg,none) plan(1,pull,none) plan(2,push,short) plan(3,none,none) plan(5,pull,none) plan(6,push,long)
SATISFIABLE
The next problem I solved using ASP was a silly exercise from a statistics book (Lehn, Wegmann, Rettig: Aufgabensammlung zur Einführung in die Statistik), translation mine:
Exercise 11c) Ten French married couples bid goodbye to each other: the men with the men by shaking hands, the women with the women by kisses on both cheeks, and women with the men likewise by kisses on both cheeks. How many kisses and how many handshakes take place?
The implicit assumption of the exercise is that all married couples consist of a man and a woman, but it’s more fun to solve it in a generic way. So let’s do that using ASP.
First, we model the people and their fixed binary gender (more assumptions, but else the exercise is truly underspecified):
person(0..19).
gender(man; woman).
{ gender(P, G) : gender(G) } = 1 :- person(P).
I decided to model a couple using an even and odd numbered person:
couple(A, B) :- person(A), person(B), A != B, A/2 == B/2.
Next, we model the handshakes. Two people shake hands if they are not part of the same couple, and both are men:
handshake(A, B) :- person(A), person(B), A < B, not couple(A, B), gender(A, man), gender(B, man).
The A < B
ensures we only count one handshake between two men,
as handshaking is a symmetric act.
We also count the handshakes:
handshakes(N) :- N = #count { handshakes(A, B) : handshake(A, B) }.
Likewise, two kisses happen between two persons not in the same couple where one is a woman. Note that kisses are asymmetric since they go from mouth to cheek (this cost me an hour of debugging...):
kiss(A, B) :- person(A), person(B), not A == B, not couple(A, B), gender(A, woman).
kiss(A, B) :- person(A), person(B), not A == B, not couple(A, B), gender(B, woman).
kisses(N) :- H = #count { kisses(A, B) : kiss(A, B) }, N = H * 2.
Finally, we also count men and women for reporting purposes:
men(N) :- N = #count { g(P) : gender(P, man) }.
women(N) :- N = #count { g(P) : gender(P, woman) }.
#show handshakes/1.
#show kisses/1.
#show men/1.
#show women/1.
Thanks to clingo’s parallelization support, we can compute all possible 220 solutions very quickly:
Solving...
Answer: 1
women(0) men(20) kisses(0) handshakes(180)
Answer: 2
women(1) men(19) kisses(72) handshakes(162)
Answer: 3
women(1) men(19) kisses(72) handshakes(162)
...
Answer: 1048576
women(12) men(8) kisses(616) handshakes(26)
SATISFIABLE
Models : 1048576
Calls : 1
Time : 51.163s (Solving: 51.00s 1st Model: 0.10s Unsat: 0.01s)
CPU Time : 463.811s
Threads : 16 (Winner: 0)
We can also specialize the model to have only hetero couples:
:- Z = 0..9, not gender(2*Z, man).
:- Z = 0..9, not gender(2*Z+1, woman).
Then we get the unique solution:
Solving...
Answer: 1
women(10) men(10) kisses(540) handshakes(45)
SATISFIABLE
I hope this post could interest you in how Answer Set Programming can be used. Some more interesting programs can be found on Hakan Kjellerstrand’s blog.