Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Strykar/GPON

Repository files navigation

GPON

This repo ships:

  • gpon_exporter.py -- a Python Prometheus exporter that scrapes ~75 metrics from an HSGQ / ODI (Realtek RTL960x) GPON SFP over SSH, using the on-device diag and omcicli CLIs.
  • dashboard.json -- a Grafana dashboard that visualises those metrics: optical readings, alarms, ONU state, FEC/BIP/PLOAM/BWMAP counters, and collector self-health.
  • firmware/ -- the four known M110 SFU and M114/V1.1.3 HGU firmware tarballs, mirrored on the Releases page.
  • docs/ -- the SFP spec sheet, user manual, an EPON/GPON activation paper, plus QUIRKS, COVERAGE, TROUBLESHOOTING, and MIGRATION.

If you can't SSH into the SFP, the Lua web-scraper at Anime4000/RTL960x discussion #466 is the alternative.

HSGQ XPON-Stick GPON SFP

Grafana dashboard screenshot

Upgrading from a pre-v1.0.0 install? See docs/MIGRATION.md.

How to use

If you don't already run Prometheus + Grafana, the fastest path is the bundled compose stack:

cp .env.example .env # fill in ONU_SSH_PASSWORD
docker compose up -d --build # --build only needed pre-1.1
open http://localhost:3000 # admin / admin, dashboard already loaded

That brings up exporter + Prometheus + Grafana with the dashboard auto-provisioned. See Docker / Podman for the exporter-only variant when you already have observability infra.

The manual path (no docker), shortest version:

# 1. install runtime deps -- pick ONE of these (see Requirements below)
sudo pacman -S python-paramiko python-prometheus_client # Arch
# OR
sudo apt install python3-paramiko python3-prometheus-client # Debian/Ubuntu
# OR
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt # any distro
# 2. run the exporter against your SFP
export ONU_SSH_PASSWORD='your-password-here'
python3 gpon_exporter.py --device admin@192.168.1.1
# /metrics now serves on http://127.0.0.1:8114/metrics
# 3. point Prometheus at it
# scrape_configs:
# - job_name: gpon_exporter
# static_configs:
# - targets: ['127.0.0.1:8114']
# 4. import dashboard.json into Grafana, pick the appropriate Prometheus datasource.

A scrape_interval between the exporter's --interval (default 5m) and ~30s works fine; matching --interval is cleanest. Anything faster just returns the same value across multiple scrapes -- harmless but wasteful. Prometheus auto-sets the instance label to the target string, so no explicit labels: block is needed unless you want a friendly override.

For long-running deployments use the systemd unit (systemd) or the Docker compose file (Docker / Podman) instead of running by hand. Multi-ONU and Proxmox/LXC are documented further down.

If something doesn't look right, the first thing to run is gpon_exporter.py --diagnose -- see docs/TROUBLESHOOTING.md.

What it exposes

~75 metrics in total -- optical readouts, ONU state, alarms, downstream PHY/PLOAM/BWMAP/OMCI/Ethernet/GEM counters, upstream counters, activation, rogue-SD, optional OMCI extras, and exporter self-health. Full per-metric tables grouped by category live in docs/COVERAGE.md.

All cumulative device counters are real Prometheus Counters with a _total suffix; use rate(metric_total[15m]) in dashboards and alerts.

Running it

Requirements

Two runtime deps: paramiko>=3.0 and prometheus_client>=0.17. Both are in standard distro repos -- prefer them over pip so you don't have to opt out of PEP 668 with --break-system-packages and risk a broken system Python.

sudo pacman -S python-paramiko python-prometheus_client # Arch
sudo apt install python3-paramiko python3-prometheus-client # Debian/Ubuntu/Proxmox LXC

If you need a newer version than your distro ships, or you're on a system without those packages, use a venv (never pip install into the system interpreter on a modern distro):

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
# then run with .venv/bin/python3 instead of plain python3

The repo also ships a pyproject.toml, so pip install . (inside a venv) installs gpon-exporter as a console-script entry point on PATH. The requirements.txt path remains supported for users who prefer running the script directly.

Standalone

python3 gpon_exporter.py \
 --device "admin:$ONU_SSH_PASSWORD@192.168.1.1:22" \
 --webserver-port 8114

The --device flag takes a user:password@host[:port] connection string; port defaults to 22 if omitted. Password may also be left out, in which case ONU_SSH_PASSWORD from the environment is used:

export ONU_SSH_PASSWORD='your-password-here'
python3 gpon_exporter.py --device admin@192.168.1.1

The env-var path is preferable so the password doesn't appear in ps output. Multiple ONUs: pass --device once per ONU. Each can have its own embedded password, or all can fall back to the same env var:

python3 gpon_exporter.py \
 --device admin:p1@10.0.0.1 \
 --device admin:p2@10.0.0.2:2222

The metrics endpoint binds to 127.0.0.1 by default (loopback only). If your Prometheus runs on a different host, pass --bind-address 0.0.0.0. Inside Docker/Podman the compose file already does this for you.

If you bind to 0.0.0.0 on a network reachable from outside your trusted segment, put auth and TLS in front of /metrics -- the exporter doesn't do either. Caddy or nginx as a reverse proxy with basic auth is the lowest-effort path; oauth2-proxy or prometheus-with-scrape-tls suit larger setups. The metrics themselves contain device-fingerprinting information (firmware version, MAC, serial, optical levels) that's worth not publishing.

systemd

The repo's odi.service is the canonical example: hardened (DynamicUser=yes, StateDirectory=, NoNewPrivileges, capability strip, address-family/namespace restrictions), env-file based password, no After=sshd.service (the collector is an SSH client, not a server), sensible RestartSec/StartLimit* paired with the daemon's exit-on-auth-failure behaviour.

Drop the unit in /etc/systemd/system/, edit the SFP address and webserver port in ExecStart=, then create the credentials file at /etc/gpon-exporter/credentials with exactly:

ONU_SSH_PASSWORD=your-password-here
sudo install -m 0600 -o root -g root /dev/null /etc/gpon-exporter/credentials
sudo "$EDITOR" /etc/gpon-exporter/credentials
sudo systemctl daemon-reload
sudo systemctl enable --now odi

The 0600 + root-owned permissions matter -- systemd does not enforce them, and a 0644 file with ONU_SSH_PASSWORD=... is a foot-gun on a multi-user host. Even with DynamicUser=yes, the credentials file is read by systemd itself before privileges drop, so the dynamic UID never reads it directly. The known-hosts file is auto-created at /var/lib/gpon-exporter/known_hosts (managed by StateDirectory=, owned by the dynamic UID, persisted across restarts and reboots).

Docker / Podman

Two compose files ship with the repo:

File Brings up Pick this if
docker-compose.yml Exporter + Prometheus + Grafana (dashboard auto-provisioned) You don't already run observability infra. Open http://localhost:3000 after up.
docker-compose.exporter-only.yml Just the exporter You already have Prometheus + Grafana and want to wire this exporter into them.

Both pull ghcr.io/strykar/gpon-exporter:latest and fall back to building from local source if the image isn't present. Locally tested with podman 5.8.2 + podman-compose; the docker path is the same syntax.

Full stack (recommended for first-time users):

cp .env.example .env # fill in ONU_SSH_PASSWORD (and optionally GRAFANA_ADMIN_PASSWORD)
docker compose up -d # docker engine + the compose plugin
# or: podman compose up -d
open http://localhost:3000 # admin / admin on first login (Grafana forces a change)

The Grafana dashboard appears in the dashboards list on first login, already pointed at the bundled Prometheus. Both UIs bind to 127.0.0.1 only -- edit the ports: lines in docker-compose.yml if you want LAN access.

Exporter-only (existing Prometheus / Grafana):

cp .env.example .env # fill in ONU_SSH_PASSWORD
docker compose -f docker-compose.exporter-only.yml up -d
# /metrics serves on http://127.0.0.1:8114/metrics
# Add it to your existing prometheus.yml, then import dashboard.json into Grafana.

The container's HEALTHCHECK (pulls /metrics, marks unhealthy on failure) only takes effect when the image is built with the docker manifest format. Docker uses that format by default; podman defaults to OCI and silently ignores HEALTHCHECK. To honour it under podman, build with podman build --format docker -t gpon-exporter:latest . first, then bring the stack up. Without that flag the exporter still runs fine, you just don't get healthy/unhealthy status in podman ps.

Both compose files are single-device. Multiple SFPs need either docker compose -p (or podman compose -p) per device with separate .env files, or a manual multi-service rewrite.

Podman + systemd: skip podman compose and convert the stack to Quadlets with podlet -- it's the Podman-team-blessed converter:

mkdir -p ~/.config/containers/systemd
podlet --file ~/.config/containers/systemd compose --install docker-compose.yml
systemctl --user daemon-reload
systemctl --user start gpon-exporter prometheus grafana

Use /etc/containers/systemd/ and the system-wide systemctl for a root install. Re-run podlet after any change to the compose file -- the conversion is mechanical, so don't hand-edit the generated units.

Pre-1.1 note: ghcr.io/strykar/gpon-exporter is published on tag. Until v1.1.0 ships, run docker compose up -d --build (or docker compose build first) to build from local source. After 1.1 publishes, plain docker compose up -d pulls the image directly.

Proxmox (LXC)

Create a Debian 12 unprivileged LXC and enter it as root with pct enter <vmid> from the Proxmox host. Then:

$ apt install -y git python3-paramiko python3-prometheus-client
$ git clone https://github.com/Strykar/GPON.git /opt/GPON
$ cp /opt/GPON/odi.service /etc/systemd/system/
# edit ExecStart in /etc/systemd/system/odi.service to point at
# /opt/GPON/gpon_exporter.py and your SFP's IP
$ install -m 0600 -o root -g root /dev/null /etc/gpon-exporter/credentials
$ "$EDITOR" /etc/gpon-exporter/credentials # ONU_SSH_PASSWORD=...
$ systemctl daemon-reload
$ systemctl enable --now odi

If the LXC's subnet differs from the ONU's, see the next section for the gateway-side src-NAT rule.

Reaching the ONU from another subnet

The ONU is fixed at 192.168.1.1/24 with no return route off that subnet. On your gateway, claim an unused IP in 192.168.1.0/24 on the SFP-facing port and src-NAT traffic to 192.168.1.1 to that IP.

# Linux / nftables
sudo ip addr add 192.168.1.10/24 dev eth1
sudo nft add rule ip nat postrouting ip daddr 192.168.1.1 snat to 192.168.1.10
# MikroTik
/ip/address/add address=192.168.1.10/24 interface=<sfp-interface>
/ip/firewall/nat/add chain=srcnat action=src-nat dst-address=192.168.1.1 to-addresses=192.168.1.10
# pf (FreeBSD / OPNsense / pfSense)
nat on $sfp_if from any to 192.168.1.1 -> (sfp_if)

CLI flags

Flag Default What it does
--device user:password@host[:port] required ONU connection string. Repeatable for multi-ONU. Port defaults to 22 if omitted. Password may be omitted to fall back on the ONU_SSH_PASSWORD env var.
--webserver-port N 8114 Port for the Prometheus metrics endpoint.
--bind-address ADDR 127.0.0.1 Loopback by default. Use 0.0.0.0 for Docker or to expose to the network.
--interval SEC 300 Seconds between fetches.
--enable-omci off Probe omcicli for uptime, LOID, serial. See caveat below.
--once off Run a single fetch and exit. Exit code = number of devices whose fetch failed (0 = all good, useful for cron).
--diagnose off Print a verbose probe report and exit. First thing to run when filing a bug.
--log-level LEVEL INFO DEBUG, INFO, WARNING, ERROR.
--known-hosts PATH ~/.config/gpon-exporter/known_hosts File for persisting SSH host-key fingerprints. WARNING is logged on first contact (and the fingerprint persisted) and on every subsequent connect that presents a different key (durably, until the file is hand-edited). The connection still proceeds either way -- the policy is descriptive, not enforcing. For real key-change enforcement, swap _LoggingHostKeyPolicy for paramiko.RejectPolicy() in the source. Pass '' to disable persistence; first-contact and key-change warnings still log.

--enable-omci caveat

The omcicli probes deliver three useful metrics (gpon_pon_uptime_seconds, the LOID auth state, and the SFP serial number), but they're off by default. On at least firmware V1.0-220923, running any omcicli command immediately after a diag command wedges the on-device omci_app daemon for several minutes; recovery may need a power-cycle. The collector sequences omcicli before diag to dodge this, but the wedge behaviour is firmware-dependent. If you turn it on and the log shows channel closed mid-fetch, switch it off again.

Grafana dashboard

dashboard.json is organised into six rows (Attenuation, ONU Status, Laser/CPU/RAM/Voltage/Temperature, Temperature & GPON Signal Metrics, GPON counters, Collector & Device Info). Per-row and per-panel documentation, plus the rationale for each PromQL query shape, lives in docs/COVERAGE.md.

The dashboard's $instance and $ip template variables filter by scrape target and SFP IP for multi-device setups. Default time range is now-15m.

Troubleshooting

For the typical "something looks wrong" workflow -- --diagnose first, the symptom-to-cause table, recommended alerts, and recovering from the omci_app wedge -- see docs/TROUBLESHOOTING.md. Firmware-level quirks worth knowing live in docs/QUIRKS.md.

Development

Tests

Parser unit tests and a fetch-pipeline integration test live in tests/:

pip install pytest
pytest tests/ -v

The tests use canned diag output captured from a real device, so they run without network access.

Linting

pip install pylint
pylint gpon_exporter.py

The repository ships a .pylintrc with max-line-length=120 and the usual docstring lints disabled.

CI

.github/workflows/ci.yml runs pylint and pytest on every push and pull request, then performs a multi-arch (amd64 + arm64) Docker build smoke test. No image is pushed from this workflow; that's deliberate.

.github/workflows/release.yml fires only on v*.*.* tag push. It builds the multi-arch image, pushes to ghcr.io/strykar/gpon-exporter with :VERSION, :MAJOR.MINOR, and :latest tags, and attaches SLSA build provenance + SBOM. Manual reruns possible via workflow_dispatch without recutting the tag.

Contributing

Bug reports, parser fixes for new firmware, dashboard improvements all welcome. See CONTRIBUTING.md for what info to include in an issue, the test/lint expectations, and what kinds of changes I'll likely push back on. Project conduct is in CODE_OF_CONDUCT.md -- short version: be civil, keep technical discussion technical.

Firmware downloads

Firmware tarballs and the spec/manual PDFs are on the Releases page:

File Variant Date Status
M110_sfp_ODI_220923_SFU.tar M110 SFU V1.0-220923 2022年09月23日 Verified end-to-end
M110_sfp_HSGQ_SFU_240408.tar M110 SFU V1.1.8-240408 2024年04月08日 Broken WAN on Airtel BNG (ALCL OLT) -- see QUIRKS
M114_sfp_ODI_231021_HGU.tar M114 HGU V1.7.1-231021 2023年10月21日 Untested
V1.1.3_sfp_HSGQ_HGU_250620.tar HGU V1.1.4-250620 2025年06月20日 Untested
*.pdf Spec sheets, user manual, ONU activation paper n/a --

About

DDM-grade telemetry from HSGQ/ODI GPON SFP ONUs over SSH - Prometheus exporter + Grafana dashboard

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

Contributors

AltStyle によって変換されたページ (->オリジナル) /