Key configuration choices:
After=docker.service: Don’t start until Docker is ready
Restart=always: Auto-restart on failures
TimeoutStopSec=5min: Give builds time to clean up
User=poddingue: Never run as root (security)
After this change, the runner survived reboots, network hiccups, and even Docker daemon restarts.
Production Workflows: The Real Test
With the runner configured, I created three automated workflows:
Weekly Docker Engine Builds
name: Weekly Docker RISC-V64 Build
on:
  schedule:
    - cron: '0 2 * * 0'  # Every Sunday at 02:00 UTC
  workflow_dispatch:
    inputs:
      moby_ref:
        description: 'Moby ref to build'
        required: false
        default: 'master'
jobs:
  build-docker:
    runs-on: [self-hosted, riscv64]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: true
      - name: Build Docker binaries
        run: |
          cd moby
          docker build \
            --build-arg BASE_DEBIAN_DISTRO=trixie \
            --build-arg GO_VERSION=1.25.3 \
            --target=binary \
            -f Dockerfile .
          # ... containerd, runc builds ...
      - name: Create release
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh release create "${RELEASE_VERSION}" \
            --title "${RELEASE_TITLE}" \
            --notes-file release-notes.md \
            release-$DATE/*
This workflow runs every Sunday morning, building:
dockerd: Docker Engine daemon
docker-proxy: Network proxy
containerd: Container runtime (v1.7.28)
runc: OCI runtime (v1.3.0)
containerd-shim-runc-v2: Containerd shim
Build time: 35-40 minutes on the BananaPi F3. Not blazing fast, but acceptable for weekly automation.
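With five separate binaries coming out of one long build, a quick existence check before the release step can catch a silently skipped component. A minimal sketch, assuming (hypothetically) that all binaries end up flat in a single directory such as `release-$DATE`; the helper name `check_release_binaries` is illustrative and not part of the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any expected binary is missing or not executable.
# Assumes all five binaries were copied flat into a single directory.
check_release_binaries() {
  local dir=$1 bin missing=0
  for bin in dockerd docker-proxy containerd runc containerd-shim-runc-v2; do
    if [ -x "$dir/$bin" ]; then
      echo "✅ $bin"
    else
      echo "❌ $bin missing or not executable" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, just before "Create release":
# check_release_binaries "release-$DATE" || exit 1
```

Failing here, rather than after `gh release create`, keeps an incomplete build from ever becoming a public release.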
Release Tracking
name: Track Moby Releases
on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC
  workflow_dispatch:
jobs:
  check-releases:
    runs-on: ubuntu-latest  # No native hardware needed!
    steps:
      - name: Get latest Moby release
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          LATEST=$(gh api repos/moby/moby/releases/latest --jq .tag_name)
          # Check if we've already built it...
Notice this workflow uses ubuntu-latest, not the self-hosted runner. Why? It only makes GitHub API calls; nothing needs to be compiled. Running it on a GitHub-hosted runner reduces load on my BananaPi and executes faster.
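The step above stops at "Check if we've already built it...". One possible way to do that check, sketched here rather than taken from the actual workflow: compare the upstream tag against existing release tags, assuming the `v<version>-riscv64` naming used for Engine releases and an upstream tag of the form `v28.5.2`. `needs_build` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if no "<upstream tag>-riscv64" release exists yet.
# Existing release tags are read from stdin, one per line.
needs_build() {
  local upstream_tag=$1
  # -x: whole-line match, -F: no regex. Output goes to /dev/null instead of
  # using -q, so grep drains stdin and the upstream command never gets SIGPIPE.
  if grep -xF "${upstream_tag}-riscv64" >/dev/null; then
    return 1   # already built
  fi
  return 0     # new upstream release, needs a build
}

# Usage inside the step (requires gh and GH_TOKEN):
# LATEST=$(gh api repos/moby/moby/releases/latest --jq .tag_name)
# if gh release list --limit 50 --json tagName --jq '.[].tagName' | needs_build "$LATEST"; then
#   echo "New Moby release $LATEST needs a riscv64 build"
# fi
```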
APT Repository Updates
The final piece of automation: after building binaries, automatically create Debian packages and update the APT repository hosted on GitHub Pages.
name: Update APT Repository
on:
  workflow_run:
    workflows: ["BuildDebianPackage", "BuildDockerComposeDebianPackage", "BuildDockerCLIDebianPackage"]
    types: [completed]
This workflow downloads all packages (Engine, CLI, Compose), signs them with GPG, and updates the repository using reprepro. The result: users can install Docker with a simple apt-get install.
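reprepro reads the repository definition, including which GPG key to sign with, from `conf/distributions`. A minimal sketch of generating that file for this layout, assuming the `trixie` codename, `main` component, and `riscv64` architecture that appear in the verification step further down; the `Origin`, `Label`, `Description` values and the key ID are placeholders, and `write_reprepro_conf` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal reprepro conf/distributions file.
# Codename/Architectures/Components match the trixie|main|riscv64 layout;
# Origin, Label, Description and the key ID are placeholders.
write_reprepro_conf() {
  local basedir=$1 key_id=$2
  mkdir -p "$basedir/conf"
  cat > "$basedir/conf/distributions" <<EOF
Origin: docker-for-riscv64
Label: docker-for-riscv64
Codename: trixie
Architectures: riscv64
Components: main
Description: Docker packages for riscv64 (placeholder description)
SignWith: ${key_id}
EOF
}

# Usage (GPG_KEY_ID is a hypothetical secret/env var):
# write_reprepro_conf . "$GPG_KEY_ID"
# reprepro -b . includedeb trixie ./*.deb
```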
Weeks Later: Production Issues Emerge
After three weeks of smooth operation, I started noticing strange behavior. Users reported that apt-get upgrade sometimes worked, sometimes didn’t. The APT repository seemed to randomly "forget" packages. And when I checked my latest release, I found duplicate RPM files - why did v28.5.2-riscv64 contain both moby-engine-28.5.1 AND moby-engine-28.5.2?
Time to debug.
Bug #1: The Vanishing Packages Mystery
Symptom: APT repository would update successfully, but only one package type would be present. Install docker-cli and suddenly docker.io disappeared.
Investigation: I examined the workflow:
# In update-apt-repo.yml
gh release download "$RELEASE_TAG" -p 'docker.io_*.deb'
reprepro -b . includedeb trixie docker.io_*.deb
Ah. The workflow only downloaded packages from the triggering release. If the Docker CLI build triggered the workflow, it only downloaded docker-cli_*.deb. Previous packages (docker.io, containerd, runc) were ignored.
Root cause: Each package type has its own release tag:
Engine releases: v28.5.2-riscv64
CLI releases: cli-v28.5.2-riscv64
Compose releases: compose-v2.40.1-riscv64
When the APT workflow ran, it would:
Download packages from the triggering release
Rebuild the repository from scratch
Upload the result, which contained only the newly downloaded packages
Solution: Download ALL packages on every run.
- name: Download all latest .deb packages
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Find latest Engine release
    DOCKER_RELEASE=$(gh release list --repo gounthar/docker-for-riscv64 \
      --limit 50 --json tagName | \
      jq -r '.[] | select(.tagName | test("^v[0-9]+\\.[0-9]+\\.[0-9]+-riscv64$")) | .tagName' | \
      head -1)
    # Find latest CLI release
    CLI_RELEASE=$(gh release list --repo gounthar/docker-for-riscv64 \
      --limit 50 --json tagName | \
      jq -r '.[] | select(.tagName | test("^cli-v[0-9]+\\.[0-9]+\\.[0-9]+-riscv64$")) | .tagName' | \
      head -1)
    # Find latest Compose release
    COMPOSE_RELEASE=$(gh release list --repo gounthar/docker-for-riscv64 \
      --limit 50 --json tagName | \
      jq -r '.[] | select(.tagName | test("^compose-v[0-9]+\\.[0-9]+\\.[0-9]+-riscv64$")) | .tagName' | \
      head -1)
    # Download from each
    gh release download "$DOCKER_RELEASE" -p 'docker.io_*.deb' --clobber
    gh release download "$DOCKER_RELEASE" -p 'containerd_*.deb' --clobber
    gh release download "$DOCKER_RELEASE" -p 'runc_*.deb' --clobber
    gh release download "$CLI_RELEASE" -p 'docker-cli_*.deb' --clobber
    gh release download "$COMPOSE_RELEASE" -p 'docker-compose-plugin_*.deb' --clobber
Now the repository always contains all packages, regardless of which build triggered the update.
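The three lookups differ only in their tag prefix, so they could share one helper and one API call. A sketch of that refactor, with two stated swaps: it uses `grep -E` instead of jq's `test()`, and it reads plain tag names on stdin. Like the original `head -1`, it assumes the list arrives newest first. `latest_tag` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first tag on stdin matching
# "<prefix>vX.Y.Z-riscv64". Relies on newest-first input, like `head -1`.
latest_tag() {
  local prefix=$1
  grep -E "^${prefix}v[0-9]+\.[0-9]+\.[0-9]+-riscv64$" | head -n 1
}

# Usage inside the step (one API call instead of three):
# TAGS=$(gh release list --repo gounthar/docker-for-riscv64 --limit 50 \
#   --json tagName --jq '.[].tagName')
# DOCKER_RELEASE=$(printf '%s\n' "$TAGS" | latest_tag "")
# CLI_RELEASE=$(printf '%s\n' "$TAGS" | latest_tag "cli-")
# COMPOSE_RELEASE=$(printf '%s\n' "$TAGS" | latest_tag "compose-")
```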
Verification step added:
- name: Verify all packages present
  run: |
    EXPECTED_PACKAGES=(
      "containerd"
      "docker-cli"
      "docker.io"
      "runc"
    )
    MISSING_PACKAGES=()
    for pkg in "${EXPECTED_PACKAGES[@]}"; do
      if reprepro -b . list trixie | grep -q "^trixie|main|riscv64: $pkg "; then
        echo "✅ $pkg found"
      else
        echo "❌ $pkg MISSING"
        MISSING_PACKAGES+=("$pkg")
      fi
    done
    if [ ${#MISSING_PACKAGES[@]} -gt 0 ]; then
      echo "⚠️ ${#MISSING_PACKAGES[@]} package(s) missing!"
      exit 1
    fi
This catches regressions immediately.
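To test the check itself without a real repository, the loop can be pulled into a function that reads `reprepro list` output on stdin. This sketch also switches to `grep -F`, so the `.` in `docker.io` is matched literally instead of as a regex wildcard. It assumes the `trixie|main|riscv64: <name> <version>` line format that the original pattern expects; `check_packages` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical refactor of the verification loop: reads `reprepro list`
# output on stdin and checks each package name given as an argument.
check_packages() {
  local listing pkg missing=0
  listing=$(cat)
  for pkg in "$@"; do
    # -F: literal match, so "docker.io" cannot match "dockerXio"
    if printf '%s\n' "$listing" | grep -F "trixie|main|riscv64: $pkg " >/dev/null; then
      echo "✅ $pkg found"
    else
      echo "❌ $pkg MISSING"
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -gt 0 ]; then
    echo "⚠️ $missing package(s) missing!"
    return 1
  fi
}

# Usage: reprepro -b . list trixie | check_packages containerd docker-cli docker.io runc
```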
Bug #2: The jq Syntax Catastrophe
After fixing the package downloading, I ran into a new error:
Error: jq parse error: Invalid escape at line 1, column 45
Investigation: I had recently "fixed" a line length issue by adding a backslash:
# BAD: the backslash sits inside the single-quoted jq program
CLI_RELEASE=$(gh release list --repo gounthar/docker-for-riscv64 \
  --limit 50 --json tagName | \
  jq -r '.[] | select(.tagName | test("^cli-v[0-9]+\\.[0-9]+\\.[0-9]+-riscv64$")) | \
  .tagName' | \
  head -1)
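The backslash was inside the single-quoted jq expression. Bash does not interpret anything between single quotes, so the backslash never acted as a shell line continuation: it was passed to jq verbatim, and jq tried to interpret it as an escape sequence.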
Solution: Move the backslash outside the jq expression:
# GOOD: every line-continuation backslash is outside the quotes
CLI_RELEASE=$(gh release list --repo gounthar/docker-for-riscv64 \
  --limit 50 --json tagName | \
  jq -r '.[] | select(.tagName | test("^cli-v[0-9]+\\.[0-9]+\\.[0-9]+-riscv64$")) | .tagName' | \
  head -1)
Lesson learned: a shell line-continuation backslash only works outside quotes. When piping to jq, keep the quoted jq expression free of continuation backslashes, even if you split the surrounding bash command with them.
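If a filter gets too long for one line, jq offers a cleaner way out than backslashes: it treats newlines inside its program as ordinary whitespace, so the quoted expression itself can span several lines. A sketch of the CLI lookup written that way, wrapped in a hypothetical `latest_cli_release` function that reads the `gh release list --json tagName` array on stdin (jq must be installed).

```shell
#!/usr/bin/env bash
# Sketch: a multi-line jq program inside one pair of single quotes.
# No backslashes are needed between the pipe stages; jq ignores the newlines.
latest_cli_release() {
  jq -r '.[]
    | select(.tagName | test("^cli-v[0-9]+\\.[0-9]+\\.[0-9]+-riscv64$"))
    | .tagName' | head -n 1
}

# Usage (requires gh):
# gh release list --repo gounthar/docker-for-riscv64 --limit 50 --json tagName \
#   | latest_cli_release
```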
Bug #3: The Persistent RPM Problem
The most subtle bug involved RPM packaging. Users reported that downloading moby-engine-28.5.2-1.riscv64.rpm sometimes gave them the old version (28.5.1).
Investigation: I checked the release assets:
$ gh release view v28.5.2-riscv64
...
moby-engine-28.5.1-1.riscv64.rpm 25MB
moby-engine-28.5.2-1.riscv64.rpm 25MB
containerd-1.7.28-1.riscv64.rpm 30MB
runc-1.3.0-1.riscv64.rpm 8MB
Two versions of moby-engine! But why?
The RPM build workflow runs on the self-hosted runner. Unlike GitHub’s ephemeral runners, my BananaPi has persistent state. The ~/rpmbuild/RPMS/riscv64/ directory survives between builds.
Timeline:
Build v28.5.1 → creates moby-engine-28.5.1-1.riscv64.rpm
Upload all files in ~/rpmbuild/RPMS/riscv64/
Two weeks later: build v28.5.2 → creates moby-engine-28.5.2-1.riscv64.rpm
Upload all files in ~/rpmbuild/RPMS/riscv64/ → uploads BOTH versions!
Solution: Clean the build directory before building.
Added to all RPM workflows:
- name: Clean previous RPM builds
  if: steps.release.outputs.has-new-release == 'true'
  run: |
    # Remove any existing RPM files to prevent uploading old versions
    rm -f ~/rpmbuild/RPMS/riscv64/moby-engine-*.rpm
    rm -f ~/rpmbuild/RPMS/riscv64/containerd-*.rpm
    rm -f ~/rpmbuild/RPMS/riscv64/runc-*.rpm
    echo "Cleaned previous Engine RPM files"
This is specific to self-hosted runners. On GitHub’s ephemeral runners, each build starts with a clean filesystem. On self-hosted runners, you are responsible for cleanup.
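Cleanup removes stale files, but a guard just before the upload turns any future leftover into a failed build instead of a bad release. A sketch, assuming standard `name-version-release.arch.rpm` file names; `check_single_version` is a hypothetical helper, not part of the original workflows.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if a directory holds more than one RPM for the
# same package name. Assumes name-version-release.arch.rpm file names.
check_single_version() {
  local dir=$1 f base dupes
  dupes=$(
    for f in "$dir"/*.rpm; do
      [ -e "$f" ] || continue   # no RPMs at all: glob stays unexpanded
      base=${f##*/}             # moby-engine-28.5.2-1.riscv64.rpm
      base=${base%.rpm}         # moby-engine-28.5.2-1.riscv64
      base=${base%.*}           # drop arch:    moby-engine-28.5.2-1
      base=${base%-*}           # drop release: moby-engine-28.5.2
      echo "${base%-*}"         # drop version: moby-engine
    done | sort | uniq -d
  )
  if [ -n "$dupes" ]; then
    echo "❌ multiple versions found for: $dupes" >&2
    return 1
  fi
  echo "✅ one version per package in $dir"
}

# Usage, right before uploading:
# check_single_version ~/rpmbuild/RPMS/riscv64 || exit 1
```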
Manual cleanup: I also had to manually remove the duplicate files from the existing releases:
# List all assets
gh release view v28.5.2-riscv64 --json assets --jq '.assets[].name'
# Delete the old versions
gh release delete-asset v28.5.2-riscv64 moby-engine-28.5.1-1.riscv64.rpm
gh release delete-asset v28.5.2-riscv64 docker-cli-28.5.1-1.riscv64.rpm
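When several releases are affected, picking stale assets by hand gets tedious. A sketch of a filter that, given the release's Docker version, prints the `moby-engine` and `docker-cli` RPM assets carrying a different version. The package list and the `name-version-release.arch.rpm` naming are assumptions based on the files above; `stale_assets` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read asset names on stdin and print the moby-engine /
# docker-cli RPMs whose version differs from the one given as $1.
stale_assets() {
  local want=$1 name rest version
  while IFS= read -r name; do
    case $name in
      moby-engine-*.rpm) rest=${name#moby-engine-} ;;
      docker-cli-*.rpm) rest=${name#docker-cli-} ;;
      *) continue ;;            # other packages have their own versions
    esac
    version=${rest%%-*}         # "28.5.1-1.riscv64.rpm" -> "28.5.1"
    [ "$version" = "$want" ] || echo "$name"
  done
}

# Usage: review the output first, then pipe it to the deletion command.
# TAG=v28.5.2-riscv64
# gh release view "$TAG" --json assets --jq '.assets[].name' | stale_assets 28.5.2
# ... | xargs -n 1 gh release delete-asset "$TAG" --yes
```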
Performance Characteristics
After weeks of production use, here are the real-world performance numbers:
Build Times (BananaPi F3)
| Component | Time | Notes |
|---|---|---|
| Test workflow | ~5 seconds | Simple architecture checks |
| Docker Engine (complete) | 35-40 minutes | Includes dockerd, containerd, runc |
| Docker CLI | 12-15 minutes | Lighter build, fewer dependencies |
| Docker Compose | 8-10 minutes | Pure Go, fast compilation |
| Tini | 2-3 minutes | Small C project |
Resource Usage
During a full Docker build:
CPU: 8 cores at 80-95% utilization
RAM: Peak 3.5GB (out of 16GB total)
Disk I/O: ~200MB/s read during image builds
Network: ~50Mbps for downloading Go modules
The BananaPi F3 handles these builds comfortably. It’s not fast by modern standards, but it’s reliable.
Reliability Metrics
Since implementing the systemd service (3 weeks ago):
Uptime: 99.2% (only offline during power outages)
Successful builds: 47 out of 48 (one failure due to disk space)
Average weekly builds: 4-5 (one scheduled, others manual/tracking)
Lessons Learned
Self-Hosted Runners Are Different
The biggest mental shift: self-hosted runners have state. Every assumption you have from using GitHub’s ephemeral runners needs to be re-examined:
Cleanup is your responsibility: Files persist between runs
Dependencies don’t auto-update: You manage Node.js, Go, Docker versions
Disk space accumulates: Docker images, build caches, logs
Reboots happen: Systemd services are mandatory
Architecture-Specific Challenges
Some issues are unique to RISC-V64:
Limited pre-built images: Many Docker images don’t have riscv64 variants
Longer build times: 8 RISC-V cores at lower clock speeds vs 8+ x86_64 cores at 3+GHz
Beta software: Some tools (like the runner itself) are community projects
Documentation gaps: Fewer people have solved these problems before
But none of these are dealbreakers. They just require more attention.
Automation Complexity
The more automated your pipeline, the more places for subtle bugs:
Multi-package repositories: Need careful orchestration to avoid race conditions
Concurrent workflows: APT repository updates can conflict if two packages build simultaneously
Release tag conventions: Different prefixes (v*, cli-v*, compose-v*) require regex matching
Error handling: Silent failures are worse than loud failures
I added retry logic to the APT repository update:
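# Push with retry logic for concurrent workflow handling
MAX_RETRIES=5
RETRY_COUNT=0
while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
  if git push origin apt-repo; then
    echo "✅ Successfully pushed changes"
    break
  else
    RETRY_COUNT=$((RETRY_COUNT + 1))
    # Fetch, rebase, retry...
  fi
done
# Fail loudly if every attempt was rejected
if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
  echo "❌ Push failed after $MAX_RETRIES attempts"
  exit 1
fi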
This handles the case where two packages finish building within seconds of each other.
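The same pattern can be written as a generic retry helper plus a push function that first rebases onto whatever a concurrent run pushed. This is a sketch: `retry_with_backoff` and `push_apt_repo` are hypothetical names, the doubling delay is an assumption rather than something from the original, and whether a plain `git pull --rebase` succeeds depends on how the regenerated repository files conflict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds after
# the first failure and doubling the delay after each further failure.
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "❌ giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Hypothetical push step: integrate a concurrent run's commits, then push.
push_apt_repo() {
  git pull --rebase origin apt-repo && git push origin apt-repo
}

# Usage in the workflow step:
# retry_with_backoff 5 2 push_apt_repo && echo "✅ Successfully pushed changes"
```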
Testing Is Critical
After the "vanishing packages" bug, I added comprehensive verification:
Package presence checks: Verify all expected packages exist
Version checks: Ensure new versions actually update
Installation tests: Actually install and run the packages
Regression tests: Keep test cases for past bugs
The verification step catches 90% of issues before users see them.
Recommendations for Others
If you’re setting up RISC-V64 CI/CD, here’s what I’d recommend:
Hardware Choices
Minimum viable:
4 cores (RISC-V64)
4GB RAM
64GB storage
Gigabit Ethernet
Ideal (BananaPi F3 with 16GB):
The BananaPi F3 with 8 cores and 16GB RAM exceeds minimum requirements and provides comfortable headroom for concurrent builds. The 8-core processor handles compilation efficiently without becoming a bottleneck.
Software Stack
Required:
Recommended:
Systemd for runner management
Monitoring (Prometheus, Grafana)
Log aggregation (journalctl is enough to start)
Backup strategy for runner state
Workflow Design Principles
Use ubuntu-latest for non-compilation tasks: Save your RISC-V64 runner for actual builds
Add cleanup steps: Especially for RPM/DEB builds
Implement verification: Check that automation actually worked
Handle concurrency: Retry logic for repository updates
Document everything: Future you will thank you
Monitoring and Alerts
I monitor:
Runner online/offline status (GitHub API)
Disk space (alert at 80% full)
Build success rate (alert on 2+ failures)
Docker daemon health
Simple monitoring catches problems early.
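For the disk-space item, a small check run from cron or a systemd timer is enough to start. A sketch using POSIX `df -P`; the 80% threshold comes from the list above, while alert delivery (here just a message and a non-zero exit status) is left open. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the usage percentage (without the % sign) of the filesystem holding $1.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return non-zero once usage reaches the threshold (default 80%).
check_disk() {
  local path=$1 threshold=${2:-80} used
  used=$(disk_usage_percent "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "⚠️ $path is ${used}% full (threshold ${threshold}%)" >&2
    return 1
  fi
  echo "✅ $path is ${used}% full"
}

# Usage: check_disk "$HOME" 80 || echo "time for docker system prune"
```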
Current Status and Future Plans
Today, the build infrastructure is solid:
3 automated workflows: Docker Engine, CLI, Compose
2 package formats: DEB (APT) and RPM
3 weeks uptime: 99%+ reliability
Zero manual intervention: Fully automated builds
But there’s more to do:
Short Term
Gentoo overlay generation: Create ebuilds automatically
Binary verification: Add checksums and signatures to releases
Build caching: Reduce builds from 40 minutes to 20 minutes
Long Term
Multi-architecture support: Add ARM64, x86_64 for comparison
Runner auto-update: Detect new github-act-runner versions
High availability: Second runner for redundancy
Performance profiling: Identify bottlenecks in build process
Conclusion
Setting up CI/CD for RISC-V64 is more complex than mainstream architectures, but it’s absolutely achievable. The key insights:
Use the right tools: github-act-runner works where the official runner fails
Embrace self-hosted: Persistent state requires different thinking
Test thoroughly: Automation bugs hide in production for weeks
Monitor everything: Catch problems before users do
The RISC-V64 ecosystem is maturing rapidly. A year ago, this setup would have been significantly harder. Today, it’s straightforward if you know the gotchas.
Most importantly: after three weeks of production use, with 47 successful builds serving real users upgrading their Docker installations, I can confidently say that RISC-V64 is ready for production CI/CD. Not "experimental." Not "beta." Actually ready.
Now go build something.
Appendix: Complete Systemd Service File
[Unit]
Description=GitHub Actions Runner (RISC-V64)
After=network.target docker.service
Wants=network.target
[Service]
Type=simple
User=poddingue
WorkingDirectory=/home/poddingue/github-act-runner-test
ExecStart=/home/poddingue/github-act-runner-test/github-act-runner run
Restart=always
RestartSec=10
KillMode=process
KillSignal=SIGTERM
TimeoutStopSec=5min
# Environment variables (optional)
# Environment="RUNNER_WORKDIR=/home/poddingue/github-act-runner-test/_work"
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=github-runner
[Install]
WantedBy=multi-user.target
Appendix: Useful Maintenance Commands
# Check runner status
systemctl status github-runner
# View runner logs (real-time)
sudo journalctl -u github-runner -f
# View recent logs (last 100 lines)
sudo journalctl -u github-runner -n 100
# Restart runner
sudo systemctl restart github-runner
# Update runner
cd ~/github-act-runner-test
git pull
go build -v -o github-act-runner .
sudo systemctl restart github-runner
# Check disk space
df -h ~
docker system df
# Clean Docker
docker system prune -a -f
# Check GitHub runner status via API
gh api repos/gounthar/docker-for-riscv64/actions/runners --jq '.runners[] | {name, status, busy}'
# List recent workflow runs
gh run list --limit 10
# View specific workflow run
gh run view RUN_ID
# Manually trigger workflow
gh workflow run docker-weekly-build.yml
Appendix: Troubleshooting Common Issues
Runner Shows Offline
Check service:
systemctl status github-runner
Check logs:
sudo journalctl -u github-runner -n 50
Common causes:
Network connectivity lost
Docker daemon not running
Authentication token expired (after 90 days)
Disk full
Solution:
# Restart
sudo systemctl restart github-runner
# If token expired, reconfigure
cd ~/github-act-runner-test
./github-act-runner remove
./github-act-runner configure --url ... --token NEW_TOKEN
sudo systemctl start github-runner
Build Failures
Check workflow logs:
gh run list --limit 5
gh run view RUN_ID --log
Common causes:
Solution:
# Clean disk
docker system prune -a
rm -rf ~/github-act-runner-test/_work/_temp/*
# Check available space
df -h ~
# Verify Docker works
docker run --rm hello-world
Duplicate Package Versions
Symptom: Release contains multiple versions of same package.
Cause: Self-hosted runner persistence.
Solution:
# Clean RPM build directory
rm -f ~/rpmbuild/RPMS/riscv64/*.rpm
# For Debian packages
rm -f ~/docker-for-riscv64/debian-build/*.deb
# Add cleanup to workflow (see Bug #3 above)
APT Repository Missing Packages
Symptom: apt-get install docker.io fails, package not found.
Diagnosis:
# Check repository contents
gh api repos/gounthar/docker-for-riscv64/contents/dists/trixie/main/binary-riscv64 --jq '.[] | .name'
# Check what packages exist
curl -s https://gounthar.github.io/docker-for-riscv64/dists/trixie/main/binary-riscv64/Packages | grep "Package:"
Solution: See Bug #1 - ensure all packages are downloaded on every repository update.