Interview Protocol as Code: Standardizing Technical Hiring with OpenClaw

DEV Community

Machine-readable outputs: added fields such as idioma_principal (primary language), dificuldade_estimada (estimated difficulty), and feedback_sugestao (suggested feedback) so the JSON can be consumed downstream, for example fed into a hiring dashboard, sent to a candidate as constructive feedback, or piped into an applicant tracking system.

Full change log: ATTRIBUTION.md

How I Used OpenClaw

An interview is a state machine: you're always somewhere in a defined sequence. OpenClaw's skill architecture maps to that exactly. I didn't write state management code or API wiring. I wrote the protocol itself in Markdown, and the framework executed it.
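To make the state-machine framing concrete, here is a minimal Python sketch. It is not OpenClaw code and not how the skill is implemented. The stage names and the transition table are hypothetical, chosen only to illustrate the idea that each step of the protocol has a fixed set of legal next steps. In the actual project, that sequence is written in Markdown in SKILL.md and the framework executes it.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical interview stages; the real skill defines its own sequence in SKILL.md."""
    ANALYZE_JOB = auto()
    ASK_QUESTION = auto()
    AWAIT_ANSWER = auto()
    EVALUATE_ANSWER = auto()
    CLOSE = auto()


# Allowed transitions: from each stage, the set of stages it may move to.
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.ANALYZE_JOB: {Stage.ASK_QUESTION},
    Stage.ASK_QUESTION: {Stage.AWAIT_ANSWER},
    Stage.AWAIT_ANSWER: {Stage.EVALUATE_ANSWER},
    # After evaluating an answer, either ask a follow-up or wrap up.
    Stage.EVALUATE_ANSWER: {Stage.ASK_QUESTION, Stage.CLOSE},
    Stage.CLOSE: set(),
}


class Interview:
    def __init__(self) -> None:
        self.stage = Stage.ANALYZE_JOB
        self.history: list[Stage] = [self.stage]

    def advance(self, target: Stage) -> None:
        """Move to `target`, rejecting any transition the protocol does not allow."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"illegal transition {self.stage.name} -> {target.name}")
        self.stage = target
        self.history.append(target)


interview = Interview()
for step in (Stage.ASK_QUESTION, Stage.AWAIT_ANSWER, Stage.EVALUATE_ANSWER, Stage.CLOSE):
    interview.advance(step)
print([stage.name for stage in interview.history])
```

Trying to jump from ASK_QUESTION straight to CLOSE raises a ValueError. That is the kind of bookkeeping the author says they did not have to write by hand.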

Stack:

Runtime: ghcr.io/openclaw/openclaw:latest (Docker)
Model: Gemini 2.5 Flash
Skill: ./skills/interview-agent/SKILL.md
UI: localhost:18789

Why it worked: the low friction mattered. In the first week, I rewrote prompts 50+ times. Each iteration was: edit the file, refresh the browser. No build, no deploy. That velocity let me test scoring rubrics, question phrasing, and JSON schemas fast enough to actually learn what works.

Demo

Repository: github.com/vec21/ws_dio_entrevistador

Run it:

git clone https://github.com/vec21/ws_dio_entrevistador
cd ws_dio_entrevistador
# Add your GOOGLE_API_KEY to docker-compose.yml
docker compose up -d
# Open http://localhost:18789

Mode 1 - Job Analysis

You provide a job description (originally in Portuguese, shown here in English):

Use the skill interview_agent to analyze this job posting as JSON:
Senior Backend Engineer - Fintech
Responsibilities:
- Critical payment APIs
- Event-driven microservices
- Observability and reliability
Requirements: Go or Kotlin, Kafka, AWS

The skill responds with:

{
  "job_title": "Senior Backend Engineer",
  "seniority": "senior",
  "primary_language": "pt",
  "technical_skills": ["Go/Kotlin", "Kafka", "AWS", "Observability"],
  "risk_flags": ["payment systems domain expertise required", "high fault tolerance expected"],
  "estimated_difficulty": "high"
}
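Because the reply is plain JSON, a downstream consumer (for example the hiring dashboard mentioned earlier) can check its shape before trusting it. This Python sketch uses the key names from the example output above. The `validate_job_analysis` helper is hypothetical, and the set of allowed difficulty values is an assumption, since only "high" appears in the post.

```python
import json

# Keys and types taken from the Mode 1 example output; the real schema may contain more.
REQUIRED_KEYS = {
    "job_title": str,
    "seniority": str,
    "primary_language": str,
    "technical_skills": list,
    "risk_flags": list,
    "estimated_difficulty": str,
}
# Assumed allowed values; the post only shows "high".
ALLOWED_DIFFICULTY = {"low", "medium", "high"}


def validate_job_analysis(raw: str) -> dict:
    """Parse the skill's JSON reply and fail loudly if it drifts from the expected shape."""
    data = json.loads(raw)
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise TypeError(f"{key} should be {expected_type.__name__}")
    if data["estimated_difficulty"] not in ALLOWED_DIFFICULTY:
        raise ValueError(f"unexpected difficulty: {data['estimated_difficulty']}")
    return data


reply = (
    '{"job_title":"Senior Backend Engineer","seniority":"senior","primary_language":"pt",'
    '"technical_skills":["Go/Kotlin","Kafka","AWS","Observability"],'
    '"risk_flags":["payment systems domain expertise required","high fault tolerance expected"],'
    '"estimated_difficulty":"high"}'
)
job = validate_job_analysis(reply)
print(job["technical_skills"])
```

Failing early on a missing or mistyped field keeps a prompt change from silently breaking whatever consumes the JSON.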

Mode 4 - Answer Evaluation

You ask a question and the candidate responds:

Question: Tell me about a critical backend system you built.
Candidate response: I implemented idempotency keys, retries with exponential backoff,
database transactions, and latency/error metrics on a payments API.

The skill evaluates:

{
  "overall_score": 4,
  "sub_scores": {"technical": 5, "behavioral": 4, "domain_knowledge": 3},
  "positive_signals": ["idempotency correctly applied", "retry strategy with exponential backoff", "latency and error metrics instrumented"],
  "missing_signals": ["no incident response discussion", "missing scale and SLA context"],
  "suggested_feedback": "Ask about the biggest failure that occurred in this system and how recovery was handled."
}

The technical sub-score is high (5): idempotency keys and exponential backoff are exactly right. But domain_knowledge is lower (3), because describing a payments system without discussing failure modes or scale shows incomplete mastery of fintech reliability concerns. The suggested_feedback field then guides the next question.
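A consumer of this evaluation can turn it into the next step mechanically. The sketch below is illustrative only: `next_focus` is a hypothetical helper, and picking the lowest sub-score as the area to probe is one reasonable rule, not necessarily what the skill does internally.

```python
import json

evaluation = json.loads(
    '{"overall_score":4,"sub_scores":{"technical":5,"behavioral":4,"domain_knowledge":3},'
    '"positive_signals":["idempotency correctly applied","retry strategy with exponential backoff",'
    '"latency and error metrics instrumented"],'
    '"missing_signals":["no incident response discussion","missing scale and SLA context"],'
    '"suggested_feedback":"Ask about the biggest failure that occurred in this system and how recovery was handled."}'
)


def next_focus(ev: dict) -> tuple[str, int, list[str]]:
    """Return the weakest sub-score dimension, its score, and the gaps to probe next."""
    sub_scores = ev["sub_scores"]
    weakest = min(sub_scores, key=sub_scores.get)
    return weakest, sub_scores[weakest], ev["missing_signals"]


dimension, score, gaps = next_focus(evaluation)
print(f"Probe {dimension} (scored {score}): {'; '.join(gaps)}")
print("Next question:", evaluation["suggested_feedback"])
```

Here the weakest dimension is domain_knowledge, which lines up with the missing signals and with the suggested follow-up about failure and recovery.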

What I Learned

1. Constraints force rigor.

I started with a single global score. It was a disaster. Two candidates could get the same "3" for opposite reasons: one brilliant at systems but inarticulate, the other articulate but shallow on design. Once I split it into three sub-scores, the difference became visible. The score becomes evidence, not a guess.
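The problem with a single number is easy to reproduce. In this Python sketch the two candidate profiles are hypothetical, mapping "inarticulate" to a low behavioral score is an assumption, and the plain mean stands in for whatever single global score was used before. Both candidates collapse to the same number while their sub-scores point in opposite directions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Assessment:
    # Sub-score names match the Mode 4 output; the 1-5 scale is assumed from the example values.
    technical: int
    behavioral: int
    domain_knowledge: int

    def single_score(self) -> float:
        """Collapse to one number, as a single global score would (a plain mean here)."""
        return mean([self.technical, self.behavioral, self.domain_knowledge])


# Hypothetical candidates matching the two profiles described above.
systems_strong = Assessment(technical=5, behavioral=1, domain_knowledge=3)
articulate_shallow = Assessment(technical=1, behavioral=5, domain_knowledge=3)

print(systems_strong.single_score(), articulate_shallow.single_score())
print(systems_strong)
print(articulate_shallow)
```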

2. Flow beats features.

The single biggest quality lever wasn't smarter prompts or longer context windows. It was the flow: asking one question at a time, letting the candidate think, and adapting the next question based on what you just learned. It feels like a conversation. It is a conversation. But underneath there's explicit structure. That combination, natural flow plus explicit criteria, is what makes interviews repeatable and fair.

3. Multilingual means redesigning, not translating.

I could have simply run the prompts through a translator. Instead, I wrote the Portuguese versions from first principles. "Leverage" is a loan word in Portuguese that carries different weight, and what counts as "senior" differs across cultures. Designing for two languages forced me to articulate what I was actually measuring instead of hiding behind vague English jargon.

ClawCon Michigan

I did not attend ClawCon Michigan.