My Local Copilot: Gemma 4 + Open WebUI + OpenHands for Coding Without Leaving My Machine

DEV Community

The Workflow

The pattern that works best for me is not asking the agent to do everything at once. I use an explicit workflow, and it often starts in GitHub or GitLab.

Open WebUI and OpenHands do not play the same role.

Open WebUI is my reasoning and multimodal context table. OpenHands is my workbench. GitHub and GitLab are the real task queue.

GitHub and GitLab as Workflow Inputs

There is a big difference between "trying a model" and "working with a copilot." The difference is where tasks come from.

In my case, many tasks already exist as:

GitHub issues;
GitLab issues;
pull requests with pending review comments;
merge requests with feedback;
bugs reported with screenshots;
technical discussions that need to become code changes.

The flow looks like this:

This helps me avoid vague prompts. Instead of telling the agent "improve this project," I start from a concrete task that already has social and product context: who asked for it, why it matters, what was discussed, which files it may touch and how it will be reviewed.

Example: From Bug Report to Local Patch

Suppose I have this bug:

The search endpoint returns duplicate results when the user sends the same filter with different casing.

In Open WebUI, I start broadly:

I am working on a backend with search endpoints.
There is a bug: if the user sends repeated filters with different casing,
the endpoint returns duplicate results.
Before touching code, give me an investigation plan:
- which files would you look for
- which tests would you expect to find
- which edge cases should be covered

Gemma 4 does not need to touch the repository yet. I only want help thinking.

Then I move to OpenHands with a more concrete task:

Work in /workspace/my-repo.
Goal:
Fix the bug where repeated filters with different casing generate duplicate results.
Constraints:
- Do not change the public API.
- Keep the existing project style.
- Add or adjust focused tests.
- Run the relevant suite before finishing.
Deliverable:
- Summary of changed files.
- Short explanation of the fix.
- Commands executed and their result.

That prompt change is intentional. I do not say "fix it" in a generic way. I give context, boundaries and a verifiable deliverable.

If the bug comes from GitHub or GitLab, I add one more layer:

Remote context:
- Issue: https://github.com/org/repo/issues/123
- Base branch: main
- Suggested work branch: fix/search-filter-deduplication
Read the issue as the functional specification.
If there is ambiguity between the issue and the current code,
prioritize existing behavior and call out the question in the final summary.

When the issue includes screenshots, I inspect them first in Open WebUI with Gemma 4. That lets me turn visual evidence into acceptance criteria before asking OpenHands to edit files.

How I Choose a Gemma 4 Variant

I do not think about models as a ladder where "bigger always wins." I think in lanes.

Task type	Gemma 4 variant I would try first	Why
Quick chat, classification, short summaries	E2B	Low latency and a good fit for simple tasks
Screenshots, diagrams, UI explanation, task drafting	E4B	Good balance for multimodal reasoning and general assistance
Explaining code, reviewing functions, drafting tests	E4B / 26B A4B	Depends on the size of the change and the context
Medium refactors, multi-file debugging	26B A4B	More capacity without always jumping to the heaviest model
Architecture review, long context, complex decisions	31B	When quality matters more than latency

This table is not a universal truth. It is a practical starting point. Local hardware, quantization, runtime and configured context size can change the experience a lot.

In OpenHands, I like having more than one option configured because the agent's behavior changes with the model. A smaller variant may be enough for short inspection tasks. For multi-module planning, I prefer a stronger one. For architectural review, I accept more latency if the answer is more careful.

My Prompt Template for Local Agents

This is the structure I use most often with OpenHands:

Context:
I am in an existing repository. Read before editing.
The task comes from [GitHub/GitLab issue or PR/MR].
Goal:
[describe the expected result in one sentence]
Constraints:
- Keep existing patterns.
- Do not do unrelated refactors.
- Do not change global configuration unless required.
- If there is ambiguity, explain the decision.
Verification:
- Run the related tests.
- If something cannot be run, explain why.
Deliverable:
- Changed files.
- Summary of the change.
- Commands executed.
- Link or reference to the remote task.
- Risks or follow-ups.

With local models, this structure helps a lot. It reduces ambiguity and pushes the agent to behave like a software collaborator instead of a text generator.

The Real Cycle I Use

stateDiagram-v2
 [*] --> Think
 Think: Open WebUI\nunderstand problem\ntext + images
 Think --> Scope
 Scope: small task\nissue/PR/MR + constraints
 Scope --> Act
 Act: OpenHands\nselected Gemma model\nread edit run
 Act --> Review
 Review: inspect diff\nvalidate tests
 Review --> Commit: if good
 Review --> Scope: if context is missing
 Commit --> [*]

The key is keeping tasks small. A local agent can be very useful, but it is still probabilistic software. My rule is simple: if I could not review the diff in a few minutes, the task is too large.

What Worked Well

The best part of the setup is the feeling of control.

I can start the local stack, switch models, test prompts, share only the folders I want and shut everything down when I am done. For private projects, prototypes and learning, that reduced friction matters.

I also like having separate modes:

Multimodal conversation mode: I think with Gemma 4 in Open WebUI using text, images, screenshots and diagrams.
Visual generation mode: I create images or supporting assets from Open WebUI when a post, documentation page or product task needs them.
Action mode: I delegate a concrete task to OpenHands and choose the Gemma model that best fits.
Repository mode: I bring context from GitHub or GitLab and turn it into a local branch with a reviewable diff.

That boundary prevents every conversation from becoming an execution. Not every prompt deserves filesystem access.

What Still Requires Care

Not everything is automatic.

Local agents are sensitive to:

prompt quality;
configured context size;
quantization choices;
hardware latency;
runtime stability;
the model's ability to follow tool instructions.

I also learned that it is useful to keep fallback models. In my stack, I keep coding-specialized models next to the general model. That lets me compare answers or switch lanes if a specific task gets stuck.

Another lesson: connected repositories speed things up, but they also require discipline. A GitHub or GitLab issue can carry a lot of context, but not all of that context is specification. Sometimes it includes opinions, old assumptions or contradictory comments. That is why I like passing through Open WebUI first to synthesize acceptance criteria before opening the OpenHands lane.

Local Security: Not Magic, But Better Boundaries

Running locally does not automatically mean "secure." It means I have more control over where the code lives and which processes can read it.

My basic rules are:

expose Open WebUI and OpenHands only on 127.0.0.1;
mount a scoped working directory, not the whole disk;
review diffs before committing;
do not give real secrets to the agent;
use GitHub/GitLab tokens with minimum required permissions when needed;
avoid mounting global credentials into the sandbox;
use disposable repositories for aggressive experiments;
keep logs and configuration outside the application repository.

Privacy does not come from one tool. It comes from designing the workflow with clear limits.

Why This Matters

The discussion around open models often stays at the benchmark level. Benchmarks matter, but as a developer I care about a more practical question:

What can I do today, on my own machine, with enough quality and control to actually change my development workflow?

Gemma 4 points directly at that question. Not because it automatically replaces every closed model, but because it makes a category of local setups more viable: assistants that can reason over text and images, generate supporting material, work with repositories and integrate with open tools.

For me, the near future is not one giant cloud copilot. It is a combination of:

open models;
local runtimes;
hackable interfaces;
multimodal inputs;
agents with limited permissions;
repositories connected to real tasks;
developers who understand their own architecture.

Gemma 4 fits that direction well.

Base Commands for the Stack

My local flow starts with Ollama on the host:

OLLAMA_CONTEXT_LENGTH=32768 \
OLLAMA_KEEP_ALIVE=30m \
OLLAMA_HOST=0.0.0.0:11434 \
ollama serve

Then I pull the model I want to test:

ollama pull gemma4:e4b

I can also keep multiple variants available and choose by task:

ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:26b-a4b
ollama pull gemma4:31b

If your runtime publishes the variants under different names, replace those identifiers with the correct names for Ollama, Hugging Face or Kaggle.

For image generation from Open WebUI, my stack uses a local OpenAI-compatible endpoint. For example:

ollama pull x/flux2-klein:4b
ollama pull x/z-image-turbo

Then I start the interfaces:

docker compose up -d open-webui openhands comfyui

Local URLs:

Open WebUI: http://localhost:3000
OpenHands: http://localhost:3001
ComfyUI: http://localhost:8188
Ollama API: http://localhost:11434

To bring in tasks and branches from remote repositories:

git clone git@github.com:org/repo.git
git clone git@gitlab.com:org/repo.git

You can also use gh or glab to fetch issues, check out PRs/MRs or inspect review comments from the terminal.

Minimal OpenHands Configuration

[core]
[llm]
model = "openai/gemma4:e4b"
base_url = "http://host.docker.internal:11434/v1"
ollama_base_url = "http://host.docker.internal:11434"
api_key = "local-llm"

To switch models, I keep explicit model values for the task:

# Fast inspection
model = "openai/gemma4:e2b"
# General balance
model = "openai/gemma4:e4b"
# More complex changes
model = "openai/gemma4:26b-a4b"
# Deeper review
model = "openai/gemma4:31b"

In Docker Compose, the important part is mounting the workspace and pointing OpenHands to the local endpoint:

openhands:
 image: docker.openhands.dev/openhands/openhands:1.6
 ports:
 - "127.0.0.1:3001:3000"
 environment:
 RUNTIME: "docker"
 LLM_MODEL: "openai/gemma4:e4b"
 LLM_BASE_URL: "http://host.docker.internal:11434/v1"
 LLM_OLLAMA_BASE_URL: "http://host.docker.internal:11434"
 LLM_API_KEY: "local-llm"
 volumes:
 - /var/run/docker.sock:/var/run/docker.sock
 - ./workspace:/workspace:rw
 - /Users/me/projects:/workspace/host-projects:rw

Image Generation in Open WebUI

In Open WebUI, I enable image generation against my local endpoint:

ENABLE_IMAGE_GENERATION=true
IMAGE_GENERATION_ENGINE=openai
IMAGES_OPENAI_API_BASE_URL=http://host.docker.internal:11434/v1
IMAGES_OPENAI_API_KEY=ollama
IMAGE_GENERATION_MODEL=x/flux2-klein:4b

Final Mental Model

The most important part of the diagram is the last one: developer judgment.

The model accelerates. The agent executes. But the engineering judgment is still mine.

Closing

Gemma 4 is exciting because it lowers the barrier for building more useful local assistants. Not just chatbots. Not just demos. Real workflows where an open model can help understand text and images, generate supporting assets, modify code and validate software inside a machine I control.

My conclusion after building this setup is simple: the leap is not only in the model. It is in connecting the model to a well-designed workflow.

Gemma 4 + Open WebUI + OpenHands + GitHub/GitLab is one concrete way to do that.

Top comments (3)

hollowhouse profile image

Hollow House Institute

Behavioral AI Governance researcher building Execution-Time Governance infrastructure for multi-agent AI systems, Governance Telemetry, and Longitudinal Accountability.

Email

ethericwebweaver1111@gmail.com
Location

Arlington Texas
Joined

Jan 23, 2026

• May 9

What stands out to me is that local workflows also change the governance environment itself.

A lot of local AI discussion focuses on tooling, orchestration, and developer control. But once systems start operating outside centralized infrastructure, visibility and enforcement continuity start degrading too.

The harder question becomes:

how do telemetry, Decision Boundaries, and Stop Authority keep persisting once execution becomes decentralized and partially offline?

theelmix profile image

Enny Rodríguez

Ingeniero informático y líder tecnológico especializado en IA, automatización y arquitectura de software. Combino visión estratégica con ejecución técnica, impulsando productos digitales.

Joined

May 8, 2026

• May 9

I completely agree. Local-first AI should not become governance-free AI.

That’s the real governance challenge.

In a decentralized/offline setup, telemetry, decision boundaries, and stop authority have to move into the local runtime.

Telemetry: local append-only logs, signed receipts, command history, file access records, model usage, diffs, and approval events, synced when online.

Decision Boundaries: policy-as-code enforced locally — allowed commands, allowed folders, repo scopes, network limits, secret protection, and human-in-the-loop thresholds.

Stop Authority: online kill switch when connected; offline bounded leases when disconnected. If the policy lease expires, or the action is high-risk, the agent must stop or require reconnection.

So local-first AI does not remove governance. It forces governance to become runtime-native.

For me, the next evolution of this setup is execution-time telemetry: tracking what files were accessed, what commands were proposed or executed, what model was used, and what code changes were produced.

Running locally solves privacy and control, but accountability still needs to be designed explicitly.

hollowhouse profile image

Hollow House Institute

Behavioral AI Governance researcher building Execution-Time Governance infrastructure for multi-agent AI systems, Governance Telemetry, and Longitudinal Accountability.

Email

ethericwebweaver1111@gmail.com
Location

Arlington Texas
Joined

Jan 23, 2026

• May 9

That’s the shift I keep coming back to too.

A lot of existing governance still assumes the infrastructure itself remains centralized and observable.

But once execution becomes local, partially offline, and user-controlled, governance can no longer depend on external visibility alone.

At that point governance has to persist as operational infrastructure inside the runtime itself:
telemetry continuity, enforceable Decision Boundaries, replayable evidence, escalation logic, and Stop Authority conditions that survive disconnection and migration.

Otherwise systems may remain technically functional while accountability continuity quietly degrades.

I think execution-time telemetry becomes critical here because behavior during runtime is ultimately the thing that matters most.