The Workflow
The pattern that works best for me is not asking the agent to do everything at once. I use an explicit workflow, and it often starts in GitHub or GitLab.
Open WebUI and OpenHands do not play the same role.
Open WebUI is my reasoning and multimodal context table. OpenHands is my workbench. GitHub and GitLab are the real task queue.
GitHub and GitLab as Workflow Inputs
There is a big difference between "trying a model" and "working with a copilot." The difference is where tasks come from.
In my case, many tasks already exist as:
- GitHub issues;
- GitLab issues;
- pull requests with pending review comments;
- merge requests with feedback;
- bugs reported with screenshots;
- technical discussions that need to become code changes.
The flow looks like this:
This helps me avoid vague prompts. Instead of telling the agent "improve this project," I start from a concrete task that already has social and product context: who asked for it, why it matters, what was discussed, which files it may touch and how it will be reviewed.
Example: From Bug Report to Local Patch
Suppose I have this bug:
The search endpoint returns duplicate results when the user sends the same filter with different casing.
In Open WebUI, I start broadly:
I am working on a backend with search endpoints.
There is a bug: if the user sends repeated filters with different casing,
the endpoint returns duplicate results.
Before touching code, give me an investigation plan:
- which files would you look for
- which tests would you expect to find
- which edge cases should be covered
Gemma 4 does not need to touch the repository yet. I only want help thinking.
Then I move to OpenHands with a more concrete task:
Work in /workspace/my-repo.
Goal:
Fix the bug where repeated filters with different casing generate duplicate results.
Constraints:
- Do not change the public API.
- Keep the existing project style.
- Add or adjust focused tests.
- Run the relevant suite before finishing.
Deliverable:
- Summary of changed files.
- Short explanation of the fix.
- Commands executed and their result.
That prompt change is intentional. I do not say "fix it" in a generic way. I give context, boundaries and a verifiable deliverable.
If the bug comes from GitHub or GitLab, I add one more layer:
Remote context:
- Issue: https://github.com/org/repo/issues/123
- Base branch: main
- Suggested work branch: fix/search-filter-deduplication
Read the issue as the functional specification.
If there is ambiguity between the issue and the current code,
prioritize existing behavior and call out the question in the final summary.
When the issue includes screenshots, I inspect them first in Open WebUI with Gemma 4. That lets me turn visual evidence into acceptance criteria before asking OpenHands to edit files.
How I Choose a Gemma 4 Variant
I do not think about models as a ladder where "bigger always wins." I think in lanes.
| Task type |
Gemma 4 variant I would try first |
Why |
| Quick chat, classification, short summaries |
E2B |
Low latency and a good fit for simple tasks |
| Screenshots, diagrams, UI explanation, task drafting |
E4B |
Good balance for multimodal reasoning and general assistance |
| Explaining code, reviewing functions, drafting tests |
E4B / 26B A4B |
Depends on the size of the change and the context |
| Medium refactors, multi-file debugging |
26B A4B |
More capacity without always jumping to the heaviest model |
| Architecture review, long context, complex decisions |
31B |
When quality matters more than latency |
This table is not a universal truth. It is a practical starting point. Local hardware, quantization, runtime and configured context size can change the experience a lot.
In OpenHands, I like having more than one option configured because the agent's behavior changes with the model. A smaller variant may be enough for short inspection tasks. For multi-module planning, I prefer a stronger one. For architectural review, I accept more latency if the answer is more careful.
My Prompt Template for Local Agents
This is the structure I use most often with OpenHands:
Context:
I am in an existing repository. Read before editing.
The task comes from [GitHub/GitLab issue or PR/MR].
Goal:
[describe the expected result in one sentence]
Constraints:
- Keep existing patterns.
- Do not do unrelated refactors.
- Do not change global configuration unless required.
- If there is ambiguity, explain the decision.
Verification:
- Run the related tests.
- If something cannot be run, explain why.
Deliverable:
- Changed files.
- Summary of the change.
- Commands executed.
- Link or reference to the remote task.
- Risks or follow-ups.
With local models, this structure helps a lot. It reduces ambiguity and pushes the agent to behave like a software collaborator instead of a text generator.
The Real Cycle I Use
stateDiagram-v2
[*] --> Think
Think: Open WebUI\nunderstand problem\ntext + images
Think --> Scope
Scope: small task\nissue/PR/MR + constraints
Scope --> Act
Act: OpenHands\nselected Gemma model\nread edit run
Act --> Review
Review: inspect diff\nvalidate tests
Review --> Commit: if good
Review --> Scope: if context is missing
Commit --> [*]
The key is keeping tasks small. A local agent can be very useful, but it is still probabilistic software. My rule is simple: if I could not review the diff in a few minutes, the task is too large.
What Worked Well
The best part of the setup is the feeling of control.
I can start the local stack, switch models, test prompts, share only the folders I want and shut everything down when I am done. For private projects, prototypes and learning, that reduced friction matters.
I also like having separate modes:
-
Multimodal conversation mode: I think with Gemma 4 in Open WebUI using text, images, screenshots and diagrams.
-
Visual generation mode: I create images or supporting assets from Open WebUI when a post, documentation page or product task needs them.
-
Action mode: I delegate a concrete task to OpenHands and choose the Gemma model that best fits.
-
Repository mode: I bring context from GitHub or GitLab and turn it into a local branch with a reviewable diff.
That boundary prevents every conversation from becoming an execution. Not every prompt deserves filesystem access.
What Still Requires Care
Not everything is automatic.
Local agents are sensitive to:
- prompt quality;
- configured context size;
- quantization choices;
- hardware latency;
- runtime stability;
- the model's ability to follow tool instructions.
I also learned that it is useful to keep fallback models. In my stack, I keep coding-specialized models next to the general model. That lets me compare answers or switch lanes if a specific task gets stuck.
Another lesson: connected repositories speed things up, but they also require discipline. A GitHub or GitLab issue can carry a lot of context, but not all of that context is specification. Sometimes it includes opinions, old assumptions or contradictory comments. That is why I like passing through Open WebUI first to synthesize acceptance criteria before opening the OpenHands lane.
Local Security: Not Magic, But Better Boundaries
Running locally does not automatically mean "secure." It means I have more control over where the code lives and which processes can read it.
My basic rules are:
- expose Open WebUI and OpenHands only on
127.0.0.1;
- mount a scoped working directory, not the whole disk;
- review diffs before committing;
- do not give real secrets to the agent;
- use GitHub/GitLab tokens with minimum required permissions when needed;
- avoid mounting global credentials into the sandbox;
- use disposable repositories for aggressive experiments;
- keep logs and configuration outside the application repository.
Privacy does not come from one tool. It comes from designing the workflow with clear limits.
Why This Matters
The discussion around open models often stays at the benchmark level. Benchmarks matter, but as a developer I care about a more practical question:
What can I do today, on my own machine, with enough quality and control to actually change my development workflow?
Gemma 4 points directly at that question. Not because it automatically replaces every closed model, but because it makes a category of local setups more viable: assistants that can reason over text and images, generate supporting material, work with repositories and integrate with open tools.
For me, the near future is not one giant cloud copilot. It is a combination of:
- open models;
- local runtimes;
- hackable interfaces;
- multimodal inputs;
- agents with limited permissions;
- repositories connected to real tasks;
- developers who understand their own architecture.
Gemma 4 fits that direction well.
Base Commands for the Stack
My local flow starts with Ollama on the host:
OLLAMA_CONTEXT_LENGTH=32768 \
OLLAMA_KEEP_ALIVE=30m \
OLLAMA_HOST=0.0.0.0:11434 \
ollama serve
Then I pull the model I want to test:
ollama pull gemma4:e4b
I can also keep multiple variants available and choose by task:
ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:26b-a4b
ollama pull gemma4:31b
If your runtime publishes the variants under different names, replace those identifiers with the correct names for Ollama, Hugging Face or Kaggle.
For image generation from Open WebUI, my stack uses a local OpenAI-compatible endpoint. For example:
ollama pull x/flux2-klein:4b
ollama pull x/z-image-turbo
Then I start the interfaces:
docker compose up -d open-webui openhands comfyui
Local URLs:
- Open WebUI:
http://localhost:3000
- OpenHands:
http://localhost:3001
- ComfyUI:
http://localhost:8188
- Ollama API:
http://localhost:11434
To bring in tasks and branches from remote repositories:
git clone git@github.com:org/repo.git
git clone git@gitlab.com:org/repo.git
You can also use gh or glab to fetch issues, check out PRs/MRs or inspect review comments from the terminal.
Minimal OpenHands Configuration
[core]
[llm]
model = "openai/gemma4:e4b"
base_url = "http://host.docker.internal:11434/v1"
ollama_base_url = "http://host.docker.internal:11434"
api_key = "local-llm"
To switch models, I keep explicit model values for the task:
# Fast inspection
model = "openai/gemma4:e2b"
# General balance
model = "openai/gemma4:e4b"
# More complex changes
model = "openai/gemma4:26b-a4b"
# Deeper review
model = "openai/gemma4:31b"
In Docker Compose, the important part is mounting the workspace and pointing OpenHands to the local endpoint:
openhands:
image: docker.openhands.dev/openhands/openhands:1.6
ports:
- "127.0.0.1:3001:3000"
environment:
RUNTIME: "docker"
LLM_MODEL: "openai/gemma4:e4b"
LLM_BASE_URL: "http://host.docker.internal:11434/v1"
LLM_OLLAMA_BASE_URL: "http://host.docker.internal:11434"
LLM_API_KEY: "local-llm"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- ./workspace:/workspace:rw
- /Users/me/projects:/workspace/host-projects:rw
Image Generation in Open WebUI
In Open WebUI, I enable image generation against my local endpoint:
ENABLE_IMAGE_GENERATION=true
IMAGE_GENERATION_ENGINE=openai
IMAGES_OPENAI_API_BASE_URL=http://host.docker.internal:11434/v1
IMAGES_OPENAI_API_KEY=ollama
IMAGE_GENERATION_MODEL=x/flux2-klein:4b
Final Mental Model
The most important part of the diagram is the last one: developer judgment.
The model accelerates. The agent executes. But the engineering judgment is still mine.
Closing
Gemma 4 is exciting because it lowers the barrier for building more useful local assistants. Not just chatbots. Not just demos. Real workflows where an open model can help understand text and images, generate supporting assets, modify code and validate software inside a machine I control.
My conclusion after building this setup is simple: the leap is not only in the model. It is in connecting the model to a well-designed workflow.
Gemma 4 + Open WebUI + OpenHands + GitHub/GitLab is one concrete way to do that.