Local and arbitrary model support · warpdotdev/warp · Discussion #9619

zachlloyd
Apr 30, 2026
Maintainer

We are trying to figure out the best way to implement local model support and I wanted to start a discussion on our different potential approaches to see what resonates most with the community.

The reason local model support is not trivial for us to implement is that our harness is split between our client (rust, open-source) and server (golang, not currently open). Moving the harness to be entirely on the client is a fair amount of work.

The options we are considering here (not mutually exclusive):

Port the entire harness to the client and open-source it (most work).
Implement Warp as an ACP client and allow folks to use other harnesses within our rich terminal UI.
Implement a new rust-based "lite" local harness that speaks the same protocol that our client understands but supports local models and arbitrary endpoints.
Route local model requests through our server and back to the client via something like ngrok (hacky, but quick).

Questions on my mind:

How important to users is it to use our real harness as opposed to a harness that works as an ACP server?
How important is it that local model requests are truly local, with no server interaction?
Which aspects of our UI are most important for folks wanting local model support?

Replies: 36 comments 40 replies

djdanielsson
Apr 30, 2026

Answers to the Questions part: I am not sure I have a good answer for number 1 and 3 but 2 it's extremely important to me that local model is truly local that is likely the point of why I am using a local model to start with for that task. I haven't been using warp for a long time because it wasn't open source so I do not have many thoughts on what parts of the UI I want at this time for local models, the little I have done with like Claude cli and the information that the UI provides for that is really nice and to have something like that for local models would be cool but it might depend on what harness people are using idk.

2 replies

@zachlloyd

zachlloyd Apr 30, 2026
Maintainer Author

super helpful and tracks with what i expected

@djdanielsson

djdanielsson Apr 30, 2026

going to the options stuff I am kinda interested in number 2 the most I think, I personally want to control my harness and the context I am feeding my agents vs using just another harness.

FFatTiger
Apr 30, 2026

No server transit for any requests

1 reply

@harry-xm

harry-xm May 1, 2026

This. At my company it's a policy violation to use something like ngrok.

officiallymarky
Apr 30, 2026

Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands.

0 replies

phidauex
Apr 30, 2026

Thanks for opening up some discussion!

I think for your options, #1 is most appealing, but yes, more work. I'd like to think it could also help your architecture long term by relying less on scaling server side components along with client components. #2 is a bit hacky but could be quick. I connect to my Hermes agent using OpenWebUI because it is nicer than the raw terminal client - I'd connect to it through Warp if it were an option, but that doesn't really support Warp being a standalone tool. #3 could be most practical because it would be fully local, not require a second tool, and for most local models, not being feature-complete would be OK. The smaller context windows mean fewer turns, fewer tools available, etc. But my use case in Warp would mostly be "uh help me remember how this command is used" not "build out an entire ansible deployment for a lab". #4 is probably not worth doing - if someone wants to use a local model, its because they want it local.

I'd rank them - 1, 3, 2, 4

For the questions:

I'm split because I use another agent tool already. However for new users, only having it work if you already have another harness running feels duplicative.
Quite important - I either have no/patchy internet at the time I'm working, or I'm doing something that I need to keep local for confidentiality reasons, and in either case, server routing breaks the whole point.
I don't use the most advanced tools in Warp now, so for me, the "advanced command completion" and inline planning directly in the terminal UI is what I use, and would love to use with a local model that would be fully competent at that.

1 reply

@zachlloyd

zachlloyd Apr 30, 2026
Maintainer Author

extremely helpful as we think this through

mastertyko
Apr 30, 2026

My vote is strongly for option 1.

If Warp supports local or arbitrary models, I think it should mean truly local execution with no server transit. I understand that porting the harness to the client and open sourcing it is the most work, but it seems like the right long term architecture for privacy, offline use, trust, and extensibility.

Thanks!

0 replies

crazygamerZ783
Apr 30, 2026

hey got a warp fork of my own ,trying to get the ollama support work repo:https://github.com/crazygamerZ783/warp-ollama
i would appreciate a bit of help

3 replies

@rozsazoltan

rozsazoltan May 1, 2026

The repo feels a bit strange because the original Git history is missing, and the first commit doesn't show what you actually changed.

@crazygamerZ783

crazygamerZ783 May 1, 2026

it was probably because of github glitches

@regismesquita

regismesquita May 1, 2026

ollama supports openai-compatible as far as I remember, and there are already some implementations out there supporting that.

VicZhang6
Apr 30, 2026

Honestly, I want to use DeepSeek V4 Flash inside Warp — it’s cheap, and it allows me to interact with the terminal using natural language.

1 reply

@jensenojs

jensenojs May 2, 2026

same here, Although I can understand from a business logic perspective why supporting an open method of simply providing a URL + API key is not allowed, from a user demand standpoint, I think it would be more natural.

FunkyFresh67
Apr 30, 2026

Local models in Ollama or similar should be configurable as sources within Warp. Once available, you should be able to select a model during a session, either manually or by directing Warp to use it automatically based on the task or preference.

0 replies

FelixZoe
Apr 30, 2026

6666

0 replies

regismesquita
May 1, 2026

There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo.

People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream.

In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers.

0 replies

apetti1920
May 1, 2026

Model selection should be allowed to be

local (lm studio, ollama, etc) 2. also allowed to be configurable and tiered, small model for cmd suggestions (with bash history injection) as well as large for agent interactions

0 replies

zachlloyd
May 1, 2026
Maintainer Author

All this feedback makes sense. We will have a proposed solution here shortly.

0 replies

bernardodsanderson
May 1, 2026

I am mostly interested as I want to use one source of models (openrouter/GLM Coding Plan) for it.

0 replies

chukwunonsomichael189-boop
May 2, 2026

0 replies

chukwunonsomichael189-boop
May 2, 2026

There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo.

In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers.

0 replies

chukwunonsomichael189-boop
May 4, 2026

1 reply

@EdenBendheim

EdenBendheim May 5, 2026

spam ^

EdenBendheim
May 5, 2026

I agree with the consensus that #1 is of course the best option, and I would like to point out that I am most interested in warp in its capabilities as a harness.

Like most programmers, I still do use cloud models for the majority of my tasks, and the ability to use my OpenAI Codex sub while still taking advantage of the harness (like Opencode for example) would push me to use warp more.

If this can be achieved in a lightweight manner, option #3 is the clear choice. If not then #1

0 replies

zachlloyd
May 6, 2026
Maintainer Author

Updates on our plan here...

With respect to BYOK and arbitrary endpoint support:

In the next 1-2 weeks we plan on making BYOK available on our free plan and extending it to support arbitrary OAI compatible endpoints. For now this will still go through our server harness, as it’s a much bigger lift to fully move that harness to the client in a short timeframe.
BYOK and arbitrary endpoint support will be available for individuals and companies of up to 10 employees. It will be login-gated, but free to use within that scope. For larger businesses and enterprise customers, they need to go through Business / Enterprise because they are using Warp’s managed harness as part of a broader platform offering, and that usage is metered through platform credits.

With respect to local model support, our plan is to

Work on a client side harness that connects directly to local models. We are currently slotting this into our priority list, but I expect work to start within the next few weeks.
Implement Warp as an ACP client and allow folks to use other harnesses within our rich terminal UI. This addresses the need to BYO inference without connecting to Warp’s servers. See Support for Agent Client Protocol (ACP) #7326
These will not require login and will be completely client side.

Thanks to everyone who has weighed in on this thread.

15 replies

@SergioNR

SergioNR May 15, 2026

@dagmfactory this looks great! However, I noticed the video only showcases for /agent mode - will it also work with autocomplete suggestions, or suggested prompts (eg: stop all docker containers)?

thank you!

@xbmc4lyfe

xbmc4lyfe May 16, 2026

Was just trying to make a build of the code with this exact functionality. Any way we can get a link to the source and make a custom build?

@tao12345666333

tao12345666333 May 19, 2026

Thank you! : ❤️

@Cjava08

Cjava08 May 19, 2026

When is available?

@notno

notno Jun 3, 2026

Any updates on this?

MichaelSL
May 7, 2026

Main obstacle for using Warp at our company is the privacy: we have the strong governance model and we can't use 3rd party LLM providers.

We would use BYOK and point it to the private instance hosted in Azure, AWS etc. Buying a license is not an issue: we just need to make sure that data never goes to any of Warp servers, which probably makes billing a bit challenging for Warp.

We tried other solutions with BYOK, but none of them work with agents as good as Warp.

3 replies

@zcg

zcg May 7, 2026

https://openwarp.zerx.dev/ https://github.com/zerx-lab/warp/releases/tag/v2026.05.06.preview Hello, I noticed this open-source project has been modified and it's already usable. I've downloaded it, installed it locally, and used it. It's really impressive and works great. If you can't wait for the official release, you can use this one for now.

@regismesquita

regismesquita May 7, 2026

@zcg btw I saw openwarp and the problem with it is that they heavily modified the client, making it a hard fork, it will be really hard for them to merge upstream changes.

I made one that follows the suggestion #3 implementing a fake server that allows for external models (that way I can avoid touching actual warp code as much as possible and therefore making it easier to merge upstream changes)

And I remember seeing two or three other PRs here in the repo with some other implementations.

Hopefully my project will be superseded by the

Work on a client side harness that connects directly to local models. We are currently slotting this into our priority list, but I expect work to start within the next few weeks.

until there at least I get to use warp fully local and with my internally hosted models.

@zcg

zcg May 8, 2026

@zcg btw I saw openwarp and the problem with it is that they heavily modified the client, making it a hard fork, it will be really hard for them to merge upstream changes.

I made one that follows the suggestion #3 implementing a fake server that allows for external models (that way I can avoid touching actual warp code as much as possible and therefore making it easier to merge upstream changes)

And I remember seeing two or three other PRs here in the repo with some other implementations.

Hopefully my project will be superseded by the

Work on a client side harness that connects directly to local models. We are currently slotting this into our priority list, but I expect work to start within the next few weeks.

until there at least I get to use warp fully local and with my internally hosted models.

you are right! we cant wait so try

musaabhasan
May 9, 2026

From an enterprise/privacy perspective, the important distinction is not only "local model support" but "local execution boundary." If prompts, repository context, tool traces, or intermediate reasoning still route through a remote harness, many regulated users will not be able to use it even if the final model endpoint is self-hosted.

I would rank the options this way:

Short term: arbitrary OpenAI-compatible endpoint support is useful, but should be labeled clearly if the harness remains server-side.
Medium term: ACP client support can unlock local harnesses without Warp needing to own every provider integration.
Long term: client-side/open harness is the strongest trust model because users can verify where code, prompts, and tool outputs flow.

A good product surface would show an "effective data path" for each provider: what stays local, what goes to Warp, what goes to the model endpoint, and what is logged. That is the question security teams will ask before approving local/private model workflows.

0 replies

AkikoOrenji
May 11, 2026

Came here as heard Warp now supporting Windows. Installed and then immediately uninstalled after realising the product is effectively useless without sign up and sending data to yet another provider. With most workstation level laptops now coming with a dedicated NPU or GPU with a few gig of VRAM (even shared RAM is OK for lower end qwen models) people may as well make use of to make terminal life easier. Personally just want an AI shell for system management and basic automation:

Handling of remote sessions including auto switching from Windows powershell to whatever shell the remote system has. Can't install anything on the remote end so the ability for the harness to recognise the change in shell and keep operating.
Ability for AI to log into remote system and carry out tasks at direction with strong granular guardrails (regex).
Password hand-off back to user on logon for password input (either via SSH key password or remote user password)
Automatic detection of password prompts in remote session for hand-off e.g. sudo, keystores, secondary SSH connections.
The diff tool would be good for other tasks such as comparing configurations before and after changes.
Basically any use case where remote desktop is not available but complex operations need to be carried out e.g. modifying registry keys
generation of quick single shell one liners to manipulate and write data. what’s that sed or awk flag i needed.
generate complex powershell without consulting reams of documentation.

Not interested in coding capabilities as use other tools for that.

1 and 3 are the better options. Given there are already forks in the wild why not encourage them to PR (if they haven't already)

0 replies

smthpickboy
May 12, 2026

It's frustrating how big companies constantly try to seize control of your computer and data. LLMs and their harnesses should act as assistants to the terminal, not as supervisors. If Warp continues with its closed mindset, open alternatives like OpenWarp or other terminal+LLM apps will thrive and take its place.

1 reply

@mz135135

mz135135 May 19, 2026

Mikewhodat
May 15, 2026

If I can contribute in any way whatsoever whether it's bug bounty for your project or contributing code. Please let me know I'd be highly interested. I have a personal vendetta against warp. I was just thinking along the lines of instead of starting. My own repository in starting this whole task, out from scratch, I would join the community.

This is something that I did not do with deep seek

0 replies

compgeniuses
May 18, 2026

now that's music to the ears,Also add ability to fetch model, for /models endpoint, as well as context windows, and whether model has vision or not.

0 replies

petradonka
May 22, 2026
Collaborator

Quick update here: we’ve shipped two related pieces of this work.

BYOK is now available on the Free plan for individual users, and Warp now supports custom inference endpoints compatible with the OpenAI Chat Completions API.

That means you can use your own OpenAI, Anthropic, or Google API key, or connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, a gateway, or a similar setup.

Docs:

BYOK: https://docs.warp.dev/agent-platform/inference/bring-your-own-api-key/
Custom inference endpoints: https://docs.warp.dev/agent-platform/inference/custom-inference-endpoint/

Fully client-side local model support is still a separate direction. We’re planning a lightweight local client harness so Warp can connect directly to local models without routing through Warp’s servers, and we’re also planning support for Agent Client Protocol so developers can bring other harnesses into Warp’s terminal UI.

If you try the new flow and hit a specific issue with a provider, endpoint, or model, please open a focused GitHub issue with the details so we can track it directly.

9 replies

@petradonka

petradonka May 22, 2026
Collaborator

Thanks for the report - could you please open a GitHub issue with anything you may be running into? It'll be easier for us to get these fixed that way!

@regismesquita

regismesquita May 22, 2026

So you are saying that it was supposed to accept internal ips and non-https endpoints and that something is wrong?

@petradonka

petradonka May 22, 2026
Collaborator

No, internal IPs wouldn't work — that'll need the fully client-side model support I mentioned. Whether https should be required, I'm not certain off the top of my head, I could see it being a requirement over the public internet.

@regismesquita

regismesquita May 22, 2026

got it, I can see you edit the comment now, I will wait for the local client, thanks!

@gigberg

gigberg Jun 10, 2026

So why when I use openrouter endpoint , /agent mode still not work with error:

who are you
I'm sorry, I couldn't complete that request.
Request failed with error: ErrorStatus(403, "{\"error\":\"Your account has been blocked from using AI features. If you think this is in error, please contact appeals@warp.dev. Otherwise, please upgrade to a paid plan at https://app.warp.dev/upgrade.\"}")

jleivo
May 24, 2026

I was about to test this, with the understanding that our internal LiteLLM proxy works, but as it turns out it needs to be internet accessible - which is understandable if the traffic bounces through your services. I will wait for the version that does not require publicly accessible endpoint.

though,I got to say, this page https://docs.warp.dev/agent-platform/inference/custom-inference-endpoint/ states
Custom inference endpoint | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans
"internal gateway" is not yet true then.

0 replies

bandesz
May 24, 2026

If you want to test your local LLM before Warp adds support, I used the free tier of Cloudflare Zero Trust Connectors (it's a tunnel that you can run in a Docker container) to make my local LM Studio available publicly (with authentication of course). You'll need a domain though.

Cloudflare turns on "Block AI training bots" by default for your domain, I had to disable that, otherwise Warp was getting 403s.

0 replies

aadilayub
May 26, 2026

This discussion in linked in #8759, but I don't see any mention of support for using ChatGPT Pro/Plus subscriptions. It would be really great to have this on the roadmap. Not everyone can afford paying per-token with API keys.

Echoing @Patrik88:

I want to use Warp as my full-time coding agent, but relying on standard per-token API keys (BYOK) is too expensive for heavy daily use. Implementing a flexible provider layer—similar to the open-source @mariozechner/pi-ai library—would solve this perfectly.

0 replies

Cznorth
May 27, 2026

For folks who need fully self-hosted local/remote model routing today (not just client-side harness migration), one option outside Warp ecosystem: WinkTerm — open-source AI terminal where AI and user share the same PTY. Bring your own API key, type # at the prompt for in-terminal chat, agent pre-fills commands and you press Enter.

Docker deploy, SSH/SFTP, HTTP Agent API, MIT: https://github.com/Cznorth/winkterm

Different product (web self-hosted vs native terminal), but relevant if local-model support timeline matters for your workflow.

0 replies

trigger2k20
Jun 12, 2026

Looking forward to use, any new status on fully client-side local model support or lightweight local client harness so Warp can connect directly to local models without routing trough external (as we say Kirche ums Dorf bringen - Bringing the church around the village) ?

0 replies

Local and arbitrary model support #9619

Uh oh!

zachlloyd Apr 30, 2026 Maintainer

Replies: 36 comments · 40 replies

Uh oh!

Uh oh!

Uh oh!

zachlloyd Apr 30, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd Apr 30, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd May 1, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd May 6, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd
Apr 30, 2026
Maintainer

Replies: 36 comments 40 replies

zachlloyd Apr 30, 2026
Maintainer Author

zachlloyd Apr 30, 2026
Maintainer Author

zachlloyd
May 1, 2026
Maintainer Author

zachlloyd
May 6, 2026
Maintainer Author