My fully offline AI-assisted Linux development machine

This is not a tutorial on how to reproduce every single bit of my setup. My full personal configuration is private because it has too much machine-specific and personal stuff. But I'm making a stripped-down public version with the bare minimum needed for Arch, niri, DMS, OpenCode, and llama.cpp at deepu105/archdots.

This post is more about the current shape of my Linux development machine and why I ended up with this stack.

Machine configuration

The configuration of the machine is quite crucial for this setup. Running a browser, a few IDEs, Docker, terminals, and local LLMs is not exactly a light workload.

My current machine is an ASUS ROG Flow Z13 2025 model. It is a weird little beast. It is technically a tablet, but it has enough CPU, GPU, and memory to behave like a mobile workstation.

The memory is the most interesting part here. For normal development work, 32GB is still fine and 64GB is great. But for local AI work, memory changes everything. A 27B quantized model, a large context window, Docker, Chrome, and an editor can happily eat memory like there is no tomorrow.

Having that much unified memory means the machine can run a useful local coding model without feeling like a science experiment. That is a big deal.

Operating system

I praised Fedora in the previous posts, and I still think Fedora is one of the best Linux distributions for most developers. Updates are smooth, new packages land often, and it mostly stays out of the way.

But this time I went with vanilla Arch Linux. So yes, I use Arch btw! 😉 I know, rolling release and all that. I have been using Linux long enough to know what I was signing up for.

The main reason was simple: I wanted the latest kernel, Mesa, ROCm-adjacent bits, Wayland tools, and desktop packages without waiting for the next distro release. New hardware like the Flow Z13 usually benefits from being closer to the bleeding edge. Arch gives me that. Well, OK, I also fell in love with the sexy new compositors like niri and Hyprland, and Arch is a great way to run those without waiting for backports. I started with Hyprland, but I ended up liking niri better for my workflow, and Arch made it easy to switch and experiment.

I also use Topgrade to keep the system updated. My private config even wires it into DankMaterialShell, so I can see available updates from the bar and trigger an update for everything on the system from pacman/AUR, brew, cargo, npm, VS Code plugins, Docker images, and so on in Kitty.

Desktop environment, or lack of one

This is probably the biggest change from my previous setup. I no longer run GNOME or KDE as my main desktop. I use niri, which is a scrollable tiling Wayland compositor.

If you have not used niri, the workflow is quite different from a regular tiling window manager. Instead of forcing everything into a fixed grid, windows live in columns and you scroll horizontally across them. It sounds odd until it clicks. Once it clicks, it feels very natural on ultrawide monitors and laptop displays. I especially love the touchpad gestures for switching workspaces and moving windows around. It is a very fluid way to manage windows.

Niri and DMS

Niri gives me the compositor. DMS gives me the desktop shell pieces that I would otherwise have to stitch together myself.

This is the kind of stuff where I do not want to maintain five different tools and a bunch of scripts if one project does the job well enough. DMS is still young, but it is already quite useful, especially with niri. It's also quite extensible, and I have already started adding tools that I want. For example, a locally saved TODO widget.

The Flow Z13 also needs some special handling. I have fixes for ASUS hotkeys, touchpad behavior, keyboard backlight, Thunderbolt rescans, and Wi-Fi quirks in my private config. The public archdots repo will only carry the reusable bits. This is Linux on new hardware, so of course there are quirks. What is a Linux experience without glitches, right?

Development tools

My development tools are still mostly boring, in a good way. These are subjective choices, and they do not matter as long as you are comfortable with your tools.

Shell: I use Zsh with zinit, Powerlevel10k, zoxide, and fzf. I still use a bunch of aliases for Git, Docker, package management, Jekyll, and local AI tools.

Terminal: I use Kitty. I have tabs, splits, clipboard bindings, quick access terminal, and a few custom keybindings. It is fast, it works well on Wayland, and it does not get in my way.

Editors: I use Neovim with LazyVim as my default editor. I still use Visual Studio Code depending on the project and what I am testing.

Toolchains: I use SDKMAN! for JDKs, NVM for Node.js, rustup for Rust, Bun, Go, Python, Deno, and the usual Linux build tools.

DevOps: Docker, Docker Compose, kubectl, kdash, Terraform, Distrobox, and so on. Some come from pacman or AUR, some from Homebrew, and some from language-specific installers.

Offline AI-assisted development

I use cloud AI tools as well, and they are useful. But I also wanted a setup where I can code with an AI assistant without sending code, prompts, logs, or half-written ideas to a remote API. Not because every project is secret, but because local-first tooling is a good capability to have especially in a world that's heading towards techno oligarchy.

That points to a small script. It lets me pick a GGUF model, context size, and reasoning mode. It remembers the last choice, so most of the time I just start it and get going.

Here is a quick llama-bench comparison of the local models on my machine. The numbers are tokens per second with ROCm, full GPU offload, flash attention, f16 KV cache, a 4096-token prompt, a 256-token generation, and 3 repetitions.

The full context is 256k tokens. Here is a benchmark with full context for the Qwen variants.

Running Qwen3.6 27B Q8_0 with 256k context in reasoning mode loads around 70% of the GPU memory in my setup and gives around 64 tokens/s for prompt+generation. That is quite good for a local model with that much context.

Once the server is running, OpenCode talks to it like it would talk to any OpenAI-compatible provider. The difference is that the whole loop stays on my machine.

Model	Quantization	Size	Prompt tokens/s	Generation tokens/s
Qwen3.6 27B	Q4_K_M	15.40 GiB	260.06	10.41
Qwen3.6 27B	Q6_K	20.56 GiB	279.37	8.70
Qwen3.6 27B	Q8_0	26.62 GiB	260.12	7.18
Gemma 4 31B IT	Q4_K_M	17.39 GiB	209.57	9.12
Gemma 4 31B IT	Q8_0	30.38 GiB	202.31	6.19

Model	Quantization	Size	Prompt+Generation tokens/s
Qwen3.6 27B	Q4_K_M	15.40 GiB	67.15
Qwen3.6 27B	Q6_K	20.56 GiB	65.77
Qwen3.6 27B	Q8_0	26.62 GiB	64.34

I do not only use local models, though. For complex tasks, I also use frontier models through OpenRouter, mostly Kimi K2.6 and DeepSeek V4. Occasionally I use Copilot CLI and at work, I use Claude Code as well.

For the harness, I prefer OpenCode. I do not see any noticeable performance difference between Claude Code and OpenCode with Kimi or DeepSeek for the kind of coding tasks I do, which is mostly open source projects in Rust and TypeScript. That might vary for other people, of course, but for me OpenCode has been quite good and I especially prefer its UX over others. I'm trying Pi on the side as well to see if I keep it in the mix.

Why local AI coding matters to me

Local AI is not a replacement for everything. The best hosted models are still better for many tasks, especially when you need maximum reasoning quality or very fast responses. But local models have their own sweet spot.

So no, I do not think everyone should run a local coding model. But if you enjoy owning your stack and you have the hardware for it, it is a very satisfying setup.

The AI workflow

For small tasks, I turn reasoning off because it makes tool-heavy work faster. For design questions, debugging, or code review, I turn reasoning on. The script makes that a prompt instead of forcing me to remember a long command.

This is the kind of boring automation I like. It removes friction without hiding what is actually happening.

Productivity and media tools

Browser: Google Chrome is still my primary browser. I also keep Firefox around.

Screen capture: DMS screenshot plugin, screen recorder plugin, and OBS Studio when I need more control.

Images and video: Gimp, Inkscape, Kdenlive, and a few Flatpak utilities like Upscayl and Buzz.

File manager: Dolphin, because KDE apps are still excellent even when KDE is not my main desktop.

What is still not perfect

Of course, not everything is perfect. This is bleeding-edge Linux, on a new ASUS convertible, with a new AMD chip, a Wayland compositor, and a local AI stack. If everything worked perfectly on day one, I would be suspicious.

None of these are deal breakers for me. Most are either already fixed in my private config or on my TODO list.

Conclusion

This is easily the most interesting Linux machine I have used so far. My 2019 setup was beautiful, my 2021 setup was sleek, and this one feels like a proper local-first AI development workstation.

Vanilla Arch gives me the latest bits. Niri gives me a workflow that fits both the tiny built-in screen and my ultrawide monitor. DMS gives me the desktop polish without a full desktop environment. And OpenCode plus llama.cpp gives me an AI coding assistant that can run without the cloud.

It is not the right setup for everyone. If you want a machine that never asks you to think about kernels, ROCm, compositor configs, or model files, this is probably not it. But for me, this is exactly the kind of developer machine that sparks joy.

Top comments (29)

webreflection profile image

Andrea Giammarchi

Joined

Jan 24, 2017

• May 20

I have a similar machine but it's a Desktop one (minisforum 395+ 128GB) but while I've never looked into its BIOS, I've thought the whole point of these machines was to have similar unified memory DGX spark has, as example (and I have one of those too) ... is there any reason you had to explicitly split 64GB of memory here and there as opposite of letting the machine/OS handle that for you? Specially DS4 project (which I love and use on DGX Spark) requires 96GB minimum to run but it doesn't necessarily need to take all that space, although I believe with a 32GB CPU split and a 96GB for the GPU that project should run, still curious to learn/know why nobody on macOS needs to worry about this, and neither do I on my DGX Spark (or maybe it comes pre-configured to handle that automatically) ... thanks!

That being said, nice post ... I feel you for the AMD ROCm state but it's really getting better day by day, can't wait to have it more reliable/robust to make it the mac alternative for developers!

deepu105 profile image

Deepu K Sasidharan

JHipster co-lead, Polyglot dev, Cloud Native Advocate, Developer Advocate @Okta, Author, Speaker, Software craftsman. Loves simple & beautiful code. bit.ly/JHIPSTER-BOOKS

Location

Utrecht, Netherlands
Education

Electrical & Electronics Engineering
Work

Developer advocate at Okta
Joined

Jun 11, 2019

• Jun 2

Last I tried there was some issues in loading models larger than RAM. But I think its not an issue on newer kernels, I'm planning on disabling the split and see how my previous use cases work now.

harjjotsinghh profile image

Harjot Singh

21, Engineer, Building moonshift.io

Email

harjjotsinghh@gmail.com
Location

New Delhi, India
Joined

Oct 25, 2023

• Jun 1

i love that you're focusing on a fully offline setup for AI-assisted development. it’s cool to see how you've customized your environment with arch and niri. if you're ever interested in quickly spinning up a web app, moonshift lets you deploy a next.js + postgres + auth build in about 7 minutes, and you keep the code on your github. let me know if you want to give it a shot for free.

adityamitra profile image

Aditya Mitra

Finding out a place to dig a new hole! Find me at https://bsky.app/profile/adityamitra.bsky.social

Joined

Oct 22, 2019

• May 26

You should also give omp.sh a try.
I found it much better in speed and management that opencode.

pengeszikra profile image

Peter Vivo

The Vibe Archeologist. Creator of mordorjs. |> and touch bar fanatic from Hungary. God speed you! 1John1 + 5John17 |> 1Moses1 = (1Moses2 ... 4.22John21); alpha & omega = !![];

Location

Pomaz
Education

streetwise
Work

full stack developer at TCS
Joined

Jul 24, 2020

• May 12

Looks great! I like to use linux, at least unix based terminal. For example my company laptop is a windows11 but the wls install ubuntu 22.4 partial solve my development workflow. I know that is fare from this handcraftect solutions, but the company requriments are strict, even I can't reach the dev.to from some weird company policy from my working computer. Any way I like your work!

78q6d profile image

uiqtwe6

asasas

Email

reyanabid20@gmail.com
Location

sasa
Joined

Mar 31, 2024

• May 12

Is it a company laptop?

pengeszikra profile image

Peter Vivo

The Vibe Archeologist. Creator of mordorjs. |> and touch bar fanatic from Hungary. God speed you! 1John1 + 5John17 |> 1Moses1 = (1Moses2 ... 4.22John21); alpha & omega = !![];

Location

Pomaz
Education

streetwise
Work

full stack developer at TCS
Joined

Jul 24, 2020

• May 12

2020 Dell i5 16GB Ram, worn english layout keyboard, but I always using US layout - minor confusion.
A good news copilot cli running on cloud so that capacity don't effect the computer.

deepu105 profile image

Deepu K Sasidharan

JHipster co-lead, Polyglot dev, Cloud Native Advocate, Developer Advocate @Okta, Author, Speaker, Software craftsman. Loves simple & beautiful code. bit.ly/JHIPSTER-BOOKS

Location

Utrecht, Netherlands
Education

Electrical & Electronics Engineering
Work

Developer advocate at Okta
Joined

Jun 11, 2019

• May 12

Neah

fyodorio profile image

Fyodor

Why'd you (software engineers) have to go and make things (software development) so complicated...

Location

Backwoods
Education

MSc, Royal Holloway University of London
Work

Product Engineer
Joined

Feb 10, 2018

• May 12

That's a helluva broputer... 😅

only for bros

deepu105 profile image

Deepu K Sasidharan

JHipster co-lead, Polyglot dev, Cloud Native Advocate, Developer Advocate @Okta, Author, Speaker, Software craftsman. Loves simple & beautiful code. bit.ly/JHIPSTER-BOOKS

Location

Utrecht, Netherlands
Education

Electrical & Electronics Engineering
Work

Developer advocate at Okta
Joined

Jun 11, 2019

• May 12

I'm gonna steal broputer 😂 although not sure if I should be offended or not 🤣

fyodorio profile image

Fyodor

Why'd you (software engineers) have to go and make things (software development) so complicated...

Location

Backwoods
Education

MSc, Royal Holloway University of London
Work

Product Engineer
Joined

Feb 10, 2018

• May 13

Nah, no offense, that’s a really cool setup made with lots of love and dedication, I’m pretty sure it pays off big time 👍🏼

rajas_poorna_0f9376cca3f6 profile image

Rajas Poorna

Joined

May 13, 2026

• May 13

Lovely setup!
Have you considered using Qwen3.6 35BA3B?
I use it on my MI50 32GB and basically get a 3x boost in tokens/s (both in and out) for not much intelligence penalty. Also probably worth turning on the feature to remember its thinking, given that you can support its full context window.
Once I saw that kind of tokens/s it was hard to justify the slower dense models.

deepu105 profile image

Deepu K Sasidharan

JHipster co-lead, Polyglot dev, Cloud Native Advocate, Developer Advocate @Okta, Author, Speaker, Software craftsman. Loves simple & beautiful code. bit.ly/JHIPSTER-BOOKS

Location

Utrecht, Netherlands
Education

Electrical & Electronics Engineering
Work

Developer advocate at Okta
Joined

Jun 11, 2019

• May 13

I haven't personally tried it since I saw someone comparing that with dense models for long context tasks and the MOE models hallucinated way more when context was big. I will try it when I have time and see.

deepu105 profile image

Deepu K Sasidharan

JHipster co-lead, Polyglot dev, Cloud Native Advocate, Developer Advocate @Okta, Author, Speaker, Software craftsman. Loves simple & beautiful code. bit.ly/JHIPSTER-BOOKS

Location

Utrecht, Netherlands
Education

Electrical & Electronics Engineering
Work

Developer advocate at Okta
Joined

Jun 11, 2019

• May 13 • Edited on May 13 • Edited

What context are you using

vicchen profile image

Vic Chen

AI builder exploring finance & institutional investing. Building tools to decode how the smart money moves. SF Bay Area.

Location

San Francisco, CA
Education

Stanford University, Computer Science
Work

Founder, building AI tools for finance
Joined

Feb 14, 2026

• May 12

This is the dream setup for anyone who cares about owning their stack. The llama.cpp + ROCm combo on the Flow Z13 is impressive — 128GB unified memory changes the calculus for local AI entirely. I've been thinking about a similar local-first approach for some of my financial data analysis pipelines where I really don't want prompts hitting third-party APIs. The tradeoff you mentioned about context-length slowdown with 27B models matches what I've seen too. Qwen3.6 Q8_0 at 256k context is a solid sweet spot. Thanks for sharing the bench numbers and the archdots repo — exactly the kind of practical detail that's hard to find.

v_rai_7a0813fcee9d16 profile image

Vikassh.

Joined

May 8, 2026

• May 12

Nice article. I never thought about this approach before

galileo_g_60bdf6defcc5ae7 profile image

Galileo G

Joined

May 12, 2026

• May 12

Try Krusader or similar 2 pane keyboard heavy file managers.

v_rai_7a0813fcee9d16 profile image

Vikassh.

Joined

May 8, 2026

• May 13

How has this setup performed under real traffic

deepu105 profile image

Deepu K Sasidharan

JHipster co-lead, Polyglot dev, Cloud Native Advocate, Developer Advocate @Okta, Author, Speaker, Software craftsman. Loves simple & beautiful code. bit.ly/JHIPSTER-BOOKS

Location

Utrecht, Netherlands
Education

Electrical & Electronics Engineering
Work

Developer advocate at Okta
Joined

Jun 11, 2019

• May 13

I have been using it for reviews, quick fixes, repo research etc and have been quite good. Right now building a full fledged filesystem management TUI in Rust. Will report back my findings. So far very impressed, i'm 3 prompts in and its fxing issues after first iteration.

View full discussion (29 comments)

DEV Community

Introducing LlamaStash: a zero-overhead, terminal-native llama.cpp launcher

Machine configuration

Operating system

Desktop environment, or lack of one

Niri and DMS

Development tools

Offline AI-assisted development

Why local AI coding matters to me

The AI workflow

Productivity and media tools

What is still not perfect

Conclusion

Top comments (29)