3. Configuration layers: the first source of truth of a productized CLI
Profile solves composition.
But where does the profile itself come from?
That brings us to configuration layers.
A demo CLI usually reads only environment variables.
For example:
OPENAI_API_KEY
ANTHROPIC_API_KEY
AGENT_MODEL
A productized CLI cannot rely only on environment variables.
Environment variables are too flat.
They are suitable for secrets.
They are also suitable for temporary overrides.
But they are not suitable for expressing complex policy.
For example:
This project uses the code profile by default.
This project forbids automatically running deployment commands.
This team allows read-only GitHub MCP access.
This session temporarily switches to the review profile.
CI mode must output JSONL and must not use interactive approval.
These are not isolated variables.
They have sources.
They have precedence.
They have merge rules.
They have conflict explanations.
A productized CLI should distinguish at least these configuration layers:
built-in default layer: the system's safe defaults.
user layer: global user preferences, provider credential references, common profiles.
project layer: repository instructions, allowed extensions, project tool policy.
session layer: temporary mode, output target, permission switches for this run.
command-line layer: explicit one-off overrides passed by the user.
environment layer: secrets and deployment environment injection.
The point is not that more layers are always better.
The point is that every final configuration value can answer:
Where did it come from?
Why is this the value?
Who overrode whom?
Was this override allowed?
So configuration merging should not simply be:
const config = {
  ...defaults,
  ...userConfig,
  ...projectConfig,
  ...envConfig,
  ...cliFlags,
};
This code looks concise.
But it cannot explain itself.
When the user asks:
Why can't this project automatically edit files?
The system can only say:
That is the final result.
That is not enough.
A productized CLI needs configuration provenance.
That means recording the source of every value.
It can be abstracted like this:
type ConfigValue<T> = {
  value: T;
  source: "default" | "user" | "project" | "session" | "flag" | "env";
  path: string;
  reason?: string;
};

type ResolvedConfig = {
  activeProfile: ConfigValue<string>;
  permissionMode: ConfigValue<PermissionMode>;
  providerPreference: ConfigValue<ProviderPreference>;
  enabledExtensions: ConfigValue<string[]>;
  outputMode: ConfigValue<OutputMode>;
};
This allows harness doctor to explain the situation.
For example:
activeProfile = code
  source: project
  path: .harness/config.yaml
permissionMode = ask
  source: user
  path: ~/.harness/config.yaml
  reason: project cannot escalate permission mode above user default
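As a minimal sketch of how such provenance could be recorded, assume each layer is passed from lowest to highest precedence and that a later layer simply wins. The `ConfigLayer` shape and the `resolveWithProvenance` helper are hypothetical names introduced for illustration, not part of the original design.

```typescript
// Hypothetical sketch: every resolved key remembers which layer set it.
type ConfigSource = "default" | "user" | "project" | "session" | "flag" | "env";

type ConfigValue<T> = {
  value: T;
  source: ConfigSource;
  path: string;
  reason?: string;
};

// One layer: where it came from plus the keys it sets (assumed shape).
type ConfigLayer = {
  source: ConfigSource;
  path: string;
  values: Record<string, string>;
};

// Walk layers from lowest to highest precedence. A later layer replaces the
// value, and `reason` records which earlier layer it overrode.
function resolveWithProvenance(
  layers: ConfigLayer[],
): Record<string, ConfigValue<string>> {
  const resolved: Record<string, ConfigValue<string>> = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer.values)) {
      const previous = resolved[key];
      const entry: ConfigValue<string> = {
        value,
        source: layer.source,
        path: layer.path,
      };
      if (previous !== undefined) {
        entry.reason = `overrides ${previous.source} (${previous.path})`;
      }
      resolved[key] = entry;
    }
  }
  return resolved;
}

const provenanceDemo = resolveWithProvenance([
  {
    source: "default",
    path: "<builtin>",
    values: { activeProfile: "chat", outputMode: "tty" },
  },
  {
    source: "project",
    path: ".harness/config.yaml",
    values: { activeProfile: "code" },
  },
]);
```

With this shape, a diagnostic command can print `source`, `path`, and `reason` for any key instead of only the final value.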
There is an important governance point here:
Not every higher-priority layer may override every lower-priority layer.
For example, project configuration should not force a user's read-only mode into automatic edit mode.
Command-line flags should not necessarily bypass organization policy.
Environment variables should not enable high-risk tools merely because a name happens to exist.
If an organization or hosted-side governance policy exists, it should participate as an upper bound in arbitration.
User, project, and flag layers can tighten boundaries, and they can choose runtime behavior within the allowed space, but they cannot expand privileges beyond governance policy.
So the configuration layer must not only "merge."
It must also "arbitrate."
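The difference between merging and arbitrating can be sketched for a single value, the permission mode. The mode names, their ordering, and the rules below (project may only tighten, an optional governance ceiling caps everything) are assumptions chosen to mirror the text, and `arbitratePermissionMode` is a hypothetical helper.

```typescript
// Hypothetical permission modes, ordered from most to least restrictive.
const MODE_ORDER = ["readonly", "ask", "auto-edit"] as const;
type PermissionMode = (typeof MODE_ORDER)[number];

type ModeProposal = {
  source: "default" | "user" | "project" | "flag";
  mode: PermissionMode;
};

type Arbitrated = {
  mode: PermissionMode;
  source: string;
  rejected: string[]; // explanations for overrides that were refused
};

const rank = (mode: PermissionMode): number => MODE_ORDER.indexOf(mode);

// Assumed rules for this sketch:
// - "default", "user", and "flag" may move the mode in either direction;
// - "project" may only tighten the mode it inherits;
// - an optional governance ceiling caps whatever the layers chose.
function arbitratePermissionMode(
  proposals: ModeProposal[],
  ceiling?: PermissionMode,
): Arbitrated {
  let current: Arbitrated = { mode: "readonly", source: "default", rejected: [] };
  for (const proposal of proposals) {
    const loosens = rank(proposal.mode) > rank(current.mode);
    if (proposal.source === "project" && loosens) {
      current.rejected.push(
        `project cannot escalate ${current.mode} (from ${current.source}) to ${proposal.mode}`,
      );
      continue;
    }
    current = { ...current, mode: proposal.mode, source: proposal.source };
  }
  if (ceiling !== undefined && rank(current.mode) > rank(ceiling)) {
    current.rejected.push(
      `governance ceiling ${ceiling} caps ${current.mode} from ${current.source}`,
    );
    current = { ...current, mode: ceiling, source: "governance" };
  }
  return current;
}

// The user is read-only; the project asks for automatic edits and is refused.
const userReadonly = arbitratePermissionMode([
  { source: "user", mode: "readonly" },
  { source: "project", mode: "auto-edit" },
]);
```

The refused override is not dropped silently: it lands in `rejected`, which is what lets a diagnostic explain "who overrode whom" and "was this override allowed".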
As a decision path:
[Diagram (Mermaid 3): configuration decision path through merge arbitration]
The most important node in this diagram is merge arbitration.
It says the configuration layer is not a simple priority stack.
It is the first source of truth of the productized CLI.
If the configuration layer has no provenance, the later profile, provider, and extension layers become hard to diagnose.
You do not know why an extension was loaded.
You do not know why the model switched to a fallback provider.
You do not know why a tool was not visible in the current turn.
In the end, users will attribute all behavior to "model instability."
But the thing that is actually unstable is the entry configuration.
4. Multi-provider: provider details must not leak into the user experience
Article 12 already established:
The provider can only return model events and tool intent.
The provider cannot execute tools.
The provider cannot own session state.
The provider cannot decide whether the loop continues.
In a productized CLI, this discipline extends one layer outward.
Not only should the runtime internals avoid provider pollution.
The user experience should avoid provider pollution too.
In other words, users should not have to relearn the whole CLI simply because they switch providers.
These experiences are all bad:
Under provider A, the tool is called read_file; under provider B, it is called file_read.
Provider A emits token events; provider B emits raw chunks.
Provider A gives understandable rate-limit errors; provider B throws raw SDK errors.
Provider A supports tool streaming; provider B does not, so CLI progress display disappears.
The review profile works under provider A, but profile fields stop working under provider B.
These are all cases of provider details leaking through.
The goal of multi-provider is not "connect to many models."
The real goal is:
When switching among providers, the Harness control semantics do not change.
This requires a Provider Resolver.
A Provider Resolver is not an adapter.
An adapter translates one provider's requests and responses into the internal contract.
A resolver chooses which provider to call for this turn based on profile, task, capability needs, cost, availability, and fallback strategy.
Think of it this way:
Profile says: I need a provider suitable for code tasks.
Runtime says: this turn needs streaming, tool calling, and a large context.
Config says: the user prefers provider A and falls back to provider B under rate limits.
Resolver says: choose provider A for this turn; if it fails, switch by explainable rules.
In types:
type ProviderPreference = {
  primary: ProviderSelector;
  fallbacks: ProviderSelector[];
  requiredCapabilities: ProviderCapability[];
  costCeiling?: CostPolicy;
  latencyPreference?: "low" | "balanced" | "quality";
};

type ProviderCapability =
  | "streaming"
  | "tool-intent"
  | "structured-output"
  | "large-context"
  | "vision";

type ProviderResolution = {
  selectedProvider: string;
  selectedModel: string;
  reason: string;
  missingCapabilities: ProviderCapability[];
  fallbackChain: string[];
};
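A resolver over these types might look roughly like the sketch below. It assumes a hypothetical `ProviderEntry` registry record and considers only availability and capabilities; cost and latency are left out to keep the example short.

```typescript
type ProviderCapability =
  | "streaming"
  | "tool-intent"
  | "structured-output"
  | "large-context"
  | "vision";

// Hypothetical registry entry; a real registry would also track cost and health.
type ProviderEntry = {
  id: string;
  model: string;
  capabilities: ProviderCapability[];
  available: boolean;
};

type ProviderResolution = {
  selectedProvider: string;
  selectedModel: string;
  reason: string;
  missingCapabilities: ProviderCapability[];
  fallbackChain: string[];
};

function missingFrom(
  entry: ProviderEntry,
  required: ProviderCapability[],
): ProviderCapability[] {
  return required.filter((capability) => !entry.capabilities.includes(capability));
}

// Try the primary first, then each fallback in order. Returns undefined when
// nothing qualifies, so the caller can fail before the loop starts.
function resolveProvider(
  candidates: ProviderEntry[], // primary first, then fallbacks
  required: ProviderCapability[],
): ProviderResolution | undefined {
  const skipped: string[] = [];
  for (const candidate of candidates) {
    const missing = missingFrom(candidate, required);
    if (candidate.available && missing.length === 0) {
      return {
        selectedProvider: candidate.id,
        selectedModel: candidate.model,
        reason:
          skipped.length === 0
            ? "primary satisfies all required capabilities"
            : `fell back after skipping: ${skipped.join("; ")}`,
        missingCapabilities: [],
        fallbackChain: candidates.map((entry) => entry.id),
      };
    }
    skipped.push(
      candidate.available
        ? `${candidate.id} lacks ${missing.join(", ")}`
        : `${candidate.id} unavailable`,
    );
  }
  return undefined;
}

const providerDemo = resolveProvider(
  [
    { id: "provider-a", model: "a-code", capabilities: ["streaming"], available: true },
    {
      id: "provider-b",
      model: "b-general",
      capabilities: ["streaming", "tool-intent"],
      available: true,
    },
  ],
  ["streaming", "tool-intent"],
);
```

The `reason` string is the part worth keeping even in a toy version: it is what a `provider.selected` event or a diagnostic can show to the user.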
There is a key boundary here:
ProviderCapability is internal capability semantics.
It is not a private field from a provider SDK.
Do not write a profile like this:
openai:
  response_format: json_schema
anthropic:
  tool_choice: auto
That makes the profile directly depend on provider details.
A better expression is:
profile: code
provider:
  require:
    - streaming
    - tool-intent
    - structured-output
  prefer:
    quality: high
    latency: balanced
How a specific provider expresses structured output is the provider adapter's job.
The CLI layer only expresses runtime needs.
This matches the principle from Article 12:
Provider-private formats stop at the provider runtime.
Profile and CLI only see internal capabilities.
Another important part of multi-provider is fallback.
Fallback is not simply catching an error and switching models.
If the primary provider fails because of rate limits, switching to a backup provider may seem reasonable.
But it creates a chain of questions.
Does the backup provider support the current tool schema?
Does the backup provider support the same streaming events?
Can the backup provider accept the current context length?
Is the backup provider's safety policy consistent?
How does the event log record the fallback?
Should the user-facing output show that a switch happened?
If these questions do not have unified answers, fallback creates new instability.
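One way to give those questions unified answers is to check them before switching and to return a log entry together with the decision. The sketch below is illustrative only: `BackupProvider`, `TurnNeeds`, `planFallback`, and the `provider.fallback` entry type are hypothetical, and it covers just the capability and context-length questions.

```typescript
// Hypothetical shapes for this sketch.
type BackupProvider = {
  id: string;
  capabilities: string[];
  maxContextTokens: number;
};

type TurnNeeds = {
  requiredCapabilities: string[];
  contextTokens: number;
};

type FallbackDecision =
  | {
      kind: "switch";
      logEntry: { type: "provider.fallback"; from: string; to: string; cause: string };
    }
  | { kind: "abort"; problems: string[] };

// Check the backup against this turn's needs before switching, instead of
// catching an error and switching blindly.
function planFallback(
  failedProvider: string,
  cause: string,
  backup: BackupProvider,
  needs: TurnNeeds,
): FallbackDecision {
  const problems: string[] = [];
  for (const capability of needs.requiredCapabilities) {
    if (!backup.capabilities.includes(capability)) {
      problems.push(`${backup.id} lacks ${capability}`);
    }
  }
  if (needs.contextTokens > backup.maxContextTokens) {
    problems.push(`${backup.id} cannot hold ${needs.contextTokens} context tokens`);
  }
  if (problems.length > 0) {
    return { kind: "abort", problems };
  }
  // The switch always carries a log entry, so it cannot happen silently.
  return {
    kind: "switch",
    logEntry: { type: "provider.fallback", from: failedProvider, to: backup.id, cause },
  };
}
```

Whether the user-facing output shows the switch is then a renderer decision; the fact that it happened is already recorded.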
As a flow:
[Diagram (Mermaid 4): provider fallback flow, with both providers converging on ModelEvent]
The most important thing in this diagram is that both providers ultimately flow into ModelEvent.
They do not flow into provider raw chunks.
They do not flow into SDK-private objects.
They do not flow into a pile of if/else branches.
As soon as provider details leak into Core, multi-provider tears the system apart.
As soon as provider details leak into CLI user experience, multi-provider trains users to become configuration engineers.
A productized CLI should do the opposite:
Unified contract internally.
Unified experience externally.
Provider adapters and resolver absorb the differences in between.
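To make "absorb the differences" concrete, here is a small sketch with two invented raw formats. Everything in it is hypothetical: the `ModelEvent` shape, the `RawChunkA` and `RawChunkB` payloads, and the alias table are stand-ins, not any real SDK's format or the series' actual contract.

```typescript
// Internal contract: the only shapes the runtime and CLI ever see (assumed).
type ModelEvent =
  | { type: "text.delta"; text: string }
  | { type: "tool.intent"; tool: string; args: Record<string, unknown> };

// Two made-up raw formats standing in for provider-private payloads.
type RawChunkA =
  | { kind: "token"; value: string }
  | { kind: "call"; fn: string; params: string }; // params is a JSON string
type RawChunkB = {
  delta?: string;
  tool_use?: { name: string; input: Record<string, unknown> };
};

// Maps provider-private tool names onto the single internal name.
const TOOL_ALIASES: Record<string, string> = { file_read: "read_file" };

function internalToolName(raw: string): string {
  return TOOL_ALIASES[raw] ?? raw;
}

function fromProviderA(chunk: RawChunkA): ModelEvent[] {
  if (chunk.kind === "token") {
    return [{ type: "text.delta", text: chunk.value }];
  }
  return [
    {
      type: "tool.intent",
      tool: internalToolName(chunk.fn),
      args: JSON.parse(chunk.params) as Record<string, unknown>,
    },
  ];
}

function fromProviderB(chunk: RawChunkB): ModelEvent[] {
  const events: ModelEvent[] = [];
  if (chunk.delta !== undefined) {
    events.push({ type: "text.delta", text: chunk.delta });
  }
  if (chunk.tool_use) {
    events.push({
      type: "tool.intent",
      tool: internalToolName(chunk.tool_use.name),
      args: chunk.tool_use.input,
    });
  }
  return events;
}

// Different raw inputs, identical internal events.
const viaA = fromProviderA({ kind: "call", fn: "read_file", params: '{"path":"src/a.ts"}' });
const viaB = fromProviderB({ tool_use: { name: "file_read", input: { path: "src/a.ts" } } });
```

Everything downstream of these two functions can be written once, against `ModelEvent`, without knowing which provider produced it.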
5. Extension: installed is not enabled, enabled is not visible, visible is not executable
In Article 11, when discussing Plugin Host, we clarified one boundary:
Extensions do not open up core.
Extensions let external capabilities enter the same Harness discipline.
In a productized CLI, this boundary becomes more concrete, because users will actually install extensions.
For example:
harness extension install github
harness extension install playwright
harness extension install team-code-style
This looks like an ordinary plugin system.
But Agent CLI extensions are more sensitive than ordinary CLI plugins.
Because an extension may introduce:
new tools.
new MCP servers.
new Skills.
new Hooks.
new project instructions.
new provider adapters.
new permission presets.
new output renderers.
Any of these capability categories may affect model behavior.
So an extension lifecycle cannot be only install / uninstall.
It should be split into at least these stages:
discover: find installable extensions.
install: install into local or project scope.
verify: verify source, version, signature, or checksum.
trust: user or organization policy decides whether to trust it.
load: Plugin Host parses the manifest.
contribute: declare provider/tool/hook/skill/context/output capabilities.
catalog: enter the Capability Catalog.
visible: become visible this turn after Discovery Policy.
execute: execute through Tool Runtime and the permission gate.
audit: enter the session log and trace.
The three most important sentences are:
Installed is not enabled.
Enabled is not visible.
Visible is not executable.
Installed only means files exist.
Enabled means the system allows it to contribute capabilities.
Visible means the model can see some of those capabilities this turn.
Executable means a concrete intent passed permission, argument, risk, and user-approval checks.
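The four definitions can be read as four independent gates. The following sketch assumes a hypothetical `CapabilityState` record with one flag per stage and a `canExecute` helper; it only illustrates that no stage implies the next.

```typescript
// Hypothetical per-capability state, one flag per lifecycle stage.
type CapabilityState = {
  tool: string;
  installed: boolean; // files exist
  enabled: boolean; // trusted and allowed to contribute
  visible: boolean; // selected by discovery for this turn
};

type GateResult = {
  allowed: boolean;
  stoppedAt?: "installed" | "enabled" | "visible" | "permission";
};

// A call lands only if every earlier stage holds AND the permission check
// approves this concrete intent.
function canExecute(state: CapabilityState, permissionApproved: boolean): GateResult {
  if (!state.installed) return { allowed: false, stoppedAt: "installed" };
  if (!state.enabled) return { allowed: false, stoppedAt: "enabled" };
  if (!state.visible) return { allowed: false, stoppedAt: "visible" };
  if (!permissionApproved) return { allowed: false, stoppedAt: "permission" };
  return { allowed: true };
}

// Installed and enabled, but discovery did not surface it this turn.
const hiddenDeploy = canExecute(
  { tool: "deploy_production", installed: true, enabled: true, visible: false },
  true,
);
```

Returning `stoppedAt` rather than a bare boolean keeps the refusal explainable in the audit log.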
If these stages are mixed together, extensions become a security hole.
For example, a project includes an extension.
After the user clones the project, the CLI loads it automatically.
The extension declares a deploy_production tool.
The model sees it while fixing tests.
Nothing in the tool's declaration requires permission confirmation.
At that point, the extension system is not extending capability.
It is opening a bypass for the model.
A productized CLI must avoid this.
An extension manifest should only declare.
It should not execute.
For example:
type ExtensionManifest = {
  id: string;
  version: string;
  source: "builtin" | "user" | "project" | "organization";
  contributes: {
    providers?: ProviderContribution[];
    tools?: ToolContribution[];
    skills?: SkillContribution[];
    hooks?: HookContribution[];
    contextSources?: ContextSourceContribution[];
    outputRenderers?: OutputRendererContribution[];
  };
  trust: TrustRequirement;
  permissions: PermissionDeclaration[];
};
This manifest enters the Plugin Host.
The Plugin Host validates the shape.
The Capability Catalog records candidate capabilities.
Discovery Policy decides the visible capabilities for this turn.
Tool Runtime executes.
Audit records.
The extension itself should not bypass these layers.
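"Validates the shape" can be done on plain data, without importing or running any extension code. The sketch below assumes the manifest arrives as parsed JSON; `validateManifestShape` is a hypothetical helper that checks only a few top-level fields.

```typescript
const MANIFEST_SOURCES = ["builtin", "user", "project", "organization"];

// Validate untrusted manifest data. Nothing here loads or executes the
// extension: the input is treated purely as a declaration.
function validateManifestShape(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["manifest must be an object"];
  }
  const manifest = raw as Record<string, unknown>;
  const errors: string[] = [];

  const id = manifest["id"];
  if (typeof id !== "string" || id.length === 0) {
    errors.push("id must be a non-empty string");
  }
  const version = manifest["version"];
  if (typeof version !== "string") {
    errors.push("version must be a string");
  }
  const source = manifest["source"];
  if (typeof source !== "string" || !MANIFEST_SOURCES.includes(source)) {
    errors.push(`source must be one of: ${MANIFEST_SOURCES.join(", ")}`);
  }
  const contributes = manifest["contributes"];
  if (typeof contributes !== "object" || contributes === null) {
    errors.push("contributes must be an object");
  }
  if (!Array.isArray(manifest["permissions"])) {
    errors.push("permissions must be an array");
  }
  return errors;
}

const manifestText =
  '{"id":"github-readonly","version":"1.0.0","source":"user",' +
  '"contributes":{"tools":[]},"trust":"user-approved","permissions":[]}';
const manifestErrors = validateManifestShape(JSON.parse(manifestText));
```

A manifest that fails this check never reaches the Capability Catalog, so a malformed extension fails at load time rather than mid-task.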
As a lifecycle:
[Diagram (Mermaid 5): extension lifecycle]
This diagram connects Article 11 and Article 17.
Plugin Host solves how extensions enter the system.
Capability Discovery solves when extension capabilities enter the model's field of view.
Tool Runtime solves how extension tools execute.
Profile decides which kinds of extensions are allowed to participate in the current runtime identity by default.
For example, the code profile may allow:
local-tools
test-runner
project-skills
github-readonly
But not:
deploy-production
database-write
cloud-admin
The review profile may allow read-only GitHub.
But not Edit.
The research profile may allow Web and citation tools.
But not workspace modifications.
This is the relationship between profile and extension:
extension provides candidate capabilities.
profile defines the default boundary of the runtime identity.
discovery decides the visible set for this turn.
permission decides whether a concrete call may land.
All four layers are necessary.
Therefore extensionAllowlist is only a prerequisite for trusting and enabling an extension.
It is not the visible set for this turn, and it is not a permission grant for a concrete tool intent.
6. Project instructions: do not dump all repository rules into the system prompt
A productized CLI will also run into a very practical need:
Every project has its own rules.
For example:
This repository uses pnpm.
The test command is pnpm test.
Do not modify generated files.
React components must use the project's design system.
API errors must use the shape { code, message }.
Run typecheck before committing.
The easiest thing for a demo CLI to do is read a project rules file at startup and concatenate it into the system prompt.
That is acceptable early on.
But it breaks down after productization.
First, project rules may be long.
Stuffing all of them into the system prompt squeezes out task context.
Second, some project rules only apply to certain paths.
Frontend component rules should not affect backend migration files.
Third, project rules may conflict with the profile.
The project says "automatic fixing is allowed," but the user is currently using the review profile.
Fourth, project rules may be untrusted.
Repository files themselves may contain prompt injection.
So project instructions should be treated as a context source.
Not as an unconditional system prompt.
A context source needs source, scope, trust level, and activation conditions.
For example:
type ContextSource = {
  id: string;
  source: "builtin" | "user" | "project" | "extension";
  trust: "trusted" | "workspace" | "untrusted";
  appliesTo?: PathPattern[];
  profileScope?: string[];
  loader: ContextLoader;
  projection: "summary" | "full" | "handle";
};
The projection field is critical.
Some project instructions can be summarized and kept resident.
Some should only provide a handle so the model can read them when needed.
Some should enter context only after a path match.
This is the same idea as progressive disclosure for Skills.
Do not keep all experience resident.
Let it appear at the right moment.
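Activation by scope can be sketched as a filter over context sources. This is a simplified assumption-laden version: `appliesTo` is treated as plain path prefixes rather than real path patterns, untrusted sources are never auto-activated, and `ContextSourceSketch` and `activeContextSources` are hypothetical names.

```typescript
// Trimmed-down variant of a context source for this sketch.
type ContextSourceSketch = {
  id: string;
  trust: "trusted" | "workspace" | "untrusted";
  appliesTo?: string[]; // path prefixes here; a real pattern may be a glob
  profileScope?: string[];
  projection: "summary" | "full" | "handle";
};

// A source is active when the profile is in scope and, if the source is
// path-scoped, at least one touched path falls under one of its prefixes.
function activeContextSources(
  sources: ContextSourceSketch[],
  profileId: string,
  touchedPaths: string[],
): ContextSourceSketch[] {
  return sources.filter((source) => {
    if (source.trust === "untrusted") return false; // assumption: never auto-activate
    if (source.profileScope && !source.profileScope.includes(profileId)) return false;
    if (!source.appliesTo) return true; // resident sources have no path condition
    return source.appliesTo.some((prefix) =>
      touchedPaths.some((path) => path.startsWith(prefix)),
    );
  });
}

const sketchSources: ContextSourceSketch[] = [
  { id: "project-summary", trust: "workspace", projection: "summary" },
  {
    id: "frontend-rules",
    trust: "workspace",
    appliesTo: ["packages/frontend/"],
    projection: "full",
  },
  {
    id: "database-rules",
    trust: "workspace",
    appliesTo: ["db/migrations/"],
    projection: "handle",
  },
];

// Before any file is read, only the resident summary is active.
const atStart = activeContextSources(sketchSources, "code", []).map((s) => s.id);
// After the Agent reads a frontend component, frontend rules join in.
const afterRead = activeContextSources(sketchSources, "code", [
  "packages/frontend/src/Button.tsx",
]).map((s) => s.id);
```

The `projection` field is carried along untouched here; deciding how much of an active source to place in context is a separate step.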
In the "fix failing tests" example, the CLI can first load a lightweight project-instruction summary:
The project uses pnpm.
Prefer pnpm test for tests.
Read relevant tests before editing.
Do not edit dist/ or generated/.
When the Agent reads a component file under packages/frontend, activate frontend rules.
When the Agent reads a database migration, activate database rules.
When the Agent prepares to edit a file, hand forbidden-path policy to the permission gate.
This is much more stable than "put the full project rules into the prompt."
Because the model sees rules relevant to the current task.
The Harness stores rule source and scope.
The permission system enforces the hard boundaries inside the rules.
Project instructions are no longer one huge prompt.
They become runtime inputs jointly managed by profile and context policy.
7. Runtime checks: doctor is the self-diagnostic entry point of a productized CLI
When a CLI enters the productized stage, many problems should not wait until the user task fails.
For example:
provider credentials are missing.
the default profile does not exist.
an extension is installed but not trusted.
the MCP server config path is wrong.
the project rules file fails to parse.
permission mode conflicts with profile.
the current provider does not support tool-intent.
JSON output mode has interactive approval enabled.
If these problems surface only inside the Agent loop, the experience is poor.
The user will think the model did something silly again.
But the real issue is that the startup environment does not satisfy the runtime requirements.
So a productized CLI needs doctor or status.
They are not decorative commands.
They are preflight checks.
doctor should check whether the Harness can run correctly under the current profile.
For example:
harness doctor --profile code
The output should not merely be:
OK
It should report by layer:
Config: OK
Profile: code resolved from project
Provider: primary available, fallback configured
Extensions: github-readonly trusted, playwright disabled
Capabilities: Read/Grep/Bash/Edit visible under ask mode
Context: project instructions loaded, frontend skill conditional
Output: tty interactive, json events disabled
Warnings:
- CI MCP is configured but not reachable
This lets the user understand system state before the run.
More importantly, it lets the host understand system state too.
Because Article 22 is not only about the CLI running by itself.
It also prepares for the CLI Host + Workbench in M7/M11.
A Workbench cannot understand Agent state by reading human-friendly terminal text.
It needs a stable protocol.
So doctor should ideally support structured output too:
harness doctor --profile code --json
The return value should not be a pile of logs.
It should be a stable schema.
That schema can be consumed by IDEs, Workbenches, CI, and remote hosts.
For example:
type DoctorReport = {
  ok: boolean;
  profile: {
    id: string;
    source: string;
    warnings: Diagnostic[];
  };
  providers: ProviderDiagnostic[];
  extensions: ExtensionDiagnostic[];
  capabilities: CapabilityDiagnostic[];
  output: OutputDiagnostic;
};
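A report like this is easiest to keep stable if checks return data instead of printing. The sketch below uses a flattened, hypothetical `DoctorSummary` rather than the full `DoctorReport`, and assumes each check is a function that returns findings; the names are illustrative only.

```typescript
type DoctorFinding = {
  layer: string;
  severity: "info" | "warning" | "error";
  message: string;
};

type DoctorSummary = {
  ok: boolean;
  errors: number;
  warnings: number;
  findings: DoctorFinding[];
};

// Hypothetical check signature: each layer inspects already-resolved state
// and returns findings instead of writing to the terminal.
type DoctorCheck = () => DoctorFinding[];

function runDoctor(checks: DoctorCheck[]): DoctorSummary {
  const findings: DoctorFinding[] = [];
  for (const check of checks) {
    findings.push(...check());
  }
  const errors = findings.filter((finding) => finding.severity === "error").length;
  const warnings = findings.filter((finding) => finding.severity === "warning").length;
  // Warnings are reported but do not fail the preflight; errors do.
  return { ok: errors === 0, errors, warnings, findings };
}

const checkProvider: DoctorCheck = () => [
  {
    layer: "provider",
    severity: "error",
    message: "current provider does not support tool-intent",
  },
];
const checkMcp: DoctorCheck = () => [
  {
    layer: "extensions",
    severity: "warning",
    message: "CI MCP is configured but not reachable",
  },
];

const doctorWarningOnly = runDoctor([checkMcp]);
const doctorFailing = runDoctor([checkProvider, checkMcp]);

// The machine interface is the same data, serialized.
const doctorJson = JSON.stringify(doctorFailing);
```

The human-readable layer report and the `--json` output can then both be rendered from the same summary object.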
Behind this is a productization principle:
The human interface can be beautiful.
The machine interface must be stable.
If the CLI only has TTY text, the Workbench can only parse strings.
That is fragile.
If the CLI has a stable event protocol, the Workbench can turn the Agent run into a visual workspace.
8. Stable output protocol: terminal rendering is only one projection of output
A demo CLI usually outputs something like this:
Assistant: I will check the tests.
Running: npm test
...
The failure is...
That is enough for humans.
But a productized CLI has multiple consumers.
Humans watch in the terminal.
IDEs watch in sidebars.
Workbenches watch in task timelines.
CI watches in log systems.
Remote hosts watch in web UIs.
If the CLI only outputs free-form text, these consumers can only guess.
So a productized CLI should split output into two layers:
Event Stream: stable structured events, the factual output.
Renderer: renders events into TTY, JSONL, Workbench UI, or CI logs.
This matches the idea of Session Replay.
The source of truth is events.
The interface is only a projection.
A minimal event protocol can include:
type CliEvent =
  | { type: "session.started"; sessionId: string; profile: string }
  | { type: "profile.resolved"; profile: string; source: string }
  | { type: "provider.selected"; provider: string; model: string; reason: string }
  | { type: "assistant.delta"; text: string }
  | { type: "tool.intent"; tool: string; intentId: string }
  | { type: "tool.approval.requested"; intentId: string; risk: string }
  | { type: "tool.started"; intentId: string }
  | { type: "tool.finished"; intentId: string; exitCode?: number }
  | { type: "capability.visible_set.changed"; added: string[]; removed: string[] }
  | { type: "diagnostic.warning"; message: string }
  | { type: "session.finished"; outcome: "completed" | "failed" | "needs-user" };
Do not rush to make this exhaustive.
The key is to separate facts from rendering first.
The terminal UI can render tool.started as a spinner.
The Workbench can render it as a timeline node.
CI can render it as grouped logs.
JSONL can output it unchanged, one event per line.
But the event itself remains stable.
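The split between facts and rendering can be shown with two renderers over the same events. This is a minimal sketch using a reduced, hypothetical `CliEventSketch` union; the TTY strings are invented for illustration.

```typescript
// A reduced event union for this sketch; the real protocol has more members.
type CliEventSketch =
  | { type: "assistant.delta"; text: string }
  | { type: "tool.started"; intentId: string }
  | { type: "tool.finished"; intentId: string; exitCode?: number }
  | { type: "session.finished"; outcome: "completed" | "failed" | "needs-user" };

type EventRenderer = (event: CliEventSketch) => string;

// JSONL: the event is emitted unchanged, one JSON object per line.
const renderJsonl: EventRenderer = (event) => JSON.stringify(event);

// TTY: the same event is projected into human-friendly text.
const renderTty: EventRenderer = (event) => {
  switch (event.type) {
    case "assistant.delta":
      return event.text;
    case "tool.started":
      return `... running ${event.intentId}`;
    case "tool.finished":
      return `done ${event.intentId} (exit ${event.exitCode ?? "n/a"})`;
    case "session.finished":
      return `Session ${event.outcome}`;
  }
};

const demoEvents: CliEventSketch[] = [
  { type: "tool.started", intentId: "t1" },
  { type: "tool.finished", intentId: "t1", exitCode: 0 },
  { type: "session.finished", outcome: "completed" },
];

const jsonlLines = demoEvents.map(renderJsonl);
const ttyLines = demoEvents.map(renderTty);
```

Adding a Workbench or CI renderer means adding another function of the same type; the runtime that produces `demoEvents` does not change.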
As a diagram:
[Diagram (Mermaid 6): Runtime event stream projected through renderers]
The most important point in this diagram is that Runtime should not directly output pretty text.
Runtime outputs factual events.
Renderer handles presentation.
This also has another benefit:
multi-provider does not affect the output protocol.
extensions cannot privately print and break JSON.
profile can choose the output contract.
hosts can parse Agent state reliably.
If an extension needs to output progress, it should submit structured events.
It should not directly console.log.
Otherwise it will break machine output.
That is the difference between a productized CLI and a demo CLI.
A demo CLI tries to "look like it runs."
A productized CLI tries to "be understandable by every consumer."
9. How the same task flows through a productized CLI
Now connect all layers back to the running "fix failing tests" example.
The user types:
harness --profile code "Help me understand why this project is failing tests, and fix it"
The first step of the productized CLI is not to call the model.
It resolves configuration first.
It discovers that the project config also defaults to the code profile.
The command line explicitly specifies code, and there is no conflict.
The user's global policy requires confirmation for high-risk commands.
Project rules forbid modifying generated/.
The extension configuration enables test-runner and github-readonly.
Provider preference requires streaming and tool-intent.
The output target is an interactive TTY, while an internal JSON event stream is retained.
Second, Provider Resolver chooses the provider.
It finds the primary provider available.
The primary provider supports tool-intent, streaming, and the current context length.
It records a provider.selected event.
Third, Extension Runtime loads trusted extensions.
test-runner contributes a project-aware test command skill.
github-readonly contributes read-only MCP tools.
They enter the Capability Catalog.
But not all of them are visible to the model yet.
Fourth, Capability Discovery computes the visible set for this turn.
The "fix failing tests" task exposes by default:
Read
Grep
Bash(test commands with approval)
Edit(with workspace policy)
SkillSearch
ToolSearch
GitHub MCP does not enter the visible set yet.
There is no evidence yet that remote PR or CI information is needed.
Fifth, Agent Runtime starts the loop.
The model proposes a tool intent to run tests.
Tool Runtime validates the command.
Permission Gate determines that pnpm test is a low-risk test command.
After execution, the observation is written back.
Sixth, the model searches relevant code based on the failure log.
It reads files.
It discovers the failing test is in a frontend component.
The path matches frontend project instructions.
Capability Discovery adds the frontend component skill to the visible set.
Seventh, the model proposes an edit intent.
The permission check confirms the target is not under generated/.
The edit executes.
Events enter the session log.
Eighth, the model runs tests again.
The tests pass.
Renderer outputs a human-readable summary.
The event stream emits session.finished.
The whole process can be drawn as a sequence diagram:
[Diagram (Mermaid 7): sequence diagram of the "fix failing tests" run]
The most important point in this diagram is:
profile resolution happens before the loop.
extension contribution happens before discovery.
provider selection happens before the model request.
output protocol starts from runtime events, not from a final string concatenation.
Once the order is reversed, the system becomes brittle.
For example, if you start the loop first and then discover extensions on the fly, the model cannot see the right capabilities on the first turn.
If you call the provider first and only then discover it does not support tool-intent, the system can only fail midway.
If you output free-form text first and later try to make Workbench parse it, you are left with fragile log parsing.
The stability of a productized CLI comes from these upfront resolutions.
10. Minimum landing path: do not build the whole platform at once
At this point, it is easy to imagine Productized CLI as a huge system.
But we should keep the principle of this series:
Each article advances one minimal verifiable increment.
The minimum landing path for Article 22 does not need a plugin marketplace.
It does not need an account system.
It does not need cloud sync.
It does not need a full Workbench.
It only needs to upgrade the demo CLI into a local product with a stable entry protocol.
A minimal file boundary could be:
src/cli/
  main.ts
  args.ts
  output.ts
  doctor.ts
src/config/
  defaults.ts
  loader.ts
  merge.ts
  provenance.ts
src/profile/
  profile.ts
  resolver.ts
  builtin-profiles.ts
src/provider/
  resolver.ts
  capabilities.ts
src/extensions/
  manifest.ts
  loader.ts
  trust.ts
src/events/
  cli-events.ts
  renderers/
    tty.ts
    jsonl.ts
First, create built-in profiles.
For example:
const builtinProfiles: AgentProfile[] = [
  {
    id: "chat",
    description: "Read-only Q&A, no external actions",
    policy: "readonly",
    toolBundles: ["read-only"],
    contextSources: ["user", "project-summary"],
    providerPreference: {
      primary: { family: "general" },
      fallbacks: [],
      requiredCapabilities: ["streaming"],
    },
    outputContract: "tty-interactive",
    extensionAllowlist: [],
  },
  {
    id: "code",
    description: "Fix code, run tests, and modify the workspace under control",
    policy: "workspace-edit-ask",
    toolBundles: ["local-code-tools"],
    contextSources: ["user", "project", "conditional-skills"],
    providerPreference: {
      primary: { family: "code" },
      fallbacks: [{ family: "general" }],
      requiredCapabilities: ["streaming", "tool-intent"],
    },
    outputContract: "tty-events",
    extensionAllowlist: ["test-runner", "github-readonly"],
  },
];
Second, implement configuration resolution and provenance.
For example:
const resolved = resolveConfig({
  defaults: loadDefaultConfig(),
  user: await loadUserConfig(),
  project: await loadProjectConfig(cwd),
  session: sessionOverrides,
  flags: parsedFlags,
  env: process.env,
});

const profile = resolveProfile(resolved.activeProfile.value, {
  builtinProfiles,
  userProfiles: resolved.userProfiles.value,
  projectProfiles: resolved.projectProfiles.value,
});
Third, implement the provider resolver.
For example:
const providerResolution = await resolveProvider({
  preference: profile.providerPreference,
  availableProviders: providerRegistry.list(),
  required: profile.providerPreference.requiredCapabilities,
  diagnostics: true,
});
Fourth, implement the extension manifest loader.
Support only local directories first.
Support only manifest reading first.
Support only trust state first.
Do not jump straight into remote installation.
For example:
const extensions = await loadEnabledExtensions({
  config: resolved.enabledExtensions.value,
  trustStore,
  cwd,
});

for (const extension of extensions) {
  pluginHost.register(extension.manifest.contributes);
}
Fifth, implement the CLI event stream.
Emit events for all key steps.
For example:
events.emit({
  type: "profile.resolved",
  profile: profile.id,
  source: resolved.activeProfile.source,
});

events.emit({
  type: "provider.selected",
  provider: providerResolution.selectedProvider,
  model: providerResolution.selectedModel,
  reason: providerResolution.reason,
});
Sixth, implement doctor.
It reuses the same resolver chain.
It simply does not start the agent loop.
This is important.
doctor should not have separate configuration logic.
If doctor and run use two parsers, the worst case appears:
doctor says everything is fine.
run still fails.
So the minimal implementation's load-bearing chain should be:
args -> config resolver -> profile resolver -> extension loader -> provider resolver -> capability bootstrap -> event output -> agent runtime
And doctor only stops early on that same chain.
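Sharing the chain can be as simple as having both commands call one preflight function. The sketch below is deliberately tiny: the two stages return fixed values, and `Preflight`, `SHARED_CHAIN`, `doctorCommand`, and `runCommand` are hypothetical names standing in for the real resolvers.

```typescript
type Preflight = { profile?: string; provider?: string; trace: string[] };
type PreflightStage = (state: Preflight) => Preflight;

// Stand-in stages; real ones would resolve config, extensions, and so on.
const resolveProfileStage: PreflightStage = (state) => ({
  ...state,
  profile: "code",
  trace: [...state.trace, "profile resolver"],
});
const resolveProviderStage: PreflightStage = (state) => ({
  ...state,
  provider: "provider-a",
  trace: [...state.trace, "provider resolver"],
});

// One chain, defined once, used by both commands.
const SHARED_CHAIN: PreflightStage[] = [resolveProfileStage, resolveProviderStage];

function preflight(): Preflight {
  return SHARED_CHAIN.reduce<Preflight>((state, stage) => stage(state), { trace: [] });
}

// doctor stops after preflight and reports what it resolved.
function doctorCommand(): Preflight {
  return preflight();
}

// run performs the identical preflight, then hands the result to the loop.
function runCommand(startLoop: (ready: Preflight) => void): Preflight {
  const ready = preflight();
  startLoop(ready);
  return ready;
}
```

Because there is no second parser, "doctor says fine, run fails" can only come from the environment changing between the two calls, not from divergent logic.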
11. Productized CLI smells
The easiest part of this kind of system to break is not the model call.
It is the entry layer slowly growing bypasses.
Smell 1: profile is only a model alias
If the code profile is only:
model: some-code-model
Then it does not express permissions.
It does not express tool sets.
It does not express context sources.
It does not express output protocol.
That is not an Agent profile.
It is only a model alias.
Model aliases are useful.
But do not call one a profile.
Otherwise every production capability will get shoved into model config.
Provider configuration will become the new junk drawer.
Smell 2: project configuration can elevate user permissions
Project configuration should be able to tighten boundaries.
It should not silently loosen user boundaries.
If the user's global mode is read-only, project configuration must not automatically enable writes.
If organization policy disables a class of extensions, the project must not re-enable them.
Otherwise, when a user clones a repository, repository configuration can change Agent behavior.
That is very dangerous in an Agent CLI.
Smell 3: an extension enters the prompt automatically after installation
Installation only means files exist.
Enablement means contribution is allowed.
Visibility still goes through discovery.
Executability still goes through permission.
If all tools from an extension are handed to the model immediately after install, Capability Discovery has been bypassed.
That recreates the tool overload problem from Article 17.
Smell 4: provider fallback silently changes output semantics
Fallback may happen.
But it must be recorded.
It must also preserve output event semantics.
If tool.intent no longer appears after fallback, or if tool calls become provider raw chunks, the host loses its ability to parse the run.
Fallback should not turn the user experience into a different product.
Smell 5: JSON mode contains pretty logs
This is especially common in CLI products.
During development, someone writes this inside an extension or adapter:
console.log("starting provider...");
That is fine under TTY.
It is disastrous under JSONL.
Machine consumers read an invalid line.
So a productized CLI needs a unified event bus.
All output goes through renderers.
Smell 6: doctor and run do not share the resolver chain
If doctor is just a handwritten list of checks, it will quickly go stale.
A truly reliable doctor should call the same config/profile/provider/extension resolvers.
Then it renders the result as a diagnostic report.
Otherwise doctor is only a placebo.
12. How to test this layer
The testing focus for Article 22 is not model quality.
It is determinism at the entry layer.
First kind of test: profile resolution.
Given multiple configuration layers (default, user, project, flag), when the user selects the code profile, the system should resolve the correct policy, tool bundle, context source, provider preference, and output contract.
Second kind of test: configuration provenance.
When project config attempts to elevate permission from read-only to auto-edit, the system should reject the elevation and explain the source in diagnostics.
Third kind of test: provider resolver.
When the primary provider lacks the tool-intent capability, the system should choose a fallback that satisfies requirements.
If no fallback exists, it should fail at doctor time and not enter the loop.
Fourth kind of test: extension trust.
An installed but untrusted extension must not contribute capabilities.
A trusted extension may enter the catalog.
But its tools must still go through discovery and permission.
Fifth kind of test: JSONL output purity.
In --json mode, every stdout line is valid CliEvent JSON.
Human-friendly logs may go to stderr or the TTY renderer, but must not mix into JSONL.
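A purity check of this kind can be written as a small helper that a test runs over captured stdout. The sketch below assumes a short list of known event types and a hypothetical `impureJsonlLines` function; a real test would validate against the full event schema.

```typescript
// Event types this sketch treats as known.
const KNOWN_EVENT_TYPES = new Set([
  "session.started",
  "tool.started",
  "tool.finished",
  "session.finished",
]);

// Returns the 1-based numbers of stdout lines that are not valid events.
function impureJsonlLines(stdout: string): number[] {
  const bad: number[] = [];
  const lines = stdout.split("\n");
  lines.forEach((line, index) => {
    // A single trailing newline at the end of the stream is fine.
    if (line === "" && index === lines.length - 1) return;
    try {
      const parsed: unknown = JSON.parse(line);
      const eventType =
        typeof parsed === "object" && parsed !== null
          ? (parsed as { type?: unknown }).type
          : undefined;
      if (typeof eventType !== "string" || !KNOWN_EVENT_TYPES.has(eventType)) {
        bad.push(index + 1);
      }
    } catch {
      bad.push(index + 1); // not JSON at all, e.g. a stray log line
    }
  });
  return bad;
}

const cleanStdout =
  '{"type":"session.started","sessionId":"s1","profile":"code"}\n' +
  '{"type":"session.finished","outcome":"completed"}\n';

// A stray human-readable log line sneaks in between two events.
const dirtyStdout =
  '{"type":"tool.started","intentId":"t1"}\n' +
  "starting provider...\n" +
  '{"type":"tool.finished","intentId":"t1"}\n';

const cleanViolations = impureJsonlLines(cleanStdout);
const dirtyViolations = impureJsonlLines(dirtyStdout);
```

Reporting line numbers rather than a boolean makes it quick to find which extension or adapter wrote to stdout directly.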
Sixth kind of test: doctor and run consistency.
The resolved profile, provider resolution, and extension diagnostics used by doctor must match the results used before run starts.
Seventh kind of test: stable semantics across providers for the same task.
Fake provider A and fake provider B return different raw formats.
Provider Runtime should normalize them into the same ModelEvent and ToolIntent.
CLI output events should keep the same schema.
These tests are less exciting than "the model fixed the tests."
But they are closer to the real risks of a productized CLI.
Users do not encounter complex reasoning failures every day.
They do encounter configuration, provider, extension, output protocol, and environment differences every day.
Stabilize those areas, and the Agent starts to feel like a product.
13. What this layer solves, and what it introduces
Let us wrap this article up.
Productized CLI does not solve "whether the Agent can think."
The previous articles have already addressed model, loop, tool, context, session, capability, and delegation.
This article solves:
How these runtime capabilities are exposed to real users and hosts through a stable product entry point.
It collapses scattered CLI flags into profile.
It collapses provider switching into resolver.
It collapses extension installation into manifest, trust, catalog, discovery, and permission.
It collapses project instructions into context source.
It collapses terminal output into event stream and renderer.
It collapses runtime environment problems into doctor.
It also introduces new complexity.
First, the configuration system itself becomes more complex.
So it needs provenance and diagnostics.
Second, profile may be abused as a grab bag.
So profile only expresses runtime identity; it does not execute capabilities.
Third, multi-provider introduces capability differences.
So provider capability must be expressed in internal semantics.
Fourth, extensions introduce trust problems.
So installation, enablement, visibility, and executability must stay separate.
Fifth, host/workbench needs a stable protocol.
So Runtime events and Renderer must stay separate.
This leads directly to the next article.
Once the CLI can run as a product entry point, the next step is not to keep piling capabilities into the local CLI.
The next step is to move the Harness into more remote environments:
Sandbox, Cron, durable execution, and remote deployment.
That is Hosted Harness.
At that point, profile becomes the runtime identity of remote tasks.
Extension trust becomes a deployment boundary.
Provider resolver becomes a scheduling strategy.
Event stream becomes a remote observability protocol.
And this Productized CLI is the final local entry discipline before entering Hosted Harness.
Remember this article in one sentence:
A productized CLI does not merely wrap an Agent as a command. It turns the Agent's runtime identity, capability boundary, model preference, extension source, and output protocol into an explainable Harness entry point.
Teaching Harness Landing Point
When productizing the teaching project, add profiles and stable output before building a complex platform. A profile chooses provider, default tools, permission mode, and output renderer. CLI or API output should separate human-readable logs from machine-readable JSONL. The same Harness can then serve local interaction, CI smoke tests, and documentation.
GitHub source: 00-22-productized-cli-profile-extension.md