string. The legacy local's value was a map(object({ host = string, weight = number })). merge() resolves collisions by taking the last argument's value, but the argument order in this case depended on which input was non-empty at apply time, and the new feature flag was only non-empty in staging.
sequenceDiagram
    participant Op as Operator
    participant Plan as terraform plan
    participant Apply as terraform apply
    participant Child as child module
    Op->>Plan: feature_overrides (map(any))
    Plan->>Child: merge(map(any), map(any))
    Child-->>Plan: any (type-check deferred)
    Plan-->>Op: 0 to add, 0 to change (PASS)
    Op->>Apply: same input
    Apply->>Child: merge resolved to concrete value
    Child-->>Apply: Error: Unsupported argument
    Apply-->>Op: FAIL at 4:42pm
How map(any) defers type-checking past plan and surfaces it at apply
The collision had been latent for three weeks. plan succeeded because Terraform's planner walked the call graph with both maps' element types collapsed to any. The merged value passed type-checking as any, which matches anything. apply, which actually constructs the resource, evaluated the merged value against the receiving attribute's concrete type signature and discovered that the value was a string where an object was required.
That is the part that hurts. Terraform's any type defers all type-checking until apply. Every map(any) input on a root module is a future apply-time failure waiting on a contributor who does not know the implicit shape.
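A minimal sketch of the collision, under stated assumptions: the key name read_replica_routing comes from the incident, but the host, weight, and string values below are hypothetical and chosen only for illustration. Because merge() gives precedence to the later argument, the object that downstream code expects is silently replaced by a string.

```hcl
# Hypothetical reproduction of the collision. All values are illustrative.
locals {
  # Legacy shape: each key maps to an object with host and weight.
  legacy_db_flags = {
    read_replica_routing = {
      host   = "replica.internal.example"
      weight = 10
    }
  }

  # New feature flag input: the same key, but the value is a string.
  feature_overrides = {
    read_replica_routing = "weighted"
  }

  # merge() lets the later argument win on a key collision, so the
  # string replaces the object.
  merged_flags = merge(local.legacy_db_flags, local.feature_overrides)
}

output "merged_flags" {
  # The read_replica_routing attribute is now the string "weighted",
  # not the { host, weight } object that consumers were written against.
  value = local.merged_flags
}
```

Nothing in this snippet constrains the type of the merged value, so the mismatch only becomes visible once a consumer tries to use the result as an object.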
Three options, one open release window, seven minutes to pick
What we did before running apply again
We had three options and one open release window. I walked the on-call lead through them on the bridge call.
1. Delete the legacy key. Fastest. Also the riskiest: the legacy read_replica_routing key was referenced by three modules-of-modules three layers down. Deleting it would have moved the failure from staging to production an hour later.
2. Rename the new key. Safe-feeling. Left the underlying any-typed contract intact. Two months later a different contributor would add another map(any) input and we would be back on a Friday afternoon with the same shape of failure.
3. Rename plus add validation. Slower. Renamed the new key to feature_routing_overrides and added a validation block on the input that explicitly rejected the colliding shape at plan time going forward. Stopped the immediate recurrence.
Option three carried the day. The rename took seven minutes. The validation block took twelve. apply succeeded at 5:14pm with sixteen minutes to spare on the release window. The release shipped on time.
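The validation block itself is not reproduced in this write-up, so the following is only a sketch of what a guard of that kind could look like. The variable name feature_routing_overrides is from the incident; the choice to reject the reserved key read_replica_routing, and the wording of the error message, are assumptions.

```hcl
# Sketch only: the real validation block may have differed.
variable "feature_routing_overrides" {
  # Still loosely typed at this stage; the type tightening came later.
  type        = map(any)
  default     = {}
  description = "Per-environment feature routing overrides"

  validation {
    # Assumption: reject any override that reuses the legacy key, so the
    # collision fails while variables are validated instead of
    # surfacing during apply.
    condition = !contains(
      keys(var.feature_routing_overrides),
      "read_replica_routing",
    )
    error_message = "The key read_replica_routing is reserved by the legacy database flags. Use a different key."
  }
}
```

A guard like this does not fix the any-typed contract; it only turns one known collision into an early, readable error.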
The audit work behind option one (the one we did NOT take) is what stuck with me. The next morning, we grepped the entire terraform/ tree for read_replica_routing to map every consumer. Seven references across four modules. Three in modules/services/database/locals.tf itself. One in modules/monitoring/cloudwatch.tf. One in modules/services/cache/lookups.tf, which read the value to construct its own routing decision and would have broken silently if we had deleted the legacy key the night before. The remaining two were in a state-recovery helper module the team had forgotten existed. We had nearly shot ourselves in the foot a second time.
We left a tombstone comment on the legacy key and an open PR that would, the following week, replace its map(any) type with a proper object({ ... }) schema. That work landed five days later. The downstream consumers caught the change at plan time, and three of them needed minor patches before the type tightening could merge. None of those patches had anything to do with the original collision. They all fixed real existing bugs the any type had been hiding.
Two policy changes and one structural fix
What we changed afterwards
Two policy changes came out of that night, and one structural fix took longer.
The first policy: no new map(any) or any-typed inputs on root modules. The team's terraform/ directory has a pre-commit hook (8 lines of grep) that fails the commit if any new variable block contains type = any or type = map(any). Existing instances are grandfathered, with a TODO list tracked against each module. Three of the original 14 have been converted to typed objects so far. The hook has fired four times in the six weeks since.
The second policy: every PR runs terraform plan against every environment, not just the one the contributor cares about. A matrix job in CI runs plan -var-file=envs/<env>.tfvars across all four environments and fails the PR if any of them errors. This would not have caught the original collision (plan succeeded everywhere), but it catches a different class of failure where one environment's tfvars hits an unwritten code path.
# Before: latent any-typed input
variable "feature_overrides" {
  type        = map(any)
  default     = {}
  description = "Per-environment feature flag overrides"
}

# In modules/services/database/locals.tf
locals {
  merged_flags = merge(
    local.legacy_db_flags,
    var.feature_overrides,
  )
}
# Above passes plan even when the two maps have a key
# whose value types disagree. The mismatch surfaces only
# at apply, when the receiving attribute is evaluated.

# After: typed, explicit, errors at plan time
variable "feature_overrides" {
  type = map(object({
    enabled     = bool
    rollout_pct = optional(number, 0)
    routing     = optional(string, "default")
  }))
  default     = {}
  description = "Per-environment feature flag overrides"

  validation {
    condition = alltrue([
      for k, v in var.feature_overrides :
      v.rollout_pct >= 0 && v.rollout_pct <= 100
    ])
    error_message = "rollout_pct must be between 0 and 100."
  }
}
The same variable, before and after. The lower form fails plan, not apply, when a contributor passes the wrong shape.
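As a usage sketch, here is how a per-environment tfvars entry might look against the typed form of the variable. The flag name read_replica_v2 and all of its values are hypothetical; only the variable name and its schema come from the example above.

```hcl
# envs/staging.tfvars (hypothetical contents)

# Accepted: matches map(object({ enabled, rollout_pct, routing })).
# routing is omitted, so it falls back to its declared default, "default".
feature_overrides = {
  read_replica_v2 = {
    enabled     = true
    rollout_pct = 25
  }
}

# Rejected at plan time: a bare string cannot be converted to the object type.
# feature_overrides = {
#   read_replica_v2 = "on"
# }

# Also rejected at plan time, this time by the validation block:
# feature_overrides = {
#   read_replica_v2 = { enabled = true, rollout_pct = 150 }
# }
```

The two commented-out variants fail for different reasons, one on the type constraint and one on the validation rule, but both fail before anything is applied.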
The structural fix took longer. A 28-input root module is not a configuration problem; it is a service-boundary problem. The team running the database stack should own a database/ root module with four inputs, not a 14-input subtree of a shared 28-input root. We split the original root into three roots along ownership boundaries (network, services, observability), using a thin terragrunt overlay for the cross-cutting variables. The split took six weeks of careful terraform state mv work to land without downtime. We have written more on the structural fix in the Terraform and IaC debt playbook, which covers when a shared root module starts costing more than the consistency it buys.
What we tell every team now: strong types in Terraform are not bureaucracy, they are the documentation. The half-day cost to write object({ name = string, enabled = bool, ... }) instead of map(any) buys you a plan-time failure instead of an apply-time failure, and apply-time failures land at 4:42pm on Fridays. We have stopped accepting map(any) inputs in any client engagement that involves an IaC audit, and we have not had a single contributor push back once they saw the cost.
If you are looking at a 28-input root with map(any) sprinkled through it
When your own root module is past 20 inputs
If you are reading this and your terraform/ directory has a root module past 20 inputs with several map(any) types in the input list, the failure you are heading toward is not a surprise. It is a scheduled event. The trigger will be a new contributor who does not know the implicit contract, plus one bad-enough Friday. The hardest part of cleaning it up is not the typing work itself; it is the audit of downstream consumers that have been silently depending on the loose contract for years. Two layers of modules-of-modules can hide a reference that breaks the moment you tighten the type, and your CI will not warn you because plan will keep passing right up to the apply that surfaces it.
We run these recovery and audit engagements every week. The map(any) collision pattern is the third-most-common shape we see in seed-to-Series-B SaaS Terraform repos, right after stale state lock holders and provider-version-drift cascades. It is one variant of the broader terraform apply fear problem we engage on most weeks. On a typical engagement we map every any-typed input in your root modules within the first day, prioritize them by blast radius, and either convert them in-place or split the root if the input count is the real problem. If you are looking at a Terraform root with map(any) sprinkled through it and a release window that does not forgive a 4pm apply failure, book an infrastructure review with our team and we will start with a 30-minute diagnostic call this week.
Originally published at https://infraforge.agency/insights/terraform-apply-fails-map-any-trap/.
If your team is dealing with similar infrastructure debt, we offer infrastructure reviews and recovery engagements: see /review.