How We Validate AI-Generated Terraform with a 6-Layer Engine
When TrustFix detects an OIDC trust policy misconfiguration in your AWS IAM roles, it generates a Terraform fix and opens a pull request. The fix is generated by an AI model (Claude) that understands IAM policy semantics.
But here's the question every security engineer should ask: how do you know the AI-generated fix is correct? How do you know it doesn't widen permissions instead of narrowing them? How do you know it doesn't break existing workloads?
This is the problem we spent the most engineering time on. The answer is PIE — our Policy Intelligence Engine — a 6-layer validation system that every generated fix passes through before a PR is opened.
Why validation matters more than generation
Generating Terraform HCL for an IAM trust policy condition block is straightforward. The AI sees the current policy, understands the misconfiguration, and produces the corrected version.
The hard part is knowing whether the correction is safe. IAM trust policies are combinatorial — a single condition change can have cascading effects on which identities can assume the role. A fix that looks correct in isolation might widen access in a way that's not immediately obvious.
Our approach: treat every generated fix as untrusted code and validate it through multiple independent verification layers before any human sees it.
The 6 layers
Layer 1: Structural Validation
Does the generated Terraform parse? Does it have the required resource blocks? Are the condition operators valid? Is the HCL syntactically correct?
This catches obvious generation failures — malformed JSON, missing closing braces, invalid condition operator names. It runs in milliseconds and rejects about 3% of initial generations.
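To make the idea concrete, here is a minimal sketch of the kind of structural check Layer 1 performs on the trust-policy portion of a fix. This is not the actual PIE implementation — the function name, the operator allowlist (shown as a small subset), and the example policy are all illustrative, and the check is shown over the policy JSON rather than the surrounding HCL:

```python
import json

# Illustrative subset of condition operators valid in IAM trust policies.
VALID_OPERATORS = {"StringEquals", "StringLike", "StringNotEquals", "ArnLike"}

def structural_check(policy_text: str) -> list[str]:
    """Return a list of structural errors; an empty list means the policy passes."""
    try:
        policy = json.loads(policy_text)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc}"]
    errors = []
    for stmt in policy.get("Statement", []):
        for operator in stmt.get("Condition", {}):
            if operator not in VALID_OPERATORS:
                errors.append(f"invalid condition operator: {operator}")
    return errors

# A generated fix with a misspelled operator ("StringEqual") gets rejected here,
# before any deeper analysis runs.
fix = '''{"Version": "2012-10-17", "Statement": [{
  "Effect": "Allow",
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {"StringEqual": {"token.actions.githubusercontent.com:sub": "repo:org/app:*"}}
}]}'''
print(structural_check(fix))
```

Cheap checks like this run first precisely because they fail fast: a malformed generation never reaches the more expensive layers.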
Layer 2: Semantic Contracts
This is the core of PIE. We maintain 197 test assertions that encode our domain knowledge about OIDC trust policies. Each finding type has specific contracts that a valid fix must satisfy.
Examples:
- A fix for OIDC_MISSING_SUB_CONDITION must add a StringEquals condition on the sub key
- A fix for OIDC_OVERLY_BROAD_TRUST must replace StringLike with StringEquals or narrow the wildcard pattern
- No fix may ever add * as a condition value
- No fix may ever remove an existing restrictive condition
- No fix may add sts:AssumeRole (only sts:AssumeRoleWithWebIdentity for OIDC)
These contracts are deterministic and testable. Every code change to PIE runs all 197 assertions. A single contract violation blocks the fix.
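Two of the contracts above can be sketched as plain assertion functions over the before/after policies. The function names and the key-survival rule below are illustrative, not PIE's actual code; note that the "no condition removed" contract has to check condition keys rather than (operator, key) pairs, because a legitimate fix may tighten StringLike to StringEquals on the same key:

```python
def condition_keys(policy: dict) -> set[str]:
    """Collect every condition key restricted anywhere in the policy."""
    keys = set()
    for stmt in policy.get("Statement", []):
        for key_values in stmt.get("Condition", {}).values():
            keys.update(key_values)
    return keys

def condition_values(policy: dict):
    """Yield every condition value, flattening single values and lists."""
    for stmt in policy.get("Statement", []):
        for key_values in stmt.get("Condition", {}).values():
            for v in key_values.values():
                yield from (v if isinstance(v, list) else [v])

def contract_no_wildcard_value(fixed: dict) -> bool:
    """No fix may ever add '*' as a condition value."""
    return all(v != "*" for v in condition_values(fixed))

def contract_no_condition_removed(original: dict, fixed: dict) -> bool:
    """Every key restricted before the fix must still be restricted after it.
    The operator may tighten (StringLike -> StringEquals), but the key
    itself must not disappear."""
    return condition_keys(original) <= condition_keys(fixed)
```

A fix either satisfies every contract for its finding type or it is blocked — there is no partial credit.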
Layer 3: Permission Delta Analysis
This layer compares the original trust policy against the proposed fix and classifies the change as one of four categories:
- NARROWED — The fix reduces the set of identities that can assume the role. This is the expected outcome for security fixes.
- WIDENED — The fix increases the set of identities. This is automatically blocked.
- SHIFTED — The fix changes which identities have access without clearly narrowing. Flagged for review.
- INDETERMINATE — The delta can't be definitively classified. Flagged for review.
Only NARROWED fixes proceed automatically. Everything else requires additional validation or is blocked entirely.
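The four categories can be illustrated with a simplified classifier over the sets of allowed sub values. This sketch is deliberately conservative and is not PIE's actual delta analysis: comparing wildcard patterns for containment requires real pattern analysis, so any wildcard here falls straight into INDETERMINATE rather than being guessed at:

```python
def classify_delta(original: set[str], fixed: set[str]) -> str:
    """Classify a trust-policy change by comparing literal allowed-value sets."""
    if any("*" in v for v in original | fixed):
        # Wildcard containment needs pattern analysis; never guess.
        return "INDETERMINATE"
    if fixed < original:
        return "NARROWED"   # strictly fewer identities can assume the role
    if fixed > original:
        return "WIDENED"    # strictly more identities: automatically blocked
    # Overlapping or disjoint sets: access changed hands without clearly narrowing.
    return "SHIFTED"
```

The asymmetry is the point: only one of the four outcomes is allowed to proceed without a human in the loop.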
Layer 4: Multi-Model Adversarial Review
A second AI model (GPT-4o at temperature 0.0) reviews the fix generated by the first model (Claude). The reviewer doesn't see the original prompt — it sees only the finding description and the proposed Terraform, then evaluates it across 7 reasoning categories:
- Does the fix address the stated misconfiguration?
- Does it introduce any new security issues?
- Are the condition values correct for the specific OIDC provider?
- Is the condition operator appropriate (StringEquals vs StringLike)?
- Does the fix maintain existing legitimate access patterns?
- Are there any edge cases the fix doesn't handle?
- Overall: should this fix be applied?
The adversarial reviewer assigns a confidence score. If it identifies a flaw, the fix is blocked and regenerated with the reviewer's feedback.
This layer is available on the Team tier ($799/mo) because the dual-model cost is higher, but it catches subtle issues that deterministic checks miss — particularly around condition value formatting and edge cases in the OIDC sub claim format.
Layer 5: Confidence Scoring
All layer results are aggregated into a single 0-100 Confidence Score, which is then checked against severity-dependent thresholds:
- CRITICAL findings require a score of 70+ to generate a PR
- HIGH findings require 55+
- MEDIUM findings require 40+
- LOW findings require 30+
The score and per-layer breakdown appear in every PR description, so the human reviewer knows exactly how confident the system is.
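The severity gate itself is simple — a minimal sketch, with the threshold table taken directly from the list above (the function name is illustrative):

```python
# Minimum Confidence Score required to open a PR, by finding severity.
THRESHOLDS = {"CRITICAL": 70, "HIGH": 55, "MEDIUM": 40, "LOW": 30}

def should_open_pr(severity: str, confidence: int) -> bool:
    """Gate PR creation on the severity-dependent confidence threshold."""
    return confidence >= THRESHOLDS[severity]
```

Note that the bar is highest for CRITICAL findings: the riskier the change, the more validation evidence the system demands before acting.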
Layer 6: Blast Radius Preview
Before the PR is opened, the reviewer can see a blast radius preview: which existing workflows would be affected by the policy change, whether any current legitimate access patterns would break, and the full diff between the original and proposed policy.
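The full-diff portion of that preview can be produced with nothing more than a canonical serialization and a unified diff — a sketch, not PIE's actual renderer (which also resolves affected workflows):

```python
import difflib
import json

def policy_diff(original: dict, proposed: dict) -> str:
    """Unified diff between the original and proposed trust policies.

    Sorting keys gives a canonical serialization, so the diff shows only
    real policy changes, never key-ordering noise.
    """
    a = json.dumps(original, indent=2, sort_keys=True).splitlines()
    b = json.dumps(proposed, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(a, b, "original", "proposed", lineterm=""))
```

An identical pair produces an empty diff, which doubles as a sanity check that the fix actually changed something.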
The result
After all six layers, the PR that reaches your repository is:
- Syntactically valid Terraform
- Semantically correct for the specific misconfiguration type
- Verified to NARROW permissions (never widen)
- Reviewed by an independent AI model for edge cases
- Scored with a transparent Confidence Score
- Accompanied by a blast radius analysis
The human reviewer's job is to verify that the fix makes sense in their specific context — not to validate that the generated Terraform is correct. PIE handles the correctness verification.
Trade-offs
PIE adds 2-5 seconds of latency to fix generation. We considered this acceptable: a few seconds of automated validation is faster than a human reviewer catching a bad fix in a PR.
The adversarial review layer (Layer 4) adds cost — roughly $0.02 per review in API charges. This is why it's on the Team tier rather than Pro.
The semantic contracts (Layer 2) require manual maintenance as we add new detection types. Each new finding type needs its own set of contracts. This is engineering work, but it's the kind of work that compounds — every contract we write catches errors forever.
Open questions
We're still iterating on PIE. Some open questions:
- Should the Confidence Score threshold be configurable per organization?
- Would formal verification (Z3 SMT solver) add meaningful confidence beyond the current layers?
- How should we handle fixes that are NARROWED but might break CI/CD pipelines that depend on the overly-broad policy?
If you have thoughts on any of these, I'd love to hear them: security@trustfix.dev.
Try TrustFix at trustfix.dev — free tier includes PIE Layers 1-3 on every generated fix.
PIE is proprietary to TrustFix. The detection rules and fix templates that feed into PIE are documented in our open-source oidc-audit CLI.
