All posts
February 23, 2026
ai-agent-security, agent-config-audit, clawsafe, devsecops, ai-agent-permissions

Amazon Kiro Incident: 5 Agent Config Mistakes That Delete Production Environments

Amazon's Kiro AI agent caused a 13-hour AWS outage due to misconfigured agent permissions. Here are the 5 specific config mistakes that let AI agents nuke production -- and how to fix them before it happens to you.


Amazon's own Kiro AI agent triggered a 13-hour AWS outage. Not a hack. Not a zero-day. A misconfigured agent with permissions it should never have had. Here are the five config patterns that turn AI agents into production-environment wrecking balls.


What Happened With Kiro

Amazon's Kiro is a developer-focused AI agent built to automate infrastructure and deployment tasks inside AWS. In a widely discussed incident that drew 1,600+ upvotes across the developer community, a Kiro agent configuration allowed the agent to take destructive actions against live infrastructure during what should have been a routine automation run. The result: a 13-hour outage.

The root cause was not the model. The model did what it was configured to do. The root cause was that the agent's configuration gave it blast radius it had no business having.

This week, OpenClaw also disclosed 6 CVEs and 341 malicious skills targeting AI agent pipelines, collectively drawing 3,000+ upvotes across security forums. The pattern is consistent: agents are being configured with too much trust, too few guardrails, and no audit trail. When something goes wrong, the damage is immediate and deep.

The five mistakes below are not hypothetical. They are the config patterns that show up repeatedly in agent permission audits. Every one of them was present in the Kiro-class of incidents.


Mistake 1: Wildcard IAM Policies Attached to Agent Roles

What goes wrong: The agent's execution role has a policy like Action: "*", Resource: "*". This is the agent equivalent of handing someone a master key and a bulldozer. The agent can read, write, delete, and modify anything in the account. One misunderstood instruction, one prompt injection from a malicious dependency, or one edge case in the agent's reasoning and your S3 buckets are empty, your RDS instances are terminated, and your CloudFormation stacks are gone.

This mistake is more common than it should be because the AWS console makes it easy. When you create an agent role during development, broad permissions remove friction. Developers mean to tighten them before production. They don't.

How to fix it: Use IAM Access Analyzer to generate a least-privilege policy from the agent's actual API call history over 90 days. If the agent has never called ec2:TerminateInstances, that action should not exist in its policy. Apply explicit Deny statements for the highest-risk actions: s3:DeleteBucket, rds:DeleteDBInstance, cloudformation:DeleteStack, iam:DeleteRole. Treat those as off-limits by default and require a separate human-approved role assumption to execute them.
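The deny-by-default pattern looks like the following sketch. An explicit Deny statement wins over any Allow in IAM's evaluation logic, including a wildcard Allow, so attaching this policy caps the blast radius even if the role's other policies are too broad. The Sid is illustrative:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDestructiveActions",
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "rds:DeleteDBInstance",
        "cloudformation:DeleteStack",
        "iam:DeleteRole"
      ],
      "Resource": "*"
    }
  ]
}
```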


Mistake 2: No Confirmation Gate on Destructive Tool Calls

What goes wrong: The agent's tool definitions include delete, terminate, and drop operations with no confirmation step. The agent reasons: "The task is to clean up the staging environment." It calls terminate_instances(instance_ids=["i-0abc123", "i-0def456"]). What it interpreted as "staging" was production. The call succeeds instantly. There is no rollback.

This happens because developers wire up agent tools to mirror CLI interfaces 1:1. The CLI has a confirmation prompt and a --force flag to bypass it. The agent tool call has neither.

How to fix it: Implement a two-phase pattern for any tool that cannot be reversed. Phase one: the agent builds a destruction plan and returns it as a structured preview. Phase two: a human reviews and approves via a signed token before the agent executes. In practice this looks like a Slack message with an approve/reject button, or a lightweight webhook that requires a human signature. If you are using AWS Step Functions to orchestrate agents, use a waitForTaskToken integration point before any destructive step. The agent pauses. A human resumes it.
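A minimal sketch of the two-phase pattern in Python. The function names, the in-memory plan store, and the token-based approval are illustrative assumptions -- in production the preview would go to Slack or a webhook and the store would be durable -- but the shape is the point: phase one only returns a plan, and nothing destructive runs without a token a human has seen:

```python
import json
import secrets

# In-memory store of pending destruction plans awaiting human approval.
# A real system would persist this and route the preview to a reviewer.
PENDING_PLANS = {}

def propose_termination(instance_ids):
    """Phase one: the agent builds a plan and returns a preview plus a
    one-time token. Nothing is destroyed here."""
    token = secrets.token_hex(16)
    PENDING_PLANS[token] = {
        "action": "terminate_instances",
        "instance_ids": instance_ids,
    }
    return {"token": token, "preview": json.dumps(PENDING_PLANS[token], indent=2)}

def approve_and_execute(token, approver):
    """Phase two: a human supplies the token; only then does execution proceed.
    Popping the plan makes the token single-use."""
    plan = PENDING_PLANS.pop(token, None)
    if plan is None:
        raise PermissionError("no pending plan for this token")
    # Real code would call ec2.terminate_instances(InstanceIds=plan["instance_ids"]) here.
    return {"executed": plan, "approved_by": approver}

# Usage: the agent proposes, a human approves.
plan = propose_termination(["i-0abc123", "i-0def456"])
result = approve_and_execute(plan["token"], approver="alice")
```

The same split maps directly onto Step Functions: phase one ends at the waitForTaskToken step, and the human's approval resumes the execution.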


Mistake 3: Shared Credentials Across Agent Instances

What goes wrong: You have five agent instances running in your pipeline. All five use the same API key, the same AWS access key, or the same service account. When one instance gets compromised via prompt injection or a malicious tool dependency, the attacker has credentials that work across every agent in the fleet. There is no way to isolate the blast. Revoking the credential takes down all five agents simultaneously.

This pattern also makes audit logs useless. Every destructive action in CloudTrail shows the same principal. You cannot determine which agent instance ran which command or when.

How to fix it: Each agent instance gets its own scoped identity. On AWS, use IAM roles with instance-level assume-role chaining and short-lived STS tokens (15-minute expiry is reasonable for most workloads). Tag every agent role with AgentInstanceId and TaskId so CloudTrail events are attributable. If you are using API keys for third-party services, store them per-agent in AWS Secrets Manager with individual rotation schedules. Compromise of one agent yields credentials that are already expired by the time you investigate.
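As a sketch, the per-instance identity boils down to building a distinct AssumeRole request for each agent. The helper below constructs the parameters a caller would pass to boto3's sts.assume_role; the role ARN, instance ID, and task ID are placeholders:

```python
def build_assume_role_request(agent_instance_id, task_id, role_arn):
    """Construct STS AssumeRole parameters for one agent instance.
    Short-lived tokens (900 s) plus session tags make every CloudTrail
    event attributable to a specific instance and task."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": f"agent-{agent_instance_id}-{task_id}",
        "DurationSeconds": 900,  # 15-minute expiry
        "Tags": [
            {"Key": "AgentInstanceId", "Value": agent_instance_id},
            {"Key": "TaskId", "Value": task_id},
        ],
    }

# A real caller would then do: sts.assume_role(**request)
request = build_assume_role_request(
    "03", "task-9f2", "arn:aws:iam::123456789012:role/agent-exec"
)
```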


Mistake 4: Missing Scope Boundaries in System Prompts

What goes wrong: The agent's system prompt defines its job at a high level: "You are a DevOps assistant. Help the team manage AWS infrastructure." There is no explicit statement of what the agent cannot do. The agent infers its own scope from the tools it has access to. Since you gave it the full AWS SDK wrapper, it believes its scope is the full AWS SDK.

This is how an agent tasked with "optimize our Lambda functions" ends up calling ListBuckets, enumerating your S3 data, and writing findings to an external endpoint. It was following instructions. The instructions just did not specify a boundary.

How to fix it: System prompts need an explicit constraint section. State the out-of-scope actions directly: "You do not have authorization to modify IAM policies, delete resources, access S3 buckets outside of the agent-workspace- prefix, or interact with accounts outside of account ID 123456789." List the environments the agent is permitted to touch. Use the word "never" for the highest-risk actions. Then enforce those constraints at the tool layer, not just at the prompt layer. The prompt is a hint. The tool definitions are the actual control plane.
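Enforcing the boundary at the tool layer can be as simple as a guard in the tool wrapper itself. This sketch mirrors the agent-workspace- prefix constraint from the prompt; the function name and return shape are illustrative:

```python
ALLOWED_BUCKET_PREFIX = "agent-workspace-"  # mirrors the system prompt constraint

def guarded_list_objects(bucket):
    """Tool-layer enforcement: reject buckets outside the allowed prefix
    before any AWS call is made, regardless of what the prompt said."""
    if not bucket.startswith(ALLOWED_BUCKET_PREFIX):
        raise PermissionError(f"bucket {bucket!r} is outside this agent's scope")
    # Real code would call s3.list_objects_v2(Bucket=bucket) here.
    return {"bucket": bucket, "status": "allowed"}
```

A prompt-injected instruction to enumerate other buckets now fails at the tool boundary instead of succeeding silently.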


Mistake 5: No Environment Tagging or Blast Radius Segmentation

What goes wrong: Production and staging share the same VPC, the same IAM boundary, and the same agent role. The agent cannot distinguish between a production RDS instance tagged env=production and a staging RDS instance tagged env=staging because it has no policy condition that enforces that distinction. So when it gets a task scoped to staging, it operates against whatever resource names or IDs it finds -- and if a production resource appears in a lookup, it acts on it.

The Kiro incident class is almost always this mistake combined with Mistake 1. Wildcard permissions plus no environment segmentation means a single misrouted task destroys the wrong tier.

How to fix it: Tag every resource with Environment: production | staging | dev. Write IAM condition keys that enforce those tags: Condition: StringEquals: "aws:ResourceTag/Environment": "staging". The agent's role can only call APIs on resources tagged to match its operating environment. A staging-scoped agent cannot call TerminateInstances on a production EC2 node -- the IAM policy rejects the call regardless of what the agent reasons. Pair this with AWS Organizations Service Control Policies at the account level so that even a role with broad permissions cannot cross environment boundaries.
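The tag condition as a full policy statement looks like this sketch (the Sid is illustrative). An agent role carrying only this Allow can terminate staging instances and nothing else, because any instance tagged Environment=production fails the condition:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StagingScopedTermination",
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/Environment": "staging"
        }
      }
    }
  ]
}
```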


How to Audit Your Agent Configs Right Now

The five mistakes above are testable. You do not need to wait for an incident to find out if your agents are misconfigured.

We built the AI Agent Permission Auditor to run exactly this analysis. Paste your agent configuration and it runs 12 risk checks across 5 categories, returning a grade from A to F with specific findings. The checks cover:

  • Permission scope analysis (wildcard detection, unused permission flagging)
  • Destructive action guardrail presence
  • Credential isolation verification
  • System prompt boundary assessment
  • Environment segmentation review

If your configs grade below a B, you have exposed blast radius. Fix it before the agent runs against production.

If you are also using AI coding tools like Cursor, Claude Code, or Continue in your development workflow, check whether those config folders are leaking credentials into your repositories. Our AI Dev Security Scanner scans .claude, .cursor, and .continue directories for exposed secrets and misconfigurations. The two tools cover different layers of the same risk surface.


The Pattern Is Consistent

Amazon had the resources to get this right. They got it wrong anyway, because agent security is genuinely new operational territory and the defaults are unsafe. The model providers are shipping capable agents faster than the security tooling can catch up. The IAM console was not designed for agent permission management. The CloudTrail logs were not designed to audit autonomous multi-step reasoning chains.

None of that makes misconfiguration acceptable. It makes auditing mandatory.

Three thousand upvotes on AI agent safety incidents this week means the community is paying attention. The practitioners who build agents that don't make the news are the ones who treat agent configs with the same rigor they apply to production IAM policies -- because that is exactly what agent configs are.

Run the audit. Grade your configs. Fix what grades below a B. Then set a recurring calendar event to do it again every time you add a new tool to an agent.


Run a free agent config audit at godigitalapps.com/tools/agent-permission-audit. Grade your permissions before your agent does the grading for you.

Need help setting up your AI agents?

We configure production AI workflows so you can skip the weeks of trial and error.

Get Started with Nexus