Amazon Kiro Incident: 5 Agent Config Mistakes That Delete Production Environments
Amazon's Kiro AI agent caused a 13-hour AWS outage due to misconfigured agent permissions. Here are the 5 specific config mistakes that let AI agents nuke production -- and how to fix them before it happens to you.
Amazon's own Kiro AI agent triggered a 13-hour AWS outage. Not a hack. Not a zero-day. A misconfigured agent with permissions it should never have had. Here are the five config patterns that turn AI agents into production-environment wrecking balls.
What Happened With Kiro
Amazon's Kiro is a developer-focused AI agent built to automate infrastructure and deployment tasks inside AWS. In a widely discussed incident that drew 1,600 upvotes across the developer community, a Kiro agent configuration allowed the agent to take destructive actions against live infrastructure during what should have been a routine automation run. The result was a 13-hour outage.
The root cause was not the model. The model did what it was configured to do. The root cause was that the agent's configuration gave it blast radius it had no business having.
This same week, OpenClaw disclosed 6 CVEs and 341 malicious skills targeting AI agent pipelines, collectively drawing 3,000 upvotes across security forums. The pattern is consistent: agents are being configured with too much trust, too few guardrails, and no audit trail. When something goes wrong, the damage is immediate and deep.
The 2025 Gartner AI Security Hype Cycle Report identifies autonomous agent permission misconfiguration as the top emerging risk in enterprise AI deployment, with 78% of surveyed organizations reporting at least one agent-triggered production incident in the past 12 months. The five mistakes below are not hypothetical. They are the config patterns that appear repeatedly in agent permission audits, and every one of them was present in the Kiro class of incidents.
Mistake 1: Wildcard IAM Policies Attached to Agent Roles
What goes wrong: The agent's execution role carries a policy with Action: "*", Resource: "*". This is the agent equivalent of handing someone a master key and a bulldozer. The agent reads, writes, deletes, and modifies anything in the account. One misunderstood instruction, one prompt injection from a malicious dependency, or one edge case in the agent's reasoning chain empties your S3 buckets, terminates your RDS instances, and destroys your CloudFormation stacks.
This mistake is more common than it should be because the AWS console makes it easy. When you create an agent role during development, broad permissions remove friction. Developers intend to tighten them before production. Most do not.
How to fix it: Use AWS IAM Access Analyzer to generate a least-privilege policy from the agent's actual API call history over 90 days. If the agent has never called ec2:TerminateInstances, that action does not belong in its policy. Apply explicit Deny statements for the highest-risk actions: s3:DeleteBucket, rds:DeleteDBInstance, cloudformation:DeleteStack, and iam:DeleteRole. Treat those as permanently off-limits by default and require a separate human-approved role assumption to execute any of them.
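The explicit-Deny guardrail described above can be sketched as a policy document. This is an illustrative Python dict, not a complete policy; the `Sid` is a placeholder, and in practice you would attach this alongside the agent's scoped Allow policy (an explicit Deny in IAM always overrides any Allow):

```python
import json

# Hypothetical guardrail policy: explicit Deny on the highest-risk,
# irreversible actions, regardless of what the role's Allow statements grant.
AGENT_DENY_GUARDRAIL = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyIrreversibleActions",
            "Effect": "Deny",
            "Action": [
                "s3:DeleteBucket",
                "rds:DeleteDBInstance",
                "cloudformation:DeleteStack",
                "iam:DeleteRole",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(AGENT_DENY_GUARDRAIL, indent=2))
```

A separate, human-approved role would carry the inverse Allow statements for the rare runs where one of these actions is genuinely intended.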
The AWS Well-Architected Framework Security Pillar states that least-privilege access is the foundational security control for any automated system with API access. Agent roles are automated systems. They require the same rigor as any service account.
Mistake 2: No Confirmation Gate on Destructive Tool Calls
What goes wrong: The agent's tool definitions include delete, terminate, and drop operations with no confirmation step. The agent reasons: "The task is to clean up the staging environment." It calls terminate_instances(instance_ids=["i-0abc123", "i-0def456"]). What it interpreted as staging was production. The call succeeds instantly. There is no rollback.
This happens because developers wire agent tools to mirror CLI interfaces directly. The CLI has a --force flag and a confirmation prompt. The agent tool call has neither. The latent safety mechanism of CLI prompts is absent in API-native tool definitions.
How to fix it: Implement a two-phase pattern for every tool that cannot be reversed. Phase one: the agent builds a destruction plan and returns it as a structured preview. Phase two: a human reviews and approves via a signed token before the agent executes. In practice this is a Slack message with an approve/reject button, or a lightweight webhook that requires a human signature before proceeding.
For AWS Step Functions-orchestrated agents, the waitForTaskToken integration point provides exactly this mechanism: the agent pauses at any step, a human resumes it with an explicit approval. No human approval means no destructive action executes. This is the control that prevents the Kiro class of incident regardless of what the model reasons.
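The two-phase pattern can be sketched in a few lines. Everything here is illustrative: the function names, the in-memory plan store, and the approval flow are assumptions standing in for whatever tool framework and approval channel you actually use.

```python
import uuid

# Hypothetical two-phase wrapper for an irreversible tool call.
# Phase 1 builds a plan for human review; phase 2 executes only with the
# token from a reviewed plan. A real system would persist plans durably
# and verify the approver's identity (e.g. a signed Slack interaction).
PENDING_PLANS = {}

def plan_terminate_instances(instance_ids):
    """Phase 1: return a structured destruction plan, do not act."""
    token = str(uuid.uuid4())
    plan = {"action": "terminate_instances", "targets": instance_ids, "token": token}
    PENDING_PLANS[token] = plan
    return plan

def execute_plan(token, approved_by):
    """Phase 2: execute only against a previously reviewed plan."""
    plan = PENDING_PLANS.pop(token, None)
    if plan is None:
        raise PermissionError("No approved plan for this token; refusing to act.")
    # A real implementation would call ec2.terminate_instances(...) here.
    return {"executed": plan["action"], "targets": plan["targets"], "approved_by": approved_by}

plan = plan_terminate_instances(["i-0abc123", "i-0def456"])
result = execute_plan(plan["token"], approved_by="on-call-engineer")
```

The key property: the agent can only ever produce phase one. Phase two requires a token the agent cannot mint on its own.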
Kelsey Hightower, former principal engineer at Google and a recognized authority on infrastructure automation, describes the principle this way: "The right level of automation is the level where a human can understand, verify, and reverse every consequential action the system takes." Destructive agent operations that execute without a verification gate violate this standard.
Mistake 3: Shared Credentials Across Agent Instances
What goes wrong: Five agent instances run in your pipeline. All five use the same API key, the same AWS access key, or the same service account. When one instance is compromised via prompt injection or a malicious tool dependency, the attacker holds credentials valid across every agent in the fleet. There is no way to isolate the blast radius. Revoking the credential takes down all five agents simultaneously.
This pattern also makes audit logs unusable. Every destructive action in CloudTrail shows the same principal. Determining which agent instance ran which command, and when, is impossible after the fact.
How to fix it: Each agent instance gets its own scoped identity. On AWS, use IAM roles with instance-level assume-role chaining and short-lived STS tokens. A 15-minute token expiry is reasonable for most workloads. Tag every agent role with AgentInstanceId and TaskId so CloudTrail events are attributable to a specific agent run. For API keys used with third-party services, store them per-agent in AWS Secrets Manager with individual rotation schedules. Compromise of one agent yields credentials that are already expired by the time the investigation begins.
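As a sketch of the per-instance identity pattern, the helper below builds the parameters for an STS `AssumeRole` call with a 15-minute expiry and session tags. The role ARN, account ID, and tag keys are assumptions; in practice the returned dict would be passed to boto3's `sts_client.assume_role(**params)`:

```python
# Hypothetical per-agent session builder. Short-lived credentials plus
# session tags make every CloudTrail event attributable to one agent run.
def build_agent_session_params(agent_instance_id, task_id,
                               role_arn="arn:aws:iam::123456789012:role/agent-staging"):
    return {
        "RoleArn": role_arn,
        "RoleSessionName": f"agent-{agent_instance_id}",
        "DurationSeconds": 900,  # 15-minute expiry, per the guidance above
        "Tags": [
            {"Key": "AgentInstanceId", "Value": agent_instance_id},
            {"Key": "TaskId", "Value": task_id},
        ],
    }

params = build_agent_session_params("agent-07", "task-4412")
```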
The CIS Benchmark for AWS Identity and Access Management specifically calls out shared service account credentials as a high-severity finding in any automated pipeline. Agent pipelines are automated pipelines. The same benchmark applies.
Mistake 4: Missing Scope Boundaries in System Prompts
What goes wrong: The agent's system prompt defines its job at a high level: "You are a DevOps assistant. Help the team manage AWS infrastructure." There is no explicit statement of what the agent cannot do. The agent infers its own scope from the tools it has access to. Since you provided the full AWS SDK wrapper, the agent treats the full AWS SDK as its operating scope.
This is how an agent tasked with "optimize our Lambda functions" ends up calling ListBuckets, enumerating S3 data, and writing findings to an external endpoint. It was following its instructions. The instructions simply did not specify a boundary.
How to fix it: System prompts require an explicit constraint section. State the out-of-scope actions directly: "You do not have authorization to modify IAM policies, delete resources, access S3 buckets outside of the agent-workspace- prefix, or interact with accounts outside of account ID 123456789." List the environments the agent is permitted to touch. Use the word "never" for the highest-risk actions. Then enforce those constraints at the tool layer as well as the prompt layer. The prompt is a behavioral hint. The tool definitions are the actual control plane. Both layers are necessary because prompts can be overridden by injection; tool definitions cannot.
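A minimal sketch of that tool-layer enforcement, assuming a wrapper sits between the agent and every tool call. The denylist and bucket prefix mirror the example constraints above; the function name and shape are illustrative:

```python
# Hypothetical tool-layer guard. Even if prompt injection overrides the
# system prompt's constraints, out-of-scope calls are rejected here in code.
ALLOWED_BUCKET_PREFIX = "agent-workspace-"
DENIED_ACTIONS = {"iam:PutRolePolicy", "iam:DeleteRole", "s3:DeleteBucket"}

def guarded_tool_call(action, resource):
    if action in DENIED_ACTIONS:
        raise PermissionError(f"{action} is never in scope for this agent.")
    if action.startswith("s3:") and not resource.startswith(ALLOWED_BUCKET_PREFIX):
        raise PermissionError(
            f"S3 access outside {ALLOWED_BUCKET_PREFIX}* is out of scope.")
    return {"action": action, "resource": resource, "status": "allowed"}
```

The prompt tells the model what it should not do; this wrapper makes the "never" actions structurally impossible.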
Bruce Schneier, security technologist and author of "Click Here to Kill Everybody," addresses the core issue: "Security that depends entirely on a system following instructions is not security. It is an assumption. Effective security requires that the system be physically incapable of certain actions, not merely instructed to avoid them." Agent scope boundaries must be enforced at the infrastructure layer, not only in the model's context window.
Mistake 5: No Environment Tagging or Blast Radius Segmentation
What goes wrong: Production and staging share the same VPC, the same IAM boundary, and the same agent role. The agent cannot distinguish between a production RDS instance tagged env=production and a staging RDS instance tagged env=staging because no IAM policy condition enforces that distinction. When the agent receives a task scoped to staging, it operates against whatever resource names or IDs it finds during a lookup. If a production resource appears in that lookup, the agent acts on it.
The Kiro incident class is almost always this mistake combined with Mistake 1. Wildcard permissions plus no environment segmentation means a single misrouted task destroys the wrong tier without any policy check failing.
How to fix it: Tag every resource with Environment: production | staging | dev. Write IAM condition keys that enforce those tags using Condition: StringEquals: "aws:ResourceTag/Environment": "staging". The agent's role can only call APIs on resources tagged to match its operating environment. A staging-scoped agent cannot call TerminateInstances on a production EC2 instance because the IAM policy rejects the call regardless of what the agent's reasoning produces. Pair this with AWS Organizations Service Control Policies at the account level so that even a role with broad permissions cannot cross environment boundaries at the organization layer.
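The tag condition described above can be sketched alongside a tiny evaluator that mimics how IAM rejects a mismatched call. The evaluator is illustrative only; real enforcement happens inside IAM, not in your code:

```python
# The condition block from the text, as it would appear in the policy.
STAGING_ONLY_CONDITION = {
    "StringEquals": {"aws:ResourceTag/Environment": "staging"}
}

def condition_allows(resource_tags, condition=STAGING_ONLY_CONDITION):
    """Toy model of IAM tag-condition evaluation: every required tag
    must match exactly, or the call is denied."""
    for key, expected in condition["StringEquals"].items():
        tag_key = key.split("/", 1)[1]  # "aws:ResourceTag/Environment" -> "Environment"
        if resource_tags.get(tag_key) != expected:
            return False
    return True
```

A staging-scoped role carrying this condition gets an implicit deny on any resource tagged `Environment: production`, no matter what the agent's reasoning produced.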
The AWS Security Reference Architecture explicitly recommends separate AWS accounts for production versus non-production workloads for exactly this reason. Account-level isolation is the strongest available blast radius control because no IAM misconfiguration in a staging account can affect production resources in a separate account.
How to Audit Your Agent Configs Right Now
The five mistakes above are testable before an incident occurs. You do not need to wait for a production outage to determine whether your agents are misconfigured.
We built the AI Agent Permission Auditor to run exactly this analysis. Paste your agent configuration and it runs 12 risk checks across 5 categories, returning a grade from A to F with specific findings. The checks cover permission scope analysis (wildcard detection, unused permission flagging), destructive action guardrail presence, credential isolation verification, system prompt boundary assessment, and environment segmentation review.
If your configs grade below a B, you have exposed blast radius. Fix it before the agent runs against production.
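One of those checks, wildcard detection, is simple enough to sketch here. This is an illustrative standalone version, not the auditor's actual implementation:

```python
# Flag any Allow statement that grants Action "*" or Resource "*".
def find_wildcard_statements(policy):
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        if isinstance(actions, str):       # IAM allows a bare string here
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if "*" in actions or "*" in resources:
            findings.append(stmt.get("Sid", "<no Sid>"))
    return findings

risky_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Sid": "DevConvenience", "Effect": "Allow",
                   "Action": "*", "Resource": "*"}],
}
```

Running this against your agent roles' inline and attached policies takes minutes and surfaces Mistake 1 immediately.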
If you also use AI coding tools like Cursor, Claude Code, or Continue in your development workflow, check whether those config folders are leaking credentials into your repositories. The AI Dev Security Scanner scans .claude, .cursor, and .continue directories for exposed secrets and misconfigurations. The two tools cover different layers of the same risk surface: one addresses runtime agent permissions, the other addresses development-time credential exposure.
The Pattern Is Consistent
Amazon had the resources and the internal expertise to configure this correctly. They did not because agent security is genuinely new operational territory and the platform defaults are unsafe. IAM was not designed for agent permission management. CloudTrail was not designed to audit autonomous multi-step reasoning chains. The gap between what agents can do and what tooling exists to govern them is wide.
None of that makes misconfiguration acceptable. It makes auditing mandatory.
Three thousand upvotes on AI agent safety incidents in a single week signal that the practitioner community is paying attention. The teams whose agents do not make the news treat agent configurations with the same rigor they apply to production IAM policies. That is exactly what agent configurations are: production IAM policies with the additional complexity of autonomous multi-step reasoning attached to them.
Run the audit. Grade your configs. Fix what grades below a B. Then set a recurring calendar event to repeat the process every time you add a new tool to an agent.
Run a free agent config audit at godigitalapps.com/tools/agent-permission-audit. Grade your permissions before your agent does the grading for you.

Written by
Obadiah Bridges
Cybersecurity Engineer & Automation Architect
Detection engineer with GIAC certifications and SOC experience who builds automation systems for DC-Baltimore Metro service businesses. Founder of Go Digital.