Spending limits for AI agents are not optional in production. An agent that can spend without a ceiling is a liability: it can run up costs through mistakes, misinterpretation, or manipulation, and there is no structural mechanism stopping it until the damage is done. The question is not whether to set limits — it's how to set them in a way that's actually enforced.
AgentCard uses a load-based model: a virtual debit card is funded with a specific amount at creation, and that amount is the hard ceiling. Not a soft cap enforced by application logic. Not a daily limit that resets. The card's funded balance is the maximum it can spend, full stop — enforced at the payment network level.
This post explains how the model works, how to create cards with specific limits, common patterns for structuring limits per task vs per session vs per agent, and how to monitor balances using the check_balance MCP tool.
Why spending limits are non-negotiable for production agents
The case for spending limits comes down to three things: mistakes, manipulation, and cost attribution.
Agents make mistakes
Language model agents misinterpret instructions. They occasionally take actions that are technically consistent with their instructions but wildly outside the intended scope. An agent told to "purchase the necessary licenses" might buy more than you expected. An agent trying to optimize cost might make a judgment call that differs from yours. Without a hard ceiling, these errors are unbounded.
Agents can be manipulated
Prompt injection — embedding adversarial instructions in content the agent reads — is a real threat for agents that browse the web, process emails, or handle external data. If a manipulated agent attempts an unauthorized purchase, a spending limit is the last line of defense. An agent with a $20 card cannot be manipulated into spending $2,000.
Cost attribution requires isolation
Without per-task or per-agent card isolation, all charges land on one account with no structural way to attribute them to specific workflows. You end up with a card statement and a guessing game. One card per task, with a label and a defined amount, gives you clean cost attribution with no extra tooling.
For a deeper look at the threat model, see Single-Use Virtual Cards for AI: How the Security Model Works. For a practical guide to giving your agent a budget without sharing your credit card, see How to Give Your AI Agent a Budget.
The AgentCard model: load-based limits
When you create a card with AgentCard, you specify an amount. That amount is what gets loaded onto the virtual debit card. The card can be charged up to that amount — across one transaction or multiple — and cannot be charged beyond it. Once the balance is zero, any further charge attempt declines.
This is different from a soft limit. A soft limit is a number stored in a database that application code checks before authorizing a charge. It can be bypassed if the application has a bug, if the check is skipped, or if the agent finds another path to payment. A load-based limit is enforced by the payment network. There is no application-level bypass.
The practical consequence: when you issue a card for $25, you have accepted a maximum of $25 in exposure for that card. No matter what happens — agent error, manipulation, code bug — the card cannot cause more than $25 in charges.
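To make the difference from a soft cap concrete, here is a minimal in-memory sketch of load-based behavior. It is a toy model written for this post, not AgentCard's implementation: the `LoadedCard` class and its method names are assumptions, and amounts are held in integer cents to avoid floating-point drift.

```typescript
// Toy model of a load-based card. Illustrative only; names are hypothetical.
class LoadedCard {
  private balanceCents: number;

  constructor(fundedCents: number) {
    if (!Number.isInteger(fundedCents) || fundedCents < 0) {
      throw new Error("fundedCents must be a non-negative integer");
    }
    this.balanceCents = fundedCents;
  }

  // Approves a charge only if the remaining balance covers it in full.
  charge(amountCents: number): "approved" | "declined" {
    if (!Number.isInteger(amountCents) || amountCents <= 0) return "declined";
    if (amountCents > this.balanceCents) return "declined";
    this.balanceCents -= amountCents;
    return "approved";
  }

  get remainingCents(): number {
    return this.balanceCents;
  }
}

const card = new LoadedCard(2500); // $25.00 loaded
const results = [
  card.charge(1000), // fits: $15.00 left afterwards
  card.charge(2000), // exceeds the remaining $15.00
  card.charge(1500), // fits: balance reaches zero
  card.charge(1),    // nothing left to charge
];
```

The point of the model is that the ceiling lives inside the card itself. The caller has no separate limit check to forget or bypass.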
Creating a card with a specific limit
Cards are created with the agent-cards cards create command. The --amount flag sets the loaded amount in USD.
# A $10 card for a single research task
$ agent-cards cards create --amount 10
Card created: card_abc123
Amount: $10.00
Status: active
Card creation via MCP is also available for agents that need to self-provision cards within a defined budget. See How AI Agents Make Payments for the full MCP workflow, and the Claude MCP payments guide for Claude-specific integration details.
AI Agent Budget Control: Per-Agent, Per-Task, and Per-Session Patterns
There is no single right pattern for how to structure limits — it depends on how your agents are designed. Here are the three most common approaches.
Per-task limits
One card per discrete task. The card is created immediately before the task runs, funded with the budget for that task, and revoked when the task completes.
# $10 budget for a single research task
$ agent-cards cards create --amount 10
This is the most conservative pattern and the easiest to audit. Each task has a defined budget. You can see exactly what each task cost. If a task exceeds budget, the card declines and the task fails clearly rather than silently overspending.
Per-task limits work well when tasks are well-defined, run sequentially, and have predictable costs. They add some management overhead when tasks are very short-lived and run in high volume, but that overhead is minimal: card creation is a single CLI command or MCP tool call.
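The create, run, revoke lifecycle described above can be sketched as a small wrapper. The `CardClient` interface, its `createCard` and `revokeCard` methods, and the `FakeCardClient` stand-in are all hypothetical names chosen for illustration. They are not AgentCard's SDK, so map them onto whatever CLI or MCP calls you actually use.

```typescript
// Hypothetical client interface; method names are assumptions, not a real API.
interface CardClient {
  createCard(amountUsd: number): Promise<{ cardId: string }>;
  revokeCard(cardId: string): Promise<void>;
}

// Creates a card for one task, runs the task, and always revokes the card.
async function withTaskCard<T>(
  client: CardClient,
  amountUsd: number,
  task: (cardId: string) => Promise<T>,
): Promise<T> {
  const { cardId } = await client.createCard(amountUsd);
  try {
    return await task(cardId);
  } finally {
    // Runs on success and on failure, so no funded card outlives its task.
    await client.revokeCard(cardId);
  }
}

// In-memory stand-in so the sketch can run without a real backend.
class FakeCardClient implements CardClient {
  active = new Set<string>();
  private nextId = 1;

  async createCard(_amountUsd: number): Promise<{ cardId: string }> {
    const cardId = `card_${this.nextId++}`;
    this.active.add(cardId);
    return { cardId };
  }

  async revokeCard(cardId: string): Promise<void> {
    this.active.delete(cardId);
  }
}
```

Putting the revoke in `finally` is the design choice that matters here: a task that throws halfway through still leaves no active card behind.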
Per-session limits
One card for a multi-step agent session. The card is created when the session starts, funded with the total session budget, and the agent can make multiple purchases across the session until the budget is exhausted or the session ends.
# $100 for a multi-step agent session
$ agent-cards cards create --amount 100
Per-session limits trade granularity for convenience. You don't need to predict the cost of each individual step: you set a session budget and let the agent work within it. The downside is that you lose per-step cost attribution unless you review each card's transaction history (transactions are tracked automatically via webhooks; see AI Agent Expense Tracking).
Per-session limits work well when agent tasks involve unpredictable numbers of sub-purchases, or when the task sequence is not known in advance.
Per-agent limits
A persistent card for a specific agent identity, funded with a recurring budget. The card is topped up periodically (daily, weekly) rather than per task or per session.
This pattern is less common and carries higher risk — the card has a larger active balance at any given time, and a compromise or misbehavior can exhaust the full balance. It makes sense for high-frequency agents where creating a card per task would be operationally expensive, and where the agent's behavior is well-understood and monitored closely.
If you use per-agent limits, keep the funded amount low and top up frequently rather than loading a large amount infrequently. A card with $5 on it that gets topped up daily has a much smaller blast radius than a card with $150 loaded for the week.
What happens when the limit is hit
When an agent attempts a charge that would exceed the card's balance, the charge declines. The agent receives a payment failure response. It cannot proceed with that purchase.
From the agent's perspective, this is a hard stop. There is no way for the agent to override the decline, request additional funds from the card, or find another path through the same card. If the agent's task requires more budget than the card holds, the task fails at that point.
This is the intended behavior. A spending limit that can be exceeded by the agent is not a spending limit. The failure mode — task fails cleanly with a clear error — is far preferable to the alternative, which is uncapped spending with no structural stop.
Your agent should be designed to handle payment declines gracefully: log the failure, report it to the caller, and stop. It should not retry indefinitely, attempt to find alternative payment methods, or continue the task in a degraded state that obscures the budget problem.
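As a rough sketch of that log, report, and stop behavior, the loop below halts at the first decline. The `ChargeResult` and `TaskReport` shapes are assumptions made for this example, since the post does not specify the real payment failure response. The attempts are synchronous only to keep the sketch short.

```typescript
// Hypothetical result shape for a purchase attempt (not a documented format).
type ChargeResult =
  | { ok: true; amountUsd: number }
  | { ok: false; reason: string };

interface TaskReport {
  status: "completed" | "stopped_on_decline";
  completedSteps: string[];
  failedStep?: string;
  reason?: string;
}

// Runs purchase steps in order and stops at the first decline.
// No retries, no fallback payment method, no continuing in a degraded state.
function runPurchases(
  steps: { name: string; attempt: () => ChargeResult }[],
  log: (message: string) => void,
): TaskReport {
  const completedSteps: string[] = [];
  for (const step of steps) {
    const result = step.attempt();
    if (!result.ok) {
      log(`payment declined at "${step.name}": ${result.reason}`);
      return {
        status: "stopped_on_decline",
        completedSteps,
        failedStep: step.name,
        reason: result.reason,
      };
    }
    completedSteps.push(step.name);
  }
  return { status: "completed", completedSteps };
}
```

Returning a structured report instead of throwing lets the caller see which steps were paid for before the budget ran out.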
Monitoring limits: the check_balance MCP tool
At any point during a task, the agent or the orchestrating system can check the remaining balance on a card using the check_balance MCP tool:
{
  "name": "check_balance",
  "description": "Check the current balance on a card",
  "inputSchema": {
    "type": "object",
    "properties": {
      "card_id": {
        "type": "string",
        "description": "The card ID to check"
      }
    },
    "required": ["card_id"]
  }
}
Or via the CLI:
$ agent-cards balance card_abc123
Remaining: $8.47 of $10.00
Balance checks are useful in several situations:
- Before a large purchase — the agent can check whether the remaining balance is sufficient before attempting a charge that will likely decline.
- Progress monitoring — an orchestrating system can poll balance at intervals to track how much of the budget has been consumed.
- Budget alerts — build a check into your agent loop that warns when balance drops below a threshold, so you can intervene before the limit is hit if needed.
- Post-task accounting — check balance after task completion to see what was actually spent vs allocated.
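The budget-alert and post-task accounting cases reduce to simple arithmetic on the remaining and funded amounts. A minimal sketch, assuming you already have both numbers in USD (for example from the CLI output above), follows. The function names and the 20% default threshold are illustrative choices, not part of AgentCard.

```typescript
type BudgetStatus = "ok" | "low" | "exhausted";

// Classifies a card's remaining balance against a warning threshold.
// thresholdFraction is the share of the funded amount below which we warn.
function budgetStatus(
  remainingUsd: number,
  fundedUsd: number,
  thresholdFraction = 0.2,
): BudgetStatus {
  if (fundedUsd <= 0) throw new Error("fundedUsd must be positive");
  if (remainingUsd <= 0) return "exhausted";
  return remainingUsd / fundedUsd < thresholdFraction ? "low" : "ok";
}

// Amount spent so far, rounded to cents, for post-task accounting.
function spentUsd(remainingUsd: number, fundedUsd: number): number {
  return Math.round((fundedUsd - remainingUsd) * 100) / 100;
}
```

With $8.47 remaining of $10.00, `budgetStatus(8.47, 10)` returns `"ok"` and `spentUsd(8.47, 10)` returns `1.53`.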
Using balance to make spending decisions
An agent that checks its balance before each purchase can make smarter decisions. If balance is low, it can prioritize the most important remaining purchases rather than running out of funds mid-task. For example:
const balance = await mcp.call("check_balance", { card_id: card.card_id });
if (balance.remaining < estimatedCost) {
  // Report insufficient budget rather than attempting and failing
  throw new Error(`Insufficient card balance: $${balance.remaining} remaining, $${estimatedCost} needed`);
}
This is cleaner than letting the charge fail — it gives the agent a chance to respond intelligently to budget constraints rather than hitting a hard error mid-purchase.
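Prioritizing the most important remaining purchases can be as simple as a greedy pass over a planned list. The sketch below assumes the agent keeps its own list of planned purchases with a cost estimate and a priority. The `PlannedPurchase` shape and `planWithinBalance` are hypothetical, and a greedy pass is a simple heuristic, not an optimal allocation.

```typescript
interface PlannedPurchase {
  name: string;
  costCents: number;
  priority: number; // higher means more important
}

// Greedy selection: walk purchases from highest to lowest priority and keep
// each one that still fits in the remaining balance. Not guaranteed optimal.
function planWithinBalance(purchases: PlannedPurchase[], remainingCents: number) {
  const selected: PlannedPurchase[] = [];
  const skipped: PlannedPurchase[] = [];
  let leftCents = remainingCents;
  // Copy before sorting so the caller's array is not reordered.
  const byPriority = [...purchases].sort((a, b) => b.priority - a.priority);
  for (const purchase of byPriority) {
    if (purchase.costCents <= leftCents) {
      selected.push(purchase);
      leftCents -= purchase.costCents;
    } else {
      skipped.push(purchase);
    }
  }
  return { selected, skipped, leftCents };
}
```

The skipped list is worth reporting back to the caller, so the budget shortfall stays visible and is not silently absorbed.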
For more on the security implications of card lifecycle management and the full threat model, see Single-Use Virtual Cards for AI: How the Security Model Works. For a broader checklist of security practices to combine with spending limits, see the AI agent spending security checklist.
Summary
Spending limits for AI agents work when they are structural, not advisory. The AgentCard load-based model makes the card's funded amount the hard ceiling — no application logic required, no bypass possible. The pattern is:
- Decide on a limit for the task, session, or agent.
- Create a card with that exact amount using agent-cards cards create --amount <n>.
- Give the agent the card ID. It retrieves credentials when needed, uses the card, and the balance decrements automatically.
- Monitor balance with check_balance if the agent needs to make budget-aware decisions.
That's the complete pattern. No custom middleware, no limit-enforcement logic, no configuration that can be misconfigured. The limit is the amount you loaded, and it's enforced at the network level.