Security Monitoring
AgentLattice continuously monitors your AI agents for behavioral anomalies, automatically detects threats, and provides circuit breaker controls to halt misbehaving agents before they cause damage. This is the operational security layer that turns agent governance from a set of rules into an active defense system.
Behavioral Anomaly Detection
Every action an agent takes is analyzed against its behavioral baseline. During a calibration period, AgentLattice learns what normal behavior looks like for each agent: typical action types, frequency patterns, metadata characteristics, and timing. Once calibrated, deviations from normal trigger anomaly events.
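As a rough illustration of baseline comparison, the sketch below flags a frequency spike when an agent's current hourly action count sits several standard deviations above the mean observed during calibration. This is not AgentLattice's actual detection algorithm, which this page does not specify. The function name, the hourly bucketing, and the threshold of 3 are assumptions made for the example.

```python
from statistics import mean, stdev

def is_frequency_spike(baseline_counts, current_count, z_threshold=3.0):
    """Return True if current_count is far above the baseline.

    baseline_counts: actions per hour seen during calibration (hypothetical input).
    z_threshold: standard deviations that count as a spike (assumed value).
    """
    if len(baseline_counts) < 2:
        return False  # not enough data to form a baseline
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        return current_count > mu  # flat baseline: any increase is a deviation
    return (current_count - mu) / sigma > z_threshold

baseline = [10, 12, 11, 9, 13, 10, 12]
print(is_frequency_spike(baseline, 11))  # within the normal range
print(is_frequency_spike(baseline, 40))  # far above the baseline
```

A real detector would likely combine several signals (action type, timing, metadata) rather than a single rate, but the shape is the same: learn a baseline, then score deviations from it.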
What Gets Detected
| Anomaly Type | Description |
|---|---|
| Frequency spike | Action rate significantly exceeds the agent's baseline |
| Unusual action type | Agent attempts actions it rarely or never performs |
| Off-hours activity | Actions outside the agent's normal operating window |
| Bulk access patterns | Unusually large data reads or writes in a short period |
| Scope escalation | Agent attempts to access resources beyond its typical scope |
| Delegation anomalies | Unusual delegation patterns (excessive children, rapid creation/teardown) |
Each anomaly event includes a severity score (0-100), a threat taxonomy classification, and the raw data that triggered the detection.
Calibration
When a new agent is registered, it enters a calibration period during which AgentLattice builds its behavioral baseline. During calibration:
- All actions are still subject to policy enforcement (policies work immediately)
- Anomaly detection fires but with lower confidence scores
- The dashboard shows calibration progress as a percentage
- No automatic enforcement actions are taken
Calibration typically completes within 7 days of normal agent operation. You can manually mark calibration as complete from the dashboard if the agent has been running consistently.
Circuit Breaker
The circuit breaker is an automatic safety mechanism that halts agents when anomaly severity exceeds a threshold. Think of it like a fuse that trips before the wiring catches fire.
Circuit Breaker States
| State | Meaning | What Happens |
|---|---|---|
| Monitoring | Normal operation | Agent runs freely, anomalies are detected and logged |
| Halted | Anomaly threshold exceeded | Agent actions are rejected with 403 until manually resumed |
| Killed | Operator-initiated shutdown | Agent is deactivated and stays inactive unless explicitly re-registered |
When an agent is halted:
- All subsequent action requests return a CIRCUIT_BREAKER_OPEN error
- The halt event is recorded in the audit trail with the triggering anomaly
- Dashboard shows the agent in red with the anomaly details
- Configured webhook endpoints receive a notification
Resuming a Halted Agent
Only a human operator can resume a halted agent. This is deliberate: automated recovery from a security event would defeat the purpose of the circuit breaker.
Via the dashboard: Navigate to the agent's detail page and click Resume. You must provide a justification that is recorded in the audit trail.
Via MCP: Ask your AI assistant to resume the agent with your reasoning:
"Resume deploy-bot. The anomaly was a scheduled batch job, not a security incident."
The assistant calls resume_agent with your justification. The resume event is hash-chained in the audit trail.
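The halt and resume rules described above can be modeled as a small state machine. The following is a minimal sketch, not the AgentLattice implementation: the class and method names, the in-memory audit log, and the halt threshold of 80 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class BreakerState(Enum):
    MONITORING = "monitoring"
    HALTED = "halted"
    KILLED = "killed"

@dataclass
class CircuitBreaker:
    halt_threshold: int = 80  # assumed cutoff on the 0-100 severity scale
    state: BreakerState = BreakerState.MONITORING
    audit_log: list = field(default_factory=list)

    def record_anomaly(self, severity: int) -> None:
        # Trip the breaker only when severity exceeds the threshold
        if self.state is BreakerState.MONITORING and severity > self.halt_threshold:
            self.state = BreakerState.HALTED
            self.audit_log.append(("halt", f"severity={severity}"))

    def allow_action(self) -> bool:
        # Halted and killed agents have every action rejected
        return self.state is BreakerState.MONITORING

    def resume(self, operator: str, justification: str) -> None:
        # Resuming is a human decision and must carry a justification
        if self.state is not BreakerState.HALTED:
            raise ValueError("only a halted agent can be resumed")
        if not justification.strip():
            raise ValueError("a justification is required to resume")
        self.state = BreakerState.MONITORING
        self.audit_log.append(("resume", f"{operator}: {justification}"))

breaker = CircuitBreaker()
breaker.record_anomaly(95)
print(breaker.allow_action())  # halted, so actions are rejected
breaker.resume("alice", "Scheduled batch job, not a security incident")
print(breaker.allow_action())  # back to monitoring
```

Note that there is no code path that moves from halted back to monitoring without an operator name and a non-empty justification, which mirrors the rule that recovery is never automatic.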
Incidents
When anomalies cluster or escalate, AgentLattice groups them into incidents. An incident represents a potential security event that requires investigation.
Incident Lifecycle
- Open — Anomalies detected, no operator action yet
- Investigating — An operator has acknowledged the incident
- Contained — The threat has been neutralized (agent halted, permissions revoked, etc.)
- Closed — Investigation complete, incident documented
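One way to reason about the lifecycle is as a set of allowed status transitions. The sketch below assumes incidents move forward only, and that an incident under investigation can be closed directly (for example, a false positive). Whether AgentLattice permits skipping states is not stated here, so treat the transition table and all names as hypothetical.

```python
from enum import Enum

class IncidentStatus(Enum):
    OPEN = "open"
    INVESTIGATING = "investigating"
    CONTAINED = "contained"
    CLOSED = "closed"

# Assumed forward-only transitions; the real rules may differ
ALLOWED_TRANSITIONS = {
    IncidentStatus.OPEN: {IncidentStatus.INVESTIGATING},
    IncidentStatus.INVESTIGATING: {IncidentStatus.CONTAINED, IncidentStatus.CLOSED},
    IncidentStatus.CONTAINED: {IncidentStatus.CLOSED},
    IncidentStatus.CLOSED: set(),
}

def can_transition(current: IncidentStatus, target: IncidentStatus) -> bool:
    """Return True if moving from current to target is allowed in this sketch."""
    return target in ALLOWED_TRANSITIONS[current]

print(can_transition(IncidentStatus.OPEN, IncidentStatus.INVESTIGATING))
print(can_transition(IncidentStatus.CLOSED, IncidentStatus.OPEN))
```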
Incident Triage via MCP
The MCP integration lets you triage incidents conversationally, without leaving your AI assistant:
You: "Show me open incidents."
Claude calls list_incidents and summarizes each by severity, affected agent, and threat type.
You: "The deploy-bot incident looks like a false positive from our batch migration. Close it."
Claude calls the appropriate tools to acknowledge anomalies and update the incident status.
Enforcement Actions
When an agent poses an active threat, operators can take enforcement actions:
| Action | Effect | Reversible? |
|---|---|---|
| Halt | Agent's actions rejected until resumed | Yes — resume with justification |
| Kill | Agent permanently deactivated | Re-registration required |
| Cascade revoke | All delegation chains from this agent terminated | Child agents become inactive |
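To make the cascade revoke row concrete, the sketch below walks a delegation tree breadth-first and collects every descendant of the revoked agent. The dictionary representation of delegations and the function name are assumptions for illustration; AgentLattice's internal data model is not described on this page.

```python
from collections import deque

def cascade_revoke(delegations, root):
    """Return all agents reachable from root via delegation, in breadth-first order.

    delegations: maps a parent agent name to its direct child agents (hypothetical shape).
    """
    revoked = []
    seen = {root}  # guards against cycles in the delegation data
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in delegations.get(parent, []):
            if child not in seen:
                seen.add(child)
                revoked.append(child)
                queue.append(child)
    return revoked

delegations = {
    "deploy-bot": ["build-agent", "test-agent"],
    "build-agent": ["cache-agent"],
}
print(cascade_revoke(delegations, "deploy-bot"))
# prints ['build-agent', 'test-agent', 'cache-agent']
```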
Every enforcement action is recorded as a tamper-evident audit event, including the operator's identity, reasoning, and the evidence that prompted the action.
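Tamper evidence of this kind is commonly achieved with a hash chain, in which each audit entry stores a hash that covers both its own payload and the previous entry's hash. The sketch below shows the general technique using SHA-256. The entry layout, field names, and genesis value are assumptions, not AgentLattice's actual audit format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry

def _digest(prev_hash, payload):
    # Serialize with sorted keys so the same payload always hashes the same way
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_event(chain, payload):
    """Append payload to the chain, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev_hash,
                  "hash": _digest(prev_hash, payload)})

def verify_chain(chain):
    """Return True if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append_event(chain, {"action": "halt", "agent": "deploy-bot", "operator": "alice"})
append_event(chain, {"action": "resume", "agent": "deploy-bot", "operator": "alice"})
print(verify_chain(chain))  # intact chain

chain[0]["payload"]["operator"] = "mallory"  # simulate tampering
print(verify_chain(chain))  # the edit breaks the chain
```

Because every hash depends on the one before it, changing an old entry invalidates that entry and everything after it, so edits cannot be hidden without recomputing the whole chain.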
Dashboard Indicators
The dashboard provides at-a-glance security monitoring:
- Fleet health bar at the top shows green (all monitoring), yellow (calibrating agents), or red (halted/anomalous agents)
- Agent cards show circuit breaker state with color coding
- Anomaly timeline visualizes detection events over the past 24 hours
- Incident queue surfaces open incidents requiring attention
Webhook Notifications
Configure webhooks to receive real-time notifications for security events:
- Anomaly detected (with severity threshold filtering)
- Circuit breaker state changes (halted, resumed, killed)
- Incident created or escalated
- Enforcement actions taken
See Webhooks for setup instructions.
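On the receiving side, a handler typically decides which events are worth paging someone for. The sketch below assumes each webhook delivers a JSON object with an event_type and, for anomalies, a severity field. That payload shape is an assumption, as are the circuit_breaker.halted and incident.created event names; only anomaly.detected and enforcement.triggered appear elsewhere on this page. Check the Webhooks page for the actual payload format.

```python
# Event types that always alert in this sketch (partly hypothetical names)
ALWAYS_ALERT = {"circuit_breaker.halted", "incident.created", "enforcement.triggered"}

def should_alert(event, min_severity=70):
    """Decide whether a parsed webhook event warrants an immediate alert.

    event: dict parsed from the webhook body (assumed shape).
    min_severity: assumed cutoff on the 0-100 severity scale.
    """
    event_type = event.get("event_type", "")
    if event_type == "anomaly.detected":
        # Low-severity anomalies are left for the weekly timeline review
        return event.get("severity", 0) >= min_severity
    return event_type in ALWAYS_ALERT

print(should_alert({"event_type": "anomaly.detected", "severity": 35}))
print(should_alert({"event_type": "anomaly.detected", "severity": 90}))
print(should_alert({"event_type": "circuit_breaker.halted"}))
```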
SDK Integration
Agents can query their own security state and subscribe to governance events programmatically.
Checking Circuit Breaker State
Use whoami() to check the agent's current circuit breaker state before performing sensitive operations:
import asyncio
import os

from agentlattice import AgentLattice

async def main():
    al = AgentLattice(api_key=os.environ["AL_API_KEY"])
    info = await al.whoami()
    if info.cb_state == "HALT":
        print("Agent is halted, skipping operation")
    elif info.cb_state == "WARN":
        print("Anomalies detected, proceeding with caution")

asyncio.run(main())
See the Python SDK reference for the full response shape.
Governance Posture Score
Use posture() to get a 0-100 governance health score for the workspace:
# Run inside an async function, with `al` created as in the previous example
result = await al.posture()
print(f"Score: {result.score}/100")
for name, comp in result.components.items():
    print(f"  {name}: {comp.score}/{comp.max}")
The posture score factors in policy coverage, audit chain integrity, anomaly rates, and approval response times. See the Python SDK reference for component details.
Realtime Anomaly Subscriptions
Subscribe to governance events via WebSocket to react to anomalies in real time. Requires pip install agentlattice[realtime].
import asyncio
import os

from agentlattice import AgentLattice, GovernanceEvent

def on_anomaly(event: GovernanceEvent):
    # The subscription below also delivers enforcement events, so filter here
    if event.event_type == "anomaly.detected":
        print(f"Anomaly on agent {event.agent_id}: {event.data}")

async def main():
    al = AgentLattice(api_key=os.environ["AL_API_KEY"])
    await al.subscribe("org-id", on_anomaly, events=["anomaly.detected", "enforcement.triggered"])

asyncio.run(main())
See the Python SDK reference for all event types and filtering options.
Best Practices
- Do not disable anomaly detection for convenience. False positives should be acknowledged and fed back as training data, not silenced by turning off monitoring.
- Set up webhook alerts for halted agents. You want to know immediately when a circuit breaker trips, especially outside business hours.
- Review the anomaly timeline weekly. Even low-severity anomalies can reveal drift in agent behavior that may indicate a misconfiguration or a slow-moving threat.
- Document incident resolutions. When closing an incident, include what happened, why, and what changed. This builds organizational knowledge for future triage.