Security Monitoring

AgentLattice continuously monitors your AI agents for behavioral anomalies, automatically detects threats, and provides circuit breaker controls to halt misbehaving agents before they cause damage. This is the operational security layer that turns agent governance from a set of rules into an active defense system.

Behavioral Anomaly Detection

Every action an agent takes is analyzed against its behavioral baseline. During a calibration period, AgentLattice learns what normal behavior looks like for each agent: typical action types, frequency patterns, metadata characteristics, and timing. Once calibrated, deviations from normal trigger anomaly events.
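
To make the idea of a baseline concrete, here is a minimal sketch of one way a frequency spike could be flagged: compare the current action rate with the mean and standard deviation of the rates seen during calibration. This is an illustration only and not AgentLattice's actual detection algorithm; the function name `is_frequency_spike`, the sample data, and the z-score cutoff of 3.0 are all assumptions.

```python
from statistics import mean, stdev


def is_frequency_spike(baseline_rates, current_rate, z_threshold=3.0):
    """Return True if current_rate sits far above the calibrated baseline.

    baseline_rates: actions-per-minute samples collected during calibration.
    z_threshold: hypothetical cutoff, in standard deviations above the mean.
    """
    if len(baseline_rates) < 2:
        # Not enough samples to estimate spread; treat as still calibrating.
        return False
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        # Perfectly steady baseline: any increase counts as a deviation.
        return current_rate > mu
    return (current_rate - mu) / sigma > z_threshold


baseline = [10, 12, 11, 9, 10, 13, 11]    # made-up calibration samples
print(is_frequency_spike(baseline, 12))   # prints False: within normal range
print(is_frequency_spike(baseline, 60))   # prints True: far above baseline
```

A real detector would also weigh action types, timing, and metadata, as described above; the sketch covers only the rate dimension.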

What Gets Detected

| Anomaly Type | Description |
| --- | --- |
| Frequency spike | Action rate significantly exceeds the agent's baseline |
| Unusual action type | Agent attempts actions it rarely or never performs |
| Off-hours activity | Actions outside the agent's normal operating window |
| Bulk access patterns | Unusually large data reads or writes in a short period |
| Scope escalation | Agent attempts to access resources beyond its typical scope |
| Delegation anomalies | Unusual delegation patterns (excessive children, rapid creation/teardown) |

Each anomaly event includes a severity score (0-100), a threat taxonomy classification, and the raw data that triggered the detection.
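
The three parts of an anomaly event can be pictured as a small record type. The sketch below is a hypothetical shape for illustration; the class name `AnomalyEvent`, its field names, and the `needing_review` helper with its cutoff of 70 are assumptions, not the real schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnomalyEvent:
    # Hypothetical shape; field names are illustrative, not the real schema.
    agent_id: str
    anomaly_type: str      # e.g. "frequency_spike"
    severity: int          # 0-100 severity score
    threat_class: str      # threat taxonomy classification
    raw_data: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        if not 0 <= self.severity <= 100:
            raise ValueError("severity must be between 0 and 100")


def needing_review(events, min_severity=70):
    """Return events at or above min_severity, most severe first."""
    flagged = [e for e in events if e.severity >= min_severity]
    return sorted(flagged, key=lambda e: e.severity, reverse=True)


events = [
    AnomalyEvent("deploy-bot", "frequency_spike", 85, "resource_abuse"),
    AnomalyEvent("report-bot", "off_hours_activity", 30, "policy_drift"),
    AnomalyEvent("deploy-bot", "scope_escalation", 92, "privilege_escalation"),
]
for event in needing_review(events):
    print(event.agent_id, event.anomaly_type, event.severity)
```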

Calibration

When a new agent is registered, it enters a calibration period during which AgentLattice builds its behavioral baseline. During calibration:

  • All actions are still subject to policy enforcement (policies work immediately)
  • Anomaly detection fires but with lower confidence scores
  • The dashboard shows calibration progress as a percentage
  • No automatic enforcement actions are taken

Calibration typically completes within 7 days of normal agent operation. You can manually mark calibration as complete from the dashboard if the agent has been running consistently.

Circuit Breaker

The circuit breaker is an automatic safety mechanism that halts agents when anomaly severity exceeds a threshold. Think of it like a fuse that trips before the wiring catches fire.

Circuit Breaker States

| State | Meaning | What Happens |
| --- | --- | --- |
| Monitoring | Normal operation | Agent runs freely; anomalies are detected and logged |
| Halted | Anomaly threshold exceeded | Agent actions are rejected with 403 until manually resumed |
| Killed | Operator-initiated shutdown | Agent is deactivated and stays inactive until explicitly re-registered |

When an agent is halted:

  • All subsequent action requests return a CIRCUIT_BREAKER_OPEN error
  • The halt event is recorded in the audit trail with the triggering anomaly
  • Dashboard shows the agent in red with the anomaly details
  • Configured webhook endpoints receive a notification

Resuming a Halted Agent

Only a human operator can resume a halted agent. This is deliberate: automated recovery from a security event would defeat the purpose of the circuit breaker.

Via the dashboard: Navigate to the agent's detail page and click Resume. You must provide a justification that is recorded in the audit trail.

Via MCP: Ask your AI assistant to resume the agent with your reasoning:

"Resume deploy-bot. The anomaly was a scheduled batch job, not a security incident."

The assistant calls resume_agent with your justification. The resume event is hash-chained in the audit trail.
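
The halt and resume rules can be summarized as a small state machine. What follows is a toy model under stated assumptions, not the AgentLattice implementation: the class names, the `halt_threshold` default of 80, and the in-memory `audit_log` list are all hypothetical.

```python
from enum import Enum


class BreakerState(Enum):
    MONITORING = "monitoring"
    HALTED = "halted"
    KILLED = "killed"


class CircuitBreaker:
    """Toy model of the three states; not the AgentLattice implementation."""

    def __init__(self, halt_threshold=80):
        # halt_threshold is a hypothetical severity cutoff on the 0-100 scale.
        self.halt_threshold = halt_threshold
        self.state = BreakerState.MONITORING
        self.audit_log = []

    def record_anomaly(self, severity):
        # Only a monitoring agent can trip; halted or killed agents stay put.
        if self.state is BreakerState.MONITORING and severity > self.halt_threshold:
            self.state = BreakerState.HALTED
            self.audit_log.append(("halt", f"severity={severity}"))

    def allow_action(self):
        # Actions are accepted only while the breaker is in Monitoring.
        return self.state is BreakerState.MONITORING

    def resume(self, operator, justification):
        # Resuming needs a human operator and a non-empty justification.
        if self.state is not BreakerState.HALTED:
            raise RuntimeError("only a halted agent can be resumed")
        if not justification.strip():
            raise ValueError("a justification is required to resume")
        self.state = BreakerState.MONITORING
        self.audit_log.append(("resume", f"{operator}: {justification}"))

    def kill(self, operator):
        self.state = BreakerState.KILLED
        self.audit_log.append(("kill", operator))


breaker = CircuitBreaker()
breaker.record_anomaly(95)
print(breaker.state, breaker.allow_action())
breaker.resume("alice", "Scheduled batch job, not a security incident.")
print(breaker.state, breaker.allow_action())
```

Note that there is no code path that resumes the agent automatically; the only way out of the halted state in this model is an explicit `resume` call carrying a justification.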

Incidents

When anomalies cluster or escalate, AgentLattice groups them into incidents. An incident represents a potential security event that requires investigation.

Incident Lifecycle

  1. Open — Anomalies detected, no operator action yet
  2. Investigating — An operator has acknowledged the incident
  3. Contained — The threat has been neutralized (agent halted, permissions revoked, etc.)
  4. Closed — Investigation complete, incident documented
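
The lifecycle above can be modeled as an ordered set of statuses. This sketch assumes that an incident only moves forward and that stages may be skipped (for example, closing a false positive directly); neither rule is confirmed by AgentLattice, and the names `IncidentStatus` and `advance` are hypothetical.

```python
from enum import IntEnum


class IncidentStatus(IntEnum):
    OPEN = 1
    INVESTIGATING = 2
    CONTAINED = 3
    CLOSED = 4


def advance(current, target):
    """Return the new status, rejecting backward moves.

    Assumption: stages may be skipped, but an incident never returns
    to an earlier stage and never leaves CLOSED.
    """
    if current is IncidentStatus.CLOSED:
        raise ValueError("a closed incident cannot change status")
    if target <= current:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


status = IncidentStatus.OPEN
status = advance(status, IncidentStatus.INVESTIGATING)
status = advance(status, IncidentStatus.CLOSED)   # skips CONTAINED
print(status.name)
```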

Incident Triage via MCP

The MCP integration lets you triage incidents conversationally:

You: "Show me open incidents."

Claude calls list_incidents and summarizes each by severity, affected agent, and threat type.

You: "The deploy-bot incident looks like a false positive from our batch migration. Close it."

Claude calls the appropriate tools to acknowledge anomalies and update the incident status.

Enforcement Actions

When an agent poses an active threat, operators can take enforcement actions:

| Action | Effect | Reversible? |
| --- | --- | --- |
| Halt | Agent's actions rejected until resumed | Yes, resume with justification |
| Kill | Agent deactivated | Only by re-registering the agent |
| Cascade revoke | All delegation chains from this agent terminated | Child agents become inactive |

Every enforcement action is recorded as a tamper-evident audit event, including the operator's identity, reasoning, and the evidence that prompted the action.
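
For readers unfamiliar with tamper-evident logs, the following is a generic sketch of a hash chain, in which each entry's hash covers the previous entry's hash so that editing any past entry invalidates everything after it. It is not AgentLattice's audit format; the entry layout, the `GENESIS` value, and the function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    # Serialize with sorted keys so the same payload always hashes the same.
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain, payload):
    """Append payload to the chain, linking it to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    })


def verify(chain):
    """Return True if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit = []
append_event(audit, {"action": "halt", "operator": "alice", "reason": "severity 95"})
append_event(audit, {"action": "resume", "operator": "alice", "reason": "batch job"})
print(verify(audit))                        # prints True
audit[0]["payload"]["reason"] = "edited"    # tamper with history
print(verify(audit))                        # prints False
```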

Dashboard Indicators

The dashboard provides at-a-glance security monitoring:

  • Fleet health bar at the top shows green (all monitoring), yellow (calibrating agents), or red (halted/anomalous agents)
  • Agent cards show circuit breaker state with color coding
  • Anomaly timeline visualizes detection events over the past 24 hours
  • Incident queue surfaces open incidents requiring attention

Webhook Notifications

Configure webhooks to receive real-time notifications for security events:

  • Anomaly detected (with severity threshold filtering)
  • Circuit breaker state changes (halted, resumed, killed)
  • Incident created or escalated
  • Enforcement actions taken

See Webhooks for setup instructions.
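
On the receiving side, a handler typically decides which notifications should alert someone right away. The sketch below routes a delivery based on event type and severity. The payload shape (`event_type`, `data.severity`) and the `handle_webhook` function are assumptions for illustration; the event names are borrowed from the SDK subscription example on this page, and actual webhook payloads may differ, so check the Webhooks page for the real format.

```python
import json


def handle_webhook(body: bytes, min_severity: int = 70) -> str:
    """Classify one webhook delivery as "page", "log", or "ignore".

    Assumes a JSON body with an "event_type" string and, for anomalies,
    a numeric "severity" under "data". Both are hypothetical fields.
    """
    event = json.loads(body)
    event_type = event.get("event_type", "")
    if event_type == "enforcement.triggered":
        # An agent was halted or killed: always alert a human.
        return "page"
    if event_type == "anomaly.detected":
        severity = event.get("data", {}).get("severity", 0)
        return "page" if severity >= min_severity else "log"
    return "ignore"


sample = json.dumps({
    "event_type": "anomaly.detected",
    "agent_id": "deploy-bot",
    "data": {"severity": 85},
}).encode("utf-8")
print(handle_webhook(sample))
```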

SDK Integration

Agents can query their own security state and subscribe to governance events programmatically.

Checking Circuit Breaker State

Use whoami() to check the agent's current circuit breaker state before performing sensitive operations:

import asyncio
import os

from agentlattice import AgentLattice

async def main():
    al = AgentLattice(api_key=os.environ["AL_API_KEY"])
    # Check the circuit breaker state before doing anything sensitive
    info = await al.whoami()
    if info.cb_state == "HALT":
        print("Agent is halted; skipping operation")
    elif info.cb_state == "WARN":
        print("Anomalies detected; proceeding with caution")

asyncio.run(main())

See the Python SDK reference for the full response shape.

Governance Posture Score

Use posture() to get a 0-100 governance health score for the workspace:

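# Run inside an async function, reusing the `al` client created earlier
result = await al.posture()
print(f"Score: {result.score}/100")
# Each component reports its own score out of its maximum
for name, comp in result.components.items():
    print(f"  {name}: {comp.score}/{comp.max}")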

The posture score factors in policy coverage, audit chain integrity, anomaly rates, and approval response times. See the Python SDK reference for component details.

Realtime Anomaly Subscriptions

Subscribe to governance events via WebSocket to react to anomalies in real time. This requires the realtime extra: pip install "agentlattice[realtime]" (the quotes stop shells such as zsh from interpreting the brackets).

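import asyncio
import os

from agentlattice import AgentLattice, GovernanceEvent

def on_anomaly(event: GovernanceEvent) -> None:
    # Called for every subscribed event; react only to anomaly detections
    if event.event_type == "anomaly.detected":
        print(f"Anomaly on agent {event.agent_id}: {event.data}")

async def main():
    al = AgentLattice(api_key=os.environ["AL_API_KEY"])
    # Replace "org-id" with your own organization ID
    await al.subscribe("org-id", on_anomaly, events=["anomaly.detected", "enforcement.triggered"])

asyncio.run(main())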

See the Python SDK reference for all event types and filtering options.

Best Practices

  • Do not disable anomaly detection for convenience. False positives should be acknowledged and fed back as training data, not silenced by turning off monitoring.
  • Set up webhook alerts for halted agents. You want to know immediately when a circuit breaker trips, especially outside business hours.
  • Review the anomaly timeline weekly. Even low-severity anomalies can reveal drift in agent behavior that may indicate a misconfiguration or a slow-moving threat.
  • Document incident resolutions. When closing an incident, include what happened, why, and what changed. This builds organizational knowledge for future triage.