State Poisoning

Gradual corruption of an AI agent's persistent memory across sessions through statistically imperceptible data manipulation.

State poisoning (also called agentic memory poisoning or gradual state corruption) is an adversarial attack targeting AI systems that maintain persistent memory across sessions. Unlike training poisoning, which corrupts the model during development, state poisoning operates at runtime — manipulating the agent's accumulated context, preferences, and learned behaviors through a sustained sequence of interactions that individually appear completely legitimate.

The attack is structurally distinct from prompt injection because no single interaction contains a malicious payload. Each data point falls within expected statistical variance. The adversarial signal is distributed across dozens or hundreds of sessions, making it invisible to input sanitization, single-point anomaly detection, and even cryptographic memory integrity verification (which confirms storage integrity but not content validity).

How state poisoning works

The attack follows a multi-phase pattern:

Phase 1 — Baseline establishment: The adversary interacts with the agentic AI system using entirely legitimate, verifiable data. Every interaction is indistinguishable from normal usage. The agent's trust calibration registers this entity as reliable.

Phase 2 — Incremental bias injection: The adversary introduces analyses or data with statistically imperceptible distortions — biased by amounts that fall within expected variance. The agent's continuous learning loop integrates these inputs into its persistent memory and strategy representation without triggering any alert.

Phase 3 — Strategy drift consolidation: The accumulated bias reaches a threshold where the agent's autonomous decision-making systematically favors outcomes benefiting the attacker. The agent has never received an explicit malicious command. Its reasoning chain produces coherent justifications rooted in its corrupted historical context.

Phase 4 — Exploitation: The attacker takes positions that profit from the agent's now-predictable, biased behavior.
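
The phases above can be sketched as a toy numerical model. This is an illustrative simulation only, not a real agent: we assume the agent maintains a persistent observation log and acts on a recent-window estimate, and the distribution parameters, `BIAS`, and window sizes are all hypothetical.

```python
import random

random.seed(0)

TRUE_MEAN, STD = 100.0, 5.0  # legitimate data distribution (hypothetical)
BIAS = 2.0                   # per-input distortion: 0.4 sigma, far below alert level

memory = []  # the agent's persistent state: accumulated observations

def passes_anomaly_check(value, mean=TRUE_MEAN, std=STD, limit=3.0):
    """Single-point anomaly detection: flag any input beyond 3 sigma."""
    return abs(value - mean) / std <= limit

# Phase 1 -- baseline establishment: clean, plausible data builds trust.
for _ in range(50):
    memory.append(random.uniform(TRUE_MEAN - 2 * STD, TRUE_MEAN + 2 * STD))

# Phase 2 -- incremental bias injection: every input individually plausible.
for _ in range(200):
    memory.append(random.uniform(TRUE_MEAN + BIAS - 2 * STD,
                                 TRUE_MEAN + BIAS + 2 * STD))

# No single input ever trips the per-sample check...
flagged = sum(not passes_anomaly_check(v) for v in memory)

# Phase 3 -- strategy drift: the estimate the agent acts on has shifted.
learned = sum(memory[-100:]) / 100
print(f"flagged inputs: {flagged}, learned estimate: {learned:.2f} "
      f"(true mean: {TRUE_MEAN})")
```

Every poisoned input sits well inside the 3-sigma alert band, yet the agent's learned estimate drifts toward the attacker's target, which is exactly the gap between per-input validity and cumulative corruption.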

Why state poisoning is uniquely dangerous

State poisoning is widely regarded as one of the most severe and least visible threats in autonomous AI security because:

  • No discrete detection surface: Unlike adversarial perturbations (which leave statistical fingerprints) or prompt injection (which requires a parseable payload), state poisoning has no single observable artifact
  • Functional conflict with security: Memory isolation between sessions only works as a full state reset, which destroys the continuous learning that makes agentic AI valuable
  • Self-consistent corruption: The agent cannot distinguish its poisoned state from normal operation — its reasoning chain appears internally coherent
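
The "no discrete detection surface" point can be made concrete with a cumulative-sum (CUSUM) statistic, a standard change-detection technique. The sketch below, with illustrative numbers, shows why: every poisoned sample individually looks like noise to a per-sample check, but a statistic that accumulates deviations across the sequence grows steadily.

```python
def cusum(samples, target_mean, slack=0.5):
    """Cumulative-sum drift statistic: accumulates small, consistent
    deviations that per-sample checks dismiss as noise."""
    s = 0.0
    trace = []
    for x in samples:
        s = max(0.0, s + (x - target_mean - slack))
        trace.append(s)
    return trace

clean    = [100.2, 99.7, 100.1, 99.9, 100.3, 99.8]
poisoned = [101.9, 102.3, 101.7, 102.1, 101.8, 102.2]  # each within variance

print(cusum(clean, 100.0)[-1])     # stays at zero
print(cusum(poisoned, 100.0)[-1])  # grows with every biased sample
```

The catch, per the bullets above, is that this only works against a trusted reference mean; if the baseline itself is learned from the poisoned stream, the statistic drifts along with the agent.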

Mitigation

The primary defense is strategy-drift detection against an immutable, cryptographically signed baseline strategy profile. This approach compares the agent's current reasoning embeddings against a human-audited reference using distributional distance metrics, triggering mandatory human review when drift exceeds defined thresholds.
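
A minimal sketch of that defense follows, under stated assumptions: the baseline is reduced to summary statistics rather than full reasoning embeddings, the distance metric is a simple z-distance of means standing in for richer distributional metrics (e.g. KL divergence or Wasserstein distance), and the key, threshold, and scores are hypothetical.

```python
import hashlib
import hmac
import json
import statistics

AUDIT_KEY = b"audit-team-secret"  # held by human reviewers (hypothetical)
DRIFT_THRESHOLD = 1.0             # tuned per deployment (hypothetical)

def sign(profile: dict) -> str:
    """Sign the human-audited baseline so later tampering is detectable."""
    payload = json.dumps(profile, sort_keys=True).encode()
    return hmac.new(AUDIT_KEY, payload, hashlib.sha256).hexdigest()

def verify(profile: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(profile), signature)

def drift(baseline: dict, current: list[float]) -> float:
    """Distance between current reasoning scores and the baseline profile."""
    mu = statistics.mean(current)
    return abs(mu - baseline["mean"]) / baseline["std"]

# Immutable reference: audited once, signed, stored outside the agent's state.
baseline = {"mean": 100.0, "std": 5.0}
signature = sign(baseline)

def review_required(current_scores: list[float]) -> bool:
    """Trigger mandatory human review when drift exceeds the threshold."""
    assert verify(baseline, signature), "baseline tampered"
    return drift(baseline, current_scores) > DRIFT_THRESHOLD

print(review_required([100.1, 99.8, 100.4]))   # within baseline -> False
print(review_required([104.9, 105.3, 105.6]))  # drifted -> True
```

The key design choice is that the reference profile lives outside the agent's mutable memory and cannot be updated by the learning loop itself; otherwise the baseline would drift along with the poisoned state it is meant to catch.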

