CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Alexandre Le Mercier, Thomas Demeester, Chris Develder | March 12, 2026 | arXiv

Key Takeaway

CLASP provides a practical, lightweight defense against poisoning attacks on state space models by detecting malicious tokens before they reach downstream tasks, with strong generalization to unseen attack patterns.

Summary

State space models (SSMs) such as Mamba are fast alternatives to Transformers, but they are vulnerable to Hidden State Poisoning Attacks, which inject malicious tokens that corrupt the model's memory, that is, the recurrent hidden state carried from one token to the next.
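To make the idea of corrupting a recurrent memory concrete, here is a minimal sketch of a scalar gated recurrence. It is not Mamba and it is not the attack studied in the paper: the function `run_toy_ssm`, the gate formula, and the numeric "tokens" are all hypothetical choices made for illustration. It only shows how a single extreme input can overwrite the state and keep influencing every later step.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def run_toy_ssm(tokens, gate_bias=4.0):
    """Run a toy scalar recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    a_t = sigmoid(gate_bias - |x_t|) is an input-dependent retention gate.
    This is an assumed stand-in for a selective SSM's learned gate, not
    the parameterization used by Mamba or by the paper.
    """
    h = 0.0
    history = []
    for x in tokens:
        a = sigmoid(gate_bias - abs(x))  # large |x| drives retention toward 0
        h = a * h + (1.0 - a) * x        # low retention lets x overwrite h
        history.append(h)
    return history


# Benign sequence: twenty identical, small-magnitude inputs.
benign = [1.0] * 20

# Same sequence with one extreme value injected at position 10
# (a hypothetical "poison token").
poisoned = benign[:10] + [-50.0] + benign[11:]

clean_state = run_toy_ssm(benign)[-1]
poisoned_state = run_toy_ssm(poisoned)[-1]

print(f"final state, benign:   {clean_state:.3f}")
print(f"final state, poisoned: {poisoned_state:.3f}")
```

In this toy setting the injected value wipes the accumulated state, and the nine benign inputs that follow are not enough to recover it. A defense in the spirit of the summary would flag such a token before its effect reaches a downstream task.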

safety efficiency architecture

Key Terms

state-space-models hidden-state-poisoning block-output-embeddings gradient-boosting