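An alternative to transformers that processes sequences more efficiently by maintaining a fixed-size hidden state that is updated as each token is read. Each step depends only on the current state and the current token, so total cost grows linearly with sequence length and the memory needed per step stays constant. Self-attention, by contrast, compares each token with all earlier ones.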
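The snippet below is a minimal sketch of the hidden-state update, under a simplifying assumption: the update is linear, diagonal and time-invariant, roughly the shape of a diagonal state space layer. Real RNNs, LSTMs and selective state space models such as Mamba use learned, often nonlinear or input-dependent updates. The function name `diagonal_linear_scan` and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def diagonal_linear_scan(
    tokens: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical illustration: a diagonal linear recurrence over scalar inputs.

    h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t   (update the fixed-size state)
    y_t    = sum_i c[i] * h_t[i]              (read an output from the state)
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length (the state size)")
    state = [0.0] * len(a)  # hidden state starts at zero; its size never grows
    outputs: List[float] = []
    for x in tokens:
        # Each step uses only the previous state and the current token,
        # so the work per token is O(state size), independent of position.
        state = [ai * hi + bi * x for ai, hi, bi in zip(a, state, b)]
        outputs.append(sum(ci * hi for ci, hi in zip(c, state)))
    return outputs, state
```

For example, `diagonal_linear_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])` returns outputs `[1.0, 0.5, 0.25]`: the first token's contribution decays by the factor `a` at each step. Past tokens are never revisited, which is what keeps memory constant. The trade-off is that everything the model retains about earlier tokens must fit in the fixed-size state.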