Ratchet Transformers are a type of neural network architecture proposed in the research paper "Rethinking Attention with Performers" by Krzysztof Choromanski et al. (2021).
The Ratchet Transformer extends the self-attention mechanism used in standard transformer models such as the widely used BERT and GPT families. Self-attention lets the model weigh every position of the input sequence against every other position and extract the features that matter for downstream tasks.
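To make the mechanism concrete, here is a minimal sketch of standard single-head scaled dot-product self-attention in NumPy. The weight matrices, dimensions, and toy inputs are illustrative placeholders, not values taken from any particular model mentioned above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (n, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project inputs to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n, n) pairwise attention scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted sum of values per position

# Toy example: a sequence of 8 tokens with model dimension 16.
rng = np.random.default_rng(0)
n, d_model, d_k = 8, 16, 16
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (8, 16)
```

Note the `(n, n)` score matrix in the middle step: every token attends to every other token, which is what makes the mechanism expressive but also what drives its cost.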
However, self-attention in standard transformers is computationally expensive: its time and memory cost grow quadratically with sequence length, which makes it difficult to scale to longer inputs and larger models. The Ratchet Transformer addresses this issue by introducing a mechanism called "ratcheting" that allows the model to perform self-attention more efficiently.
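The text above does not spell out how "ratcheting" is computed, so the sketch below should not be read as that procedure. Instead, it illustrates the general kernel trick behind efficient-attention methods such as the FAVOR+ mechanism in the Performer paper cited earlier: replace softmax(QKᵀ)V, which materializes an (n, n) matrix, with φ(Q)(φ(K)ᵀV), which can be computed in memory linear in the sequence length. The feature map φ used here (ELU + 1) is a common illustrative stand-in, not the paper's.

```python
import numpy as np

def phi(x):
    # Simple positive feature map (ELU + 1); stands in for a kernel
    # approximation of the softmax, purely for illustration.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Attention in O(n * d^2) time without building the (n, n) score matrix."""
    Qp, Kp = phi(Q), phi(K)                   # (n, d) feature-mapped queries/keys
    kv = Kp.T @ V                             # (d, d) summary of keys and values
    z = Kp.sum(axis=0)                        # (d,) normalizer terms
    return (Qp @ kv) / (Qp @ z + eps)[:, None]

rng = np.random.default_rng(0)
n, d = 1024, 32                               # long sequence, small head dimension
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 32); no (1024, 1024) attention matrix is ever formed
```

Reordering the matrix products this way trades the exact softmax for an approximation, which is the typical price paid by linear-attention variants for their better scaling.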