The Algorithm Distillation paper shows that transformers can improve their policies through trial and error at inference time, purely in-context, without ever updating their weights.

This might be the beginning of a new learning paradigm: in-context learning that is drastically more data-efficient than the gradient-based (SGD) algorithm that generated its training data.

arxiv.org/abs/2210.14215
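
Roughly, the recipe: run an RL algorithm on many tasks, save the *entire* learning histories, train a causal transformer via behavioral cloning to predict the algorithm's actions given the cross-episode history, then freeze it and let it "learn" a held-out task just by conditioning on its own growing history. A minimal PyTorch sketch of that idea (not the authors' code; module names, dimensions, and the tokenization are my own illustrative assumptions):

```python
# Sketch of the Algorithm Distillation idea. Illustrative only, not the
# paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ADTransformer(nn.Module):
    """Causal transformer over a cross-episode history of
    (observation, previous action, previous reward) tokens; predicts the
    source RL algorithm's next action at every step."""
    def __init__(self, obs_dim, n_actions, d_model=64, n_heads=4,
                 n_layers=2, max_len=1024):
        super().__init__()
        # One token per timestep: obs ++ one-hot(prev action) ++ prev reward.
        self.embed = nn.Linear(obs_dim + n_actions + 1, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, tokens):  # tokens: (batch, T, obs_dim + n_actions + 1)
        T = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.encoder(x, mask=mask))  # (batch, T, n_actions)

def distill_step(model, opt, tokens, actions):
    """Behavioral cloning on a slice of a learning history. Because the
    history spans many episodes of an *improving* policy, the model has to
    pick up the policy-improvement operator, not just one fixed policy."""
    logits = model(tokens)
    loss = F.cross_entropy(logits.flatten(0, 1), actions.flatten())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The striking part is the test-time loop: weights stay frozen, you just roll the model on a new task, append each (obs, action, reward) to the context, and act from the logits at the last position. All of the improvement comes from the growing context.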