The Algorithm Distillation paper shows that transformers can improve themselves autonomously through trial and error without ever updating their weights.
This might be the beginning of a new learning paradigm that is drastically faster than SGD.
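To make the claim concrete, here is a minimal Python sketch of the kind of in-context loop the paper describes (not the authors' code): a frozen policy transformer is conditioned on an ever-growing cross-episode history of (state, action, reward) tuples, and any improvement has to come from the longer context, since no gradient step is ever taken. The `frozen_transformer_policy` stub, the toy environment, and all numbers are placeholder assumptions.

```python
import random

# Placeholder for a pre-trained Algorithm Distillation transformer.
# In the paper this is a causal transformer trained on the learning
# histories of an RL algorithm; here it is only a stub that picks an
# action given the full cross-episode history. Its weights are never
# touched anywhere in this loop.
def frozen_transformer_policy(history, state, num_actions=4):
    # A real model would attend over `history`; this stub acts randomly.
    return random.randrange(num_actions)

def toy_env_step(state, action):
    # Placeholder environment: returns (next_state, reward, done).
    reward = 1.0 if action == state % 4 else 0.0
    next_state = state + 1
    return next_state, reward, next_state >= 10

history = []  # cross-episode context of (state, action, reward) tuples
for episode in range(20):
    state, done = 0, False
    episode_return = 0.0
    while not done:
        # The only thing that changes across episodes is the context.
        action = frozen_transformer_policy(history, state)
        next_state, reward, done = toy_env_step(state, action)
        history.append((state, action, reward))
        episode_return += reward
        state = next_state
    print(f"episode {episode}: return {episode_return:.1f}")
    # Note: no loss, no optimizer, no weight update anywhere.
```

The point of the sketch is the shape of the loop, not the stub policy: all "learning" would live in how the transformer uses the accumulated history, which is why the paper frames it as in-context reinforcement learning rather than SGD-based fine-tuning.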