@mhoye So this works for classification, but can it be adapted to work for transformers with attention? I still can't fully get my head around how transformers work, so I don't know if this technique translates, but if it does... 🤯