@mhoye @gabrielesvelto @rmflight At first glance, I think what this paper suggests is that 1) for some classification tasks there's a nicely simple approach that works well, and 2) this is a promising path toward better feature engineering for language models, which will in turn result in a better accuracy-vs-cost trade-off.

13 Jul 2023 at 21:15
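The reply below names the technique as NCD (normalized compression distance). Here is a minimal sketch of how such a classifier can work, assuming the common recipe of gzip as the compressor plus a k-nearest-neighbor vote. This is not the paper's code, and `compressed_len`, `ncd`, and `knn_classify` are hypothetical names. With C(s) as compressed length, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)): texts that share structure compress better together, so they come out closer.

```python
import gzip
from collections import Counter


def compressed_len(s: str) -> int:
    # Size in bytes of the gzip-compressed UTF-8 encoding of s.
    return len(gzip.compress(s.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    # Normalized compression distance:
    # (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    # The single-space separator between x and y is an assumption of this sketch.
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    # train is a list of (text, label) pairs; no model is fitted.
    # Rank every training text by NCD to the query, then take a majority vote
    # among the k nearest labels.
    ranked = sorted((ncd(query, text), label) for text, label in train)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

There is no training step; the cost moves to query time, since each query compresses its concatenation with every training example.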

2 comments

Choong Ng

@mhoye @gabrielesvelto @rmflight If this works out well, we'll see better, smaller models for all tasks (not just classification) that outperform both current DNNs and the NCD technique they use, at moderate cost. There's precedent for this being a successful approach: for example, using frequency-domain data for audio models instead of raw PCM. There's also precedent for discovering that DNNs waste a lot of capacity effectively just routing data around, and then restructuring the network to fix that (ResNets, for example).

13 Jul 2023 at 21:20
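To make the audio precedent above concrete, here is a minimal sketch of turning raw PCM samples into frequency-domain features (a magnitude spectrogram). It assumes mono samples as a list of floats, a Hann window, and arbitrary frame and hop sizes. `magnitude_spectrum` and `spectrogram` are hypothetical names, and the naive DFT is only for self-containment.

```python
import cmath
import math


def magnitude_spectrum(frame: list[float]) -> list[float]:
    # Naive O(n^2) DFT; returns |X[k]| for the non-negative frequency bins
    # k = 0 .. n // 2.
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(pcm: list[float], frame_size: int = 64, hop: int = 32) -> list[list[float]]:
    # Slice the raw PCM into overlapping frames, taper each with a (periodic)
    # Hann window to reduce spectral leakage, and keep the magnitude spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size) for t in range(frame_size)]
    frames = []
    for start in range(0, len(pcm) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(pcm[start:start + frame_size], window)]
        frames.append(magnitude_spectrum(chunk))
    return frames
```

A pure tone that is spread across every PCM sample collapses into a single dominant bin per frame. That is the kind of restructuring that spares a model from having to learn it. A real pipeline would use an FFT (for example `numpy.fft.rfft`) instead of this quadratic loop.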

Choong Ng

@mhoye @gabrielesvelto @rmflight Overall, though, data-driven approaches have tended to win in recent history, so I would expect the useful bits to be incorporated into DNNs rather than DNNs being made obsolete in almost any context. My favorite essay on that topic is Rich Sutton's "The Bitter Lesson": http://incompleteideas.net/IncIdeas/BitterLesson.html

13 Jul 2023 at 21:22