wasabi brain

@NewtonMark this is bad, but I keep wondering: what’s the point of using low-quality data like Slack? Unlike something like The NY Times website, which is copy-edited and has some connection to ground truth, Slack is mostly ungrammatical, informal communication. How does this make an LLM better?

4 comments
justforfun

@virtualinanity @NewtonMark
The following is not an endorsement of “AI” or LLMs, but rather my attempt to answer your question. The language part of an LLM would theoretically benefit from exposure to work-related terminology, vernacular, and informal communication. The latter, I think, will make it easier to mimic how people communicate outside of formal articles or grammatically correct posts. As for being grounded in truth, in my opinion, this is not a concern for those building LLMs.

wasabi brain

@justforfun @NewtonMark thank you. Very well put. I think if they put care into how they integrate these data sets, it might make sense. But if they’re just kinda throwing it all together, mixing NY Times with Slack messages, I could see it causing issues.

immibis
@virtualinanity @NewtonMark they need metric fucktons of data and they've already run out of public data
immibis
@virtualinanity @NewtonMark also they're using Slack data to train a *Slack* AI