Top-level
PMacDiggity

@Meyerweb do they care though? They have a massive, highly tagged data set for LLM training. Since they realized that they can gate and sell the data we created for training new models, it’s likely all they care about. Even if it never gets new content, it’s likely valuable for that as it is today.

15 comments
Felis Cat

@pmacdiggity isn't all of the data generated up to this point already exfild by anyone who would want it though?

PMacDiggity

@Sternness3985 I don’t think so, I think there will be many new players entering the market. Some will be established corporations kicking off new initiatives, some will be new startups. Reddit will want a chunk of that funding. This is likely why they’re shutting down the API so quickly: so they can monetize the data before it gets exfild by enough 3rd parties.

Felis Cat

@pmacdiggity that's a good point. it sounds like reddit believes that, too.

DELETED

@pmacdiggity @Meyerweb

Nope, once they lock the doors and charge, the bloodsuckers will go elsewhere for their free meal of data.

PMacDiggity

@joesabin @Meyerweb it will take many years, maybe decades (since many people like myself won’t trust any new forums moving forward) to create a new dataset like Reddit

DELETED

@pmacdiggity @Meyerweb

They've already harvested it. What value does it have if it loses its inputs? No one is going to buy data they got for free already.

PMacDiggity

@joesabin @Meyerweb OpenAI has it, new startups and new initiatives at existing companies don’t. Lots of competitors are entering this market.

DELETED

@pmacdiggity @Meyerweb

Only time will tell. I for one -- and I know a lot of other folks who would agree -- never trusted Reddit; it always seemed like a viper's den. Too many nasties in there for my taste. So it contains only a subset of human input, to be sure.

AdeptVeritatis

@pmacdiggity @Meyerweb

Training models from yesterday have the same value as marketing data from yesterday.

Jargoggles

@pmacdiggity @Meyerweb There's not much that can be done about data that's already been harvested, but at least you can do a small part in poisoning the well for future scraping. I used PowerDeleteSuite to scrub all my content. You can roll through and edit all your comments: github.com/j0be/PowerDeleteSui

Trebach

@jargoggles @pmacdiggity @Meyerweb Oh good, it IS overwriting first. That's what I've been recommending to people before deleting

iAmAnEngarneer

@trebach @jargoggles @pmacdiggity @Meyerweb neither poisons the dataset; deleting/scrubbing is merely public-facing.

DELETED

@jargoggles @pmacdiggity @Meyerweb I already did that with my 13ish accounts using a different service. I hope others do the same 👍

Andres Jalinton

@pmacdiggity @Meyerweb
@julia
The issue is, LLM "content" is so trash no one likes it. You can't train an LLM to have a good sense of humor and be funny. Sure, it can write correctly and "resume text", but it's not creative and fun like real people. If you ask it for a joke it's just... lame, unless it blatantly copies someone else. And if you want a new joke?

Ragnell the Mildly Unpleasant

@pmacdiggity @Meyerweb That's why #Razit is a thing. There are instructions on how to turn all your posts into gibberish before you leave.
