Simon Willison

Fascinating report from 404 Media's Samantha Cole on a trove of leaked NVIDIA Slack messages and emails about how they're scraping millions of YouTube videos to train their own new foundation video generation model: 404media.co/nvidia-ai-scraping

Posted a few of my own notes here: simonwillison.net/2024/Aug/5/n

It's not surprising to learn that they're doing this - that's practically the industry standard right now - but it's still really interesting to see internal details of what they're collecting and why

4 comments
phillmv

@simon

every now and then i feel like im taking crazy pills because i remember when aaron swartz killed himself because he was going to go to jail forever because he scraped JSTOR,

and eleven years later your manager tells you “sshhhh it’s fine just scrape all of it don’t worry the CEO said it’s fine”

DELETED

@simon
Do you think Nvidia might become more chaos-focused in future?

I genuinely worry about big AI companies becoming chaos agents.

Also do you think they are paying YouTube for that kind of access, or paying folks like youtube-downloader?

Simon Willison

@lewiscowles1986 NVIDIA desperately need to keep the bubble going for as long as possible, so I definitely expect them to try all sorts of things to keep people training and running larger and larger models

DELETED

@simon
I must admit, I wouldn't mind having access to more GPU compute to run models and see if there are any redeeming features. But I also don't think AI has been good for workers' rights, or that it has advanced much beyond creating danger for regular folks.

By "chaos agent", I mean that I worry they might either churn out deepfakes and untrue or biased narratives to benefit themselves, or fail to stop bad actors from using them for that.
