Yiming Wu ✅ Use OurPaint

@davidrevoy @sepia everyone's data will eventually get scraped, so I'm not worried about that. I don't even bother to check whether mine was scraped.

Just do the thing you do; AI doesn't have the personality and the life experiences to make the same thing you make.

7 comments
Tanner Spaw

@chengdulittlea @davidrevoy @sepia agreed on both points, even as someone who is messing with AI art. But I would still want to see a model that's trained on just public domain stuff, with the rest being opt-in.

David Revoy

@tannerspaw Same, I wish a model existed trained exclusively on CC-0 and Public Domain resources.

@chengdulittlea @sepia

David Revoy

@chengdulittlea I know, but it's not the AI that decided one beautiful morning, on its own little legs, to scrape the web for art. 😅

No: it's a team of devs − e.g. working at LAION laion.ai/faq/ − who decided to scrape all the URLs and ALT text, enrich the database with their descriptions, and curate the result (e.g. remove the watermarks, keep the bias of stereotyped work/gender/race). Political choices here.

I just hate that.

(Edit: I removed a Ctrl+V error 😅)
@sepia

A screenshot of https://stablediffusionweb.com/#demo: I asked for a "Photo of a doctor" as a prompt, and I got only white males in the four results from the AI ...
[DATA EXPUNGED]
Tak

@davidrevoy @chengdulittlea @sepia also, it claims they didn't scrape anything at all: Common Crawl is an old project that created dumps of the ~entire internet, and they seem to have simply formatted whatever could be formatted there into a text-image parallel corpus.

Tak

@davidrevoy @chengdulittlea @sepia oh, actually the large one does have a detector for watermarked images and, I believe, one for porn, so that any downstream user can avoid training on such images.

David Revoy

@takloufer

Ha, right. Please ignore my previous reply; I read this one of yours only after sending it. Yes, mature + watermark filtering.

@chengdulittlea @sepia

Ramin Honary

@davidrevoy @chengdulittlea @sepia

So according to Stable Diffusion AIs trained on web-scraped images, ALL doctors are white men wearing blue shirts with neckties, lab coats, and stethoscopes.

AI can solve problems ONLY IF you have good data to train it, and acquiring it is very costly. My professional experience working with AI is that there is hardly ever good data. But companies cut costs and use bad data (garbage in, garbage out), and systemic biases can end up being reinforced.
