Darius Kazemi

ugh, this is pretty clearly a CAPTCHA being used to tune a machine learning algorithm. these are definitely ML-generated pictures and I imagine what's going on is they are adding some noise to a query for "cake" and then asking me to tag the cake-looking stuff.

BUT! It means that it doesn't actually matter what I select. I picked a random assortment of stuff and it let me through just fine. Take THAT, training model!! :AngelDevil:

A CAPTCHA prompt that says "please click each image containing a cake" and 9 images of poorly generated food items that barely look like real objects
39 comments
garden center goth

@darius having fun determining the most useless data to provide 💅🏻

candle 🕔

@darius is ice cream a cake? the greatest thread in forum history,

hannah online

@darius how is that even supposed to work as a captcha then?? it's like they were so focused on the "using free labor to train our ML" part that they forgot the original purpose was to block bots

Darius Kazemi

@prehensile I think you actually have to pick stuff that is close-enough-to-true but not quite accurate in order for it to pass you (basically they are checking if your answer deviates *too far* from how wrong they *think* they are, it's an epistemic mess)

John Kelly

@darius I got a similar one the other day that I failed because I was interpreting #9 as a "strawberry cake"

A CAPTCHA prompt asking the user to identify all images containing a strawberry cake.
tim 🍓

@darius i saw a tiktok where people were making pancake spaghetti, so it's not entirely inaccurate to select that one

Kat M. Moss

@darius I know how that goes. The accessibility nightmares alone with those... it's nice to know that they are difficult even for those with sight.

[DATA EXPUNGED]
Adrian Holovaty

@darius Weird! I thought the usual CAPTCHA approach was to provide a random mix of known and unknown images, then use clever statistics and aggregation to get certainty about unknown labels. Sounds like this particular implementation is buggy.
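The "random mix of known and unknown images" approach Adrian describes can be sketched roughly like this. It is a minimal, hypothetical Python sketch. The image IDs, the `GOLD`/`UNKNOWN` pools, and the vote thresholds are made-up assumptions and do not reflect how hCaptcha or reCAPTCHA actually work. Pass/fail depends only on the images whose labels are already known. Answers on the unknown images are counted as votes, and only users who passed get a vote.

```python
import random
from collections import defaultdict

# Hypothetical sketch: image IDs, labels, and thresholds are illustrative
# assumptions, not any real CAPTCHA vendor's implementation.
GOLD = {"img_a": True, "img_b": False, "img_c": True, "img_d": False}  # labels already known
UNKNOWN = ["img_x", "img_y", "img_z"]                                  # labels to be learned

votes = defaultdict(lambda: [0, 0])  # image_id -> [yes_votes, total_votes]


def build_challenge(n_gold=4, n_unknown=2, rng=random):
    """Shuffle known-answer and unknown images together into one grid."""
    grid = rng.sample(list(GOLD), n_gold) + rng.sample(UNKNOWN, n_unknown)
    rng.shuffle(grid)
    return grid


def grade(grid, selected):
    """Pass/fail uses only the gold images; unknown answers are recorded as votes."""
    passed = all((img in selected) == GOLD[img] for img in grid if img in GOLD)
    if passed:  # only trust labels from users who got the known images right
        for img in grid:
            if img not in GOLD:
                votes[img][0] += img in selected
                votes[img][1] += 1
    return passed


def consensus(img, min_votes=5, threshold=0.8):
    """Return a label once enough passing users agree, else None."""
    yes, total = votes[img]
    if total < min_votes:
        return None
    share = yes / total
    if share >= threshold:
        return True
    if share <= 1 - threshold:
        return False
    return None
```

Under these assumptions, a random selection would usually fail on the known images. So the CAPTCHA in the original post letting random picks through suggests it checks answers differently, or more loosely, as Darius guesses.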

Darius Kazemi

@adrian yes, I was simplifying when I said "it doesn't matter what I select" -- I suspect I can select stuff that is wrong but also close enough within a threshold of expected-wrongness for it to pass. OTOH in the next captcha I told them that an airplane was a house and they were like "sure thing man"

יונה

@darius I got a similar captcha the other day. Distorted pictures of animals. Next time I see one I’ll do the same. You want me to accurately label training data, you can damn well pay me.

AntiComposite

@darius I guess that would help with the reverse image search problem. But it doesn't make me feel great about the strength of hCaptcha.

Ali Alkhatib

@darius screenshotting this for something i've been thinking about 😶 lol

JP

@darius any time i'm presented with a captcha i always wonder if there are particular answers that would let me pass while deliberately fucking up whatever bullshit they're trying to train.

Alokir

@darius wait, wasn't captcha always used for training?

I remember back when Google used words to train their text recognition algorithms.

There used to be two words, one was generated and the other was a scan from a book or newspaper. They knew the solution to the generated one and wanted us to solve the scan for them.

I sometimes had fun trying to spot the scanned word and "solve" it with random text.
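The two-word scheme Alokir describes can be sketched in a similar way. This is also a hypothetical sketch. The word IDs, the `KNOWN_ANSWERS` table, and the "three matching answers" rule are assumptions for illustration, not Google's actual code. The control word alone decides whether you pass, and the answer for the scanned word is only tallied.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of the two-word scheme. Word IDs, answers, and the
# agreement rule are illustrative assumptions, not Google's implementation.
KNOWN_ANSWERS = {"ctrl_17": "harbour"}  # control word: answer known in advance
transcriptions = defaultdict(Counter)   # scanned-word ID -> Counter of user answers


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check(control_id, control_answer, scan_id, scan_answer):
    """Only the control word decides pass/fail; the scanned word is just recorded."""
    if normalize(control_answer) != KNOWN_ANSWERS[control_id]:
        return False
    transcriptions[scan_id][normalize(scan_answer)] += 1
    return True


def accepted_reading(scan_id, min_agree=3):
    """Accept a transcription only once several users typed the same thing."""
    counts = transcriptions[scan_id]
    if not counts:
        return None
    word, n = counts.most_common(1)[0]
    return word if n >= min_agree else None
```

In this sketch, junk typed into the scanned word still passes the check, which matches the experiences described in this thread. However, the junk only ends up in the data if enough other users type the same junk.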

ryan

@darius Whoa, I think I got one of these back at the beginning of September(!). I even grabbed a screenshot of it because it felt so off to me. It also did _not_ care which ones I chose (so I picked the, uh, worst ones)

A CAPTCHA asking me to identify which images contain a "whole glass bottle." The bottles have all the signs of being AI generated.
grin

@darius
Recaptcha is smarter: it shows you several images with known answers plus some training material.

Ikecicle

@darius who is benefiting from the tuning of ML datasets using CAPTCHAs—Google?

emenel

@darius this is a new evolution that i haven't seen yet! these things have always been used to train ml in some way, even going back to more "virtuous" intentions when recaptcha was being used to fix ocr errors in gutenberg corpus scans. ...

kylie 🧚🏼‍♀️

@darius i got the same captcha once but it was brutal. i had to go through three of these before i was let in.

[DATA EXPUNGED]
potatoofdestiny

@darius just treating it like an alignment chart joke

Chris Coleman

@darius I’ve seen this a few times, but I’ve also been under the assumption that we’ve been training an ML model for Alphabet’s self driving cars for years.

DELETED

@darius I do the same thing on captchas as a rule. Once I realized that they measure confidence and timing more than actual results, captchas became way more entertaining.

ace_be_based

@darius wasn't the original captcha used to help digitize text?

DELETED

@darius Cloudflare is now testing their new Turnstile verification. The company I work for is already putting it to good use:

Jame Seth Mach

re this:
a captcha asked me to identify rabbits swimming once; I didn't answer randomly, but I did get extremely judgy about all the images that were obviously a rabbit standing on or slightly submerged in water, which was most of them.
It made me do a few more rounds.
Don't get ahead of yourself, kiddo.

Григорий Клюшников

The OG recaptcha was smarter: there were two words, one to check you and another one it wanted you to recognize. It was usually easy to tell which was which; the unknown word had more noise. So I would enter "fuck" instead of that word. Worked every time 👍

TQuid

@darius captcha is ableist shit and should be subverted and destroyed. Good show.

Blapman007 :akkoderp:

@darius this is something i've noticed too!

you know what would be funny. if we all just clicked the wrong stuff and messed up the ML :thounking:

"Dog? that sure looks like a cake to me!"

Tom J. Brenner

@darius
Your explanation is spot on. For me, the process felt like a maze of confusion. Even when you got it right, it told you to try again.

Dan Collins

@darius Yeah, with hCaptcha, the secret is that the users are still the product, you're just using a different middleman. From their website: "When you use hCaptcha, companies bid on the work your users do as they prove their humanity. You get the rewards." Google mostly uses street view stuff in their captchas, but hCaptcha literally sells a data labelling service in which your users are the employees.

Daybreak

@darius That's a terrible training set anyway. They're looking for cake, but there isn't a single picture of Omni-man in there!
