Rob Hafner :verified_flashing:

We brought home some new rugs and Blip managed to turn one into an interesting napping spot.

Rob Hafner :verified_flashing:

If you're an admin of a Mastodon/fediverse instance you should update your robots.txt to block "GPTBot", the crawler made by OpenAI to feed their machine learning models such as ChatGPT.

This is the easiest way right now to prevent public content from being crawled and fed into their datasets, and due to the nature of federation it works better the more instances do it.
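
For admins who want to sanity-check the rule before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules text is the kind of GPTBot-specific group described above; the instance URL and the `is_allowed` helper are made up for illustration. It only shows how a standards-following parser reads the file, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Rules an admin might serve at /robots.txt to turn away only GPTBot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.social is a placeholder instance, not a real one.
    url = "https://example.social/@someone/1"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Because the group names GPTBot only, other crawlers and normal federation traffic are unaffected by it.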

i am root

@tedivm Do you happen to know if a blanket disallow like:

User-agent: *
Disallow: /

Will be honored by GPTBot? I wouldn't put it past them to ignore the root disallow and require a specific `User-agent: GPTBot`, but it's not called out in the doc either way.

There's some inconclusive evidence in my Nginx access logs that they crawled my robots.txt, then proceeded to crawl other URLs. But I only see 6 hits total, and they all happened within 10 seconds. 🤷‍♂️
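
On the blanket-disallow question: under the Robots Exclusion Protocol (RFC 9309), a crawler with no group of its own is supposed to fall back to the `User-agent: *` group. The sketch below checks that reading with Python's standard `urllib.robotparser`. It says nothing about what GPTBot really does, only what a compliant parser concludes; the URL and the `can_crawl` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from the question above.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt text the way a standards-following parser would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; any path on the site is covered by "Disallow: /".
    url = "https://example.social/@someone/1"
    # GPTBot has no group of its own here, so the "*" group applies to it.
    print("GPTBot allowed:", can_crawl(BLANKET_RULES, "GPTBot", url))
```

Whether a given crawler honors that fallback is a matter of its behavior, which is why comparing the result against access logs is still worthwhile.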

Rob Hafner :verified_flashing:

In exciting news I appear to be part of one of the first data breaches of the fediverse era!

I got this email 20 minutes ago letting me know my data migration from was dumped in a breach.

I'm going to be honest, I've got some opinions on the fact that a public bucket is used to store archives, with just obfuscation to stop people from downloading them.


@tedivm Always assume what you put online is public.

Elias Mårtenson

@tedivm I'm always baffled when people use random filenames when they make files accessible to trusted users.

S3 already has an API to support signed downloads, and all the application needs to do is to sign a URL which the client can use to access the resource.

I've implemented it from scratch (there was no client library for Common Lisp at the time) and it was trivial. If you have a library, it's literally one function call. There is no excuse for this.
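
To illustrate the general idea of a signed, expiring download link, here is a simplified sketch using only Python's standard library. This is not S3's actual scheme (S3 presigned URLs use AWS Signature Version 4; with boto3 the one-call route is the client's `generate_presigned_url`). The secret, host, path, and function names below are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; it never leaves the application.
SECRET_KEY = b"replace-with-a-real-secret"


def _signature(path: str, expires: int) -> str:
    """HMAC over the path and expiry, so neither can be altered."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, path: str, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Build a link that is valid for ttl_seconds from now."""
    current = time.time() if now is None else now
    expires = int(current) + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _signature(path, expires)})
    return f"{base_url}{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        supplied = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(supplied, _signature(parsed.path, expires))


if __name__ == "__main__":
    link = sign_url("https://archives.example", "/exports/user-123.zip", 300)
    print(link)
    print("valid:", verify_url(link))
```

Unlike an obfuscated filename in a public bucket, a link like this stops working after the expiry and cannot be reused for a different object, because changing the path invalidates the signature.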
