@darius Honestly, you could have just left this at "I'm not assuming that I am entitled to the content of every public post on the federated network" and I'd have already known you were more on the level than most scrapers and bridges. But you seem to have put a lot of effort into bot only not hanging on to the content of every post on the network, but into being transparent about the how and why. I'd give you a thumbs-up react, but most Masto servers don't support it. (So obviously I agree with your stated goal as well. :neobot_giggle:)

One thing that occurred to me is when a fedi instance uses custom forks. For example, Anarres.family uses a customized fork of Glitch Social (itself a fork of Mastodon), and the version number reported on our web interface, and presumably what your software will see, is v4.4.0-alpha.1+glitch.anarres.family, which is a pretty common practice for these forks, though not all of them include the full URL of the instance like ours does. So with your example of "we saw n polls from [software] x.y.z," in our case it would not actually anonymize us because it would display "50 polls by Masto...anarres.family." Not terribly critical since you're planning on scrubbing post data, so there's not much that could be shared that we'd likely want anonymized.

You could get around that by truncating or obfuscating the version, and possibly rolling it in with the equivalent Masto or Glitch version, but that's relevant data; one of the things our fork adds is the emoji reaction code from Chuckya, so our instance is capable of sending and receiving AP messages that are not supported by other Glitch instances.