Tom Walker

People worry a lot about losing knowledge — about "burned-down libraries".

Comparatively few people seem to worry about what happens if you take a billion books full of auto-generated, often-untrue junk text and *add* them all to the library.

In theory, nothing is lost. In reality, everything is lost, because nothing useful can now be found.

191 comments
ꓤɔᴉʇɐʇS

@tomw This has honestly been my primary concern with AI becoming increasingly sophisticated. Not just limited to libraries, any place where information is stored or communicated would be vulnerable to being completely flooded with garbage.

M. Fioretti

@tomw which, by the way, is basically the same thing that has been happening on social media for years now.

Thanks, good point.

Brooks Davis

@tomw i mentioned this to my spouse last night and she said the analogy she liked was that generative “AI” would turn the internet into gray goo.

joe

@tomw Damn. This is really melting my brain right now.

Felix Ling

@tomw It also gives even more power to the likes of Google, as we'll need to rely on such tools even more to weed through all the junk.

Leslie M. Kay

@tomw
Borges has a story for you - Library of Babel
Like most of his writing, it’ll blow the lid off your head.

Dr. Tineke D'Haeseleer

@tomw We do information literacy sessions with our students but yeah, this illustrates the problem neatly: our task keeps growing, if we want to teach them how to separate the chaff from the wheat.

Emily M. Bender (she/her)

@tomw Speaks to the importance of all kinds of information science -- information hygiene, indexing techniques, etc.

Aaron Brick — אהרן בריק

@emilymbender @tomw Exactly. Librarians have always been interested in verifiable provenance, and they already exclude vast, low-value content as a rule.

Albert Cardona

@tomw Agree with the sentiment, advocates for human-curated search directories, a la Yahoo of lore.

David Martínez Martí

@tomw yeah, but we did not need an AI for that. We have been filling our libraries with junk since the invention of the newspaper. And it has grown even more with TV. And it exploded with SEO practices.

AI is not going to do anything that didn't happen before. It's just going to be way easier and happen way more often.

"There isn't an algorithm for truth" - Tom Scott.

Tom Walker

@deavid I don't agree with this. There is a difference between a mix of high and low quality writing generated at the speed and quantity of a large number of humans, and what is coming, which is low quality but plausible sounding writing generated at extremely high speed and effectively unlimited quantity.

Janne

@tomw to be honest, a lot of the non-fiction in the libraries is already crap. There is no peer review in book publishing, and for some reason non-fiction writers tend to be sloppy or eccentric.

Also, especially in software engineering you have to do web searches all the time, and most of the ideas found are crap. It requires domain specific knowledge and senior wisdom to weed the valuable parts out... And a working recommendation system...

Alakest

@tomw If indexed with fidelity there should be little problem. It's when imprecise attribution and impediments to person-to-person critique compromise review that "junk" compounds.

When communications about communicating are put behind paywalls it's money, not minds, that prevails.

Aram Sinnreich

@tomw This is a dynamic that @jesse and I discuss in our (mostly non-AI generated) book The Secret Life of Data, coming out soon from MIT Press.

Pterry

@tomw that sounds like a variant of L-space theory and could someone please switch off Hex?

ghaff

@tomw You've just basically described the concept of Borges' "The Library of Babel."

Steven Hammer

@tomw I’ve frequently referred to any form of spam as web pollution. We now have the tools to make this pollution at exponential scale.

Alex🇺🇦

@tomw No Problem, as long as those texts can be auto-detected and labelled as such.

Soozcat

@tomw I can't say whether this is deliberate, but the speed at which GPT and other AI systems are rolling out doesn't allow for much public deliberation about whether they are warranted or desired. Whether intended or not, the pollution of useful information with randomly generated AI responses will lead to a general public discouraged from looking things up and thus effectively kept ignorant, even in an age of widely disseminated information.

Ji Fu

@Soozcat what are you talking about? The market does this every day.

Christian Hujer

@tomw I don't know if this makes you feel better or worse: I do worry about that. I am watching in horror as the quality of Google search results is in constant decline, favoring ad-ridden low quality secondary sources over high-quality primary sources, and a glorified ChatGPT tells blatant lies about almost anything with utmost confidence.

Shokk

@tomw “bad information is worse than no information”

levampyre

@tomw Isn't that what librarians are for? They tend to not add junk to their collections, but curate to add the relevant stuff.

Philip Theus (prev. Mueller)

@tomw My hope is that we will rediscover the value of expert curation.

Grant Canterbury

@tomw Damn it, Borges, you were not supposed to be generating real-world problems!!
en.wikipedia.org/wiki/The_Libr

Daniel Detlaf

@tomw

Even without AI humanity does that at an amazing rate.

Stephen Ball

@tomw This is why we trust librarians to curate the data. That's always been the case but now even more so.

Woodswalked

@tomw
Using ChatGPT to populate metadata in catalog records…

argv minus one

@tomw

I also worry about “burned-down libraries” in the sense of DRM-protected content becoming permanently inaccessible (or, at least, impossible to access without breaking the law).

Martin Vermeer FCD

@tomw 'Flooding the zone with shit'. No computers required.

Christian

@tomw

Reminds me of Neil Postman's quote below, as reported in theguardian.com/media/2017/feb

>> What Orwell feared were those who would ban books. What Huxley feared was that there would be no reason to ban a book, for there would be no one who wanted to read one. Orwell feared those who would deprive us of information. Huxley feared those who would give us so much that we would be reduced to passivity and egoism. Orwell feared that the truth would be ......

Christian

@tomw

>> ....... concealed from us. Huxley feared the truth would be drowned in a sea of irrelevance. Orwell feared we would become a captive culture. Huxley feared we would become a trivial culture.

Robbie 🇧🇪 :tux:

@tomw It's a library. People will start community generated indexes 😎

David Litwin

@tomw Borges wrote about this exact thing in The Library of Babel. Every conceivable book was in there so all was useless.

Kakurady

@tomw libraries prune their collections... Sometimes aggressively! It's the curation that makes the collection valuable. 99percentinvisible.org/episode

Ian Tindale

@tomw@mastodon.social good point – it already happened once thousands of years ago when religion infiltrated publishing, but fortunately nobody pays any attention to that now of course

Martin C

@tomw The best way to hide a tree is in a forest.

The Modesto Kid

@tomw hmm, looks like I'm late to the party

The Modesto Kid

@tomw funny how JLB does not spend any time on the Card Catalog of Babel

griffey

@tomw There are definitely librarians and other information science types that are very worried about this...for example, me. I've been talking about the overall challenges of generative popular media for some time now, most recently here: youtube.com/watch?v=OknPeRCoT7
