Brian Hawthorne

How bad are the thousands of new stochastically-generated websites?
Last night I wanted to roast some hazelnuts, and I could not remember the temperature I used last time. So I searched on DuckDuckGo. Every website that I could find was machine-generated with different temps listed. One site had three separate methods listed that were essentially differently worded versions of the same thing. With different temperatures.

So I pulled my copy of Rodale’s Basic Natural Foods Cookbook off the shelf and looked it up there.

I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past.

[ Edit: Since others have raised the possibility, I should note that some of these sites may have been SEO-generated/altered rather than generated by an LLM. However, even if that is the case, the fact that the sites are as bad as, and indistinguishable from, LLM-generated sites means to me that they are just as bad and just as likely to have only a loose resemblance to reality. There are many ways to be a fancy stochastic parrot. ]

226 comments
Bodhipaksa

@bhawthorne Same here. I wanted to know what type of battery my car's remote key takes. I got article after article (all of them long and repetitive) and none of them told me what kind of battery I needed.

Together, AI and Google are destroying the web.

MacCruiskeen

@bodhipaksa @bhawthorne Couldn't you just open up the remote and look? Most battery powered devices are marked with what they need.

For cookbooks, I have a whole bookcase full of them and never rely on internet recipes if I don't have to. I'd rather just go out and get another cookbook.

Bodhipaksa

@maccruiskeen @bhawthorne Well, of course I could open it up to see what kind of battery is in there. That's kind of obvious, isn't it?

The reason I haven't is that in my experience these things are fiddly and apt to self-disassemble. I'd rather not open it up and put it back together again, drive to the hardware store, and then repeat the process when I get home. Nor do I want to leave it disassembled while I go on my shopping trip.

The point is, it shouldn't be hard to find out. But it is.

MacCruiskeen

@bodhipaksa @bhawthorne Sorry, I guess I didn’t account for the ‘remote is badly made junk’ factor.

Bodhipaksa

@maccruiskeen @bhawthorne LOL! Well, this one might be well-made, but the one I had on my previous car (a Mazda) tended to fall to pieces while it was being opened, and was very hard to put back together again. I don't want to take that risk with the key on my new car.

Dietmar

@bhawthorne you don't have an offline copy of Wikipedia?

Chris P. :trek_ds9_sisko:#1️⃣

@grumpasaurus @bhawthorne Oh man, there is a website that I can't seem to find right now, but it's essentially a bunch of recipes in a sort of "raw" format; no ads, no stories, no ratings, just lots of diverse recipes.

I'm letting you know even though I haven't found it, in case anyone else sees this and goes "Oh, yeah, it's this".

Chris P. :trek_ds9_sisko:#1️⃣

@grumpasaurus @bhawthorne Oh! Found it: based.cooking

EDIT: Because this toot kinda blew up, here are some other (I would even say better) alternatives to the above site:

en.wikibooks.org/wiki/Cookbook

And this is maintained by the wonderful @cassidy: blaede.family/recipes/

Cassidy James :eos: :gg: :fh:

@b4ux1t3 @grumpasaurus @bhawthorne nice! I’m happy for anyone to use my family’s recipe site, as well; I had the same approach to provide an easily updatable archive for my extended family. 😁 blaede.family/recipes/

Brian Hawthorne

@b4ux1t3 @grumpasaurus Thanks. It looked great until I got to the crypto cash solicitation at the end. I’m not comfortable dealing with anyone associated with that.

Chris P. :trek_ds9_sisko:#1️⃣

@bhawthorne @grumpasaurus That's fair enough. My brain seems to just filter that crap out these days, I hadn't even noticed it.

I know a few other sites have popped up in the last couple years doing similar things, this was just the one that seemed to stick in my memory. If I happen on another I'll drop a link here.

We need a "Wikipedia but for recipes"

Brian Hawthorne

@grumpasaurus @b4ux1t3 I keep one of those on my phone that has served me well for years. I think I originally got it from the Reluctant Gourmet in 2012:

Meat Temperatures & Doneness Chart
The “Remove” temperature on the left is the target temperature at which to remove the meat from the heat source. The “Ideal” temperature on the right is the ideal internal temperature after resting. These temperatures are all Fahrenheit. Note that these are not USDA recommendations. The USDA temperatures are conservatively 10–15 °F higher because of food safety, but not many professional chefs are cooking your medium-rare steak to 150 °F. You would send it back in an instant.

GlOwl Octopus-Faenby

@jkn @b4ux1t3 remembered this one too, there are a lot of these "Wikimedia Books" on a bunch of different craft topics too. most of them are quite useful, some need to be improved a lot.

ToddZ

@bhawthorne @b4ux1t3 @grumpasaurus

I’ve noticed that when I see the word “based,” the word “crypto” often isn’t far behind.

Chris P. :trek_ds9_sisko:#1️⃣

@toddz @bhawthorne @grumpasaurus eh, based is just a slang term. It's more often used by younger folks, and crypto folks are (generally) younger.

It's like saying "whenever the word cool is used, drugs aren't far behind" in the 80s.

ToddZ

@b4ux1t3 @bhawthorne @grumpasaurus

LOL, that’s likely true. My sampling is mostly Linux YouTubers who turn out to be alt-libertarian crypto/web3 wanna-Thiels.

Jörg von Frantzius

@bhawthorne Out of interest, do the Google results contain as much AI noise as those on DuckDuckGo?

I can imagine that removing that noise from search results will become equally important for Google as their fight against SEO spam, and they have the resources to try it

Brian Hawthorne

@jfrantzius Let’s see. I haven’t used Google in a while.

First off, when I try to tap on the search box on mobile to type in a search term, the search box now appears to just be a button, which opens a new page of suggestions, and the text box where I am typing is off the top of the screen. Apparently some Google designer didn’t try their new UI with a larger typeface.

Okay, apparently I can type a search string, even though I can’t see what I am typing. “Temperature for roasting hazelnuts”.

The first page shows some images, then three “snippets” from websites I have never heard of, suggesting 375F, 350F, and 275F. 101Cookbooks, Culinary Hill (which is a hot mess of stripped out ads, popups, etc.), and Oregon Hazelnuts which appears on its face to be an industry marketing group.

After that is one search result from an Australian site, a section of “Perspectives” whatever the hell those are, a Recipes section, a Quora result, short videos, a stack exchange article, another Videos section, an Also Searched For section, and finally what appear to be half a dozen actual search results, all from unknown sources, then we are back to videos, then all sorts of excerpts unrelated to roasting hazelnuts.

[ Sorry for the delay. 15 min scheduled maintenance to switch our town fiber network over to our new MPLS-TE infrastructure. ]

Anyway, I can’t really say much about the Google results other than, “what a mess”.

GlOwl Octopus-Faenby

@bhawthorne @jfrantzius can confirm from my own experience: the few times i try Google because everything else fails, i have gotten much less useful results over roughly the last year. before that, Google spit out some results i could not get from DuckDuckGo, MetaGer, or Startpage (or only with very specific search terms or sentence-long quotations from those sites)

Cecelia

@jfrantzius @bhawthorne I partly stopped using Google because of their insistence on making videos the top results and pushing related searches into the endless page feed instead of the results for my actual search.

I found DuckDuckGo to provide some more direct results and felt a bit more like the Google I used to use, but Google still has a bit more depth, returning some results that DuckDuckGo didn’t at times (when I didn’t get the results I was looking for with it).

Both find garbage results, unfortunately, in my qualitative experience. Not sure if one is better than the other there.

#SearchEngines #DuckDuckGo #GoogleSearch

Sven-Ola Tücke ✌

@bhawthorne
No unit mismatches? Converting between Celsius, Fahrenheit, Réaumur, Kelvin, or Rankine may be confusing, even for U.S. folks. For that archive you mentioned, I recommend the Kiwix Offline Reader (via F-Droid, f-droid.org/packages/org.kiwix)
Cc @dahie

mnl mnl mnl mnl mnl

@bhawthorne @Miniver In those situations I use Kagi and filter out everything that has ad trackers. Even before LLMs, content sites with ads/affiliate links had no incentive to provide quality information.

Ironically, tools like perplexity.ai also help a lot navigating through the nonsense.

mnl mnl mnl mnl mnl

@bhawthorne @Miniver I (optimistically) think that this is actually the snake eating itself. The business model of capturing eyeballs is disintegrating in front of our eyes. Soon only quality content will actually pull readers, through search engines that don’t incentivize selling those eyeballs.

Brian Hawthorne

@mnl @Miniver It is the idea of a business model that is the problem. That’s not what we started the Internet for. And I apologize for my early contributions to the commercialization of the Internet. We were sure we were building a better world, but we were naive to think that it wouldn’t be usurped by the oligarchs.

The next version should be non-commercial, distributed, and person-to-person. Let’s build it.

Jonathan Korman

@bhawthorne @mnl


Major tech platforms should be democratically accountable public utilities

If we do not turn the tech stacks into public utilities like this, Metcalfe’s Law and other winner-take-all dynamics will give us neo-feudalism instead: all of us dependent on a narrow oligarchy who own and control the resources which we depend upon.

miniver.blogspot.com/2022/11/m

Andrew Gretton

@bhawthorne frustratingly, "AI", in this case LLMs, has been used to provide verbose webspam to maximise ad impressions. I know LLMs aren't popular in much of the Fedi, but ChatGPT gives a helpful and concise answer to the original question. There's mileage in LLMs to leapfrog all this web junk we have to wade through today.

Donnodubus

@andrewgretton Why should anyone use an LLM to answer a question like this as opposed to using the authoritative source that the LLM is giving only a fuzzed impression of?

Andrew Gretton

@donnodubus fair question. My feeling is that over time, we graduate to authority figures. If you want legal advice, you don't try and determine which statutes to read, you ask someone who's synthesized and absorbed them and can hopefully regurgitate relevant parts, ideally with interpretation along the way. If I had a chef on hand, I'd probably ask about hazelnut roasting!

Although current LLMs are less than reliable, and there's a moral (and legal?) question around their training, of course

Donnodubus

@andrewgretton If you consider how knowledge and truth are actually arrived at, it's quite clear that LLMs are fundamentally incapable of achieving that kind of intelligence. No matter how advanced they get.

The best LLMs are simply better at defuzzing the source materials they've fuzzed. Which leaves the obvious observation that they're roundabout serving up the content they've stolen. Content we could just read directly...

Andrew Gretton

@donnodubus for sure, and the reference source materials are more authoritative. However, consider the utility of an LLM that's "right" most of the time. If it can tell me - more quickly and mostly accurately - how to roast hazelnuts, the average internet user will prefer that experience to Googling/DDGing/etc which today is a toxic experience due to webspam etc.

It's utility over ethics; maybe Napster all over again? And we know what "won" for years until iTunes and Spotify won. For a while.

Donnodubus

@andrewgretton well the webspam problem is driven by AI, so that's a self-fulfilling prophecy.

Convenient that these new capital ventures are being sold as the solution to the problem they made...

Andrew Gretton

@donnodubus very good point. Albeit depressing 😬 There's probably a good amount of money in a genuinely useful and entirely new search engine these days. I'm not sure Kagi and co are it, though, despite their innovative approaches.

Brian Hawthorne

@andrewgretton No. ChatGPT gives a concise answer that is worded in such a way as to seem helpful. The times that it is accurate or truly helpful are due to coincidence and chance. I have enough knowledge about cooking and the world that my guesses are accurate more often than any LLM I have found.

The point is, I don’t want a guess. I want an answer from a human being who has actually roasted nuts. Not an LLM summary of an unknown set of websites that are now mostly written by LLMs based on websites that were stuffed full of cruft designed to maximize ad revenue. Now, I was pretty sure it was 350 °F, but since my memory is not great these days, I wanted to check. Clearly, I need to pull out my 3x5 recipe card box and start recording my notes there again.

Petrichor Squirrel

@bhawthorne honestly cook books are becoming such a precious resource. Online recipe websites have become such a slurry of SEO spam that they border on the useless.

Samsamros

@PetrichorSquirrel 100% We still keep a shelf worth of cooking books, and my late grandmother’s recipe cards, crafted by hand

Ami Moregore

@bhawthorne That's the weird part. Those lengthy recipe sites were at least written by humans before, at content farms, because recipes themselves can't be copyrighted, but the long ad copy where they reminisce about how it's a generational family recipe or whatever usually is. AI-generated content has consistently shown itself to be really bad at copying the essential parts.

Michael / Chgowiz 🎲🎲

@bhawthorne I've started downloading the basic books I need, just in case.

This is a good start towards a decent library:

anarchosolarpunk.substack.com/

Dan York

@bhawthorne As others have mentioned, do check out Kiwix - kiwix.org/en/ - You can grab an offline copy of Wikipedia and many other websites. (I’ve done so for when Internet access goes down… but you make a really good point to have one at a point in time before the LLMs started generating so much useless text.)

En jättesöt liten kille

@bhawthorne Strangely enough I’m starting to think this is the best thing to happen to this world in a long time (not the way tech dudes believe, though).

Brian Hawthorne

@thelovebing It’s time to start over with private individual home pages and manually curated links.

73 million seconds

@bhawthorne you don't think wikipedia policies will help it stay reliable in the future?

Brian Hawthorne

@73ms If the USA is taken over by the criminal and self-professed dictator next year, I don’t think Wikipedia will be allowed to maintain those policies. Either that or it will not be accessible in the USA. This is not hyperbole. It is a very likely reality.

NullNoMore

@bhawthorne - for anything not time sensitive, like recipes, I'm throwing "2013" into my searches.

C.S.Strowbridge

@bhawthorne

It's worse than that. I remember trying to find out why you can't cook red kidney beans in a slow cooker and got recipes telling me time and temperature.

Those beans, and a few others, have a protein that you need to boil to denature, or they can be toxic. Yet Google was giving me an AI answer that could kill people.

Pascal Greilach

@bhawthorne Our washer stopped working and the steps in the manual didn’t suffice. So I tried finding some DIY repair instructions. Kept the last one alive for over a decade after getting it second-hand. And it’s all just garbage. No instructions that could actually be followed, just crap that SOUNDS like instructions. Even YouTube is full of shitty generated text cards with vaguely related images. So I guess I will just throw away my five year old washer.

OldFartPhil

@Frdnspnzr @bhawthorne All part of the plan. Don’t you *want* a shiny new washer, citizen?

Fiona Craig

@bhawthorne Currently packing up my house for a refurb and I am so reluctant to part with *any* books right now. Remembering the time before the internet is a powerful incentive not to want to go back to that siloed life. Open access to knowledge is the ultimate democracy.

Darwin Woodka

@FionaCraig @bhawthorne and the kindle books lately are ass and full of typos.

Josh Davis

@bhawthorne
This was, in fact, the whole purpose behind "AI" in the first place. Can't have the plebs getting their grimy mitts on too much truth, ya know. Better to gum up the works with a bunch of auto-generated nonsense and then sell curated access to accurate information to the rich...imho.

nicholas_saunders

@bhawthorne just ask your friendly #artificialintelligence for recipes and such. Works sorta okay.

Brian Hawthorne

@nicholas_saunders I assume you are trolling here, since the entire topic of this thread is how SALAMI has destroyed any hope of getting an accurate answer. Might as well roll dice. That works sorta okay too.

nicholas_saunders

@bhawthorne sorry, I didn't read the thread. Post first, read later?

Brian Hawthorne

@freediverx From Rodale’s basic natural foods cookbook, Copyright ©️ 1984 by Rodale Press, Inc.

350 °F.

freediverx

@bhawthorne
So, using Kagi search, the first result contained the correct answer.

Susan Calvin

@freediverx @bhawthorne AI result spam is super annoying, and it was very common in my DDG results for anything at all. I think recipes are a little difficult anyway and human writers could result in a similar range, are the other answers specifically wrong or just different? (Results I see with lower temps suggest more time, similar results?)

Belakor

@freediverx @bhawthorne OK, but without the physical book as the true source you can't trust that those articles are correct. That's the entire point the OP was making.

Peter Butler

@bhawthorne This is what SEO folks call the “truth range”

In order to rank high on SERPs, you want to be right in the middle of the range, *not* the most accurate 😭

Brian Hawthorne

@peterbutler I wonder what TRUTH is an acronym for among the essios.

Marty

@bhawthorne It's really too bad that search engines don't de-rank sites like those, but people click those links a lot. What a mess.

B O

@bhawthorne what’s wrong with 2023 Wikipedia?

DELETED

@bhawthorne honestly - society can just announce 2023 as the year search had been enshittified to oblivion. people are going to start going back to libraries to find shit - swear to god you nailed it. #ifmacaquescouldtalk

BlueTurtle

@bhawthorne IMO the most reliable source will still be Wikipedia. The reason is that the content is generated by humans. If someone adds content without a reliable source reference, it’s very likely that it will be removed within minutes by other editors.

nicholas_saunders

@BlueTurtleAI @bhawthorne

If someone adds content without a reliable source reference, it’s very likely that it will be removed within minutes by other editors.

--

sorry, no, AI can out-edit humans any day of the week. Allow me to direct your attention to this Wikipedia article by way of illustration:

en.wikipedia.org/wiki/John_Hen

unless AI has edited it out of all resemblance...

BlueTurtle

@nicholas_saunders @bhawthorne Of course there are bots; that’s nothing new. Editing can be limited to verified accounts. And I’m sure there are other methods to prevent this kind of vandalism.

Ari SunDog 🏳️‍🌈🌞♾️

@bhawthorne And people wonder why I still buy old reference books and cookbooks. (I guess some don’t wonder about it, because so many people here say they do the same!)

I am definitely seeing an uptick in LLM-generated advice that could harm people in my regular searches. I like the idea of storing an old copy of Wikipedia. It could be really useful.

I am actually on the lookout for an entire set of encyclopedias rn.

Samsamros

@bhawthorne shit!! This happened to me, too. I was looking for DSL cable specifications for an old setup, and I found tons of neatly worded bullshit… everything was wrong, especially answers in Quora. Had to go to a hardware manual.

I think it’s time for making a pledge to never use text extruding machines for sharing content. I continue to generate my content the old fashioned way, and take the time to do it.

Why the fuck should I read something that wasn’t bothered to be written in the first place?!

@bhawthorne shit!! This happened to me, too. I was looking for DSL cable specifications for an old setup, and I found tons of neatly worded bullshit… everything was wrong, especially answers in Quora. Had to go to a hardware manual.

I think it’s time for making a pledge to never use text extruding machines for sharing content. I continue to generate my content the old fashioned way, and take the time to do it.

RalfMaximus

@bhawthorne

I've been very pleased with kagi.com search. Just did a test search for your hazelnut temperature and the top three sites appeared to be legit food resources. Kagi isn't free ($5/month) but sooooo totally worth it.

Matt Ferrel

@bhawthorne is there a way to know if a site was not written by a human? Especially when searching for tech support, I see oddly phrased language that could just be a translation from another language

Emily Bristor

@bhawthorne I looked up how to calibrate a Frigidaire oven. A single AI generated website article had three different methods, all incorrect. My fave was the one that had me unplug the range, adjust something in the back, then check the temp again… without plugging it back in.

Doug Webb

@bhawthorne Download the 2022 Wikipedia, sure, but now is a better time than ever to start editing and curating Wikipedia!

The community of humans there is not perfect, but it still puts up an effective guard against generated crap, somewhat like Mastodon does.

Brian Hawthorne

@douginamug I used to do that. Just logged in and it looks like 2009 was my last editorial contribution (I donate cash every year). I should resume editing.

DeterioratedStucco

@bhawthorne
It's tempting to think that the whole LLM thing is an attempt to monetise the Wikipedia model.

Acetopheles

@bhawthorne on the bright side, Ai might help libraries recover 😅

Eric Lawton

@bhawthorne

The Web is turning into the Library of Babel.

"The books contain every possible ordering of just 25 basic characters (22 letters, the period, the comma, and space). Though the vast majority of the books in this universe are pure gibberish, the library also must contain, somewhere, every coherent book ever written"

en.m.wikipedia.org/wiki/The_Li

@szescstopni

Jylie

@bhawthorne In a growing digital world, humanity seems to hold on to physical things as a record that it was ever really here. Where media can be edited so freely, can you be sure of anything? Humans cling to material possessions to validate themselves, which seems futile in practice, but they have to have faith in something or be lost and drown in the vast sea of oblivion.

David Megginson

@bhawthorne And it will get worse and worse, as new generations of LLMs get trained on online text produced by previous generations of LLMs, amplifying the errors and nonsense.

MarjorieR

@bhawthorne I'm not sure this is a question with a 'right' answer, as (looking at the answers I see) you can use a higher temperature and cook them for less time.

So my top result (bbc.co.uk) says pre-heat to 200°C and roast for 5 minutes while most (presumably US) suggest 350°F (175-180°C) and about 10 minutes.
Or you can lower the temperature and cook for longer.

:SpinningCube:Semele

@bhawthorne Yes! The other day I ditched the idea of making cheese scones when the same kind of thing happened with me, also with DuckDuckGo, which has changed :(

Wokebloke (call me Doug)

@bhawthorne
Websites for different programming languages can now be found that are written by AI and posted for whatever reason. You have to be careful that you're getting an authentic site that will give an accurate answer to whatever programming problem you're having.

Steve Jones

@bhawthorne In the meantime, we need to filter on date.

Richard W. Woodley NO THREADS 🇨🇦🌹🚴‍♂️📷 🗺️

@bhawthorne
Remember the original Yahoo. It was actually an index not just a keyword search. And remember when there were print yellow pages directories of every website on the Internet. Keep your own index/bookmarks of trusted sources.

Greengordon

@bhawthorne

Are you using Google for your DuckDuckGo searching? If so, might as well use Google. I switched my DDG search engine to use DDG, not Google (or Bing, which was another option).

Letter N. Underscore

@bhawthorne If everybody used ad blockers then these sites wouldn't be rewarded for behaving this way.

Midnight Raven

@bhawthorne Quora (which I rarely use anyway) now pushes a ChatGPT response above any actual answers and actively hides answers from actual people, so that's fun.

twelve_floating_hands

@bhawthorne

Just means we need to set up systems for automatic source tracing.

Article doesn't source its numbers? Browser extension should flag it.

Optimistic Skeptic

@bhawthorne
And it's not like there is a song for hazelnuts. I mean, at least we know chestnuts roast on an open fire. Easy peasy.

metalfabs

@bhawthorne Not only recipes, but workout information too. Most search results are garbage. I still trust exrx.net, it looks like a website from 15 years ago (just like it did 15 years ago) and doesn't peddle supplements.

ikanreed

@bhawthorne this seems to be the ideal use case for search.marginalia.nu

It rejects websites with too much advertising or JavaScript. Leaving just passion pages in its index.

Testing your case, the first result was a real person giving a real temperature and time for roasting hazelnuts.

And it was readable too!

Nixitur

@bhawthorne For questions where the answer doesn't change over time, I would recommend limiting your search results to, say, before 2018.
Granted, this is merely a bandaid solution over an ongoing problem, but it does help, and at least Google and DuckDuckGo have that functionality.

John Cormier

@bhawthorne seems like a block list of machine-generated domains would be a very useful thing to have to improve the speed and quality of online searches

Q. Edwards

@bhawthorne Welcome... to the INSHITIFICATIN!!! *poorly plays the Jurassic Park theme on recorder*

DELETED

@bhawthorne
I was there when a search engine query led to
1) a forum dedicated to the topic with real people helping (or shouting: use the forum search button first! - fair enough)
Then the forums died.
2) YouTube, with real people showing you how to solve your problem. Less interaction but often quite good.
3) YouTube, but the length of the videos was optimized for ads
...
...
4) utter garbage that resembles an answer to my query, but actually is just garbage

All hail AI :/

Jonathan Hendry

@bhawthorne

Gonna break out my Encyclopedia Britannica DVD 2005 edition.

KristinJ

@bhawthorne So I should hold on to my 1972 Joy of Cooking?

gavcloud

@bhawthorne @drwho i’ve been thinking about this too, getting the Wikipedia snapshot from 2022 seems right.

ingemar

@bhawthorne Chatbots might deliver answers for a couple of months, and then they will be useless too, since they drink from the well they poisoned.

David Marshall

@bhawthorne

Thanks to "AI" Bullshit Generating Plagiarism Engines, we now have the world's ignorance at our fingertips, masquerading as knowledge.

#AI

Michael Russell

@bhawthorne At this point I might do that search, but also just look for websites I recognize, or simply go to them directly and search there. Filtering out these crappy websites must either be really difficult, or they don't care to bother.

Pomegranate_Stew

@bhawthorne
I quit using DuckDuckGo because of this. Every time I tried to look something up it was at least two full pages of garbage sites which were often inaccurate or conflicting at best. Incredibly disappointing.

As for recipes, there is this site. Not sure what else you’re looking for or if they’ll have it. nobullshitrecipes.com/

Recipes without the tragic/nostalgic backstory behind every one.

12foxfire

@bhawthorne When I moved, I donated a lot of books to the library. That Rodale book was one of them. Foolish me.

poldemo

@bhawthorne sounds like #analog becoming the new gold standard. 🤦‍♂️😔

iquanyin

@bhawthorne i find the same even with people's blogs sometimes. books will likely be all the rage again

J. 🧬🦠🥚🦎

@bhawthorne Can I ask what search terms you used? I tried "hazelnut roasting temperature" just now and the top results include some stuff that could well be AI generated but also actual websites like BBC food and wikiHow. And basically all of them say to preheat the oven to either 175 or 180 °C.

Chris Hessert 🇺🇸 🇺🇦

@bhawthorne
The renaissance of that ancient database technology — books. 📚 🙂

Captain Packrat

@bhawthorne I wonder if my copy of Encarta still works...

K.R. Paradis

@bhawthorne

This could also be an opportunity to better know who has good info. I’ll probably be leaning harder on those blogs and sites that are obviously real people that you know or can grow to trust through interaction.

Sam

@bhawthorne I used to think I knew how to word a web search so that it would show me exactly what I was looking for. Lately Bing has started throwing random porn at me for absolutely innocent searches, even on "moderate". It's not even just websites; the whole net is going to shit. Everything is fucked up, and search engines don't do shit anymore. I don't know how they broke it so badly.

ehurtley

@bhawthorne Yep. I download the offline version of Wikipedia every couple years and load it onto my ancient e-reader. (First-gen “Kindle Keyboard” whose cellular access stopped working years ago.)

fmobus

@bhawthorne we reached the local maximum two quarters ago. It's downhill from here

Old Tom

@bhawthorne Auto-generated listicles have been a growing problem for some time. With AI, the problem is metastasizing rapidly.

Are we going to have to put documents out with a date stamp to avoid the creation of fake facts?

Severák

@bhawthorne recipes, datasheets, lyrics and various "how to" pages are the worst offenders...

Shannon Moore

@bhawthorne Google's knowledge panels were supposed to be based on knowledge graphs, so there was some standard of 'truth'. If DDG is just crawling a giant unregulated informational flea market containing a lot of machine-generated content, this is not surprising. Google shouldn't have a monopoly on knowledge graphs, but KGs ARE necessary to validate shitty AI data.

troi

@bhawthorne Sturgeon's Law states that 90% of everything is crap. He underestimated the modern web.

Zuckey

@bhawthorne this sounds crazy but I’m starting to dabble with Kagi as a search engine. The free tier has about convinced me that it’s worth it.

Scott Feeney

@bhawthorne Isn’t this sort of a moderation/curation issue? The machine-generated websites are spam. They need to be identified, reported and blocked. We’ve all gotten used to farming out this task of blocking spammy websites to Google (or DDG or Bing); maybe we need a Wikipedia-like collective approach to it now.

wirepair

@bhawthorne interestingly, I found relevant links immediately with kagi.com. It also gave me the dates the articles were published, some of which are pre-2023 :>

Jon Gerdes

@bhawthorne

IT-related searches are nearly as bad. Recipe sites have been a shit show for some years, and the same is now being seen elsewhere. ChatGPT etc. (generative AI) are now being used as tools to "enhance" these dubious offerings. Their random musings are known as "hallucinations", which are a side effect of how a word guesser works.

I've almost gone back to using the web like it's 1999: bookmarks and trusted sites. I still search a lot, but much less these days.

Megan Lynch (she/her)

@bhawthorne I've been thinking we need an open source search engine or maybe a directory (which seems easier to achieve). We're going to have to track for ourselves which websites we find authoritative rather than those with the best SEO.

emeritrix

@bhawthorne

So true; the first few pages of links on any general info search are clearly useless unreliable bot-generated dreck.

For those of us who are not tech-savvy,* is there an easy way to download an archive of the 2022 Wikipedia?

*is dystechtic a word yet?

veetee

@bhawthorne you don't trust nutroastingtemps dot com slash 12-new-hazelnut-temperatures-to-try-in-2023?

John Timaeus

@bhawthorne

I now have a print copy of the 2023 World Book encyclopedia.

There are so many threats to the knowledge landscape it isn't funny.

Erotic Mythology (hire me) 💖

@bhawthorne It's so bad that I don't get results for search terms where I *know* there is a website with that info.
For recipes especially, the first hits I get are not all AI-generated, but they're all from big, general sites where anyone can submit recipes, so the quality is questionable. Because of this, I started seeking out cooking blogs and YouTube accounts I know are run by actual humans.

Deborah Hartmann Preuss, pcc 🇨🇦

@bhawthorne I've given away many books. But I still have The Joy of Cooking. Guess I'm gonna be needing that after all.

DELETED

@bhawthorne

The companies that create the tools to generate these kinds of sites will have tools to search for higher-quality and accurate content... for a premium.

Penguinflight

@bhawthorne Bugger, I was thinking of making DDG my search engine, but if it's pulling up the same shit as Google, what's the point?

Piggleston Pecanpants

@bhawthorne *gasp* You can't possibly mean people are going to have to resort to analog books?!

Kee Hinckley

@bhawthorne @grrrr_shark Kagi shows some variation in temps, but none of the hits look generated, and several (including the one they use to extract an answer) are from 2021.
