@aeva We've had this issue from the moment big companies started using the Pile and other such sources to build these. No amount of engineering can fix it, because the AI becomes useless as an information/idea/concept search if it truly has no base.
What these big companies really should be doing is spending a lot of money making fresh training data, from scratch. But we are talking like billions of writing/drawing assignments that can't be used elsewhere. I don't think any of them can afford to.
@aeva Also, it should be noted that even humans can't make 100 percent fresh data. We too memorize things and repeat them. That is kinda the core principle of how language has meaning. So I don't think it is possible to achieve the goal proposed in this article without making the robot speak a unique language that has nothing to do with reality.
Just from a philosophy perspective, I couldn't begin to describe an answer without repeating words or phrases that have meaning elsewhere.