@gamingonlinux For what it's worth, it's because the datasets they use will inevitably contain copyrighted work (articles, posts, websites, you get the idea). I don't think OpenAI are specifically targeting copyrighted works, but the content of the datasets being used makes copyright infringement inevitable.
There's an entire ethical conversation to be had about this sort of mass-scale scraping that goes beyond machine learning research.