Too expensive, maybe, or just not effective anymore since they've used up all the available training data. New data is generated slowly and is massively poisoned with AI-generated data, so it might be useless.
That's a lie people repeat because they want it to be true.
People do evaluate dataset quality over time. There's no evidence that datasets from 2022 onwards perform any worse than pre-2022 ones; if anything, there's weak evidence of the opposite effect, for reasons that aren't understood.
It's easy to make "model collapse" happen under lab conditions, but in real-world circumstances it fails to materialize.
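For context, the usual lab recipe is: train each new generation only on a finite synthetic sample from the previous generation, with zero fresh data mixed in. Here's a toy sketch of that loop, where a 1-D Gaussian stands in for the model; that choice, the sample size, and everything else here are my illustrative assumptions, not numbers from any real experiment:

```python
# Toy version of the recursive-training loop used in model-collapse demos:
# each generation is fit ONLY to a finite synthetic sample drawn from the
# previous generation's fit. The Gaussian stand-in and the sample size are
# illustrative assumptions, not taken from any specific paper or pipeline.
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0  # generation 0: the "true" data distribution
n = 50                # small finite sample per generation

for gen in range(201):
    if gen % 40 == 0:
        print(f"gen {gen:3d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
    data = rng.normal(mu, sigma, n)      # sample from the current model
    mu, sigma = data.mean(), data.std()  # refit the model on its own output

# sigma drifts toward 0: each finite sample undersamples the tails, the refit
# bakes that loss in, and the errors compound across generations. The collapse
# depends on the 100%-synthetic loop; mix in even a modest share of real data
# each generation and the drift largely disappears, which is one reason the
# effect is easy to manufacture in the lab and hard to find in the wild.
```

Real pipelines look nothing like this loop: they keep old data, keep ingesting real data, and filter what they scrape, so the compounding step never gets to run unchecked.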