It's not just internet text data: most major LLMs have also been trained on millions of pirated books via Libgen:
https://techcrunch.com/2025/01/09/mark-zuckerberg-gave-metas...