
> I remember reading that llm’s have consumed the internet text data

Not just internet text: most major LLMs have also been trained on millions of pirated books via LibGen:

https://techcrunch.com/2025/01/09/mark-zuckerberg-gave-metas...


