
Auth? Because whatever Cloudflare is doing isn't going to stop anyone serious about scraping data.

Let’s say I’m talking about content that I don’t want behind an auth wall. Is your position simply that all such sites should abandon any effort to keep their content out of LLM training?

CF will stop bots that respect your robots.txt, and try to stop ones that don't. If your only concern is that you don't want your content used to train an LLM, this will stop the honest companies.

If you're concerned about load because crawlers are hammering your site, the ones that respect robots.txt should be respecting your crawl delay too. CF will be able to block the dumb ones that ignore your robots.txt and hammer you with no real strategy.
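To make the "honest crawler" point concrete, here's a rough sketch of what a compliant crawler checks before each request, using Python's stdlib `urllib.robotparser`. The robots.txt below is hypothetical: GPTBot and CCBot are real AI/dataset crawler user agents, but the rules, paths, and 10-second delay are made up for illustration, and `plan_fetch` is just a helper name I picked. Also note `Crawl-delay` isn't part of the robots.txt standard, so not every well-behaved crawler honors it.

```python
from typing import Optional, Tuple
from urllib import robotparser

# Hypothetical robots.txt: GPTBot and CCBot are real crawler user agents,
# but these rules, paths, and the 10-second delay are made up.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def plan_fetch(rp: robotparser.RobotFileParser, agent: str,
               url: str) -> Tuple[bool, Optional[int]]:
    """What a *compliant* crawler would check before requesting `url`:
    is it allowed at all, and how many seconds to wait between requests
    (None if no Crawl-delay applies to this agent)."""
    return rp.can_fetch(agent, url), rp.crawl_delay(agent)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("GPTBot", "https://example.com/blog/post"),
        ("SomeSearchBot", "https://example.com/blog/post"),
        ("SomeSearchBot", "https://example.com/private/data"),
    ]:
        allowed, delay = plan_fetch(rp, agent, url)
        print(f"{agent:14} {url:35} allowed={allowed} crawl_delay={delay}")
```

Nothing enforces any of this on the server side. A scraper that ignores robots.txt just never runs a check like `can_fetch`, which is exactly the traffic CF has to catch some other way.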

But serious scrapers will have rotating residential IPs and load your site from real browsers; they'll go to some effort to look like actual users. Sites like Ticketmaster are in an endless arms race against these. Some Chinese LLM company will get your data if it's public lol.


Sure, I understand all that, but you haven't really answered my question of what the alternative is; you've just laid out why CF is better than nothing. (Or even better than managing robots.txt manually.)

Because that depends on your motivations, which I don't know. If you want to prevent your content from being used to train an LLM, CF is not going to prevent that. If you want to protect your site from heavy traffic, CF will do that just like it always has. If you absolutely don't want your content used to train an LLM, you're basically out of luck if your site is public. So no, you shouldn't abandon CF, because you get other benefits from it if you need them, but don't expect that you've done anything to keep your content out of any given LLM's training data.

Something like https://github.com/TecharoHQ/anubis?

It's not that different from CF, but you control it fully.


If you find a solution that’s not auth please let me know.
