The headline is somewhat misleading: sites using Cloudflare now have an opt-in option to quickly block all AI bots, but it won't be turned on by default.
The idea that Cloudflare could do the latter at the sole discretion of its leadership, though, is indicative of the level of power Cloudflare holds.
They aren't doing anything. They're attempting to insert themselves into the middle of a marketplace (one that doesn't exist and never will) where scrapers pay for IP. They think they're going to profit off the bots, not protect your site. Don't fall for their scam.
What do you mean they are trying to insert themselves? If I have a website that I host with Cloudflare, I (as the rightful website owner) have inserted Cloudflare in between.
It isn't CF going around saying, "That's a nice website you have there. I'm gonna put myself in between."
They can't do anything other than bog down the internet. I haven't found a single CF-provided challenge I couldn't get past in under half a day.
This is simply the first step in them implementing a marketplace and trying to get into LLM SEO. They don't care about your site or protecting it. They are gearing up to take a cut in the middle between scrapers and publishers. Why wouldn't I go DIRECTLY to the publisher and make a deal? So dumb. I hate CF so much.
The only thing Cloudflare knows how to do is MITM attacks.
Let’s say I’m talking about content that I don’t want behind an auth wall. Is your position simply that all such sites should abandon any efforts to not have the content used for LLM training?
CF will stop bots that respect your robots.txt, and try and stop ones that don't. If your concern is just that you don't want your content used to train an LLM, this will stop the honest companies.
If you're concerned about load because crawlers are hammering your site, the ones that respect robots.txt should be respecting your crawl delay too. CF will be able to block the dumb ones that ignore your robots.txt and hammer you with no real strategy.
But serious scrapers will have rotating residential IPs and load your site from real browsers; they'll go to real effort to appear as actual users. Sites like Ticketmaster are in an endless arms race against these. Some Chinese LLM company will get your data if it's public lol.
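For illustration, here is roughly what "respecting robots.txt and crawl delay" means on the crawler side: a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the `crawler_view` helper are hypothetical, and `GPTBot` is just used as a sample user-agent string. Nothing here is enforced by the site: a scraper that ignores robots.txt simply never runs this check.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt one AI crawler out entirely,
# and ask every other crawler to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawler_view(robots_txt: str, agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay_seconds) as a compliant crawler would see it.

    crawler_view is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as freshly loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        allowed, delay = crawler_view(ROBOTS_TXT, agent, "https://example.com/articles/1")
        # A polite crawler skips disallowed URLs and waits `delay` seconds between requests.
        print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
```

Run as a script, this prints `GPTBot: allowed=False, crawl_delay=None` and `SomeOtherBot: allowed=True, crawl_delay=10`. GPTBot's group has no Crawl-delay line of its own, and a named group doesn't inherit settings from the `*` group.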
Sure, I understand all that, but you haven't really answered my question about what the alternative is; you've just laid out why CF is better than nothing (or than managing robots.txt manually).
Because that depends on your motivations, which I don't know. If you want to prevent your content from being used to train an LLM, CF is not going to prevent that. If you want to protect your site from heavy traffic, CF will do that just like it always has. If you absolutely don't want your content used to train an LLM, you're basically out of luck if your site is public. So no, you shouldn't abandon CF, because you get other benefits from it if you need them, but don't expect that you've done anything to prevent your content from training any given LLM.