Few people realise that virtually everything we do online has, until this point,...

andy99 · 2025-07-02T17:19:09 1751476749

It's cloudflare and parasites like them that will make the internet un-free. It's already happening, I'm either blocked or back to 1998 load times be cause of "checking your browser". They are destroying the internet and will make it so only people who do approved things on approved browsers (meaning let advertising companies monetize their online activity) will get real access.

Cloudflare isn't solving a problem, they are just inserting themselves as an intermediary to extract a profit, and making everything worse.

slenk · 2025-07-02T17:46:32 1751478392

How is Cloudflare a parasite? I can use Cloudflare, and get their AI protection, for free. I have dozens of domains I have used with Cloudflare at one point and I haven't paid them a dime.

brendyn · 2025-07-03T02:28:29 1751509709

A parasite leaches off it's host to the hosts harm. Maybe it's not a good analogy, but Im in china, and it's painful after paying money for a VPN to bypass censorship to find myself routinely blocked by CDNs because they decided I'm not human. I'm honestly feeling more opressed by these middlemen than the government sometimes. For example, maybe I can't log in to a game due to being blocked by the login API, and the game company just responds by telling me to run an antivirus scanner and try again since they are not personally developing that system that lack awareness. Such people with genuine need for VPNs and privacy tools are the sacrifice for this system.

chaoskitty · 2025-07-02T23:53:11 1751500391

Serious question: You put Cloudflare between all your domains and all your visitors without looking in to how this would affect your site's reachability? If so, that's interesting, considering that many people in this community are negatively affected by Cloudflare because they're using Linux and/or some less than mainstream browser.

You might want to read some threads on here about Cloudflare.

slenk · 2025-07-02T23:57:43 1751500663

Where did I say all.

Most of the time I don't use them for their network, usually just DNS records for mail because their interface is nicer than namecheap and gives me basic stats.

To my understanding, they aren't blocking MX records behind captchas

djfivyvusn · 2025-07-03T00:19:09 1751501949

So you're not using the parasite and that's your claim why it's not a parasite?

slenk · 2025-07-03T00:39:36 1751503176

Dude, stop putting words in my mouth. I never said they weren't bad.

Some nicer people here tried the educative approach and it worked much better. I learned about Bunny. And I keep forgetting I have a few in deSec but that has a limit.

I do not understand the hostility

lurking_swe · 2025-07-03T01:48:39 1751507319

> I do not understand the hostility

Unfortunately I don’t think they were participating in the conversation in good faith. People can have an extreme view on _anything_…even internet / tech. They buy into a dream of 100% open source, or “open internet”, or 100% decentralized, whatever.

When this happens they may be convinced that “others” are crazy for not sharing their utopian vision. And once this point is reached, they struggle to communicate with their peers or normal people effectively. They share their strong opinions without sharing important context (how they reached those opinions), they think the topic is black and white (because they feel so strongly about the topic), or they become hostile to others that are not sharing that vision.

You are their latest victim lol. Ignore them, and carry on.

ranger_danger · 2025-07-04T01:57:05 1751594225

One of my favorite quotes: "As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach

Learning how to spot this, and ignore such-minded people who argue in bad faith, has made me a lot happier and more chill in general.

amy_petrik · 2025-07-03T07:26:00 1751527560

>How is Cloudflare a parasite?

>I never said they weren't bad.

>I don't understand the hostility.

It's known the community here doesn't like Cloudflare, and anyone who's been on the customer end of Cloudflare would tend to agree. In that context, if you truly are blind to seeing this, when you said, "how is Cloudflare a parasite" to a group not liking of cloudflare... ... it may land as saying something like "How is Hitler a bad guy?", which I hope is self-evident is saying he's a good guy contextually, of course you could troll it out and devil's advocate yourself that you were merely asking an innocent question.

slenk · 2025-07-03T14:38:33 1751553513

I thought Cloudflare overall was neutral - meaning as many haters as lovers. I know the CEO frequents here as well.

When I ask how is Cloudflare a "parasite" I was being genuine. I know it was a problem for some users, but I don't think I realized how prevalent it was

lxgr · 2025-07-02T21:33:10 1751491990

> I have dozens of domains I have used with Cloudflare at one point and I haven't paid them a dime.

Maybe you haven't, but your users (primarily those using "suspicious" operating systems and browsers) certainly have – with their time spent solving captchas.

sealeck · 2025-07-02T22:05:28 1751493928

But Cloudflare have removed CAPTCHAs

lxgr · 2025-07-02T22:09:39 1751494179

Not sure if you're joking, but if you're not: Congratulations on using a very "normal/safe" OS/browser/IP.

I get captchas daily, without using any VPN and on several different IPs (work, home, mobile). The only crime I can think of is that I'm using Firefox instead of Chrome.

Symbiote · 2025-07-02T22:19:31 1751494771

Since a few days ago, I've been getting Captchas hourly or more.

It's probably because I use Firefox on Linux with an ad blocker.

For my part, I've ensured we don't use Cloudflare at work.

dotnet00 · 2025-07-03T12:57:33 1751547453

Using Linux is rare among the general public, but very normal among the kind of person who may find themselves working at Cloudflare or at a potential cloudflare partner/customer.

I don't really buy the argument that they're pushing more captchas to you just because of using Firefox on Linux with an ad blocker.

kelvinjps10 · 2025-07-02T22:34:41 1751495681

I use firefox on linux with an ad blocker and cloudfare works fine

DanOpcode · 2025-07-03T03:45:39 1751514339

It must depend on something else. Firefox & Linux have always worked fine for me, I cannot remember when I last got restricted by a Cloudflare captcha.

sitzkrieg · 2025-07-03T01:14:33 1751505273

my residental ip of years (which is not shared or cgnat) was recently flagged by cloudflare for who knows why. if you are asking, you havent seen when cloudflare thinks you are something else.

cloudflare are not the good guys because they give people free cdn and ddos protection lol

const_cast · 2025-07-04T01:02:31 1751590951

Really? Because I'm on Debian, with Firefox, with a VPN active 24/7 and I almost never get Captchas. I do get those "checking your browser" pages often but they just stick around for maybe half a second then redirect.

sealeck · 2025-07-02T23:26:30 1751498790

https://blog.cloudflare.com/end-cloudflare-captcha/

lxgr · 2025-07-03T03:45:12 1751514312

It's not much consolation to me if I'm one of the 25% still being challenged.

The world really has more than enough heuristic fraud detection systems that most people aren't even aware exist, but make life miserable for those that somehow don't fit the typical user profile. In fact, the lower the false positive rate gets, the more painful these systems usually become for each false positive.

I'm so tired of it. Sure, use ML (or "AI") to classify good and evil users for initial triaging all day long, but make sure you have a non-painful fallback.

Sebguer · 2025-07-02T23:44:16 1751499856

Managed challenges are just CAPTCHA by another name.

SchemaLoad · 2025-07-03T02:11:25 1751508685

I use a VPN and firefox and I get some extra captchas but not enough to be annoying. And you don't have to do anything more than tap the checkbox.

Meanwhile a bunch of "security" products other websites use just flat out block you if you're on a VPN. Other sites like youtube or reddit are in between where they block you unless you are logged in.

Cloudflare is the least obtrusive of the options.

lxgr · 2025-07-03T03:46:15 1751514375

No, the least obtrusive option is the one you don't even notice because it actually works (or offers a non-painful secondary flow when it doesn't).

1oooqooq · 2025-07-02T23:32:50 1751499170

you forgot /s

(the people not getting the joke, yes the new system don't make you train any image recognition dataset, but they profile the hell out anything they can get their hand on just like google captcha and then if you smell like a bot you're denied access. goodbye)

bombcar · 2025-07-02T23:31:37 1751499097

Download Brave.

Turn on Tor and browse for a week.

Now you know what “undesirables” feel like, where “undesirables” can be from a poor country, a bad IP block, outdated browsers, etc.

It sucks.

SchemaLoad · 2025-07-03T02:12:25 1751508745

It's kind of an impossible problem though. They either save some tracking cookie to link your sessions between websites, or they have to re captcha check you on every website.

v5v3 · 2025-07-03T11:22:35 1751541755

Why download brave and the use Tor.

Just use the Tor browser

bombcar · 2025-07-03T12:58:43 1751547523

Some large percentage of people fail when directed to the Tor browser; I don't know why.

slenk · 2025-07-02T23:46:56 1751500016

I already said in another post I am looking at Bunny, but they also don't seem to want to take my money. I've tried 3 cards. I am willing to pay for a good service, but I will be honest, I don't know many of cloudflare's competitors

fsflover · 2025-07-02T18:26:12 1751480772

They put themselves as a middle man for almost the whole Internet, collect huge usage data about everyone and block anybody who doesn't use mainstream tools:

https://news.ycombinator.com/item?id=42953508

https://news.ycombinator.com/item?id=13718752

https://news.ycombinator.com/item?id=23897705

https://news.ycombinator.com/item?id=41864632

https://news.ycombinator.com/item?id=42577076

AnthonyMouse · 2025-07-02T21:20:39 1751491239

You can add another one as a result of this article: The data you need to train AI and the data you need to build a search engine are the same data. So now they're inhibiting every new search engine that wants to compete with Google.

1oooqooq · 2025-07-02T23:36:38 1751499398

they always had. this post is about turning the false positives "up to 11" with impunity

baq · 2025-07-02T18:36:57 1751481417

valid.

...but OTOH it's their customers who want all of that and pay to get that, because the alternative is worse.

rock and a hard place.

MisterTea · 2025-07-02T20:49:35 1751489375

I want to know if there is a way to design an alternative that isn't controlled by a single entity which allows gatekeeping.

slenk · 2025-07-02T19:06:10 1751483170

Right - do I want them getting some info from me, or do I want my IP address exposed?

Besides CloudFront, which still costs money, what other option is there for semi-privacy and caching for free?

aorth · 2025-07-02T20:28:07 1751488087

As the old addage goes: If you're not paying for it, you're the product.

Lots of nuance, but generally: pay for things you use. Servers, engineers, and research and development are not free, so someone has to pay.

qualeed · 2025-07-02T20:46:50 1751489210

Lots of services don't even let me pay if I wanted to, so I am forced to be the product. (Donating typically does not un-productify myself).

Or I pay and am still the product. Just with less in-my-face ads.

aorth · 2025-07-03T06:57:43 1751525863

> Or I pay and am still the product. Just with less in-my-face ads.

Yes, this is enshittification. You pay for Amazon something or other, and they STILL show you ads. Horrible.

slenk · 2025-07-03T00:45:30 1751503530

fwiw, I have been convinced to look at other options

matt-p · 2025-07-02T20:59:28 1751489968

Cloud front is pretty much free for your first TB. Fastly has a free plan.

Though why should it be for free?

slenk · 2025-07-02T21:11:15 1751490675

Multiple people have brought that up. I pay for everything else, why not one more.

Although bunny.net won't take ANY of my credit or debit cards

mattl · 2025-07-02T20:10:23 1751487023

bunny.net has some options

slenk · 2025-07-02T20:32:06 1751488326

I will have to check them out I guess

1oooqooq · 2025-07-02T23:34:20 1751499260

https://infosec.exchange/@k3ym0/114762301792775770

shlomo_z · 2025-07-03T00:42:07 1751503327

Did you read his comment? He explained the issue he has with Cloudflare...

hnanon12341 · 2025-07-03T05:39:49 1751521189

Yeah but they are a dictator, OpenAI et al are the parasites.

axus · 2025-07-02T22:06:43 1751494003

From the server perspective Cloudflare is solving problems and not causing problems to other servers.

Analogy: locks for high-value items in grocery stores are annoying to customers, but other stores aren't being coerced by the locksmith to use them.

carlhjerpe · 2025-07-02T17:30:14 1751477414

I use Firefox with adblocking and some fingerprinting anti-measurements and I rarely hit their challenges. Your IP reputation must be bad.

They have an addon [1] that helps you bypass Cloudflare challenges anonymously somehow, but it feels wrong to install a plugin to your browser from the ones who make your web experience worse

1: https://developers.cloudflare.com/waf/tools/privacy-pass/

chrismorgan · 2025-07-03T03:51:56 1751514716

> Your IP reputation must be bad.

And for an extremely large number of honest users, they cannot realistically avoid this.

I live in India. Mobile data and fibre are all through tainted CGNAT, and I encounter Cloudflare challenges all the time. The two fibre providers I know about use CGNAT, and I expect others do too. I did (with difficulty!) ask my ISP about getting a static IP address (having in mind maybe ditching my small VPS in favour of hosting from home), but they said ₹500/month, which is way above market rate for leasing IPv4 addresses, more than I pay for my entire VPS in fact, so it definitely doesn’t make things cheaper. And I’m sceptical that it’d have good reputation with Cloudflare even then. It’ll probably still be in a blacklisted range.

bombela · 2025-07-03T05:30:15 1751520615

Why don't your ISPs just use IPv6?

henrixd · 2025-07-03T06:31:47 1751524307

I'm having lots of problems with fingerprinting protection on Librewolf and ungoogled-chromium. I use uBlock Origin and JShelter extensions on both. I'm always getting "your browser is out of date" despite always having the most newest versions.

Some sites like Stackexchange will work after just reloading the page. And rest of the sites usually work when I remove Javascript protection and Fingerprint detection from JShelter. Sill not all of them. So, they maybe/probably want to reliably fingerprint my browser to let me continue.

If I use crappy fingerprint protection, I'm not having problems but if I actually randomize some values then sites wont work. JShelter deterministicly randomizes some values using session identifier and eTLD+1 domain as a key to avoid breaking site functionality but apparently Cloudflare is beeing really picky. Tor browser is not having these problems but it uses different strategy to protect itself from fingerprinting and doesn't randomize values but tries to have unified values across different users making identification impossible.

godelski · 2025-07-02T18:04:15 1751479455

I'm in a pretty similar boat except I frequently hit challenges. Especially if I use a VPN (which is more trustworthy than my ISP). Ironically, I'm using Cloudflare for DoH

lxgr · 2025-07-02T21:42:23 1751492543

I'd be surprised if Cloudflare were actually correlating DoH requests to HTTP requests following them, so I don't think that's a signal they are likely to use.

godelski · 2025-07-02T23:22:59 1751498579

Probably not. In fact, it's probably a good sign that they are being accurate about that traffic being encrypted.

But I did find it ironic

rockskon · 2025-07-02T18:26:16 1751480776

LLM scrapers have dramatically been increasing the cost of hosting various small websites.

Without something being done, the data that these scrapers rely on would eventually no longer exist.

benjiro · 2025-07-02T20:41:35 1751488895

I think the correct term is, that unrestricted LLM scrapers have dramatically been increasing the cost of hosting various small websites.

Its not a issue when somebody does "ethical" scraping, with for instance, a 250ms delay between requests, and a active cache that checks specific pages (like news article links) to rescrape at 12 or 24h intervals. This type of scraping results in almost no pressure on the websites.

The issue that i have seen, is that the more unscrupulous parties, just let their scrapers go wild, constantly rescraping again and again because the cost of scraping is extreme low. A small VM can easily push 1000's of scraps per second, let alone somebody with more dedicated resources.

Actually building a "ethical" scraper involves more time, as you need to fine tune it per website. Unfortunately, this behavior is going to cost the more ethical scraper a ton, as anti-scraping efforts will increase the cost on our side.

Tmpod · 2025-07-02T21:06:59 1751490419

The biggest issue for me is clearly masquerading their User-Agent strings. Regardless of whether they are slow and respectful crawlers, they should clearly identify themselves, provide a documentation URL and obey robots.txt. Without that, I have to play a frankly tiring game of cat and mouse, wasting my time and the time of my users (they have to put up with some form of captcha or PoW thing).

I've been an active lurker in the self-hosting community and I'm definitely not alone. Nearly everyone hosting public facing websites, particularly those whose form is rather juicy for LLMs, have been facing these issues. It costs more time and money to deal with this, when applying a simple User-Agent block would be much cheaper and trivial to do and maintain.

sigh

trollbridge · 2025-07-03T01:56:10 1751507770

I use Cloudflare and edge caching, so it doesn’t really affect me, but the amount of LLM scraping of various static assets for apps I host is ridiculous.

We’re talking a JavaScript file of strings to respond like “login failed”, “reset your password” just over and over again. Hundreds of fetches a day, often from what appears to be the same system.

SchemaLoad · 2025-07-03T02:14:18 1751508858

Turn on the the Cloudflare tarpit. When it detects LLM scrapers it starts generating infinite AI slop pages to feed the scrapers. Ruining their dataset and keeping them off your actual site.

brumar · 2025-07-02T19:30:50 1751484650

Correction: extract monstreous profits. When I read about the revenues associated with Reddit AI deals, I can't even imagine what could possibly be deals that cover half of the internet. Cynically speaking, it's a genious level move.

dceddia · 2025-07-02T18:29:09 1751480949

Yep this terrifies me, 100%. We’re slowly losing the open internet and the frog is being boiled slowly enough that people are very happy to defend the rising temperature.

If DDoS wasn’t a scary enough boogeyman to get people to install Cloudflare as a man-in-the-middle on all their website traffic, maybe the threat of AI scrapers will do the trick?

The thing about this slow slide is it’s always defensible. Someone can always say “but I don’t want my site to be scraped, and this service is free, or even better yet, I can set up my own toll booth and collect money! They’re wonderful!”

Trouble is, one day, at this rate, almost all internet traffic will be going through that same gate. And once they have literally everyone (and all their traffic)… well, internet access is an immense amount of power to wield and I can’t see a world in which it remains untainted by commercial and government interests forever.

And “forever” is what’s at stake, because it’ll be near impossible to recover from once 99% of the population is happy to use one of the 3 approved browsers on the 2 approved devices (latest version only). Feels like we’re already accepting that future at an increasing rate.

RiverCrochet · 2025-07-02T18:57:15 1751482635

The Internet is not the first global network. Before the Internet, you had the global telephone network. It, too, strangulated end users, but eventually became stagnant, overpriced, and irrelevant. Super long-term, the current Internet is not immune from this. Internet standards are about getting as complicated and quirky as the old Bell stuff that was trying to make miles of buried copper the future, and if regulatory/commercial forces freeze this stuff in place, it's going to lead to stagnation eventually.

Something coming down the pike I think, for example, is that IPv4 addresses are going to get realllly expensive soon. That's going to lead to all sorts of interesting things in the Internet landscape and their applications.

I'm sure we'll probably have to spend some decades in the "approved devices and browers only" world before a next wave comes.

mattl · 2025-07-02T20:16:16 1751487376

We need a reasonable alternative to some of what Cloudflare does that can be easily installed as a package on Linux distributions without any of the following to install it.

* curl | bash

* Docker

* Anything that smacks of cryptocurrency or other scams

Just a standard repo for Debian and RHEL derived distros. Fully open source so everyone can use it. (apt/dnf install no-bad-actors)

Until that exists, using Cloudflare is inevitable.

It needs to be able to at least:

* provide some basic security (something to check for sql injection, etc)

* rate limiting

* User agent blocking

* IP address and ASN blocking

Make it easy to set up with sensible defaults and a way to subscribe to blocklists.

saint_yossarian · 2025-07-02T22:40:57 1751496057

I remember using mod_security with Apache long ago for some of this, looks like it's still around and now also supports Nginx and IIS: https://modsecurity.org/

mattl · 2025-07-03T00:56:51 1751504211

Thank you. This doesn't have everything I'm looking for, but apparently it has been packaged in Debian at least. I don't know why the website doesn't mention this.

xena · 2025-07-03T00:43:34 1751503414

I make this: https://anubis.techaro.lol. I have yet to add the SQL injection or IP list layers, but I can add that to the roadmap.

v5v3 · 2025-07-03T11:27:45 1751542065

Primary reason people use cloudflare is to hide the ip address of their own server. So they are less likely to be hacked.

Most people are not worried about DDos as their is no reason for any one to DDos them.

Until other services start offering the same, Cloudflare remains default.

mattl · 2025-07-03T00:55:12 1751504112

The proof of work stuff feels so cryptocurrency adjacent that I've been looking at other tools for my own thing, but I've seen Anubis on other websites and it seems to do a good job.

xena · 2025-07-03T01:42:01 1751506921

There's a non proof of work challenge: https://anubis.techaro.lol/docs/admin/configuration/challeng...

Also: Anubis does not mine cryptocurrency. Proof of work is easy to validate on the server and economically scales poorly in the wild for abusive scrapers.

mattl · 2025-07-03T03:28:48 1751513328

Thanks for the link. I’ll have a look.

I’m glad there’s no cryptocurrency involved (was never a concern) but I worry about the optics of something so closely associated.

(I appreciate your commenting on this. I know the project recently blew up in popularity. Keep up the great work)

xena · 2025-07-03T04:52:03 1751518323

If you have suggestions for JS based challenges that don't become a case of "read the source code to figure out how to make playwright lie", I'm all ears for the ideas :)

fsflover · 2025-07-03T07:40:13 1751528413

This unsubstantiated anti-cryptocurrency bias on HN is quite disappointing. Did you hear about filecoin, which allows to buy and sell disk space independently on large companies? Why wouldn't an anonymous cryptocurrency like Monero help with this real problem? What would the downsides be?

1oooqooq · 2025-07-02T23:41:12 1751499672

it's called not having a vibecoded app that falls to pieces on public endpoints even before ngix ratelimit can kick in

mattl · 2025-07-03T00:56:06 1751504166

Nobody is talking about a vibe coded app. I want to block AI scrapers entirely.

nickjj · 2025-07-03T10:30:35 1751538635

Yep, it's really annoying.

I'm using Firefox with a normal adblocker (uBlock Origin).

I get hit with a Cloudflare captcha often and that page itself takes a few seconds before I can even click the checkbox. It's probably an extra 6-7 seconds and it happens quite a few times a day.

It's like calling into a billion dollar company and it taking 4 minutes to reach a human because you're forced through an automated system where you need to choose 9 things before you even have a chance to reach a human. Of course it rattles through a bunch of non-skippable stuff that isn't related to your issue for the first minute, like how much the company is there to offer excellent customer support and how much they value you.

It's not about the 8 seconds or 4 minutes. It's the feeling that you're getting put into really poor experiences from companies with near-unlimited resources with no control over the situation while you slowly watch everything get worse over time.

The Cloudflare situation is worse because you have no options as an end user. If a site uses it, your only option is to stop using the site and that might not be an option if they are providing you an important service you depend on.

Secondly they now have a complete profile over your browsing history for any site that has CF enabled and there's not much you can do here except stop using 20% or whatever market share of the internet they have, and also do a DNS lookup for every domain you visit from an anonymous machine to see if it's a Cloudflare IP range.

In case you didn't know, CF offers a partial CNAME / DNS feature where your primary DNS can be hosted anywhere and then you can proxy traffic from CF to your back-end on a per domain / sub-domain level. Basically you can't just check a site's DNS provider to see if they are on CF. You would have to check each domain and sub-domain to see if it resolves to a CF IP range which is documented here: https://www.cloudflare.com/ips-v4/# and https://www.cloudflare.com/ips-v6/#

MichaelZuo · 2025-07-02T17:43:59 1751478239

If your on ipv6, I think they have to for ipv6 addresses… there’s just way too many bots and way too many addresses to feasibly do anything more precise.

If your on ipv4 you should check whether your behind a NAT otherwise you may have gotten an address that was previously used by a bot network.

lxgr · 2025-07-02T21:59:43 1751493583

> I think they have to for ipv6 addresses… there’s just way too many bots and way too many addresses

Are you really arguing that it's legitimate to consider all IPv6 browsing traffic "suspicious"?

If anything, I'd say that IPv4 is probably harder, given that NATs can hide hundreds or thousands of users behind a single IPv4 address, some of which might be malicious.

> you may have gotten an address that was previously used by a bot network.

Great, another "credit score" to worry about...

MichaelZuo · 2025-07-02T23:13:39 1751498019

For a whitelist system, then by definition yes?

If it’s a blacklist system, like I said I’ve not heard of any feasible solution more precise than banning huge ranges of ipv6 addresses.

Dylan16807 · 2025-07-03T00:29:43 1751502583

> For a whitelist system, then by definition yes?

A whitelist system would consider all IPv4 traffic suspicious by default too. This is not an answer to why you'd be suspicious of IPv6 in particular.

> I’ve not heard of any feasible solution more precise than banning huge ranges of ipv6 addresses.

Handling /56s or something like that is about the same as handling individual IPv4 addresses.

trollbridge · 2025-07-03T01:57:20 1751507840

I try to build things to be INET6 ready, and just repeat /64s like a single host. Eventually this will probably have to broadened to /56s or /48s.

MichaelZuo · 2025-07-03T01:49:48 1751507388

> A whitelist system would consider all IPv4 traffic suspicious by default too.

Based on what argument…?

Dylan16807 · 2025-07-03T02:40:36 1751510436

The definition of whitelisting. The argument you brought up.

MichaelZuo · 2025-07-03T03:10:38 1751512238

No…? Someone can clearly implement a whitelist system that applies only to ipv6… but that makes no judgement on ipv4.

Dylan16807 · 2025-07-03T03:28:52 1751513332

Let's back up a step. You said by definition a whitelist system would consider every IPv6 suspicious (until it's put on the list, presumably). What is that definition?

If "applies only to IPv6" is an optional decision someone could make, then it's not part of the definition of a whitelist system for IPs, right?

MichaelZuo · 2025-07-03T04:36:46 1751517406

What are you talking about?

The prior comment was responding directly to your comment, not any comment preceding that.

Of course it’s no longer by definition if you expand the scope beyond an ipv6 whitelist as there are an infinite number of possible whitelists.

Dylan16807 · 2025-07-03T07:47:23 1751528843

> What are you talking about?

The first comment with the word "whitelist". Before I entered the conversation. This comment: https://news.ycombinator.com/item?id=44449821

lxgr was challenging the idea that you would treat all IPv6 traffic as suspicious.

You justified it by saying that "by definition" "a whitelist system" would do that.

I want your definition of "a whitelist system". Not one of the infinite possible definitions, the one you were using right then while you wrote that comment.

> if you expand the scope beyond an ipv6 whitelist

Your comment before that was talking about IP filtering in general, both v4 and v6!

And then lxgr's comment was about both v4 and v6.

So when you said "a whitelist system" I assumed you were talking about IP whitelists in general.

If you weren't, if you jumped specifically to "IPv6 whitelist", you didn't answer the question they were asking. What is the justification to treat all IPv6 as suspicious? Why are we using the definition of 'IPv6 whitelist' in the first place?

MichaelZuo · 2025-07-03T16:15:35 1751559335

None of this even makes sense.

Why does your opinion on how a comment should be interpreted, matter more than anyone else’s opinion in the first place?

Dylan16807 · 2025-07-03T19:49:34 1751572174

I didn't say that. Huh?

I'm inviting you to tell me how to interpret it. In fact I'm nearly begging you to explain your original comment more. I'm not telling anyone how to interpret it.

I have criticisms for what was said, but that comes after (attempted) interpretation and builds on top of it. I'm not telling anyone how to interpret any post I didn't make.

Edit: In particular, my previous comment has "I assumed" to explain my previous posts, an it has an "If" about what you meant. Neither one of those is telling anyone how to interpret you.

MichaelZuo · 2025-07-03T21:26:00 1751577960

This is even closer to gibberish… what are you even trying to say?

Dylan16807 · 2025-07-03T21:36:53 1751578613

You don't understand a word I'm saying, and you have missed/declined every single time I asked you to explain the first comment I responded to.

Let's just mutually give up on this conversation.

MichaelZuo · 2025-07-04T03:10:11 1751598611

Okay then.

jefftk · 2025-07-02T17:14:34 1751476474

I write online (comments here, open source software, blogging, etc) because I have ideas I want to share. Whether it's "I did a thing and here's how" or "we should change policy in this specific way" or "does anyone know how to X" I'm happy for this to go into training models just like I'm happy for it to go into humans reading.

dolebirchwood · 2025-07-02T17:57:52 1751479072

Thank you for having this attitude. I have never attempted any blogging because I always figured no one is actually going to read it. With LLMs, however, I know they will. I actually see this as a motivation to blog, as we are in a position to shape this emerging knowledge base. I don't find it discouraging that others may be profiting off our freely published work, just as I myself have benefited tremendously from open source and the freely published works of others.

arkmm · 2025-07-02T19:56:18 1751486178

This is an interesting take, thanks for sharing. I wonder how someone should adjust their blogging if they believe their primary audience will be LLMs.

trollbridge · 2025-07-03T01:59:26 1751507966

There’s a few instances of things I stated (about historical topics or very narrow topics in sociology) that were incorrect. LLMs scraped these off of web forums or other places, and now these bogus “facts” are permanently embedded into LLM models, because nobody else ever really talked about the specific topic.

Most amusingly, someone cited LLM generated output about this telling me how this “fact” is true when I was telling them it’s not true.

lawlessone · 2025-07-02T20:35:54 1751488554

SEO -> LLMEO

godelski · 2025-07-02T18:12:42 1751479962

Tbh, that content I'm mostly fine with. My only real issue is that people are making trillions off the free labor of people like you and me, giving less time to create that OSS and blogs. But this isn't new to AI, it is just scaled.

What I do care about is the theft of my identity. A person may learn from the words I write but that person doesn't end up mimicking the way I write. They are still uniquely themselves.

I'm concerned that the more I write the more my text becomes my identifier. I use a handle so I can talk more openly about some issues.

We write OSS and blog because information should be free. But that information is then being locked behinds paywalls and becoming more difficult to be found through search. Frankly, that's not okay

lxgr · 2025-07-02T22:07:05 1751494025

> What I do care about is the theft of my identity. A person may learn from the words I write but that person doesn't end up mimicking the way I write. They are still uniquely themselves.

Of course they do, to some extent. Just because it's been infeasible to track the exact "graph of influence", that's literally how humans have learned to speak and write for as long as we've had language and writing.

> I'm concerned that the more I write the more my text becomes my identifier. I use a handle so I can talk more openly about some issues.

That's a much more serious concern, in my view. But I believe that LLMs are both the problem and solution here: "Remove style entropy" is just a prompt away, these days.

BeetleB · 2025-07-02T22:27:57 1751495277

> A person may learn from the words I write but that person doesn't end up mimicking the way I write.

Oh, I wish I could get AI to mimic the way I write! I'd pay money for it. I often want to type up an email/doc/whatever but don't because of occasional RSI issues. If I could get an AI to type it up for me while still sounding like me - that would be a big boon for my health.

jefftk · 2025-07-03T00:11:52 1751501512

> but don't because of occasional RSI issues

I also have this issues that often keep me from typing, but FYI dictation has gotten very good.

(Dictated this)

BeetleB · 2025-07-03T02:49:42 1751510982

Oh yeah, I use dictation and then clean it up with GPT. It's awesome. But I speak very differently from how I write. So I'd like to dictate it, and then have it rewrite it in my writing style.

bob1029 · 2025-07-02T19:52:20 1751485940

> OSS

> people are making trillions off the free labor of people like you and me

I read "No Discrimination Against Fields of Endeavor" to also include LLMs and especially the cases that we most deeply disagree with.

Either we believe in the principles of OSS or we do not. If you do not like the idea of your intellectual property being used for commercial purposes then this model is definitely not for you.

There is no shame in keeping your source code and other IP a secret. If you have strong expectations of being compensated for your work, then perhaps a different licensing and distribution model is what you are after.

> that information is then being locked behinds paywalls and becoming more difficult to be found through search

Sure - If you give up and delete everything. No one is forcing you to put your blog and GH repos behind a paywall.

blibble · 2025-07-02T20:30:02 1751488202

> Either we believe in the principles of OSS or we do not. If you do not like the idea of your intellectual property being used for commercial purposes then this model is definitely not for you.

I've been writing open source for more than 20 years

I gave away my work for free with one condition: leave my name on it (MIT license)

the AI parasites then strip the attribution out

they are the ones violating the principles of open source

> then perhaps a different licensing and distribution model is what you are after.

I've now stopped producing open source entirely

and I suggest every developer does the same until the legal position is clarified (in our favour)

jefftk · 2025-07-03T00:14:41 1751501681

> I suggest every developer does the same until the legal position is clarified (in our favour)

There are a lot of people developing open source software with a wide range of goals. In my case, I'm totally happy for LLMs to learn from my coding, just like they've learned from millions of other peoples. I wouldn't want them to duplicate it verbatim, but (due to copyright filters + that not usually being the best way to solve a problem) they don't.

godelski · 2025-07-02T21:02:16 1751490136

  > Either we believe in the principles of OSS or we do not.

What about respecting licenses?

Seriously, don't lick the boot. We can recognize that there's complexity here. Trivializing everything only helps the abusers.

Giving credit where credit is due is not too much to ask. Other people making money off my work can be good[0]. Taking credit for it is insulting

[0] If you're not making much, who cares. But if you're a trillion dollar business you can afford to give a little back. Here's the truth, OSS only works if we get enough money and time to do the work. That's either by having a good work life balance and good pay or enough donations coming in. We've been mostly supported by the former, but that deal seems to be going away

dotnet00 · 2025-07-03T13:25:45 1751549145

I think this may be too much of a "literal" interpretation of OSS without really considering the social contract many OSS supporters believe in, wherein users of OSS will act in good faith and might eventually reciprocate for the benefits they're getting, e.g. the way companies have slowly accepted paying their own employees to contribute to projects openly, releasing their own open source code, respecting the spirit of OSS licenses, sponsoring the developers of the thing they use, etc.

I think it's entirely fair that even staunch supporters of OSS get turned off when AI companies scrape their work to ingest into a black box regurgitator and then turn around and tell the world how their AI will make trillions of dollars by taking away the jobs of those obsolete OSS developers, showing no intention of ever giving back to the community.

mattl · 2025-07-02T20:18:33 1751487513

Open source software typically has a license. People not following the license isn’t tolerated.

This is what AI scrapers are doing. They’re taking your code, your artwork and your writing without any consideration for the license.

jefftk · 2025-07-03T00:17:50 1751501870

Weather training on code is fair use is still an open legal question, and it may well be fair use. The way a license works is by saying "you have my permission to use this code as long as you follow these conditions", but if no license is required than the conditions are irrelevant.

There is an active case on this, where Microsoft has been sued over GitHub copilot, and it has been slowly moving through the court system since 2022. Most of the claims have been dismissed, and the prediction market is at 11%: https://manifold.markets/JeffKaufman/will-the-github-copilot...

godelski · 2025-07-03T06:29:54 1751524194

> The way a license works is

Let's actually look at the MIT license, a very permissive license

  > Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to ***use***, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

  > The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

So, you can use it but need to cite the usage. It's not that hard. Fair use if you just acknowledge usage.

Is it really that difficult to acknowledge that you didn't do everything on your own? People aren't asking for money. It's just basic acknowledgement.

Forget the courts for a second, just ask yourself what is the right thing to do. Ethically.

jefftk · 2025-07-03T10:39:58 1751539198

> Forget the courts for a second, just ask yourself what is the right thing to do

Forgetting the courts, whether reading the source code and learning from it is intended to count as "use" is not clear to me, and I would have guessed no. Using a tool and examining a tool are pretty different.

godelski · 2025-07-04T01:09:04 1751591344

Context matters, right?

Human reading code? Ambiguous. But I think you're using it. Running code? Not ambiguous.

Machine processing code? I don't think that's ambiguous. It's using the code. A person is using the code to make their machine better.

This really isn't that hard.

Let's think about it this way. How do you use a book?

I think you need to be careful that you're not justifying the answer you want and instead are looking for what the right answer is. I'm saying this because you quoted me saying "what is right" and you just didn't address it. To quote Feynman (<- look, I cited my work. I fulfilled the MIT license obligations!)

  > The first principle is that you must not fool yourself, and you are the easiest person to fool.

mattl · 2025-07-03T00:23:15 1751502195

I can't see how it can be fair use. Just follow the license, it's not that difficult. Microsoft will forever be a pariah if they get away with this.

I'm putting my new code somewhere private anyway.

jefftk · 2025-07-03T00:34:55 1751502895

> I can't see how it can be fair use.

The key question is whether it is sufficiently "transformative". See Authors Guild vs Google, Kelly vs Arriba Soft, and Sony vs Universal. This is a way a judge could definitely rule, and at this point I think is the most likely outcome.

> Microsoft will forever be a pariah if they get away with this.

I doubt this. Talking to developers, it seems like the majority are pretty excited about coding assistants. Including the ones that many companies other than Microsoft (especially Anthropic) are putting out.

mattl · 2025-07-03T00:38:57 1751503137

Yeah, this is just sad to hear.

bawolff · 2025-07-02T22:05:41 1751493941

I think its 100% ok to freely train on public internet data.

What is absolutely not ok is to crawl at such an excessive speed that it makes it difficult to host small scale websites.

Truly a tragedy of the commons.

tedd4u · 2025-07-02T22:15:41 1751494541

Agree. The problem lately is that even if each single scraper is doing so “reasonably,” there are so many individuals and groups doing this that it’s still too onerous for many sites. And of course many are not “reasonable.”

SchemaLoad · 2025-07-03T02:15:59 1751508959

This is the attitude that's going to kill the public internet. Because you're right, it is a free for all right now with the only way to opt out being putting content behind restricted platforms.

visarga · 2025-07-03T01:37:16 1751506636

> everything we do online has, until this point, been free training to make OpenAI, Anthropic, etc. richer while cutting humans--the ones who produced the value--out of the loop

I think on the contrary, who sets the prompts stands to get benefits, the AI provider gets a flat fee, and authors get nothing except the same AI tools as anyone else. That is natural since the users are bringing the problem to the AI, of course they have the lion share here.

AI is useless until applied to a specific task owned by a person or company. Within such a task there is opportunity for AI to generate value. AI does not generate its own opportunities, users do.

Because users are distributed across society benefits follow the same curve. They don't flow to the center but mainly remain at the edge. In this sense LLMs are like Linux, they serve every user in their specific way, but the contributors to the open source code don't get directly compensated.

qskousen · 2025-07-03T02:03:25 1751508205

That's a really interesting way to think about it, thank you! I've always had a kind of "gut feeling" that AI training on our data is fine with me, but without really thinking too much about why. I think this explains what I've been feeling.

jowea · 2025-07-02T22:26:30 1751495190

Is it even possible that Cloudfare could manage to block all AI data scrapping? I think this measure is just going to make it harder and more expensive, which will stop AI scrappers from hitting every single page every single day and creating expenses for publishers, but not actually stop their data from ending up in a few datasets.

godelski · 2025-07-02T18:00:30 1751479230

Including your comment, including this comment.

HN itself is routinely scraped. What makes me most uncomfortable is deanonymization via speech analysis. It's something we can already do but is hard to do at scale. This is the ultimate tool for authoritarians. There's no hidden identities because your speech is your identifier. It is without borders. It doesn't matter if your government is good, a bad acting government (or even large corporate entity) has the power to blackmail individuals in other countries.

We really are quickly headed towards a dystopia. It could result in the entire destruction of the internet or an unprecedented level of self censorship. We already have algospeak because platform censorship[0]. But this would be a different type of censorship. Much more invasive, much more personal. There are things worse than the dark forest

[0] literally yesterday YouTube gave me, a person in the 25-60 age bracket, a content warning because there was a video about a person that got removed from a plane because they wore a shirt saying "End veteran suicide".

[0.1] Even as I type this I'm censored! Apple will allow me to swipe the word suicidal but not suicide! Jesus fuck guys! You don't reduce the mental health crisis by preventing people from even being able to discuss their problems, you only make it worse!

trollbridge · 2025-07-03T02:01:47 1751508107

The degree to which people say “self-delete” and “unalive” is absurd these days and I now hear it in real life.

It’s Orwellian in the truest sense of the word.

baq · 2025-07-03T06:00:57 1751522457

Orwell was the optimist. It’s Huxley’s vision we should be really worried about. Brave new world indeed.

cmeacham98 · 2025-07-02T17:27:28 1751477248

Cutting humans out of what loop? What jobs or opportunities were people posting Reddit comments or whatever getting that are now going to AI?

Larrikin · 2025-07-02T17:54:38 1751478878

People who used to post gained knowledge from their profession or hobby. I don't bother posting any of that information on large sites like Reddit anymore, for various reasons but AI scraping solidified.

I'll still post on the increasingly fewer hobby message boards that are out there.

kamarg · 2025-07-02T18:20:46 1751480446

> What jobs or opportunities were people posting Reddit comments or whatever getting that are now going to AI?

Content writing, product reviews (real & fake), creative writing, customer support, photography/art to name a few off the top of my head.

fkyoureadthedoc · 2025-07-02T18:46:21 1751481981

Now the astroturfing is done by AI agents instead of hard working serfs in a call center, you hate to see it

Kostic · 2025-07-02T18:06:57 1751479617

This would be true if not for open-weights (and even some open source) LLMs that exist today. Not everything should be done for profit.

az226 · 2025-07-03T01:52:03 1751507523

That’s the irony. Doing it now is just hampering competition and making it better for the incumbents.

rramon · 2025-07-02T17:24:13 1751477053

Isn't there a possibility that model makers retaliate by erasing them and their frameworks from memory, hurting CF adoption by devs?

mathiaspoint · 2025-07-02T22:28:25 1751495305

This has been going on even since early social media. I think most of the users actually prefer it.

giancarlostoro · 2025-07-02T18:25:03 1751480703

There's a reason reddit started charging for API usage.

fkyoureadthedoc · 2025-07-02T18:43:41 1751481821

It surely wasn't to force users into their shitty app where they can't block ads and definitely had nothing to do with their IPO. It was the AI.

giancarlostoro · 2025-07-02T18:49:49 1751482189

Ah yes, its only because of ONE singular reason they started charging for API usage. Are you okay? I'm listing one reason out of many as to why reddit started charging for API usage. After all, reddit is a for profit website.

fkyoureadthedoc · 2025-07-02T19:28:03 1751484483

> There's *A* reason

this u chief?

giancarlostoro · 2025-07-04T00:15:20 1751588120

Does "a" not mean, one out of many in this context? I know English is not my first language, but I've always taken "a reason" to mean one of many.

nektro · 2025-07-03T00:17:29 1751501849

it brings me so much joy that this is the top comment on this post

risyachka · 2025-07-02T21:19:42 1751491182

Maybe so, but I'll take Cloudflare over OpenAI and Meta every time.

dwoldrich · 2025-07-02T19:55:11 1751486111

I think the parasitism goes quite a bit further than AI. We're being digested not parasitized.

k__ · 2025-07-02T17:01:32 1751475692

Is anyone suing to make the models and their weights open source?

lofaszvanitt · 2025-07-02T21:48:55 1751492935

Cyberpunk aged well. "You better not be on the unprotected internet". Too many hazards out there. Rogue AIs and other shit...

Cloudflare is here to protecc you from all those evils. Just come under our umbrella.

Dig1t · 2025-07-02T20:29:54 1751488194

That was always the cost of free and open exchange of ideas though. The idea of the internet in the first place was to allow people to communicate in the open and publish ideas freely. There was never any stipulation that using the published ideas to make money was off limits.

Technology has advanced and now reading the sum total of the freely exchanged ideas has become particularly valuable. But who cares? The internet still exists and is still usable to freely exchange ideas the way it’s always been.

The value that one website provides is a minuscule amount, the value of one individual poster on Reddit is minuscule. Are we asking that each poster on Reddit be paid 1 penny (that’s probably what your posts are worth) for their individual contribution? My websites were used to train these models probably, but the value that each contributed is so small that I wouldn’t even expect a few cents for it.

The person who’s going to profit here is Cloudflare or the owners of Reddit, or any other gatekeeper site that is already profiting from other people’s contributions.

The “parasitism” here just feels like normal competition between giant companies who have special access to information.

tcdent · 2025-07-02T20:56:18 1751489778

[flagged]

friedtofu · 2025-07-02T21:35:02 1751492102

What the hell...

Even if you're directing this at the user's blog posts specifically; this is a ridiculously pessimistic, sad way to view things.

I hope you're just having a bad day because if you sincerely have this greedy, cynical mindset day to day(towards blogging, software, offline/real life activities, whatever) I feel sorry for you.

lxgr · 2025-07-02T21:49:43 1751492983

GP's comment might be provocatively phrased, but I don't think it's an invalid point to have:

When I publish something online for free, i.e. without requiring authentication or payment, be it a Reddit comment, a blog post, a Stackoverflow answer or anything else, I do so hoping that it will be useful to somebody somehow, without any illusions about being able to gatekeep some types of current or future consumers.