fancyfredbot's comments | Hacker News

If you can bore an LLM that's exciting.

Bore-a-Bot, the new service from Confuse-a-Cat.

That sounds like Elon merging xAI and The Boring Company

I would rather set up a "shadow" site designed only for LLMs. I would stuff it with so much insanity that Grok would not be able to leave. How about a billion blog posts where every use of "American" is replaced with "Canadian". By the time I'm done, Grok will be spouting conspiracy theories about the decline of the strategic bacon reserve.

> By the time I'm done, Grok will be spouting conspiracy theories about the decline of the strategic bacon reserve.

Grok will blame the Zionists rather than the Freemasons for that one.


Who are these aggressive scrapers run by?

It is difficult to figure out the incentives here. Why would anyone want to pull data from LWN (or any other site) at a rate that causes a DDoS-like attack?

If I run a big, data-hungry AI lab consuming training data at 100Gb/s, it's much, much easier to scrape 10,000 sites at 10Mb/s each than to DDoS a smaller number of sites with more traffic. Of course the big labs want this data, but why would they risk the reputational damage of overloading popular sites in order to pull it in an hour instead of a day or two?


> If I run a big, data-hungry AI lab consuming training data at 100Gb/s, it's much, much easier to...

You are incorrectly assuming competency, thoughtful engineering and/or some modicum of care for negative externalities. The scraper may have been whipped up by AI, and shipped an hour later after a quick 15-minute test against en.wikipedia.org.

Whoever the perpetrator is, they are hiding behind "residential IP providers" so there's no reputational risk. Further, AI companies already have a reputation for engaging in distasteful practices, but popular wisdom claims that they make up for the awfulness with utility, so even if it turns out to be a big org like OpenAI or Anthropic, people will shrug their shoulders and move on.


Yes, I agree it's more likely incompetence than malice. That's another reason I don't think it's a lab. Even if you don't like the big labs, you can probably admit they are reasonably smart/competent.

Residential IP providers definitely don't remove reputational risk. There are many ways people can find out what you are doing. The main one being that your employees might decide to tell on you.

The IP providers are a great way of getting around Cloudflare etc. They are also reasonably expensive! I find it very plausible that these IP providers are involved, but I still don't understand who is paying them.


This is just an anecdote, but having been dealing with similar problems on one of my websites for the past year or so, I was experiencing a huge number of hits from different residential IP addresses (mostly Latin American) at the same time once every 5-10 minutes (which started crashing my site regularly). Digging through my server's logs and watching them in real time, I noticed one or two Huawei IPs making requests at the same time as the dozens or hundreds of residential IPs. Blocking the Huawei IPs seemed to mysteriously cut back the residential IP requests, at least for a short amount of time (i.e. a couple of hours).

This isn't to say every attack that looks similar is being done by Huawei (which I can't say for certain, anyway). But to me, it does look an awful lot like even large organizations you'd think would be competent can stoop to these levels. I don't have an answer for you as to why.


I've been asking this for a while, especially as a lot of the early blame went on the big, visible US companies like OpenAI and Anthropic. While their incentives are different from search engines (as someone said early on in this onslaught, "a search engine needs your site to stay up; an AI company doesn't"), that's quite a subtle incentive difference. Just avoiding the blocks that inevitably spring up when you misbehave is an incentive the other way -- and probably the biggest reason robots.txt obedience, delays between accesses, back-off algorithms, etc. are widespread. We have a culture that conveys all of these approaches, and reciprocity has its part, but I suspect that's part of the encouragement to adopt them. It could be that they're in too much of a hurry to follow the rules, or it could be others hiding behind those bot names (or other names entirely). Unsure.

Anyway, I think the (currently small[1]) but growing problem is going to be individuals using AI agents to access web pages. I think this falls under the category of the traffic that people are concerned about, even though it's under an individual user's control, and those users are ultimately accessing that information (though perhaps without seeing the ads that pay for it). AI agents are frequently zooming off and collecting hundreds of citations for an individual user, in the time that a user agent under manual control of a human would click on a few links. Even if those links aren't all accessed, that's going to change the pattern of organic browsing for websites.

Another challenge is that with tools like Claude Cowork, users are increasingly going to be able to create their own, one-off, crawlers. I've had a couple of occasions when I've ended up crafting a crawler to answer a question, and I've had to intervene and explicitly tell Claude to "be polite", before it would build in time-delays and the like (I got temporarily blocked by NASA because I hadn't noticed Claude was hammering a 404 page).
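
For concreteness, "be polite" ended up meaning something like the sketch below (the user agent string, the 5-second delay, and the urls.txt input are placeholder choices of mine, not anything Claude specifically produced):

  # A polite one-off fetch loop: identify yourself, give up on
  # failures instead of retrying, and pause between requests.
  mkdir -p pages
  n=0
  while read -r url; do
    n=$((n+1))
    curl -sS --fail --max-time 30 \
         -A "one-off-crawler/0.1 (contact: me@example.com)" \
         -o "pages/$n.html" "$url" \
      || echo "skipping $url" >&2   # e.g. a 404: log it and move on
    sleep 5                         # don't hammer the server
  done < urls.txt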

The Web was always designed to be readable by humans and machines, so I don't see a fundamental problem now that end users have more capability to work with machines to learn what they need. But even if we track down and successfully discourage bad actors, we need to work out how to adapt to the changing patterns of how good actors, empowered by better access to computation, can browse the web.

[1] - https://radar.cloudflare.com/ai-insights#ai-bot-crawler-traf...


(and if anyone from Anthropic or OpenAI is reading this: teach your models to be polite when they write crawlers! It's actually an interesting alignment issue that they don't consider the externalities of their actions right now!)

Hell, they should at least be caching those requests rather than hitting the endpoint on every single AI request that needs the info.
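
Even a dumb disk cache in front of the fetch would go a long way. A minimal sketch (the cache directory and hashing scheme are arbitrary choices of mine):

  # Only hit the origin server the first time a URL is requested;
  # afterwards, serve the copy cached on disk.
  fetch_cached() {
    local url=$1
    local key
    key=$(printf '%s' "$url" | sha256sum | cut -d' ' -f1)
    mkdir -p cache
    [ -f "cache/$key" ] || curl -sS --max-time 30 -o "cache/$key" "$url"
    cat "cache/$key"
  }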

I don't think that most of them are from big-name companies. I run a personal web site that has been periodically overwhelmed by scrapers, prompting me to update my robots.txt with more disallows.

The only big AI company I recognized by name was OpenAI's GPTBot. Most of them are from small companies that I'm only hearing of for the first time when I look at their user agents in the Apache logs. Probably the shadiest organizations aren't even identifying their requests with a unique user agent.
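
(For anyone doing the same archaeology, this is roughly the one-liner I use to rank user agents, assuming Apache's default combined log format, where the user agent is the sixth double-quoted field:)

  # Top 20 user agents by request count in an Apache combined-format log.
  awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20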

As for why a lot of dumb bots are interested in my web pages now, when they're already available through Common Crawl, I don't know.


Maybe someone is putting out public “scraper lists” that small companies or even individuals can use to find potentially useful targets, perhaps with some common scraper tool they are using? That could explain it? I am also mystified by this.

I bet some guy just told Claude Code to archive all of LWN for him on a whim.

Some guy doesn't show up with 10k residential IPs. This is deliberate and organized.

There are multiple Israeli companies who will provide you with millions of residential proxies at a per-GB usage rate and a very easy API. You can set this up in minutes with Claude Code.

These IP providers aren't cheap (cost per GB seems to be $4 but there are bulk discounts). The cost to grab all of LWN isn't prohibitively high for an individual but it's enough that most people probably wouldn't do it on a whim.

I suppose it only needs one person though. So it's probably a pretty plausible explanation.


LLMs just do be paperclipping

Can Claude Code even do that? Rather than provide code to do that.

Recently I needed to block some scrapers due to excessive load on a server, and here are some that I identified:

  BOTS=( "semrushbot" "petalbot" "aliyunsecbot" "amazonbot" \
         "claudebot" "thinkbot" "perplexitybot" "openai.com/bot" )

This was really just emergency blocking and it included more than 1500 IP addresses.
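
If it helps anyone, here's roughly how you can turn that array into an Apache user-agent block (a sketch assuming mod_rewrite is enabled; note the entries are used as regexes as-is, so the dots match any character):

  # Join the BOTS entries (array defined above) into one
  # case-insensitive regex and write an .htaccess rule that
  # returns 403 Forbidden for matching user agents.
  pattern=$(IFS='|'; printf '%s' "${BOTS[*]}")
  {
    echo "RewriteEngine On"
    echo "RewriteCond %{HTTP_USER_AGENT} ($pattern) [NC]"
    echo "RewriteRule . - [F]"
  } > .htaccess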

Here's Amazon's page about their bot with more information including IP addresses

https://developer.amazon.com/amazonbot


When faced with evidence of the standard operating procedure of the malicious, we forever take them at their word when they insist they're just incompetent.

The spirit of this site is so dead. Where are the hackers? Scraping is the best anyone is coming up with?

It's not scraping. They'd notice themselves getting banned everywhere for abuse of this magnitude, which is counterproductive to scraping goals. Rather than rate-limit the queries to avoid that attention, they're going out of their way to (pay to?) route traffic through a residential botnet so they can sustain it. This is not by accident, nor a byproduct of sloppy code Claude shat out. Someone wants to operate with this degree of aggressiveness, and they do not want to be detected or stopped.

This setup is as close to real-time surveillance as can be. Someone really wants to know what is being published on target sites with as short a refresh interval as possible and zero interference. It's not a western governmental entity or they'd just tap it.

As for who...there's only one group on the planet so obsessed with monitoring and policing everything everyone else is doing.


LWN includes archives of a bunch of mailing lists, so that might be a factor. There are a LOT of web pages on that domain.

I'd guess some sort of middle-management local maximum. Someone set some metric of X pages per day scraped, or Y bits per month - whatever. CEO gets what he wants.

Then that got passed down to the engineers and those engineers got ridden until they turned the dial to 11. Some VP then gets to go to the quarterly review with a "we beat our data ingestion metrics by 15%!".

So any engineer that pushes back basically gets told too bad, do it anyways.


Why is it in these invented HN scenarios that the engineers just happen to have absolutely no agency?

Because I've personally seen it. Engineer says this is silly, it will blow up in the long run - told to implement it anyways. Not much to lose for the engineer to simply do it. Substitute engineer for any line level employee in any industry and it works just as well.

I've also run into these local-maximum stupidities dozens or more times in my career, where it was obvious someone was gaming a performance metric at the expense of the bigger picture - which required escalation to someone who could see said bigger picture to get fixed. Happens all the time as a customer where some sales rep or sales manager wants to game short-term numbers at the expense of long-term relationships. Smaller companies you can usually get it fixed pretty quickly; larger companies tend to do more doubling down.

It usually starts with generally well-intentioned goal setting but devolves into someone optimizing a number on a spreadsheet without care (or perhaps knowledge) of the damage it can cause.

Hell, for the most extreme example look at Dieselgate. Those things don't start from some evil henchman at the top saying "let's cheat and game the metrics" - it often starts with someone unknowingly setting impossible-to-achieve goals in service of "setting the bar high for the organization", and by the time the backpressure filters up through the org it's oftentimes too late to fix the damage.


Because: who would refuse more money?

I don't think this evil boss and downtrodden engineer situation can explain what we're seeing.

Your theoretical engineers would figure out pretty quickly that crashing a server slows you down, and that the only way to keep the boss happy is to avoid the DDoS.


Perhaps incompetence instead of malice - a misconfigured or buggy scraper, etc.

NSA, trying to force everybody onto their Cloudflare reservation.

As someone who runs the infrastructure for a large OSS project: mostly Chinese AI firms. All the big name-brand AI firms play reasonably nice and respect robots.txt.

The Chinese ones are hyper-aggressive, with no rate limit and pure greed scraping. They'll scrape the same content hundreds of times in the same day.


The Chinese are also sloppy. They will run those scrapers until they get banned and not give a fuck.

In my experience, they do not bother putting in the effort to obfuscate source or evade bans in the first place. They might try again later, but this particular setup was specifically engineered for resiliency.


Is this an example of that "chabuduo" we read about now and then?

Chinese AI firms have been making large numbers of requests in the past few weeks.

How is this showing up for you? A site you host, or something at a bigger scale? I'm not surprised, but rather curious.

> If I run a big, data-hungry AI lab consuming training data at 100Gb/s, it's much, much easier to scrape 10,000 sites at 10Mb/s each than to DDoS a smaller number of sites with more traffic

A little over a decade ago (f*ck I'm old now [0]), I had a similar conversation with an ML Researcher@Nvidia. Their response was "even if we are overtraining, it's a good problem to have because we can reduce our false negative rate".

Everyone continues to have an incentive to optimize for TP and FP at the expense of FN.

[0] - https://m.youtube.com/watch?v=BGrfhsxxmdE


China (Alibaba and Tencent)

I'm not at all sure Alibaba or Tencent would actually want to DDoS LWN or any other popular website.

They may face less reputational damage than say Google or OpenAI would but I expect LWN has Chinese readers who would look dimly on this sort of thing. Some of those readers probably work for Alibaba and Tencent.

I'm not necessarily saying they wouldn't do it if there was some incentive to do so but I don't see the upside for them.


That's what they want you to think.

I hate to break this to you, but the article is wrong. There are pictures of him.

This was a large enough issue that there's a Sept 20th, 2025 Daily Mail/This is Money article titled "REVEALED: Boss behind Games Workshop's £5bn Warhammer boom" [1].

Quoting:

  "this is the first picture of Kevin Rountree, the publicity-shy boss of the company behind Warhammer miniatures"

  "surfaced last week at the firm's Nottingham HQ where he was up for re-election at its annual meeting"

  "He isn't interested in publicity and is not on social media"
[1] https://archive.is/r6t8C#selection-953.76-953.176

Anyway, the original point was that it sounds crazy that one of the largest companies would have a CEO that nobody's seen (apparently agreed upon by the Dispatch, the Daily Mail, and the Financial Times [2]). The image on the Warhammer wiki is possibly the same person (he looks kind of different from the Daily Mail pic).

Quoting Financial Times:

  "its chief executive, Kevin Rountree — formerly of the firm now known as PwC, and farmer-and-British-posho apparel provender Barbour — has virtually no profile (the Sunday Times, which recently listed Rountree as one of its businesspeople of the year, had to use a photo of an Ultramarines Space Marine instead). “He’s a real person — he’s not some kind of character out of Warhammer 40k,” Ashworth-Lord told us."
To the point of openly requesting a picture:

  "Obviously, if someone does have a photo of big Kev or wants to let us know if he’s a slandered South Shields sand dancer or similar, we’re contactable in the usual ways."
Credit to @defrost https://news.ycombinator.com/item?id=46563037 for the FT link.

[2] https://archive.is/2h16k#selection-37545.5-37587.1


Yup. The very first Google hit, https://gamesworkshop.fandom.com/wiki/Kevin_D._Rountree, has a photo of him. It's in the fscking wiki for Games Workshop, the company he runs. I have no idea how TFA couldn't find this.

The Financial Times also asserted in 2024 that it had no public domain image of him either: https://www.ft.com/content/369279f5-6f44-4248-a0f1-5347864ea...

There's a possibility the Games Workshop wiki image is a generic stock businessperson placeholder.

The Australian science fiction writer Greg Egan strongly asserts no image of him appears on the internet ... and yet images are returned if you search his name and profession ... they are all different.

See: https://www.gregegan.net/images/GregEgan.htm



Two things stand out to me:

1) Battery life claims are specific and very impressive, possibly best in class.

2) Performance claims are vague and uninspiring.

Either this is an awful press release or this generation isn't taking back the performance crown.


Windows 11 usage is declining. The Xbox is selling vastly less than Sony/Nintendo. PC gamers are moving to SteamOS and Linux. The billions poured into OpenAI no longer look so smart given very competitive offerings elsewhere.

Despite all this, they still have a hugely profitable business, a pretty decent OS under all the adware, and a de facto monopoly on business productivity software.


Is the first paragraph why we should be selling, or the second?

Corporations get their software into businesses through the exact same process by which software gets replaced in those companies… usually through IT and/or users using things personally who become their champions.

So which paragraph do you think was more relevant to their recommendation… the one where they already have most of the customers they will ever have, or the one where people are increasingly moving away from them in their daily lives?


In just the last 5 months they got two new corporate customers, with 1400 and 550 employees. And those are just the ones that I, one nobody, know about. If you think they are not getting new corporate customers not just daily but hourly, you might be a tad misinformed.

As an exercise, see how many job openings there are where you won't be using MSFT products if you get the gig :)


Likely using a rather generous definition of “new”. There is a difference between a new customer and buying a license. I'm also fairly doubtful that every server, Docker container, VM, and appliance is also running Windows. And even if said 2000 users are using Windows for absolutely every system, it's still a meaningless anecdote about a drop in the bucket. I don't think anyone suggested that Microsoft doesn't have customers? But I suspect they were far from “new” customers, even if a new company, because I guarantee something somewhere was replaced for every one of them: bankrupt businesses they replaced, old hardware, whatever. Arguing the opposite would certainly seem to be naive on its face.

Wasn't expecting to read that Microsoft is not getting new corporate customers, but here we are; you learn something new every day :)

None of this is anecdotal. I make a living as a contractor, and in just the past two years have worked on numerous moving-to-Microsoft projects: Oracle to SQL Server, AWS to Azure, SharePoint, etc. etc... I am not a fan of MSFT by any means, but what you are writing makes absolutely no sense. You should read MSFT quarterly earnings reports and not the few anecdotal things people on HN write about MSFT. It is M7 for a reason and practically has no competition (which is why they are able to do shit like Windows 11 and Copilot and... people on HN might be bitching, but it is just for entertainment purposes)


Anecdotes like “I’ve done blah blah over two years”? Correct, I ignore anecdotes just like that. You can argue whatever you like — you seem to be heavily financially motivated to do so while I neither own Microsoft stock nor earn my money by convincing people to use their products. As a result, feel free to continue your evangelism while I go ahead and extricate myself from your sphere of biases.

It's not the official reason, but also worth noting that many waterproof devices have headphone jacks.


Very sceptical that a 3kW speaker can cause "earthquake like vibrations with a radius of 2km".
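
Back-of-envelope: even treating it as a point source converting all 3kW to acoustic power (wildly generous), the intensity at 2km would be

  I = \frac{P}{4\pi r^2}
    = \frac{3000\,\mathrm{W}}{4\pi\,(2000\,\mathrm{m})^2}
    \approx 6 \times 10^{-5}\,\mathrm{W/m^2}

which, against the standard 10^-12 W/m^2 reference intensity, is only about 78 dB. Loud-ish background noise, nowhere near ground-shaking.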


I can't help but think it would be fun to try to verify the claim, though.


Fun? Sure. It is indeed fun to play with big speakers.

Direct-radiating bass reproduction is all about displacement, and the area of the piston (cone) is certainly a factor in that. More tends to be... well, more.

And this mysterious speaker (of which there seem to be no color photos, despite the 1981 date) has a radiating area of perhaps about 2 square meters.

That's around the same as eighteen 18" woofers.

It's easy to find collections of way, way more than that. People even charge money to hear them; they're on the ground between the stage and the crowd barrier at any big rock show. :)


The Japanese version of the article linked in another comment has a color photo (which appears to be a magazine scan): https://audio-heritage.jp/DIATONE/diatonesp/d-160(1).jpg


Marty McFly is volunteering to test it.


Resonance! Very minor earthquakes can knock pictures off the walls, items off the shelves, etc. if they just happen to hit the right resonant frequency. So if you flood the area with 8Hz-ish acoustic energy, some stuff will start to shake.


You will probably end up in court. But you might not get convicted.

Shakeeb Ahmed was convicted of wire fraud for exploiting a smart contract bug.

Avi Eisenberg was also convicted for exploiting a smart contract bug, but he had his conviction overturned on appeal.

The Peraire-Bueno brothers were in court for exploiting a bug in the MEV mechanism, but it ended in a mistrial, so we're going to have to wait to find out.

Not legal advice ;-)


Top Tip: If you find the orange site's conversation on crypto to be repetitive, you can change the top bar. The conversation stays the same, but the colour can be changed!


Readers will want to note that this delightful feature is only available to users above 251 karma, or those with a knack for UserCSS.


Yeah, always takes me a minute when people say 'the orange site' (especially elsewhere) - it's green if I'm logged in, so I rarely see it orange, and then it's 'wuh, I'm logged out, [logs in]'.

Fortunately I'm not prone to refer to the green site.


Wow thank you, I'm about to be on the blue site. I never knew this and really don't like the orange.

0000FF gang, unite!


