giantrobot's comments | Hacker News

Normal search engine spiders did and do cause problems, but not on the scale of AI scrapers. Search engine spiders tend to respect robots.txt, look at the sitemap.xml, and generally try to throttle their requests. You'll find some that are poorly behaved, but they tend to get blocked and either die out or get fixed and behave better.
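
For contrast, here's a minimal sketch of what "well behaved" looks like in Python; the site URL, user-agent string, and paths are made up for illustration:

    import time
    import urllib.request
    import urllib.robotparser

    SITE = "https://example.org"     # hypothetical site
    AGENT = "PoliteBot/1.0"          # hypothetical crawler user agent

    # Fetch and parse robots.txt before crawling anything.
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(SITE + "/robots.txt")
    robots.read()

    # Honor an explicit Crawl-delay if the site sets one; otherwise back off anyway.
    delay = robots.crawl_delay(AGENT) or 10

    for path in ["/", "/sitemap.xml", "/some/page"]:
        if not robots.can_fetch(AGENT, SITE + path):
            continue  # disallowed by robots.txt: skip it, don't hammer it
        req = urllib.request.Request(SITE + path, headers={"User-Agent": AGENT})
        with urllib.request.urlopen(req) as resp:
            resp.read()
        time.sleep(delay)  # throttle between requests

The AI scrapers described below do essentially none of that.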

The AI scrapers are atrocious. They just blindly blast every URL on a site with no throttling. They're terribly written and managed: the same scraper will hit the same site multiple times a day, or even multiple times an hour. They also don't pay any attention to context, so they'll happily blast git repo hosts and hit expensive endpoints.

They're like a constant DoS attack. They're hard to block at the network level because they span different hyperscalers' IP blocks.


Puts on tinfoil hat: Maybe it isn’t AI scrapers, but actually a massive DoS attack, and it’s a conspiracy to get people to not self-host.

Maps That Are Just Datacenters

Subtle and clever. You got a laugh out of me.

Hardware acceleration has been a thing since...forever. Video in general is a balancing act between storage, bandwidth, and quality. Video playback on computers is a balancing act between storage, bandwidth, power, and cost.

Video is naturally large. You've got all the pixels in a frame, tens of frames every second, and however many bits per pixel. All those frames need to be decoded and displayed in order and within fixed time constraints. If you drop frames or deliver them slowly no one is happy watching the video.
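
To put rough numbers on that, here's a back-of-envelope sketch; the resolution, frame rate, chroma format, and target bitrate are just typical illustrative values:

    # Uncompressed 1080p30 video at 8-bit 4:2:0 (12 bits per pixel on average).
    width, height, fps = 1920, 1080, 30
    bits_per_pixel = 12

    raw_bps = width * height * bits_per_pixel * fps
    print(f"raw: {raw_bps / 1e6:.0f} Mbit/s")                    # ~746 Mbit/s

    # A typical 1080p H.264 stream runs around 5 Mbit/s, so the codec has to
    # deliver roughly a 150:1 reduction, frame after frame, on a deadline.
    target_bps = 5e6
    print(f"needed compression: ~{raw_bps / target_bps:.0f}:1")  # ~149:1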

If you stick to video that can be efficiently decoded on a general purpose CPU with no acceleration, you're never going to keep up with the demands of actual users. It's also going to use a lot more power than an ASIC purpose-built to decode the video. If you instead use the beefiest CPU available to handle higher quality video under some power envelope, your costs are going to increase, making the whole venture untenable.


I hear you but I think the benefits fall mainly on streaming platforms rather than users.

Like I'm sure Netflix will lower their prices and Twitch will show fewer ads to pass the bandwidth savings on to us, right?


Would anyone pay Netflix any amount of money if they were using 1 Mbps MPEG-1 that's trivially decoded on CPUs?

Ogg Theora is right there.

Theora is incredibly primitive compared even to H.264.

The moment O'Reilly went subscription-only they lost me as a customer. I have a huge library of O'Reilly books I've purchased as PDFs. Shit, I've got a huge library of print O'Reilly books despite years of slimming down.

It really sucked because I've been learning from O'Reilly books for thirty years. But I've become fundamentally opposed to DRM on media and subscription-only access is the ultimate DRM. I don't have any desire to be locked into their app to access stuff I paid for and be at the whims of their poor UI decisions.


They do still sell all of their books - just not directly on their website.

Whenever possible, they're sold without DRM.


Which is fine but they ruined their direct DRM-free sales. You used to get access to a PDF, epub, and mobi version of a book with no DRM. They even conveniently allowed you to sync your purchases to Dropbox. It was awesome.

Now you have to hope something you're interested in ends up in a Humble Bundle or similar. The situation is worse in every way for end users.


It would be such a dream if I could get an ebook, PDF, and physical copy. I love O'Reilly books and have been lucky to have had access for the last few years because of school.

They used to do a "digital upgrade" where you could get the digital version of a book if you had the physical copy. There was no verification on their end and it was something like $5 a book. It was an awesome way to upgrade your library.

They can't have lost money even if people were just claiming to own books to get the cheap price. Their marginal cost for the PDFs was effectively zero, so at $5 they were making plenty of money on them. At the time a PDF-only copy of their books was about $10.


Yeah, me too. The only recent O'Reilly books I have came from Humble Bundle specials.

I've also been pretty disappointed with their quality and/or usefulness lately. They seem to just cover stuff in a vague, less technical, high-level way now. Hopefully that's just a sampling error on my part.


Systems with mod_perl installed (or just Perl available for normal CGI), especially shared hosting, were so common as to be the norm in the late 90s and early 00s.

I think instead the biggest reason PHP took off was that it had far less deployment friction and better aesthetics than Perl did on machines where you didn't have admin access, which was basically every shared web host ever.

Typically, CGI scripts on shared hosting were limited to explicit cgi-bin directories that had +ExecCGI set. At the same time, hosts would often not enable mod_rewrite because it could get computationally expensive on hardware of the era.

This all meant that all your dynamic content had to live under some "/cgi-bin/" path. It could be difficult to make the main landing page dynamic without an otherwise-empty index HTML containing just a meta refresh tag pointing to your "/cgi-bin/" path.

Contrast that with PHP, which would be processed from any directory path and acted as its own built-in templating language. index.php was also usually included in the DirectoryIndex list, so it would act as a directory index, leading to cleaner URLs.
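
As a rough sketch of the kind of per-site Apache config shared hosts ran in that era (the vhost and paths are invented for illustration; the directive names are standard Apache), the split looked something like:

    <VirtualHost *:80>
        DocumentRoot /home/example/public_html

        # Perl/CGI only executes from the dedicated cgi-bin path...
        ScriptAlias /cgi-bin/ /home/example/cgi-bin/
        <Directory /home/example/cgi-bin>
            Options +ExecCGI
            AddHandler cgi-script .cgi .pl
        </Directory>

        # ...while PHP is processed anywhere under the docroot, and index.php
        # is accepted as a directory index, so URLs stay clean.
        AddType application/x-httpd-php .php
        DirectoryIndex index.php index.html
    </VirtualHost>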

In the era when deployment meant MPUT in an FTP client, those small differences mattered for people trying to make their first dynamic website look "professional".


Perl CGI scripts were ubiquitously supported by shared hosts, but IIRC mod_perl was not unless you had some custom setup on a dedicated server. Also IIRC, mod_perl was just a lot more complicated to set up and use, while mod_php was dead simple.

The cult of RTFM is so painful to interact with and so off-putting. The concept is sound: reading documentation is important. However, simply responding to every question with "RTFM" is not only unhelpful but, as often as not, useless advice.

The documentation for something may not exist, may not be clear, or may just be wrong. Unless you specifically know the answer to a question is laid out clearly in the documentation, blindly telling someone to read the documentation is just being a dismissive asshole.

A much more productive and helpful response is "did you RTFM?" or "check section X of the manual". But those sorts of responses require the desire not to be a dismissive asshole.

The cult of RTFM has always been an impediment to Linux becoming more popular. When I was first learning Linux...almost thirty years ago now...the cult of RTFM nearly put me off the whole endeavor. I was asking for help with "Xwindows" on IRC and the responses were either RTFM (which I had done) or pedantic diatribes about "it's X, not Xwindows, newbie! It's not micro$oft!" Which was super fun to deal with. The experience steeled my resolve to at least ask someone if they'd read the manual before assholishly telling them to do so.


Hey, XML is yucky, we wouldn't want to just serve it directly and let XSL "hydrate" it in the native browser engine. It's a way better idea to download Doom 2's worth of JavaScript and use a data serialization format without built-in schema validation.

Why get all that capability for free inside a native browser engine? We should reinvent all of it to run as JavaScript!


> But ChatGPT doesn't have that problem because you can see and "discuss" the buying decisions.

If OpenAI is accepting money from advertisers to push products, then ChatGPT is just a salesman. You won't be having "discussions"; you'll be actively sold stuff at all times. What an awful yet banal dystopia.

