rollulus's comments | Hacker News

Not weird, that’s tradition by now.


Multiple people have hit on the idea independently, well before you. The oldest comment on HN I could find was from December 2022, by user spawarotti: https://news.ycombinator.com/item?id=33856172


Here is an even older comment chain about it from 2020: https://news.ycombinator.com/item?id=23895706

Apparently, comparing low-background steel to pre-LLM text is a rather obvious analogy.


And people often do think alike.

If you have a thought, it's likely not a new one.


Oh wow, great find! That’s really early days.


I didn't claim to invent it.

I claimed swyx heard it through me, which he did.


you did!!


I used to believe that I was working with JSON Schema through OpenAPI 3.0, but then I learned the hard way that it uses an “extended subset” of it. And what does that mean? It “means that some keywords are supported and some are not, some keywords have slightly different usage than in JSON Schema, and additional keywords are introduced.” [1] Yes, that’s a bonkers way to say “this is not JSON Schema, although it looks similar enough to deceive you”. This word game and engineering choice are so bizarre that it’s almost funny.

[1]: https://swagger.io/docs/specification/v3_0/data-models/keywo...
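
The classic trap, for the curious: real JSON Schema expresses a nullable string with a type array, while the 3.0 Schema Object rejects type arrays and wants its own `nullable` keyword instead. A minimal sketch with a made-up field:

    # JSON Schema (and OpenAPI 3.1):
    name:
      type: ["string", "null"]

    # OpenAPI 3.0 "extended subset":
    name:
      type: string
      nullable: true

A validator built for one happily rejects the other.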


OpenAPI 3.1 replaced that not-a-superset-or-subset of JSON Schema with actual JSON Schema (the latest draft) over five years ago. No one should be using 3.0.x anymore. And 3.2 came out a few months ago, with lots of features that have been in high demand (support for arbitrary HTTP methods, full expression of multipart and streaming messages, etc.).


> No one should be using 3.0.x anymore

Many users are stuck on 3.0 or even Swagger 2.0 because the libraries they use refuse to support recent versions. Also, OpenAPI still isn't plain JSON Schema, because things like `discriminator` are still missing from JSON Schema itself.
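
For anyone who hasn't hit it: `discriminator` sits right next to standard JSON Schema keywords in a 3.1 document, but a plain JSON Schema validator has no idea what it means. A sketch with hypothetical Cat/Dog schemas:

    Pet:
      oneOf:
        - $ref: '#/components/schemas/Cat'
        - $ref: '#/components/schemas/Dog'
      discriminator:          # OpenAPI-only keyword;
        propertyName: petType # JSON Schema tooling ignores all of this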


This.

If you're building a brand-new, multi-language, multi-platform system that uses advanced OpenAPI features, you will get bitten by gaps in the 3.1 versions of tooling for features that already exist and work fine today in the 3.0 versions, especially if you're using a schema-first workflow (which you should be). For example: $refs to files across Windows/Linux/macOS, across multiple language toolchains (Java, .NET, TypeScript, etc.).

If you need (or just want) maximum compatibility across tools, platforms, and languages, OpenAPI 3.1 is still not viable, and it isn't looking like it will be anytime soon.
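
The cross-file case mentioned above, for concreteness (file names are made up):

    # main.yaml, which sits next to common.yaml
    responses:
      "500":
        content:
          application/json:
            schema:
              $ref: './common.yaml#/components/schemas/Error'

Whether that relative reference resolves identically in every generator, on every OS, is exactly the kind of thing that still varies between 3.0 and 3.1 toolchains.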


The solution here is to demand support for the most recent specification version from your tooling vendors. We (the OpenAPI TSC) sometimes hear from vendors "we're not moving quickly to support the latest version because our users aren't asking for it." So it's a catch-22 unless you make your needs known.


Because they are at least 28 years old.


(1997).


TL;DR: author with “years of experience of shipping to prod” mutates globals without a mutex and is surprised enough to write a blog post about it.
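
The shape of the bug, as I read the post, reduced to a sketch (the global map is my own stand-in, not the author's actual code):

    package main

    import "sync"

    var cache = map[string]int{} // shared global, no mutex guarding it

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                cache["k"] = i // data race; Go frequently aborts here with
            }(i)               // "fatal error: concurrent map writes"
        }
        wg.Wait()
    }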


There’s an example of a mutex too…


An example where they create a new mutex every time the function is called, and are then surprised that multiple goroutines, each holding a completely different mutex, somehow fail to coordinate their locking.

That isn’t a core misunderstanding of Go, that’s a core misunderstanding of programming.
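
A minimal before/after of that mistake (hypothetical counter, not the post's actual code):

    package main

    import (
        "fmt"
        "sync"
    )

    var n int

    // Broken: each call gets a fresh mutex, so no two goroutines
    // ever contend on the same lock and the increment races.
    func incrBroken(wg *sync.WaitGroup) {
        defer wg.Done()
        var mu sync.Mutex
        mu.Lock()
        n++
        mu.Unlock()
    }

    // Fixed: one shared, package-level mutex guards the shared state.
    var mu sync.Mutex

    func incr(wg *sync.WaitGroup) {
        defer wg.Done()
        mu.Lock()
        n++
        mu.Unlock()
    }

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go incr(&wg) // swap in incrBroken and run with -race to see it
        }
        wg.Wait()
        fmt.Println(n) // 1000 with incr; anything goes with incrBroken
    }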


Classic. I see issues. Vendor’s status page is all green. Go to HN to find the confirmation. Applies to AWS, GH, everyone.

Edit: beautiful, this decentralised design of the internet.


I get the feeling that all "serious" businesses have manual processes behind their public-facing status pages, for political reasons.

I don't like it.


I’ve written before on HN about when my employer hired several ex-FAANG people to manage all things cloud in our company.

Whenever there was an outage they would put up a fight against anyone wanting to update the status page to show the outage. They had so many excuses and reasons not to.

Eventually we figured out that they were planning to use the uptime figures for requesting raises and promos as they did at their FAANG employer, so anything that reduced that uptime number was to be avoided at all costs.


Are there companies that actually use their status page as a source of truth for uptime numbers?

I think it's way more common for companies to have a public status page, and then internal tooling that tracks the "real" uptime number (e.g. Datadog monitors, New Relic monitoring, etc.).

(Your point still stands though.)


I don’t know, but I will say that this team was so hyperfocused on any numbers they planned to use for performance reviews that it probably didn’t matter which service you chose to measure website performance with. They’d find a way to game it. If we had used the internal devops observability tools, I bet they would have started pulling back logging and reducing severity levels as reported in the codebase.

It’s obviously not a problem at every company, because many companies will recognize these shenanigans and come down hard on them. But you could tell these guys would spot any opportunity to game a number they thought would come up at performance review time.

Ironically our CEO didn’t even look at those numbers. He used the site and remembered the recent outages.


[Datadog employee here] https://updog.ai tracks the uptime of multiple services by real impact across Datadog customers.


It's because if you automate it, something could/would happen to the little script that defines "uptime," and if that goes down, suddenly you're in violation of your SLA and all of your customers start demanding refunds/credits/etc. when everything is running fine.

Or let's say your load balancer croaks, triggering a "down" status, but it's 3am, so a single server is handling traffic just fine? In short, defining "down" in an automated way is just exposing internal tooling unnecessarily and generates more false positives than negatives.

Lastly, if you are allowed 45 minutes of downtime per year and it takes you an hour to manually update the status page, you just bought yourself an extra hour to figure out how to fix the problem before you have to start issuing refunds/credits.


>you just bought yourself an extra hour to figure out how to fix the problem before you have to start issuing refunds/credits

No. Not if you're not defrauding your customers, you didn't.


There's a reason most SLAs say "you shall not establish your own monitoring of our systems."


At some level, the status updates have to be manual. Any automation you try to build on top is inevitably going to break in a crisis situation.


I found GitHub's old "how many visits to this status page have there been recently" graph on their status page to be an absurdly neat solution to this.

Requires zero insight into other infrastructure, absolutely minimal automation, but immediately gives you an idea whether it's down for just you or everybody. Sadly now deceased.


I like that https://discordstatus.com/ shows API response times as well. There are times when Discord seems to have issues, and those usually correlate well with increased API response times.

Reddit Status showed API response times way back in the day too, back when I still used the site, but they've really watered it down since then. Everything that goes there needs to be manually entered now, AFAIK. Not to mention that one of the few remaining sections is for "ads.reddit.com", classic.


https://steamstat.us still has this - while not official it's pretty nice.


Yeah, this is something people think is super easy to automate, and it is, for the most basic implementation: something like a single test runner. But that basic implementation is prone to false positives and, as you say, to breaking when the rest of your stuff breaks.

You can put your test runner on different infrastructure, and now you have a whole new class of false positives to deal with. And it costs you a bit more because you're probably paying someone for the different infra.

You can put several test runners on different infrastructure in different parts of the world. This increases your costs further. The only truly clear signals you get from this are when all are passing or all are failing. Any mixture of passes and fails has an opportunity for misinterpretation. Why is Sydney timing out while all the others are passing? Is that an issue with the test runner or its local infra, or is there an internet event happening (cable cut, BGP hijack, etc) beyond the local infra?

And thus nearly everyone has a human in the loop to interpret the test results and make a decision about whether to post, regardless of how far they've gone with automation.
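
To make the "only all-pass or all-fail is a clear signal" point concrete, a toy sketch (a single process standing in for geographically distributed runners; the URL is made up):

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // One health check. In a real setup each probe would run from
    // separate infrastructure in a different region.
    func probe(url string) bool {
        client := &http.Client{Timeout: 5 * time.Second}
        resp, err := client.Get(url)
        if err != nil {
            return false
        }
        resp.Body.Close()
        return resp.StatusCode < 500
    }

    func main() {
        const runners = 5
        up := 0
        for i := 0; i < runners; i++ {
            if probe("https://example.com/health") {
                up++
            }
        }
        switch {
        case up == runners:
            fmt.Println("all passing: clearly up")
        case up == 0:
            fmt.Println("all failing: clearly down")
        default:
            // Mixed signal: failing runner? cut cable? BGP event?
            // This is the case a human still has to interpret.
            fmt.Printf("%d/%d passing: page a human\n", up, runners)
        }
    }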


They are manual AND political (depending on how big the company is), because having a dashboard go red usually has a bunch of project work behind it.


SLA breaches have consequences, no big conspiracy there


Not at all saying it's a conspiracy, I just think it's a lack of transparency.

I get why, but it would give me more confidence if they told me about everything.


I guess a dirty little secret might be that something is always acting up or being noisy and it would spam the status page completely.


They don't make more money by giving you more confidence in their systems.


FWIW, cloudflare's status page is showing red currently.


I usually get notifications from the sales/CS team way before the status page/incident list shows any blip. This time was no exception.


It's as if they wanted an internet kill switch. /S


I love Go. It lets me get shit done. I picked it up more than ten years ago, back when it was HN’s darling and I didn’t yet know about hype cycles. No regrets.


As always, there’s someone on the internet who is a step beyond. Meet the pussy wetter: https://pussywetter.com/


Is there a repository for that? I’d like to dissuade certain species from my porch but not others…


That really is a step beyond!


That was a “people who missed the launch discovered it later” thread. The launch was 2.5 years ago: https://news.ycombinator.com/item?id=35964397


IIRC, they initially only allowed VPN users to access the engine, up until the second announcement.

