I'm not very informed about the coup -- but doesn't it just depend on what side most of the employees sat/sit on? I don't know how much of the coup was just egos or really an argument about philosophy that the rank and file care about. But I think this would be the argument.
The thing where dozens of them simultaneously posted “OpenAI is nothing without its people” on Twitter during the coup was so creepy, like actual Jonestown vibes. In an environment like that, there’s no way there wasn’t immense pressure to fall into line.
That seems like kind of an uncharitable take when it can otherwise be explained as collective political action. I’d see the point if it were some repeated ritual but if they just posted something on Twitter one time then it sounds more like an attempt to speak more loudly with a collective voice.
They didn't need pressuring. There was enough money involved that was at risk without Sam that they did what they thought was the best way to protect their nest eggs.
Oh I read it, and I disagree with your analysis. Sotomayor makes some decent points in favor of nationwide injunctions when she deigns to engage in legal arguments, but the case against them is very compelling.
The textual case is pretty much completely against them, and if you prefer a consequentialist analysis their drawbacks are well documented across the political spectrum. I will say Barrett's time as a professor can mean her opinions are highly technical in their procedural analysis and she's not as strong of a writer as other members of the court.
This very opinion is also an example of something that Thomas has beat the drum about for some time and it later becomes a majority opinion.
What if rather than fine tuning with security vulnerabilities you fine tuned with community events announcements. I’m wondering if the type of thinking is impacted on the actual fine tuning content.
Fundamentally it’s hard to pushback against an authoritarian government. There is very little to stop Trump from sending Doge into MS headquarters with Marines and demanding admin access so they can make the change. Thinking the dependency on Microsoft (or any company) is the risk then you haven’t been paying attention.
This is interesting -- it does put into context some of what was hyped up recently in the news, for example, the Fairlife Core Power microplastics. While it is higher in Core Plastics, it's not off by an order of magnitudes compared with other milk products.
The other question I have -- what does someone who consumes very little microplastics look like? Increased lifespan, decreased risk of cancer (by how much), does it have lead-like outcomes, etc... Avoiding microplastics seems like a lot of inconvenience (at least for an individual) -- I'd want to make sure the payoff at the end is worth it.
I would think -as microplastic particles have been found even in creatures in the deepest parts of the ocean- that it is nigh impossible to avoid them.
While I agree in theory -- the problem I have is that humans I've worked with are much worse at writing tests than they are at writing the implementation. Maybe its motivation or experience, but test quality generally is much worse than implementation quality -- at least in my experience.
This might be the defining line for Gen AI - people who can read code faster will find it useful and those that write faster then they can read won’t use it.
> I can read code much faster than I can write it.
I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.
I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.
You definitely can. For example I know x86. I can read it and understand it quite well. But if you asked me to write even a basic program in it, it would take me a considerable amount of time.
The same goes with shell scripting.
But more importantly you don’t have to understand code to the same degree and depth. When I read code I understand what the code is doing and if it looks correct. I’m not going over other design decisions or implementation strategies (unless they’re obvious). If I did that then I’d agree. Id also stop doing code reviews and just write everything myself.
Huh, I don't know x86 but I do plenty of shell-scripting and am surprised, and a little embarrassed, that it had never dawned on me: you're right; they are easier to read, at least with a view to understanding intent, than to write. In fact, are there shell-scripting languages of which this isn't true?
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.
I also haven't found any benefit in aiming for smaller or larger PRs. The aggregare efficiency seems to even out because smaller PRs are easier to weed through but they are not less likely to be trash.
The problem is that at this stage we mostly just have people's estimates of their own success to go on, and nobody thinks they're incompetent. Nobody's going to say "AI works really well for, me but I just pump out dross my colleagues have to fix" or "AI doesn't work for me but I'm an unproductive, burnt out hack pretending I'm some sort of craftsman as the world leaves me behind".
This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.
So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.
I aspire to live up to your description of the best sort of developer. But I think there might also be a danger that that approach can turn into an excuse for spending the week overthinking (possibly while goofing off as well; I've done it), then writing a first cut on Friday, leaving no time for the multiple iterations that are often necessary to get to the best solution. In other words, I think sometimes it's necessary to just start coding sooner than we'd like so we can start iterating toward the right solution. But that "unproductive, burnt out hack" line hits a bit too close to home for me these days, and I'm starting to entertain the possibility that an LLM-based agent might have more energy for doing those multiple iterations than I do.
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.
So long as it isn’t eating the variety of fresh fruits and vegetables that produce gut serotonin in the microbial breakdown of complex carbohydrates and fiber, right?
Eating fresh fruits and veggies is a great general health advice but unlikely to shift your gut microbiome meaningfully (unless you are eating some specific fruits or fruit skins in specific conditions).
And randomly eating healthy stuff is probably not going to shift your biome in a particular direction nor eliminate the cause of biome issues.
I eat healthy including a lot of whole fruit, nuts, dried fruits, dark green veggies and grass finished meats and there are still times I have my biome be a bit off.
What seems clear is there is no consensus. Gemini 2.5 Pro just seems consistently worse to me, but I’ve seen others sing its praises. This might be more like iPhone vs Android than a true stack ranking of models.
Sometimes it's great, sometimes it's not. Depends on the tools you're using too, I guess.
Like when using Roo-Code, Gemini 2.5 Pro still gets confused by the wonky diff format Roo-Code wants it to use. It'll keep messing up simple edits, and if it happens once, it'll happen again and again, cause it's multi-shotting itself to make mistakes.
I don't have that with Claude-Code, it just keeps on chugging along.
One big difference there though: I got the Claude-Code Pro Max plan (or whatever it's called). I now no longer have to worry about the cost since it's a monthly flat-fee, so if it makes a mistake it doesn't make me angry, since the mistake didn't cost me 5 euros.
I am using an MCP server that adds Gemini & O3 to Claude-Code, so Claude-Code can ask them for assistance here and there, and in this Gemini 2.5 Pro has been such a great help. Especially because its context size is so much larger, it can take in a lot more files than Claude can, so it's better at spotting mistakes.
It depends on the task. Claude 4 is better at coding (haven't tried claude code, just sonnet, but you can tell). However when it comes to using an LLM to develop your thoughts (philosophy/literary criticism), I found Gemini (2.5 pro) to be better. A few days ago I was trying to get Claude to reformulate what I had said in a pretty long conversation, and it was really struggling. I copy-pasted the whole conversation into Gemini and asked it to take over. It absolutely nailed it in one shot.
I found all recent models to be "good enough" for my use (coding assistance). I've settled on just using Claude 4. At the same time the experience also makes me less worried about this tech making programmers obsolete...
Gemini 2.5 pro has been consistently excellent for me, when it works. It sometimes just spins and spins with no results but when it comes with something, it has been pretty good.
It seems like you often have LLMs grading each other. Aren’t you concerned that some models may not be “smart” enough to grade a smarter model appropriately?
Using LLMs for evaluating LLMs is incredibly common.
The point isn't in having a "perfect" evaluator, but in having a cheap and somewhat consistent evaluator.
This approach holds up well enough... as long as you don't try to use it for RL. If you do, chances are, you'll end up with an adversarial LLM that aims solely for breaking and saturating the evaluator.
But I feel like the evaluator should generally be stronger/better than what its evaluating. Otherwise you risk it evaluating at a lower level, while the better LLM is writing with more nuance that the lower LLM doesn't pick up on.
I've seen some places, e.g., NY Times, use expert panels to review the results from LLMs. For example, getting the author of a book/essay to evaluate how well the LLM summarizes and answers questions about the book/essay. While it's not scalable, it does seem like it will better evaluate cutting edge models.
reply