
I'm not very informed about the coup -- but doesn't it just depend on which side most of the employees sat/sit on? I don't know how much of the coup was just egos versus a real argument about philosophy that the rank and file cared about. But I think that would be the deciding question.

There was a petition with a startlingly high percentage of employees signing it, but there's no telling how many of them felt pressured to sign in order to keep their jobs.

The thing where dozens of them simultaneously posted “OpenAI is nothing without its people” on Twitter during the coup was so creepy, like actual Jonestown vibes. In an environment like that, there’s no way there wasn’t immense pressure to fall into line.

That seems like kind of an uncharitable take when it can otherwise be explained as collective political action. I'd see the point if it were some repeated ritual, but if they just posted something on Twitter one time, it sounds more like an attempt to speak more loudly with a collective voice.

They didn't need pressuring. There was enough money at risk without Sam that they did what they thought was the best way to protect their nest eggs.

You clearly haven't read Barrett's majority opinion. It's as conceptually broken as anything I've read recently.

Oh I read it, and I disagree with your analysis. Sotomayor makes some decent points in favor of nationwide injunctions when she deigns to engage in legal arguments, but the case against them is very compelling.

The textual case is pretty much completely against them, and if you prefer a consequentialist analysis, their drawbacks are well documented across the political spectrum. I will say Barrett's time as a professor can mean her opinions are highly technical in their procedural analysis, and she's not as strong a writer as other members of the court.

This very opinion is also an example of a familiar pattern: Thomas beats the drum about something for some time, and it later becomes a majority opinion.


I think your analysis is likely influenced by factors that are external to the actual opinion/dissent. But that’s fairly common.

Physician, heal thyself

What if, rather than fine-tuning with security vulnerabilities, you fine-tuned with community event announcements? I'm wondering whether the type of thinking is affected by the actual fine-tuning content.

Fundamentally, it's hard to push back against an authoritarian government. There is very little to stop Trump from sending DOGE into MS headquarters with Marines and demanding admin access so they can make the change. If you think the dependency on Microsoft (or any company) is the risk, then you haven't been paying attention.

That’s the point of federation. If there’s no centralized target then the Marines have a much harder job.

The incident in question targeted someone outside of the US, where DOGE has no direct influence (yet).

Tell that to the ICC judges who have been sanctioned by Microsoft and Trump's administration.

Because, you know, sanctioning judges of the International Criminal Court in The Hague is literally not their (the US's) jurisdiction.


DOGE’s influence is wherever the administration wants it to be.

This is interesting -- it does put into context some of what was hyped up recently in the news, for example, the Fairlife Core Power microplastics. While the level is higher in Core Power, it's not off by an order of magnitude compared with other milk products.

The other question I have -- what does someone who consumes very little microplastic look like? Increased lifespan? Decreased risk of cancer (and by how much)? Lead-like outcomes? Avoiding microplastics seems like a lot of inconvenience (at least for an individual) -- I'd want to make sure the payoff at the end is worth it.


I would think, as microplastic particles have been found even in creatures in the deepest parts of the ocean, that it is nigh impossible to avoid them.

While I agree in theory -- the problem I have is that the humans I've worked with are much worse at writing tests than they are at writing the implementation. Maybe it's motivation or experience, but test quality is generally much worse than implementation quality -- at least in my experience.


I can read code much faster than I can write it.

This might be the defining line for Gen AI - people who can read code faster than they can write it will find it useful, and those who write faster than they can read won't use it.


> I can read code much faster than I can write it.

I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.

I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.


You definitely can. For example, I know x86. I can read it and understand it quite well. But if you asked me to write even a basic program in it, it would take me a considerable amount of time.

The same goes with shell scripting.

But more importantly, you don't have to understand code to the same degree and depth. When I read code I understand what the code is doing and whether it looks correct. I'm not going over other design decisions or implementation strategies (unless they're obvious). If I did that then I'd agree. I'd also stop doing code reviews and just write everything myself.


Huh, I don't know x86 but I do plenty of shell-scripting and am surprised, and a little embarrassed, that it had never dawned on me: you're right; they are easier to read, at least with a view to understanding intent, than to write. In fact, are there shell-scripting languages of which this isn't true?


I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.

I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out because smaller PRs are easier to weed through, but they are not less likely to be trash.


I only generate the code once with GenAI and typically fix a bug or two - or at worst use its structure. Rarely do I toss a full PR.

It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.


The problem is that at this stage we mostly just have people's estimates of their own success to go on, and nobody thinks they're incompetent. Nobody's going to say "AI works really well for me, but I just pump out dross my colleagues have to fix" or "AI doesn't work for me, but I'm an unproductive, burnt out hack pretending I'm some sort of craftsman as the world leaves me behind".

This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.

So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.


I aspire to live up to your description of the best sort of developer. But I think there might also be a danger that that approach can turn into an excuse for spending the week overthinking (possibly while goofing off as well; I've done it), then writing a first cut on Friday, leaving no time for the multiple iterations that are often necessary to get to the best solution. In other words, I think sometimes it's necessary to just start coding sooner than we'd like so we can start iterating toward the right solution. But that "unproductive, burnt out hack" line hits a bit too close to home for me these days, and I'm starting to entertain the possibility that an LLM-based agent might have more energy for doing those multiple iterations than I do.


My experiences so far suggest that you might be right.


> It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.

It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists a small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.


Strongly second this ask.


So long as it isn’t eating the variety of fresh fruits and vegetables that produce gut serotonin in the microbial breakdown of complex carbohydrates and fiber, right?


Eating fresh fruits and veggies is great general health advice, but unlikely to shift your gut microbiome meaningfully (unless you are eating some specific fruits or fruit skins in specific conditions).

And randomly eating healthy stuff is probably not going to shift your biome in a particular direction nor eliminate the cause of biome issues.

I eat healthy, including a lot of whole fruit, nuts, dried fruits, dark green veggies, and grass-finished meats, and there are still times when my biome is a bit off.


“Unlikely”, “randomly”, and “probably” had me nervous about your contribution to this discussion.

I do appreciate the quantitative precision at the end of your last sentence.


Why wouldn’t it be? Do you know which ones exhibit the desired characteristic?


The ones that aren’t food shaped name brand products or fruit colored sugar water.


I’m fairly certain that is not accurate given I was vegetarian for over a decade.


What seems clear is there is no consensus. Gemini 2.5 Pro just seems consistently worse to me, but I’ve seen others sing its praises. This might be more like iPhone vs Android than a true stack ranking of models.


Sometimes it's great, sometimes it's not. Depends on the tools you're using too, I guess. Like when using Roo-Code, Gemini 2.5 Pro still gets confused by the wonky diff format Roo-Code wants it to use. It'll keep messing up simple edits, and if it happens once, it'll happen again and again, cause it's multi-shotting itself to make mistakes.

I don't have that with Claude-Code, it just keeps on chugging along.

One big difference there though: I got the Claude-Code Pro Max plan (or whatever it's called). I now no longer have to worry about the cost since it's a monthly flat-fee, so if it makes a mistake it doesn't make me angry, since the mistake didn't cost me 5 euros.

I am using an MCP server that adds Gemini & O3 to Claude-Code, so Claude-Code can ask them for assistance here and there, and in this Gemini 2.5 Pro has been such a great help. Especially because its context size is so much larger, it can take in a lot more files than Claude can, so it's better at spotting mistakes.
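For anyone curious what that kind of bridge looks like, here's a rough sketch of the idea -- not the specific server used above; the server name, tool, model id, and prompt handling are all assumptions -- a tiny MCP server that exposes a "second opinion" tool which forwards a question to another model's API, so Claude-Code can call it like any other tool:

    # second_opinion.py -- illustrative sketch of an "ask another model" MCP bridge.
    # Assumes the official Python MCP SDK and OpenAI SDK (pip install mcp openai)
    # plus an OPENAI_API_KEY in the environment; the names and model id are made up.
    from mcp.server.fastmcp import FastMCP
    from openai import OpenAI

    mcp = FastMCP("second-opinion")
    client = OpenAI()

    @mcp.tool()
    def ask_second_model(question: str, model: str = "o3") -> str:
        """Forward a question (plus any pasted code) to a second model and return its answer."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        mcp.run()  # point Claude-Code at this server as an MCP tool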


It depends on the task. Claude 4 is better at coding (haven't tried claude code, just sonnet, but you can tell). However when it comes to using an LLM to develop your thoughts (philosophy/literary criticism), I found Gemini (2.5 pro) to be better. A few days ago I was trying to get Claude to reformulate what I had said in a pretty long conversation, and it was really struggling. I copy-pasted the whole conversation into Gemini and asked it to take over. It absolutely nailed it in one shot.


I found all recent models to be "good enough" for my use (coding assistance). I've settled on just using Claude 4. At the same time the experience also makes me less worried about this tech making programmers obsolete...


Gemini 2.5 pro has been consistently excellent for me, when it works. It sometimes just spins and spins with no results but when it comes with something, it has been pretty good.


It seems like you often have LLMs grading each other. Aren’t you concerned that some models may not be “smart” enough to grade a smarter model appropriately?


Using LLMs for evaluating LLMs is incredibly common.

The point isn't in having a "perfect" evaluator, but in having a cheap and somewhat consistent evaluator.

This approach holds up well enough... as long as you don't try to use it for RL. If you do, chances are, you'll end up with an adversarial LLM that aims solely for breaking and saturating the evaluator.
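For what it's worth, the basic LLM-as-judge loop is very simple, which is where the cheapness comes from. A minimal sketch (the rubric, judge model, and score parsing here are illustrative assumptions, not anyone's actual eval harness):

    # llm_judge.py -- minimal sketch of LLM-as-judge evaluation.
    # Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    RUBRIC = ("Score the answer from 1 (useless) to 5 (excellent) for correctness "
              "and helpfulness. Reply with the number only.")

    def judge(question: str, answer: str, judge_model: str = "gpt-4o") -> int:
        """Ask a judge model to grade one candidate answer against the rubric."""
        reply = client.chat.completions.create(
            model=judge_model,
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user", "content": f"Question:\n{question}\n\nAnswer:\n{answer}"},
            ],
        )
        return int(reply.choices[0].message.content.strip()[0])  # crude parse

    def evaluate(pairs):
        """Average the judge's scores over a batch of (question, answer) pairs."""
        scores = [judge(q, a) for q, a in pairs]
        return sum(scores) / len(scores)

One API call per sample is what keeps it cheap; the RL failure mode mentioned above is exactly the policy model learning to exploit whatever this judge happens to reward.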


But I feel like the evaluator should generally be stronger/better than what it's evaluating. Otherwise you risk it evaluating at a lower level, while the better LLM is writing with more nuance that the lower LLM doesn't pick up on.

I've seen some places, e.g., NY Times, use expert panels to review the results from LLMs. For example, getting the author of a book/essay to evaluate how well the LLM summarizes and answers questions about the book/essay. While it's not scalable, it does seem like it will better evaluate cutting edge models.


I’m not sure I would use “consistent” to characterize LLMs

