Hacker News | biophysboy's comments

That's the key: use AI to substitute for tasks, not to replace workers. Nothing necessarily wrong with labor saving on trivial projects, but we should be using these tools to push the boundaries of tech/science!

To be fair, the healthcare employment flows are small (1.6M) compared to the total employment stock (160M+ people). And you would expect the healthcare labor supply to increase to match the bulge created by the baby boom. I don't think this means we will all be healthcare middlemen, though; that reading assumes this large flow is permanent, and overstates its fraction of the labor stock.

Not all the legacy newspapers are failing; NYT is doing well. There are other news sources beyond legacy newspapers, broadcasters, local news, and social media. There are wire services (AP, Reuters), insider access journalism (Axios, Punchbowl, Semafor), public media (NPR, PBS, BBC), investigative journalism (ProPublica), digital-first outlets (Politico, Vox), and the growing wave of small, indie, creator-led media (YouTube, Substack, Patreon).

NYT as a news organization is, charitably, part of the controlled opposition. They are not meeting this moment at all; whether through cowardice or intentional complicity, I'm not completely sure.

NYT is doing well because of Wordle and its other games; their journalism is a joke.

Why do you think I care about your journalism opinions?

Yeah, sorry. I should've said they're doing regime sanewashing the right way, to paraphrase Ezra Klein.

If it's any consolation, I think CBS News will fail miserably. The new captains are at the helm of a sinking ship, one which has been taking on water for decades.

Maybe a cynic will say "this was the plan", but if it was, it's not a very good plan? If anything, tech executives benefited enormously from their opponents being overly attached to legacy media communication strategies. When Bezos kills the Post or Ellison kills CBS, the talent doesn't magically disappear.


If any of you here feel like you've lost your identity, I would highly recommend watching the recently released movie "No Other Choice" by Park Chan-wook.

Maybe I didn't look hard enough, as I was put off by the pervasive absurdity of it, but I don't feel like I gained anything at all from watching that film.

You can believe in a phenomenon and still do good science. It all depends on whether your experimental design is free of bias. Randomizing, blinding, instrumentation, pre-registration, statistical rigor - there are all sorts of ways to do this. I say this because I think non-scientists regularly say that science is biased because scientists are biased. The cool thing about science is you don't have to have any pretense of objectivity as a person as long as your experiment is independent.
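As a toy illustration of one of those safeguards (statistical rigor via randomization), here's a permutation test sketched in Python. The measurements are made up for illustration; the point is that shuffling the labels builds the null distribution mechanically, so the analyst's beliefs can't lean on it:

```python
import random

# Hypothetical measurements from a treatment and a control group
# (invented numbers, purely for illustration).
treatment = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
control = [11.2, 11.0, 11.8, 10.9, 11.5, 11.3]

def permutation_p_value(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Shuffling the group labels simulates the null hypothesis that
    the labels don't matter, so the reference distribution comes
    from the procedure, not from the analyst's expectations.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            hits += 1
    return hits / n_iter

print(permutation_p_value(treatment, control))
```

Fixing the seed up front plays the same role as pre-registration in miniature: the analysis is pinned down before you see the result.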

> I say this because I think non-scientists regularly say that science is biased because scientists are biased.

As in any other field, the science is only as good as the scientist that produced it. For example, there is a serious reproducibility crisis in multiple fields, like psychology and the social sciences. In the latter it is hard to say if it's due to a systemic educational failure of the PhD students in those fields, or to the field and personal politics merging too tightly.

Unfortunately, all it takes is one bad scientist to discredit the rest, e.g., Wakefield.


I know what you are saying, but I'm arguing that science is great because it produces better output than the people that make it as long as they stick to good methods.

As for reproducibility, it's my opinion that it has more to do with incentives and constraints than the ethics or intellectual capacity of the researcher (although those are real components too).


> I'm arguing that science is great because it produces better output than the people that make it as long as they stick to good methods.

I don’t think we have a contradiction here. What I am saying is that science is made by people, and we as scientists have to be extremely vigilant today not to let the “ends justify the means” crowd use the name of science for their own agenda.


Yeah, I might be nitpicking. I agree with you. I get frustrated when people conflate a neutral scientific method with a neutral scientist: the claim that a scientist cannot explore political/ideological topics with robust methods because it's biased. Do you see the sleight of hand I'm talking about? I've noticed it a lot in the reproducibility discourse.

> Do you see the sleight of hand I'm talking about?

Yes. I have sociologists in my extended friends group, and we had a couple of heated discussions on why interviewing 20 subjects is not sufficient for the conclusions they have. But, it turns out, it’s the norm in their specific field. Go figure.
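For a rough sense of why n = 20 is thin, here's a back-of-the-envelope margin-of-error calculation for a proportion estimated from 20 subjects (a generic normal-approximation formula, not anything specific to their studies):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a 95% normal-approximation CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) with 20 subjects: the estimate is only
# pinned down to roughly +/- 22 percentage points.
print(round(margin_of_error(0.5, 20), 3))
```

With uncertainty that wide, almost any observed split is consistent with almost any underlying reality, which is the heart of the "20 subjects isn't enough" objection.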


A lot of low-wage work isn't physical.

Looking at the paper, the effect is significant but weak (5-7%), even with the conditionals that magnify the effect. I would be curious to see the effect if this experiment were performed on a slightly different categorical variable (e.g. how are two white ethnicities treated). I do think it's bad if preferences are "baked in" to the default, though - prompting them away seems like a bad solution.

This matches my experience using LLMs for science. Out of curiosity, I downloaded a randomized study and the CONSORT checklist, and asked Claude code to do a review using the checklist.

I was really impressed with how it parsed the structured checklist. I was not at all impressed by how it digested the paper. Lots of disguised errors.


try codex 5.3. it's dry and very obviously AI; if you allow a bit of anthropomorphisation, it's kind of high-functioning autistic. it isn't an oracle, it'll still be wrong, but it's a powerful tool, completely different from claude.

Does it get numbers right? One of the mistakes it made in reading the paper was swapping sets of numbers from the primary/secondary outcomes.

it does get screenshots right for me, but obviously I haven't tried it on your specific paper. I can only recommend trying it out; it also has much more generous limits in the $20 tier than opus.

I see. To clarify, it parsed the numbers in the pdf correctly, but assigned them the wrong meaning. I was wondering if codex is better at interpreting non-text data.

Every time someone suggests Codex I give it a shot. And every time it disappoints.

After I read your comment, I gave Codex 5.3 the task of setting up an E2E testing skeleton for one of my repos, using Playwright. It worked for probably 45 minutes and in the end failed miserably: out of the five smoke tests it created, only two of them passed. It gave up on the other three and said they will need “further investigation”.

I then stashed all of that code and gave the exact same task to Opus 4.5 (not even 4.6), with the same prompt. After 15 mins it was done. Then I popped Codex’s code from the stash and asked Opus to look at it to see why three of the five tests Codex wrote didn’t pass. It looked at them and found four critical issues that Codex had missed. For example, it had failed to detect that my localhost uses https, so the E2E suite’s API calls from the Vue app kept failing. Opus also found that the two passing tests were actually invalid: they checked for the existence of a div with #app and simply assumed it meant the Vue app booted successfully.

This is probably the dozenth comparison I’ve done between Codex and Opus. I think there was only one scenario where Codex performed equally well. Opus is just a much better model in my experience.


moral of the story is use both (or more) and pick the one that works - or even merge the best ideas from generated solutions. independent agentic harnesses support multi-model workflows.

I don't think that's the moral of the story at all. It's already challenging enough to review the output from one model. Having to review two, and then comparing and contrasting them, would more than double the cognitive load. It would also cost more.

I think it's much more preferable to pick the most reliable one and use it as the primary model, and think of others as fallbacks for situations where it struggles.


you should always benchmark your use cases and you obviously don't review multiple outputs; you only review the consensus.

see how perplexity does it: https://www.perplexity.ai/hub/blog/introducing-model-council
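A minimal sketch of that consensus idea with stubbed-out model calls (the model functions and their outputs are invented placeholders; a real harness would wire these to actual model APIs):

```python
from collections import Counter

# Stand-ins for real model API calls (hypothetical outputs).
def model_a(prompt):
    return "answer: 42"

def model_b(prompt):
    return "answer: 42"

def model_c(prompt):
    return "answer: 7"

def consensus(prompt, models):
    """Query every model, but surface only the majority answer.

    The reviewer sees one output plus an agreement score, rather
    than N outputs to compare by hand.
    """
    answers = [m(prompt) for m in models]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / len(answers)

answer, agreement = consensus("what is 6 * 7?", [model_a, model_b, model_c])
print(answer, agreement)
```

The design choice is that disagreement becomes a signal (low agreement score) instead of extra review work, which is the counter to the cognitive-load objection above.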


The link is described in the first line of the article.

The link is that it’s funded by Founders Fund.

Did you know Lyft and Meta have links to Peter Thiel?


Meta, the famously ethical, privacy-first, "move slow and make sure you do right by the people" company.

...yeah. We're aware.


Well, glad we are honest here. I personally would find an article talking about Meta as a “Thiel-backed entity” to be something trying to prey on fear.
