More

xienze · 2025-06-15T10:24:20 1749983060

But the Tower of Hanoi can be solved without "tools" by humans, simply by understanding the problem, thinking about the solution, and writing it out. Having the LLM shell out to a Python example that it "wrote" (or rather, "pasted" since surely a Python solution to the Tower of Hanoi was part of its training set) is akin to a human Googling "program to solve Tower of Hanoi", copy-pasting and running the solution. Yes the LLM has "reasoned" that the solution to the problem is call out to a solution that it "knows" is out there, but that's not really "thinking" about how to solve a problem in the human sense.

What happens when some novel Tower of Hanoi-esque puzzle is presented and there's nothing available in its training set to reference as an executable solution? A human can reason about and present a solution, but an LLM? Ehh...

thomasahle · 2025-06-15T18:32:00 1750012320

> But the Tower of Hanoi can be solved without "tools" by humans, simply by understanding the problem, thinking about the solution, and writing it out.

The paper doesn't give any evidence humans are able to do this. And I honestly find it very implausible. Even Gary Marcus admits in (1) that humans would probably make mistakes.

xienze · 2025-06-16T11:43:15 1750074195

You are aware that humans created and solved the puzzle in the first place, right? Not sure I understand this line of reasoning that if there are humans in this world incapable of solving some problem then boom, checkmate, LLMs can reason about and understand problems like humans do.

DiogenesKynikos · 2025-06-15T10:36:38 1749983798

LLMs are perfectly capable of writing code to solve problems that are not in their training set. I ask LLMs to write code for niche problems that you won't find answers to just by Googling all the time. The LLMs usually get it right.

dfawcus · 2025-06-15T14:34:20 1749998060

Maybe they can, but what the human is able to do is examine the tower of Hanoi problem, and the derive the general rule for solving (odd or even number of disks).

Based upon that comprehension, we then need little working memory (tokens) to solve the problem, it just becomes tedious to execute the algorithm.. But the algorithm was derived after considering the first 3 or 4 cases.

Whereas for the moment, LLMS are just pattern matching; whereas we do the pattern match, then derive the generalised rule.

saberience · 2025-06-15T15:06:47 1750000007

LLMs do that too though lol.

The Tower of Hanoi problem is terrible example for somehow suggesting humans are superior.

Firstly, there are plenty of humans who can’t solve this problem even for 3 disks, let alone 6 or 7. Secondly, LLMs can both give you general instructions to solve for any case and they can write out exhaustive move lists too.

Anyway, the fact that there are humans who cannot do Tower of Hanoi already rules it out as a good test of general intelligence anyway. We don’t say that a human doesn’t have “general intelligence” if they cannot solve Towers of Hanoi, so why then would it be a good test for LLM general intelligence?

xienze · 2025-06-15T10:45:08 1749984308

> LLMs are perfectly capable of writing code to solve problems that are not in their training set.

Examples of these problems? You'll probably find that they're simply compositions of things already in the training set. For example, you might think that "here's a class containing an ID field and foobar field. Make a linked list class that stores inserted items in reverse foobar order with the ID field breaking ties" is something "not in" the training set, but it's really just a composition of the "make a linked list class" and "sort these things based on a field" problems.

saberience · 2025-06-15T15:13:00 1750000380

That’s exactly what humans do though lol.

We reason about things based on our training data. We have a hard time or impossible time reasoning about things we haven’t trained on.

Ie: a human with no experience of board games cannot reason about chess moves. A human with no math knowledge cannot reason about math problems.

How would expect an LLM to reason about something with no training data?

YeGoblynQueenne · 2025-06-15T15:27:50 1750001270

>> Ie: a human with no experience of board games cannot reason about chess moves. A human with no math knowledge cannot reason about math problems.

Then how did the first humans solve math and chess problems, if there were none around solved to give them examples of how to solve them in the first place?

TeMPOraL · 2025-06-15T18:59:14 1750013954

Incrementally, by tiny steps. Including a lot of doing first, then realizing later this is relevant to some chess/math thing.

Also the idea of "problems" like "chess problems" and "math problems" is itself constructed. Chess wasn't created by stacking together enough "chess problems" until they turned into a game - it was invented and tuned as a game for a long time before someone thought about distilling "problems" from it, in order to aid learning the game; from there, it also spilled out into space of logical puzzles in general.

This is true of every skill, too. You first have people who master something by experience, and then you have others who try to distill elements of that skill into "problems" or "exercise regimes" or such, in order to help others reach mastery quicker. "Problems" never come first.

Also: most "problems" are constructed around a known solution. So another answer to "how did the first humans solve" them is simply, one human back-constructed a problem around a solution, and then gave it to a friend to solve. The problem couldn't be too hard either, as it's no fun to not be able to solve it, or to require too much hints. Hence, tiny increments.

dzamo_norton · 2025-06-16T16:59:33 1750093173

And if we left a population of SOTA LRMs on an island for long time, would we return to find that they had done the same?

DiogenesKynikos · 2025-06-15T14:58:46 1749999526

What you're describing is successful generalization from the training dataset, also called "understanding" by laypeople.

Workaccount2 · 2025-06-15T15:05:18 1749999918

The problem with this is that anything presented can be claimed to be in the training set, which is likely a zetebyte in size if not larger. However the counter-factual, the LLM failing a problem that is provably in it's training set (there are many), seems to carry no weight.

procgen · 2025-06-15T15:12:59 1750000379

> that they're simply compositions of things already in the training set

Yes, knowledge is compositional. This is just as true for humans as it is for machines.

jen20 · 2025-06-15T22:00:34 1750024834

> The LLMs usually get it right.

This has not been my experience. They might do something in the right direction. They might write complete garbage. But the amount of time an LLM writes code that compiles and executes first time is vanishingly few for me. Perhaps I'd have better luck if I were doing things which weren't _actual_ niche problems.

jcz_nz · 2025-06-16T04:06:03 1750046763

Actually, my experience at least that when dealing with novel problems LLMs fail miserably. Try accessing uncommon API’s - or areas where you’re unsure an API actually exists (REST against Exchange for admin stuff!). Both ChatGPT and Claude produce nice looking solutions dependent on non-existent libraries. Repeatedly.

eviks · 2025-06-15T15:52:59 1750002779

> can be solved without "tools" by humans, simply by understanding the problem,

This already excludes a lot of humans

xienze · 2025-06-10T16:07:31 1749571651

> I only play games w/ Steam on Linux (Steam Deck or Framework 13 laptop) and Denuvo makes this impossible

Are you sure about that? I have a ROG Ally running Bazzite and I have played several games on this page[0] that use Denuvo.

0: https://store.steampowered.com/curator/26095454-Denuvo-Watch...

xienze · 2025-05-24T13:56:13 1748094973

Source for that claim? If you had spent half a second researching before blindly launching into “Christians r dum” rhetoric you’d have noticed that they actually teach a course on evolution: https://www.hillsdale.edu/courses/evolution-biological-diver...

f30e3dfed1c9 · 2025-05-24T15:36:21 1748100981

It is not clear at all from the course description what that course teaches:

"An introduction to the vast diversity of life from prokaryotic forms to the eukaryotic vertebrate mammals. This course introduces the beginning biology student to all the major groups of organisms and to their fundamental taxonomic relationships."

"This vast diversity exists because that's the way God made them" is perfectly compatible with that description.

Also, from the description of an event held April 11, 2025:

"Are the special creation of Adam and Eve and the evolution of humans over millions of years compatible? 100 years after the Scopes Trial, the debate continues."

So "they actually teach a course on evolution" seems to fall well short of a full description of exactly what they teach there.

f30e3dfed1c9 · 2025-05-24T15:52:28 1748101948

For comparison's sake, here is a description of a more typical Evolutionary Biology course:

"Emphasizes the fundamental evolutionary concepts that provide explanations for the diversification of life on Earth. Specific topics include the evidence for evolution, adaptation by natural selection, speciation, systematics, molecular and genome evolution, and macroevolutionary patterns and processes."

xienze · 2025-05-24T16:04:10 1748102650

[flagged]

MegaButts · 2025-05-25T22:04:55 1748210695

You are weirdly defensive about this.

f30e3dfed1c9 · 2025-05-24T16:46:07 1748105167

> do you have anything to back that up other than assumptions?

Do you? I have obviously not taken the class. Have you? What exactly do they teach in it and how do you know?

what · 2025-05-25T22:08:16 1748210896

>I have obviously not taken the class

So how can say anything about what it teaches? How do you know?

0xB31B1B · 2025-05-24T14:17:00 1748096220

My sister went there and her and her friends and their families believe these things.

xienze · 2025-05-19T19:20:50 1747682450

> when women earn more, they spend most of what they earn, on things that require labor to produce.

Uh, what happens if the labor to produce things costs pennies because it's all been shipped out to China?

xienze · 2025-05-19T19:12:45 1747681965

... Or illegal immigrants undercutting American workers in construction and low-skill industries, or H1B workers undercutting American tech workers, or...

xienze · 2025-05-17T17:22:39 1747502559

It’s right there in the article:

>The ministry’s report admits the chief reason why so many women wind up being prosecuted is because they are more likely to open the door to inspectors.

Men literally or figuratively tell the inspectors to fuck off, women don’t. It’s not bias, systemic issues, or whatever. It’s entirely on how women handle the situation that explains the difference.

RHSeeger · 2025-05-17T19:14:57 1747509297

Right, but the post I was responding to was listing out reasons that women might be prosecuted more, including being targeted more. So, unless you mean that they are targeted more _because_ they are more likely to open the door to inspectors, my point stands.

naijaboiler · 2025-05-17T17:35:44 1747503344

They need to figure it a different way to do enforcement that’s not as discriminatory. The current approach leads to disparate outcomes and is therefore inherently discriminatory

RHSeeger · 2025-05-17T19:20:29 1747509629

It's not necessarily discriminatory just because it winds up impacting one group more than another.

For example, if you have 2 groups of people and one of the groups is doing something wrong twice as much, and you enforce the law on everyone... it's fair, not discriminatory. (Ignoring the fact that what is labelled "wrong" can be done in such a way as to be discriminatory. I'm assuming a neutral view of what is wrong )

xienze · 2025-05-16T23:10:35 1747437035

You have to keep in mind that entity beans were developed in a time before generics, annotations, and widespread use of byte code enhancement that made a lot of the easy, magical stuff we take for granted possible.

icedchai · 2025-05-17T02:57:44 1747450664

I remember. During the same time period, I wrote some Java apps that used plain old JDBC, plus some of my own helper functions for data mapping. They were lighter weight and higher performance compared to the "enterprise" Java solutions. Unfortunately they weren't buzzword compliant though.

xienze · 2025-05-14T22:05:15 1747260315

> I have heard that it's received wisdom that gamers complain that HDR modes are "too dark", so perhaps that's part of why they ruined their game's renderer.

A lot of people have cheap panels that claim HDR support (read: can display an HDR signal) but have garbage color space coverage, no local dimming, etc. and to them, HDR ends up looking muted.

simoncion · 2025-05-28T00:06:51 1748390811

For what it's worth, I have a monitor with VA panel that has no local dimming. It has a maximum brightness of a bit shy of 380 nits, and a real contrast ratio of ~3000:1, so HDR stuff ends up looking fine in dim rooms.

I would much rather have this monitor than one the "zoned" backlights that I've seen. They inevitably put nasty, nasty halos around medium-to-high-contrast parts of the picture. IME, any local dimming scheme that's less fine-grained than per-pixel dimming is simply not good enough.

xienze · 2025-05-13T19:21:05 1747164065

I don’t think banks are deliberately trying to avoid using TOTP, it’s just that they have to cater to the lowest common denominator, you know, the kind for which anything computer-related is basically black magic.

SMS is an easy target because ~everyone has a cell phone and with things like Apple’s verification code auto-complete, the amount of friction is greatly reduced.

With standard TOTP, now they have to worry about if the user correctly added the secret information to whatever authenticator app. And write corresponding documentation explaining how to do so, for every major authenticator app.

There also has to be a backup flow for when the user loses their authenticator app which is probably just going to be SMS. So why not stick with just SMS in the first place?

I hate using SMS for 2FA, but I understand the business decisions around it. I think as engineers we forget, to be frank, just how bad most people are with technology.

xp84 · 2025-05-13T19:28:28 1747164508

This is no excuse for not offering it. And no, SMS must NOT be a backup that’s always available, as the article points out, its availability for use is a security hole.

If you can’t access your actual 2FA there should be an option for the bank to have it call that registered number and ask you “Hey this is (Bank). Are you trying to log in right now from Moscow on a Windows 10 PC using Firefox? If so, please call the number on the back of your card, hit 9, put in your SSN, then we’ll turn off 2FA for one login and let you add a new one. Btw if it is not you, your password is definitely compromised.”

xienze · 2025-05-13T19:34:33 1747164873

> “Hey this is (Bank). Are you trying to log in right now from Moscow on a Windows 10 PC using Firefox? If so, please call the number on the back of your card, hit 9, put in your SSN, then we’ll turn off 2FA for one login and let you add a new one. Btw if it is not you, your password is definitely compromised.”

Stop, do not pass Go, do not collect $200. Having someone call and ask for your SSN is a non-starter.

And in what world is SMS not available but being able to call that same phone is?

jacobgkau · 2025-05-14T00:28:31 1747182511

> Having someone call and ask for your SSN is a non-starter.

That's not what he said. This hypothetical robocall would simply instruct you to call a different (known good, printed on your card) number to authenticate, at which point you know who's on the line.

> And in what world is SMS not available but being able to call that same phone is?

It's a good point about the robocall notification itself, but I imagine this kind of system wouldn't even need that to work in order to function. What actually unlocks your account is calling the bank's system and inputting your SSN; you could preemptively do it from another phone if you know you lost your 2FA codes and are trying to log in.

This person's idea would replace your phone number being your authentication with your phone number simply being used for a notification, shifting the actual authentication to something the bank already knows but that someone who stole your credit card (and maybe your phone along with it) wouldn't inherently have. I got a bad whiff from it at first, but after thinking about it a little more, I think it's a good idea.

error503 · 2025-05-13T19:49:54 1747165794

Recovery codes is an option, for one.

Since we're talking about a legacy bank here, going to a branch and proving your identity is an option.

Worst case, you could always call and speak to a human who will do whatever verification they do if you forgot your password, which is functionally equivalent.

aleks224 · 2025-05-14T14:15:43 1747232143

Do TOTP authentication apps typically provide recovery codes option? Can they squash all of the added TOTP codes you have in the app into one code?

Zak · 2025-05-13T19:29:56 1747164596

> With standard TOTP, now they have to worry about if the user correctly added the secret

The standard flow I usually see for setting up TOTP ends with entering an authentication code. If it's not valid then the setup isn't finished.

xienze · 2025-05-13T19:33:10 1747164790

That's not what I'm talking about. I'm talking about the act of adding the secret to the authenticator app in the first place. There needs to be documentation to the effect of "open Google Authenticator, and if you don't have it, download it on the App Store or Google Play store. Open the app and choose 'new secret', ...". Probably also put in a QR code and link for good measure. Rinse and repeat for all the major authenticator apps. THEN you can have them verify.

It adds up to a decent amount of supporting documentation that the bank is responsible for providing.

Zak · 2025-05-13T19:42:03 1747165323

Outside of services like Github where the average user is expected to know what an RFC is, I usually just see Google Authenticator supported and no mention of the fact that alternatives exist. That seems like an adequate solution.

xienze · 2025-05-11T21:28:00 1746998880

> The point is returning to those levels means abandoning Baltimore, Houston, much of Los Angeles and most of Miami and multi-trillion dollar projects to protect San Francisco, New York and Boston.

Here’s my problem with all this stuff. All the science says LA, NYC, etc. are going to be underwater. Not maybe, not in the worst case, no. All the reporting says this is pretty much a forgone conclusion, and has for many years.

So why have these cities not started working on erecting (say) 50ft tall “future-proof” sea walls? Even if they end up not being needed, it _seems_ like this is the type of climate change mitigation step that would be a prudent thing to do. Certainly more so than the whole lot of nothing currently being done. Surely LA and NYC politicians and voters, being so much more educated than all those dumb red state hicks would be in favor of that, wouldn’t they?

JumpCrisscross · 2025-05-11T22:04:21 1747001061

> why have these cities not started working on erecting (say) 50ft tall “future-proof” sea walls?

Because we don’t need to yet? Also, a sea wall doesn’t block, it deflects. Protecting Manhattan means deflecting those surges to e.g. Long Island and New Jersey. That’s a difficult conversation much easier had after a hurricane washes away some of the opposition (and/or generates urgency in the core).

> LA and NYC politicians and voters, being so much more educated than all those dumb red state hicks would be in favor of that, wouldn’t they?

Yes, but they’ll do what those states do with their own climate risks: wait for a catastrophic failure that ultimately costs more but unlocks federal funding and so costs less locally.

ianbutler · 2025-05-11T21:51:50 1747000310

In short there's no actual will and people think short term.

A bit longer:

Good luck sourcing that from taxes. People vote, and those projects would A, fall to graft, B piss off many in your voter base both as a consequence of the graft and the general disagreement over their value.

The answer is you would see the people who greenlit the projects voted out and the projects would be scuttled.

People can say they know this is a problem but because its in the abstract most of your voter base just won't go for it and it's squarely in a "people don't actually vote in their best interest" type of problem.

It's a riot trying to get a few new MTA tunnels approved and needed repair and modernization for the NYC subways is always basically just out of the question.

So 50 ft sea walls? Yeah people would actually be under water and still doubting the need for them.