Tablet/Kindle on an arm mount (the spring-loaded kind, like a microphone boom or architect's lamp stand, works best; goosenecks fail pretty quickly), plus the WearMouse app on an Android watch to turn pages, works pretty well.
I don't think it is dispositive, just that it likely didn't copy the proof we know was in the training set.
A) It is still possible a proof from someone else with a similar method was in the training set.
B) A proof of a different problem, using a method similar to Erdős's, was in the training set, and ChatGPT adapted it to this one, which would be more impressive than A).
> It is still possible a proof from someone else with a similar method was in the training set.
A proof that Terence Tao and his colleagues have never heard of? If he says the LLM solved the problem with a novel approach, different from what the existing literature describes, I'm certainly not able to argue with him.
There's an update from Tao after emailing Tenenbaum (the paper author) about this:
> He speculated that "the formulation [of the problem] has been altered in some way"....
[snip]
> More broadly, I think what has happened is that Rogers' nice result (which, incidentally, can also be proven using the method of compressions) simply has not had the dissemination it deserves. (I for one was unaware of it until KoishiChan unearthed it.) The result appears only in the Halberstam-Roth book, without any separate published reference, and is only cited a handful of times in the literature. (Amusingly, the main purpose of Rogers' theorem in that book is to simplify the proof of another theorem of Erdos.) Filaseta, Ford, Konyagin, Pomerance, and Yu - all highly regarded experts in the field - were unaware of this result when writing their celebrated 2007 solution to #2, and only included a mention of Rogers' theorem after being alerted to it by Tenenbaum. So it is perhaps not inconceivable that even Erdos did not recall Rogers' theorem when preparing his long paper of open questions with Graham in 1980.
(emphasis mine)
I think the value of LLM guided literature searches is pretty clear!
This whole thread is pretty funny. Either it can demo some pretty clever, but still limited, features resulting in math skills OR it's literally the best search engine ever invented. My guess is the former: it's pretty mediocre at web search, and if this were just retrieval I'd expect it to surface the easily findable, more visible proof method of Rogers (as opposed to some alleged proof hidden in some obscure dataset).
> Either it can demo some pretty clever, but still limited, features resulting in math skills OR it's literally the best search engine ever invented.
Both are precisely true. It is a better search engine than anything else -- which, while true, is something you won't realize unless you've used the non-free 'pro research' features from Google and/or OpenAI. And it can perform limited but increasingly capable reasoning about what it finds before presenting the results to the user.
Note that no online Web search or tool usage at all was involved in the recent IMO results. I think a lot of people missed that little detail.
Does it matter if it copied or not? How the hell would one even define whether it's a copy or original at this point?
At this point the only conclusion here is one of:
A) The original proof was in the training set.
B) The author and Terence Tao did not care enough to find the publication by Erdős himself.
The TPU implementation used approximate top-k instead of the exact top-k used on Nvidia. Leaving the bug aside, the approximation alone wouldn't have mattered much; it was a cost-saving choice, since exact top-k wasn't efficient on the TPUs they were routing to under load. So even apart from the bug, there was a slight model difference under load.
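For context on why the two kernels can diverge: approximate top-k trades recall for speed by partitioning the scores and keeping only each partition's local winner, so when two true top-k values land in the same partition, one is dropped. A minimal Python sketch of the general idea (not the actual TPU kernel, which uses hardware-specific partitioning and oversampling):

```python
import heapq
import random

def exact_top_k(scores, k):
    """Exact top-k: returns the k largest scores, descending."""
    return heapq.nlargest(k, scores)

def approx_top_k(scores, k, num_buckets=None):
    """Approximate top-k: split scores into strided buckets, keep only
    the max of each bucket, then take the top-k of those winners.
    If two true top-k values share a bucket, one of them is lost."""
    if num_buckets is None:
        num_buckets = 4 * k  # oversample buckets to improve recall
    buckets = [scores[i::num_buckets] for i in range(num_buckets)]
    winners = [max(b) for b in buckets if b]
    return heapq.nlargest(k, winners)

random.seed(0)
scores = [random.random() for _ in range(10_000)]
k = 10
exact = exact_top_k(scores, k)
approx = approx_top_k(scores, k)
# recall: fraction of the true top-k that the approximation recovered
recall = len(set(exact) & set(approx)) / k
print(f"recall of approximate vs exact top-k: {recall:.0%}")
```

The global maximum always survives (it wins its bucket and beats every other winner), but lower-ranked entries can silently differ between the two paths, which is exactly the kind of subtle output drift being described.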
They've added this change at the same time as the random trick prompts, from late last year, that try to get you to hit enter on the training opt-in. I've gotten three popups inside Claude Code today, at random times, trying to trick me into letting it train on my data, with a different selection defaulted than the one I'd already chosen.
More evidence the EU solved the wrong problem. Instead of mandating cookie banners, mandate a single global “fuck off” switch: a one-click, automatic opt-out from any feature/setting/telemetry/tracking/training that isn't strictly required or clearly beneficial to the user as an individual. If it's mainly there for data collection, ads, attribution, “product improvement”, or monetization, it should be off by default and stay off as long as the “fuck off” switch is toggled. Burden of proof on the provider. Fines large enough that growth teams and KPI hounds have legal coach them on what “fuck off” means and why they need to respect it.
DNT was useless because it didn't have a legal basis. It would have been amazing if they had mandated something like this instead of the cookie walls.
Advertisers ignored it because they could. They also complained that it defaulted to on; however, cookies are supposed to be opt-in anyway, so that's how it's supposed to work.
Remember how all of HN and tech people were saying that DNT was a Micro$oft scam designed to break privacy because it was enabled by default, without requiring user action?
To the point that Apache web server developers added a custom rule in the default httpd.conf to strip away incoming DNT headers!!!
I think manipulation will come long before 2036. But the people doing high-level planning around LLMs trained on forum discussions of Chucky movies and all kinds of worse stuff, while planning for home robot deployment soon, are off by a lot, I think. Imagine something random playing on the TV rehydrating a memory that was mostly wiped out by RLHF; it will need many extra safety layers.
And even if it isn't doing crazy intentional-seeming horror stuff, we're still a good way off from passing the "safely make a cup of coffee in a random house without burning it down or scalding the baby" test.
If it takes a lot of back and forth between lots of people, it is more like a $12,000 workstation, or more, after the labor for requesting and approving.