Yeah, the progress has to be quite significant; no points are awarded for trivial observations. Thus scores are usually bimodal around 0 and 7. In the linked stats you can see that 1 point on P3/P5 was less common than a full score on other problems.
Problems 2 and 5 have a lot of 1s. Sometimes there is an interesting advance that is easier than the full solution. Also, on problem 6 the only hope for most contestants was to get that 1 point.
nlinarith is a proof-automation tactic that attempts to finish a proof by using a simplex-based method to find a linear combination of the hypotheses and previously proven facts, along with some quadratic terms built from them.
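As a minimal sketch (a toy goal, not from the project under discussion) of what that looks like in practice:

```lean
import Mathlib

-- The goal needs the quadratic fact (a - b)^2 ≥ 0, which `linarith` cannot
-- supply on its own; handing it over via `sq_nonneg` lets `nlinarith` find
-- the right linear combination.
example (a b : ℝ) : a * b ≤ (a ^ 2 + b ^ 2) / 2 := by
  nlinarith [sq_nonneg (a - b)]
```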
> We don't have a way to verify formalizations the same way.
While there is no perfect method, you can use the agent to check whether the statement is false, has contradictory hypotheses, or admits a suspiciously short proof.
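For example (a minimal sketch, not the project's actual tooling): if the hypotheses of a formalized statement are contradictory, the "theorem" is vacuously true no matter what the conclusion says, so a cheap red-flag check is to try deriving `False` from the hypotheses alone.

```lean
import Mathlib

-- Contradictory hypotheses make any conclusion provable, so this check
-- succeeding is a sign the formalization is broken.
example (n : ℕ) (h1 : n < 3) (h2 : 5 < n) : False := by
  omega
```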
I contributed a few trivial proofs to this project, and I tried enlisting GPT-4, Copilot, Moogle, and Morph Prover to help out (I did not try LLMStep or ReProver).
Out of these:
- GPT-4 answered maybe one or two questions about syntax
- Copilot autocompleted maybe 0.5%
- Moogle was helpful much more often, but it still frequently pointed to the wrong theorem in the right file; in most of those cases I could have just used go-to-definition to reach the right file and scrolled through it
Note that these results are actually very good! And Terence Tao seems to have a positive opinion as well. But we're very far away from automatically proving conjectures, IMO.
I will say that Lean has great metaprogramming facilities for absorbing whatever kind of AI innovation might emerge. Currently the tactics I find most helpful are `aesop?`, a classical search tactic that tries a bunch of substitutions and simplifications, and `exact?`, for when you know there's a trivial proof but aren't sure of its name. But the UX for AI can be as simple as typing a `gptf` or `llmstep` tactic that is fully hooked into the Lean state (goals, hypotheses, etc.). So there's a lot of opportunity for incremental progress in improving existing workflows, which mostly consist of dispatching trivial manipulations with search/simplification tactics.
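As a hedged illustration of that workflow (toy goals, not real project code):

```lean
import Mathlib

-- `exact?` searches Mathlib for a single lemma that closes the goal and
-- prints a "Try this:" suggestion (here something like
-- `exact Nat.le_succ_of_le h`), which you then paste into the proof.
example (a b : ℕ) (h : a ≤ b) : a ≤ b + 1 := by
  exact?

-- `aesop?` runs a classical search (simp lemmas, safe/unsafe rules) and,
-- thanks to the `?`, reports the script it found instead of just closing
-- the goal silently.
example (p q : Prop) (hp : p) (hq : q) : p ∧ (q ∨ p) := by
  aesop?
```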
Even if eprint.iacr.org did include the analogue of viXra's content (which I've heard it mostly doesn't, apart from the one time a famous cryptographer whose name I forget got old and published a crank paper, which IIRC was quickly dealt with), it would also include the analogue of arXiv's content, since it seems to be the leading preprint server in that area.
viXra, by comparison, is fighting an uphill battle, since it gets only the arXiv runoff. Making an inclusive venue that's actually worth publishing in would require not only attracting the runoff from other venues but also poaching the best people from those venues.
Inclusiveness in research is something I've always wanted to solve, but it seems really hard - if, say, 10% of the incumbent elite are good and 1% of the outcasts are good (and the other 99% are mostly cranks), then the task is to get those respective 10% and 1% to mingle somehow without the rest. And I'm not even convinced arXiv has any inclusivity problem at all - it's more places like Nature that do. viXra might be barking up the wrong tree.
Adding to what other commenters said, ESLint also works like this.
However, in an IDE setting it's not exactly "auto": you have to click the light bulb and accept the fix (idk about VSCode, but in Neovim you can even get a preview of the diff [1]). This is what I'm working on a Fixit PR for right now.
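For anyone curious what "works like this" means on the ESLint side, here is a minimal sketch of a fixable rule (the rule itself is invented for illustration; it is not the Fixit PR mentioned above):

```typescript
// Sketch of an auto-fixable ESLint rule. Setting `meta.fixable` is what
// lets editors offer the light-bulb quick fix (or `eslint --fix` apply it).
import type { Rule } from "eslint";

const preferLetOverVar: Rule.RuleModule = {
  meta: {
    type: "suggestion",
    fixable: "code",
    schema: [],
    messages: { preferLet: "Use `let` or `const` instead of `var`." },
  },
  create(context) {
    const sourceCode = context.getSourceCode();
    return {
      VariableDeclaration(node) {
        if (node.kind !== "var") return;
        context.report({
          node,
          messageId: "preferLet",
          fix(fixer) {
            // Replace only the `var` keyword token; the editor surfaces this
            // as an accept-or-ignore quick fix rather than applying it silently.
            const varToken = sourceCode.getFirstToken(node);
            return varToken ? fixer.replaceText(varToken, "let") : null;
          },
        });
      },
    };
  },
};

export default preferLetOverVar;
```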
Question on the "Batching memory-bound processes on a GPU" section - it says "This enables us to reuse parts of the model that we’ve already loaded into the GPU’s SRAM", but the 10 GB we are loading goes into the HBM, right? How did we overcome the HBM <-> SRAM bottleneck?
More generally, how can we find out the size of the SRAM?
Good question. Yes, the 10 GB available for batching is in the HBM. In a single forward pass, you move the entire model from HBM -> SRAM exactly once. In a batched forward pass this is still the case, so you end up doing more compute for the same amount of memory movement.
You can estimate the total SRAM as follows: an A100 has 108 SMs, and each SM has 192 KB of SRAM (shared memory, aka its L1 cache) [1]. Multiplied out, that's ~20 MB of total SRAM, which happens to match the diagram in the Flash Attention paper [2].
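A rough back-of-the-envelope sketch of both points, using the quoted A100 figures and an assumed ~10 GB (fp16, ~5B-parameter) model rather than any measured numbers:

```typescript
// Total SRAM from the per-SM figure quoted above.
const smCount = 108;           // SMs on an A100
const sramPerSmKB = 192;       // shared memory / L1 per SM
console.log(`total SRAM ≈ ${(smCount * sramPerSmKB) / 1024} MB`); // ≈ 20 MB

// Why batching helps: the weights cross HBM -> SRAM once per forward pass
// regardless of batch size, while compute scales with the batch, so
// arithmetic intensity (FLOPs per byte moved) grows roughly linearly.
const paramCount = 5e9;                 // assumed ~5B params in fp16 ≈ 10 GB
const bytesMoved = 2 * paramCount;      // fp16 weights, moved once per pass
for (const batch of [1, 8, 64]) {
  const flops = 2 * paramCount * batch; // ~2 FLOPs per param per sample
  console.log(`batch ${batch}: ≈ ${flops / bytesMoved} FLOPs per weight byte moved`);
}
```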
There is a niche use case for the reverse order `(foo min map filter baz bar)`: solving typed holes (you could refine the hole with something like `_.foo()`, although that wouldn't interoperate with things like next-token prediction).
But that's more of a math thing than an everyday coding thing, where dot chaining usually reads nicer.
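As an invented illustration of the contrast:

```typescript
// Made-up data for the example.
type Reading = { valid: boolean; value: number };
const readings: Reading[] = [
  { valid: true, value: 3 },
  { valid: false, value: 99 },
  { valid: true, value: 1 },
];

// Dot chaining: data flows left to right, and at each `.` the editor can
// complete the next method from the receiver's known type.
const smallest = readings
  .filter((r) => r.valid)
  .map((r) => r.value)
  .reduce((a, b) => Math.min(a, b));

// The "reverse" order reads outermost-first, e.g. min(map(filter(xs, ...), ...)),
// which mirrors refining a typed hole from the result type inward but is
// harder to autocomplete token by token.
console.log(smallest); // 1
```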