Yeah, the progress has to be quite significant; no points are awarded for trivial observations. Thus scores are usually bimodal around 0 and 7. In the linked stats you can see that 1 point on P3/P5 was less common than a full score on other problems.
Problems 2 and 5 have a lot of 1s. Sometimes there is an interesting advance that is easier than the full solution. Also, on problem 6 the only hope for most contestants was to get that 1 point.
nlinarith is a proof-automation tactic that attempts to finish a proof by using a simplex-based method to find a linear combination of the hypotheses and previously proven facts, along with some quadratic terms built from them.
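As a minimal sketch (a toy goal, not from the project under discussion) of what that looks like in practice:

```lean
import Mathlib

-- The goal needs the quadratic fact (a - b)^2 ≥ 0, which `linarith` cannot
-- supply on its own; handing it over via `sq_nonneg` lets `nlinarith` find
-- the right linear combination.
example (a b : ℝ) : a * b ≤ (a ^ 2 + b ^ 2) / 2 := by
  nlinarith [sq_nonneg (a - b)]
```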
> We don't have a way to verify formalizations the same way.
While there is no perfect method, you can use the agent to check whether the statement is false, has contradictory hypotheses, or admits a suspiciously short proof.
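For example (a minimal sketch, not the project's actual tooling): if the hypotheses of a formalized statement are contradictory, the "theorem" is vacuously true no matter what the conclusion says, so a cheap red-flag check is to try deriving `False` from the hypotheses alone.

```lean
import Mathlib

-- Contradictory hypotheses make any conclusion provable, so this check
-- succeeding is a sign the formalization is broken.
example (n : ℕ) (h1 : n < 3) (h2 : 5 < n) : False := by
  omega
```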
I contributed a few trivial proofs to this project, and I tried enlisting GPT-4, Copilot, Moogle, and Morph Prover to help out (I did not try LLMStep or ReProver).
Out of these:
- GPT-4 answered maybe one or two questions about syntax
- Copilot autocompleted maybe 0.5%
- Moogle was helpful much more often, but it still frequently pointed to the wrong theorem in the right file; in most of those cases I could have just used go-to-definition to reach the right file and scrolled through it
Note that these results are actually very good! And Terence Tao seems to have a positive opinion as well. But we're very far away from automatically proving conjectures, IMO.
I will say that Lean has great metaprogramming facilities for absorbing whatever kind of AI innovation might emerge. Currently the tactics I find most helpful are `aesop?`, a classical search tactic that tries a bunch of substitutions and simplifications, and `exact?`, for when you know there's a trivial proof but aren't sure of its name. But the UX for AI can be as simple as typing a `gptf` or `llmstep` tactic that is fully hooked into the Lean state (goals, hypotheses, etc.). So there's a lot of opportunity for incremental progress in improving existing workflows, which mostly consist of dispatching trivial manipulations with search/simplification tactics.
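As a hedged illustration of that workflow (toy goals, not real project code):

```lean
import Mathlib

-- `exact?` searches Mathlib for a single lemma that closes the goal and
-- prints a "Try this:" suggestion (here something like
-- `exact Nat.le_succ_of_le h`), which you then paste into the proof.
example (a b : ℕ) (h : a ≤ b) : a ≤ b + 1 := by
  exact?

-- `aesop?` runs a classical search (simp lemmas, safe/unsafe rules) and,
-- thanks to the `?`, reports the script it found instead of just closing
-- the goal silently.
example (p q : Prop) (hp : p) (hq : q) : p ∧ (q ∨ p) := by
  aesop?
```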
Even if eprint.iacr.org did include the analogue of viXra's content (which I've heard it mostly doesn't, apart from the one time a famous cryptographer whose name I forget got old and published a crank paper, which IIRC was quickly dealt with), it would also include the analogue of arXiv's content, since it seems to be the leading preprint server in that area.
viXra, by comparison, is fighting an uphill battle, since it gets only the arXiv runoff. Making an inclusive venue that's actually worth publishing in would require not only attracting the runoff from other venues but also poaching the best people from those venues.
Inclusiveness in research is something I've always wanted to solve, but it seems really hard - if, say, 10% of the incumbent elite are good and 1% of the outcasts are good (and the other 99% are mostly cranks), then the task is to get those respective 10% and 1% to mingle somehow without the rest. And I'm not even convinced arXiv has any inclusivity problem at all - it's more places like Nature that do. viXra might be barking up the wrong tree.
Adding to what other commenters said, ESLint also works like this.
However, in an IDE setting it's not exactly "auto": you have to click the light bulb and accept the fix (idk about VSCode, but in Neovim you can even get a preview of the diff [1]). This is what I'm working on a Fixit PR for right now.
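For anyone curious what "works like this" means on the ESLint side, here is a minimal sketch of a fixable rule (the rule itself is invented for illustration; it is not the Fixit PR mentioned above):

```typescript
// Sketch of an auto-fixable ESLint rule. Setting `meta.fixable` is what
// lets editors offer the light-bulb quick fix (or `eslint --fix` apply it).
import type { Rule } from "eslint";

const preferLetOverVar: Rule.RuleModule = {
  meta: {
    type: "suggestion",
    fixable: "code",
    schema: [],
    messages: { preferLet: "Use `let` or `const` instead of `var`." },
  },
  create(context) {
    const sourceCode = context.getSourceCode();
    return {
      VariableDeclaration(node) {
        if (node.kind !== "var") return;
        context.report({
          node,
          messageId: "preferLet",
          fix(fixer) {
            // Replace only the `var` keyword token; the editor surfaces this
            // as an accept-or-ignore quick fix rather than applying it silently.
            const varToken = sourceCode.getFirstToken(node);
            return varToken ? fixer.replaceText(varToken, "let") : null;
          },
        });
      },
    };
  },
};

export default preferLetOverVar;
```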
Question on the "Batching memory-bound processes on a GPU" section - it says "This enables us to reuse parts of the model that we’ve already loaded into the GPU’s SRAM", but the 10 GB we are loading goes into the HBM, right? How did we overcome the HBM <-> SRAM bottleneck?
More generally, how can we find out the size of the SRAM?
Good question. Yes, the 10 GB available for batching is in the HBM. In a single forward pass, you move the entire model from HBM -> SRAM exactly once. In a batched forward pass this is still the case, so you end up doing more compute for the same amount of memory movement.
You can estimate the total SRAM as follows: an A100 has 108 SMs, and each SM has 192 KB of SRAM (shared memory, aka its L1 cache) [1]. Multiplied out, that's ~20 MB of total SRAM, which happens to match the diagram in the Flash Attention paper [2].
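A rough back-of-the-envelope sketch of both points, using the quoted A100 figures and an assumed ~10 GB (fp16, ~5B-parameter) model rather than any measured numbers:

```typescript
// Total SRAM from the per-SM figure quoted above.
const smCount = 108;           // SMs on an A100
const sramPerSmKB = 192;       // shared memory / L1 per SM
console.log(`total SRAM ≈ ${(smCount * sramPerSmKB) / 1024} MB`); // ≈ 20 MB

// Why batching helps: the weights cross HBM -> SRAM once per forward pass
// regardless of batch size, while compute scales with the batch, so
// arithmetic intensity (FLOPs per byte moved) grows roughly linearly.
const paramCount = 5e9;                 // assumed ~5B params in fp16 ≈ 10 GB
const bytesMoved = 2 * paramCount;      // fp16 weights, moved once per pass
for (const batch of [1, 8, 64]) {
  const flops = 2 * paramCount * batch; // ~2 FLOPs per param per sample
  console.log(`batch ${batch}: ≈ ${flops / bytesMoved} FLOPs per weight byte moved`);
}
```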
There is a niche use case for the reverse order `(foo min map filter baz bar)`: solving typed holes (you could refine the hole with something like `_.foo()`, although that wouldn't interoperate with things like next-token prediction).
But that's more of a math thing than an everyday coding thing, where dot chaining usually reads nicer.
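As an invented illustration of the contrast:

```typescript
// Made-up data for the example.
type Reading = { valid: boolean; value: number };
const readings: Reading[] = [
  { valid: true, value: 3 },
  { valid: false, value: 99 },
  { valid: true, value: 1 },
];

// Dot chaining: data flows left to right, and at each `.` the editor can
// complete the next method from the receiver's known type.
const smallest = readings
  .filter((r) => r.valid)
  .map((r) => r.value)
  .reduce((a, b) => Math.min(a, b));

// The "reverse" order reads outermost-first, e.g. min(map(filter(xs, ...), ...)),
// which mirrors refining a typed hole from the result type inward but is
// harder to autocomplete token by token.
console.log(smallest); // 1
```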