I tested this pretty extensively and it has a common failure mode that prevents me from using it: it drops footnotes and similar apparatus from the full text of academic works. For some reason, many of these models are trained in a way that excludes these sections, despite their often containing important details and context. Both versions of DeepseekOCR have the same problem. Of the others I’ve tested, dot-ocr in layout mode works best (but is slow), followed by datalab’s chandra model (which is larger and has bad license constraints).
I can get multiple sets of footnotes (critical + content notes) reliably recognized and categorized using gemini-3-flash-preview. It took 15-20 hours of iterating on my prompt for one specific format; with less effort it would not produce good enough results. It was a slow process because batch results did not mirror what I was getting from chat mode, and you have to wait on each batch while analyzing the previous one. There was also a bit of debugging of the batch protocol going on at the same time. Flash is also surprisingly affordable for the results I am getting, 4-5x cheaper than I had anticipated. I gave up on gemini-3-pro pretty quickly because it overthinks and messes things up.
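If anyone wants to try something similar, the per-page request itself is nothing exotic; all the effort went into the prompt. A minimal sketch with the google-genai Python SDK, where the prompt wording and file name are placeholders rather than my actual setup:

```python
# One page in -> labeled footnote sections out. The batch API just
# packages many of these requests into a single job.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("page_017.png", "rb") as f:  # placeholder page image
    page_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=page_bytes, mime_type="image/png"),
        "Transcribe this page. Return the body text, the critical notes, "
        "and the content notes as three separately labeled sections.",
    ],
)
print(response.text)
```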
I have been looking for an OCR model that can accurately handle footnotes. It’s essential for processing legal texts in particular, which often have footnotes that break across pages. Sadly I’ve yet to encounter a good solution.
I found Mathpix to be quite good with this type of document, including footnotes, though to be fair my documents did not have that many. It’s also proprietary.
Yeah, do crons even work consistently for GitHub Actions? I tried to set one up the other day and it just randomly skipped runs. There were some docs that suggested they’re entirely unreliable as well.
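For reference, the trigger was just the standard schedule block, something like this (schedule time illustrative):

```yaml
# .github/workflows/nightly.yml
on:
  schedule:
    - cron: "17 3 * * *"   # 03:17 UTC daily; only runs on the default branch

jobs:
  nightly:
    runs-on: ubuntu-latest
    steps:
      - run: echo "cron fired at $(date -u)"
```

GitHub's own docs do warn that scheduled runs can be delayed during periods of high load, and in public repos they're disabled outright after 60 days without repository activity.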
Everyone who tells the story of the Reformation leaves out that Martin Luther also used this new technology to widely disseminate his deranged antisemitic lies and conspiracies, leading to pogroms against Jews and a hundred years of war across Europe, and providing the ideological basis for the rise of Nazism.
You're right that later in his life he spread antisemitism and other terrible opinions; he was also extremely elitist towards the peasantry. Definitely not a fan of that sort of thing.
But I wasn't trying to make a value judgement about Martin Luther's ideological legacy; I just wanted to introduce some nuance into the narrative about disruptive innovation.
Tried this the other day and the setup is super cumbersome: it requires you to rebuild your entire dev and Claude Code environment every time you use a new container, including re-whitelisting URLs for package managers and the like.
There are techniques to mitigate this. You can reuse containers instead of creating a new one each time, and you can mount directories (like ~/.claude) from your local machine into the container so you don't have to set Claude up each time.
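A rough sketch of both, assuming plain Docker (the image and container names are placeholders):

```bash
# First run: create a named container, mounting Claude config and the
# project directory from the host.
docker run -it --name claude-dev \
  -v "$HOME/.claude:/root/.claude" \
  -v "$PWD:/workspace" -w /workspace \
  dev-image bash

# Later sessions: reattach to the same container instead of creating a
# fresh one, so installed packages and whitelists survive.
docker start -ai claude-dev
```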
I use agents in a container and persist their config like you suggest. After seeing some interest I shared my setup at https://github.com/asfaload/agents_container
It works fine for me on Linux.
Would you mind walking through the logic of that a bit for me? I'm definitely interested in productizing this, and would open source it as soon as I have some breathing room (I have no money).
Nah, I don’t miss at all typing out the tests, CLIs, and APIs I’ve created hundreds of times before. I dunno if it’s because I do ML stuff, but it’s almost all “think a lot about something, do some math, and then type thousands of lines of the same stuff around the interesting work.”
I just had Claude Code fine-tune a reranker model, improving it significantly across a large set of evals. I chose the model to fine-tune and the loss function, created the underlying training dataset for the reranking task, and designed the evals. What thinking did I outsource, exactly?
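To make that concrete: what Claude actually typed is the mechanical scaffolding, roughly like this sketch with sentence-transformers (model name, data, and hyperparameters are illustrative, not my actual setup):

```python
# Cross-encoder reranker fine-tune: the typing, not the thinking.
from sentence_transformers import CrossEncoder, InputExample
from torch.utils.data import DataLoader

# Toy (query, passage) pairs with relevance labels; the real dataset
# and its construction were the decisions that mattered.
train_examples = [
    InputExample(texts=["capital of france", "Paris is the capital of France."], label=1.0),
    InputExample(texts=["capital of france", "The Loire is a river in France."], label=0.0),
]

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
loader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Default loss for num_labels=1 is BCEWithLogitsLoss; a different choice
# goes in via loss_fct.
model.fit(train_dataloader=loader, epochs=2, warmup_steps=100)
model.save("reranker-finetuned")
```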
I guess I did not waste time learning the failure-prone arcana of scheduling training jobs on HuggingFace, but that also seems like a net benefit to me.