This should honestly have been implemented a long time ago. Much of academia is pressured to churn out papers month after month as academia is prioritizing volume over quality or impact.
wow - this is really well made! i've been doing research w/ Transformer-based audio/speech models and this is made with incredible detail. Attention as a concept itself is already quite unintuitive for beginners due to is non-linearity, so this also explains it very well
> Attention as a concept itself is already quite unintuitive
Once you realize that Attention is really just a re-framing of Kernel Smoothing it becomes wildly more intuitive [0]. It also allows you to view Transformers as basically learning a bunch of stacked Kernels which leaves them in a surprisingly close neighborhood to Gaussian Processes.
> I'd be grateful for any pointers to an example where system developers (or someone else in a position to know) have verified the success of a prompt extraction.
You can try this yourself with any open source llm setup that lets you provide a system prompt no? Just give it a prompt, ask the model the prompt ,and see if it matches.
gpt-oss is trained to refuse so it wont share (you can provide system prompt on lmstudio)
Wow. Arco was my first introduction into Arch-based systems - it taught me invaluable amounts of information. I run Arch Linux now, but Arco always has had a special place in my heart. So sad to see it go.