
A noob speaking here. Why aren't there efforts to build a memory-bank-like structure, where (at the attention level) you attend only to a subset of entries chosen by the key? Is this already done by the global attention mechanism (and what is that, even)?
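What the question describes is roughly sparse, top-k retrieval attention over an external memory: score every memory key against the query, keep only the k best slots, and softmax over just those. Below is a minimal NumPy sketch, assuming plain scaled dot-product scoring and an exact top-k lookup. The function name `topk_memory_attention` and the memory shapes are hypothetical, and this is not how any particular model (Gemma included) implements attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_memory_attention(q, mem_keys, mem_values, k):
    """Hypothetical sketch: each query attends only to its k best memory slots.

    q:          (n_queries, d)   query vectors
    mem_keys:   (n_slots, d)     keys stored in the memory bank
    mem_values: (n_slots, d_v)   values stored in the memory bank
    """
    d = q.shape[-1]
    # Scaled dot-product score of every query against every memory key.
    scores = q @ mem_keys.T / np.sqrt(d)                      # (n_queries, n_slots)
    # Indices of the k largest scores per query (order among them is arbitrary).
    idx = np.argpartition(-scores, k - 1, axis=-1)[:, :k]     # (n_queries, k)
    top_scores = np.take_along_axis(scores, idx, axis=-1)     # (n_queries, k)
    # Softmax only over the selected slots; all other slots get zero weight.
    weights = softmax(top_scores, axis=-1)                    # (n_queries, k)
    selected_values = mem_values[idx]                         # (n_queries, k, d_v)
    # Weighted sum of the selected values for each query.
    return np.einsum("qk,qkv->qv", weights, selected_values)  # (n_queries, d_v)
```

With `k` equal to the number of slots this reduces to ordinary full softmax attention. A smaller `k` makes each query's output depend on only a few memory entries, which is the "attend to a subset depending on the key" behaviour asked about. In practice the exact top-k would usually be replaced by an approximate nearest-neighbour index once the memory gets large.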





There are KV-cache optimisations, but I'm not sure whether Gemma works with them; I didn't try.


