Hacker News
gslaller
1 day ago
on:
Gemma 3 Technical Report [pdf]
A noob speaking here. Why aren't there efforts to build a memory-bank-like structure where, at the attention level, you attend only to a subset of codes selected by the key? Or is this already done by the global attention mechanism (what is that, even)?
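To make the question concrete, here is a rough sketch of what "attend only to a subset selected by the key" could look like. Everything in it is an assumption for illustration: the function name `topk_memory_attention`, the plain-list "memory bank", and the top-k selection rule are hypothetical and are not taken from the Gemma 3 report or any library.

```python
import math

def topk_memory_attention(query, keys, values, k=2):
    """Hypothetical sketch: attend only to the k memory slots whose keys best match the query."""
    # Scaled dot-product score between the query and every memory key.
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Keep the indices of the k highest-scoring slots; all other slots are ignored.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (shifted by the max for numerical stability).
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the selected value vectors.
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return out, sorted(top)

# Toy memory bank with four slots (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
out, selected = topk_memory_attention([1.0, 0.0], keys, values, k=2)
print(selected, out)
```

With `k` equal to the number of slots this reduces to ordinary softmax attention over the whole bank. Note that the sketch still scores every key before selecting, so it saves nothing on the scoring step; a real memory bank would presumably need some index or approximate search to avoid that.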
genewitch
18 hours ago
There are KV (key/value) optimisations, but I'm unsure whether Gemma works with them; I haven't tried.