Hacker News
Enabling Continual Learning in Neural Networks (deepmind.com)
152 points by interconnector on March 14, 2017 | hide | past | favorite | 19 comments



This builds on the ideas behind PathNet, previously discussed at https://news.ycombinator.com/item?id=13675891

Whereas PathNet permanently freezes parameters and pathways used for previously learned tasks, in this case the authors compute how important each connection is to the most recently learned task, and protect each connection from future modification by an amount proportional to its importance. Important pathways tend to persist, and unimportant pathways tend to be discarded, gradually freeing "underused" connections for learning new tasks.

The authors call this process Elastic Weight Consolidation (EWC). Figure 1 in the paper does a great job of explaining how EWC finds solutions in the search space of solutions that are good for new tasks without incurring significant losses for previous tasks.
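For anyone curious what that protection actually looks like: the core of it is a quadratic penalty pulling each weight back toward its post-task-A value, scaled by that weight's estimated importance (a diagonal Fisher term in the paper). A toy numpy sketch; the function name and numbers are mine, not from the paper:

```python
import numpy as np

def ewc_penalty(weights, old_weights, fisher, lam=1.0):
    """EWC-style regularizer: anchor each weight to its value after the
    previous task, weighted by that weight's importance estimate.
    lam trades off old-task retention vs. new-task learning."""
    return 0.5 * lam * np.sum(fisher * (weights - old_weights) ** 2)

# Toy example: weight 0 is "important" to task A, weight 1 is not.
w_star = np.array([1.0, -2.0])   # weights after learning task A
w = np.array([1.5, 0.0])         # candidate weights while learning task B
f = np.array([10.0, 0.1])        # per-weight importance estimates
print(ewc_penalty(w, w_star, f)) # moving the important weight dominates the cost
```

Note that the unimportant weight moved much farther (2.0 vs. 0.5) yet contributes far less to the penalty, which is exactly the "unimportant pathways get recycled" behavior described above.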

Very cool!


That's pretty incredible. I'm no AI expert, but it certainly sounds like this algorithm provides an ANN equivalent of neuroplasticity, which seems like a big step.


I'm confused. I don't see what the novelty is here. It looks like all they do is include an input that identifies different tasks and then train one neural network to learn a separate distribution for each task, with some weight sharing...


It may be an obvious solution, but has anyone done that before? While retaining the ability to have said weight sharing?


Of course people have done this before [1]. There is quite a bit of research on multi-task learning. Just look through some of the references in that Luong et al. paper. DeepMind has been putting out some amazing research lately, but this paper definitely does not fall in that category.

1. http://nlp.stanford.edu/pubs/luong2016iclr_multi.pdf


I know Jeff Hawkins' Numenta has been tracking "permanence" in their artificial synapses for many years now.


> but has anyone done that before?

The answer to that question is nearly always YES, no matter what you ask it about.


And relativity is just the idea that the laws of physics are the same in every inertial reference frame.


To compare this to the introduction of relativity is just silly.


Less silly than it looks at first sight. After all, for day-to-day use, relativity offers very little gain over classical Newtonian physics and is a lot more complex to work out mathematically (GPS being the exception, I guess, and most people probably don't even realize it would not work without a correction for relativistic effects).

So even though 'that's how it really should work' we tend to take the shortcut because it is 'good enough' for almost all use cases.

Which caused us to miss the wood for the trees for a long time. This minor change is what enables learning in the first place, and as such it could easily be a game changer.


Nuclear reactors?


Making an analogy is not the same as implying equivalence.


Sidenote: On the list of contributors I noticed there are Research Engineers and Research Scientists. What is the difference between the two?


Research engineers turn theory, pseudocode, or smaller proofs of concept into a more fleshed-out implementation. Once a research project exceeds a few thousand lines of code, it becomes useful to have dedicated engineers doing architectural design, owning unit testing / backtesting frameworks, code quality control, etc.

Source: was a research engineer at Intel Labs several years ago.


While this is true, it is also sometimes merely based on your degree. I was a "Research Engineer" doing the same work as "Research Scientists" because my degree was in "Computer Engineering" not "Computer Science."

So, YMMV.


(Another one.) In DeepMind's case, to be a Research Scientist you have to have, or be about to have, a PhD.


This came out two days ago and uses what they call intelligent synapses to improve multi-task learning: https://arxiv.org/abs/1703.04200

Seems closely related.


"...After learning a task, we compute how important each connection is to that task."

Anyone know if this was expanded on in the whitepaper?


Presumably you could track e.g. the average gradient of the error with respect to each weight.
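Close: if I recall correctly, the paper approximates the diagonal of the Fisher information, which works out to the average *squared* gradient rather than the average gradient (the plain average can cancel to zero even for weights the loss is very sensitive to). A toy numpy sketch; the function name and numbers are illustrative:

```python
import numpy as np

def importance_from_grads(grads):
    """Estimate per-weight importance as the mean squared gradient over
    samples -- a diagonal Fisher-style approximation. `grads` has shape
    (n_samples, n_weights), one gradient vector per sample."""
    return np.mean(np.asarray(grads) ** 2, axis=0)

# Weight 0 gets large gradients that average to zero; squaring keeps it.
g = np.array([[0.5, 0.01],
              [-0.5, 0.02]])
print(importance_from_grads(g))  # weight 0 scores far higher than weight 1
```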



