Hacker News
Enabling Continual Learning in Neural Networks (deepmind.com)
152 points by interconnector on March 14, 2017 | hide | past | favorite | 19 comments



This builds on the ideas behind PathNet, previously discussed at https://news.ycombinator.com/item?id=13675891

Whereas PathNet permanently freezes parameters and pathways used for previously learned tasks, in this case the authors compute how important each connection is to the most recently learned task, and protect each connection from future modification by an amount proportional to its importance. Important pathways tend to persist, and unimportant pathways tend to be discarded, gradually freeing "underused" connections for learning new tasks.

The authors call this process Elastic Weight Consolidation (EWC). Figure 1 in the paper does a great job of explaining how EWC finds solutions in the search space of solutions that are good for new tasks without incurring significant losses for previous tasks.
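For anyone curious what that protection actually looks like: the core of it is a quadratic penalty pulling each weight back toward its post-task-A value, scaled by that weight's estimated importance (a diagonal Fisher term in the paper). A toy numpy sketch; the function name and numbers are mine, not from the paper:

```python
import numpy as np

def ewc_penalty(weights, old_weights, fisher, lam=1.0):
    """EWC-style regularizer: anchor each weight to its value after the
    previous task, weighted by that weight's importance estimate.
    lam trades off old-task retention vs. new-task learning."""
    return 0.5 * lam * np.sum(fisher * (weights - old_weights) ** 2)

# Toy example: weight 0 is "important" to task A, weight 1 is not.
w_star = np.array([1.0, -2.0])   # weights after learning task A
w = np.array([1.5, 0.0])         # candidate weights while learning task B
f = np.array([10.0, 0.1])        # per-weight importance estimates
print(ewc_penalty(w, w_star, f)) # moving the important weight dominates the cost
```

Note that the unimportant weight moved much farther (2.0 vs. 0.5) yet contributes far less to the penalty, which is exactly the "unimportant pathways get recycled" behavior described above.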

Very cool!


That's pretty incredible. I'm no AI expert, but it certainly sounds like this algorithm provides an ANN equivalent of neuroplasticity, which seems like a big step.


I'm confused. I don't see what the novelty is here. It looks like all they do is include an input that identifies different tasks and then train one neural network to learn a separate distribution for each task, with some weight sharing...


It may be an obvious solution, but has anyone done that before? While retaining the ability to have said weight sharing?


Of course people have done this before [1]. There is quite a bit of research on multi-task learning. Just look through some of the references in that Luong et al. paper. DeepMind has been putting out some amazing research lately, but this paper definitely does not fall in that category.

1. http://nlp.stanford.edu/pubs/luong2016iclr_multi.pdf


I know Jeff Hawkins' Numenta has been tracking "permanence" in their artificial synapses for many years now.


> but has anyone done that before?

The answer to that question is nearly always YES, no matter what you ask it about.


And relativity is just the idea that the laws of physics are the same in every inertial reference frame.


To compare this to the introduction of relativity is just silly.


Less silly than it looks at first sight. After all, for day-to-day use, relativity offers very little gain over classical Newtonian physics and is a lot more complex to work out mathematically (GPS being the exception, I guess, and most people probably don't even realize it would not work without a correction for relativistic effects).

So even though 'that's how it really should work' we tend to take the shortcut because it is 'good enough' for almost all use cases.

Which caused us to miss the wood for the trees for a long time. This minor change is what enables learning in the first place, and as such it could easily be a game changer.


Nuclear reactors?


Making an analogy is not the same as implying equivalence.


Sidenote: On the list of contributors I noticed there are Research Engineers and Research Scientists. What is the difference between the two?


Research engineers turn theory, pseudocode, or smaller proofs of concept into a more fleshed-out implementation. Once a research project exceeds a few thousand lines of code, it becomes useful to have dedicated engineers doing architectural design, owning unit testing / backtesting frameworks, code quality control, etc.

Source: was a research engineer at Intel Labs several years ago.


While this is true, it is also sometimes merely based on your degree. I was a "Research Engineer" doing the same work as "Research Scientists" because my degree was in "Computer Engineering" not "Computer Science."

So, YMMV.


(Another one.) In DeepMind's case, to be a Research Scientist you have to have, or be about to have, a PhD.


This came out two days ago and uses what they call intelligent synapses to improve multi-task learning: https://arxiv.org/abs/1703.04200

Seems closely related.


"...After learning a task, we compute how important each connection is to that task."

Anyone know if this was expanded on in the whitepaper?


Presumably you could track e.g. the average gradient of the error with respect to each weight.
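Close: if I recall correctly, the paper approximates the diagonal of the Fisher information, which works out to the average *squared* gradient rather than the average gradient (the plain average can cancel to zero even for weights the loss is very sensitive to). A toy numpy sketch; the function name and numbers are illustrative:

```python
import numpy as np

def importance_from_grads(grads):
    """Estimate per-weight importance as the mean squared gradient over
    samples -- a diagonal Fisher-style approximation. `grads` has shape
    (n_samples, n_weights), one gradient vector per sample."""
    return np.mean(np.asarray(grads) ** 2, axis=0)

# Weight 0 gets large gradients that average to zero; squaring keeps it.
g = np.array([[0.5, 0.01],
              [-0.5, 0.02]])
print(importance_from_grads(g))  # weight 0 scores far higher than weight 1
```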



