Uncertainty-Guided Lifelong Learning in Bayesian Networks:
Identifying the most influential parameters in a representation learned for one task plays an important role in continual learning.
This paper addresses two learning directions:
1) completely eliminating catastrophic forgetting using weight pruning (dropping individual weights rather than whole neurons), where a hard selection mask freezes the most certain parameters (BLLL-PRN);
2) reducing catastrophic forgetting by adaptively modifying the learning rates based on how uncertain the parameter values are (BLLL-REG). This method has the added advantage that it does not need information about task boundaries.
Since the paper takes a Bayesian approach, we first need to look at Bayesian neural networks (BNNs).
In a BNN, a posterior over the weights of the network is learned. One of the popular ways to perform inference in a BNN is variational inference; this paper follows the Bayes by Backprop algorithm of [Blundell et al., 2015].
Each weight of the network is represented by a mean $\mu$ and a standard deviation $\sigma$.
These two statistics help determine which parameters are important for a given task.
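To make this concrete, here is a minimal sketch of a Bayes-by-Backprop linear layer in PyTorch. The initialization constants and the prior scale `prior_sigma` are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Bayes-by-Backprop linear layer: each weight has a mean and a std."""
    def __init__(self, in_features, out_features, prior_sigma=0.1):
        super().__init__()
        # Variational posterior parameters for each weight.
        self.mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0, 0.1))
        # rho parameterizes sigma via softplus so sigma stays positive.
        self.rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.prior_sigma = prior_sigma

    @property
    def sigma(self):
        return F.softplus(self.rho)  # sigma = log(1 + exp(rho)) > 0

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I),
        # so gradients flow to both mu and rho.
        eps = torch.randn_like(self.mu)
        weight = self.mu + self.sigma * eps
        return F.linear(x, weight)

    def kl(self):
        # Closed-form KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over
        # weights; added to the task loss to form the variational objective.
        sigma = self.sigma
        return (torch.log(self.prior_sigma / sigma)
                + (sigma ** 2 + self.mu ** 2) / (2 * self.prior_sigma ** 2)
                - 0.5).sum()
```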
Bayesian Lifelong Learning:
BLLL-PRN
In Bayes by Backprop, each weight is drawn from a Gaussian distribution parameterized by $\mu$ and $\sigma$.
The signal-to-noise ratio (SNR), which indicates the importance of a parameter, is defined for each parameter as $\frac{|\mu|}{\sigma}$.
The parameters are sorted by their SNR values and pruned if they fall below a certain threshold.
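A minimal sketch of this pruning step, building on the `BayesianLinear` layer above. The `keep_ratio` and the quantile-based thresholding are illustrative choices; the paper sorts by SNR and prunes below a threshold:

```python
def snr_prune_mask(layer, keep_ratio=0.5):
    # Importance of each weight: |mu| / sigma.
    snr = layer.mu.abs() / layer.sigma
    # Keep the highest-SNR weights; prune everything below the threshold.
    threshold = torch.quantile(snr.flatten(), 1.0 - keep_ratio)
    return (snr >= threshold).float()  # 1 = keep, 0 = pruned
```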
These pruned parameters are excluded at inference time via a binary mask that is different for each task.
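For example, the per-task masks could be kept in a lookup and applied before the forward pass. Storing one mask per task and using the posterior means as point-estimate weights at inference are assumed bookkeeping details for illustration:

```python
task_masks = {}  # hypothetical store: task_id -> binary mask tensor

def predict(layer, x, task_id):
    # Pruned weights stay zero for this task's forward pass.
    weight = layer.mu * task_masks[task_id]
    return F.linear(x, weight)
```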
BLLL-REG
Controlling further changes in the learned representations.
The more certain the mean of a parameter's distribution is, the less it should be allowed to change when learning future concepts. Hence, the idea is to condition the changes in the mean of each parameter's distribution on its uncertainty.
This is imposed by scaling the learning rate of each $\mu_{i}$ according to its $\sigma_{i}$ value.
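A sketch of such an uncertainty-scaled update for the means, assuming a plain SGD step after `loss.backward()`. Normalizing by the layer's maximum $\sigma$ is an illustrative choice here, not the paper's exact rule:

```python
def scaled_mu_step(layer, base_lr=1e-2):
    with torch.no_grad():
        # Certain (small-sigma) parameters take small steps and stay frozen-ish;
        # uncertain (large-sigma) parameters remain plastic.
        scale = layer.sigma / layer.sigma.max()
        layer.mu -= base_lr * scale * layer.mu.grad
```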
(For BLLL-PRN, the amount of pruning is chosen as a function of validation accuracy.)