Bayesian Unlearning

This is a note on the paper https://papers.nips.cc/paper/2020/file/b8a6550662b363eb34145965d64d0cfb-Paper.pdf

Graphical model

Suppose that initially we have a dataset $D=D_r \cup D_e$ ($r$ for remaining, $e$ for erased, with $D_r \cap D_e = \varnothing$). Moreover, suppose we have already performed inference (exact inference, or any type of approximate inference such as variational inference) for a Bayesian deep learning model with parameters $\theta$ and obtained the posterior $p(\theta|D)=p(\theta|D_r,D_e)$.

Now, suppose that we want to "unlearn" the model's knowledge of $D_e$. There might be several reasons for this: $D_e$ might be erroneous data that we collected and now believe hurts the model, so we want to remove it; or $D_e$ might be the data of a group of users who want to exercise their right to be forgotten (so their data must be erased from the system).

In any case, we want to adjust the posterior to become closer to $p(\theta|D_r)$.

Of course, we can simply redo the inference from scratch on $D_r$. For example, we can run variational inference with a variational distribution $q(\theta)$ directly on $D_r$, maximizing the ELBO: $\mathbb{E}_{q(\theta)}[\log p(D_r|\theta)]-\text{KL}[q(\theta)||p(\theta)]$.
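For concreteness, here is a minimal sketch of this retrain-from-scratch approach, assuming a mean-field Gaussian $q(\theta)$, a standard-normal prior $p(\theta)$, a toy Bayesian logistic regression likelihood, and PyTorch with the reparameterization trick. The data `X_r`, `y_r` and all other names are illustrative placeholders, not anything from the paper.

```python
# A minimal sketch of retraining from scratch on D_r by maximizing the ELBO
# (hypothetical names; assumes a mean-field Gaussian q(theta), a standard-normal
# prior p(theta), and a toy Bayesian logistic regression likelihood).
import torch

def elbo(mu, log_sigma, X, y, n_samples=8):
    """Monte Carlo ELBO: E_q[log p(D_r|theta)] - KL[q(theta)||p(theta)]."""
    sigma = log_sigma.exp()
    eps = torch.randn(n_samples, mu.shape[0])
    theta = mu + sigma * eps                      # reparameterized samples, (S, d)
    logits = X @ theta.t()                        # (N, S)
    log_lik = -torch.nn.functional.binary_cross_entropy_with_logits(
        logits, y.unsqueeze(1).expand_as(logits), reduction="none"
    ).sum(dim=0).mean()                           # E_q[log p(D_r|theta)]
    # Closed-form KL[N(mu, sigma^2) || N(0, I)]
    kl = 0.5 * (sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma).sum()
    return log_lik - kl

# Toy placeholder for the remaining data D_r.
X_r = torch.randn(100, 5)
y_r = (X_r[:, 0] > 0).float()

mu = torch.zeros(5, requires_grad=True)
log_sigma = torch.full((5,), -2.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = -elbo(mu, log_sigma, X_r, y_r)         # maximize the ELBO
    loss.backward()
    opt.step()
```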

However, this way we do not take advantage of the posterior $p(\theta|D)$ that we already have. Also, $D_r$ might no longer be available (for example, because we do not want to disclose the remaining data).

Variational Inference with Evidence Upper Bound

Let's say that we want to tune the variational distribution $q(\theta)$ to approximate $p(\theta|D_r)$.

We have, by Bayes' rule conditioned on $D_r$ (using the fact that $D_e$ is conditionally independent of $D_r$ given $\theta$): \begin{equation} p(\theta|D)=p(\theta|D_r,D_e)=\frac{p(D_e|\theta)p(\theta|D_r)}{p(D_e|D_r)} \end{equation}

Taking logs, the first identity below holds for every $\theta$, so it also holds in expectation under any $q(\theta)$; adding and subtracting $\mathbb{E}_{q(\theta)}[\log q(\theta)]$ then turns the last two terms into KL divergences:

\begin{equation} \begin{split} \log p(D_e|D_r) &= \log p(D_e|\theta) + \log p(\theta|D_r) - \log p(\theta|D) \\ &= \mathbb{E}_{q(\theta)}\left[\log p(D_e|\theta) + \log p(\theta|D_r) - \log p(\theta|D)\right]\\ &= \mathbb{E}_{q(\theta)}\left[\log p(D_e|\theta)\right] + \text{KL}[q(\theta)||p(\theta|D)] - \text{KL}[q(\theta)||p(\theta|D_r)] \end{split} \end{equation}

Defining the evidence upper bound (EUBO) as the first two terms on the right gives

\begin{equation} \text{EUBO} := \mathbb{E}_{q(\theta)}\left[\log p(D_e|\theta)\right] + \text{KL}[q(\theta)||p(\theta|D)] = \log p(D_e|D_r) + \text{KL}[q(\theta)||p(\theta|D_r)] \end{equation}

Therefore, the EUBO is an upper bound on $\log p(D_e|D_r)$, which is constant w.r.t. $q$. Minimizing the EUBO thus minimizes $\text{KL}[q(\theta)||p(\theta|D_r)]$, which drives $q(\theta)$ towards $p(\theta|D_r)$.

The first term of the EUBO is responsible for unlearning $D_e$ (it actively "sabotages" the model's performance on $D_e$), while the second term is a trade-off/regularization term that keeps $q(\theta)$ from drifting too far from the current posterior belief $p(\theta|D)$.
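Below is a minimal sketch of what minimizing the EUBO could look like, under the same toy Bayesian logistic regression setup as above. Since the exact posterior $p(\theta|D)$ is usually unavailable, the sketch substitutes a previously learned diagonal-Gaussian approximation of it (`mu_full`, `log_sigma_full`) inside the KL term; `X_e`, `y_e` and all other names are illustrative placeholders.

```python
# A minimal sketch of unlearning D_e by minimizing a Monte Carlo estimate of the
# EUBO (hypothetical names; same toy Bayesian logistic regression as above).
# The exact p(theta|D) is replaced by a previously learned diagonal-Gaussian
# approximation q_full = N(mu_full, sigma_full^2), since the exact posterior is
# usually unavailable in practice.
import torch

def eubo(mu, log_sigma, mu_full, log_sigma_full, X_e, y_e, n_samples=8):
    """Monte Carlo EUBO: E_q[log p(D_e|theta)] + KL[q(theta)||q_full(theta)]."""
    sigma = log_sigma.exp()
    eps = torch.randn(n_samples, mu.shape[0])
    theta = mu + sigma * eps                      # reparameterized samples, (S, d)
    logits = X_e @ theta.t()                      # (N_e, S)
    log_lik = -torch.nn.functional.binary_cross_entropy_with_logits(
        logits, y_e.unsqueeze(1).expand_as(logits), reduction="none"
    ).sum(dim=0).mean()                           # E_q[log p(D_e|theta)]
    # Closed-form KL between the diagonal Gaussians q and q_full,
    # standing in for KL[q(theta) || p(theta|D)].
    sigma_full = log_sigma_full.exp()
    kl = (log_sigma_full - log_sigma
          + (sigma**2 + (mu - mu_full)**2) / (2.0 * sigma_full**2) - 0.5).sum()
    return log_lik + kl

# Placeholders: the erased data D_e and the full-data variational posterior.
X_e = torch.randn(20, 5)
y_e = (X_e[:, 0] > 0).float()
mu_full = torch.randn(5)
log_sigma_full = torch.full((5,), -2.0)

# Initialize q(theta) at the full-data posterior, then minimize the EUBO.
mu = mu_full.clone().requires_grad_(True)
log_sigma = log_sigma_full.clone().requires_grad_(True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = eubo(mu, log_sigma, mu_full, log_sigma_full, X_e, y_e)
    loss.backward()
    opt.step()
```

Note that without the KL term, minimizing $\mathbb{E}_{q(\theta)}[\log p(D_e|\theta)]$ alone would simply drive the likelihood of $D_e$ arbitrarily low; the KL term is what anchors $q(\theta)$ to the full-data posterior.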

Some problems?

The ELBO of standard variational inference (described above) contains the term $\mathbb{E}_{q(\theta)}[\log p(D_r|\theta)]$, which forces $q(\theta)$ to explain $D_r$ well. The EUBO above has no such term. It is therefore questionable whether the $q(\theta)$ learned by minimizing the EUBO can still explain $D_r$ well (i.e., retain high accuracy on $D_r$); it might suffer from catastrophic forgetting.

The second term of the EUBO, $\text{KL}[q(\theta)||p(\theta|D)]$, is meant to prevent this from happening. However, it is an open question whether this "regularization" effect is strong enough. Especially when $D_e$ is large, the effect becomes very weak (the first term grows with the size of $D_e$ while the KL term does not). Moreover, the "regularization" effect does not directly enforce that $q(\theta)$ explains $D_r$ well.

I conjecture that the problem is even worse when:

  • Size of $D_e$ is big
  • There is overlap (e.g., very similar examples) between $D_e$ and $D_r$.