CRFU: Compressive Representation Forgetting Against Privacy Leakage on Machine Unlearning

Weiqi Wang, Chenhan Zhang, Zhiyi Tian, Shushu Liu, and Shui Yu
W. Wang, Z. Tian and S. Yu are with the School of Computer Science, University of Technology Sydney, Australia.
E-mail: Weiqi.Wang-2@student.uts.edu.au, {zhiyi.tian-1, shui.yu}@uts.edu.au. C. Zhang is a Postdoctoral Research Fellow at Macquarie University, Australia. E-mail: chzhang@ieee.org. S. Liu is with the Department of Communication and Networking, Aalto University, Espoo, Finland. E-mail: liu.shushu@aalto.fi. This paper was supported in part by Australia ARC LP220100453, ARC DP200101374, and ARC DP240100955.
(Corresponding author: Chenhan Zhang.)
Abstract

Machine unlearning allows data owners to erase the impact of their specified data from trained models. Unfortunately, recent studies have shown that adversaries can recover the erased data, posing serious threats to user privacy. An effective unlearning method removes the information of the specified data from the trained model, resulting in different outputs for the same input before and after unlearning. Adversaries can exploit these output differences to conduct privacy leakage attacks, such as reconstruction and membership inference attacks. However, directly applying traditional defenses to unlearning leads to significant model utility degradation. In this paper, we introduce a Compressive Representation Forgetting Unlearning scheme (CRFU), designed to safeguard against privacy leakage on unlearning. CRFU achieves data erasure by minimizing the mutual information between the trained compressive representation (learned through information bottleneck theory) and the erased data, thereby maximizing the distortion of data. This ensures that the model’s output contains less information that adversaries can exploit. Furthermore, we introduce a remembering constraint and an unlearning rate to balance the forgetting of erased data with the preservation of previously learned knowledge, thereby reducing accuracy degradation. Theoretical analysis demonstrates that CRFU can effectively defend against privacy leakage attacks. Our experimental results show that CRFU significantly increases the reconstruction mean square error (MSE), achieving a defense effect improvement of approximately 200% against privacy reconstruction attacks with only 1.5% accuracy degradation on MNIST.

Index Terms:
Machine unlearning, compressive representation, privacy leakage, reconstruction attacks.

1 Introduction

Machine unlearning has garnered extensive attention recently, as the “right to be forgotten” has been legislated worldwide. This legislation mandates that Machine Learning (ML) service providers not only delete data collected from individual users but also remove the contribution of that data from trained ML models [1]. The concept of machine unlearning was first introduced by Cao and Yang in [2], involving the removal of a specific subset of data previously used to train an ML model. ML model providers must ensure that the provided model is no longer associated with the erased data. Various methods, including exact unlearning [2, 3] and approximate unlearning [4, 5], have been proposed as effective solutions to exercise users’ rights and protect their privacy.

Although machine unlearning aims to preserve users’ privacy by removing their data from trained models, recent studies have revealed that the changes induced by unlearning can leak private information about the erased data. For instance, Chen et al. [6] proposed membership inference attacks to infer which data samples are present in the erased dataset. Additionally, Zhang et al. and Hu et al. [9, 8, 7] identified a further attack, known as privacy reconstruction, capable of recovering deleted data based on unlearning updates. Similarly, [10] introduced the privacy reconstruction attack aimed at recovering specific data points used in model updates. The privacy leakage caused by unlearning updates has become a critical issue in machine unlearning, and few works have effectively addressed this threat, particularly concerning reconstruction attacks.

Addressing potential privacy breaches in machine unlearning is critical and requires urgent attention. However, finding an effective solution remains a significant challenge. Firstly, existing unlearning methods, while effective, conflict with privacy leakage defenses because unlearning necessitates the complete removal of the specified samples’ information from the trained model. This typically results in different outputs for the same query set before and after unlearning, which adversaries can exploit to infer private information about the erased data. Secondly, directly incorporating defense strategies such as differential privacy [11] into unlearning methods exacerbates model utility degradation, leading to what is termed “catastrophic unlearning” [4, 12], rendering the model unusable. The key challenge lies in designing an unlearning method that reduces the private information in the model output to prevent inference attacks while still achieving effective unlearning performance.

Figure 1: Compressive representation forgetting unlearning (CRFU) to defend against privacy leakage on unlearning. When a user (Alice) requests to unlearn her sensitive dataset $D_e$ from a trained IB model, CRFU minimizes the mutual information between the erased dataset and the learned representation to unlearn both the representer and the approximator of this model. Since both our unlearning and the original learning distort as much information of the inputs $X$ in the representation $Z$ as possible, the outputs of the model contain less information about the inputs that could be used for privacy inference by Bob. This can also be explained through the training Markov chain $X \to Z \to \hat{Y}$: the representer is trained to minimize the mutual information between $X$ and $Z$; therefore, $I(X;X) \gg I(X;Z) \geq I(X;\hat{Y})$.

Our work: In this paper, we propose a Compressive Representation Forgetting Unlearning (CRFU) scheme to defend against privacy leakage on machine unlearning, as illustrated in Figure 1. The CRFU scheme is tailored for unlearning models trained using information bottleneck (IB) theory [13]. An IB-trained model comprises two components, a representer and an approximator, which aim to maximize the compression of input data while preserving sufficient information relevant to the labels in the learned representation. CRFU implements unlearning for both components. Specifically, CRFU further minimizes the mutual information between learned representations and the erased data (both inputs and labels) to remove the information of these samples in IB-trained models. Since the unlearning process is based on representations, which maximizes the distortion of both the original training data inputs and the erased data inputs, the model’s output contains less information that adversaries can exploit. Consequently, CRFU can effectively defend against privacy leakage attacks for a black-box ML model [14]. However, directly optimizing unlearning methods would lead to significant model utility degradation, known as “catastrophic unlearning.” To counteract model utility decline, we introduce a remembering constraint and an unlearning rate in CRFU. Adjusting the unlearning rate helps balance the trade-off between completely removing the influence of specified samples and preserving previously acquired knowledge.

We conduct theoretical analysis and extensive experiments to demonstrate that CRFU achieves a significant improvement in defense against privacy leakage attacks [8, 6] over existing unlearning methods [15, 4]. Moreover, we evaluate the erasure effectiveness of CRFU by employing a widely recognized methodology outlined in [16]. This involves strategically integrating backdoored samples into the training dataset used for the original model training. The primary goal of the unlearning process is to comprehensively remove the influence of these backdoored samples on the trained model. The efficacy is evaluated based on the performance of backdoor attacks: a lower attack success rate signifies superior unlearning performance. The results show that CRFU significantly outperforms the recently proposed state-of-the-art approximate unlearning methods [15, 4, 5] in both privacy leakage defense and erasure effectiveness.

To sum up, our contributions are:

  • To the best of our knowledge, this is the first work that studies the defense against privacy leakage on unlearning. Our approach, compressive representation forgetting unlearning (CRFU), achieves a significant defense effect and is easy to extend to support most ML algorithms.

  • We introduce a remembering constraint and an unlearning rate in CRFU to achieve a balance between forgetting the erased samples and remembering previously learned knowledge. Adjusting the unlearning rate controls the extent of unlearning and effectively mitigates the utility decline.

  • Theoretical analysis and experimental results demonstrate the effectiveness of CRFU in defending against privacy leakage and mitigating model utility degradation, outperforming state-of-the-art unlearning methods. The codebase is accessible at https://github.com/wwq5-code/CRFU.git.

The rest of the paper is structured as follows. We review the related work in Section 2. An overview of the required background and notation can be found in Section 3. In Section 4, we first define the compressive representation forgetting unlearning problem based on IB theory; then, we present a detailed introduction to compressive representation forgetting unlearning. Section 5 provides a theoretical analysis of why CRFU can defend against privacy leakage on unlearning. Section 6 details our experimental findings, showcasing a comparative analysis with related work. In Section 7, we offer a concise summary of the paper.

2 Related Work

2.1 Machine Unlearning

Machine unlearning, a method designed to eliminate the contribution of certain data from a trained model, facilitates individuals’ right to be forgotten. This concept is explored in works such as [17, 18]. A straightforward but often impractical approach to machine unlearning involves retraining the machine learning model from scratch. This method can be prohibitively expensive in terms of computational overhead and storage requirements, particularly for complex deep learning tasks. Current strategies for machine unlearning are generally divided into two main types: “fast retraining” and “approximate unlearning”.

Fast retraining methods in machine unlearning primarily focus on reducing the computational overhead associated with retraining models. These methods involve a partial redesign of learning algorithms and necessitate storing training data or intermediate parameters during the training process, thereby incurring increased storage costs [2, 3, 19, 20]. In [2], Cao and Yang restructured traditional machine learning algorithms into a summation-based framework. This innovation allows for quicker updates to the model when unlearning is required. Instead of retraining the entire model from scratch, only a few summations need to be modified, significantly accelerating the process. In [3, 19], Bourtoule et al. and Yan et al. proposed advanced methods for unlearning samples in deep neural networks. Their approach involves segmenting the complete dataset into smaller portions, referred to as shards, with individual sub-models trained on each shard. When data needs to be unlearned, it is removed from the relevant shard, and only the sub-model for that specific shard is retrained. This method, however, incurs significant storage costs, as it requires maintaining the intermediate training parameters for each shard and storing the entire training dataset. Additionally, its efficiency diminishes with an increase in the frequency of data removal requests.

Approximate unlearning methods aim to directly modify the original trained model to approximate the ground truth (the model retrained using the remaining dataset). In [15, 21], Guo et al. introduced a certified-removal method that draws parallels to the principles of differential privacy, as established by Dwork [11]. The mechanism ensures that a model, after data erasure, remains indistinguishable from a model that never incorporated the erased data. To address the potential decline in model utility due to unlearning, both [15] and [21] employed a strategy that limits the updates to model parameters. This approach aligns with the concept of differential unlearning, maintaining model effectiveness while adhering to unlearning requirements. In [4], Nguyen et al. approached unlearning by approximating the posterior based on the data set for removal, utilizing Bayesian inference techniques as outlined in [22]. While these methods employ bounds or thresholds to avert significant loss caused by unlearning, they can still lead to a reduction in model utility after unlearning. To address this, we introduce a remembering constraint and an unlearning rate in CRFU to diminish the accuracy decline caused by unlearning.

2.2 Privacy Attacks on Unlearning

As machine unlearning gains popularity, it also brings new privacy threats. Chen et al. [6] highlighted that the variance in a model’s outputs before and after unlearning could inadvertently expose the privacy of the removed data. They further explored this issue by proposing a membership inference attack specifically targeting the unlearning process. Lu et al. [23] further proposed label-only membership inference attacks targeting black-box machine unlearning. In addition to membership inference attacks on unlearning, Gao et al. [9] introduced the concept of deleted reconstruction attacks. This form of model inversion attack [24] aims to reconstruct removed data by analyzing the outputs of both the original and the unlearned models. Similarly, Salem et al. [10] proposed a reconstruction attack aimed at recovering specific data samples involved in the model updating process. This is achieved by comparing the model’s outputs before and after the update.

However, few works have effectively addressed the threat caused by machine unlearning. In [6], Chen et al. suggested various strategies to mitigate privacy leaks in machine unlearning, one of which includes the application of differential privacy. However, they noted that while these methods can enhance privacy protection, they often exacerbate the issue of utility degradation in the model. In this paper, we introduce CRFU, a novel approach that is tailored to unlearning IB-trained models by discarding the information of the specified data from the trained compressed representations. Since the data has been compressed during learning and unlearning, CRFU can effectively thwart adversaries’ attempts to infer private properties from the model outputs.

3 Preliminary

TABLE I: Basic Notations

$D = (X, Y)$: Full training data $D$, including inputs $X$ and labels $Y$
$D_e = (X_e, Y_e)$: The erased data $D_e$, including inputs $X_e$ and labels $Y_e$
$D_r = (X_r, Y_r)$: The remaining data $D_r$, including $X_r$ and $Y_r$
$Z$: The compressive representation
$p(Z|X)$: The representation posterior learned based on $X$
$p(Z|X_{-X_e})$: The representation posterior unlearned based on $X_e$
$\hat{Y}$: The predicted approximation of the model
$\hat{Y}_{-(X_e, Y_e)}$: The unlearned approximation, also written as $\hat{Y}_u$
$x, z, y$: The per-sample variables from $X, Z, Y$
$\theta^r$: The representer of an IB model
$\theta^a$: The approximator of an IB model
$\mathcal{L}_{rep}$: The representation loss
$\mathcal{L}_{app}$: The approximation loss of predicting
$\mathcal{L}_{rep}^{u}$: The representation unlearning loss
$\mathcal{L}_{app}^{u}$: The approximation unlearning loss
$\beta$: The Lagrange multiplier in IB
$\beta_u$: The unlearning rate of CRFU

In Table I, we present a concise summary of the key notations employed throughout this paper. The full training dataset is represented as $D$, consisting of the input data $X$ and labels $Y$. The notation $Z$ denotes the compressive representation learned through the IB model, which is optimized to maximize the distortion of input data while concurrently retaining maximal relevant information about the labels. The representation posterior, formulated from $X$, is expressed as $p(Z|X)$. Additionally, the unlearned representation posterior, unlearned based on the specified data $X_e$, is denoted as $p(Z|X_{-X_e})$. The notation $\hat{Y}$ is utilized to represent the predictive approximation output of an IB model. After unlearning, the predictive approximation is denoted either as $\hat{Y}_{-(X_e, Y_e)}$ or $\hat{Y}_u$. In experimental setups, we use sample variables $x, z, y$ drawn from populations $X, Z, Y$. In IB algorithms, there are two types of losses: the representation loss ($\mathcal{L}_{rep}$) and the prediction approximation loss ($\mathcal{L}_{app}$). Here, $\beta$ acts as a Lagrange multiplier in IB, balancing distortion and utility. In the CRFU process, we encounter the final unlearning loss, a combination of the representation unlearning loss ($\mathcal{L}_{rep}^{u}$) and the approximation unlearning loss ($\mathcal{L}_{app}^{u}$). The parameter $\beta_u$ is used to manage the trade-off between completely unlearning the influence of specified samples and not entirely forgetting the previously learned representation.

3.1 Threat Model of Privacy Leakage Attacks on Unlearning

One effective privacy leakage attack is called model inversion, initially introduced by Fredrikson et al. [24]. It aims to deduce missing attributes of input features by interacting with a trained ML model. An example of this is the reconstruction of facial images from deep learning models, as studied in [25]. In the context of machine unlearning, researchers like Hu et al. [9, 8, 6, 7] have identified that the discrepancies in a model’s outputs before and after unlearning could potentially reveal information about the deleted data. These attacks in the realm of unlearning primarily focus on exploiting these differences to compromise the erased data privacy. The main process and adversary capabilities of privacy leakage attacks on unlearning are summarized as follows.

Figure 2: Privacy leakage attack based on the black-box model’s outputs before and after unlearning.

3.1.1 Attacking Data Preparation

Assuming $D$ represents the original training dataset and $e$ is the sample a user requests to unlearn, we denote $D_r = D \backslash \{e\}$ as the remaining dataset. If there are several erased samples, we refer to the erased dataset as $D_e$. Machine unlearning aims to erase the influence of the specified dataset $D_e$ from a trained ML model $\mathcal{M}(D)$, where the model is trained based on the full training dataset $D$. The unlearning update is performed by executing an unlearning algorithm $\mathcal{U}$ on the currently trained model using the erased dataset. More formally, given an erased dataset $D_e$, a trained ML model $\mathcal{M}$, and the unlearning algorithm $\mathcal{U}$, the unlearning process can be defined as $\mathcal{U}: D_e, \mathcal{M}(D) \to \mathcal{M}_u(D \backslash D_e)$, where $\mathcal{M}_u$ is the unlearned version of the model $\mathcal{M}$.
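To make this setup concrete, the following is a small sketch of how the erased and remaining subsets could be carved out of a training set in PyTorch before any unlearning algorithm $\mathcal{U}$ is invoked; it is our own illustration, and the name `unlearn` is a placeholder rather than an API of the paper or any library.

```python
from torch.utils.data import Subset

def split_for_unlearning(dataset, erased_idx):
    """Partition the full training set D into the erased subset D_e and the remaining subset D_r."""
    erased = set(erased_idx)
    remaining = [i for i in range(len(dataset)) if i not in erased]
    d_e = Subset(dataset, sorted(erased))
    d_r = Subset(dataset, remaining)
    return d_e, d_r

# An unlearning algorithm U then maps (D_e, M(D)) to M_u(D \ D_e):
# model_u = unlearn(d_e, trained_model)  # `unlearn` stands in for a concrete method such as CRFU
```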

3.1.2 Adversary Goal and Capabilities

The goal of privacy leakage attacks on machine unlearning is to infer the privacy of the erased samples $D_e$ of an unlearning update. We assume the adversary has only black-box access to the target model. This limitation means that the adversary can interact with the model solely through queries using a specific set of data samples, known as the probing set, and subsequently receive the corresponding outputs. Furthermore, it is presumed that the adversary’s local probing dataset is sourced from the same distribution as the target model’s training dataset. Moreover, we consider that the adversary has knowledge of both the unlearning and original learning algorithms and can replicate the same learning and unlearning setup as the target model. This can be achieved by performing model hyperparameter stealing attacks [26]. Finally, and most importantly, we assume that the target model effectively removes the information of the erased data, and that the erased dataset and the remaining dataset are disjoint.

3.1.3 Privacy Inference Attack Process on Unlearning

In privacy inference attacks, the adversary’s initial step involves collecting varied outputs from their probe data $X_{probe}$. This includes obtaining the original model outputs $\hat{Y}$ before unlearning, as well as the outputs $\hat{Y}_{-(X_e, Y_e)}$ after unlearning. The key to the attack lies in the difference $\delta = \hat{Y}_{-(X_e, Y_e)} - \hat{Y}$, which is used to train the attack model. The structure of the attack model typically includes an encoder and a decoder, resembling the architecture of Variational Autoencoders (VAEs) [27]. We replicate the privacy inference attacks following the methodology outlined in [10], and the detailed attack process is depicted in Figure 2. The direct victims are those users who request unlearning. Their unlearning requests prompt the ML server to execute an unlearning update, which is leveraged by this kind of attack to infer the privacy of the erased data $(X_e, Y_e)$.
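As a rough, self-contained illustration of this pipeline (our own simplification rather than the exact attack of [10]; `model_before` and `model_after` are assumed to return class logits, and the decoder shape targets a 10-class, 28x28 task such as MNIST):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def collect_deltas(model_before, model_after, probe_loader):
    """Query the black-box model before and after unlearning on the probe set
    and keep the per-sample difference of the posterior outputs (delta)."""
    deltas = []
    for x, _ in probe_loader:
        deltas.append(model_after(x).softmax(dim=1) - model_before(x).softmax(dim=1))
    return torch.cat(deltas)  # shape: (num_probe_samples, num_classes)

# A toy reconstruction head: trained (on shadow models the adversary builds with the same
# learning/unlearning setup) to map a delta vector back to a flattened image.
num_classes, image_pixels = 10, 28 * 28  # illustrative sizes only
attack_decoder = nn.Sequential(
    nn.Linear(num_classes, 256), nn.ReLU(),
    nn.Linear(256, image_pixels), nn.Sigmoid(),
)
# Training loop (not shown): minimize nn.MSELoss()(attack_decoder(delta), erased_image)
# over shadow-model pairs where the erased image is known to the adversary.
```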

3.2 Implementation of Information Bottleneck

Information Bottleneck (IB) was first introduced in [13] to distort information of data inputs while maintaining the information of data targets in the representation. Traditional IB [13, 28] primarily utilizes the Blahut-Arimoto algorithm [29] to optimize the IB objective. The goal of this approach is to identify a compressed encoding distribution that is both adequate for the intended ML application and maximally distorts information from the original data. The IB objective is formulated as below,

$\mathcal{L}_{\mathcal{IB}} = \beta I(Z;X) - I(Z;Y).$   (1)

Here, $I(Z;X)$ represents the mutual information between the encoded representation $Z$ and the inputs $X$, and $I(Z;Y)$ denotes the mutual information between $Z$ and the outputs $Y$. The parameter $\beta$ serves as a Lagrange multiplier, regulating the distortion ratio of $X$ in the model.

An IB model splits the training process into two parts. Firstly, it employs a representer $\theta^r$ to compress the information from inputs $X$ to a compact representation $Z$. Secondly, it employs an approximator $\theta^a$ to identify the target values based on the representation $Z$. The representation loss function of the representer is $\mathcal{L}_{rep} = \beta I_{\theta^r}(Z;X)$ and the corresponding approximation loss function of the approximator is $\mathcal{L}_{app} = -I_{\theta^a}(Z;Y)$. The training process follows the Markov chain $Y \to X \to Z \to \hat{Y}$, as depicted in the upper part of Figure 3. Prior studies, such as [30, 31, 32], have expanded on the mutual information terms and introduced two variational distributions, $q(Z)$ and $q(Y|Z)$. These distributions help in deriving an upper bound for the IB optimization loss function. The optimization of the learning process is then achieved by minimizing this upper bound. We consider a distribution $q(Z)$ where the elements in the space $Z$ are mutually independent, expressed as $q(Z) = \prod_{j} q_{j}(z_{j})$. Finally, the IB loss objective in Eq. 1 can be expanded in a per-sample way as

$\mathcal{L} = \mathcal{L}_{app} + \mathcal{L}_{rep} = \frac{1}{N}\sum_{i=1}^{N} \Big( \mathbb{E}_{z \sim p_{\theta^r}(Z|x^{i})}\big[-\log p_{\theta^a}(y^{i}|z)\big] + \beta\, \text{KL}\big[\, p_{\theta^r}(Z|x^{i}) \,\big\|\, \prod_{j}^{|Z|} q^{i}_{j}(z^{i}_{j}) \,\big] \Big).$   (2)

It’s important to note that the assumed prior $q(Z)$ represents the ideal distribution, capturing minimal information about $X$ while retaining sufficient information for the task target $Y$. A higher value of $\beta$ leads to increased distortion, making $p(Z|X)$ more closely approximate the assumed general prior. On the contrary, a smaller $\beta$ means the model pays less attention to compressing the information of $X$, causing the representation to contain more information about individual $X$.
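For concreteness, the following is a minimal PyTorch sketch of this variational IB objective (our own illustration rather than the paper’s released code), assuming a Gaussian posterior $p_{\theta^r}(Z|x)$, a standard normal prior $q(Z) = \mathcal{N}(0, I)$ as is common in [30, 31], and arbitrary layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBModel(nn.Module):
    """Representer theta^r outputs a Gaussian posterior over Z; approximator theta^a predicts Y from Z."""
    def __init__(self, in_dim=784, z_dim=64, num_classes=10):
        super().__init__()
        self.representer = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 2 * z_dim),  # outputs the mean and log-variance of p(Z|x)
        )
        self.approximator = nn.Linear(z_dim, num_classes)

    def forward(self, x):
        mu, logvar = self.representer(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.approximator(z), mu, logvar

def ib_loss(model, x, y, beta=1e-3):
    """Per-batch estimate of Eq. 2: approximation loss plus beta times KL[p(Z|x) || q(Z)] with q(Z)=N(0,I)."""
    logits, mu, logvar = model(x)
    app = F.cross_entropy(logits, y)                                         # surrogate for -I(Z;Y)
    rep = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()  # closed-form KL to N(0,I)
    return app + beta * rep
```

A training step would then simply minimize ib_loss(model, x, y) over mini-batches of $D$ with a standard optimizer.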

4 Compressive Representation Forgetting Unlearning

4.1 Problem Definition

We define the problem of machine unlearning in the context of trained IB models as follows. Upon receiving an unlearning request, the full training dataset $D = (X, Y)$ will be partitioned into two distinct subsets: a smaller erased dataset $D_e = (X_e, Y_e)$ and a larger remaining dataset $D_r = (X_r, Y_r)$. These subsets are mutually exclusive, fulfilling $D = D_r \cup D_e$ and $D_r \cap D_e = \emptyset$. Then, the IB algorithm ($\mathcal{IB}$) is used to train the model $\mathcal{M}(\theta^r, \theta^a)$, consisting of a representer $\theta^r$ and an approximator $\theta^a$. This model learns a representation $Z$ by minimizing $I(X;Z)$ and maximizing $I(Y;Z)$.

Typically, the machine unlearning process is initiated when a user requests the removal of their specific data samples, $D_e$, from the trained model $\mathcal{M}(\theta^r, \theta^a)$. In response, the server employs an unlearning algorithm $\mathcal{U}$ to erase the contribution of $D_e$ while retaining the learned knowledge on the remaining dataset $D_r$. The unlearned model $\mathcal{M}_u(\theta^r, \theta^a)$ hopes to find the unlearned representation $p(Z|X_r)$, which is expected to be equal to the representation learned by retraining the model based on the remaining dataset. The problem of unlearning an IB model can be described as

Problem Statement.

Assume a gold-standard unlearned representation posterior $p(Z|X_r)$ for an IB model, retrained with $\mathcal{M}_u = \mathcal{IB}(D_r)$ based on the remaining dataset. An unlearning algorithm $\mathcal{U}$ is designed to unlearn a representation posterior $p(Z|X_{-X_e})$ of the model $\mathcal{M}_u = \mathcal{U}(D_e, \mathcal{IB}(D))$, unlearning based on the erased dataset $D_e = (X_e, Y_e)$. Then, the erased-dataset-based unlearned posterior is hoped to match the remaining-dataset-based retrained posterior, i.e.,

$p(Z|X_r) = p(Z|X_{-X_e}).$   (3)

In this problem statement, the retrained posterior $p(Z|X_r)$ can be obtained by retraining the IB model based on the remaining dataset as

$\min \; I(X_r; Z), \quad \text{s.t.} \;\; I(X_r; Y_r) - I(Z; Y_r) = I(X_r; Y_r \,|\, Z) = 0.$   (4)

Eq. 3 means that an unlearned IB-trained model is still expected to keep the properties of a retrained model, squeezing as much information of $X_r$ as possible from the representation $Z$ while maintaining good utility in approximating $Y_r$. Retraining the model from scratch remains a viable approach under this definition of unlearning. However, as previously discussed, retraining from scratch incurs significant storage and computational costs and offers no protection against reconstruction attacks on unlearning. Therefore, we propose the CRFU approach to solve this problem in the following sections.

Figure 3: IB learning process (upper half) and CRFU unlearning process (lower half) with a fixed learned IB model (a trained and fixed upper half).

4.2 CRFU Method

4.2.1 Overview of CRFU

We begin by briefly introducing the core concept of the proposed CRFU, which is illustrated in Figure 3. CRFU is tailored to unlearn models that were trained using the IB method (shown in the upper half of Figure 3). An IB method mainly learns a representation $Z$ that compresses the information of the inputs while retaining sufficient information about the targets. CRFU, shown in the lower half of Figure 3, aims to erase the influence of $D_e = (X_e, Y_e)$ from $Z$. To achieve this goal, we minimize the mutual information $I(X_e;Z)$ and $I(Y_e;Z)$; however, directly minimizing them could cause the representation $Z$ to catastrophically forget everything, decreasing model utility. To mitigate the degradation of model utility, we introduce a constraint ensuring that the unlearned representation $Z$, derived from the distribution $p(Z|X_{-X_e})$, remains close to the representation $Z$ obtained from the original, fixed distribution $p(Z|X)$. We treat the predicted approximation $\hat{Y}$ in the same way. Finally, combining the mutual information minimization and the constraint, we formalize the losses $\mathcal{L}^u_{rep}$ and $\mathcal{L}^u_{app}$ to unlearn the trained IB model.

4.2.2 Theoretical Exact Compressive Representation Forgetting Unlearning

In an IB model, the objective is to learn a representation $Z$ by minimizing the mutual information $I(X;Z)$, while simultaneously maximizing $I(Y;Z)$. In our CRFU approach, we specifically focus on minimizing $I(X_e;Z)$ for the inputs $X_e$ designated for erasure. This is aimed at effectively eliminating the information of $X_e$ from the previously learned representation $Z$. Similarly, we work on minimizing $I(Y_e;Z)$, the mutual information between the targets $Y_e$ and the representation $Z$, thereby removing the information of $Y_e$ from $Z$. This dual minimization ensures a comprehensive unlearning of both input and label information from the representation.

The ideal unlearned representation should contain no information about $(X_e, Y_e)$, optimally satisfying the conditions $I(X_e;Z) = 0$ and $I(Y_e;Z) = 0$, indicating zero mutual information between the erased data and the representation $Z$. However, directly minimizing $I(X_e;Z)$ and $I(Y_e;Z)$ poses a significant risk: it could lead the representation $Z$ to inadvertently “forget” everything, including the valuable knowledge it had previously learned. Therefore, we must ensure that the unlearned representation $Z$ and approximation $\hat{Y}$ still remember the knowledge learned in the former training.

To maintain the learned knowledge, we propose a solution that minimizes the Kullback–Leibler divergence (KLD) [33] between the unlearned and originally learned representations. This involves minimizing both $\text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})]$ and $\text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)]$, ensuring that the unlearned representations remain closely aligned with the original ones. During unlearning, $p(Z|X)$ and $p(\hat{Y}|Z)$ are calculated using the trained IB model as a temporary reference model. Combining the above solutions, we formalize the unlearned representation loss as

$\mathcal{L}_{rep}^{u} = I(X_e;Z) + \text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})],$   (5)

and the unlearned approximation loss as

$\mathcal{L}_{app}^{u} = I(Y_e;Z) + \text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)].$   (6)

Integrating these two loss functions, we formulate the optimization of CRFU as the proposition outlined below.

Proposition 1.

Define the CRFU loss function as

$\mathcal{L}^{u} = \beta \cdot \mathcal{L}_{rep}^{u} + \mathcal{L}_{app}^{u} = \beta \cdot \big( I(X_e;Z) + \text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})] \big) + I(Y_e;Z) + \text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)].$   (7)

Then, minimizing the loss in Eq. 7 to unlearn the erased dataset $D_e$ from an IB model trained based on $D$ is equivalent to retraining an IB model by minimizing $\mathcal{L}_r = \beta I(X_r;Z) - I(Y_r;Z)$ based on $D_r$.

Since CRFU unlearns an IB model trained through Eq. 2, we can combine the original IB loss and the CRFU loss to prove that minimizing Eq. 7 based on a trained IB model is equivalent to minimizing the loss of retraining an IB model on the remaining dataset. We first prove the representation loss. Given that $X = X_r \cup X_e$ and $X_r \cap X_e = \emptyset$, and considering that the data are IID, along with the conditional independence of $X_r$ and $X_e$ given $Z$, it follows that $I(X;Z) = I(X_r;Z) + I(X_e;Z)$. Since $I(X_e;Z) \geq 0$ and $\text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})] \geq 0$, we can expand the original and unlearned representation loss $\mathcal{L}_{rep} + \mathcal{L}_{rep}^{u}$ as

$\mathcal{L}_{rep} + \mathcal{L}_{rep}^{u} = I(X;Z) + I(X_e;Z) + \text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})]$
$= I(X_r;Z) + 2 I(X_e;Z) + \text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})]$
$\geq I(X_r;Z).$   (8)

It is clear that the sum of the representation loss $\mathcal{L}_{rep}$ and the unlearned representation loss $\mathcal{L}_{rep}^{u}$ serves as an upper bound for the mutual information $I(X_r;Z)$. Minimizing this upper bound during both original training and unlearning is equivalent to minimizing $I(X_r;Z)$ itself.

Similarly, for the original and unlearned approximation loss function $\mathcal{L}_{app} + \mathcal{L}_{app}^{u}$, we can obtain the expanded approximation as

$\mathcal{L}_{app} + \mathcal{L}_{app}^{u} = -I(Y;Z) + I(Y_e;Z) + \text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)]$
$= -(I(Y;Z) - I(Y_e;Z)) + \text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)]$
$= -I(Y_r;Z) + \text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)]$
$\geq -I(Y_r;Z).$   (9)

Since $Y = Y_r \cup Y_e$, $Y_r \cap Y_e = \emptyset$, and the data are IID, it follows that $I(Y_r;Y_e) = 0$ and $H(Y) = H(Y_r, Y_e) = H(Y_r) + H(Y_e) - I(Y_r;Y_e) = H(Y_r) + H(Y_e)$. The conditional entropy decomposes in the same way, giving $H(Y|Z) = H(Y_r|Z) + H(Y_e|Z)$. Since $I(Y_r;Z) = H(Y_r) - H(Y_r|Z)$, Eq. 9 holds. The sum of the original and unlearned approximation losses in Eq. 9 is thus an upper bound of $-I(Y_r;Z)$ in retraining. Therefore, minimizing the CRFU loss function based on a trained IB model is equivalent to minimizing the loss function of retraining an IB model based on the remaining dataset.
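To illustrate how Eq. 7 can be turned into a training objective, the sketch below is our own schematic rather than the paper’s released implementation: it assumes the Gaussian VIB parameterization and the (logits, mu, logvar) interface of the earlier sketch, replaces the mutual-information terms with simple surrogates of our choosing (a KL to the prior $q(Z) = \mathcal{N}(0, I)$ for $I(X_e;Z)$ and a KL to the uniform prediction for $I(Y_e;Z)$), and leaves the choice of anchor data for the remembering constraints as an assumption.

```python
import torch
import torch.nn.functional as F

def kl_diag_gauss(mu_p, logvar_p, mu_q, logvar_q):
    """KL[N(mu_p, var_p) || N(mu_q, var_q)] for diagonal Gaussians, averaged over the batch."""
    return 0.5 * (logvar_q - logvar_p
                  + (logvar_p.exp() + (mu_p - mu_q).pow(2)) / logvar_q.exp()
                  - 1.0).sum(dim=1).mean()

def crfu_loss(model, ref_model, x_e, x_anchor, beta=1e-3):
    """Schematic assembly of Eq. 7 for a Gaussian VIB model returning (logits, mu, logvar)."""
    # Forgetting terms: push p(Z|X_e) toward the prior (surrogate for I(X_e;Z)) and
    # push predictions on the erased inputs toward uniform (surrogate for I(Y_e;Z)).
    logits_e, mu_e, logvar_e = model(x_e)
    forget_rep = 0.5 * (mu_e.pow(2) + logvar_e.exp() - 1.0 - logvar_e).sum(dim=1).mean()
    uniform = torch.full_like(logits_e, 1.0 / logits_e.size(1))
    forget_app = F.kl_div(F.log_softmax(logits_e, dim=1), uniform, reduction="batchmean")

    # Remembering constraints: keep the current representer/approximator close to the frozen
    # reference model (the originally trained IB model) on the anchor data.
    with torch.no_grad():
        logits_ref, mu_ref, logvar_ref = ref_model(x_anchor)
    logits_cur, mu_cur, logvar_cur = model(x_anchor)
    remember_rep = kl_diag_gauss(mu_ref, logvar_ref, mu_cur, logvar_cur)
    remember_app = F.kl_div(F.log_softmax(logits_cur, dim=1),
                            F.softmax(logits_ref, dim=1), reduction="batchmean")

    return beta * (forget_rep + remember_rep) + forget_app + remember_app
```

The unlearning rate $\beta_u$ from Table I, which balances forgetting against remembering, could be introduced as an additional weight on the forgetting terms; its exact placement is an assumption here and is therefore omitted from the sketch.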

4.2.3 Variational Optimization Method for CRFU

Optimizing the CRFU loss function Eq. 7 in a deep learning scenario with a huge amount of training data is challenging. To address this issue, we present a variational method that makes the CRFU loss function tractable for deep learning. The unlearned representation loss can be expanded as

\begin{aligned}
\mathcal{L}_{rep}^{u} &= I(X_e;Z) + \text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})] \\
&= \text{KL}[p_{\theta^{r}}(Z|X_e) \,\|\, p_{\theta^{r}}(Z)] + \text{KL}[p_{\theta_{fix}^{r}}(Z|X) \,\|\, p_{\theta^{r}}(Z|X_{-X_e})].
\end{aligned}
\quad (10)

Generally, minimizing this loss is intractable because computing the KL term requires the marginal distribution $p_{\theta^{r}}(Z)=\int dx\, p(z|x)p(x)$, which is not easily obtainable. To address this issue, we introduce $q(Z)$ as a variational approximation to this marginal, following the variational IB learning in [30, 31]. Since $\text{KL}[p(Z) \,\|\, q(Z)] \geq 0 \Longrightarrow \int dz\, p(z)\log p(z) \geq \int dz\, p(z)\log q(z)$, we obtain the following upper bound of Eq. 10:

\begin{aligned}
\mathcal{L}_{rep}^{u} &= I(X_e;Z) + \text{KL}[p(Z|X) \,\|\, p(Z|X_{-X_e})] \\
&\leq \underbrace{\text{KL}[p_{\theta^{r}}(Z|X_e) \,\|\, q(Z)]}_{\text{forgetting the erased inputs from the representation}} \\
&\quad + \underbrace{\text{KL}[p_{\theta_{fix}^{r}}(Z|X) \,\|\, p_{\theta^{r}}(Z|X_{-X_e})]}_{\text{remembering the originally learned representation}}.
\end{aligned}
\quad (11)

This unlearned representation loss can be read as forgetting the impact of the erased inputs (first term) while remembering the knowledge of the original full training inputs (second term). The proof of the correctness of Eq. 11 is similar to the proof of Eq. (14) in [30], so we omit the details here.
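
To make the two terms of Eq. 11 concrete, the following is a minimal PyTorch sketch of the unlearned representation loss for diagonal-Gaussian posteriors, assuming the representer outputs a mean and a log-variance per sample; the function name, argument names, and tensor shapes are illustrative rather than the exact implementation.

```python
import torch
from torch.distributions import Normal, kl_divergence

def representation_unlearning_loss(mu_e, logvar_e, mu_fix, logvar_fix, mu_cur, logvar_cur):
    """Eq. 11 sketch: a forgetting KL to the prior q(Z)=N(0, I) on erased inputs,
    plus a remembering KL that keeps the current representer close to the fixed one."""
    # Forgetting term: KL[ p_theta_r(Z|X_e) || q(Z) ], summed over latent dims, averaged over the batch.
    p_erased = Normal(mu_e, (0.5 * logvar_e).exp())
    prior = Normal(torch.zeros_like(mu_e), torch.ones_like(mu_e))
    forget = kl_divergence(p_erased, prior).sum(dim=1).mean()

    # Remembering term: KL[ p_fix(Z|x) || p_theta_r(Z|x) ], with the fixed model as reference.
    p_fix = Normal(mu_fix, (0.5 * logvar_fix).exp())
    p_cur = Normal(mu_cur, (0.5 * logvar_cur).exp())
    remember = kl_divergence(p_fix, p_cur).sum(dim=1).mean()
    return forget, remember
```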

Following the solutions to $\mathcal{L}_{app}$ in [30, 31], the unlearned approximation loss function can be optimized in a similar way as

\begin{aligned}
\mathcal{L}_{app}^{u} &= I(Y_e;Z) + \text{KL}[p(\hat{Y}|Z) \,\|\, p(\hat{Y}_{-(X_e,Y_e)}|Z)] \\
&\simeq \underbrace{\int dz\, p(Z|X_{-X_e}) \log p_{\theta^{a}}(Y_e|Z)}_{\text{forgetting the erased targets from the approximation}} \\
&\quad + \underbrace{\text{KL}[p_{\theta_{fix}^{a}}(\hat{Y}|Z) \,\|\, p_{\theta^{a}}(\hat{Y}_{-(X_e,Y_e)}|Z)]}_{\text{remembering the originally learned approximation}}.
\end{aligned}
\quad (12)

The computable optimization loss Eq. 12 balances completely unlearning the information of the erased labels $Y_e$ (first term) against not entirely forgetting the originally learned information of $Y$ in the representation $Z$ (second term). Combining Eqs. 11 and 12 and optimizing them together yields the CRFU loss Eq. 7 proposed in Proposition 1. When minimizing this loss, note that the remembering terms in Eqs. 11 and 12 are defined with respect to the original full training dataset; however, storing the entire original dataset and training on it again is impractical. A simple workaround is to fix the trained model (i.e., the model before unlearning) as a temporary reference model and evaluate these remembering terms against it using only the erased dataset.
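
As a hedged illustration of Eq. 12 and of the fixed temporary model described above, the sketch below freezes a copy of the trained model and computes the forgetting and remembering terms on a minibatch of erased samples; the helper names, the softmax-output form of the approximator, and the reduction choices are assumptions rather than the exact implementation.

```python
import copy
import torch
import torch.nn.functional as F

def freeze_copy(model):
    """Fixed temporary model used as the 'remembering' reference during unlearning."""
    fixed = copy.deepcopy(model)
    for p in fixed.parameters():
        p.requires_grad_(False)
    return fixed.eval()

def approximation_unlearning_loss(logits_cur, logits_fix, y_erased):
    # Forgetting term: log-likelihood of the erased labels under the current approximator;
    # minimizing it pushes the unlearned model away from predicting Y_e.
    forget = F.log_softmax(logits_cur, dim=1).gather(1, y_erased.unsqueeze(1)).mean()

    # Remembering term: KL[ p_fix(Y_hat|z) || p_cur(Y_hat|z) ] keeps predictions close
    # to those of the original model on the same representations.
    remember = F.kl_div(F.log_softmax(logits_cur, dim=1),
                        F.softmax(logits_fix, dim=1),
                        reduction="batchmean")
    return forget, remember
```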

4.2.4 An Unlearning Rate for CRFU

To make the unlearning method adaptive to different tasks, we introduce an unlearning rate $\beta_u$, which regulates the extent of unlearning in both the representation and approximation unlearning processes. Specifically, we place $\beta_u$ before the forgetting terms in Eqs. 11 and 12 to balance forgetting the information of the erased samples against remembering the previously learned representation. Adjusting $\beta_u$ also affects the unlearning speed. The final objective can be written as

\begin{aligned}
\mathcal{L}^{u} &= \beta \cdot \mathcal{L}_{rep}^{u}(\beta_u) + \mathcal{L}_{app}^{u}(\beta_u) \\
&\simeq \beta \cdot \Bigl(\beta_u \cdot \text{KL}[p_{\theta^{r}}(Z|X_e) \,\|\, q(Z)] + \text{KL}[p_{\theta_{fix}^{r}}(Z|X) \,\|\, p_{\theta^{r}}(Z|X_{-X_e})]\Bigr) \\
&\quad + \beta_u \cdot \int dz\, p(Z|X_{-X_e}) \log p_{\theta^{a}}(Y_e|Z) \\
&\quad + \text{KL}[p_{\theta_{fix}^{a}}(\hat{Y}|Z) \,\|\, p_{\theta^{a}}(\hat{Y}_{-(X_e,Y_e)}|Z)].
\end{aligned}
\quad (13)

We present the pseudocode of CRFU in Algorithm 1. Before executing representation forgetting unlearning, we prepare a trained IB model $\mathcal{M}(\theta^{r},\theta^{a})$, the erased dataset $D_e=(X_e,Y_e)$, and the number of training epochs $E$. We then fix the original optimal model as a temporary reference model, $\mathcal{M}_{fix}(\theta^{r}_{fix},\theta^{a}_{fix}) \leftarrow \mathcal{M}(\theta^{r},\theta^{a})$, for the later unlearned representation and approximation loss calculations (Line 1 of Algorithm 1). During unlearning training (Lines 2 to 7), we draw a minibatch of $m$ samples $\{(x^{i},y^{i})\}_{i=1}^{m}$ from $D_e$ and use the representer $\theta^{r}$ to generate the corresponding representations $z^{i}$. The unlearning loss is then calculated according to Eqs. 11 and 12. Finally, we update the parameters of the representer and approximator $(\theta^{r},\theta^{a})$ following Eq. 13 (Line 7).

Input: The trained model $\mathcal{M}(\theta^{r},\theta^{a})$, the erased dataset $(X_e,Y_e)$, and training epochs $E$
Output: Unlearned model $\mathcal{M}_u(\theta^{r}_{u},\theta^{a}_{u})$
1: Establish the fixed temporary model: $\mathcal{M}_{fix}(\theta^{r}_{fix},\theta^{a}_{fix}) \leftarrow \mathcal{M}(\theta^{r},\theta^{a})$
2: for $E$ epochs do
3:   Draw a minibatch of $m$ samples $\{(x^{i},y^{i})\}_{i=1}^{m}$ from the erased dataset $D_e=(X_e,Y_e)$;
4:   Generate $z^{i} \sim p_{\theta^{r}}(\cdot|x^{i})$;
5:   Compute the representation unlearning loss (Eq. 11) on a per-sample basis:
     $\mathcal{L}_{rep}^{u}(\beta_u) = \frac{\beta_u}{m}\sum_{i=1}^{m}\text{KL}[p_{\theta^{r}}(Z|x^{i}) \,\|\, \prod_{j}^{|Z|} q_{j}^{i}(z_{j}^{i})] + \frac{1}{m}\sum_{i=1}^{m}\text{KL}[p_{\theta_{fix}^{r}}(Z|x^{i}) \,\|\, p_{\theta^{r}}(Z|x^{i})]$;
6:   Compute the approximation unlearning loss (Eq. 12) on a per-sample basis:
     $\mathcal{L}_{app}^{u}(\beta_u) = \beta_u \cdot \frac{1}{m}\sum_{i=1}^{m}\log p_{\theta^{a}}(y^{i}|z^{i}) + \frac{1}{m}\sum_{i=1}^{m}\text{KL}[p_{\theta_{fix}^{a}}(\hat{y}^{i}|z^{i}) \,\|\, p_{\theta^{a}}(\hat{y}^{i}_{-(X_e,Y_e)}|z^{i})]$;
7:   Update the model with the gradients of the integrated loss (Eq. 13): $(\theta^{r},\theta^{a}) \leftarrow (\theta^{r},\theta^{a}) - \eta \nabla_{(\theta^{r},\theta^{a})}\bigl(\beta\cdot\mathcal{L}_{rep}^{u}(\beta_u) + \mathcal{L}_{app}^{u}(\beta_u)\bigr)$;
8: end for
Return $\mathcal{M}_u(\theta^{r},\theta^{a})$;
Algorithm 1: Compressive Representation Forgetting Unlearning
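
For readers who prefer code, the following is a minimal PyTorch-style sketch of Algorithm 1 that reuses the helpers `freeze_copy`, `representation_unlearning_loss`, and `approximation_unlearning_loss` sketched earlier; the two-headed representer interface, the optimizer choice, and the hyperparameter names are assumptions, not the paper's exact implementation.

```python
import torch

def crfu_unlearn(representer, approximator, erased_loader, epochs,
                 beta=0.001, beta_u=0.1, eta=1e-3):
    """Sketch of Algorithm 1: Compressive Representation Forgetting Unlearning."""
    # Line 1: fix the trained model as a temporary reference.
    rep_fix, app_fix = freeze_copy(representer), freeze_copy(approximator)
    params = list(representer.parameters()) + list(approximator.parameters())
    opt = torch.optim.Adam(params, lr=eta)

    for _ in range(epochs):                                   # Line 2
        for x_e, y_e in erased_loader:                        # Line 3: minibatch from D_e
            mu, logvar = representer(x_e)                     # assumed (mean, log-variance) heads
            z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # Line 4: reparameterized z

            with torch.no_grad():
                mu_fix, logvar_fix = rep_fix(x_e)
                logits_fix = app_fix(z)

            # Line 5: representation unlearning loss (Eq. 11)
            forget_rep, remember_rep = representation_unlearning_loss(
                mu, logvar, mu_fix, logvar_fix, mu, logvar)
            loss_rep = beta_u * forget_rep + remember_rep

            # Line 6: approximation unlearning loss (Eq. 12)
            logits_cur = approximator(z)
            forget_app, remember_app = approximation_unlearning_loss(
                logits_cur, logits_fix, y_e)
            loss_app = beta_u * forget_app + remember_app

            # Line 7: integrated update (Eq. 13)
            loss = beta * loss_rep + loss_app
            opt.zero_grad()
            loss.backward()
            opt.step()
    return representer, approximator
```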

5 Theoretical Analysis of Privacy Leakage Defense of CRFU

In this section, we give a theoretical explanation of why the proposed CRFU can effectively defend against privacy leakage attacks on unlearning in a black-box ML setting.

5.1 Reasons behind Effective Privacy Leakage Attacks

Before discussing why the proposed CRFU is effective against privacy leakage attacks, we first explain why existing privacy inference attacks can effectively recover users' privacy. Similar to most model inference attacks, the privacy inference attack on unlearning aims to recover training samples by analyzing a black-box ML model's outputs. These attacks focus on the differences between the outputs of the original model and the unlearned model to reconstruct the privacy of the data deleted during unlearning, as discussed in [6, 9, 8]. Their effectiveness stems from the requirement that the unlearning mechanism must ensure the unlearned model forgets the specified samples. Effective unlearning, which erases the information of the specified data from the model, therefore produces different outputs for the same inputs before and after unlearning. Attackers can exploit the private information of the erased data by comparing these outputs: they mimic the changes between the original and unlearned models using an IID auxiliary dataset and thereby train a model to infer the privacy of the erased data.

5.2 How CRFU Defends against Privacy Leakage Attacks

CRFU can effectively defend against such privacy leakage attacks because both learning and unlearning are based on the IB framework, which discards as much information about the inputs $X$ as possible from the bottleneck $Z$, leaving little information in the model output for adversaries to exploit. Specifically, CRFU erases the inputs $X_e$ by further minimizing $I(X_e;Z)$ on top of the already minimized $I(X;Z)$. Ideally, we denote the information remaining after unlearning as $I(X_r;Z)$. Therefore, the maximum information that an adversary can infer from the different representations $Z$ before and after unlearning is $I(X;Z)-I(X_r;Z)=I(X_e;Z)$. During the learning process, $I(X_e;Z)$ is minimized under the constraint of maintaining enough information for the targets, i.e., $I(X_e;Z) \geq I(Y_e;Z)$. Therefore, the ideal protection of CRFU is $I(X_e;Z)=I(Y_e;Z)$: attackers can infer only information about the labels, but they cannot reconstruct the erased samples.

It can also be explained from the perspective of the Markov chain underlying both the model learning and unlearning processes. In IB model training, the Markov chain is $Y \to X \to Z \to \hat{Y}$. For CRFU training, the corresponding chain is $Y_e \to X_e \to Z \to \hat{Y}_{-(X_e,Y_e)}$. This ensures that attackers cannot infer more information from $\hat{Y}$ than from $Z$ in a black-box ML scenario because $\hat{Y}$ is derived from $Z$, and by the data processing inequality of a Markov chain, information lost at one layer cannot be regained in subsequent layers. This process can be described as

\begin{cases}
I(X;Y) \geq I(Z;Y) \geq I(\hat{Y};Y) \\
I(X_e;Y_e) \geq I(Z;Y_e) \geq I(\hat{Y}_{-(X_e,Y_e)};Y_e)
\end{cases}
\quad (14)

Therefore, for an IB model unlearned through CRFU, the upper bound of a privacy inference attack based on the output differences is the reconstruction capability on the corresponding differences in the representation $Z$, because the outputs of the original trained model ($\hat{Y}$) and the unlearned model ($\hat{Y}_{-(X_e,Y_e)}$) are derived from the representations before and after unlearning, respectively. The information the adversary can infer is therefore no more than $I(Y_e;Z)$.

5.3 β𝛽\betaitalic_β-Compression Defense of CRFU

We illustrate the representation term using the Gaussian case, which is widely employed in many studies [34, 30]. Let the assumed prior $q(Z)=\mathcal{N}(Z;0,\mathbf{I})$ and the posterior $p(Z|X)=\mathcal{N}(Z;\mu^{i},\sigma^{i})$ be Gaussian. Let $J$ be the dimensionality of $Z$, let $\mu^{i}$ and $\sigma^{i}$ be the mean and standard deviation output by the representer $\theta^{r}$ at datapoint $x^{i}$, and let $\mu_{j}^{i}$ and $\sigma_{j}^{i}$ denote the $j$-th elements of these vectors. Then we have

\text{KL}[p_{\theta^{r}}(Z|x^{i}) \,\|\, q(Z)] = \frac{1}{2}\sum_{j=1}^{J}\Bigl((\mu_{j}^{i})^{2} + (\sigma_{j}^{i})^{2} - \log\bigl((\sigma_{j}^{i})^{2}\bigr) - 1\Bigr)
\quad (15)

The KL divergence in Eq. 15 is one of the optimization terms of Eqs. 2 and 13, and the mean $\mu^{i}$ and standard deviation $\sigma^{i}$ are determined by $x^{i}$ and the representer parameters $\theta^{r}$. The representer network $\theta^{r}$ transforms the input $x^{i}$ into the parameters of a $J$-dimensional Gaussian distribution: for each input $x^{i}$, it outputs a mean vector $\mu_{J}^{i}=\{\mu_{1}^{i},\mu_{2}^{i},\ldots,\mu_{J}^{i}\}$ and a log-variance vector $\log((\sigma_{J}^{i})^{2})=\{\log((\sigma_{1}^{i})^{2}),\log((\sigma_{2}^{i})^{2}),\ldots,\log((\sigma_{J}^{i})^{2})\}$. These parameters describe the high-dimensional Gaussian distribution for the datapoint $x^{i}$ as

\mu_{J}^{i},\ \log\bigl((\sigma_{J}^{i})^{2}\bigr) = \theta^{r}(x^{i}).
\quad (16)

Then, we can calculate the representation $Z_{J}^{i}$ using the reparameterization trick, sampling a latent vector from this distribution:

Z_{J}^{i} = \mu_{J}^{i} + \sigma_{J}^{i} \cdot \epsilon,
\quad (17)

where $\epsilon \sim \mathcal{N}(0,\mathbf{I})$. When Eq. 15 is minimized, for example when $\text{KL}[p_{\theta^{r}}(Z|x^{i}) \,\|\, q(Z)]=0$, the learned representation posterior $p(Z|x^{i})$ becomes identical to the assumed prior $\mathcal{N}(Z;0,\mathbf{I})$. In this scenario, the representation $Z$ loses all information about the sample $x^{i}\in X$ and $x^{i}\in X_e$, limiting attackers to inferring only the general prior $\mathcal{N}(Z;0,\mathbf{I})$. Greater distortion thus offers a stronger defense against privacy inference attacks.
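
To illustrate Eqs. 15 to 17, the snippet below gives the closed-form KL divergence to the standard-normal prior and the reparameterized sampling of $Z$; the tensor shapes and the toy values in the usage example are illustrative assumptions.

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """Eq. 15: KL[ N(mu, sigma^2) || N(0, I) ] summed over the J latent dimensions."""
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=1)

def reparameterize(mu, logvar):
    """Eqs. 16-17: sample Z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = torch.randn_like(mu)
    return mu + (0.5 * logvar).exp() * eps

# Toy usage: a batch of 4 inputs with a J=8 dimensional bottleneck.
mu, logvar = torch.randn(4, 8), torch.zeros(4, 8)
z = reparameterize(mu, logvar)
print(kl_to_standard_normal(mu, logvar))  # per-sample distortion; zero only when mu=0, sigma=1
```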

However, to ensure model utility, $Z$ must contain sufficient information about $Y$. A distortion rate $\beta$ is introduced in Eqs. 2 and 13 to balance information compression against model utility. We define the protection capability of CRFU as follows:

Definition 1.

CRFU achieves the $\beta$-Compression Defense against privacy leakage attacks on unlearning if the original IB model is trained using Eq. 2 and CRFU unlearns the trained IB model using Eq. 13, where Eqs. 2 and 13 are optimized using the same compressive parameter $\beta$.

Since the compression ratio is controlled by $\beta$ in Eqs. 2 and 13, a larger $\beta$ leads to greater information distortion of $X$ and a weaker privacy inference effect from $Z$, while concurrently diminishing the model utility with respect to $Y$. To balance prediction utility against defense capability, selecting a suitable $\beta$ is crucial. Our upcoming experiments further assess the defense effectiveness against privacy leakage attacks at varying levels of $\beta$.

TABLE II: Reconstruction quality under different $\beta$
(Columns: $\beta=0$, i.e., the normal unlearned models HBU and VBU without compression; $\beta=0.001$; $\beta=0.01$; $\beta=0.1$; $\beta=1$. The original table additionally shows the erased images and their reconstructions under each $\beta$; these image panels are omitted here.)

MNIST
  Reconstruction MSE      231.31    252.23    312.87    478.76    668.07
  Inference AUC           0.674     0.653     0.573     0.526     0.515
  KLD to Prior q(Z)       12.39     5.00      1.45      0.37      0.14
  Accuracy                97.45%    97.32%    96.99%    96.45%    95.75%

Fashion MNIST
  Reconstruction MSE      220.31    240.41    284.79    386.08    561.73
  Inference AUC           0.656     0.638     0.579     0.519     0.515
  KLD to Prior q(Z)       12.23     5.77      1.78      0.33      0.13
  Accuracy                87.95%    87.72%    87.29%    87.15%    86.15%

CIFAR10
  Reconstruction MSE      1174.33   1180.84   1188.57   1267.78   1550.24
  Inference AUC           0.821     0.810     0.769     0.669     0.571
  KLD to Prior q(Z)       9.09      3.75      1.76      0.53      0.11
  Accuracy                84.75%    84.75%    84.00%    83.96%    82.95%

6 Performance Evaluation

6.1 Experiment Setup

Datasets. We assess the effectiveness of the proposed Compressive Representation Forgetting Unlearning (CRFU) method on four benchmark image-classification datasets: MNIST, Fashion-MNIST [35], CIFAR10 [36], and STL-10 [37], which cover a range of object categories and learning complexities.

Models. In our experiments, we utilize two model architectures of different sizes, a 5-layer multi-layer perceptron (MLP) with ReLU activations and ResNet-18, to construct the IB models. Specifically, for MNIST and Fashion-MNIST we employ two 5-layer MLPs, one as the representer and one as the approximator. For CIFAR10 and STL-10, we employ a ResNet-18 as the representer and a 5-layer MLP as the approximator.

For experimental simplicity, we use a consistent minibatch size of $m=20$. We set the learning rate to $\eta=0.001$ on MNIST and Fashion MNIST and to $\eta=0.0005$ on CIFAR10 and STL-10. In the defense evaluation, we set the unlearning rate $\beta_u=0.1$. In the evaluation of unlearning utility, we set the distortion rate $\beta=0.001$ on MNIST and Fashion MNIST and $\beta=0.0001$ on CIFAR10 and STL-10. All methods were implemented in PyTorch and tested on a computing cluster equipped with four NVIDIA 1080 Ti GPUs.

Evaluation Metrics for Attacking Methods and Unlearning Benchmarks. Our evaluation focuses on two aspects: defense against privacy leakage attacks and the utility of the unlearned model. For defense capability, we test all unlearning methods against state-of-the-art attack implementations [10, 6], including the reconstruction attack and the membership inference attack. To make the attacks as strong as possible, we mount the reconstruction and membership inference attacks directly on the representation $Z$, which represents the upper bound of such attacks against CRFU based on the outputs $\hat{Y}_{-(X_e,Y_e)}$. As discussed in Section 5, an attacker cannot infer more information from $\hat{Y}$ and $\hat{Y}_{-(X_e,Y_e)}$ than from $Z$ before and after unlearning, since both the original model and CRFU derive their outputs from $Z$. Following the approach in [10], we measure reconstruction quality using the mean square error (MSE), and we use the standard AUC metric to measure the absolute performance of membership inference, following [6].
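
As a concrete reading of these two metrics, the following sketch computes the per-image reconstruction MSE and the membership-inference AUC from attack scores; the variable names and the convention that higher scores indicate predicted members are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def reconstruction_mse(x_recon, x_orig):
    """Mean squared error between reconstructed and original images, one value per image."""
    diff = (x_recon.astype(np.float64) - x_orig.astype(np.float64)) ** 2
    return diff.reshape(len(x_orig), -1).mean(axis=1)

def membership_auc(member_scores, nonmember_scores):
    """AUC of the membership inference attack: 0.5 means the attacker guesses randomly."""
    y_true = np.concatenate([np.ones_like(member_scores), np.zeros_like(nonmember_scores)])
    y_score = np.concatenate([member_scores, nonmember_scores])
    return roc_auc_score(y_true, y_score)
```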

From the perspective of unlearned model utility, we compare the effectiveness and efficiency of CRFU with two main kinds of state-of-the-art approximate unlearning methods: Hessian-matrix-based unlearning (HBU) [21, 15] and variational Bayesian unlearning (VBU) [5, 4]. To rigorously assess the effectiveness of unlearning, we adopt the widely used technique of Hu et al. [16], which embeds backdoor triggers into the samples to be erased during the initial training of the original IB model. The objective of all unlearning methods is to eliminate the influence of these embedded backdoors from the trained models. After unlearning, we evaluate their success by checking whether the backdoor still poses a threat to the unlearned model. Effectiveness is gauged in two ways: the model's accuracy on a test dataset, and the backdoor accuracy on the erased dataset [16]. Additionally, we analyze the efficiency of unlearning via running time, computed as the product of the per-batch training time and the total number of training epochs.

6.2 Evaluations of Defense Capability

As analyzed in Section 5, CRFU performs unlearning based on a trained IB model. Increasing $\beta$ during training yields a learned representation that more closely approximates the general prior; a smaller KLD, $\text{KL}[p(Z|X) \,\|\, q(Z)]$, implies stronger distortion. However, increased distortion can compromise prediction accuracy. To explore the relationship between $\beta$ and its defense effectiveness, we conducted experiments with varying $\beta$ values for both the original model and CRFU on MNIST, Fashion MNIST, and CIFAR10. For clarity of illustration, we did not add backdoors to the erased data in this experiment. We set a fixed unlearning rate $\beta_u=0.1$ and erased only 9 samples from the full training data. To simplify the experimental process, we carried out the reconstruction and membership inference attacks directly on the representation $Z$, which marks the highest level of attack an adversary can achieve with $\hat{Y}$ and $\hat{Y}_{-(X_e,Y_e)}$. The reconstruction effect and model utility across different $\beta$ values are presented in Table II.

For the MNIST dataset, detailed in rows 2 to 6 of Table II, the reconstruction quality and the membership inference AUC are highest when $\beta=0$. Under this condition, the attack recovers the most detail from the original images: the reconstruction MSE is at its lowest observed value of 231.31, and the inference AUC is at its highest, 0.674. When $\beta=0$, the model does not distort the information of $X$ in the representation $Z$, making it comparable to normal unlearning methods (HBU and VBU) that do not consider information compression.

As $\beta$ increases, the reconstruction MSE rises and the inference AUC decreases, consistent with our prior analysis that a larger $\beta$ results in greater distortion, so $Z$ contains less information about $X_e$. As $\beta$ increases to 1 in our experiments, the KLD between $p(Z|X)$ and $q(Z)$ attains its minimal value, indicating that the representation $Z$ is closest to the general prior and the distortion is maximal. This heightened distortion makes reconstruction more challenging: at such a distortion ratio, attackers can at best recover a blurry image from $Z$ that lacks much of the detail of the original images, and the reconstruction MSE peaks at 668.07. Conversely, as $\beta$ increases, prediction accuracy decreases slightly, from 97.45% at $\beta=0$ to 95.75% at $\beta=1$.

Our experiments on the Fashion MNIST dataset, detailed in rows 7 to 11 of Table II, show trends akin to those on MNIST. As $\beta$ increases, the learned representation increasingly resembles the assumed general prior, making reconstruction and membership inference more difficult. As $\beta$ grows from 0 to 1, the reconstructed images lose more of the original detail, the reconstruction MSE increases from 220.31 to 561.73, and the AUC decreases from 0.656 to 0.515. Meanwhile, the KLD to the assumed prior drops from 12.23 to 0.13, indicating that the representation is becoming more similar to the general prior. Alongside these changes, the prediction accuracy decreases only slightly, from 87.95% at $\beta=0$ to 86.15% at $\beta=1$.

The defense evaluations on the CIFAR10 dataset, detailed in row 12 of Table II, demonstrate the challenges of recovering CIFAR10 images. Given the dataset's complexity, unlike MNIST and Fashion MNIST, visually assessing reconstruction quality is less straightforward; however, the reconstruction MSE, membership inference AUC, and model accuracy support conclusions similar to those on the other datasets. When $\beta=0$, attackers reconstruct CIFAR10 images with an MSE of 1174.33. As $\beta$ increases, reconstruction becomes harder, with the MSE climbing to 1550.24 at $\beta=1$. This trend indicates that a larger $\beta$ discards more information from the model's representation, albeit at the cost of a slight decrease in prediction accuracy: on CIFAR10, accuracy decreases from 84.75% at $\beta=0$ to 82.95% at $\beta=1$. The greater distortion also hinders membership inference attacks, with the inference AUC decreasing from 0.821 to 0.571.

In short, CRFU achieves a stronger defense against reconstruction and membership inference attacks when the distortion ratio $\beta$ is larger, at the price of a small drop in accuracy. Compared with the improvement in defense against these attacks, we consider this accuracy reduction insignificant within the examined parameter range.

TABLE III: Overall Unlearning Effectiveness and Efficiency Evaluation on MNIST, Fashion MNIST, CIFAR10 and STL-10

MNIST (EDR = 6%, Rep.: MLP, App.: MLP)
                          Origin    HBU       VBU       CRFU      Retrain
  Running Time (s)        44        2.42      0.20      0.15      41.36
  Acc. on test dataset    97.6%     87.99%    92.97%    94.72%    97.63%
  Backdoor Acc.           100%      0.04%     0.42%     0.58%     0.08%

Fashion MNIST (EDR = 6%, Rep.: MLP, App.: MLP)
                          Origin    HBU       VBU       CRFU      Retrain
  Running Time (s)        190.8     12.67     1.99      1.56      179.35
  Acc. on test dataset    88.4%     76.13%    81.02%    82.96%    88.6%
  Backdoor Acc.           100%      0.04%     1.66%     1.63%     0.08%

CIFAR10 (EDR = 6%, Rep.: ResNet-18, App.: MLP)
                          Origin    HBU       VBU       CRFU      Retrain
  Running Time (s)        552       28.21     0.77      0.72      518.88
  Acc. on test dataset    81.16%    74.32%    74.16%    75.17%    86.65%
  Backdoor Acc.           99.83%    1.27%     1.13%     1.7%      0.03%

STL-10 (EDR = 6%, Rep.: ResNet-18, App.: MLP)
                          Origin    HBU       VBU       CRFU      Retrain
  Running Time (s)        497       31.44     10.60     11.24     467
  Acc. on test dataset    63.38%    50.53%    50.40%    52.67%    61.65%
  Backdoor Acc.           99.93%    1.33%     1.46%     1.67%     0.01%
Figure 4: Performance of different unlearning methods under various EDR. Panels (a)-(d) show results on MNIST, Fashion MNIST, CIFAR10, and STL-10 (running time); panels (e)-(h) show the corresponding accuracy on the test dataset; panels (i)-(l) show the corresponding backdoor accuracy on the erased dataset.

6.3 Evaluations of Model Utility

After demonstrating the defense effect of CRFU against reconstruction attacks on unlearning, we compare the unlearning performance of CRFU with the existing state-of-the-art HBU and VBU methods. As introduced in the experiment setup, following [16] we add backdoors to the erased dataset to test the unlearning effect. Although all methods successfully unlearn the erased dataset, continued unlearning training deteriorates the model's utility. We therefore set a threshold of 2% backdoor accuracy on the erased data and halt model training upon reaching this threshold. This threshold guarantees a much stronger unlearning effect than random guessing, which corresponds to 10% backdoor accuracy. Moreover, we use the fixed $\beta$ values introduced before and set $\beta_u=0.1$ here.
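
As a hedged sketch of this verification protocol, the code below stamps a simple square trigger on the erased samples, relabels them to an attacker-chosen target class, and measures the backdoor accuracy of a model after unlearning; the trigger shape, target label, and tensor layout are illustrative choices rather than the exact setup of [16].

```python
import torch

def add_backdoor(images, labels, target_class=0, patch=3):
    """Stamp a small white square in the bottom-right corner and relabel to the target class."""
    triggered = images.clone()
    triggered[:, :, -patch:, -patch:] = 1.0           # assumes NCHW images scaled to [0, 1]
    return triggered, torch.full_like(labels, target_class)

@torch.no_grad()
def backdoor_accuracy(model, triggered_images, target_labels):
    """Fraction of triggered inputs still classified as the target class after unlearning."""
    preds = model(triggered_images).argmax(dim=1)     # assumes the model returns class logits
    return (preds == target_labels).float().mean().item()
```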

We evaluate the utility of different unlearning methods from two aspects: effectiveness and efficiency. Overall results on MNIST, Fashion MNIST, CIFAR10, and STL-10 are shown in Table III, where the erased data ratio ($\it EDR$) is 6% of the training data and the unlearning rate is $\beta_u=0.1$. Since these models are backdoored, accuracy on the test dataset may decrease, especially on CIFAR10, where it is only around 81.16%. All unlearning methods successfully reduce the impact of the backdoored data to below 2% backdoor accuracy. HBU achieves the best removal effect on MNIST, Fashion MNIST, and STL-10, but it consumes the longest running time and significantly degrades the model's accuracy. CRFU achieves the best running time and test accuracy in most cases. Although CRFU does not achieve the best backdoor removal effect, it effectively unlearns the backdoored samples, reducing the backdoor accuracy to below 2%. Detailed comparisons of the unlearning methods on MNIST, Fashion MNIST, CIFAR10, and STL-10 are shown in Figure 4 and discussed below.

6.3.1 Efficiency of Unlearning

We evaluate efficiency through the running time of the three unlearning methods: CRFU, VBU, and HBU. Figures 4(a), 4(b), 4(c), and 4(d) show the running time of all compared methods on MNIST, Fashion MNIST, CIFAR10, and STL-10. A larger $\it EDR$ embeds the backdoor more deeply in the model, so removing its influence takes more time; accordingly, the running time increases slightly as the $\it EDR$ increases on all datasets. CRFU consumes a similar running time to VBU, and both achieve a speedup exceeding 10x compared to HBU on MNIST, Fashion MNIST, and CIFAR10. HBU demands the most running time because it must compute the Hessian matrix on the remaining dataset to estimate the contribution of the erased data, which costs far more than unlearning directly on the erased dataset. Even so, HBU still achieves a large speedup compared to retraining from scratch.

6.3.2 Effectiveness of Unlearning

The effectiveness of unlearning is assessed with two metrics: the model's accuracy on the test dataset and the backdoor accuracy on the erased dataset. These are presented in Figures 4(e), 4(f), 4(g), and 4(h) for test accuracy, and in Figures 4(i), 4(j), 4(k), and 4(l) for backdoor accuracy, respectively. First, the accuracy of the unlearned models decreases slightly as EDR increases on all datasets. The only exception is EDR = 2%, where unlearning this small backdoored dataset causes greater accuracy degradation than larger EDR values. Through extensive experimentation, we found that although 2% backdoored samples can successfully implant a backdoor in the model, they do not embed it as deeply as a larger EDR would. Because of this shallow backdooring, unlearning the backdoor trigger using these erased samples (most of whose features are valid) damages the model more significantly. In this setting, CRFU achieves the best accuracy maintenance compared with the other two state-of-the-art unlearning methods, improving accuracy by around 2%.

Second, from the perspective of the backdoor accuracy of the unlearned model, all methods effectively eliminate the impact of the backdoored data, reducing the backdoor accuracy to below 2%, far below the 10% of random guessing. Although HBU performs worst in accuracy maintenance, it achieves the best backdoor removal in most of the results on the four datasets. CRFU achieves a backdoor removal effect similar to VBU on Fashion MNIST, and slightly higher backdoor accuracy than VBU on CIFAR10 and STL-10.

Figure 5: Variations in accuracy on the remaining (abbreviated as Re.) dataset and backdoor accuracy on the erased (abbreviated as Er.) dataset during unlearning under various unlearning rates $\beta_u$ on MNIST
Figure 6: Variations in accuracy on the remaining (abbreviated as Re.) dataset and backdoor accuracy on the erased (abbreviated as Er.) dataset during unlearning under various $\beta_u$ on Fashion MNIST
Figure 7: Variations in accuracy on the remaining (abbreviated as Re.) dataset and backdoor accuracy on the erased (abbreviated as Er.) dataset during unlearning under various unlearning rates $\beta_u$ on CIFAR10

6.4 Influence of the Proposed Unlearning Rate

In this section, we evaluate the influence of the introduced unlearning rate. Since only our method has this parameter, we directly show the training progress of CRFU under different unlearning rates $\beta_u$ to analyze its influence. Specifically, we report the model accuracy on the remaining (abbreviated as Re.) dataset and the backdoor accuracy on the erased (abbreviated as Er.) dataset at each CRFU training epoch. To better illustrate the training process, we continue the unlearning even after reaching the 2% backdoor accuracy threshold set above. Moreover, we set $\beta=0.001$ for MNIST and Fashion MNIST, $\beta=0.0001$ for CIFAR10, and EDR = 6%.

Figure 5 shows the variations in accuracy and backdoor accuracy for different $\beta_u$, from 0.01 to 1, on MNIST. When $\beta_u$ is large, CRFU unlearns the backdoored samples faster than when $\beta_u$ is small, because a larger unlearning rate means CRFU applies larger updates to the forgetting terms of Eqs. 11 and 12. On CIFAR10, when $\beta_u=0.01$, the accuracy on the remaining dataset does not decrease during the first 150 training epochs, yet the model still removes the backdoored samples within ten epochs, as shown in Figure 7.
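The role of $\beta_u$ can be sketched as a schematic update step (not the exact form of Eqs. 11 and 12): the forgetting terms computed on the erased data are scaled by $\beta_u$ before the gradient step, while the remembering constraint on the remaining data is applied at full weight, so a smaller $\beta_u$ forgets more slowly and preserves utility longer. Here, remembering_loss and forgetting_loss are placeholders, not the paper's exact definitions.

def crfu_step(model, optimizer, remaining_batch, erased_batch,
              remembering_loss, forgetting_loss, beta_u=0.1):
    # remembering_loss and forgetting_loss stand in for the terms of
    # Eqs. 11 and 12; beta_u scales only the forgetting part of the objective.
    optimizer.zero_grad()
    loss = remembering_loss(model, remaining_batch) \
           + beta_u * forgetting_loss(model, erased_batch)
    loss.backward()
    optimizer.step()
    return loss.item()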

Controlling the unlearning speed is one advantage of the unlearning rate $\beta_u$; slowing the accuracy degradation during unlearning training is another. A slower accuracy degradation makes catastrophic unlearning controllable. For example, when we choose a small $\beta_u$, such as 0.01, the model's accuracy still matches the original model when the backdoor accuracy drops to 0, and it decreases only slowly as training continues. This gives us more time to observe the unlearning process and to stop training if accuracy degradation appears. By contrast, without the unlearning rate, i.e., with $\beta_u=1$ in all situations, catastrophic unlearning appears quickly after the erased samples are unlearned, as is evident in Figures 5 and 6.

7 Summary and Future Work

In this paper, we propose the CRFU scheme to defend against privacy leakage attacks on machine unlearning. Specifically, we minimize the information about the erased data that remains in the learned representation, thereby removing the contribution of the specified data from the trained model. Since the representation extracts only information pertinent to the labels from the inputs while distorting other details, it effectively shields against privacy leakage attacks on unlearning. To avoid catastrophic unlearning, we design a remembering constraint term during data erasure and propose an unlearning rate to control the extent of unlearning. Our theoretical analysis establishes the security of CRFU against reconstruction attacks. Additionally, comprehensive experimental evidence shows that our scheme effectively counters both reconstruction and membership inference attacks based on unlearning model updates while minimizing the impact on unlearning accuracy.

Although CRFU demonstrates an effective defense against reconstruction and membership inference attacks based on unlearning model updates, a vast uncharted area remains in preventing privacy leakage caused by machine unlearning. Future work should continue this line of inquiry, developing more robust privacy-preserving unlearning methods that defend against further attacks, such as adaptive attacks in which adversaries dynamically adjust their strategies based on the defense mechanisms. Additionally, as analyzed before, the defense upper bound of CRFU is $I(Y_e;Z)$, which means adversaries can still infer information about the unlearned labels. Exploring gradient update forging or differential privacy techniques to overcome this challenge would be promising. Last but not least, we envision deploying the CRFU mechanism in practical applications, including graph analysis, point-of-interest recommendation, and medical diagnosis, thereby protecting participants' privacy.

References
  • Mantelero [2013] A. Mantelero, “The EU proposal for a general data protection regulation and the roots of the ’right to be forgotten’,” Comput. Law Secur. Rev., vol. 29, no. 3, pp. 229–235, 2013.
  • Cao and Yang [2015] Y. Cao and J. Yang, “Towards making systems forget with machine unlearning,” in 2015 IEEE Symposium on Security and Privacy.   IEEE, 2015, pp. 463–480.
  • Bourtoule et al. [2021] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine unlearning,” in 2021 IEEE Symposium on Security and Privacy (SP).   IEEE, 2021, pp. 141–159.
  • Nguyen et al. [2020] Q. P. Nguyen, B. K. H. Low, and P. Jaillet, “Variational bayesian unlearning,” Advances in Neural Information Processing Systems, vol. 33, pp. 16025–16036, 2020.
  • Fu et al. [2022] S. Fu, F. He, and D. Tao, “Knowledge removal in sampling-based bayesian inference,” in The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022.   OpenReview.net, 2022.
  • Chen et al. [2021] M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang, “When machine unlearning jeopardizes privacy,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 896–911.
  • Zhang et al. [2023] K. Zhang, W. Wang, Z. Fan, X. Song, and S. Yu, “Conditional matching gan guided reconstruction attack in machine unlearning,” in GLOBECOM 2023-2023 IEEE Global Communications Conference.   IEEE, 2023, pp. 44–49.
  • Hu et al. [2024] H. Hu, S. Wang, T. Dong, and M. Xue, “Learn what you want to unlearn: Unlearning inversion attacks against machine unlearning,” in 2024 IEEE Symposium on Security and Privacy (SP), 2024, pp. 262–262.
  • Gao et al. [2022] J. Gao, S. Garg, M. Mahmoody, and P. N. Vasudevan, “Deletion inference, reconstruction, and compliance in machine (un) learning,” Proceedings on Privacy Enhancing Technologies, vol. 3, pp. 415–436, 2022.
  • Salem et al. [2020] A. Salem, A. Bhattacharya, M. Backes, M. Fritz, and Y. Zhang, “Updates-leak: Data set inference and reconstruction attacks in online learning,” in 29th USENIX Security Symposium, USENIX Security 2020, August 12-14, 2020.   USENIX Association, 2020, pp. 1291–1308.
  • Dwork [2006] C. Dwork, “Differential privacy,” in Automata, Languages and Programming: 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II 33.   Springer, 2006, pp. 1–12.
  • Du et al. [2019] M. Du, Z. Chen, C. Liu, R. Oak, and D. Song, “Lifelong anomaly detection through unlearning,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 1283–1297.
  • Tishby et al. [2000] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000.
  • Papernot et al. [2017] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia conference on computer and communications security, 2017, pp. 506–519.
  • Guo et al. [2020] C. Guo, T. Goldstein, A. Y. Hannun, and L. van der Maaten, “Certified data removal from machine learning models,” in Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, ser. Proceedings of Machine Learning Research, vol. 119.   PMLR, 2020, pp. 3832–3842.
  • Hu et al. [2022] H. Hu, Z. Salcic, G. Dobbie, J. Chen, L. Sun, and X. Zhang, “Membership inference via backdooring,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, L. D. Raedt, Ed.   ijcai.org, 2022, pp. 3832–3838.
  • Graves et al. [2021] L. Graves, V. Nagisetty, and V. Ganesh, “Amnesiac machine learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 13, 2021, pp. 11516–11524.
  • Garg et al. [2020] S. Garg, S. Goldwasser, and P. N. Vasudevan, “Formalizing data deletion in the context of the right to be forgotten,” in Advances in Cryptology - EUROCRYPT 2020 - 39th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, May 10-14, 2020, Proceedings, Part II, ser. Lecture Notes in Computer Science, A. Canteaut and Y. Ishai, Eds., vol. 12106.   Springer, 2020, pp. 373–402.
  • Yan et al. [2022] H. Yan, X. Li, Z. Guo, H. Li, F. Li, and X. Lin, “ARCANE: an efficient architecture for exact machine unlearning,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, L. D. Raedt, Ed.   ijcai.org, 2022, pp. 4006–4013.
  • Wu et al. [2022] G. Wu, M. Hashemi, and C. Srinivasa, “Puma: Performance unchanged model augmentation for training data removal,” arXiv preprint arXiv:2203.00846, 2022.
  • Sekhari et al. [2021] A. Sekhari, J. Acharya, G. Kamath, and A. T. Suresh, “Remember what you want to forget: Algorithms for machine unlearning,” Advances in Neural Information Processing Systems, vol. 34, 2021.
  • Box and Tiao [2011] G. E. Box and G. C. Tiao, Bayesian inference in statistical analysis.   John Wiley & Sons, 2011.
  • Lu et al. [2022] Z. Lu, H. Liang, M. Zhao, Q. Lv, T. Liang, and Y. Wang, “Label-only membership inference attacks on machine unlearning without dependence of posteriors,” International Journal of Intelligent Systems, vol. 37, no. 11, pp. 9424–9441, 2022.
  • Fredrikson et al. [2014] M. Fredrikson, E. Lantz, S. Jha, S. M. Lin, D. Page, and T. Ristenpart, “Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing,” in Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, August 20-22, 2014.   USENIX Association, 2014, pp. 17–32.
  • Mai et al. [2018] G. Mai, K. Cao, P. C. Yuen, and A. K. Jain, “On the reconstruction of face images from deep face templates,” IEEE transactions on pattern analysis and machine intelligence, vol. 41, no. 5, pp. 1188–1202, 2018.
  • Wang and Gong [2018] B. Wang and N. Z. Gong, “Stealing hyperparameters in machine learning,” in 2018 IEEE symposium on security and privacy (SP).   IEEE, 2018, pp. 36–52.
  • Kingma and Welling [2014] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” stat, vol. 1050, p. 1, 2014.
  • Tishby and Zaslavsky [2015] N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in 2015 IEEE Information Theory Workshop (ITW).   IEEE, 2015, pp. 1–5.
  • Blahut [1972] R. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE transactions on Information Theory, vol. 18, no. 4, pp. 460–473, 1972.
  • Alemi et al. [2017] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.   OpenReview.net, 2017.
  • Achille and Soatto [2018] A. Achille and S. Soatto, “Information dropout: Learning optimal representations through noisy computation,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 12, pp. 2897–2905, 2018.
  • Bang et al. [2021] S. Bang, P. Xie, H. Lee, W. Wu, and E. Xing, “Explaining a black-box by using a deep variational information bottleneck approach,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 13, 2021, pp. 11396–11404.
  • Joyce [2011] J. M. Joyce, “Kullback-leibler divergence,” in International encyclopedia of statistical science.   Springer, 2011, pp. 720–722.
  • Kusner et al. [2017] M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, “Grammar variational autoencoder,” in International conference on machine learning.   PMLR, 2017, pp. 1945–1954.
  • Xiao et al. [2017] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017.
  • Krizhevsky et al. [2009] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  • Coates et al. [2011] A. Coates, A. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics.   JMLR Workshop and Conference Proceedings, 2011, pp. 215–223.
Weiqi Wang (IEEE M’24) received the M.Sc. degree in computer science from Soochow University, Suzhou, China, in 2018. He is currently pursuing the Ph.D. degree with the School of Computer Science, University of Technology Sydney, where he is advised by Prof. Shui Yu. He previously worked as a senior algorithm engineer at the Department of AI-Strategy, Local consumer services segment, Alibaba Group. He has been actively involved in the research community by serving as a reviewer for prestigious journals such as ACM Computing Surveys, IEEE Communications Surveys and Tutorials, IEEE TIFS, IEEE TDSC, IEEE TIP, IEEE Transactions on SMC, and IEEE IOTJ, and international conferences such as The ACM Web Conference (WWW), ICLR, IEEE ICC, and IEEE GLOBECOM. His research interests are in machine unlearning, federated learning, and security and privacy in machine learning.
Chenhan Zhang (IEEE S’19 M’24) obtained his Ph.D. from University of Technology Sydney, Australia, in 2024, where he was advised by Prof. Shui Yu. Before that, he obtained his B.Eng. (Honours) from University of Wollongong, Australia, and M.S. from City University of Hong Kong, Hong Kong, in 2017 and 2019, respectively. He is currently a postdoctoral research fellow at Cyber Security Hub, Macquarie University, Australia. His research interests include security and privacy in graph neural networks and trustworthy spatiotemporal cyber physical systems. He has been actively involved in the research community by serving as a reviewer for prestigious venues such as ICLR, IJCAI, INFOCOM, IEEE TDSC, IEEE IoTJ, ACM Computing Surveys, and IEEE Communications Surveys and Tutorials.
Zhiyi Tian (IEEE M’24) received the B.S. degree and the M.S. degree from Sichuan University, China, in 2017 and 2020, respectively. He received the Ph.D. degree in 2024 from University of Technology Sydney, Australia. He is currently a research associate at University of Technology Sydney, Australia. His research interests include security and privacy in deep learning and semantic communications. He has been actively involved in the research community by serving as a reviewer for prestigious journals, such as ACM Computing Surveys, IEEE Communications Surveys and Tutorials, TIFS, and TKDD, and international conferences, such as IEEE ICC and IEEE GLOBECOM.
Shushu Liu (IEEE M’22) is a researcher at Nokia Bell Labs, Espoo, Finland. She received her Ph.D. degree from the Department of Communication and Networking, Aalto University, Espoo, Finland, in 2023. She received the B.Sc. and M.Sc. degrees in computer science from Soochow University, Suzhou, China, in 2014 and 2017. She has been serving as a reviewer for prestigious venues such as IEEE INFOCOM, IEEE ICDCS, IEEE GLOBECOM, IEEE ICC, ACM MobileHCI, and ACM SAC. Her research interests include data security and privacy, 5G communication and networking, and Web 3.0.
Shui Yu (IEEE F’23) obtained his PhD from Deakin University, Australia, in 2004. He is a Professor in the School of Computer Science and Deputy Chair of the University Research Committee, University of Technology Sydney, Australia. His research interests include Cybersecurity, Network Science, Big Data, and Mathematical Modelling. He has published five monographs, edited two books, and authored more than 600 technical papers at venues such as IEEE TDSC, TPDS, TC, TIFS, TMC, TKDE, TETC, ToN, and INFOCOM. His current h-index is 80. Professor Yu has promoted the research field of networking for big data since 2013, and his research outputs have been widely adopted by industrial systems, such as Amazon cloud security. He is currently serving on the editorial boards of IEEE Communications Surveys and Tutorials (Area Editor) and IEEE Internet of Things Journal (Editor). He served as a Distinguished Lecturer of the IEEE Communications Society (2018-2021). He is a Distinguished Visitor of the IEEE Computer Society and an elected member of the Board of Governors of IEEE VTS and ComSoc, respectively. He is a member of ACM and AAAS, and a Fellow of IEEE.