Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods

Joseph Pollock*, Igor Shilov*, Euodia Dodd, Yves-Alexandre de Montjoye
Imperial College London
*Indicates Equal Contribution

TL;DR

You can estimate the vulnerability of a training sample to privacy attacks by looking at its loss trace.

Figure: Loss trace comparison for three frog samples. Easy-to-fit outlier: loss drops late but reaches near zero (29% vulnerable). Hard-to-fit outlier: loss drops slowly and stays relatively high (12% vulnerable). Average samples: loss drops quickly and stays low (4.6% vulnerable).

Our method LT-IQR (Loss Trace Interquantile Range) analyzes per-sample loss trajectories during training to identify vulnerable samples.
We define "vulnerable" as samples that are confidently and correctly identified by the LiRA membership inference attack at FPR=10⁻³.
On CIFAR-10, LT-IQR achieves 92% precision at identifying the most vulnerable 1% of samples - all without training any shadow models!

Abstract

Membership inference attacks (MIAs) are widely used to empirically assess privacy risks in machine learning models, both providing model-level vulnerability metrics and identifying the most vulnerable training samples. State-of-the-art methods, however, require training hundreds of shadow models with the same architecture as the target model. This makes the computational cost of assessing the privacy of models prohibitive for many practical applications, particularly when used iteratively as part of the model development process and for large models.

We propose a novel approach for identifying the training samples most vulnerable to membership inference attacks by analyzing artifacts naturally available during the training process. Our method, Loss Trace Interquantile Range (LT-IQR), analyzes per-sample loss trajectories collected during model training to identify high-risk samples without requiring any additional model training.

Through experiments on standard benchmarks, we demonstrate that LT-IQR achieves 92% precision@k=1% in identifying the samples most vulnerable to state-of-the-art MIAs. This result holds across datasets and model architectures, with LT-IQR outperforming both traditional vulnerability metrics, such as loss, and lightweight MIAs using few shadow models. We further show that LT-IQR accurately identifies points vulnerable to multiple MIA methods, and we perform ablation studies.

We believe LT-IQR enables model developers to identify vulnerable training samples, for free, as part of the model development process. Our results emphasize the potential of artifact-based methods to efficiently evaluate privacy risks.

Main Results

Figure 1: Precision@k=1% (250 samples) when identifying vulnerable samples, as determined by the LiRA attack at a variable FPR threshold.

What this means: We asked different methods to identify the 250 most vulnerable training samples (top 1%). Our method got it right 92% of the time.

Traditional approaches fail: Methods that only look at the final model state — like checking which samples have low loss (21% precision) or high gradient norms (20% precision) — perform barely better than random guessing.

We beat expensive methods: Even RMIA, a state-of-the-art attack that requires training 2 shadow models, achieves slightly lower precision. Our method requires zero additional model training.

The key insight: By tracking how each sample's loss changes throughout training (not just at the end), we can identify which samples are being memorized and are therefore vulnerable to privacy attacks.
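
To make the precision@k metric concrete, here is a minimal sketch of how it could be computed from any per-sample vulnerability score, assuming ground-truth "vulnerable" labels are already available (e.g., from a LiRA attack at FPR=10⁻³). The function and variable names here are ours, not the paper's:

import numpy as np

def precision_at_k(scores, is_vulnerable, k_fraction=0.01):
    # Fraction of the k highest-scoring samples that are truly vulnerable.
    # scores: per-sample vulnerability scores; is_vulnerable: boolean array.
    k = max(1, int(k_fraction * len(scores)))
    top_k = np.argsort(scores)[::-1][:k]        # indices of the k largest scores
    return float(np.mean(is_vulnerable[top_k]))

# e.g. precision_at_k(lt_iqr_scores, vulnerable_mask, k_fraction=0.01)

Recall@k follows the same pattern, dividing the number of correctly flagged samples by the total number of vulnerable samples instead of by k.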

Implementation

You can collect per-sample losses for free during training by simply changing the loss reduction:

import torch.nn as nn

# Standard PyTorch training loop
criterion = nn.CrossEntropyLoss(reduction="none")  # Change from the default "mean"
saved_losses = []

# During training
outputs = model(inputs)
loss = criterion(outputs, targets)
# Here loss has shape [batch_size] - per-sample losses

# Save the per-sample losses for later analysis
saved_losses.append(loss.detach().cpu())

# Take the mean for the backward pass
loss.mean().backward()

That's it! No shadow models, no additional training - just analyze the loss traces you're already computing.
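
As a concrete example of that analysis, here is a minimal sketch of an LT-IQR-style score. It assumes the saved losses have been assembled into a [num_checkpoints, num_samples] tensor (which requires tracking which samples appear in each batch, or evaluating the training set in a fixed order) and uses the 25th/75th quantiles; the variable names, quantile choices, and checkpoint granularity are our assumptions, not necessarily the paper's exact settings:

import torch

# loss_traces[t, i] = loss of sample i at checkpoint t, built from saved_losses
# together with the sample indices of each batch (bookkeeping not shown here).
loss_traces = torch.stack(per_checkpoint_losses)   # shape [num_checkpoints, num_samples]

# Spread of each sample's loss over training: memorized outliers tend to start
# high and drop to near zero, so their traces have a large interquantile range.
q_hi = torch.quantile(loss_traces, 0.75, dim=0)
q_lo = torch.quantile(loss_traces, 0.25, dim=0)
lt_iqr_scores = q_hi - q_lo                        # shape [num_samples]

# Flag the top 1% of samples by score as the most at-risk.
k = max(1, int(0.01 * lt_iqr_scores.numel()))
most_vulnerable = torch.topk(lt_iqr_scores, k).indices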

Detailed Results

Performance at different coverage levels (k) when identifying points vulnerable to LiRA attack at FPR=10⁻³

Dataset     Model     k = 1%               k = 3%               k = 5%               k = 10%
                      Precision  Recall    Precision  Recall    Precision  Recall    Precision  Recall
CIFAR-10    RN-20     0.79       0.15      0.64       0.36      0.55       0.51      0.39       0.71
CIFAR-10    WRN40-4   0.91       0.11      0.83       0.29      0.75       0.43      0.59       0.69
CIFAR-10    WRN28-2   0.92       0.09      0.83       0.26      0.76       0.39      0.60       0.61
CIFAR-100   WRN28-2   0.97       0.04      0.94       0.12      0.90       0.19      0.83       0.34
CINIC-10    WRN28-2   0.94       0.07      0.88       0.20      0.82       0.32      0.71       0.55

At k=1%: LT-IQR achieves 79-97% precision across all configurations - the large majority of the 250 samples it flags are genuinely vulnerable.

Graceful degradation: Even when asked to identify the top 10% (2,500 samples), we maintain 39-83% precision across configurations (60% or higher on the WRN28-2 models). The method scales well to larger coverage.

High recall possible: At k=10%, we can identify up to 71% of all vulnerable points (see RN-20), showing we can catch most at-risk samples when needed.

Dataset complexity matters: CIFAR-100 shows the best performance (97% precision), likely because the more complex dataset creates more distinct memorization patterns.
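
For reference, the ground-truth "vulnerable" labels used above come from the LiRA attack at FPR=10⁻³. Below is a minimal sketch of one way such labels can be derived from per-sample attack scores; the names are hypothetical, and the paper's exact labeling procedure (e.g., aggregation over multiple target models) may differ:

import numpy as np

def vulnerable_at_fpr(member_scores, non_member_scores, target_fpr=1e-3):
    # Choose the attack-score threshold at which only a target_fpr fraction
    # of non-members would be (wrongly) flagged as members.
    threshold = np.quantile(non_member_scores, 1.0 - target_fpr)
    # Members confidently identified at this strict threshold are "vulnerable".
    return member_scores > threshold

# e.g. vulnerable_mask = vulnerable_at_fpr(lira_member_scores, lira_non_member_scores)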

BibTeX

@article{pollock2024free,
  title={Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods},
  author={Pollock, Joseph and Shilov, Igor and Dodd, Euodia and de Montjoye, Yves-Alexandre},
  journal={arXiv preprint arXiv:2411.05743},
  year={2024}
}