Efficient Representative And Surgical Unlearning Selection
Universal Machine Unlearning via Coreset Selection
Python 3.9+ PyTorch 2.0+ License: MIT Tests Models Strategies
Remove "Harry Potter" from GPT-2 in 5 minutes. No installation needed.
| Method | Time | Accuracy Loss | MIA AUC |
|---|---|---|---|
| Full Retrain | 24 hours | 0% | 0.51 |
| Random Deletion | 2 hours | -15% | 0.73 |
| Erasus (Influence) | 30 min | -2% | 0.52 |
~98% faster than retraining (30 min vs. 24 hours), ~2% accuracy loss. MIA AUC ≈ 0.5 = membership attacks reduced to chance.
Erasus is a research-grade Python framework for Machine Unlearning across all major foundation model types. It surgically removes specific data, concepts, or behaviors from trained models, without the computational cost of full retraining.
It supports Vision-Language Models, Large Language Models, Diffusion Models, Audio Models, and Video Models through a unified API backed by 27 unlearning strategies, 19 coreset selectors, 8 loss functions, and a comprehensive evaluation suite with 15+ metrics.
Erasus operates in a three-stage pipeline:
```
┌────────────────────────┐     ┌────────────────────────┐     ┌────────────────────────┐
│ 1. CORESET SELECTION   │────▶│ 2. TARGETED            │────▶│ 3. EVALUATION &        │
│                        │     │    UNLEARNING          │     │    CERTIFICATION       │
│  Pick the minimal      │     │                        │     │                        │
│  set of samples that   │     │  Apply gradient ascent,│     │  MIA, accuracy,        │
│  define forgetting     │     │  Fisher, SCRUB, LoRA,  │     │  perplexity, FID,      │
│  "support vectors"     │     │  or 16+ other methods  │     │  certified removal     │
└────────────────────────┘     └────────────────────────┘     └────────────────────────┘
```
Key Innovation: Geometry-aware coreset selection identifies the "support vectors of forgetting": unlearning the top k% most influential samples approximates unlearning the full forget set, with bounded utility loss.
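The selection step can be illustrated with a framework-agnostic sketch (a toy stand-in for Erasus's gradient-based selectors, not library code): score each forget-set sample by its per-sample gradient norm and keep only the top `prune_ratio` fraction.

```python
import math

def select_coreset(samples, grads, prune_ratio=0.1):
    """Rank forget-set samples by per-sample gradient norm and keep the
    top prune_ratio fraction -- the 'support vectors of forgetting'
    that dominate the unlearning update."""
    scores = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    k = max(1, int(len(samples) * prune_ratio))
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [samples[i] for i in ranked[:k]]

# Toy example: 10 samples, each with a 2-D per-sample gradient.
samples = list(range(10))
grads = [(float(i), 0.0) for i in range(10)]  # sample 9 has the largest gradient
coreset = select_coreset(samples, grads, prune_ratio=0.2)
print(coreset)  # → [9, 8]
```

Unlearning is then run only on the coreset, which is where the compute savings in the table above come from.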
| Feature | Description |
|---|---|
| Coreset-Driven Forgetting | 19 coreset selectors (influence functions, CRAIG, herding, k-center, EL2N, TracIn, Data Shapley, active learning) reduce compute by up to 90% |
| Ensemble Unlearning | Combine strategies sequentially or via weight averaging for robust forgetting |
| Multimodal Decoupling | Unlearn image-text associations without breaking visual or textual generalization |
| Federated Unlearning | Decentralized unlearning across clients with FedAvg aggregation and client-side forgetting |
| Utility Preservation | Retain-anchor loss + Fisher regularization constrain model drift on safe data |
| Certified Removal | Formal (ε, δ)-removal verification with PAC-style guarantees |
| Integrated Evaluation | 15+ metrics: MIA, confidence, feature distance, perplexity, FID, activation analysis, backdoor detection |
| Visualization Suite | Loss landscapes, embedding plots, gradient flow, interactive Plotly dashboards, HTML reports |
| Model Agnostic | Works with any PyTorch model + HuggingFace Transformers (BERT, LLaMA, T5, CLIP, DALL-E) |
| CLI + Python API | `erasus unlearn`, `erasus benchmark`, `erasus visualize`, or the full Python API |
| Experiment Tracking | Built-in W&B, MLflow, and local JSON tracking + HPO with Optuna |
| Theoretical Bounds | PAC-learning utility bounds, influence bounds, certified unlearning radius |
| Modality | Models | Unlearner |
|---|---|---|
| Vision-Language | CLIP, LLaVA, BLIP-2, Flamingo, VisionTransformer | VLMUnlearner |
| Language | LLaMA, Mistral, GPT-2/J, BERT, T5 | LLMUnlearner |
| Diffusion | Stable Diffusion 1.x/2.x/XL, DALL-E, Imagen | DiffusionUnlearner |
| Audio | Whisper, CLAP, Wav2Vec | AudioUnlearner |
| Video | VideoMAE, VideoCLIP | VideoUnlearner |
| Federated | Any Architecture | FederatedUnlearner |
| Any | Auto-detect | MultimodalUnlearner |
```bash
# From PyPI
pip install erasus
pip install erasus[full]   # with diffusers, datasets, wandb, etc.
pip install erasus[hub]    # Hugging Face Hub push/pull

# From source (development)
git clone https://github.com/OnePunchMonk/erasus.git
cd erasus
pip install -e .

# With all optional dependencies
pip install -e ".[full]"

# Hugging Face Hub (push/pull unlearned models)
pip install -e ".[hub]"

# Interactive dashboards (Streamlit / Gradio)
pip install -e ".[dashboard]"

# Development
pip install -e ".[dev]"
```
- Demo (Colab): Remove Harry Potter from GPT-2 (5 min, zero setup)
- Notebooks: `notebooks/01_introduction.ipynb`, `notebooks/02_coreset_analysis.ipynb`, `examples/notebooks/interactive_demo.ipynb`
- Streamlit: `streamlit run apps/dashboard_streamlit.py`
- Gradio: `python apps/dashboard_gradio.py` (requires `pip install gradio`)
```bash
bash scripts/setup_env.sh        # CPU
bash scripts/setup_env.sh --gpu  # CUDA 12.1
```
```bash
docker compose -f docker/docker-compose.yml up test       # Run tests
docker compose -f docker/docker-compose.yml run dev       # Dev shell
docker compose -f docker/docker-compose.yml up benchmark  # GPU benchmarks
```
```python
from erasus.unlearners import ErasusUnlearner

# 1. Load your model
model = ...  # Any PyTorch model

# 2. Create unlearner
unlearner = ErasusUnlearner(
    model=model,
    strategy="gradient_ascent",  # 27 strategies available
    selector="influence",        # 19 selectors available
    device="cuda",
)

# 3. Unlearn
result = unlearner.fit(
    forget_data=forget_loader,  # Data to remove
    retain_data=retain_loader,  # Data to preserve
    prune_ratio=0.1,            # Use top 10% coreset
    epochs=5,
)

# 4. Evaluate
metrics = unlearner.evaluate(
    forget_data=forget_loader,
    retain_data=retain_loader,
)
print(f"MIA AUC: {metrics['mia_auc']:.4f}")  # Should be ≈ 0.5
```
```python
from erasus.unlearners import VLMUnlearner, LLMUnlearner, DiffusionUnlearner

# CLIP: Remove NSFW concepts
vlm = VLMUnlearner(model=clip_model, strategy="modality_decoupling")
vlm.fit(forget_data=nsfw_loader, retain_data=safe_loader)

# LLaMA: Remove hazardous knowledge
llm = LLMUnlearner(model=llama_model, strategy="gradient_ascent")
llm.fit(forget_data=harmful_loader, retain_data=benign_loader)

# Stable Diffusion: Remove artist styles
diff = DiffusionUnlearner(model=sd_model, strategy="concept_erasure")
diff.fit(forget_data=artist_loader, retain_data=general_loader)
```
```python
from erasus.unlearners import MultimodalUnlearner

# Automatically picks the right unlearner
unlearner = MultimodalUnlearner.from_model(your_model)
```
```bash
# Run unlearning
erasus unlearn --config configs/default.yaml

# Evaluate results
erasus evaluate --config configs/default.yaml --checkpoint model.pt

# Run benchmarks
erasus benchmark --strategies gradient_ascent,scrub --selectors random,influence

# Generate visualizations
erasus visualize --type embeddings --method tsne --output embeddings.png
erasus visualize --type comparison --output comparison.png
erasus visualize --type report --output report.html
```
| Category | Strategies |
|---|---|
| Gradient Methods | Gradient Ascent, SCRUB (NeurIPS 2023), Fisher Forgetting, Negative Gradient, Modality Decoupling, Saliency Unlearning |
| Parameter Methods | LoRA Unlearning, Sparse-Aware, Mask-Based, Neuron Pruning, Layer Freezing |
| Data Methods | Amnesiac ML, SISA, Certified Removal, Knowledge Distillation |
| LLM-Specific | SSD (AAAI 2024), Token Masking, Embedding Alignment, Causal Tracing, Attention Surgery |
| Diffusion-Specific | Concept Erasure (ICCV 2023), Noise Injection, U-Net Surgery, Timestep Masking, Safe Latents |
| VLM-Specific | Contrastive Unlearning, Cross-Modal Decoupling, Attention Unlearning, Vision-Text Split |
| Ensemble | Sequential / Averaged multi-strategy combination |
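To make the gradient-method family concrete, here is a toy sketch (not Erasus's implementation) of the core update shared by gradient ascent with a retain anchor: ascend the loss on forget data while descending a weighted loss on retain data, shown on a one-parameter linear model.

```python
def mse(w, data):
    """Loss of the 1-D linear model y = w * x: mean of 0.5 * (w*x - y)^2."""
    return sum((w * x - y) ** 2 for x, y in data) / (2 * len(data))

def grad(w, data):
    """d(mse)/dw for the same model."""
    return sum((w * x - y) * x for x, y in data) / len(data)

def unlearn_step(w, forget, retain, lr=0.1, retain_weight=4.0):
    """Ascend the forget loss, descend the (weighted) retain loss."""
    return w + lr * grad(w, forget) - lr * retain_weight * grad(w, retain)

forget = [(1.0, 1.0)]  # association to erase
retain = [(2.0, 0.0)]  # behaviour to keep
w = 0.2                # minimiser of the combined training loss
for _ in range(30):
    w = unlearn_step(w, forget, retain)

print(f"forget loss: {mse(0.2, forget):.3f} -> {mse(w, forget):.3f}")  # rises
print(f"retain loss: {mse(0.2, retain):.3f} -> {mse(w, retain):.3f}")  # falls
```

Without the `retain_weight` term, pure ascent on the forget set drags retain performance down with it; the weighting is what anchors utility while forgetting proceeds.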
| Category | Selectors |
|---|---|
| Gradient-Based | Influence Functions, TracIn, Gradient Norm, GradMatch/CRAIG, EL2N, Representer, Forgetting Score |
| Geometry-Based | k-Center, Herding, GLISTER, Submodular, k-Means++, Farthest First |
| Learning-Based | Forgetting Events, Data Shapley, Valuation Network, Active Learning, Loss Accumulation |
| Ensemble | Voting Selector, Auto-Selector, Weighted Fusion |
```python
from erasus.metrics import MetricSuite

suite = MetricSuite(["accuracy", "mia", "perplexity"])
results = suite.run(model, forget_loader, retain_loader)
```
| Category | Metrics |
|---|---|
| Forgetting | MIA (+ LiRA, LOSS variants), Confidence, Feature Distance, Activation Analysis, Backdoor ASR, Extraction Attack |
| Utility | Accuracy, Perplexity, Retrieval (R@1/5/10), FID, BLEU, ROUGE, CLIP Score, Inception Score |
| Efficiency | Time Complexity, Memory Usage, Speedup Ratio, FLOPs Estimation |
| Privacy | Differential Privacy (ε, δ), Privacy Audit |
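The headline criterion MIA AUC ≈ 0.5 can be computed with a minimal loss-based membership attack (an illustrative stand-in for the suite's MIA metric, not its implementation): use per-sample loss as the attack score and measure how well it separates forget-set members from held-out non-members.

```python
def mia_auc(member_losses, nonmember_losses):
    """AUC of a loss-thresholding membership attack. Samples the model
    was trained on tend to have lower loss, so 'member < nonmember' is
    a win for the attacker. AUC 0.5 means the attacker is at chance."""
    pairs = wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            pairs += 1
            if m < n:        # member looks more 'trained on'
                wins += 1
            elif m == n:
                wins += 0.5  # ties count half
    return wins / pairs

# A model that still remembers the forget set: members have low loss.
print(mia_auc([0.1, 0.2, 0.3], [0.9, 1.0, 1.1]))  # → 1.0
# After successful unlearning the two loss distributions overlap.
print(mia_auc([0.9, 1.0, 1.1], [0.95, 1.0, 1.05]))  # → 0.5
```

This pairwise-comparison form of AUC is equivalent to ranking all samples by score and computing the ROC area, which is how the table's MIA numbers should be read.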
```python
from erasus.visualization import (
    EmbeddingVisualizer,
    LossLandscapeVisualizer,
    GradientVisualizer,
    ReportGenerator,
)
from erasus.visualization.attention import AttentionVisualizer
from erasus.visualization.comparisons import ComparisonVisualizer

# t-SNE / PCA embeddings
viz = EmbeddingVisualizer(model)
viz.plot(data_loader, method="tsne")

# Loss landscape
landscape = LossLandscapeVisualizer(model)
landscape.plot_2d_contour(data_loader)

# Attention heatmaps (before vs. after)
attn_viz = AttentionVisualizer(model_after)
attn_viz.plot_attention_comparison(inputs, model_before)

# Before/after comparisons
comp = ComparisonVisualizer()
comp.plot_prediction_shift(model_before, model_after, forget_loader)
comp.plot_metric_comparison(metrics_before, metrics_after)

# HTML report
report = ReportGenerator("Unlearning Report")
report.add_metrics(metrics)
report.save("report.html")
```
```python
from erasus.certification import CertifiedRemovalVerifier, UnlearningVerifier

# Formal (ε, δ)-removal verification
verifier = CertifiedRemovalVerifier(epsilon=1.0, delta=1e-5)
result = verifier.verify(unlearned_model, retrained_model, n_total=10000, n_forget=500)
print(f"Certified: {result['certified']}")

# Statistical verification
stat_verifier = UnlearningVerifier(significance=0.05)
tests = stat_verifier.verify_all(model, forget_loader, retain_loader)
```
```python
from erasus.certification.bounds import TheoreticalBounds

# PAC-learning utility bound
bounds = TheoreticalBounds.pac_utility_bound(
    n_total=50000,
    n_forget=500,
    n_retain=49500,
    delta=0.05,
    model=model,
)
print(f"Utility drop bound: {bounds['pac_utility_drop_bound']:.4f}")

# Certified unlearning radius
radius = TheoreticalBounds.unlearning_radius(
    epsilon=1.0,
    delta=1e-5,
    n_forget=500,
)
print(f"Certified radius: {radius['certified_radius']:.4f}")
```
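The exact formulas live in `TheoreticalBounds`; as a generic illustration of the PAC-style reasoning (a standard Hoeffding deviation bound, not the library's formula), the gap between empirical and true retain-set utility shrinks as O(√(log(1/δ)/n)):

```python
import math

def hoeffding_utility_gap(n_retain, delta):
    """Hoeffding-style bound: with probability >= 1 - delta, the true
    value of a [0, 1]-bounded utility metric deviates from its
    empirical estimate on n_retain samples by at most this much."""
    return math.sqrt(math.log(2 / delta) / (2 * n_retain))

# More retain data => tighter guarantee on preserved utility.
print(round(hoeffding_utility_gap(49_500, 0.05), 4))  # → 0.0061
print(round(hoeffding_utility_gap(500, 0.05), 4))     # → 0.0607
```

This is why the bounds above take `n_retain` and `delta` as inputs: the guarantee tightens with more retained data and loosens as the required confidence rises.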
| Loss | Description |
|---|---|
| Retain Anchor | Cross-entropy on retain data to preserve utility |
| Contrastive | CLIP-style contrastive loss for VLM alignment |
| KL Divergence | Distribution matching between models |
| MMD | Maximum Mean Discrepancy for distribution comparison |
| Fisher Regularization | Fisher information-weighted parameter penalty |
| Adversarial | GAN-style loss for indistinguishable forget/retain outputs |
| Triplet | Push forget embeddings away from retain-set anchors |
| L2 Regularization | Simple weight-drift penalty |
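As a small worked example of the KL-divergence row (illustrative, not the `erasus.losses` API): distribution matching keeps the unlearned model's predictive distribution close to the original on retain inputs while letting it diverge on forget inputs.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Predictive distributions over 3 classes for one input.
original = [0.7, 0.2, 0.1]
unlearned_retain = [0.68, 0.22, 0.10]  # barely moved: small KL penalty
unlearned_forget = [0.10, 0.30, 0.60]  # flipped prediction: large KL

print(kl_divergence(unlearned_retain, original))  # ≈ 0.001 (preserved)
print(kl_divergence(unlearned_forget, original))  # ≈ 1.0   (forgotten)
```

A KL-based unlearning loss would minimize the first quantity while maximizing (or thresholding) the second, which is the same preserve/forget tension the other losses in the table encode.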
```python
from erasus.experiments import ExperimentTracker, HyperparameterSearch, AblationStudy

# Supports: "local", "wandb", "mlflow"
with ExperimentTracker("clip_unlearning", backend="wandb") as tracker:
    tracker.log_config({"strategy": "gradient_ascent", "lr": 1e-4})
    result = unlearner.fit(...)
    tracker.log_metrics({"mia_auc": 0.52, "accuracy": 0.94})
    tracker.log_model(model)

# Hyperparameter search (Optuna or random fallback)
search = HyperparameterSearch(
    objective_fn=my_objective,
    param_space={"lr": {"type": "float", "low": 1e-5, "high": 1e-2, "log": True}},
    n_trials=50,
)
best = search.run()

# Ablation studies
ablation = AblationStudy(base_config={...}, run_fn=run_trial)
ablation.run_full_ablation({"lr": [1e-3, 1e-4, 1e-5], "strategy": ["ga", "scrub"]})
print(ablation.summary())
```
```
erasus/
├── core/           # Base classes, registry, config, types
├── unlearners/     # High-level API (7 modality-specific unlearners)
├── strategies/     # 27 unlearning algorithms (gradient, parameter, data, LLM, diffusion, VLM, ensemble)
├── selectors/      # 19 coreset selection methods (gradient, geometry, learning, ensemble)
├── metrics/        # 15+ evaluation metrics (forgetting, utility, efficiency, privacy)
├── losses/         # 8 loss functions (retain-anchor, contrastive, KL, MMD, Fisher, adversarial, triplet, L2)
├── visualization/  # Embeddings, loss surfaces, gradients, attention heatmaps, comparisons, reports
├── data/           # Dataset loaders (TOFU, WMDP, COCO, I2P, CC), preprocessing, partitioning
├── models/         # 10 model wrappers (VLM, LLM, diffusion, audio, video)
├── privacy/        # DP mechanisms, privacy accountant, certificates
├── certification/  # Certified removal, statistical verification, theoretical bounds
├── experiments/    # W&B / MLflow / local tracking, HPO, ablation studies
├── cli/            # Command-line interface (unlearn, evaluate, benchmark, visualize)
└── utils/          # Checkpointing, distributed, helpers, logging, callbacks, early stopping
```
Run standardized benchmarks:
```bash
# TOFU Benchmark (LLM unlearning)
python benchmarks/tofu/run.py --strategies gradient_ascent,scrub --epochs 5

# Coreset comparison (knowledge_distillation × all selectors)
python benchmarks/tofu/run_coreset_comparison.py

# MUSE Benchmark (all strategies, leaderboard)
python benchmarks/muse/run_all_strategies.py

# WMDP Benchmark (hazardous knowledge, all strategies)
python benchmarks/wmdp/run_all_strategies.py --subsets bio,cyber

# Full suite
bash scripts/run_benchmarks.sh
```
| Example | Description |
|---|---|
| CLIP Coreset Comparison | Compare random vs. gradient_norm selectors |
| LLaVA Unlearning | VLM unlearning with gradient ascent |
| LLaMA Concept Removal | Remove concepts from LLaMA |
| GPT-2 Strategy Comparison | Compare gradient_ascent vs. negative_gradient |
| LoRA Efficient Unlearning | Parameter-efficient unlearning |
| SD NSFW Removal | Remove NSFW concepts (Notebook) |
| SD Artist Removal | Remove artist styles |
| TOFU Benchmark | End-to-end benchmark (Leaderboard) |
| Coreset Comparison | knowledge_distillation × all selectors |
| MUSE Leaderboard | All strategies on MUSE-style data |
| WMDP Leaderboard | All strategies on WMDP hazardous knowledge |
| CLIP Object Removal | Remove visual concepts from VLM (MiniCLIP demo) |
| Code Copyright Removal | Remove proprietary code from LLM (MiniCodeGPT demo) |
340 tests passed ✅ | 0 failed | 54s
```bash
python -m pytest tests/ -v --tb=short
```
| Test Suite | Status |
|---|---|
| Integration (pipelines) | ✅ |
| End-to-end | ✅ |
| Unit (selectors) | ✅ |
| Unit (strategies) | ✅ |
| Unit (metrics) | ✅ |
| Core / imports / components | ✅ |
Erasus integrates and builds upon these key works:
| Method | Paper | Venue |
|---|---|---|
| SCRUB | Kurmanji et al. | NeurIPS 2023 |
| Selective Synaptic Dampening | Foster et al. | AAAI 2024 |
| Concept Erasure (ESD) | Gandikota et al. | ICCV 2023 |
| Gradient Ascent | Golatkar et al. | CVPR 2020 |
| Fisher Forgetting | Golatkar et al. | CVPR 2020 |
| CRAIG | Mirzasoleiman et al. | ICML 2020 |
| GLISTER | Killamsetty et al. | AAAI 2021 |
| Influence Functions | Koh & Liang | ICML 2017 |
| TracIn | Pruthi et al. | NeurIPS 2020 |
| Data Shapley | Ghorbani & Zou | ICML 2019 |
| Forgetting Events | Toneva et al. | ICLR 2019 |
| EL2N | Paul et al. | NeurIPS 2021 |
| Amnesiac ML | Graves et al. | AAAI 2021 |
- Core framework (base classes, registry, config)
- 10 model architectures
- 27 unlearning strategies (gradient, parameter, data, LLM, diffusion, VLM, ensemble)
- 19 coreset selectors
- 15+ evaluation metrics (forgetting, utility, efficiency, privacy)
- 8 loss functions (Fisher, adversarial, triplet, L2, retain-anchor, KL, MMD, contrastive)
- Visualization suite (embeddings, landscapes, gradients, attention, comparisons, reports)
- CLI (`erasus unlearn`, `erasus evaluate`, `erasus benchmark`, `erasus visualize`)
- Certification & privacy modules + theoretical bounds (PAC, influence, certified radius)
- Experiment tracking (W&B, MLflow, local) + HPO + ablation studies
- Benchmark runners (TOFU, WMDP)
- Callbacks & early stopping
- 340+ passing tests
- Additional model architectures (Flamingo, T5, DALL-E, Wav2Vec)
- HuggingFace Hub integration
- Interactive Gradio/Streamlit dashboard
- Tutorial notebooks
- PyPI release
See project_ideas.md for extension ideas: more SOTA algorithms, benchmarks, integrations, and research directions. Paper reproductions live in papers/reproductions/ (e.g. SCRUB, SSD, Concept Erasure, Fisher Forgetting, SISA, Amnesiac).
Contributions are welcome! Whether it's new unlearning strategies, coreset selectors, model support, or documentation.
```bash
# Setup development environment
git clone https://github.com/OnePunchMonk/erasus.git
cd erasus
pip install -e ".[dev]"
python -m pytest tests/ -v
```
MIT License β see LICENSE for details.
```bibtex
@software{erasus2026,
  title={Erasus: Universal Machine Unlearning via Coreset Selection},
  author={Aggarwal, Avaya},
  year={2026},
  url={https://github.com/OnePunchMonk/erasus}
}
```
Built with ❤️ for the machine unlearning research community