Surgical Machine Unlearning for LLMs, VLMs, and Diffusion models. Erasus uses coreset selection to enable efficient data removal with 27+ strategies and 19 selectors. Supports certified removal, multimodal decoupling, and comprehensive evaluation with 90% less compute than retraining.


πŸ‘» Erasus

Efficient Representative And Surgical Unlearning Selection
Universal Machine Unlearning via Coreset Selection

Python 3.9+ | PyTorch 2.0+ | License: MIT


πŸš€ Try it NOW (No Setup Required)

Open In Colab

Remove "Harry Potter" from GPT-2 in 5 minutes. No installation needed.


Why Erasus?

| Method | Time | Accuracy Loss | MIA AUC |
|---|---|---|---|
| Full Retrain | 24 hours | 0% | 0.51 |
| Random Deletion | 2 hours | -15% | 0.73 |
| Erasus (Influence) | 30 min | -2% | 0.52 |

90% faster than retraining with only ~2% accuracy loss. MIA AUC ≈ 0.5 means a membership-inference attacker performs no better than chance, i.e. the forget set is statistically indistinguishable from unseen data.


Erasus is a research-grade Python framework for Machine Unlearning across all major foundation model types. It surgically removes specific data, concepts, or behaviors from trained models β€” without the computational cost of full retraining.

It supports Vision-Language Models, Large Language Models, Diffusion Models, Audio Models, and Video Models through a unified API backed by 27 unlearning strategies, 19 coreset selectors, 7 loss functions, and a comprehensive evaluation suite with 15+ metrics.


🧠 How It Works

Erasus operates in a three-stage pipeline:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. CORESET SELECTION │────▢│ 2. TARGETED │────▢│ 3. EVALUATION & β”‚
β”‚ β”‚ β”‚ UNLEARNING β”‚ β”‚ CERTIFICATION β”‚
β”‚ Pick the minimal β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ set of samples that β”‚ β”‚ Apply gradient ascent,β”‚ β”‚ MIA, accuracy, β”‚
β”‚ define forgetting β”‚ β”‚ Fisher, SCRUB, LoRA, β”‚ β”‚ perplexity, FID, β”‚
β”‚ "support vectors" β”‚ β”‚ or 16+ other methods β”‚ β”‚ certified removal β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Innovation: Geometry-aware coreset selection identifies the "support vectors of forgetting": unlearning the k% most influential samples approximates unlearning the full forget set, with bounded utility loss.
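To make stage 1 concrete, here is a minimal, framework-free sketch of coreset selection (illustrative only, not the Erasus API): score each forget-set sample with a cheap influence proxy, here the per-sample gradient norm of a logistic model, and keep only the top fraction for the unlearning stage.

```python
import math

def grad_norm(w, x, y):
    """Per-sample gradient norm for logistic loss: |sigmoid(w.x) - y| * ||x||."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return abs(p - y) * math.sqrt(sum(xi * xi for xi in x))

def select_coreset(w, samples, prune_ratio=0.5):
    """Return indices of the top prune_ratio fraction of samples by gradient norm."""
    scores = [(grad_norm(w, x, y), i) for i, (x, y) in enumerate(samples)]
    scores.sort(reverse=True)
    k = max(1, int(len(samples) * prune_ratio))
    return sorted(i for _, i in scores[:k])

w = [1.0, -1.0]
samples = [([3.0, 0.0], 0), ([0.1, 0.0], 0), ([0.0, 0.1], 1)]
coreset = select_coreset(w, samples, prune_ratio=0.34)
print(coreset)  # -> [0]: the confidently mispredicted sample dominates
```

The intuition matches the pipeline diagram: only the high-influence samples (the "support vectors") need to be fed into the targeted unlearning stage.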


⚑ Key Features

| Feature | Description |
|---|---|
| 🎯 Coreset-Driven Forgetting | 19 coreset selectors (influence functions, CRAIG, herding, k-center, EL2N, TracIn, Data Shapley, Active Learning) reduce compute by up to 90% |
| 🧩 Ensemble Unlearning | Combine strategies sequentially or via weight averaging for robust forgetting |
| 📷📝 Multimodal Decoupling | Unlearn image-text associations without breaking visual or textual generalization |
| 🌐 Federated Unlearning | Decentralized unlearning across clients with FedAvg aggregation and client-side forgetting |
| 🛡️ Utility Preservation | Retain-Anchor loss + Fisher regularization constrain model drift on safe data |
| 🔐 Certified Removal | Formal (ε, δ)-removal verification with PAC-style guarantees |
| 📊 Integrated Evaluation | MIA, confidence, feature distance, perplexity, FID, activation analysis, backdoor detection, 15+ metrics |
| 📈 Visualization Suite | Loss landscapes, embedding plots, gradient flow, interactive Plotly dashboards, HTML reports |
| 🔌 Model Agnostic | Works with any PyTorch model + HuggingFace Transformers (BERT, LLaMA, T5, CLIP, DALL-E) |
| 🖥️ CLI + Python API | `erasus unlearn`, `erasus benchmark`, `erasus visualize`, or the full Python API |
| 🧪 Experiment Tracking | Built-in W&B, MLflow, local JSON tracking + HPO with Optuna |
| 📏 Theoretical Bounds | PAC-learning utility bounds, influence bounds, certified unlearning radius |

πŸ—οΈ Supported Models

| Modality | Models | Unlearner |
|---|---|---|
| Vision-Language | CLIP, LLaVA, BLIP-2, Flamingo, VisionTransformer | VLMUnlearner |
| Language | LLaMA, Mistral, GPT-2/J, BERT, T5 | LLMUnlearner |
| Diffusion | Stable Diffusion 1.x/2.x/XL, DALL-E, Imagen | DiffusionUnlearner |
| Audio | Whisper, CLAP, Wav2Vec | AudioUnlearner |
| Video | VideoMAE, VideoCLIP | VideoUnlearner |
| Federated | Any Architecture | FederatedUnlearner |
| Any | Auto-detect | MultimodalUnlearner |

πŸ“¦ Installation

# From PyPI
pip install erasus
pip install erasus[full] # with diffusers, datasets, wandb, etc.
pip install erasus[hub] # Hugging Face Hub push/pull
# From source (development)
git clone https://github.com/OnePunchMonk/erasus.git
cd erasus
pip install -e .
# With all optional dependencies
pip install -e ".[full]"
# Hugging Face Hub (push/pull unlearned models)
pip install -e ".[hub]"
# Interactive dashboards (Streamlit / Gradio)
pip install -e ".[dashboard]"
# Development
pip install -e ".[dev]"

Notebooks & dashboards

  • Demo (Colab): Remove Harry Potter from GPT-2 β€” 5 min, zero setup
  • Notebooks: notebooks/01_introduction.ipynb, notebooks/02_coreset_analysis.ipynb, examples/notebooks/interactive_demo.ipynb
  • Streamlit: streamlit run apps/dashboard_streamlit.py
  • Gradio: python apps/dashboard_gradio.py (requires pip install gradio)

Quick Setup Script

bash scripts/setup_env.sh # CPU
bash scripts/setup_env.sh --gpu # CUDA 12.1

Docker

docker compose -f docker/docker-compose.yml up test # Run tests
docker compose -f docker/docker-compose.yml run dev # Dev shell
docker compose -f docker/docker-compose.yml up benchmark # GPU benchmarks

πŸš€ Quick Start

Python API

from erasus.unlearners import ErasusUnlearner

# 1. Load your model
model = ...  # Any PyTorch model

# 2. Create unlearner
unlearner = ErasusUnlearner(
    model=model,
    strategy="gradient_ascent",  # 27 strategies available
    selector="influence",        # 19 selectors available
    device="cuda",
)

# 3. Unlearn
result = unlearner.fit(
    forget_data=forget_loader,   # Data to remove
    retain_data=retain_loader,   # Data to preserve
    prune_ratio=0.1,             # Use the top 10% coreset
    epochs=5,
)

# 4. Evaluate
metrics = unlearner.evaluate(
    forget_data=forget_loader,
    retain_data=retain_loader,
)
print(f"MIA AUC: {metrics['mia_auc']:.4f}")  # Should approach 0.5

Modality-Specific Unlearners

from erasus.unlearners import VLMUnlearner, LLMUnlearner, DiffusionUnlearner

# CLIP: Remove NSFW concepts
vlm = VLMUnlearner(model=clip_model, strategy="modality_decoupling")
vlm.fit(forget_data=nsfw_loader, retain_data=safe_loader)

# LLaMA: Remove hazardous knowledge
llm = LLMUnlearner(model=llama_model, strategy="gradient_ascent")
llm.fit(forget_data=harmful_loader, retain_data=benign_loader)

# Stable Diffusion: Remove artist styles
diff = DiffusionUnlearner(model=sd_model, strategy="concept_erasure")
diff.fit(forget_data=artist_loader, retain_data=general_loader)

Auto-Detect Model Type

from erasus.unlearners import MultimodalUnlearner
# Automatically picks the right unlearner
unlearner = MultimodalUnlearner.from_model(your_model)

CLI

# Run unlearning
erasus unlearn --config configs/default.yaml
# Evaluate results
erasus evaluate --config configs/default.yaml --checkpoint model.pt
# Run benchmarks
erasus benchmark --strategies gradient_ascent,scrub --selectors random,influence
# Generate visualizations
erasus visualize --type embeddings --method tsne --output embeddings.png
erasus visualize --type comparison --output comparison.png
erasus visualize --type report --output report.html

πŸ”§ Strategies & Selectors

Unlearning Strategies (27)

| Category | Strategies |
|---|---|
| Gradient Methods | Gradient Ascent, SCRUB (NeurIPS 2023), Fisher Forgetting, Negative Gradient, Modality Decoupling, Saliency Unlearning |
| Parameter Methods | LoRA Unlearning, Sparse-Aware, Mask-Based, Neuron Pruning, Layer Freezing |
| Data Methods | Amnesiac ML, SISA, Certified Removal, Knowledge Distillation |
| LLM-Specific | SSD (AAAI 2024), Token Masking, Embedding Alignment, Causal Tracing, Attention Surgery |
| Diffusion-Specific | Concept Erasure (ICCV 2023), Noise Injection, U-Net Surgery, Timestep Masking, Safe Latents |
| VLM-Specific | Contrastive Unlearning, Cross-Modal Decoupling, Attention Unlearning, Vision-Text Split |
| Ensemble | Sequential / Averaged multi-strategy combination |
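Conceptually, a gradient-method strategy like Gradient Ascent pairs an ascent step on the forget set with a descent step on the retain set. A toy, framework-free sketch (not the Erasus internals) on a one-parameter least-squares model:

```python
# Toy sketch of gradient-ascent unlearning on the scalar model y = w * x.
# We ASCEND the loss on the forget point while descending it on the retain
# point; the retain term anchors the model so it cannot drift arbitrarily.
def sq_loss_grad(w, data):
    """Gradient of mean squared error for y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def unlearn_step(w, forget, retain, lr=0.05, retain_weight=2.0):
    # + ascent on the forget data, - descent on the retain data
    return w + lr * sq_loss_grad(w, forget) - lr * retain_weight * sq_loss_grad(w, retain)

w = 1.0                  # initial model fits the forget point exactly
forget = [(1.0, 1.0)]
retain = [(1.0, 0.9)]
for _ in range(50):
    w = unlearn_step(w, forget, retain)

forget_err = abs(w * 1.0 - 1.0)
retain_err = abs(w * 1.0 - 0.9)
# w settles near 0.8: the forget point is now mispredicted more than the retain point
print(round(w, 2), round(forget_err, 2), round(retain_err, 2))
```

Weighting the retain gradient more heavily makes the dynamics converge instead of drifting, which is the role the retain-anchor and Fisher terms play in the full framework.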

Coreset Selectors (19)

| Category | Selectors |
|---|---|
| Gradient-Based | Influence Functions, TracIn, Gradient Norm, GradMatch/CRAIG, EL2N, Representer, Forgetting Score |
| Geometry-Based | k-Center, Herding, GLISTER, Submodular, k-Means++, Farthest First |
| Learning-Based | Forgetting Events, Data Shapley, Valuation Network, Active Learning, Loss Accumulation |
| Ensemble | Voting Selector, Auto-Selector, Weighted Fusion |
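As an illustration of the geometry-based family, here is a minimal farthest-first traversal (the greedy 2-approximation behind k-center selection); the function and data are hypothetical, not the `erasus.selectors` API:

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_center_greedy(points, k, start=0):
    """Greedy k-center: repeatedly add the point farthest from current centers."""
    centers = [start]
    # min distance from each point to the chosen centers so far
    d = [dist(p, points[start]) for p in points]
    while len(centers) < k:
        far = max(range(len(points)), key=lambda i: d[i])
        centers.append(far)
        d = [min(d[i], dist(points[i], points[far])) for i in range(len(points))]
    return centers

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 5.0)]
print(k_center_greedy(points, k=3))  # -> [0, 3, 2]: covers all clusters, skips the near-duplicate
```

The selected centers cover the embedding space with few samples, which is why geometry-based selectors pick a small but representative forget coreset.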

πŸ“Š Evaluation & Metrics

from erasus.metrics import MetricSuite
suite = MetricSuite(["accuracy", "mia", "perplexity"])
results = suite.run(model, forget_loader, retain_loader)

| Category | Metrics |
|---|---|
| Forgetting | MIA (+ LiRA, LOSS variants), Confidence, Feature Distance, Activation Analysis, Backdoor ASR, Extraction Attack |
| Utility | Accuracy, Perplexity, Retrieval (R@1/5/10), FID, BLEU, ROUGE, CLIP Score, Inception Score |
| Efficiency | Time Complexity, Memory Usage, Speedup Ratio, FLOPs Estimation |
| Privacy | Differential Privacy (ε, δ), Privacy Audit |
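To see why MIA AUC ≈ 0.5 signals forgetting, here is a minimal loss-based membership-inference sketch (illustrative, not the `erasus.metrics` implementation): the attacker treats lower loss as evidence of membership, and the AUC measures how separable forget-set losses are from held-out losses.

```python
def mia_auc(forget_losses, holdout_losses):
    """AUC of the rule 'lower loss => member'. 0.5 means the attacker is guessing."""
    wins = ties = 0
    for f in forget_losses:
        for h in holdout_losses:
            if f < h:
                wins += 1   # member scored more confidently than a non-member
            elif f == h:
                ties += 1
    return (wins + 0.5 * ties) / (len(forget_losses) * len(holdout_losses))

# Before unlearning: the model is confident (low loss) on forget data.
print(mia_auc([0.1, 0.2], [1.0, 1.2]))  # -> 1.0, attack succeeds
# After successful unlearning: forget losses look like held-out losses.
print(mia_auc([1.0, 1.2], [0.9, 1.3]))  # -> 0.5, attack reduced to chance
```

The real suite uses stronger attacks (LiRA, calibrated LOSS variants), but the reading of the headline number is the same.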

πŸ“ˆ Visualization

from erasus.visualization import (
    EmbeddingVisualizer,
    LossLandscapeVisualizer,
    GradientVisualizer,
    ReportGenerator,
)
from erasus.visualization.attention import AttentionVisualizer
from erasus.visualization.comparisons import ComparisonVisualizer

# t-SNE / PCA embeddings
viz = EmbeddingVisualizer(model)
viz.plot(data_loader, method="tsne")

# Loss landscape
landscape = LossLandscapeVisualizer(model)
landscape.plot_2d_contour(data_loader)

# Attention heatmaps (before vs. after)
attn_viz = AttentionVisualizer(model_after)
attn_viz.plot_attention_comparison(inputs, model_before)

# Before/after comparisons
comp = ComparisonVisualizer()
comp.plot_prediction_shift(model_before, model_after, forget_loader)
comp.plot_metric_comparison(metrics_before, metrics_after)

# HTML report
report = ReportGenerator("Unlearning Report")
report.add_metrics(metrics)
report.save("report.html")

πŸ” Certification & Privacy

from erasus.certification import CertifiedRemovalVerifier, UnlearningVerifier
# Formal (Ξ΅, Ξ΄)-removal verification
verifier = CertifiedRemovalVerifier(epsilon=1.0, delta=1e-5)
result = verifier.verify(unlearned_model, retrained_model, n_total=10000, n_forget=500)
print(f"Certified: {result['certified']}")
# Statistical verification
stat_verifier = UnlearningVerifier(significance=0.05)
tests = stat_verifier.verify_all(model, forget_loader, retain_loader)

Theoretical Bounds

from erasus.certification.bounds import TheoreticalBounds

# PAC-learning utility bound
bounds = TheoreticalBounds.pac_utility_bound(
    n_total=50000, n_forget=500, n_retain=49500, delta=0.05, model=model,
)
print(f"Utility drop bound: {bounds['pac_utility_drop_bound']:.4f}")

# Certified unlearning radius
radius = TheoreticalBounds.unlearning_radius(
    epsilon=1.0, delta=1e-5, n_forget=500,
)
print(f"Certified radius: {radius['certified_radius']:.4f}")

πŸ“‰ Loss Functions

| Loss | Description |
|---|---|
| Retain Anchor | Cross-entropy on retain data to preserve utility |
| Contrastive | CLIP-style contrastive loss for VLM alignment |
| KL Divergence | Distribution matching between models |
| MMD | Maximum Mean Discrepancy for distribution comparison |
| Fisher Regularization | Fisher information-weighted parameter penalty |
| Adversarial | GAN-style loss for indistinguishable forget/retain outputs |
| Triplet | Push forget embeddings away from retain-set anchors |
| L2 Regularization | Simple weight-drift penalty |
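These terms typically compose into a single unlearning objective. A hand-rolled sketch (function and parameter names are illustrative, not the `erasus.losses` API) combining a negated forget loss, a retain anchor, and an L2 drift penalty:

```python
def combined_unlearning_loss(forget_loss, retain_loss, w, w_orig,
                             retain_weight=1.0, l2_weight=0.1):
    """Scalar objective: minimizing it ASCENDS the forget loss (negated term),
    preserves utility via the retain anchor, and penalizes weight drift."""
    drift = sum((a - b) ** 2 for a, b in zip(w, w_orig))  # L2 regularization term
    return -forget_loss + retain_weight * retain_loss + l2_weight * drift

loss = combined_unlearning_loss(
    forget_loss=2.0, retain_loss=0.5, w=[1.0, 2.0], w_orig=[1.0, 1.0],
)
print(loss)  # -2.0 + 1.0 * 0.5 + 0.1 * 1.0 = -1.4
```

The weights trade forgetting strength against utility preservation, which is the same knob the Retain-Anchor and Fisher terms expose in the framework.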

πŸ§ͺ Experiment Tracking

from erasus.experiments import ExperimentTracker, HyperparameterSearch, AblationStudy

# Supports: "local", "wandb", "mlflow"
with ExperimentTracker("clip_unlearning", backend="wandb") as tracker:
    tracker.log_config({"strategy": "gradient_ascent", "lr": 1e-4})
    result = unlearner.fit(...)
    tracker.log_metrics({"mia_auc": 0.52, "accuracy": 0.94})
    tracker.log_model(model)

# Hyperparameter search (Optuna or random fallback)
search = HyperparameterSearch(
    objective_fn=my_objective,
    param_space={"lr": {"type": "float", "low": 1e-5, "high": 1e-2, "log": True}},
    n_trials=50,
)
best = search.run()

# Ablation studies
ablation = AblationStudy(base_config={...}, run_fn=run_trial)
ablation.run_full_ablation({"lr": [1e-3, 1e-4, 1e-5], "strategy": ["ga", "scrub"]})
print(ablation.summary())

πŸ“ Project Structure

erasus/
β”œβ”€β”€ core/ # Base classes, registry, config, types
β”œβ”€β”€ unlearners/ # High-level API (7 modality-specific unlearners)
β”œβ”€β”€ strategies/ # 27 unlearning algorithms (gradient, parameter, data, LLM, diffusion, VLM, ensemble)
β”œβ”€β”€ selectors/ # 19 coreset selection methods (gradient, geometry, learning, ensemble)
β”œβ”€β”€ metrics/ # 15+ evaluation metrics (forgetting, utility, efficiency, privacy)
β”œβ”€β”€ losses/ # 8 loss functions (retain-anchor, Fisher, adversarial, triplet, KL, MMD, L2)
β”œβ”€β”€ visualization/ # Embeddings, loss surfaces, gradients, attention heatmaps, comparisons, reports
β”œβ”€β”€ data/ # Dataset loaders (TOFU, WMDP, COCO, I2P, CC), preprocessing, partitioning
β”œβ”€β”€ models/ # 10 model wrappers (VLM, LLM, diffusion, audio, video)
β”œβ”€β”€ privacy/ # DP mechanisms, privacy accountant, certificates
β”œβ”€β”€ certification/ # Certified removal, statistical verification, theoretical bounds
β”œβ”€β”€ experiments/ # W&B / MLflow / local tracking, HPO, ablation studies
β”œβ”€β”€ cli/ # Command-line interface (unlearn, evaluate, benchmark, visualize)
└── utils/ # Checkpointing, distributed, helpers, logging, callbacks, early stopping

πŸ† Benchmarks

Run standardized benchmarks:

# TOFU Benchmark (LLM unlearning)
python benchmarks/tofu/run.py --strategies gradient_ascent,scrub --epochs 5
# Coreset comparison (knowledge_distillation × all selectors)
python benchmarks/tofu/run_coreset_comparison.py
# MUSE Benchmark (all strategies, leaderboard)
python benchmarks/muse/run_all_strategies.py
# WMDP Benchmark (hazardous knowledge, all strategies)
python benchmarks/wmdp/run_all_strategies.py --subsets bio,cyber
# Full suite
bash scripts/run_benchmarks.sh

πŸ§‘β€πŸ’» Examples

| Example | Description |
|---|---|
| CLIP Coreset Comparison | Compare random vs. gradient_norm selectors |
| LLaVA Unlearning | VLM unlearning with gradient ascent |
| LLaMA Concept Removal | Remove concepts from LLaMA |
| GPT-2 Strategy Comparison | Compare gradient_ascent vs. negative_gradient |
| LoRA Efficient Unlearning | Parameter-efficient unlearning |
| SD NSFW Removal | Remove NSFW concepts (Notebook) |
| SD Artist Removal | Remove artist styles |
| TOFU Benchmark | End-to-end benchmark (Leaderboard) |
| Coreset Comparison | knowledge_distillation × all selectors |
| MUSE Leaderboard | All strategies on MUSE-style data |
| WMDP Leaderboard | All strategies on WMDP hazardous knowledge |
| CLIP Object Removal | Remove visual concepts from VLM (MiniCLIP demo) |
| Code Copyright Removal | Remove proprietary code from LLM (MiniCodeGPT demo) |

βœ… Test Status

340 tests passed βœ… | 0 failed | 54s
python -m pytest tests/ -v --tb=short
| Test Suite | Status |
|---|---|
| Integration (pipelines) | ✅ |
| End-to-end | ✅ |
| Unit (selectors) | ✅ |
| Unit (strategies) | ✅ |
| Unit (metrics) | ✅ |
| Core / imports / components | ✅ |

πŸ“š Research References

Erasus integrates and builds upon these key works:

| Method | Paper | Venue |
|---|---|---|
| SCRUB | Kurmanji et al. | NeurIPS 2023 |
| Selective Synaptic Dampening | Foster et al. | AAAI 2024 |
| Concept Erasure (ESD) | Gandikota et al. | ICCV 2023 |
| Gradient Ascent | Golatkar et al. | CVPR 2020 |
| Fisher Forgetting | Golatkar et al. | CVPR 2020 |
| CRAIG | Mirzasoleiman et al. | ICML 2020 |
| GLISTER | Killamsetty et al. | AAAI 2021 |
| Influence Functions | Koh & Liang | ICML 2017 |
| TracIn | Pruthi et al. | NeurIPS 2020 |
| Data Shapley | Ghorbani & Zou | ICML 2019 |
| Forgetting Events | Toneva et al. | ICLR 2019 |
| EL2N | Paul et al. | NeurIPS 2021 |
| Amnesiac ML | Graves et al. | AAAI 2021 |

πŸ—ΊοΈ Roadmap

  • Core framework (base classes, registry, config)
  • 10 model architectures
  • 27 unlearning strategies (gradient, parameter, data, LLM, diffusion, VLM, ensemble)
  • 19 coreset selectors
  • 15+ evaluation metrics (forgetting, utility, efficiency, privacy)
  • 8 loss functions (Fisher, adversarial, triplet, L2, retain-anchor, KL, MMD, contrastive)
  • Visualization suite (embeddings, landscapes, gradients, attention, comparisons, reports)
  • CLI (erasus unlearn, erasus evaluate, erasus benchmark, erasus visualize)
  • Certification & privacy modules + theoretical bounds (PAC, influence, certified radius)
  • Experiment tracking (W&B, MLflow, local) + HPO + ablation studies
  • Benchmark runners (TOFU, WMDP)
  • Callbacks & early stopping
  • 340+ passing tests
  • Additional model architectures (Flamingo, T5, DALL-E, Wav2Vec)
  • HuggingFace Hub integration
  • Interactive Gradio/Streamlit dashboard
  • Tutorial notebooks
  • PyPI release

πŸ’‘ Project ideas

See project_ideas.md for extension ideas: more SOTA algorithms, benchmarks, integrations, and research directions. Paper reproductions live in papers/reproductions/ (e.g. SCRUB, SSD, Concept Erasure, Fisher Forgetting, SISA, Amnesiac).

🀝 Contributing

Contributions are welcome, whether that's new unlearning strategies, coreset selectors, model support, or documentation.

# Setup development environment
git clone https://github.com/OnePunchMonk/erasus.git
cd erasus
pip install -e ".[dev]"
python -m pytest tests/ -v

πŸ“œ License

MIT License β€” see LICENSE for details.


πŸ“– Citation

@software{erasus2026,
  title={Erasus: Universal Machine Unlearning via Coreset Selection},
  author={Aggarwal, Avaya},
  year={2026},
  url={https://github.com/OnePunchMonk/erasus}
}

Built with ❀️ for the machine unlearning research community
