
SparseSSM

Efficient Selective Structured State Space Models Can Be Pruned in One-Shot

State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) framework to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends easily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art for pruning Mamba-based LLMs.
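For background, the following is a minimal sketch of the classic OBS-style saliency score that SparseSSM extends (weight importance measured against the diagonal of the inverse Hessian). It is the textbook layer-wise criterion, not the paper's time-step-aggregated score, and all names are illustrative.

import torch

def obs_saliency(weight: torch.Tensor, hessian_inv_diag: torch.Tensor) -> torch.Tensor:
    """Classic optimal-brain-surgeon saliency: s_i = w_i^2 / (2 * [H^-1]_ii).

    Smaller saliency means removing that weight perturbs the layer-wise
    loss the least, so one-shot pruning removes the lowest-saliency weights.
    """
    return weight.pow(2) / (2.0 * hessian_inv_diag.clamp_min(1e-12))

def one_shot_mask(weight: torch.Tensor, hessian_inv_diag: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask (1 = keep) that zeroes the `sparsity` fraction of lowest-saliency weights."""
    scores = obs_saliency(weight, hessian_inv_diag)
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores.flatten(), k).values if k > 0 else scores.min() - 1
    return (scores > threshold).to(weight.dtype)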

πŸš€ Quick Start

1. Install environment

git clone https://github.com/CFinTech/SparseSSM
cd SparseSSM
pip install -r requirements.txt

2. Download dataset

The calibration data can be downloaded here.
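Alternatively, the wikitext2 calibration corpus referenced in the command below can be fetched directly from the Hugging Face Hub. This is a minimal sketch using the standard datasets loader, bypassing the repository's own data pipeline:

from datasets import load_dataset

# WikiText-2 (raw) is a common calibration corpus for one-shot pruning;
# --nsamples 64 in the command below corresponds to 64 calibration sequences.
traindata = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
print(traindata[0]["text"][:200])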

3. Execute

To prune the SSM module, you can run the following command:

CUDA_VISIBLE_DEVICES=${your_gpu_id} python main.py \
 path/to/your/model wikitext2 \
 --experiment_name your_experiment_name \
 --method "sparsessm_dev" \
 --save path/to/pruned_model \
 --sparsity 0.5 \
 --nsamples 64 \
 --minlayer 0 \
 --maxlayer 100 \
 --prune_A True \
 --do_prune \
 --eval_zero_shot \
 --log_wandb
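As a quick sanity check after pruning, the achieved sparsity can be inspected directly. This is a minimal sketch that assumes the pruned weights at the --save path are an ordinary PyTorch state dict; the actual format written by main.py may differ.

import torch

# Report the fraction of zeroed entries in each weight matrix of the pruned checkpoint.
state_dict = torch.load("path/to/pruned_model", map_location="cpu")
for name, tensor in state_dict.items():
    if tensor.ndim >= 2:  # weight matrices only
        zero_frac = (tensor == 0).float().mean().item()
        print(f"{name}: {zero_frac:.2%} zeros")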

πŸ–ΌοΈ Method Overview


Illustration of SparseSSM. The first row depicts the evolution of the diagonal parameter matrix $A_{\log}$ within the SSM module in Mamba, together with a schematic of the forward-propagation process. In the second row, the left panel shows the procedure for obtaining a mask from the Hessian estimate at a single time step, while the right panel presents our weighted strategy for merging the masks across all time steps; a darker background indicates a larger weight.
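To make the merging step concrete, here is a minimal sketch of one plausible reading of the weighted strategy: per-time-step binary masks are combined by a weighted vote, and the lowest-voted weights are pruned. The weighting scheme, tie handling, and function name are illustrative assumptions, not the paper's exact algorithm.

import torch

def merge_timestep_masks(masks_per_step: torch.Tensor, step_weights: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Merge per-time-step binary pruning masks into a single mask.

    masks_per_step: (T, *weight_shape) binary masks, one per time step (1 = keep).
    step_weights:   (T,) non-negative per-step weights (darker background = larger
                    weight in the figure); normalized to sum to 1 here.
    """
    w = step_weights / step_weights.sum()
    votes = torch.einsum("t...,t->...", masks_per_step.float(), w)  # weighted keep-vote per weight
    k = int(sparsity * votes.numel())                               # number of weights to prune
    threshold = torch.kthvalue(votes.flatten(), k).values if k > 0 else votes.min() - 1
    return (votes > threshold).float()                              # keep the strongest-voted weights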

πŸ“Š Comparison of Experimental Results

Performance analysis for one-shot unstructured pruning of SSM modules in Mamba models at 50% sparsity.


πŸ™ Acknowledgements

Citation

If you find this work useful for your research, please consider citing our paper:

@article{tuo2025sparsessm,
 title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
 author={Kaiwen Tuo and Huan Wang},
 journal={arXiv preprint arXiv:2506.09613},
 year={2025},
}
