serp-ai/Parameter-Efficient-MoE


This fork introduces Mistral support (aka sparsetral).

Original README


Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

News

Introduction

We present Parameter-Efficient Sparsity Crafting to help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and utilizes an MoE structure in an efficient way.

Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including QLoRA and adapters, to perform efficient sparse upcycling.

The repo supports the training of dense models (LLaMA 2, Yi, Qwen1.5, etc.).
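Roughly speaking, sparse upcycling here can be pictured as keeping each pretrained FFN frozen and attaching a small router plus a set of lightweight bottleneck adapters that act as experts, so that only the router, the adapters, and the QLoRA weights are trained. The PyTorch sketch below illustrates that layer structure under these assumptions; class and argument names (`AdapterExpert`, `SparseAdapterFFN`, `adapter_dim`, `num_experts`, `top_k`) are hypothetical and not taken from the repo's implementation.

```python
# Minimal sketch of an adapter-based MoE layer in the spirit of Parameter-Efficient
# Sparsity Crafting. Class and argument names are hypothetical, not the repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdapterExpert(nn.Module):
    """A lightweight bottleneck adapter acting as one expert."""
    def __init__(self, hidden_size: int, adapter_dim: int):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_dim, bias=False)
        self.up = nn.Linear(adapter_dim, hidden_size, bias=False)

    def forward(self, x):
        return self.up(F.silu(self.down(x)))


class SparseAdapterFFN(nn.Module):
    """Frozen pretrained FFN shared by every token, plus a router over small
    adapter experts; only the router and the adapters receive gradients."""
    def __init__(self, ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2, adapter_dim: int = 64):
        super().__init__()
        self.ffn = ffn
        for p in self.ffn.parameters():        # keep the dense FFN frozen
            p.requires_grad = False
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [AdapterExpert(hidden_size, adapter_dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, seq, hidden)
        shared = self.ffn(x)                   # shared dense computation
        logits = self.router(x)                # (batch, seq, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)
            for k in range(self.top_k):        # add each selected adapter's output
                mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)
                out = out + mask * weights[..., k:k + 1] * expert_out
        return shared + out


# Tiny usage example with a toy dense FFN.
hidden = 16
dense_ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(), nn.Linear(4 * hidden, hidden))
layer = SparseAdapterFFN(dense_ffn, hidden_size=hidden, num_experts=4, top_k=2, adapter_dim=8)
y = layer(torch.randn(2, 5, hidden))           # -> shape (2, 5, 16)
```

Because the dense FFN stays frozen and each expert is only a small bottleneck, the trainable parameters added per layer are a small fraction of the original FFN, which is what keeps crafting many experts parameter-efficient; QLoRA further reduces the memory needed for training.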

Model Lists

| Camelidae Series | Download |
| --- | --- |
| Camelidae-8x7B | 🤗 HuggingFace |
| Camelidae-8x13B | 🤗 HuggingFace |
| Camelidae-8x34B | 🤗 HuggingFace |
| Camelidae-8x34B-pro | 🤗 Coming Soon |

| Qwen2idae Series | Download |
| --- | --- |
| Qwen2idae-16x14B-v1.0 | 🤗 HuggingFace |
| Qwen2idae-16x7B-v1.0 | 🤗 Coming Soon |
| Qwen2idae-16x1.8B-v1.0 | 🤗 Coming Soon |
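To fetch a released checkpoint ahead of time (rather than on the first `from_pretrained` call), the standard `huggingface_hub` client can be used. The snippet below is a sketch that assumes the model IDs follow the `hywu/<model-name>` pattern used in the Usage examples further down.

```python
from huggingface_hub import snapshot_download

# Download all files of a released checkpoint into the local Hugging Face cache.
# "hywu/Camelidae-8x7B" is assumed from the naming pattern in the Usage section.
local_dir = snapshot_download("hywu/Camelidae-8x7B")
print(local_dir)
```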

Performance

| Model | Activated Params | MMLU (5-shot) | GSM8k (5-shot) | MATH (4-shot) | HumanEval (0-shot) | MBPP (4-shot) | HellaSwag (10-shot) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT3.5 | - | 70.0% | 57.1% | **34.1%** | **48.1%** | - | **85.5%** |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | **75.7%** | **79.4%** | **24.0%** | **48.8%** | **43.2%** | 85.2% |
| Camelidae-8x34B | 35B | **75.6%** | **78.3%** | 22.6% | 43.9% | **41.4%** | **85.3%** |
| SUSChat-34B | 34B | **76.4%** | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | **77.8%** | **29.9%** | **62.8%** | **48.6%** | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | **86.5%** |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

We bold the top-3 scores in each column across all models.

Usage

Camelidae

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because the sparse adapter/MoE modules
# are defined in the model repository itself.
tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()

# Camelidae models expect the "### Human: / ### Assistant:" prompt format.
inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
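For single-GPU setups it may help to load the model with 4-bit quantization via bitsandbytes. This is a sketch using the standard transformers `BitsAndBytesConfig`; whether the model's custom (trust_remote_code) layers load cleanly under 4-bit quantization should be verified against the repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative only: 4-bit NF4 quantization to reduce memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "hywu/Camelidae-8x34B",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
).eval()
```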

Qwen2idae

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required for the custom sparse adapter/MoE modules.
tokenizer = AutoTokenizer.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", device_map="auto", trust_remote_code=True).eval()

# Qwen2idae models use the ChatML prompt format.
inputs = tokenizer('<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
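By default `generate()` returns only a short continuation; for chat-style answers you will usually pass explicit generation arguments. The values below are illustrative, not settings recommended by the authors.

```python
# Decode only the newly generated tokens after the prompt.
prompt_len = inputs["input_ids"].shape[-1]
pred = model.generate(
    **inputs,
    max_new_tokens=512,   # allow a full-length answer
    do_sample=True,       # sampling; set to False for greedy decoding
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(pred[0][prompt_len:], skip_special_tokens=True))
```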

Citation

@article{wu2024parameter,
 title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
 author={Wu, Haoyuan and Zheng, Haisheng and He, Zhuolun and Yu, Bei},
 journal={arXiv preprint arXiv:2401.02731},
 year={2024}
}

License

The source code in this repo is licensed under the Apache 2.0 License. The Camelidae and Qwen2idae models are developed for academic research and free commercial use; all usage must also adhere to the licenses from facebookresearch, 01-ai, and Qwen1.5.
