# StarCoder 2

<p align="center"><a href="https://huggingface.co/bigcode">[🤗 Models]</a> | <a href="">[Paper]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode">[VSCode]</a>
</p>

StarCoder2 is a family of code generation models (3B, 7B, and 15B) trained on 600+ programming languages from [The Stack v2]() and some natural-language text such as Wikipedia, arXiv, and GitHub issues. The models use Grouped Query Attention, a context window of 16,384 tokens, and sliding window attention of 4,096 tokens. The 3B and 7B models were trained on 3+ trillion tokens, while the 15B was trained on 4+ trillion tokens.
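
If you want to verify these architecture settings yourself, you can read them from the model config. This is a minimal sketch; the field names below are standard `transformers` config attributes and are assumptions, not quoted from this README:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder2-15b")
print(config.max_position_embeddings)  # context window, expected 16384
print(config.sliding_window)           # sliding window attention size, expected 4096
print(config.num_key_value_heads)      # smaller than num_attention_heads with Grouped Query Attention
```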


# Disclaimer

Before you can use the models, go to `hf.co/bigcode/starcoder2-15b`, accept the agreement, and make sure you are logged in to the Hugging Face Hub:
```bash
huggingface-cli login
```

# Table of Contents
1. [Quickstart](#quickstart)
    - [Installation](#installation)
    - [Model usage and memory footprint](#model-usage-and-memory-footprint)
    - [Text-generation-inference code](#text-generation-inference)
2. [Fine-tuning](#fine-tuning)
    - [Setup](#setup)
    - [Training](#training)
3. [Evaluation](#evaluation)

# Quickstart
StarCoder2 models are intended for code completion; they are not instruction-tuned, so prompts like "Write a function that computes the square root." do not work well.
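
Besides plain left-to-right completion, the models support fill-in-the-middle (FIM) style prompting. The sketch below assumes the tokenizer ships `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` special tokens; check the tokenizer vocabulary if the output looks off:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to("cuda")

# Ask the model to fill in the code between a prefix and a suffix.
prompt = "<fim_prefix>def fibonacci(n):\n    <fim_suffix>\n    return result<fim_middle>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```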

## Installation
First, install all the libraries listed in `requirements.txt`:
```bash
pip install -r requirements.txt
# export your HF token, found here: https://huggingface.co/settings/account
export HF_TOKEN=xxx
```

## Model usage and memory footprint
Here are some examples of loading the model and generating code. Make sure you've installed `transformers` from source (which should be the case if you used `requirements.txt`). We also report the memory footprint of the largest model, `StarCoder2-15B`, for each setup.


### Running the model on CPU / single GPU / multiple GPUs
```python
# pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to use multiple GPUs, do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
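
By default, `generate` uses greedy decoding and a short completion length. For longer or more varied completions, you can pass standard generation arguments; the values below are illustrative, not taken from the original README, and continue from the snippet above:

```python
outputs = model.generate(
    inputs,
    max_new_tokens=128,   # generate up to 128 new tokens
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.2,      # a low temperature keeps code completions focused
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```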

### Running the model on a GPU using different precisions

* _Using `torch.bfloat16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# for fp16 use `torch_dtype=torch.float16` instead
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 32251.33 MB
```

#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4-bit, use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config, device_map="auto")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```python
# load_in_8bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 16900.18 MB
# load_in_4bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 9224.60 MB
```
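
For 4-bit loading, you can also spell out the quantization settings explicitly. The configuration below is a sketch with commonly used values (NF4 with a bfloat16 compute dtype), not settings prescribed by this README:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type for the 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-15b", quantization_config=quantization_config, device_map="auto"
)
```
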
You can also use `pipeline` for the generation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder2-15b"
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))
```
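The pipeline returns a list of dictionaries; with default settings, each entry's `generated_text` field contains the prompt followed by the completion, e.g. `[{'generated_text': 'def hello():\n    ...'}]` (the exact completion will vary).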

## Text-generation-inference: TODO

```bash
docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d ghcr.io/huggingface/text-generation-inference:latest --model-id bigcode/starcoder2-15b --max-total-tokens 8192
```
For more details, see [here](https://github.com/huggingface/text-generation-inference).
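
Once the container is up, you can send generation requests to it over HTTP. A minimal sketch; the port matches the `docker run` command above, and the prompt and parameters are just examples:

```python
# pip install requests
import requests

response = requests.post(
    "http://127.0.0.1:8080/generate",  # port published by the docker command above
    json={
        "inputs": "def print_hello_world():",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=60,
)
print(response.json()["generated_text"])
```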

# Fine-tuning

Here, we showcase how you can fine-tune StarCoder2 models.

## Setup

Install `pytorch` ([see documentation](https://pytorch.org/)); for example, the following command works with CUDA 12.1:
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```

Install the requirements (this installs `transformers` from source to support the StarCoder2 architecture):
```bash
pip install -r requirements.txt
```

Before you run any of the scripts, make sure you are logged in to `wandb` and the Hugging Face Hub so you can push the checkpoints:
```bash
wandb login
huggingface-cli login
```
Now that everything is set up, clone the repository and change into the corresponding directory.
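For example (assuming this README lives in the `bigcode-project/starcoder2` repository; adjust the URL if you are working from a fork):
```bash
git clone https://github.com/bigcode-project/starcoder2.git
cd starcoder2
```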

## Training
To fine-tune efficiently and cheaply, we use the [PEFT](https://github.com/huggingface/peft) library for Low-Rank Adaptation (LoRA) training and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for 4-bit quantization. We also use the `SFTTrainer` from [TRL](https://github.com/huggingface/trl).

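For reference, the core of this setup looks roughly like the sketch below. It is illustrative only: the hyperparameter values are placeholders, the actual `finetune.py` in this repository may differ, and depending on your TRL version some of these arguments may belong in an `SFTConfig`/`TrainingArguments` object instead.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load the base model in 4-bit to keep memory usage low.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=bnb_config, device_map="auto")

# Train small low-rank adapters instead of the full model.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# The Rust subset of the-stack-smol, as in the command further below.
dataset = load_dataset("bigcode/the-stack-smol", data_dir="data/rust", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="content",
    max_seq_length=1024,
    tokenizer=tokenizer,
)
trainer.train()
```
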
For this example, we will fine-tune StarCoder2-3b on the `Rust` subset of [the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol). This is just for illustration purposes; for a larger and cleaner dataset of Rust code, you can use [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup).

To launch the training:
```bash
accelerate launch finetune.py \
        --model_id "bigcode/starcoder2-3b" \
        --dataset_name "bigcode/the-stack-smol" \
        --subset "data/rust" \
        --dataset_text_field "content" \
        --split "train" \
        --max_seq_length 1024 \
        --max_steps 10000 \
        --micro_batch_size 1 \
        --gradient_accumulation_steps 8 \
        --learning_rate 2e-5 \
        --warmup_steps 20 \
        --num_proc "$(nproc)"
```

If you want to fine-tune on other text datasets, change the `dataset_text_field` argument to the name of the column containing the code/text you want to train on.

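Since this is LoRA training, the run produces adapter weights rather than a full model. A sketch of loading them for inference; the adapter path is a placeholder and depends on how you configured the output directory or Hub repo:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "bigcode/starcoder2-3b"
adapter_path = "path/to/your/checkpoint"  # local output dir or a Hub repo id (placeholder)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)  # attach the trained LoRA adapters

inputs = tokenizer("fn main() {", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```
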
# Evaluation
To evaluate StarCoder2 and its derivatives, you can use the [BigCode-Evaluation-Harness](https://github.com/bigcode-project/bigcode-evaluation-harness), a framework for evaluating code LLMs.
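
For instance, a HumanEval run might look like the command below. The flags are typical of the harness but may change between versions, so check its README before running:

```bash
accelerate launch main.py \
        --model bigcode/starcoder2-3b \
        --tasks humaneval \
        --max_length_generation 512 \
        --temperature 0.2 \
        --n_samples 20 \
        --batch_size 10 \
        --allow_code_execution
```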