Say383/starcoder.cpp

💫StarCoder in C++

This is a C++ example running 💫 StarCoder inference using the ggml library.

The program runs on the CPU; no GPU is required.

The example supports the following 💫 StarCoder models:

  • bigcode/starcoder
  • bigcode/gpt_bigcode-santacoder aka the smol StarCoder
  • HuggingFaceH4/starchat-beta - a coding assistant based on StarCoderPlus

Sample performance on MacBook M1 Pro:

TODO

Sample output:

$ ./bin/starcoder -h
usage: ./bin/starcoder [options]
options:
  -h, --help            show this help message and exit
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 8)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 200)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 1.0)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/starcoder-117M/ggml-model.bin)
$ ./bin/starcoder -m ../models/bigcode/gpt_bigcode-santacoder-ggml-q4_1.bin -p "def fibonnaci(" -t 4 --top_k 0 --top_p 0.95 --temp 0.2 
main: seed = 1683881276
starcoder_model_load: loading model from '../models/bigcode/gpt_bigcode-santacoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49280
starcoder_model_load: n_ctx = 2048
starcoder_model_load: n_embd = 2048
starcoder_model_load: n_head = 16
starcoder_model_load: n_layer = 24
starcoder_model_load: ftype = 3
starcoder_model_load: ggml ctx size = 1794.90 MB
starcoder_model_load: memory size = 768.00 MB, n_mem = 49152
starcoder_model_load: model size = 1026.83 MB
main: prompt: 'def fibonnaci('
main: number of tokens in prompt = 7, first 8 tokens: 563 24240 78 2658 64 2819 7 
def fibonnaci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
print(fibo(10))
main: mem per token = 9597928 bytes
main: load time = 480.43 ms
main: sample time = 26.21 ms
main: predict time = 3987.95 ms / 19.36 ms per token
main: total time = 4580.56 ms
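
A few of the numbers in the log above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the values are copied from the log, and the cache layout (keys and values each stored as 32-bit floats of width n_embd) is an assumption that happens to reproduce the reported memory size, not something the log states.

```python
# Values copied from the starcoder_model_load and timing lines of the log above.
n_layer, n_ctx, n_embd = 24, 2048, 2048
predict_ms, ms_per_token = 3987.95, 19.36

# n_mem as the product of layer count and context length (matches the log).
n_mem = n_layer * n_ctx

# Assumption: keys and values are each stored as 32-bit floats of width n_embd.
BYTES_PER_FLOAT = 4
kv_cache_mb = 2 * n_mem * n_embd * BYTES_PER_FLOAT / (1024 * 1024)

# Throughput and the approximate number of tokens covered by the predict timer.
tokens_per_second = 1000.0 / ms_per_token
tokens_timed = predict_ms / ms_per_token

print(f"n_mem = {n_mem}")
print(f"KV cache = {kv_cache_mb:.2f} MB")
print(f"throughput = {tokens_per_second:.1f} tokens/s over about {tokens_timed:.0f} tokens")
```

Under that assumption the script prints n_mem = 49152 and a 768.00 MB cache, the same values as the log, and about 51.7 tokens per second.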

Quick start

git clone https://github.com/bigcode-project/starcoder.cpp
cd starcoder.cpp
# Convert HF model to ggml
python convert-hf-to-ggml.py bigcode/gpt_bigcode-santacoder
# Build ggml libraries
make
# Quantize the model (type 3 = q4_1)
./quantize models/bigcode/gpt_bigcode-santacoder-ggml.bin models/bigcode/gpt_bigcode-santacoder-ggml-q4_1.bin 3
# run inference
./main -m models/bigcode/gpt_bigcode-santacoder-ggml-q4_1.bin -p "def fibonnaci(" --top_k 0 --top_p 0.95 --temp 0.2
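
The run command above passes --top_k 0 --top_p 0.95 --temp 0.2. As a rough illustration of what such options usually mean, here is a minimal Python sketch of temperature, top-k and top-p (nucleus) sampling. It is not the repository's C++ code: the function name sample_token is hypothetical, and treating a top_k of 0 or less as "top-k disabled" is an assumption.

```python
import math
import random


def sample_token(logits, top_k=40, top_p=0.9, temp=1.0, rng=None):
    """Pick one token id from raw logits (illustrative sketch only)."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    rng = rng or random.Random()

    # Temperature: divide the logits; values below 1.0 sharpen the distribution.
    scaled = sorted(((l / temp, i) for i, l in enumerate(logits)), reverse=True)

    # Top-k: keep only the k most likely tokens.
    # Assumption: top_k <= 0 means "do not apply top-k".
    if top_k > 0:
        scaled = scaled[:top_k]

    # Softmax over the remaining candidates (shifted by the max for stability).
    peak = scaled[0][0]
    weights = [math.exp(s - peak) for s, _ in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    cutoff = len(probs)
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            cutoff = count
            break

    kept_ids = [i for _, i in scaled[:cutoff]]
    return rng.choices(kept_ids, weights=probs[:cutoff], k=1)[0]


if __name__ == "__main__":
    # Toy logits for a 4-token vocabulary, with the settings from the command above.
    print(sample_token([1.0, 3.0, 0.5, 2.5], top_k=0, top_p=0.95, temp=0.2))
```

A low temperature such as 0.2 concentrates almost all probability on the top candidates, which tends to suit code completion, where near-deterministic output is usually preferable.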

Downloading and converting the original models (💫 StarCoder)

You can download the original model and convert it to ggml format using the script convert-hf-to-ggml.py:

# Convert HF model to ggml
python convert-hf-to-ggml.py bigcode/gpt_bigcode-santacoder

This conversion requires Python and the Hugging Face Transformers library to be installed on your computer.
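
Before running the conversion, it can help to confirm that the required package is importable. The following is a small pre-flight sketch; the helper name missing_modules is hypothetical, and the REQUIRED list contains only what this README names, so the script may need further packages.

```python
import importlib.util

# Only Transformers is named in this README; extend the list if the
# conversion script turns out to import other packages.
REQUIRED = ["transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```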

Quantizing the models

You can also quantize the ggml models to 4-bit integers, which cuts their size to roughly a fifth:

# Quantize the model (type 3 = q4_1)
./quantize models/bigcode/gpt_bigcode-santacoder-ggml.bin models/bigcode/gpt_bigcode-santacoder-ggml-q4_1.bin 3

| Model | Original size | Quantized size | Quantization type |
|---|---|---|---|
| bigcode/gpt_bigcode-santacoder | 5396.45 MB | 1026.83 MB | 4-bit integer (q4_1) |
| bigcode/starcoder | 71628.23 MB | 13596.23 MB | 4-bit integer (q4_1) |
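
To give a feel for what 4-bit integer quantization does to the weights, here is a minimal Python sketch in the spirit of q4_1: each block of values is stored as a minimum, a scale, and one 4-bit code per value. The block size of 32, the rounding rule, and the function names are assumptions for illustration; the real ggml format also packs two codes per byte, which is omitted here.

```python
BLOCK_SIZE = 32  # assumed block size for this illustration
LEVELS = 15      # a 4-bit code can take the values 0..15


def quantize_block(values):
    """Map floats to 4-bit codes plus a per-block minimum and scale."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / LEVELS
    if scale == 0.0:
        scale = 1.0  # constant block: every code is 0, any scale works
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return lo, scale, codes


def dequantize_block(lo, scale, codes):
    """Reconstruct approximate floats from the stored block."""
    return [lo + scale * c for c in codes]


if __name__ == "__main__":
    weights = [i / 10 for i in range(BLOCK_SIZE)]
    lo, scale, codes = quantize_block(weights)
    restored = dequantize_block(lo, scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"scale = {scale:.4f}, worst-case error = {worst:.4f}")
```

The reconstruction error per value is bounded by half the scale, so blocks with a narrow value range lose very little precision. Storing 4-bit codes plus a small per-block overhead is what makes the quantized files in the table roughly 5.3 times smaller than the originals.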

iOS App

The repo includes a proof-of-concept iOS app in the StarCoderApp directory. You need to provide the converted (and possibly quantized) model weights, placing a file called bigcode_ggml_model.bin.bin inside that folder. This is what it looks like on an iPhone:

[Image: starcoder-ios-screenshot]
