candle


Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use. Try our online demos: whisper, llama2.

use candle_core::{Device, Tensor};
fn main() -> candle_core::Result<()> {
    let a = Tensor::randn(0f32, 1., (2, 3), &Device::Cpu)?;
    let b = Tensor::randn(0f32, 1., (3, 4), &Device::Cpu)?;
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}

Check out our examples:

Run them using the following commands:

cargo run --example whisper --release
cargo run --example llama --release
cargo run --example falcon --release
cargo run --example bert --release
cargo run --example bigcode --release
cargo run --example stable-diffusion --release --features image -- --prompt "a rusty robot holding a fire torch"

To use CUDA, add --features cuda to the example command line.
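For instance, following that pattern, the llama example on a CUDA GPU becomes:

cargo run --example llama --release --features cuda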

There are also some wasm examples for whisper and llama2.c. You can either build them with trunk or try them online: whisper, llama2.

For llama2, run the following commands to retrieve the weight files and start a test server:

cd candle-wasm-examples/llama2-c
wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/model.bin
wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/tokenizer.json
trunk serve --release --public-url /candle-llama2/ --port 8081

And then head over to http://localhost:8081/candle-llama2.

Features

  • Simple syntax, looks and feels like PyTorch.
  • CPU and CUDA backends; support for Apple M1, f16, and bf16 (see the sketch after this list).
  • Serverless (on CPU): small and fast deployments.
  • WASM support, run your models in a browser.
  • Model training.
  • Distributed computing using NCCL.
  • Model support out of the box: Llama, Whisper, Falcon, StarCoder...
  • Embed user-defined ops/kernels, such as flash-attention v2.
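
A minimal sketch of the CUDA and reduced-precision support, assuming a build with --features cuda (Device::new_cuda selects a GPU by index):

use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Select the first CUDA GPU; requires building with `--features cuda`.
    let device = Device::new_cuda(0)?;
    // Create an f32 tensor on the GPU, then cast it to bf16.
    let x = Tensor::randn(0f32, 1., (4, 4), &device)?;
    let y = x.to_dtype(DType::BF16)?;
    let z = y.matmul(&y)?;
    println!("{z}");
    Ok(())
}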

How to use

Cheatsheet:

|            | Using PyTorch                      | Using Candle                                                                 |
|------------|------------------------------------|------------------------------------------------------------------------------|
| Creation   | torch.Tensor([[1, 2], [3, 4]])     | Tensor::new(&[[1f32, 2.], [3., 4.]], &Device::Cpu)?                          |
| Creation   | torch.zeros((2, 2))                | Tensor::zeros((2, 2), DType::F32, &Device::Cpu)?                             |
| Indexing   | tensor[:, :4]                      | tensor.i((.., ..4))?                                                         |
| Operations | tensor.view((2, 2))                | tensor.reshape((2, 2))?                                                      |
| Operations | a.matmul(b)                        | a.matmul(&b)?                                                                |
| Arithmetic | a + b                              | &a + &b                                                                      |
| Device     | tensor.to(device="cuda")           | tensor.to_device(&Device::new_cuda(0)?)?                                     |
| Dtype      | tensor.to(dtype=torch.float16)     | tensor.to_dtype(DType::F16)?                                                 |
| Saving     | torch.save({"A": A}, "model.bin")  | candle::safetensors::save(&HashMap::from([("A", A)]), "model.safetensors")?  |
| Loading    | weights = torch.load("model.bin")  | candle::safetensors::load("model.safetensors", &device)?                     |
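
A few of these rows put together in a runnable sketch (note that the i indexing method requires the IndexOp trait to be in scope):

use candle_core::{Device, IndexOp, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    // Creation: the analogue of torch.Tensor([[1, 2], [3, 4]]).
    let t = Tensor::new(&[[1f32, 2.], [3., 4.]], &device)?;
    // Indexing: keep all rows and the first column, like t[:, :1].
    let col = t.i((.., ..1))?;
    // Reshape: the analogue of t.view((1, 4)).
    let flat = t.reshape((1, 4))?;
    // Arithmetic on references returns a Result and avoids moving the tensors.
    let sum = (&t + &t)?;
    println!("{col}\n{flat}\n{sum}");
    Ok(())
}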

Structure

  • candle-core: core ops, devices, and the Tensor struct definition.
  • candle-nn: utilities to build real models.
  • candle-examples: examples using the library in realistic settings.
  • candle-kernels: CUDA custom kernels.
  • candle-wasm-examples: the whisper and llama2.c browser examples.

FAQ

Why should I use Candle?

Candle's core goal is to make serverless inference possible. Full machine learning frameworks like PyTorch are very large, which makes creating instances on a cluster slow. Candle allows deployment of lightweight binaries.

Secondly, Candle lets you remove Python from production workloads. Python overhead can seriously hurt performance, and the GIL is a notorious source of headaches.

Finally, Rust is cool! A lot of the HF ecosystem already has Rust crates, like safetensors and tokenizers.

Other ML frameworks

  • dfdx is a formidable crate, with shapes included in the types. This prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat. However, we found that some features still require nightly, and writing code can be a bit daunting for non-Rust experts.

    We're leveraging and contributing to other core crates for the runtime, so hopefully both crates can benefit from each other.

  • burn is a general crate that can leverage multiple backends so you can choose the best engine for your workload.

  • tch-rs provides bindings to the torch library in Rust. It is extremely versatile, but it brings the entire torch library into the runtime. The main contributor of tch-rs is also involved in the development of candle.

Common Errors

Missing symbols when compiling with the mkl feature.

If you get missing symbols when compiling binaries or tests that use the mkl feature, e.g.:

 = note: /usr/bin/ld: (....o): in function `blas::sgemm':
 .../blas-0.22.0/src/lib.rs:1944: undefined reference to `sgemm_'
 collect2: error: ld returned 1 exit status
 = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
 = note: use the `-l` flag to specify native libraries to link
 = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)

This is likely due to a missing linker flag needed to link in the mkl library. You can try adding the following at the top of your binary:

extern crate intel_mkl_src;
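
Note that the intel-mkl-src crate must also be declared as a dependency for this to link; a minimal sketch of the Cargo.toml entry (the version and feature flags here are assumptions, check the crate's documentation for your setup):

[dependencies]
intel-mkl-src = { version = "0.8", features = ["mkl-static-lp64-iomp"] }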

Cannot run the llama example: access to source requires login credentials

Error: request error: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401

This is likely because you don't have permission to access the llama-v2 model. To fix this, register on the Hugging Face Hub, accept the llama-v2 model conditions, and set up your authentication token. See issue #350 for more details.

Tracking down errors

You can set RUST_BACKTRACE=1 to get backtraces when a candle error is generated.
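
For example, when running one of the examples above:

RUST_BACKTRACE=1 cargo run --example llama --release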
