
Commit 54821be
update usage section
1 parent 3786335
File tree: 1 file changed (+9, -8)

README.md

Lines changed: 9 additions & 8 deletions
@@ -35,10 +35,13 @@ export HF_TOKEN=xxx
 ```
 
 ## Model usage and memory footprint
-Here are some examples to load the model and generate code. Ensure you've installed `transformers` from source (it should be the case if you used `requirements.txt`). We also include the memory footprint of the largest model, `StarCoder2-15B`, for each setup.
-
+Here are some examples to load the model and generate code, with the memory footprint of the largest model, `StarCoder2-15B`. Ensure you've installed `transformers` from source (it should be the case if you used `requirements.txt`)
+```bash
+pip install git+https://github.com/huggingface/transformers.git
+```
 
-### Running the model on CPU/ one GPU / multi GPU
+### Running the model on CPU/GPU/multi GPU
+* _Using full precision_
 ```python
 # pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
 from transformers import AutoModelForCausalLM, AutoTokenizer
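The hunk cuts the full-precision example off after the imports. For orientation, here is a minimal sketch of the complete pattern, assuming the `bigcode/starcoder2-15b` checkpoint named elsewhere in this diff and a placeholder prompt; everything past the visible import line is an assumption, not text from the commit:

```python
# Minimal full-precision sketch; the prompt and the elided middle of the
# snippet are assumptions, only the import and generate/decode calls are
# visible in the hunk above.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"  # largest model, per the README
device = "cuda"                        # change to "cpu" for CPU-only runs

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Full precision (fp32) costs ~4 bytes per parameter, roughly 60 GB for 15B.
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```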
@@ -55,10 +58,7 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-### Running the model on a GPU using different precisions
-
 * _Using `torch.bfloat16`_
-
 ```python
 # pip install accelerate
 import torch
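Again the snippet is truncated after `import torch`. A sketch of the usual bfloat16 loading pattern follows; `torch_dtype=torch.bfloat16` and `device_map="auto"` are assumptions suggested by the `# pip install accelerate` comment rather than lines visible in this diff:

```python
# Sketch of the bfloat16 variant; checkpoint as above, prompt is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# bf16 halves the fp32 footprint (~2 bytes/param), which is consistent with
# the "Memory footprint: 32251.33 MB" figure quoted in the next hunk.
# device_map="auto" (via accelerate) places or shards across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```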
@@ -79,7 +79,7 @@ print(tokenizer.decode(outputs[0]))
 Memory footprint: 32251.33 MB
 ```
 
-#### Quantized Versions through `bitsandbytes`
+### Quantized Versions through `bitsandbytes`
 * _Using 8-bit precision (int8)_
 
 ```python
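Only the heading and the opening fence of the 8-bit example are visible here. As a sketch of what an int8 `bitsandbytes` example typically looks like with the `transformers` API (the `BitsAndBytesConfig` usage is an assumption, not shown in this diff):

```python
# Sketch of 8-bit loading through bitsandbytes; checkpoint as above.
# pip install bitsandbytes accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/starcoder2-15b"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# int8 weights take ~1 byte per parameter, roughly half the bf16 footprint.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, quantization_config=quantization_config, device_map="auto"
)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```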
@@ -117,7 +117,8 @@ pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
 print( pipe("def hello():") )
 ```
 
-## Text-generation-inference: TODO
+## Text-generation-inference:
+TODO
 
 ```bash
 docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d ghcr.io/huggingface/text-generation-inference:latest --model-id bigcode/starcoder2-15b --max-total-tokens 8192
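The docker command starts a text-generation-inference server on port 8080. As a hedged illustration of how one might query it, using TGI's `/generate` REST endpoint and the `def hello():` prompt from the pipeline example above (the parameters shown are illustrative, not from the commit):

```python
# Sketch of a client call against the TGI container started above; endpoint
# path and payload shape follow TGI's documented REST API.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "def hello():", "parameters": {"max_new_tokens": 64}},
    timeout=60,
)
print(response.json()["generated_text"])
```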
