<p align="center"><a href="https://huggingface.co/bigcode">[🤗 Models & Datasets]</a> | <a href="https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view">[Paper]</a>
</p>

StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2) and some natural language text such as Wikipedia, Arxiv, and GitHub issues. The models use Grouped Query Attention, a context window of 16,384 tokens, and sliding window attention of 4,096 tokens. The 3B & 7B models were trained on 3+ trillion tokens, while the 15B was trained on 4+ trillion tokens. For more details, check out the [paper](https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view).
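As a quick illustration of how these checkpoints are used, the sketch below loads a model with 🤗 Transformers and completes a short code prompt. It assumes the model id `bigcode/starcoder2-3b` and a recent `transformers` release with StarCoder2 support; see the Quickstart section below for the documented setup.

```python
# Minimal sketch: load a StarCoder2 checkpoint and generate a code completion.
# The model id "bigcode/starcoder2-3b" is assumed here; the 7B and 15B
# checkpoints follow the same pattern.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Complete a short code prompt
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(device)
outputs = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```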
# Table of Contents
1. [Quickstart](#quickstart)