Commit a18f481

server : use common_token_to_piece instead of common_detokenize (ggml-org#11740)

* server : use common_token_to_piece instead of common_detokenize

  This commit replaces the call to common_detokenize with common_token_to_piece in populate_token_probs.

  The motivation for this change is to avoid an issue where common_detokenize would remove the word boundary character from tokens, which caused a regression in the server-generated token probabilities.

  Resolves: ggml-org#11728

* squash! server : use common_token_to_piece instead of common_detokenize

  Use common_token_to_piece for post_sampling_probs as well.

1 parent b9ab0a4, commit a18f481

`examples/server/server.cpp`

Lines changed: 2 additions & 2 deletions

| Original file line number | Diff line number | Diff line change |
| --- | --- | --- |
| | | `@@ -2279,7 +2279,7 @@ struct server_context {` |
| `2279` | `2279` | `for (size_t i = 0; i < std::min(max_probs, n_probs); i++) {` |
| `2280` | `2280` | `result.probs.push_back({` |
| `2281` | `2281` | `cur_p->data[i].id,` |
| `2282` | | `- common_detokenize(ctx, {cur_p->data[i].id}, special),` |
| | `2282` | `+ common_token_to_piece(ctx, cur_p->data[i].id, special),` |
| `2283` | `2283` | `cur_p->data[i].p` |
| `2284` | `2284` | `});` |
| `2285` | `2285` | `}` |
| | | `@@ -2301,7 +2301,7 @@ struct server_context {` |
| `2301` | `2301` | `for (size_t i = 0; i < std::min(n_vocab, n_probs); i++) {` |
| `2302` | `2302` | `result.probs.push_back({` |
| `2303` | `2303` | `cur[i].id,` |
| `2304` | | `- common_detokenize(ctx, {cur[i].id}, special),` |
| | `2304` | `+ common_token_to_piece(ctx, cur[i].id, special),` |
| `2305` | `2305` | `cur[i].p` |
| `2306` | `2306` | `});` |
| `2307` | `2307` | `}` |
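
The regression described in the commit message can be illustrated with a toy model. The sketch below is not the llama.cpp implementation: `toy_vocab`, `toy_token_to_piece`, and `toy_detokenize` are hypothetical stand-ins, and the rule that detokenization strips one leading space is an assumption chosen to mimic the behavior the message describes. It shows why detokenizing a single token can lose the word boundary that a per-token piece lookup keeps.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy types and data; none of these names exist in llama.cpp.
using toy_token = int;

// U+2581 (LOWER ONE EIGHTH BLOCK), used here as a SentencePiece-style
// word boundary marker, written as UTF-8 bytes.
static const std::string toy_marker = "\xE2\x96\x81";

static const std::unordered_map<toy_token, std::string> toy_vocab = {
    {0, toy_marker + "Hello"},
    {1, toy_marker + "world"},
    {2, "ing"},
};

// Per-token lookup: a leading boundary marker becomes a space and is kept.
std::string toy_token_to_piece(toy_token id) {
    std::string piece = toy_vocab.at(id);
    if (piece.compare(0, toy_marker.size(), toy_marker) == 0) {
        piece.replace(0, toy_marker.size(), " ");
    }
    return piece;
}

// Sequence-level detokenization: joins the pieces, then drops one leading
// space (ASSUMED rule, modelling the stripping described in the commit).
std::string toy_detokenize(const std::vector<toy_token> & ids) {
    std::string text;
    for (toy_token id : ids) {
        text += toy_token_to_piece(id);
    }
    if (!text.empty() && text.front() == ' ') {
        text.erase(0, 1);
    }
    return text;
}

// Usage: the same token yields different strings through the two paths.
void toy_demo() {
    assert(toy_token_to_piece(1) == " world"); // boundary preserved
    assert(toy_detokenize({1}) == "world");    // boundary lost
    assert(toy_detokenize({0, 1}) == "Hello world");
}
```

Under these assumptions, stripping the leading space is harmless for a full sequence but wrong for a one-token sequence, where that space is part of the token's text. That is the case hit when each candidate token in a probability list is converted on its own.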
