Roadmap May 2023 #1220
High-prio
- Refactoring pass
  There is a lot of code duplication in ggml.c which probably can be simplified with a good set of macros. The goal is to keep the code size manageable, while we avoid reaching "macro hell".
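  As a rough illustration of what such a macro could look like (a hypothetical sketch, not the actual ggml.c code), a single macro can stamp out the per-element loop that is otherwise repeated for every unary operator:

  ```c
  // Hypothetical sketch - not the actual ggml.c code.
  // One macro generates the f32 per-element loop shared by unary operators.
  #define GGML_DEFINE_UNARY_F32(name, expr)                                         \
      static void ggml_vec_##name##_f32(const int n, float * y, const float * x) {  \
          for (int i = 0; i < n; ++i) {                                             \
              const float v = x[i];                                                 \
              y[i] = (expr);                                                        \
          }                                                                         \
      }

  GGML_DEFINE_UNARY_F32(relu, v > 0.0f ? v : 0.0f)
  GGML_DEFINE_UNARY_F32(sqr,  v*v)
  GGML_DEFINE_UNARY_F32(neg,  -v)
  ```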
- Optimize the AVX / AVX2 implementations of the quantization methods and add WASM SIMD
  Make sure we have optimal implementations for these instruction sets.
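  For reference, the hot loop these kernels optimize is a fused multiply-add dot product. A minimal AVX2 sketch of the pattern (plain f32, without the integer unpacking that the real quantized kernels also do):

  ```c
  #include <immintrin.h>

  // Minimal AVX2/FMA dot product sketch (compile with -mavx2 -mfma).
  // The actual ggml quantized kernels additionally unpack the integer weights.
  static float dot_f32_avx2(const float * x, const float * y, int n) {
      __m256 acc = _mm256_setzero_ps();
      int i = 0;
      for (; i + 8 <= n; i += 8) {
          acc = _mm256_fmadd_ps(_mm256_loadu_ps(x + i), _mm256_loadu_ps(y + i), acc);
      }
      // horizontal reduction of the 8 partial sums
      __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
      s = _mm_hadd_ps(s, s);
      s = _mm_hadd_ps(s, s);
      float sum = _mm_cvtss_f32(s);
      for (; i < n; ++i) {
          sum += x[i]*y[i];  // scalar tail
      }
      return sum;
  }
  ```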
- Apply the new integer quantization methods to whisper.cpp
  Will backport the latest ggml version to whisper.cpp and add support for quantized models. Will also update all WASM examples to be able to run with the quantized models.
  Update: whisper.cpp v1.4.0 has been released. It includes integer quantization and GPU support via cuBLAS - all thanks to the great work done here.
- Add support for "batch inference"
  Recently, the bert.cpp (by @skeskinen) project demonstrated BERT inference using ggml. This model gains a lot from batch inference, which is currently not supported by ggml. We will extend all operators to support it. The bert.cpp example will serve as a playground to achieve this.
  Update: batched forward passes have been demonstrated in the baby-llama example (thanks to @xaedes, Implement backward passes for llama with small training llama from scratch example #1360). It will be great to apply the demonstrated approach to bert.cpp and whisper.cpp's beam-search decoding in order to gain extra speed-up.
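  To make the goal concrete: "batch support" for an operator simply means applying the same computation independently along an extra batch dimension. A plain-C sketch (illustrative shapes, not ggml's actual tensor layout):

  ```c
  // Batched matrix multiplication sketch: the same W is applied to every
  // sequence in the batch. Shapes are illustrative only.
  //   W: [n_out x n_in], X: [batch x n_in], Y: [batch x n_out]
  static void mat_mul_batched(const float * W, const float * X, float * Y,
                              int batch, int n_in, int n_out) {
      for (int b = 0; b < batch; ++b) {
          for (int o = 0; o < n_out; ++o) {
              float sum = 0.0f;
              for (int i = 0; i < n_in; ++i) {
                  sum += W[o*n_in + i] * X[b*n_in + i];
              }
              Y[b*n_out + o] = sum;
          }
      }
  }
  ```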
- Implement inference of new models
  There are already some very interesting models that should be supported by ggml:
  - 💫 StarCoder
  - Segment Anything Model (SAM)
  - Bark (text-to-speech)
    There is huge interest in adding ggml support for this model (see speeding up inference suno-ai/bark#30 (comment)).
    The main blocker seems to be the dependency on Facebook's EnCodec codec. Still not sure how difficult it would be, but this codec is probably another model that we should try to support via ggml.
  I'll use this section to add a note regarding new model implementations by contributors - I recommend always trying to add a very basic example implementation to the ggml repo. Having a basic example there would make long-term support much easier.
- Proof-of-concept for 100% inference on the GPU
  The goal is to make a demonstration of the idea discussed in Add GPU support to ggml #914.
  Very preliminary work has been started in ggml : cgraph export/import/eval example + GPU support ggml#108.
  Will try to get a working example using the MNIST inference.
  Update: The MNIST inference on Apple Silicon GPU using Metal is now fully demonstrated: ggml : cgraph export/import/eval example + GPU support ggml#108 -- this is the way.
Low-prio
- Project ggml : improve threading implementation
  Better utilization of the available CPU resources via improved thread management.
  There have been a few efforts during April, but they remained in the "background" - need to put more focus this time.
- Having second thoughts about adding llama_state
  The experience is that we added whisper_state in whisper.cpp with this PR: Added whisper state + default state on the whisper_context whisper.cpp#523. However, I haven't seen a lot of use of it. At the same time, it doubled the C API.
  Let me know if you think this is worth implementing.
  ref: IMPORTANT: Introduce C-style API - Major Refactoring #370 (comment)
- Add 3-bit integer quantization
  It has been shown that 2-bit integer quantization does not look really useful for anything: Q2 and Q3 quantization #1004
  In this case, we can probably add 3-bit integer quantization. Probably not "officially supported", but rather in a state where we can run experiments and see if we can find some application.
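  For illustration, a hedged sketch of what a 3-bit block format could look like (a hypothetical layout, not an actual ggml quantization type), following the block-with-scale idea of the existing Q4/Q5 formats:

  ```c
  #include <stdint.h>

  // Hypothetical 3-bit block: 8 quants of 3 bits packed into 3 bytes + an f32 scale.
  typedef struct {
      float   d;      // scale: x ~= d * (q - 4), with q in [0, 7]
      uint8_t qs[3];  // 8 x 3-bit quants packed into 24 bits
  } block_q3_demo;

  static void pack_q3_demo(const uint8_t q[8], uint8_t out[3]) {
      uint32_t bits = 0;
      for (int i = 0; i < 8; ++i) {
          bits |= (uint32_t)(q[i] & 0x7) << (3*i);
      }
      out[0] = (uint8_t)(bits      );
      out[1] = (uint8_t)(bits >>  8);
      out[2] = (uint8_t)(bits >> 16);
  }

  static uint8_t unpack_q3_demo(const uint8_t in[3], int i) {
      const uint32_t bits = (uint32_t)in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);
      return (uint8_t)((bits >> (3*i)) & 0x7);
  }
  ```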
- There is an interesting ongoing effort to add "training" support to ggml: How to fine tune it? ggml#8 (comment)
  It would be really impressive if this actually works. There might be conflicts with the refactoring pass - need to coordinate with @xaedes.
  Update: this has been successfully completed and there is now a simple example demonstrating baby-LLaMA training: https://github.com/ggerganov/llama.cpp/blob/master/examples/baby-llama/baby-llama.cpp#L759-L770
Replies
-
-
What do you mean?
-
Likely meaning stablelm
-
No, stablelm is supported (by ggml, see the repo), but its quality is underwhelming...
Still, latent diffusion models would be sick.
-
If I'm understanding the idea of llama_state correctly (that it will allow multiple "inference threads" from a single loaded model), then it definitely seems worth implementing, since it opens up a lot of possibilities.
Is the idea that we can get a lot of the same gains by just quickly swapping out stored contexts? A lot of LLM applications benefit from having multiple instances that can build on one another, or different instances that receive diverse queries.
I haven't started using it yet, because I've been waiting for someone to post an example. It's a bit hard for me to parse the API.
-
The right way to do it is like we do it in whisper.cpp:
https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h
If someone wants to give it a try at implementing it here
-
Should every public API have an implicit and an explicit _with_state & _from_state version, though? That seems pretty verbose.
Granted, the implicit version could probably just forward to the explicit version.
Still, does that mean that if one chooses the explicit state version, the context would still contain an unused state?
-
I agree that it is a bit over-verbose, but we didn't see a better way. Open to suggestions.

> Still, does that mean that if one chooses the explicit state version, the context would still contain an unused state?

There are init calls that explicitly do not create an internal state.
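To illustrate the pattern being discussed, here is a hedged sketch of how an implicit variant can forward to an explicit one (hypothetical declarations, not the actual llama.cpp or whisper.cpp API):

```c
// Hypothetical sketch only - the *_with_state / *_default_state names are made up.
struct llama_context;   // weights (+ optionally a default internal state)
struct llama_state;     // per-sequence mutable data: KV cache, RNG, logits

// explicit variants: the caller owns and passes the state
struct llama_state * llama_state_init     (struct llama_context * ctx);
struct llama_state * llama_default_state  (struct llama_context * ctx);
int                  llama_eval_with_state(struct llama_context * ctx,
                                           struct llama_state   * st,
                                           const int * tokens, int n_tokens, int n_past);

// implicit variant: a one-line forwarder to the explicit one, so the doubled
// API surface is mostly declarations rather than duplicated logic
static inline int llama_eval_default(struct llama_context * ctx,
                                     const int * tokens, int n_tokens, int n_past) {
    return llama_eval_with_state(ctx, llama_default_state(ctx), tokens, n_tokens, n_past);
}
```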
-
I suggest using the llama_context as the llama_state. For me that seems semantically correct and the most logical approach; it is fully backwards compatible, requiring no changes for existing public API users. It's also very simple, with only a few changes internally and only a few additions to the public API, and only for those who would like to use this feature.
I created a pull request here: #1797 (comment)
-
I just tried to create a llama_state, but the duplication in sampling is too much.
All sampling functions rely on ctx->rng.
It's also a constant duplication factor for all new public API going forward.
@didzis's change is great, just 2 new public functions.
-
Do we have model weight support for EnCodec? If yes, could you please tell me how to build the ggml-model.bin?
Thank you, I appreciate your help and time.