add LLM in adapter and save query and answer #64
Conversation
hicofeng
commented
Dec 19, 2024
Does the current LLM (Large Language Model) adapter for this project support streaming answers? If it's not available now, is there a plan to support it in the future for scenarios that require low latency? Thank you very much for your assistance.
DreamCyc
commented
Dec 22, 2024
@hicofeng When the model is deployed on a server and exposed via a URL, it can stream its output so users don't have to wait for the full response. The functionality provided here invokes the deployed model only when there is no matching result in the cached data, following the OpenAI API specification; the details may vary depending on the specific model and deployment method used.
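For illustration only, here is a minimal sketch (not the project's actual adapter code) of how a cache-miss fallback to an OpenAI-compatible endpoint with streaming might look. It assumes the standard `openai` Python client; the `base_url`, model name, and the `cache` object with `get`/`put` methods are hypothetical placeholders.

```python
# Minimal sketch, not the project's actual adapter: on a cache miss,
# call an OpenAI-compatible endpoint with stream=True and save the
# full answer back into the cache. The `cache` object and its
# get/put methods are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # URL of the deployed model (assumed)
    api_key="EMPTY",
)

def answer(query: str, cache) -> str:
    cached = cache.get(query)              # hypothetical cache lookup
    if cached is not None:
        return cached                      # cache hit: no model call needed

    # Cache miss: invoke the deployed model, streaming tokens as they arrive
    chunks = []
    stream = client.chat.completions.create(
        model="deployed-model",            # placeholder model name
        messages=[{"role": "user", "content": query}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)   # low latency: show tokens immediately
        chunks.append(delta)

    answer_text = "".join(chunks)
    cache.put(query, answer_text)          # save query and answer for next time
    return answer_text
```

In this sketch, streaming reduces the time to first token on a cache miss, while the assembled answer is still stored so later identical queries can be served from the cache.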