@devin-alan
smart-doc PMC, founder of the torna project, iLogtail Committer
A high-throughput and memory-efficient inference and serving engine for LLMs
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
ChatGLM-6B: An Open Bilingual Dialogue Language Model