Project overview (Chinese) | Documentation | Documentation (Chinese)
InfiniLM-ModelHub is an out-of-tree model-definition repository for InfiniLM. It provides a small, reviewable layer for adding model families, config adapters, processors, weight remapping rules, and optional C++ backend plugins without growing the core InfiniLM engine repository with every model-specific detail.
ModelHub plugins depend on InfiniLM's public infinilm.plugins API. Python
plugin code runs while loading model configs, processors, and checkpoints; it is
not part of the token-by-token inference hot path.
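To make the load-time boundary concrete, here is a minimal, self-contained sketch of how a registry like this could work. `Spec`, `REGISTRY`, `register`, and `resolve` are hypothetical names used only for illustration. This is not InfiniLM's `infinilm.plugins` implementation. Registration happens once, when a plugin module is imported, and the adapter runs once per config load, so none of this code participates in per-token decoding.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

Config = Dict[str, Any]


@dataclass(frozen=True)
class Spec:
    # Hypothetical stand-in for a model spec: maps a HuggingFace model_type
    # onto an existing backend, plus an optional config adapter.
    model_type: str
    backend_model_type: str
    config_adapter: Optional[Callable[[Config], Config]] = None


# Hypothetical process-wide registry, filled when plugin modules are imported.
REGISTRY: Dict[str, Spec] = {}


def register(spec: Spec) -> None:
    # Refuse silent overrides when two plugins claim the same model_type.
    if spec.model_type in REGISTRY:
        raise ValueError(f"model_type {spec.model_type!r} is already registered")
    REGISTRY[spec.model_type] = spec


def resolve(config: Config) -> Tuple[str, Config]:
    """Called once while loading a model config, never per token."""
    spec = REGISTRY[config["model_type"]]
    adapted = spec.config_adapter(config) if spec.config_adapter else dict(config)
    return spec.backend_model_type, adapted
```

Raising on a duplicate `model_type` makes conflicting plugins fail loudly instead of letting import order decide which one wins.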
- Reusable helpers for config adaptation and checkpoint key remapping.
- Example plugins for dense transformer, MoE, and linear-attention families.
- Out-of-tree C++ backend plugin examples for model types that cannot be represented by pure Python config mapping alone.
- Documentation for implementing a full out-of-tree model backend and its model-specific operators.
- Small HuggingFace-style config fixtures for fast validation without downloading large checkpoints.
- Validation utilities for checking plugin registration and InfiniLM config adaptation.
The source of truth for the current example configurations is
examples/model_matrix.json.
```text
cpp_backends/            Out-of-tree C++ backend adapter examples
docs/                    Design notes and compatibility documentation
examples/                Tiny configs and plugin smoke-test entry points
src/infinilm_model_hub/  Python plugin modules and reusable helpers
tests/                   Script-style verification checks
tools/                   Build, validation, and inspection tools
```
Install InfiniLM first, then install ModelHub in editable mode:
```bash
cd InfiniLM-ModelHub
python -m pip install -e . --no-build-isolation
```
Run the lightweight plugin repository check:

```bash
cd InfiniLM-ModelHub
python tests/verify_plugin_repo.py
```

Run config-only validation for the full example matrix:

```bash
python tests/verify_model_matrix.py
```
Build the example out-of-tree C++ backend plugin:
```bash
python tools/build_backend_plugins.py \
  --infinilm-root <path-to-InfiniLM> \
  --infini-root <path-to-InfiniCore-install>
```
Load a plugin explicitly from Python:
```python
from infinilm.plugins import load_plugin

load_plugin("infinilm_model_hub.llama_alias")
```
For command-line workflows, INFINILM_PLUGINS can load one or more Python
plugin modules before model initialization:
```bash
INFINILM_PLUGINS=infinilm_model_hub.llama_alias \
  python <path-to-your-inference-script>
```
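The text above does not specify how `INFINILM_PLUGINS` separates multiple module names. The sketch below assumes a comma-separated list and loads each entry with `importlib.import_module`, which runs the module's top-level `register_model(...)` calls as a side effect of importing it. `load_plugins_from_env` is a hypothetical helper, not part of `infinilm.plugins`; in real code, use the actual `load_plugin` API.

```python
import importlib
import os
from types import ModuleType
from typing import List, Mapping, Optional


def load_plugins_from_env(env: Optional[Mapping[str, str]] = None,
                          var: str = "INFINILM_PLUGINS") -> List[ModuleType]:
    """Hypothetical helper: import every plugin module named in `var`."""
    env = os.environ if env is None else env
    # Assumption: names are comma-separated; surrounding blanks and empty
    # entries are ignored.
    names = [name.strip() for name in env.get(var, "").split(",") if name.strip()]
    # Importing runs each module's registration code once per process,
    # because Python caches imported modules in sys.modules.
    return [importlib.import_module(name) for name in names]
```

For example, `load_plugins_from_env({"INFINILM_PLUGINS": "infinilm_model_hub.llama_alias"})` would import that one module, provided ModelHub is installed.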
If a model family can reuse an existing InfiniLM C++ backend, a plugin is often
only a small ModelSpec plus a config adapter:
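```python
from infinilm.plugins import ModelSpec, register_model


def adapt_config(config):
    # Work on a copy so the caller's config dict is not mutated.
    config = dict(config)
    # Derive the per-head dimension expected by the llama backend.
    config["head_dim"] = config["hidden_size"] // config["num_attention_heads"]
    return config


register_model(
    ModelSpec(
        model_type="my_llama_family",
        backend_model_type="llama",  # reuse the existing llama C++ backend
        config_adapter=adapt_config,
        processor="llama",
    )
)
```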
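Because a config adapter is plain Python, it can be checked in isolation without loading InfiniLM. The sketch below repeats `adapt_config` with two hedged additions: it keeps an explicit `head_dim` if the checkpoint already provides one (some families set `head_dim` independently of `hidden_size // num_attention_heads`; whether yours does is an assumption to verify), and it rejects sizes that do not divide evenly. It also shows why the example copies its input with `dict(config)`: the caller's dict is left untouched.

```python
from typing import Any, Dict


def adapt_config(config: Dict[str, Any]) -> Dict[str, Any]:
    # Copy first so the caller's (possibly cached) config is not mutated.
    config = dict(config)
    # Assumption: an explicit head_dim wins; otherwise derive it.
    if config.get("head_dim") is None:
        hidden = config["hidden_size"]
        heads = config["num_attention_heads"]
        if hidden % heads != 0:
            raise ValueError(
                f"hidden_size {hidden} is not divisible by num_attention_heads {heads}")
        config["head_dim"] = hidden // heads
    return config


original = {"model_type": "my_llama_family", "hidden_size": 4096, "num_attention_heads": 32}
adapted = adapt_config(original)
print(adapted["head_dim"])     # 128
print("head_dim" in original)  # False: the input dict was not modified
```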
If checkpoint names or tensor layouts differ, compose reusable weight rules:
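```python
from infinilm_model_hub.weights import rename, split_fused

weight_rules = [
    # Split a fused QKV projection into separate q/k/v projections.
    split_fused("query_key_value", ["q_proj", "k_proj", "v_proj"]),
    # Map the checkpoint's layer prefix onto the names the backend expects.
    rename({"transformer.layers.": "model.layers."}),
]
```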
See examples/README.md for runnable smoke tests.
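To illustrate what composed weight rules do to checkpoint keys, here is a self-contained sketch that applies two hypothetical rules, `rename_prefix` and `split_fused_rows`, to a toy state dict of nested Python lists. These are not the `infinilm_model_hub.weights` implementations. In particular, this sketch assumes the fused tensor splits into equal parts along its first dimension, which does not hold for grouped-query attention, where K and V are narrower than Q.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tensor = List[List[float]]  # toy stand-in for a real tensor: a list of rows
Rule = Callable[[str, Tensor], List[Tuple[str, Tensor]]]


def rename_prefix(mapping: Dict[str, str]) -> Rule:
    # Hypothetical rule: replace the first occurrence of the first matching key.
    def rule(key: str, tensor: Tensor) -> List[Tuple[str, Tensor]]:
        for old, new in mapping.items():
            if old in key:
                return [(key.replace(old, new, 1), tensor)]
        return [(key, tensor)]
    return rule


def split_fused_rows(fused: str, parts: Sequence[str]) -> Rule:
    # Hypothetical rule: split a fused weight into equal row blocks.
    def rule(key: str, tensor: Tensor) -> List[Tuple[str, Tensor]]:
        if fused not in key:
            return [(key, tensor)]
        if len(tensor) % len(parts) != 0:
            raise ValueError(f"{key}: {len(tensor)} rows do not split into {len(parts)} parts")
        step = len(tensor) // len(parts)
        return [
            (key.replace(fused, name, 1), tensor[i * step:(i + 1) * step])
            for i, name in enumerate(parts)
        ]
    return rule


def apply_rules(state: Dict[str, Tensor], rules: Sequence[Rule]) -> Dict[str, Tensor]:
    # Rules run in order; each one sees every pair produced by the previous rule.
    items = list(state.items())
    for rule in rules:
        items = [out for key, tensor in items for out in rule(key, tensor)]
    return dict(items)


checkpoint = {"transformer.layers.0.attn.query_key_value.weight": [[1.0], [2.0], [3.0]]}
remapped = apply_rules(checkpoint, [
    split_fused_rows("query_key_value", ["q_proj", "k_proj", "v_proj"]),
    rename_prefix({"transformer.layers.": "model.layers."}),
])
print(sorted(remapped))
# ['model.layers.0.attn.k_proj.weight', 'model.layers.0.attn.q_proj.weight',
#  'model.layers.0.attn.v_proj.weight']
```

Treating each rule as a function from one `(key, tensor)` pair to zero or more pairs lets renames and splits compose in a single pipeline, in the same order they are listed.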
If a model cannot reuse an existing backend, implement the model backend and its
model-specific operators in an out-of-tree shared library, then declare that
library from ModelSpec.backend_plugin:
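```python
from infinilm.plugins import ModelSpec, register_model

# adapt_config is a config adapter like the one shown earlier.
register_model(
    ModelSpec(
        model_type="my_new_arch",
        config_adapter=adapt_config,
        processor="default",
        # Shared library that implements the backend and its operators.
        backend_plugin="/path/to/libmy_new_arch_backend.so",
    )
)
```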
The C++ plugin may export infinilm_backend_plugin_init() and register model
types through InfiniLM's C++ registry. The example implementation is
cpp_backends/modelhub_backend_adapters.cpp,
and the build entry point is
tools/build_backend_plugins.py.
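On the host side, loading such a library comes down to opening it and calling the optional init symbol if it is present. The sketch below shows that pattern with `ctypes`. It illustrates the dynamic-loading mechanism only and is not InfiniLM's actual loader. It assumes `infinilm_backend_plugin_init` takes no arguments and returns nothing; check the real declaration in `cpp_backends/modelhub_backend_adapters.cpp`. For a by-name lookup like this to succeed, the C++ side must export the hook with `extern "C"` linkage so that its name is not mangled.

```python
import ctypes
import os


def init_backend_plugin(path: str) -> bool:
    """Open a backend plugin and call its init hook if it exports one.

    Returns True if the hook was found and called, and False if the library
    loaded but does not export the hook. Raises OSError if the library is
    missing or cannot be opened.
    """
    if not os.path.isfile(path):
        raise OSError(f"backend plugin not found: {path}")
    lib = ctypes.CDLL(path)  # raises OSError if the file is not a loadable library
    # getattr with a default avoids AttributeError when the symbol is absent.
    hook = getattr(lib, "infinilm_backend_plugin_init", None)
    if hook is None:
        return False
    hook.argtypes = []  # assumption: void infinilm_backend_plugin_init(void)
    hook.restype = None
    hook()
    return True
```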
For the full backend flow, including the C++ model class and operator boundary,
see docs/out_of_tree_backend.md.
For embedding or temporary command-line debugging, backend plugins can still be
loaded from INFINILM_BACKEND_PLUGINS, but only through an explicit API call:
```python
from infinilm.plugins import load_backend_plugins_from_env

load_backend_plugins_from_env()
```
InfiniLM's core config and model factories do not read backend plugin environment variables implicitly.
This repository defines how model metadata is connected to InfiniLM through config adapters, processor selection, weight rules, and optional backend plugin registration. For architectures that InfiniLM does not implement yet, the backend and model-specific operators should live in an out-of-tree C++ plugin.