InfiniTensor/InfiniLM-ModelHub

InfiniLM-ModelHub

Chinese Project Overview | Documentation | Chinese Documentation

InfiniLM-ModelHub is an out-of-tree model-definition repository for InfiniLM. It provides a small, reviewable layer for adding model families, config adapters, processors, weight remapping rules, and optional C++ backend plugins, so that model-specific details do not accumulate in the core InfiniLM engine repository.

ModelHub plugins depend on InfiniLM's public infinilm.plugins API. Python plugin code runs while loading model configs, processors, and checkpoints; it is not part of the token-by-token inference hot path.

What Is Included

  • Reusable helpers for config adaptation and checkpoint key remapping.
  • Example plugins for dense transformer, MoE, and linear-attention families.
  • Out-of-tree C++ backend plugin examples for model types that cannot be represented by pure Python config mapping alone.
  • Documentation for implementing a full out-of-tree model backend and its model-specific operators.
  • Small HuggingFace-style config fixtures for fast validation without downloading large checkpoints.
  • Validation utilities for checking plugin registration and InfiniLM config adaptation.

The source of truth for the current example configurations is examples/model_matrix.json.

Repository Layout

cpp_backends/             Out-of-tree C++ backend adapter examples
docs/                     Design notes and compatibility documentation
examples/                 Tiny configs and plugin smoke-test entry points
src/infinilm_model_hub/   Python plugin modules and reusable helpers
tests/                    Script-style verification checks
tools/                    Build, validation, and inspection tools

Installation

Install InfiniLM first, then install ModelHub in editable mode:

cd InfiniLM-ModelHub
python -m pip install -e . --no-build-isolation

Quick Checks

Run the lightweight plugin repository check:

cd InfiniLM-ModelHub
python tests/verify_plugin_repo.py

Run config-only validation for the full example matrix:

python tests/verify_model_matrix.py

Build the example out-of-tree C++ backend plugin:

python tools/build_backend_plugins.py \
 --infinilm-root <path-to-InfiniLM> \
 --infini-root <path-to-InfiniCore-install>

Using A Python Plugin

Load a plugin explicitly from Python:

from infinilm.plugins import load_plugin
load_plugin("infinilm_model_hub.llama_alias")

For command-line workflows, INFINILM_PLUGINS can load one or more Python plugin modules before model initialization:

INFINILM_PLUGINS=infinilm_model_hub.llama_alias \
python <path-to-your-inference-script>
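
As a rough illustration of what an environment-driven loader could look like, the sketch below imports every module named in a variable. It is not InfiniLM's implementation: the helper name `load_plugins_from_env`, the comma separator, and the use of `importlib` are all assumptions made for this example.

```python
import importlib
import os


def load_plugins_from_env(var_name="INFINILM_PLUGINS", environ=None):
    """Import each module listed in an environment variable.

    Hypothetical helper: the comma separator is an assumption, not
    documented InfiniLM behavior.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(var_name, "")
    # Drop empty entries so an unset variable or a trailing comma is harmless.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # Importing a module runs its top-level code, which is where the plugin
    # examples in this README place their register_model() calls.
    return [importlib.import_module(name) for name in names]


# Standard-library modules stand in for real plugin modules here.
loaded = load_plugins_from_env(environ={"INFINILM_PLUGINS": "json, math"})
print([module.__name__ for module in loaded])
```

Passing `environ` explicitly keeps the helper easy to test without mutating the real process environment.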

Defining A New Model

If a model family can reuse an existing InfiniLM C++ backend, a plugin is often only a small ModelSpec plus a config adapter:

from infinilm.plugins import ModelSpec, register_model


def adapt_config(config):
    # Copy first so the caller's config is left unmodified.
    config = dict(config)
    # Derive the per-head dimension from the HuggingFace-style fields.
    config["head_dim"] = config["hidden_size"] // config["num_attention_heads"]
    return config


register_model(
    ModelSpec(
        model_type="my_llama_family",
        backend_model_type="llama",
        config_adapter=adapt_config,
        processor="llama",
    )
)
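
To make the flow concrete without installing InfiniLM, here is a minimal self-contained sketch of a registry that maps a model type to a backend type and applies the config adapter. `ToyModelSpec`, `toy_register_model`, and `resolve` are hypothetical stand-ins, and rewriting `model_type` to the backend type is an assumption about how such a registry might behave, not a description of infinilm.plugins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyModelSpec:
    # Hypothetical stand-in for infinilm.plugins.ModelSpec (reduced fields).
    model_type: str
    backend_model_type: str
    config_adapter: Optional[Callable[[dict], dict]] = None


_REGISTRY = {}


def toy_register_model(spec):
    # Index specs by the model_type string found in the checkpoint config.
    _REGISTRY[spec.model_type] = spec


def resolve(config):
    """Look up the spec for config["model_type"] and adapt the config."""
    spec = _REGISTRY[config["model_type"]]
    adapted = spec.config_adapter(config) if spec.config_adapter else dict(config)
    # Assumption: the backend sees its own type name after adaptation.
    adapted["model_type"] = spec.backend_model_type
    return adapted


def adapt_config(config):
    config = dict(config)  # leave the input untouched
    config["head_dim"] = config["hidden_size"] // config["num_attention_heads"]
    return config


toy_register_model(ToyModelSpec("my_llama_family", "llama", adapt_config))

hf_config = {
    "model_type": "my_llama_family",
    "hidden_size": 4096,
    "num_attention_heads": 32,
}
adapted = resolve(hf_config)
print(adapted["head_dim"], adapted["model_type"])  # 128 llama
```

Because the adapter copies its input, the original `hf_config` keeps its own `model_type` and gains no `head_dim` key.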

If checkpoint names or tensor layouts differ, compose reusable weight rules:

from infinilm_model_hub.weights import rename, split_fused

weight_rules = [
    split_fused("query_key_value", ["q_proj", "k_proj", "v_proj"]),
    rename({"transformer.layers.": "model.layers."}),
]
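
The sketch below shows one plausible reading of what such rules do to a checkpoint's key/tensor mapping. It does not use infinilm_model_hub.weights: `rename_keys` and `split_fused_rows` are hypothetical helpers, tensors are modeled as plain lists of rows, the rename is assumed to match key prefixes, and the fused tensor is assumed to split into equal row chunks (which would not hold for grouped-query attention, where the key and value projections are smaller than the query projection).

```python
def rename_keys(state, prefix_map):
    """Return a new dict with each matching key prefix replaced."""
    renamed = {}
    for key, tensor in state.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        renamed[key] = tensor
    return renamed


def split_fused_rows(state, fused_name, part_names):
    """Split tensors whose key contains fused_name into equal row chunks."""
    result = {}
    for key, rows in state.items():
        if fused_name not in key:
            result[key] = rows
            continue
        if len(rows) % len(part_names) != 0:
            raise ValueError(f"{key}: rows not divisible by {len(part_names)}")
        chunk = len(rows) // len(part_names)
        for index, part in enumerate(part_names):
            # One output key per part, each holding a contiguous slice of rows.
            part_key = key.replace(fused_name, part)
            result[part_key] = rows[index * chunk:(index + 1) * chunk]
    return result


checkpoint = {
    "transformer.layers.0.attn.query_key_value.weight": [[1, 1], [2, 2], [3, 3]],
    "transformer.layers.0.mlp.weight": [[9, 9]],
}
state = split_fused_rows(checkpoint, "query_key_value", ["q_proj", "k_proj", "v_proj"])
state = rename_keys(state, {"transformer.layers.": "model.layers."})
print(sorted(state))
```

Applying the split before the rename mirrors the order of the rule list above; with these two toy rules the order does not change the result, but it can matter once rules match on overlapping names.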

See examples/README.md for runnable smoke tests.

Adding A C++ Backend Plugin

If a model cannot reuse an existing backend, implement the model backend and its model-specific operators in an out-of-tree shared library, then declare that library from ModelSpec.backend_plugin:

from infinilm.plugins import ModelSpec, register_model

# adapt_config is a config adapter such as the one defined in the
# "Defining A New Model" example.
register_model(
    ModelSpec(
        model_type="my_new_arch",
        config_adapter=adapt_config,
        processor="default",
        backend_plugin="/path/to/libmy_new_arch_backend.so",
    )
)

The C++ plugin may export infinilm_backend_plugin_init() and register model types through InfiniLM's C++ registry. The example implementation is cpp_backends/modelhub_backend_adapters.cpp, and the build entry point is tools/build_backend_plugins.py.

For the full backend flow, including the C++ model class and operator boundary, see docs/out_of_tree_backend.md.

When embedding InfiniLM or debugging temporarily from the command line, backend plugins can still be loaded from the INFINILM_BACKEND_PLUGINS environment variable, but only through an explicit API call:

from infinilm.plugins import load_backend_plugins_from_env
load_backend_plugins_from_env()

InfiniLM's core config and model factories do not read backend plugin environment variables implicitly.

Scope

This repository defines how model metadata is connected to InfiniLM through config adapters, processor selection, weight rules, and optional backend plugin registration. For architectures that InfiniLM does not implement yet, the backend and model-specific operators should live in an out-of-tree C++ plugin.
