SLDAgent is an evolution-based AI agent that autonomously discovers scaling laws for large language models. This work introduces SLDBench, a comprehensive benchmark for this new scientific discovery task, and demonstrates that SLDAgent can uncover laws that are more accurate and conceptually sound than their human-derived counterparts.
The agent co-optimizes both the symbolic formula of a scaling law and the parameter-fitting algorithm, enabling it to explore complex relationships and achieve superhuman performance in predicting model behavior at scale.
This project includes SLDBench, the first comprehensive benchmark for scaling law discovery, curated from over 5,000 LLM training experiments reported in the existing literature.
| Task key | Config file |
|---|---|
| `parallel_scaling_law` | `configs/parallel_scaling_law.yaml` |
| `vocab_scaling_law` | `configs/vocab_scaling_law.yaml` |
| `sft_scaling_law` | `configs/sft_scaling_law.yaml` |
| `domain_mixture_scaling_law` | `configs/domain_mixture_scaling_law.yaml` |
| `moe_scaling_law` | `configs/moe_scaling_law.yaml` |
| `data_constrained_scaling_law` | `configs/data_constrained_scaling_law.yaml` |
| `lr_bsz_scaling_law` | `configs/lr_bsz_scaling_law.yaml` |
Data is centrally hosted on the Hugging Face Hub at `pkuHaowei/sldbench`.
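For a quick look at the raw benchmark data outside the agent pipeline, you can pull it directly with the Hugging Face `datasets` library. This is only a sketch: the subset and split names below are assumptions, so check the dataset card for the actual layout.

```python
# Sketch: inspect SLDBench data straight from the Hub.
# The subset ("data_constrained_scaling_law") and split ("train") are assumptions;
# see the dataset card for pkuHaowei/sldbench for the real layout.
from datasets import load_dataset

ds = load_dataset("pkuHaowei/sldbench", "data_constrained_scaling_law", split="train")
print(ds.column_names)  # feature and target columns for this task
print(ds[0])            # one recorded training run
```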
- Python 3.13+
- `uv` package manager (recommended)
- An OpenAI-compatible API key (set `OPENAI_API_KEY`)
- macOS/Linux/Windows

Note: `uv run` guarantees commands execute inside a synchronized project environment. If you prefer plain `pip`, you can adapt the commands accordingly.
```bash
# 1) Clone the repo
git clone <repository-url>
cd SLD

# 2) Install dependencies
uv sync

# 3) Provide your LLM API key
export OPENAI_API_KEY=your_key
# Optional: if using a non-default endpoint
# export OPENAI_BASE_URL=https://your.openai.compatible.endpoint/v1
```
On Windows (PowerShell):
```powershell
$env:OPENAI_API_KEY="your_key"
# $env:OPENAI_BASE_URL="https://your.openai.compatible.endpoint/v1"
```
```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install dependencies
pip install -U pip
pip install -r requirements.txt   # Or use pyproject.toml

# Set your API key
export OPENAI_API_KEY=your_key
# export OPENAI_BASE_URL=https://your.openai.compatible.endpoint/v1
```
Run a single discovery task (e.g., Data-Constrained):
```bash
EVAL_TASK_NAME="data_constrained_scaling_law" \
uv run openevolve-run.py \
  --config configs/data_constrained_scaling_law.yaml \
  init_program.py evaluator.py \
  --output results/data_constrained_scaling_law/run_1
```

Or run all tasks in batch:
```bash
# If the script is executable:
uv run scripts/run.sh
# Otherwise:
bash scripts/run.sh
```
```
SLD/
├── configs/                # YAML configs (one per scaling law)
│   ├── data_constrained_scaling_law.yaml
│   ├── domain_mix_scaling_law.yaml
│   ├── lr_and_bsz_scaling_law.yaml
│   ├── moe_scaling_law.yaml
│   ├── parallel_scaling_law.yaml
│   ├── sft_scaling_law.yaml
│   └── vocab_size_scaling_law.yaml
├── data_loader.py          # Unified data loading interface (from Hugging Face)
├── evaluator.py            # Unified evaluation system
├── init_program.py         # Initial scaling-law template
├── results/                # Outputs & checkpoints (created automatically)
└── scripts/
    └── run.sh              # Batch execution helper
```
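To make the role of `init_program.py` concrete: a scaling-law candidate is essentially a symbolic form plus a routine that fits its free parameters to the benchmark data, and the agent evolves both. The sketch below illustrates that idea with a simple saturating power law and `scipy.optimize.curve_fit`; the function names and interface are illustrative assumptions, not the template's actual API.

```python
# Illustrative sketch only -- not the actual interface of init_program.py.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(D, A, B, alpha):
    """Candidate symbolic form: loss as a saturating power law in data size D."""
    return A + B * np.power(D, -alpha)

def fit_scaling_law(D, loss):
    """Parameter-fitting routine: estimate (A, B, alpha) from observed runs."""
    popt, _ = curve_fit(scaling_law, D, loss, p0=[1.0, 1.0, 0.3], maxfev=10000)
    return popt

# Toy data (D in billions of tokens) generated from L(D) = 1.7 + 2.0 * D**-0.35
D = np.array([0.1, 0.5, 1.0, 10.0, 100.0])
loss = 1.7 + 2.0 * D ** -0.35
print(fit_scaling_law(D, loss))  # should recover roughly (1.7, 2.0, 0.35)
```

During evolution, SLDAgent mutates both pieces: the symbolic formula and the fitting procedure itself.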
```bash
export EVAL_TASK_NAME="data_constrained_scaling_law"
uv run python openevolve-run.py \
  --config configs/data_constrained_scaling_law.yaml \
  init_program.py evaluator.py \
  --output results/data_constrained_scaling_law/run_1
```
```bash
bash scripts/run.sh
```
This will:
- Run each task 3 times with different random seeds.
- Write outputs to `results/{task_name}/run_{1,2,3}/`.
- Save intermediate checkpoints.
- Evaluate and save the best program from each run.
```bash
EVAL_TASK_NAME="data_constrained_scaling_law" \
uv run python evaluator.py \
  results/data_constrained_scaling_law/run_1/best/best_program.py
```

Create `configs/your_law_name.yaml` and customize the settings (see the full template in the original README). Key sections include `llm`, `prompt`, `database`, and `evaluator`.
Upload your data to Hugging Face Hub. The data should be structured with appropriate feature and target columns following the existing schema patterns.
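As a rough sketch of that step, the `datasets` library can push such a table directly; the column names and repository id below are placeholders for your own data, not part of SLDBench.

```python
# Sketch: publish a new task's experiment records to the Hugging Face Hub.
# Column names and the repo id are placeholders for your own data.
from datasets import Dataset

records = {
    "feature1": [1e8, 1e9, 1e10],        # e.g., training tokens
    "feature2": [125e6, 350e6, 1.3e9],   # e.g., parameter count
    "target_variable": [3.2, 2.8, 2.5],  # e.g., validation loss
}
Dataset.from_dict(records).push_to_hub("your-username/your_law_name")
```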
Add your task schema to the `TASK_SCHEMA_MAP` dictionary in `data_loader.py`:
```python
TASK_SCHEMA_MAP = {
    # ... existing tasks ...
    "your_law_name": {
        "feature_names": ["feature1", "feature2"],
        "target_name": "target_variable",
    },
}
```
Add your task to the `SUPPORTED_TASKS` set in `evaluator.py`:
```python
SUPPORTED_TASKS = {
    # ... existing tasks ...
    "your_law_name",
}
```
Add "your_law_name" to the tasks array in scripts/run.sh to include it in batch runs.
Key knobs to tune in your `.yaml` files:

- **Search Budget**: Increase `max_iterations` and `population_size` for more thorough exploration.
- **Exploration vs. Exploitation**: Adjust `exploration_ratio` and `exploitation_ratio`.
- **Parallelism**: Raise `parallel_evaluations` to speed things up.
- **Reproducibility**: Set a fixed `random_seed` for consistent results.
- **API Resilience**: Bump `llm.timeout` and `llm.retries` for flaky networks.
- Data is centrally hosted on the Hugging Face Hub at `pkuHaowei/sldbench`.
- The unified `data_loader.py` automatically loads data based on the task name and its predefined schema (see the sketch below).
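A hypothetical usage sketch, assuming `data_loader.py` exposes a loader keyed by task name; the helper name below is an assumption, so check the module for its real interface.

```python
# Hypothetical sketch -- the real helper name in data_loader.py may differ.
from data_loader import load_task_data  # assumed helper

features, target = load_task_data("data_constrained_scaling_law")
print(features.shape, target.shape)  # arrays shaped per the task's predefined schema
```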
- **Import Errors**: Run `uv sync` to ensure your environment is up-to-date.
- **Task Not Found**: Check that `EVAL_TASK_NAME` matches a task key in `SUPPORTED_TASKS` in `evaluator.py`.
- **Data Loading Issues**: Verify your internet connection and access to the Hugging Face Hub repository `pkuHaowei/sldbench`.
- **API Timeouts**: Increase `llm.timeout` and `llm.retries` in your config, or check your `OPENAI_BASE_URL`.
- **Script Not Executable**: Run `chmod +x scripts/run.sh` or execute it with `bash scripts/run.sh`.
Do I have to use OpenAI?
No. Any OpenAI-compatible endpoint works. Just set the `api_base` in your YAML config or the `OPENAI_BASE_URL` environment variable.
Can I use pip instead of uv?
Yes. Create a virtual environment, activate it, and install dependencies from `requirements.txt`. Then run the Python commands directly.
Where are the results stored?
Under `results/{task_name}/{run_id}/`. You'll find checkpoints, logs, and the final `best/best_program.py`.
If you use SLDAgent or SLDBench in your academic work, please cite the paper:
```bibtex
@article{lin2026SLD,
  title  = {Can Language Models Discover Scaling Laws?},
  author = {Lin, Haowei and others},
  year   = {2025}
}
```
This project is built on the excellent OpenEvolve.