learning-commons-org/evaluators


Evaluators


Try it in the Playground | Quickstart | Core concepts

Evaluators help you measure attributes of LLM-generated text through the lens of learning science.

We build learning-science-backed systems that follow the LLM-as-a-judge methodology and can be integrated directly into your product or evaluation stack.
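For readers new to the LLM-as-a-judge methodology, the general pattern is: pair the text with a rubric in a prompt, ask a judge model for a structured verdict, and validate the reply before trusting it. The sketch below is a hypothetical illustration of that pattern, not the code in the evals folder. The rubric, `build_judge_prompt`, and `parse_judgment` are invented for this example, and the actual call to the judge model (OpenAI or Gemini) is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical rubric: names and levels are illustrative only and are
# not the rubrics shipped in this repository.
RUBRIC = {
    "name": "vocabulary_complexity",
    "levels": {
        1: "Vocabulary is mostly familiar, everyday words.",
        2: "Some domain-specific or academic words with context support.",
        3: "Frequent academic vocabulary with little context support.",
    },
}


@dataclass
class Judgment:
    score: int
    rationale: str


def build_judge_prompt(text: str, rubric: dict) -> str:
    """Assemble a grading prompt that asks the judge model for JSON output."""
    levels = "\n".join(f"{k}: {v}" for k, v in sorted(rubric["levels"].items()))
    return (
        f"You are grading text on the rubric '{rubric['name']}'.\n"
        f"Rubric levels:\n{levels}\n\n"
        f"Text:\n{text}\n\n"
        'Respond only with JSON: {"score": <level>, "rationale": "<one sentence>"}'
    )


def parse_judgment(raw: str, rubric: dict) -> Judgment:
    """Parse the judge model's JSON reply and reject scores outside the rubric."""
    data = json.loads(raw)
    score = int(data["score"])
    if score not in rubric["levels"]:
        raise ValueError(f"score {score} is not a rubric level")
    return Judgment(score=score, rationale=str(data["rationale"]))
```

Validating the reply against the rubric's levels is what keeps a malformed or out-of-range answer from the judge model from silently becoming a score.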

Use cases include:

  • Feature optimization: Use fine-grained literacy evaluation to sharpen and consistently deliver a feature’s AI-generated content so it aligns with pedagogy and your goals.
  • Maintaining performance: Ensure content is generated as expected by using the evaluators as product analytics for your LLM output.
  • Model selection: Make a confident decision about which model is right for your product by testing the output of models you’re considering.
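For the model-selection use case, one simple approach is to run the same evaluator over outputs from each candidate model and compare average scores. This is a minimal sketch that assumes you already have `(model_name, score)` pairs with numeric scores; `rank_models` and its `target` parameter are hypothetical helpers, not part of this repository.

```python
from collections import defaultdict
from statistics import mean


def rank_models(results, target=None):
    """Rank models by their average evaluator score.

    results: iterable of (model_name, score) pairs, one per evaluated output.
    target: if given, rank by how close each average is to this rubric level
            instead of ranking the highest average first.
    """
    by_model = defaultdict(list)
    for model, score in results:
        by_model[model].append(score)
    averages = {model: mean(scores) for model, scores in by_model.items()}
    if target is None:
        key = lambda item: -item[1]               # highest average first
    else:
        key = lambda item: abs(item[1] - target)  # closest to target first
    return sorted(averages.items(), key=key)
```

The `target` option matters because for some rubrics, such as text complexity, "higher" is not "better"; what you usually want is the model that lands closest to the level appropriate for your learners.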

Evaluators and the supporting datasets are built in collaboration with leading literacy experts from Student Achievement Partners and the Achievement Network.

Repository contents

| Path | Description |
| --- | --- |
| evals | Evaluator code and prompts |
| datasets | Expert-annotated datasets used to create and validate evaluators |
| LICENSE | Open source license details |

Check out the Evaluators docs for complete setup instructions and usage examples.

Try the evaluators

You can test the evaluators with your own text in the Evaluators Playground on the Learning Commons Platform.

Quickstart

To use the evaluators, clone the repository and follow the instructions below.

If you’d like to download or access our evaluators and datasets directly, follow the links below.

Requirements

The evaluators run on Python, and all examples and tutorials are provided as Python code snippets.

Setup on Mac/Linux

You’ll need Python 3.10 or newer. To check your Python version, run the following command in a terminal:

python3 --version
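If you prefer to check from Python itself, for example inside a notebook, here is a minimal sketch. `python_is_supported` is a hypothetical helper; the 3.10 threshold comes from the requirement above.

```python
import sys

# Minimum version stated in the requirements above.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_is_supported():
        print(f"Python {current} is supported.")
    else:
        sys.exit(f"Python {current} is too old; 3.10 or newer is required.")
```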

1. Create a virtual environment

Creating an isolated environment is a best practice that prevents conflicts between Python packages used in this project and others on your system.

python3 -m venv .venv
source .venv/bin/activate

Remember to activate the virtual environment for each new shell session when working with Evaluators.
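To confirm that the environment is actually active, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created by the standard `venv` module the two differ. The helper name `in_virtualenv` is hypothetical, and the same check works on Windows.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment.

    venv points sys.prefix at the environment directory while sys.base_prefix
    keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("No virtual environment is active; activate .venv first.")
```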

2. Install dependencies

The required packages are listed in evals/requirements.txt.

pip install -r evals/requirements.txt
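After installing, a quick sanity check is to confirm that every package named in evals/requirements.txt is installed in the active environment. This sketch uses the standard library's `importlib.metadata` and should be run from the repository root. `missing_requirements` is a hypothetical helper, and its parsing is deliberately simple: it reads only the leading package name on each line, ignores version constraints, and does not handle direct URL requirements.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Leading distribution name of a requirement line, e.g. "openai>=1.0" -> "openai".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def missing_requirements(path="evals/requirements.txt"):
    """Return requirement names from `path` that are not installed."""
    missing = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        match = NAME_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        try:
            version(name)  # raises if no installed distribution has this name
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_requirements() or "All requirements are installed.")
```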

3. Set your API keys

Different evaluators use OpenAI and Google Gemini models, so you need API keys from both platforms.

Set the key(s) as environment variables in your shell session:

export OPENAI_API_KEY="sk-your-key-here"
export GOOGLE_API_KEY="your-key-here"
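Before opening a notebook, you can check from Python that both variables are visible to the interpreter. This minimal sketch reads `os.environ`, so it works the same way after setting the keys in Command Prompt or PowerShell on Windows; `missing_api_keys` is a hypothetical helper name.

```python
import os

# Environment variable names used in the setup steps above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_api_keys(environ=os.environ):
    """Return the names of required API keys that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

Remember that variables set with `export`, `set`, or `$env:` only apply to that shell session, so start Jupyter from the same session.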

Setup on Windows

You’ll need Python 3.10 or newer. To check your Python version, run the following command in a terminal:

python --version

1. Create a virtual environment

Open a Command Prompt and run:

python -m venv .venv
.venv\Scripts\activate

Or in PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1

Remember to activate the virtual environment for each new shell session when working with Evaluators.

2. Install dependencies

pip install -r evals/requirements.txt

3. Set your API keys

Get your API keys from OpenAI and Google Gemini.

Set the key(s) as environment variables:

In Command Prompt:

set OPENAI_API_KEY=sk-your-key-here
set GOOGLE_API_KEY=your-key-here

In PowerShell:

$env:OPENAI_API_KEY="sk-your-key-here"
$env:GOOGLE_API_KEY="your-key-here"

Run the evaluators

You are now ready to run the evaluator examples. We recommend using a Jupyter Notebook for interactive exploration.

  1. Start JupyterLab:
jupyter lab

Jupyter will open in your web browser (usually at http://localhost:8888).

  2. Open the evals folder, then double-click the notebook for the evaluator you want to try.
  3. Paste the text you want to evaluate into the last code cell of the notebook, then run that cell to evaluate your text sample.

If you prefer an IDE with Python and Jupyter notebook support, such as VS Code with Microsoft's Python and Jupyter extensions, refer to Microsoft's instructions for installing and configuring them.

Support & feedback

We want to hear from you. For questions or feedback, please open an issue or reach out to us at support@learningcommons.org.

Stay up to date

Sign up for a Learning Commons account to receive news about the latest Evaluators updates and releases.

Reporting security issues

If you believe you have found a security issue, please responsibly disclose by contacting us at security@learningcommons.org.

Disclaimer

Use of the resources provided in this repository is subject to our Terms of Use.

About

Evaluate AI outputs against trusted educational rubrics. Measure and improve content quality with research-backed rubrics that ensure rigor, reliability, and alignment to classroom needs.
