Commit 0eff9b1

Eval markdown
1 parent 1929ba1 commit 0eff9b1

1 file changed: +9 -2 lines changed

docs/evaluation.md

Lines changed: 9 additions & 2 deletions
````diff
@@ -2,6 +2,13 @@
 
 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model
 
 
@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py
 
 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
 
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
 
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
````
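The table of contents added in this commit links to each section through the anchor GitHub derives from the heading text, so every renamed heading has to produce the anchor its TOC entry points to. The Python sketch below illustrates that check under stated assumptions: `github_slug` only approximates GitHub's anchor rule (lowercase, drop punctuation, spaces become hyphens) for plain ASCII headings and ignores the `-1` suffixes GitHub adds to duplicate headings, and `github_slug` and `broken_toc_links` are hypothetical helpers, not part of this repository.

```python
import re


def github_slug(heading: str) -> str:
    """Hypothetical helper: approximate GitHub's anchor for a heading.

    Assumption: ASCII headings only; duplicate-heading suffixes (-1, -2) are ignored.
    """
    slug = heading.strip().lower()
    # Drop punctuation, keeping word characters, hyphens and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each space becomes a hyphen.
    return slug.replace(" ", "-")


def broken_toc_links(markdown: str) -> list[str]:
    """Hypothetical helper: list TOC anchors with no matching '## ' heading."""
    # Only level-2 headings are checked, matching the sections in this doc.
    headings = {
        github_slug(line[3:])
        for line in markdown.splitlines()
        if line.startswith("## ")
    }
    # Collect the fragment of every in-page link such as [text](#anchor).
    anchors = re.findall(r"\]\(#([^)]+)\)", markdown)
    return [a for a in anchors if a not in headings]


# Usage: the new TOC entry against the heading as it was before this commit.
doc = """* [Run bulk evaluation](#run-bulk-evaluation)
* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)

## Run bulk evaluation
## Run the evaluation on a PR
"""
print(broken_toc_links(doc))  # ['run-bulk-evaluation-on-a-pr']
```

With the old heading the PR link has no target; renaming it to `## Run bulk evaluation on a PR`, as the diff does, makes the check return an empty list.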
