Commit 4d0e801

Merge pull request #118 from Azure-Samples/testeval6
Add TOC to eval markdown
2 parents: 1929ba1 + ca9a3fd

2 files changed (+20, -7 lines)

.github/workflows/evaluate.yaml

Lines changed: 11 additions & 5 deletions

@@ -170,20 +170,26 @@ jobs:
       - name: Summarize results
         if: ${{ success() }}
         run: |
-          echo "📊 Evaluation Results" >> $GITHUB_STEP_SUMMARY
-          python -m evaltools summary evals/results --output=markdown >> eval-results.md
-          cat eval-results.md >> $GITHUB_STEP_SUMMARY
+          echo "## Evaluation results" >> eval-summary.md
+          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
+          echo "## Answer differences across runs" >> run-diff.md
+          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
+          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
+          cat run-diff.md >> $GITHUB_STEP_SUMMARY

       - name: Comment on pull request
         uses: actions/github-script@v7
         with:
           script: |
             const fs = require('fs');
-            const summaryPath = "eval-results.md";
+            const summaryPath = "eval-summary.md";
             const summary = fs.readFileSync(summaryPath, 'utf8');
+            const runId = process.env.GITHUB_RUN_ID;
+            const repo = process.env.GITHUB_REPOSITORY;
+            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
             github.rest.issues.createComment({
               issue_number: context.issue.number,
               owner: context.repo.owner,
               repo: context.repo.repo,
-              body: summary
+              body: `${summary}\n\n[Check the Actions tab for more details](${actionsUrl}).`
             })
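
Both versions of the "Summarize results" step rely on GitHub Actions job summaries: Markdown appended to the file whose path is stored in the `GITHUB_STEP_SUMMARY` environment variable is rendered on the workflow run's summary page. Below is a minimal standalone sketch of that mechanism; the step name and text are illustrative and not taken from evaluate.yaml.

```yaml
      # Illustrative step, not part of evaluate.yaml: Markdown appended to the
      # file at $GITHUB_STEP_SUMMARY is rendered on the run's summary page.
      - name: Write a job summary
        run: |
          echo "## Example summary heading" >> $GITHUB_STEP_SUMMARY
          echo "Any Markdown written here appears in the Actions UI." >> $GITHUB_STEP_SUMMARY
```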

docs/evaluation.md

Lines changed: 9 additions & 2 deletions

@@ -2,6 +2,13 @@
 
 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model
 
 
@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py
 
 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
 
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
 
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
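
The `/evaluate` comment described above is what drives the workflow changed in this commit. As a hedged sketch of how such a comment trigger is typically wired in GitHub Actions: the event filter, job name, and steps below are assumptions for illustration; only the use of `github.event.issue.number` is confirmed by the workflow diff above.

```yaml
# Sketch only: an issue_comment-triggered workflow gated on an "/evaluate" comment.
# Names and structure are illustrative, not copied from evaluate.yaml.
name: Evaluate on PR comment

on:
  issue_comment:
    types: [created]

jobs:
  evaluate:
    # Run only for comments on pull requests whose body starts with /evaluate.
    if: ${{ github.event.issue.pull_request && startsWith(github.event.comment.body, '/evaluate') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Show which PR is being evaluated
        run: echo "Evaluating PR #${{ github.event.issue.number }}"
```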
