Commit 4d0e801

Merge pull request #118 from Azure-Samples/testeval6
Add TOC to eval markdown
2 parents: 1929ba1 + ca9a3fd

2 files changed (+20, -7 lines)

.github/workflows/evaluate.yaml

Lines changed: 11 additions & 5 deletions

@@ -170,20 +170,26 @@ jobs:
       - name: Summarize results
         if: ${{ success() }}
         run: |
-          echo "📊 Evaluation Results" >> $GITHUB_STEP_SUMMARY
-          python -m evaltools summary evals/results --output=markdown >> eval-results.md
-          cat eval-results.md >> $GITHUB_STEP_SUMMARY
+          echo "## Evaluation results" >> eval-summary.md
+          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
+          echo "## Answer differences across runs" >> run-diff.md
+          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
+          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
+          cat run-diff.md >> $GITHUB_STEP_SUMMARY

       - name: Comment on pull request
         uses: actions/github-script@v7
         with:
           script: |
             const fs = require('fs');
-            const summaryPath = "eval-results.md";
+            const summaryPath = "eval-summary.md";
             const summary = fs.readFileSync(summaryPath, 'utf8');
+            const runId = process.env.GITHUB_RUN_ID;
+            const repo = process.env.GITHUB_REPOSITORY;
+            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
             github.rest.issues.createComment({
               issue_number: context.issue.number,
               owner: context.repo.owner,
               repo: context.repo.repo,
-              body: summary
+              body: `${summary}\n\n[Check the Actions tab for more details](${actionsUrl}).`
             })
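
Both versions of the "Summarize results" step rely on GitHub Actions job summaries: Markdown appended to the file whose path is stored in the `GITHUB_STEP_SUMMARY` environment variable is rendered on the workflow run's summary page. Below is a minimal standalone sketch of that mechanism; the step name and text are illustrative and not taken from evaluate.yaml.

```yaml
      # Illustrative step, not part of evaluate.yaml: Markdown appended to the
      # file at $GITHUB_STEP_SUMMARY is rendered on the run's summary page.
      - name: Write a job summary
        run: |
          echo "## Example summary heading" >> $GITHUB_STEP_SUMMARY
          echo "Any Markdown written here appears in the Actions UI." >> $GITHUB_STEP_SUMMARY
```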

docs/evaluation.md

Lines changed: 9 additions & 2 deletions

@@ -2,6 +2,13 @@
 
 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model
 
 
@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py
 
 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
 
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
 
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
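
The `/evaluate` comment described above is what drives the workflow changed in this commit. As a hedged sketch of how such a comment trigger is typically wired in GitHub Actions: the event filter, job name, and steps below are assumptions for illustration; only the use of `github.event.issue.number` is confirmed by the workflow diff above.

```yaml
# Sketch only: an issue_comment-triggered workflow gated on an "/evaluate" comment.
# Names and structure are illustrative, not copied from evaluate.yaml.
name: Evaluate on PR comment

on:
  issue_comment:
    types: [created]

jobs:
  evaluate:
    # Run only for comments on pull requests whose body starts with /evaluate.
    if: ${{ github.event.issue.pull_request && startsWith(github.event.comment.body, '/evaluate') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Show which PR is being evaluated
        run: echo "Evaluating PR #${{ github.event.issue.number }}"
```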
