fix: fix expectations for export accuracy tests #466

Merged
nirinchev merged 1 commit into main from ni/export-accuracy
Aug 21, 2025

Collaborator

@nirinchev commented Aug 20, 2025

Proposed changes

Looks like some export accuracy tests were failing because the model was providing the export title (exportTitle), but the test expectations didn't include it.

Checklist

Copilot AI review requested due to automatic review settings August 20, 2025 22:22
@nirinchev requested a review from a team as a code owner August 20, 2025 22:22
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes failing export accuracy tests by updating test expectations to include the exportTitle parameter that the model is now providing. The tests were failing because the model was generating an export title, but the test expectations weren't accounting for this additional parameter.

Key changes:

  • Added exportTitle: Matcher.string() to all export tool test expectations
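As a rough illustration of why adding `exportTitle: Matcher.string()` makes the expectations pass, here is a minimal TypeScript sketch. It assumes expectations are compared strictly against the parameters the model supplies. The `Matcher` object, the `matchesExpectation` function, and the `database`/`collection` parameters are hypothetical stand-ins, not the project's actual test helpers.

```typescript
// Hypothetical stand-in for the test suite's Matcher helper (the real one is not shown here).
type MatcherFn = (value: unknown) => boolean;

const Matcher = {
  // Accepts any string, so the test does not depend on the exact title the model picks.
  string: (): MatcherFn => (value) => typeof value === "string",
};

type Expected = Record<string, unknown>;

// Assumed comparison rule: every supplied parameter must be expected,
// and every expected parameter must match literally or via a matcher.
function matchesExpectation(actual: Record<string, unknown>, expected: Expected): boolean {
  // A parameter the expectation does not list (such as exportTitle) fails the match.
  if (Object.keys(actual).some((key) => !(key in expected))) {
    return false;
  }
  return Object.keys(expected).every((key) => {
    const want = expected[key];
    return typeof want === "function"
      ? (want as MatcherFn)(actual[key])
      : actual[key] === want;
  });
}

// Hypothetical tool call in which the model includes an export title.
const modelCall = { database: "mflix", collection: "movies", exportTitle: "Movies export" };

const before: Expected = { database: "mflix", collection: "movies" };
const after: Expected = { ...before, exportTitle: Matcher.string() };

console.log(matchesExpectation(modelCall, before)); // false
console.log(matchesExpectation(modelCall, after)); // true
```

Matching on "any string" rather than a fixed title keeps the expectation stable even though the model is free to word the title differently on each run.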


Contributor

📊 Accuracy Test Results

📈 Summary

| Metric | Value |
| --- | --- |
| Commit SHA | 1c4165d16422a606bec361b87d4c4b77d485f750 |
| Run ID | e5a0c705-e69c-48a4-adc8-f7a6f6e9ce89 |
| Status | done |
| Total Prompts Evaluated | 59 |
| Models Tested | 1 |
| Average Accuracy | 97.0% |
| Responses with 0% Accuracy | 1 |
| Responses with 75% Accuracy | 3 |
| Responses with 100% Accuracy | 55 |
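
The reported average is consistent with a plain mean of the per-response scores listed in the summary. The report does not state its formula, so that is an assumption; the sketch below only re-derives the figure from the listed counts.

```typescript
// Counts taken from the summary: [accuracy score in percent, number of responses].
const buckets: Array<[number, number]> = [
  [0, 1],
  [75, 3],
  [100, 55],
];

// Total number of responses across all buckets.
const total = buckets.reduce((n, [, count]) => n + count, 0);

// Sum of scores, weighting each score by how many responses received it.
const weighted = buckets.reduce((sum, [score, count]) => sum + score * count, 0);

// Assumed formula: unweighted mean over all responses.
const average = weighted / total;

console.log(total); // 59
console.log(average.toFixed(1)); // "97.0"
```

The bucket counts also add up to the 59 prompts evaluated, which fits the report covering a single model.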

📊 Baseline Comparison

| Metric | Value |
| --- | --- |
| Baseline Commit | c4ba2c912f43e1b447b96bc9f9f80fa9a154de5f |
| Baseline Run ID | a69e95c1-915c-4d46-ae29-2722e1f5adf5 |
| Baseline Run Status | done |
| Responses Improved | 2 |
| Responses Regressed | 0 |

📎 Download Full HTML Report - Look for the accuracy-test-summary artifact for detailed results.

Report generated on: 8/20/2025, 10:26:35 PM

@nirinchev requested review from himanshusinghs and removed request for a team August 21, 2025 07:22
Collaborator

@himanshusinghs left a comment


Weird, I ran them locally and it seemed fine 🤔

@nirinchev merged commit 2539d2e into main Aug 21, 2025
18 of 20 checks passed
@nirinchev deleted the ni/export-accuracy branch August 21, 2025 10:11
Reviewers

Copilot left review comments

@himanshusinghs approved these changes
