Add features based on file paths in the title and description #4270

Open
benjaminmah wants to merge 42 commits into
mozilla:master from
benjaminmah:file-path-features

Conversation

@benjaminmah commented Jun 20, 2024

Contributor

Resolves #4269.

Introduces a new feature that extracts the file paths mentioned in a bug's title and description and splits each one into sub-paths and individual directories/files.
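To illustrate what splitting a path into sub-paths and individual directories/files could look like, here is a minimal sketch. It is not the code from this PR: the function name `expand_path` and the example path are assumptions chosen for illustration only.

```python
from pathlib import PurePosixPath


def expand_path(path: str) -> list[str]:
    """Return cumulative sub-paths followed by individual path components.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    # Drop the root marker so absolute and relative paths behave the same.
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    # Cumulative prefixes, e.g. "dom", "dom/media", "dom/media/MediaDecoder.cpp".
    sub_paths = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    # Deduplicate while preserving order: the first component is both a
    # sub-path and an individual directory name.
    return list(dict.fromkeys(sub_paths + parts))


tokens = expand_path("dom/media/MediaDecoder.cpp")
```

Each returned string could then be treated as a separate token by whatever vectorizer the model uses, so that bugs mentioning files under the same directory share some features.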

Contributor Author

Metrics of the newly trained model: metrics.log

@benjaminmah marked this pull request as ready for review June 24, 2024 20:35
Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated

@suhaibmujahid left a comment

Member

Do you see significant improvement when adding this feature?

Contributor Author

> Do you see significant improvement when adding this feature?

I've previously attached the metrics of the model here:

Metrics of the newly trained model: metrics.log

Here are the metrics of the original/current model: metrics_original.log

There is a slight improvement (about +1%) in each of the metrics.

Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated

@marco-c left a comment

Collaborator

Looks good in general, but could you add a few tests for the new class?

@benjaminmah marked this pull request as draft July 18, 2024 20:32

benjaminmah commented Jul 18, 2024
Contributor Author

I've converted this PR to a draft, as I realized the extraction of file paths still needs some polishing. For example, there are cases where it may mistake a URL or a numbered step (e.g. 1.Step 1, 2.Step2) for a file path. Once that is done, I'll be sure to add a few tests for this feature!
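The false positives described above can be sketched with a simple token filter. This is only an illustration under stated assumptions: the regular expressions, the function name `extract_file_paths`, and the sample text are hypothetical and are not taken from the PR.

```python
import re

# Tokens that begin with a URL scheme such as "https://".
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
# Numbered steps such as "1.Step" or "2.Step2".
STEP_RE = re.compile(r"^\d+\.")
# Optional "dir/" segments followed by "name.ext" (extension starts with a letter).
PATH_RE = re.compile(r"^(?:[\w.-]+/)*[\w-]+\.[A-Za-z]\w*$")


def extract_file_paths(text: str) -> list[str]:
    """Return whitespace-delimited tokens that look like file paths."""
    paths = []
    for token in text.split():
        # Trim surrounding brackets, quotes, and trailing punctuation.
        token = token.lstrip("([<\"'`").rstrip(")]>\"'`.,;:!?")
        # Skip URLs and numbered steps before testing for a path.
        if URL_RE.match(token) or STEP_RE.match(token):
            continue
        if PATH_RE.match(token):
            paths.append(token)
    return paths


sample = (
    "Crash in netwerk/protocol/http/nsHttpChannel.cpp, "
    "see https://example.com/log.txt. "
    "Steps: 1.Step 1, 2.Step2 then open test.html"
)
found = extract_file_paths(sample)
```

Without the `STEP_RE` guard, a token like `1.Step` would satisfy `PATH_RE`, which is the kind of mistake mentioned in the comment. The sketch still has gaps: a URL written without a scheme, such as `www.example.com/page.html`, would pass as a path.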

Contributor Author

Current metrics: metrics.log

This version seems to perform slightly worse than the current model, and `python3 -m scripts.bug_classifier component --bug-id 1902245` classifies the bug as Core::Widget: Gtk (which is incorrect).

It is worth noting that the first version of the file path feature model correctly classified this bug as Core::Networking, even though it did not retrieve the relevant file paths from the bug summary and description entirely correctly. I will continue to look into this.

Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated

Contributor Author

With the current model, `python3 -m scripts.bug_classifier component --bug-id 1902245` now correctly classifies the bug as Core::Networking. The metrics can be found here: metrics.log.

Contributor Author

> Looks good in general, but could you add a few tests for the new class?

Added two tests here: a5a9c0f

Contributor Author

It seems the tests failed; I'll revise them ASAP.

@benjaminmah marked this pull request as ready for review July 29, 2024 14:06

marco-c commented Aug 1, 2024

Collaborator

What is the difference in average precision / recall? Is there any component which gets much better or much worse?

Contributor Author

> What is the difference in average precision / recall? Is there any component which gets much better or much worse?

Here are the metrics from the model with the FilePaths feature included: new_model.log

Here are the metrics from the currently deployed model (which does not include the FilePaths feature): old_model.log

For the 0.9 CF, the precision increased by 0.02 and recall increased by 0.01.

Overall, most metrics seem to improve for specific product-component pairs. Feel free to consult the detailed metrics for the few cases where either the precision or recall dropped with the new model.

Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py

@marco-c left a comment

Collaborator

Given your latest changes, was there any effect on the metrics?

Comment thread bugbug/bug_features.py Outdated
Comment thread bugbug/bug_features.py Outdated
Comment thread tests/test_bug_features.py Outdated

Contributor Author

> Given your latest changes, was there any effect on the metrics?

Training the model with the file path feature included and excluded, I got the following results:

| CT | Feature Inclusion | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|---|
| Training Set | With File Path | 0.95 | 0.95 | 1.00 | 0.95 | 0.97 | 0.95 | 73665 |
| Training Set | Without File Path | 0.95 | 0.95 | 1.00 | 0.95 | 0.98 | 0.95 | 73656 |
| No CT | With File Path | 0.64 | 0.63 | 0.99 | 0.62 | 0.78 | 0.60 | 8185 |
| No CT | Without File Path | 0.63 | 0.62 | 0.99 | 0.61 | 0.77 | 0.59 | 8184 |
| 60% CT | With File Path | 0.46 | 0.33 | 1.00 | 0.38 | 0.44 | 0.32 | 8185 |
| 60% CT | Without File Path | 0.44 | 0.32 | 1.00 | 0.36 | 0.42 | 0.31 | 8184 |
| 70% CT | With File Path | 0.47 | 0.32 | 1.00 | 0.37 | 0.42 | 0.30 | 8185 |
| 70% CT | Without File Path | 0.45 | 0.30 | 1.00 | 0.36 | 0.41 | 0.29 | 8184 |
| 80% CT | With File Path | 0.49 | 0.29 | 1.00 | 0.36 | 0.41 | 0.28 | 8185 |
| 80% CT | Without File Path | 0.47 | 0.28 | 1.00 | 0.34 | 0.39 | 0.27 | 8184 |
| 90% CT | With File Path | 0.50 | 0.26 | 1.00 | 0.33 | 0.38 | 0.25 | 8185 |
| 90% CT | Without File Path | 0.48 | 0.25 | 1.00 | 0.32 | 0.36 | 0.24 | 8184 |

Overall, there seems to be a marginal increase in precision and recall when the file path feature is included.
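The size of that increase can be checked directly from the reported numbers. The snippet below is a small sketch that copies the test-set precision and recall values from the results above and computes the per-row difference; the variable names are illustrative, and the training-set row is left out because its precision and recall are identical in both runs.

```python
# (precision, recall) per CT row, copied from the reported results.
with_file_path = {
    "No CT": (0.64, 0.63),
    "60% CT": (0.46, 0.33),
    "70% CT": (0.47, 0.32),
    "80% CT": (0.49, 0.29),
    "90% CT": (0.50, 0.26),
}
without_file_path = {
    "No CT": (0.63, 0.62),
    "60% CT": (0.44, 0.32),
    "70% CT": (0.45, 0.30),
    "80% CT": (0.47, 0.28),
    "90% CT": (0.48, 0.25),
}

# Difference (with - without), rounded to two decimals to hide float noise.
deltas = {
    ct: (
        round(with_file_path[ct][0] - without_file_path[ct][0], 2),
        round(with_file_path[ct][1] - without_file_path[ct][1], 2),
    )
    for ct in with_file_path
}

for ct, (d_pre, d_rec) in deltas.items():
    print(f"{ct}: precision {d_pre:+.2f}, recall {d_rec:+.2f}")
```

In every row the gain is between 0.01 and 0.02 for both metrics, which is consistent with calling the increase marginal.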

@suhaibmujahid left a comment

Member

@benjaminmah could you please resolve conflicts, do a self-review and check the impact on performance?

Reviewers

@suhaibmujahid left review comments

@marco-c: awaiting requested review

Requested changes must be addressed to merge this pull request.

Development

Successfully merging this pull request may close these issues.

[model:component] Add features based on file paths in the title and description