Add features based on file paths in the title and description #4270
benjaminmah wants to merge 42 commits
benjaminmah
commented
Jun 20, 2024
Metrics of the newly trained model: metrics.log
suhaibmujahid
left a comment
Do you see significant improvement when adding this feature?
benjaminmah
commented
Jul 2, 2024
Do you see significant improvement when adding this feature?
I've previously attached the metrics of the model here:
Metrics of the newly trained model: metrics.log
Here are the metrics of the original/current model: metrics_original.log
There is a slight improvement (~ +1%) in each of the metrics.
marco-c
left a comment
Looks good in general, but could you add a few tests for the new class?
I've converted this PR to a draft, as I realized the file path extraction still needs some polishing. For example, it can mistake a URL or a numbered step (e.g. 1.Step 1, 2.Step2) for a file path. Once that is done, I'll add a few tests for this feature!
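A minimal sketch of the kind of filtering described above, not the code in this PR. The function name `extract_file_paths`, the `KNOWN_EXTENSIONS` set, and the token-based approach are all assumptions made for illustration: URLs are skipped by looking for `://`, and numbered steps such as `1.Step` are rejected because their "extension" is not a known source-file extension.

```python
import re

# Hypothetical allow-list of source-file extensions (an assumption, not from the PR).
KNOWN_EXTENSIONS = {"c", "cpp", "h", "js", "jsm", "mjs", "py", "rs", "html", "css"}

# Captures the text after the last dot of a token, e.g. "cpp" in "Foo.cpp".
EXTENSION_RE = re.compile(r"\.([A-Za-z0-9]+)$")


def extract_file_paths(text: str) -> list[str]:
    """Return whitespace-delimited tokens that look like file paths."""
    paths = []
    for token in text.split():
        # Drop surrounding brackets/quotes and trailing punctuation.
        token = token.lstrip("([<\"'`").rstrip(")]>\"'`.,;:")
        # Skip URLs such as https://example.com/docs/index.html.
        if "://" in token:
            continue
        match = EXTENSION_RE.search(token)
        # Rejects numbered steps like "1.Step": "step" is not a known extension.
        if match is None or match.group(1).lower() not in KNOWN_EXTENSIONS:
            continue
        paths.append(token)
    return paths


example = (
    "1.Step 1, 2.Step2 see https://example.com/docs/index.html and "
    "netwerk/protocol/http/nsHttpChannel.cpp crashes "
    "(also in nsSocketTransport2.cpp)."
)
print(extract_file_paths(example))
```

An allow-list of extensions is a deliberately conservative choice: it misses paths with unusual extensions, but it avoids treating arbitrary dotted text as a file.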
benjaminmah
commented
Jul 19, 2024
Current metrics: metrics.log
The model seems to perform slightly worse than the current one, and python3 -m scripts.bug_classifier component --bug-id 1902245 classifies this bug as Core::Widget: Gtk (which is incorrect).
It is worth noting that the first version of the model with the file path feature classified the above bug correctly as Core::Networking, even though it did not retrieve all the relevant file paths from the bug summary and description correctly. I will continue to look into this.
benjaminmah
commented
Jul 22, 2024
The current model now classifies bug 1902245 correctly as Core::Networking (checked with python3 -m scripts.bug_classifier component --bug-id 1902245). The metrics can be found here: metrics.log.
benjaminmah
commented
Jul 24, 2024
Looks good in general, but could you add a few tests for the new class?
Added two tests here: a5a9c0f
benjaminmah
commented
Jul 29, 2024
It seems the tests failed. I'll revise them ASAP.
marco-c
commented
Aug 1, 2024
What is the difference in average precision / recall? Is there any component which gets much better or much worse?
benjaminmah
commented
Aug 2, 2024
What is the difference in average precision / recall? Is there any component which gets much better or much worse?
Here are the metrics from the model with the FilePaths feature included: new_model.log
Here are the metrics from the currently deployed model (which does not include the FilePaths feature): old_model.log
For the 0.9 CF, the precision increased by 0.02 and recall increased by 0.01.
Overall, most metrics seem to increase for specific product-component pairs. Feel free to consult the detailed metrics for the few cases where either precision or recall dropped with the new model.
marco-c
left a comment
Given your latest changes, was there any effect on the metrics?
benjaminmah
commented
Oct 18, 2024
Training the model with the file path feature included and excluded, I got the following results:
Overall, there seems to be a marginal increase in precision and recall when the file path feature is included.
Force-pushed from 7eaab61 to 2022eb4.
suhaibmujahid
left a comment
@benjaminmah could you please resolve conflicts, do a self-review and check the impact on performance?
Resolves #4269.
Introduces a new feature that takes the file paths mentioned in the title and description of a bug and splits each of them into sub-paths and individual directories/files.
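As a rough illustration of the splitting described above (a sketch under assumptions, not the implementation in this PR), a path could be expanded into its cumulative sub-paths plus its individual directory and file names. The function name `split_path` and the exact set of tokens it emits are hypothetical.

```python
def split_path(path: str) -> list[str]:
    """Expand a file path into cumulative sub-paths and individual components."""
    # Individual directories/files, ignoring empty segments from extra slashes.
    parts = [part for part in path.split("/") if part]
    # Cumulative sub-paths: "a", "a/b", "a/b/c.cpp".
    sub_paths = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    # Merge both lists, keeping first-seen order and dropping duplicates
    # (the first sub-path is identical to the first component).
    return list(dict.fromkeys(sub_paths + parts))


tokens = split_path("netwerk/protocol/http/nsHttpChannel.cpp")
print(tokens)
# ['netwerk', 'netwerk/protocol', 'netwerk/protocol/http',
#  'netwerk/protocol/http/nsHttpChannel.cpp', 'protocol', 'http',
#  'nsHttpChannel.cpp']
```

Emitting both the prefixes and the bare names lets a classifier pick up on a top-level directory such as netwerk even when the full path never appeared in the training data.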