Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ricky-ma/ChatbotAnalytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

30 Commits

Repository files navigation

Chatbot Evaluation and Maintenance

Exploratory data analysis and interactive model-understanding and evaluation tool for chatbot training data and feedback.

The tool is built to answer questions such as:

  • What examples does my model perform poorly on?
    • In terms of classification?
    • In terms of user feedback?
  • Can user feedback be attributed to adversarial behaviour?
  • Is there mislabeled text in the training set?

Features

The tool runs through a browser-based dashboard. Standard features include:

  • A visualization of the low-dimensional representation of the embeddings for both user queries as well as any stored data.
  • Aggregate analysis on chatbot metrics:
    • language
    • feedback type (upvote, downvote, something else, none)
    • chatbot FAQ ID
    • confidence of top intent
    • outlier scores for training data
    • novelty scores for feedback data
  • Future metrics to be added include:
    • ranking of delivered content from bot API
    • visible ranking when presented to users
    • timestamp
    • website
    • IP address
    • session ID
    • attach policies (set of rules governing a chatbot)
    • user annotated FAQ ID
    • distance of query from FAQ ID

Feedback Workflows

Workflow templates provide standard solutions for chatbot performance evaluation and maintenance.

  • Confirmation that upvote user feedback agrees with chatbot predictions.
    • If user labels agree with chatbot labels, then feedback is likely genuine and can be quickly added to training data.
    • If user labels are in many different categories, then the classifier performed poorly, but user says the result is correct. The examples should be checked before being added to training data.
    • If user labels are in another category, then feedback can be attributed to adversarial behaviour.
  • Confirmation that downvote user feedback agrees with chatbot predictions.
    • If user labels agree with chatbot labels, then feedback can be attributed to adversarial behaviour.
    • If user labels are in many different categories, then the classifier performed poorly and the user confirms this. The examples should be checked and reviewed.
    • If user labels are in another category, then feedback is likely genuine, but needs confirmation. The examples should be checked before being added to training data.
  • Confirmation that "something else" button presses agree with chatbot predictions.
    • When a user chooses something else, confidences of previously returned chatbot intents should be low.
    • Examples should be checked before deciding whether to add data to an existing category or a new category.

About

Exploratory data analysis and interactive model-understanding and evaluation tool for chatbot training data and feedback

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors

AltStyle によって変換されたページ (->オリジナル) /