Support hotword boosting feature and lexicon based decoding #222

Open
Sushmitha-Deva wants to merge 7 commits into parlance:master from Sushmitha-Deva:hotword_boosting

Sushmitha-Deva commented Nov 16, 2023 (edited)

This pull request adds hotword boosting and lexicon-based decoding to the ctcdecode library, together with the corresponding build configuration changes.

Hotword boosting:

  • This feature is supported for both character-based and wpe-based (allowed characters: a-z and the apostrophe) ASR labels.
  • The scoring logic for hotwords is inspired by the pyctcdecode package: a partial weight is added to the score of a path that contains hotword tokens, and if a complete hotword is not formed, the score is reset to its original value.
  • Tests for this feature are added in this PR.
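
The partial-weight-then-reset behaviour described above can be illustrated with a small standalone sketch. This is not the code from this PR or from pyctcdecode: the function names, the default weight, and the proportional scaling rule are all assumptions chosen for illustration.

```python
def hotword_bonus(partial_word, hotwords, weight=10.0):
    """Bonus for the word currently being spelled on a beam path.

    Hypothetical helper: the scaling rule (matched length divided by the
    length of the shortest matching hotword) is an assumption.
    """
    if not partial_word:
        return 0.0
    if partial_word in hotwords:
        # A complete hotword earns the full weight.
        return weight
    matches = [h for h in hotwords if h.startswith(partial_word)]
    if not matches:
        # The path can no longer form a hotword: no bonus at all.
        return 0.0
    shortest = min(len(h) for h in matches)
    # Partial weight, proportional to how much of the hotword is matched.
    return weight * len(partial_word) / shortest


def score_path(base_score, partial_word, hotwords, weight=10.0):
    """Path score plus the current bonus.

    The bonus is recomputed at every step rather than accumulated, so a
    path that stops matching falls back to its original base score.
    """
    return base_score + hotword_bonus(partial_word, hotwords, weight)


hotwords = {"parlance"}
for prefix in ["par", "parl", "pars", "parlance"]:
    print(prefix, score_path(-4.0, prefix, hotwords))
```

Because the bonus is recomputed from the current partial word, the path spelling "pars" scores exactly its base value again, which is the reset described in the bullet above.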

Lexicon based decoding:

  • Lexicon-based decoding penalizes any path that is about to form an invalid word, thereby giving priority to paths containing valid spellings.
  • During decoding, it compares the beam path with the lexicon FST and applies a penalty when the path leads to an unknown word.
  • A constant negative value, unk_score, is used as the penalty.
  • Results indicate a 90% reduction in spelling mistakes after applying this decoding. Increasing the unk_score penalty leads to a 100% reduction.
  • Create the lexicon FST using the build_fst tool. See tools/README.md for usage.
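
As a rough illustration of the penalty step, the sketch below uses a plain set of word prefixes as a stand-in for the lexicon FST. It is not the implementation in this PR: the helper names and the example unk_score value are assumptions, and the real decoder walks an FST built with the build_fst tool.

```python
def build_prefixes(lexicon):
    """Collect every prefix of every lexicon word (a stand-in for the FST)."""
    prefixes = set()
    for word in lexicon:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return prefixes


def lexicon_penalty(partial_word, prefixes, unk_score=-5.0):
    """Return unk_score if the partial word cannot extend to a lexicon word.

    Hypothetical helper: unk_score is a constant negative value, and
    -5.0 is only an example.
    """
    if partial_word and partial_word not in prefixes:
        return unk_score
    return 0.0


prefixes = build_prefixes(["hello", "help"])
for partial in ["hel", "help", "hex"]:
    print(partial, lexicon_penalty(partial, prefixes))
```

A path spelling "hex" receives the penalty as soon as it leaves the lexicon, so beams that stay on valid spellings such as "hel" are ranked higher.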

Sushmitha-Deva changed the title from "Support hotword boosting feature" to "Support hotword boosting feature and lexicon based decoding" on Nov 18, 2023