abbreviations for languages with upper case #16

Open
@neurlang

Description

abbreviations.tsv is currently not implemented. Create or borrow an open-source dataset (covering various languages) that ideally looks like this:

GPS tab Global Positioning System tab ["technology"]
USA tab United States of America tab ["geography"]

The full forms are needed so that the dataset admin knows which abbreviation each entry refers to. Without them, the dataset admin will have a hard time deleting or correcting abbreviations. Tags are optional.
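A minimal sketch of how one row of this format could be parsed. It assumes the columns are separated by literal tab characters and that the optional third column is a JSON list of tags. The names `Abbreviation` and `parse_abbreviation_line` are hypothetical and not part of the project.

```python
import json
from typing import List, NamedTuple


class Abbreviation(NamedTuple):
    short: str       # first column, e.g. "GPS"
    full: str        # second column, e.g. "Global Positioning System"
    tags: List[str]  # optional third column, e.g. ["technology"]


def parse_abbreviation_line(line: str) -> Abbreviation:
    """Parse one TSV row: abbreviation, full form, optional JSON tag list."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 2:
        raise ValueError(f"expected at least 2 columns, got {len(columns)}")
    short, full = columns[0].strip(), columns[1].strip()
    # Tags are optional: a missing or empty third column means no tags.
    has_tags = len(columns) > 2 and columns[2].strip()
    tags = json.loads(columns[2]) if has_tags else []
    return Abbreviation(short, full, tags)
```

Keeping the full form in the parsed record, even though training ignores it, matches the stated goal of letting a dataset admin review entries.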

Training phase:

  1. Only the first column will be used.
  2. Generate a bigram/trigram model that compares abbreviations with the ordinary words of the language.
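The training phase could be sketched roughly as follows. This assumes character-level n-grams, word-boundary markers (`^` and `$`), and lowercasing before counting (since the uppercase check happens separately at inference time). None of these choices is specified in the issue, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def char_ngrams(word: str, n: int) -> List[str]:
    """Return the character n-grams of a word, with boundary markers."""
    padded = "^" + word.lower() + "$"  # assumption: case is ignored here
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_ngram_counts(words: Iterable[str],
                       sizes: Tuple[int, ...] = (2, 3)) -> Counter:
    """Count bigrams and trigrams over a list of words."""
    counts: Counter = Counter()
    for word in words:
        for n in sizes:
            counts.update(char_ngrams(word, n))
    return counts


# One table for abbreviations (first TSV column only), one for normal words.
abbreviation_counts = train_ngram_counts(["GPS", "USA"])
dictionary_counts = train_ngram_counts(["table", "water"])
```

Building two separate count tables keeps the comparison step simple: a candidate word can later be scored against each table.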

Inference phase:

For every non-dictionary word:

  1. Check whether the word is short and has at least 2 uppercase letters. If not, it is a regular word.
  2. Check it against the bigram/trigram model.
  3. Spell it out if the model classifies it as an abbreviation.
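The three inference steps above can be sketched as below. This is only an illustration under several assumptions: a length limit of 5 stands in for "short", only bigrams are used, add-one smoothing is applied, and "spelling out" means separating the letters with spaces. The dictionary lookup is assumed to have happened before these functions are called. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def bigrams(word: str) -> List[str]:
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(words: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for word in words:
        counts.update(bigrams(word))
    return counts


def score(word: str, counts: Counter, vocab_size: int) -> float:
    """Log-likelihood of the word's bigrams with add-one smoothing."""
    total = sum(counts.values())
    return sum(math.log((counts[g] + 1) / (total + vocab_size))
               for g in bigrams(word))


def is_abbreviation(word: str, abbr_counts: Counter, word_counts: Counter,
                    max_len: int = 5) -> bool:
    # Step 1: must be short and contain at least 2 uppercase letters.
    if len(word) > max_len or sum(ch.isupper() for ch in word) < 2:
        return False
    # Step 2: compare the fit against both n-gram tables.
    vocab_size = len(set(abbr_counts) | set(word_counts))
    return (score(word, abbr_counts, vocab_size)
            > score(word, word_counts, vocab_size))


def spell_out(word: str) -> str:
    # Step 3: assumption: letters are emitted one by one.
    return " ".join(word)
```

A caller would run `is_abbreviation` on each non-dictionary word and pass the word through `spell_out` only when it returns `True`.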
