LOGIOS
Lexicon Generation Tool
LexTool
An example
If your input file looks something like this:

Hello
world
compound_word
hyphen-ated
ONE23
2008
boom!
kweezlebotter

your output file will look something like this:

HELLO HH EH L OW
HELLO(1) HH AH L OW
WORLD W ER L D
COMPOUND_WORD K AA M P AW N D W ER D
HYPHEN-ATED HH AY F AH N EY T IH D
ONE23 OW EH N IY T UW TH R IY
2008 T UW Z IY R OW Z IY R OW EY T
BOOM! B UW M
KWEEZLEBOTTER K W IY Z L AH B AA T AH R
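
The entry format in the example output can be read back with a few lines of Python. This is only a sketch: it assumes each line holds a word, an optional instance id in parentheses, and then space-separated phones, as in the example. The names ENTRY_RE and parse_entry are illustrative and are not part of the tool.

```python
import re

# Assumed entry shape, inferred from the example output:
# WORD or WORD(n), whitespace, then space-separated phones.
ENTRY_RE = re.compile(
    r"^(?P<word>\S+?)(?:\((?P<variant>\d+)\))?\s+(?P<phones>.+)$"
)


def parse_entry(line):
    """Split one dictionary line into (word, instance id or None, phones)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    variant = match.group("variant")
    return (
        match.group("word"),
        int(variant) if variant is not None else None,
        match.group("phones").split(),
    )


if __name__ == "__main__":
    for text in ("HELLO HH EH L OW", "HELLO(1) HH AH L OW", "BOOM! B UW M"):
        print(parse_entry(text))
```

Splitting the instance id from the word makes it straightforward to group alternate pronunciations under the same word.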
Please note the following:
- Some words may have multiple pronunciations; these will appear on
separate lines and will be differentiated by an instance id such as
"(1)". The current implementation of the Sphinx decoder expects each
dictionary entry to be unique. Note, however, that this tool does not
check for uniqueness, so if you include multiple instances of an input
word it will appear multiple times. As a rule, you should sort your
input files and remove duplicate words before you submit them.
- Words with internal separators such as "_" and "-" will be
rendered as a single word; the internal characters will be kept as part
of the orthographic element.
- Alpha-numeric items, as well as numbers, will be rendered
character-by-character. This is because such items are ambiguous and
can be rendered several ways (e.g., "one two three", "one
twenty-three", etc.). It is your responsibility to determine how such
items will be spoken. Typically this will vary by domain.
- Punctuation marks will be ignored.
- Words that do not exist in the tool's dictionary will have their
pronunciations generated according to letter-to-sound rules. There is no
guarantee that such a pronunciation will be correct. You are advised to
check these before use.
- If you choose to manually alter pronunciations, be sure that you
follow the formatting, and be sure that the phones are part of the legal set.
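
A hand-edited dictionary can be checked for the two problems mentioned in the notes above: entries that are not unique and phones outside the legal set. The following is a minimal sketch, not part of the tool. The phone list is an assumption (the 39-phone set used by the CMU Pronouncing Dictionary); replace it with the set your acoustic model actually expects. The function name check_dictionary is hypothetical.

```python
# Assumption: the 39-phone set used by the CMU Pronouncing Dictionary.
# Replace this with the phone set your acoustic model actually expects.
LEGAL_PHONES = frozenset("""
AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG
OW OY P R S SH T TH UH UW V W Y Z ZH
""".split())


def check_dictionary(lines):
    """Return (line_number, message) pairs for problems in dictionary lines."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        head, phones = fields[0], fields[1:]
        if not phones:
            problems.append((number, f"{head}: no phones"))
        if head in seen:
            # Same word (including any instance id) listed more than once.
            problems.append((number, f"{head}: duplicate entry"))
        seen.add(head)
        bad = [phone for phone in phones if phone not in LEGAL_PHONES]
        if bad:
            problems.append((number, f"{head}: unknown phones {bad}"))
    return problems


if __name__ == "__main__":
    sample = [
        "HELLO HH EH L OW",
        "HELLO(1) HH AH L OW",
        "HELLO HH EH L OW",
        "WORLD W ER L DD",
    ]
    for number, message in check_dictionary(sample):
        print(f"line {number}: {message}")
```

Entries are compared with their instance id attached, so HELLO and HELLO(1) count as distinct while a repeated HELLO is reported.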