Given a sequence of Korean graphemes (i.e., Hangul letters) as input, KoG2P outputs the corresponding pronunciation.
KoG2P is a Python-based G2P package that generates pronunciation sequences from Korean strings.
To use it from a terminal, pass the string you want to convert as an argument, in quotation marks:
```
$ python g2p.py '박물관'
```
This prints the pronunciation /방물관/ (the coda ㄱ of 박 is nasalized to ㅇ before ㅁ) in KoG2P symbols:
```
p0 aa ng mm uu ll k0 wa nf
```
NB. The input does not need to be a lemma or even a well-formed Korean word: the system applies the phonological rules of Korean to any Hangul sequence and returns an output for it.
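To illustrate the kind of rule behind the 박물관 example, the sketch below applies one simplified rule, obstruent nasalization, to a letter-by-letter spelling written in KoG2P symbols. It is not KoG2P's own implementation: `apply_nasalization`, `NASALIZE`, and `NASAL_ONSETS` are hypothetical names, the sketch assumes codas have already been neutralized to `kf`/`tf`/`pf`, and it ignores the other rules (e.g., nasalization before ㄹ, liaison, tensification) that a full G2P system needs.

```python
# Hypothetical sketch of one Korean phonological rule (obstruent nasalization).
# Assumes codas are already neutralized to kf/tf/pf; other rules are ignored.
NASALIZE = {"kf": "ng", "tf": "nf", "pf": "mf"}  # ㄱ->ㅇ, ㄷ->ㄴ, ㅂ->ㅁ
NASAL_ONSETS = {"nn", "mm"}  # onset ㄴ and ㅁ


def apply_nasalization(symbols):
    """Return a copy of `symbols` with obstruent codas nasalized before ㄴ/ㅁ."""
    out = list(symbols)
    for i in range(len(out) - 1):
        if out[i] in NASALIZE and out[i + 1] in NASAL_ONSETS:
            out[i] = NASALIZE[out[i]]
    return out


# 박물관 spelled letter by letter with the symbols from the symbol table.
spelled = ["p0", "aa", "kf", "mm", "uu", "ll", "k0", "wa", "nf"]
print(" ".join(apply_nasalization(spelled)))  # p0 aa ng mm uu ll k0 wa nf
```

The printed sequence matches the KoG2P output for 박물관 shown above; only the coda `kf` changes, because it is followed by the nasal onset `mm`.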
Requirements:
- Python 2.7 or 3.x
The table below maps each Hangul letter, by position, to its KoG2P symbol.
| C/V | Position / type | Hangul | KoG2P symbol |
|---|---|---|---|
| consonant | onset | ᄇ | p0 |
| consonant | onset | ᄑ | ph |
| consonant | onset | ᄈ | pp |
| consonant | onset | ᄃ | t0 |
| consonant | onset | ᄐ | th |
| consonant | onset | ᄄ | tt |
| consonant | onset | ᄀ | k0 |
| consonant | onset | ᄏ | kh |
| consonant | onset | ᄁ | kk |
| consonant | onset | ᄉ | s0 |
| consonant | onset | ᄊ | ss |
| consonant | onset | ᄒ | h0 |
| consonant | onset | ᄌ | c0 |
| consonant | onset | ᄎ | ch |
| consonant | onset | ᄍ | cc |
| consonant | onset | ᄆ | mm |
| consonant | onset | ᄂ | nn |
| consonant | onset | ᄅ | rr |
| consonant | coda | ᄇ | pf |
| consonant | coda | ᄑ | ph |
| consonant | coda | ᄃ | tf |
| consonant | coda | ᄐ | th |
| consonant | coda | ᄀ | kf |
| consonant | coda | ᄏ | kh |
| consonant | coda | ᄁ | kk |
| consonant | coda | ᄉ | s0 |
| consonant | coda | ᄊ | ss |
| consonant | coda | ᄒ | h0 |
| consonant | coda | ᄌ | c0 |
| consonant | coda | ᄎ | ch |
| consonant | coda | ᄆ | mf |
| consonant | coda | ᄂ | nf |
| consonant | coda | ᄋ | ng |
| consonant | coda | ᄅ | ll |
| consonant | coda | ᄀᄉ | ks |
| consonant | coda | ᄂᄌ | nc |
| consonant | coda | ᄂᄒ | nh |
| consonant | coda | ᄅᄀ | lk |
| consonant | coda | ᄅᄆ | lm |
| consonant | coda | ᄅᄇ | lb |
| consonant | coda | ᄅᄉ | ls |
| consonant | coda | ᄅᄐ | lt |
| consonant | coda | ᄅᄑ | lp |
| consonant | coda | ᄅᄒ | lh |
| consonant | coda | ᄇᄉ | ps |
| vowel | monophthong | ᅵ | ii |
| vowel | monophthong | ᅦ | ee |
| vowel | monophthong | ᅢ | qq |
| vowel | monophthong | ᅡ | aa |
| vowel | monophthong | ᅳ | xx |
| vowel | monophthong | ᅥ | vv |
| vowel | monophthong | ᅮ | uu |
| vowel | monophthong | ᅩ | oo |
| vowel | diphthong | ᅨ | ye |
| vowel | diphthong | ᅤ | yq |
| vowel | diphthong | ᅣ | ya |
| vowel | diphthong | ᅧ | yv |
| vowel | diphthong | ᅲ | yu |
| vowel | diphthong | ᅭ | yo |
| vowel | diphthong | ᅱ | wi |
| vowel | diphthong | ᅬ | wo |
| vowel | diphthong | ᅫ | wq |
| vowel | diphthong | ᅰ | we |
| vowel | diphthong | ᅪ | wa |
| vowel | diphthong | ᅯ | wv |
| vowel | diphthong | ᅴ | xi |
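For readers who want to use the table programmatically, here is a sketch that encodes it as Python dictionaries keyed by Unicode conjoining jamo and spells a word out letter by letter. It assumes Python 3 and relies on `unicodedata.normalize("NFD", ...)`, which splits each precomposed syllable into onset (U+1100 to U+1112), vowel (U+1161 to U+1175), and optional coda (U+11A8 to U+11C2) jamo. The names `to_grapheme_symbols` and `JAMO_TO_SYM` are hypothetical and not part of KoG2P, no phonological rules are applied, and ㅐ is assumed to map to `qq`, following the `yq`/`wq` pattern.

```python
import unicodedata

# KoG2P symbols in Unicode conjoining-jamo order (from the symbol table).
# Onsets U+1100..U+1112; the silent onset ᄋ (U+110B) maps to "".
ONSET_SYMS = ["k0", "kk", "nn", "t0", "tt", "rr", "mm", "p0", "pp", "s0",
              "ss", "", "c0", "cc", "ch", "kh", "th", "ph", "h0"]
# Vowels U+1161..U+1175; "qq" for ㅐ is an assumption (cf. yq, wq).
VOWEL_SYMS = ["aa", "qq", "ya", "yq", "vv", "ee", "yv", "ye", "oo", "wa", "wq",
              "wo", "yo", "uu", "wv", "we", "wi", "yu", "xx", "xi", "ii"]
# Codas U+11A8..U+11C2, including the complex codas (ks, nc, ...).
CODA_SYMS = ["kf", "kk", "ks", "nf", "nc", "nh", "tf", "ll", "lk", "lm",
             "lb", "ls", "lt", "lp", "lh", "mf", "pf", "ps", "s0", "ss",
             "ng", "c0", "ch", "kh", "th", "ph", "h0"]

# Map each conjoining jamo character to its KoG2P symbol.
JAMO_TO_SYM = {}
for base, syms in ((0x1100, ONSET_SYMS), (0x1161, VOWEL_SYMS), (0x11A8, CODA_SYMS)):
    for offset, sym in enumerate(syms):
        JAMO_TO_SYM[chr(base + offset)] = sym


def to_grapheme_symbols(text):
    """Spell Hangul text letter by letter in KoG2P symbols (no sound rules)."""
    symbols = []
    # NFD splits each precomposed syllable into onset, vowel and optional coda jamo.
    for jamo in unicodedata.normalize("NFD", text):
        sym = JAMO_TO_SYM.get(jamo)
        if sym:  # skip the silent onset ("") and non-Hangul characters
            symbols.append(sym)
    return symbols


print(" ".join(to_grapheme_symbols("박물관")))  # p0 aa kf mm uu ll k0 wa nf
```

Comparing this spelled form with the pronounced form `p0 aa ng mm uu ll k0 wa nf` for 박물관 shows exactly where a phonological rule changed the output.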
NB. For the IPA symbols corresponding to Korean phones, see the page "IPA for Korean".
If you use this code, please cite:
```bibtex
@misc{cho2017kog2p,
  title = {Korean Grapheme-to-Phoneme Analyzer (KoG2P)},
  author = {Yejin Cho},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/scarletcho/KoG2P}}
}
```
- Yoon Seok Hong, Kyung Seo Ki, and Gahgene Gweon. 2018. Automatic Miscue Detection Using RNN Based Models with Data Augmentation. In Proc. Interspeech 2018. 1646-1650. [pdf]
- Younggun Lee and Taesu Kim. 2018. Learning pronunciation from a foreign language in speech synthesis network. arXiv preprint. arXiv:1811.09364. [pdf]