University of Cambridge, Department of Engineering
HTK Rich Audio Transcription
The HTK Rich Audio Transcription project is funded for five years by the
DARPA Effective, Affordable, Reusable Speech-to-Text (EARS) programme,
which started in May 2002. The aim of the project is to significantly
advance the state of the art in the hardest speech recognition
challenges, including the transcription of broadcast news and telephone
conversations. A wide range of research areas will be pursued, aimed
both at reducing the word error rate of conventional speech recognition
systems and at developing an enriched output format with additional
acoustic and linguistic metadata.
Research is split into three broad tasks:
- Task 1: Core Algorithm Development
- To improve existing techniques, and to develop new, generally
applicable techniques, for speech recognition, building on previous
work on speech-to-text transcription at CUED.
- Task 2: Metadata Generation
- To generate enriched transcriptions that contain the identity of the
speaker, the acoustic environment, the channel conditions, and some
linguistic mark-up, such as the location of sentence-like boundaries
or disfluent speech (one possible record layout is sketched after
this list).
- Task 3: Public HTK Development
- To develop and enhance the core HTK software toolkit, which is
available via the HTK website.
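To make the Task 2 output concrete, here is a minimal sketch of how the metadata listed above might be attached to a single transcript segment. It is purely illustrative: the record layout, field names, and label values are assumptions for this sketch, not the project's actual output format.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EnrichedSegment:
        # All field names are hypothetical, chosen to mirror the
        # metadata listed under Task 2 above.
        start_time: float      # segment start, in seconds
        end_time: float        # segment end, in seconds
        words: List[str]       # recognised word sequence
        speaker_id: str        # speaker identity (diarisation output)
        environment: str       # acoustic environment, e.g. "studio" or "field"
        channel: str           # channel condition, e.g. "wideband" or "telephone"
        sentence_end: bool     # does a sentence-like unit end here?
        disfluent: List[int] = field(default_factory=list)  # indices of disfluent words

    # Example: a short segment in which the first word is a filled pause.
    seg = EnrichedSegment(
        start_time=12.48, end_time=15.02,
        words=["uh", "the", "senate", "voted", "today"],
        speaker_id="spkr_03", environment="studio",
        channel="wideband", sentence_end=True,
        disfluent=[0],
    )
    print(seg.speaker_id, " ".join(seg.words))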
Personnel working on this project are:
- Staff:
- Prof. Phil Woodland (pcw@eng.cam.ac.uk) [Principal Investigator]
- Dr. Mark Gales (mjfg@eng.cam.ac.uk) [University Lecturer]
- RAs:
- Gunnar Evermann (ge204@eng.cam.ac.uk) [LVCSR search, HTK development and maintenance]
- Dr. Bin Jia (bj214@eng.cam.ac.uk) [Conversational Telephone Speech system, acoustic modelling (English/Chinese) and language modelling (Chinese)]
- Dr. Do Yeong Kim (dyk21@eng.cam.ac.uk) [Broadcast News systems, acoustic modelling and adaptation]
- Antti-Veikko Rosti (avir2@eng.cam.ac.uk) [CTS segmentation]
- Dr. Marcus Tomalin (mt126@eng.cam.ac.uk) [Metadata, slash-unit detection]
- Sue Tranter (formerly Johnson) (sej28@eng.cam.ac.uk) [Metadata, segmentation, speaker diarisation]
- PhD Students:
- H.Y. (Ricky) Chan (hyc27@eng.cam.ac.uk) [Lightly supervised acoustic modelling for LVCSR]
- Xunying (Andrew) Liu (xl207@eng.cam.ac.uk) [Model complexity control and subspace projection schemes]
- David Mrva (dm312@eng.cam.ac.uk) [Language modelling]
- Khe Chai Sim (kcs23@eng.cam.ac.uk) [Extended Maximum Likelihood Linear Transform (E-MLLT)]
- Lan Wang (lw256@eng.cam.ac.uk) [Discriminative adaptive training]
- Kai Yu (ky219@eng.cam.ac.uk) [Acoustic/speaker factorisation and segmentation]
- Former Members of CUED EARS team:
- Dr. Thomas Hain (th223@eng.cam.ac.uk) [University Lecturer] (now at Sheffield University)
- Dan Povey (dp10006@eng.cam.ac.uk) [Discriminative training] (now at IBM: dpovey@us.ibm.com)
- Dr. Srinivasan Umesh (su216@eng.cam.ac.uk) [VTLN, acoustic modelling] (returned to IIT)
- Kit Thambiratnam (ajkt2@eng.cam.ac.uk) [Broadcast News segmentation and clustering] (returned to QUT)