IEEE
SPEECH TECHNICAL COMMITTEE NEWSLETTER
April 19, 2004
INTRODUCTION:
Welcome to the ninth IEEE Signal Processing Society Speech Technical Committee (STC) newsletter. Contributions describing events, publications, workshops, and career information are welcome (rose@ece.mcgill.ca).
STC NEWS:
STC ICASSP 2004 Paper Review Process (Michael Picheny and Rick Rose)
ICASSP 2004 Technical Program Preparation (Peter Kabal and Li Deng)
NEW WORKSHOP ANNOUNCEMENTS:
Fourth International Symposium on Chinese Spoken Language Processing (Pascale Fung)
2004 HLT/NAACL Workshop on Spoken Language Understanding for Conversational Systems
NIST Rich Transcription 2004 Meeting Recognition Workshop (John Garofolo)
COST278 Workshop on Robustness Issues in Conversational Interaction (Borge Lindberg)
CAREERS:
Ron Schafer Retires After Four Decades of Contributions in DSP (Larry Rabiner and Rick Rose)
Post-Doctoral and PhD Positions at Eurecom in France (Chris Wellekens)
LINKS TO WORKSHOPS AND CONFERENCES:
Links to conferences and workshops organized by date (Rick Rose)
New STC Paper Review Process for ICASSP 2004
This year, the Speech Technical Committee implemented the new ICASSP paper review process for papers submitted to ICASSP 2004. The STC received 524 paper submissions in the speech area, essentially the same as last year. The review
process was carried out by the 46 STC members along
with a set of over 100 volunteer associate reviewers. The
associate reviewers were recruited by STC members as experts in the
various technology areas covered by the STC. The 46 STC
members were broken into teams of two people each to cover each
technology area. A set of 4-5 associate reviewers (depending on the number of papers per team) was also assigned to each technology area. Each paper was reviewed by a set of three reviewers, as opposed to two reviewers in previous years. Each
team
performed their reviews and produced a ranked list of papers with a
single overall score per paper.
The process followed by each team can be briefly defined as
follows. Each of the two STC team members read all the papers
assigned to the team,
but each member formally reviewed only half of the papers.
In parallel, the team divided the papers amongst the associate reviewers to ensure that each paper received a total of three reviews. After
all the
reviews for a paper were collected, it was ranked and scored by the two
STC members. The set of ranked and scored papers was combined
into a single list. This year there was an effort to return a separate set of comments from each of the three reviewers to the authors. The goal was to provide constructive feedback, especially for those authors whose papers were rejected. In previous years, only the comments from the two STC reviewers were returned to the authors. The breakdown of submissions is given in the following table:
Area                                    Submissions
Speech Production/Synthesis             46
Speech Analysis / Feature Extraction    86
Speech Coding                           46
Speech Enhancement                      45
Acoustic Modeling for ASR               61
Robust ASR                              100
Confidence/Lexical/Language/LVCSR       44
Adaptation                              25
Spoken Language Systems                 27
Speaker Rec / Language ID               47
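As a purely illustrative sketch of the team workflow described above (three reviews per paper, combined into a single ranked list with one overall score per paper), the aggregation step might look like the following. All paper IDs and scores are invented, and simple averaging stands in for whatever scoring rule the teams actually applied:

```python
# Hypothetical sketch of the review aggregation described above; the
# actual STC procedure involved human ranking, not a fixed formula.
from statistics import mean

def rank_papers(reviews):
    """reviews maps paper id -> list of three reviewer scores.
    Returns (paper, overall score) pairs, best first."""
    for paper, scores in reviews.items():
        assert len(scores) == 3, f"{paper} needs exactly three reviews"
    return sorted(
        ((paper, mean(scores)) for paper, scores in reviews.items()),
        key=lambda item: item[1],
        reverse=True,
    )

# Invented example: three submissions scored on a 1-5 scale.
reviews = {
    "paper-A": [4.0, 4.5, 3.5],
    "paper-B": [2.0, 3.0, 2.5],
    "paper-C": [5.0, 4.0, 4.5],
}
print(rank_papers(reviews))  # paper-C first (4.5), then paper-A, then paper-B
```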
back to top
Summary of ICASSP 2004 Submissions from the Conference Technical Program Chairs
The following summary of the overall
ICASSP2004 technical program preparation was provided by the Technical
Program Co-Chairs, Li Deng and Peter Kabal. The
submitted papers were routed to the appropriate technical committee for review. The TCs worked very hard, with the help of external reviewers, to ensure that each paper was thoroughly and fairly reviewed. The review process is a monumental task.
The high degree of professionalism of the TCs was one major factor in the success of our ICASSP. Much of the credit for our technical program goes to the hard work of all TC members and reviewers (including associate reviewers in some TCs), under the leadership of the following TC Chairs: Michael Picheny, Mats Viberg, Michael Brandstein, Thrasyvoulos Pappas, John Sorensen, Alle-Jan van der Veen, Alex Gershman, Magdy Bayoumi, Tulay Adali, Thad Welch, and Jennifer Trelewicz. We, as Technical Chairs of the conference, worked closely with the TC chairs, with much of the work overlapping the holiday season. Our work was also made much easier by working closely with Conference Management Services, Lance Cotton and Billene Mercer in particular. We want to express our special thanks to all of the people above, to all the contributing authors, and to the special session chairs who organized those sessions. In this year's ICASSP Technical Program, we have organized all the accepted papers into 11 technical tracks, comprising 55 lecture and 88 poster sessions. Among the 1324 accepted regular papers, most will be poster presentations. The choice of oral or poster presentation was made by the TCs based entirely on subject grouping.
The breakdown of submissions by
technical committee is given in the following table:
Technical Committee                               Submissions
Speech Processing                                 542
Signal Processing Theory and Methods              362
Signal Processing for Communications              368
Image & Multidimensional Signal Processing        357
Sensor Array & Multi-channel Signal Processing    184
Audio & Electroacoustics                          153
Industry Technology Track                         105
Design & Implementation of SP Systems             110
Multimedia Signal Processing                      81
Machine Learning for Signal Processing            187
Signal Processing Education                       11
Special Sessions                                  84
4th International Symposium on Chinese Spoken Language Processing
December 15-18, 2004
Hong Kong
http://www.se.cuhk.edu.hk/~iscslp/index.html
Preliminary Call for Papers
The 4th International Symposium on Chinese Spoken Language Processing (ISCSLP'04) will be held during December 15-18, 2004 in Hong Kong. ISCSLP is a conference for scientists, researchers, and practitioners to report and discuss the latest progress in all scientific and technological aspects of Chinese spoken language processing. The series has been held biennially in different Asia Pacific cities: 1998 in Singapore, 2000 in Beijing, and 2002 in Taipei.
ISCSLP has become the world's largest and most comprehensive technical
conference focused on Chinese spoken language processing and its
applications. The ISCSLP'04 will feature world-class plenary speakers,
tutorials, and a number of lecture and poster sessions on the following
topics:
* Speech Production and Perception
* Phonetics and Phonology
* Speech Analysis
* Speech Coding
* Speech Enhancement
* Speech Recognition
* Speech Synthesis
* Language Modeling and Spoken Language Understanding
* Spoken Dialog Systems
* Spoken Language Translation
* Speaker and Language Recognition
* Indexing, Retrieval and Authoring of Speech Signals
* Multi-Modal Interfaces including Spoken Language Processing
* Spoken Language Resources and Technology Evaluation
* Applications of Spoken Language Processing Technology
* Others
Hong Kong, better known as the Pearl of the Orient, is a place where
East meets West. Shopping, dining, sightseeing, as well as world-class
events and attractions are all conveniently available within a short
distance. As the "City of Life" in Asia, with its multicultural heritage and kaleidoscopic lifestyle, Hong Kong buzzes with unique tourist
attractions that are beyond compare in the region. You are cordially
invited to attend ISCSLP'04 and to experience the fascination of Hong
Kong that is unmatched anywhere in the world.
The working language of ISCSLP is English. Prospective authors are
invited to submit full-length, four-page papers for presentation in any
of the areas listed above. All ISCSLP'04 papers will be handled and
reviewed electronically and details can be found in the conference
web-site http://www.iscslp2004.org. Please note the following important dates and plan your schedule well in advance.
Schedule of Important Dates:
Four-page full paper submission to be received by July 23, 2004
Notification of acceptance mailed out by September 20, 2004
Camera-ready papers to be received by October 8, 2004
Early registration: November 12, 2004
Call for Participation
HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing
Friday, May 7, 2004
Park Plaza Hotel, Boston, USA
http://www.research.att.com/~dtur/NAACL04-Workshop/
http://www.speech.sri.com/hlt-workshop/
The success of a conversational system depends on a synergistic
integration of technologies such as speech recognition, spoken language
understanding (SLU), dialog modeling, natural language generation,
speech synthesis and user interface design. In this workshop, we
address the issue of improving the robustness of the speech recognition
and SLU components by exploiting higher level linguistic knowledge,
meta-information and machine learning techniques.
The first part of the workshop will focus on robust SLU in
conversational systems, which received much attention during the DARPA-funded ATIS program of the 1990s and more recently the DARPA Communicator program. In parallel with that research, a number of
real-world conversational systems have been deployed to date. However,
the techniques for robust SLU have branched out in many different
directions. They have been influenced by many recent areas such as
information extraction, question answering and machine learning. Data
driven approaches to understanding are rapidly gaining
prominence. There has been a substantial increase in interest in
information extraction from the NLP community, question-answering in
the information retrieval community, and spoken dialog systems in the
speech processing community. Spoken language understanding is an
especially attractive topic for cross-fertilization of ideas between
speech, IR, and NLP communities.
Going beyond SLU and dialog systems, the second part of the workshop will address the use of high-level knowledge for improved speech recognition accuracy. The challenging robustness issues in speech
recognition such as compensation for acoustic confusability resulting
from noisy environments and unexpected channel and speaker mismatch can
potentially be aided by the use of linguistic information such as
prosody, syntax, semantics, and pragmatics and even high-level
meta-information, such as personal information stored in a database or
dialogue and pragmatic coherence constraints. However, current
state-of-the-art speech recognizers do not explicitly use such
information and rely mainly on information encoded in statistical
N-gram language models. The papers here show the potential of high-level information not only to improve word accuracy but also to help disambiguate the recognized words, thus benefiting downstream processing and SLU in particular.
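As a concrete, heavily simplified illustration of the statistical N-gram language models mentioned above, the sketch below trains a bigram model with maximum-likelihood estimates. The toy corpus is invented; real recognizers use far larger corpora plus smoothing:

```python
# Minimal bigram language model; corpus and estimates are toy examples,
# with no smoothing, so unseen bigrams get probability zero.
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s>/</s>."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    # Maximum-likelihood estimate of P(word | prev).
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

corpus = [["show", "me", "flights"], ["show", "me", "fares"]]
uni, bi = train_bigram(corpus)
print(bigram_prob(uni, bi, "show", "me"))  # 1.0: "show" is always followed by "me" here
```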
----------------------------------------------------------------
Invited Talks:
Renato De Mori, Univ Avignon, France
Sentence Interpretation using Stochastic Finite State Transducers
Roberto Pieraccini, IBM TJ Watson Research Center, USA
Spoken Language Understanding: The Research/Industry Chasm
----------------------------------------------------------------
Program:
8:45-9:00 Welcome
9:00-9:50 Invited Talk: Sentence Interpretation using Stochastic
Finite State Transducers, Renato De Mori
9:50-10:00 Break
10:00-10:30 Hybrid Statistical and Structural Semantic Modeling for
Thai Multi-Stage Spoken Language Understanding, Chai Wutiwiwatchai and
Sadaoki Furui
10:30-11:00 Interactive Machine Learning Techniques for Improving SLU
Models, Lee Begeja, Bernard Renger, David Gibbon, Zhu Liu and Behzad
Shahraray
11:00-11:30 Virtual Modality: a Framework for Testing and Building
Multimodal Applications, Peter Pal Boda and Edward Filisko
11:30-12:00 Automatic Call Routing with Multiple Language Models,
Qiang Huang and Stephen Cox
12:00-1:00 Lunch
1:00-1:30 Error Detection and Recovery in
Spoken Dialogue Systems,
Edward Filisko and Stephanie Seneff
1:30-2:00 Robustness Issues in a Data-Driven Spoken Language
Understanding System, Yulan He and Steve Young
2:00-2:50 Invited Talk: Spoken Language Understanding: the
Research/Industry Chasm, Roberto Pieraccini
2:50-3:00 Break
3:00-3:30 Using Higher-level Linguistic Knowledge for Speech
Recognition Error Correction in a Spoken Q/A Dialog, Minwoo Jeong,
Byeongchang Kim and Gary Geunbae Lee
3:30-4:00 Speech Recognition Models of the Interdependence Among
Syntax, Prosody, and Segmental Acoustics, Mark Hasegawa-Johnson,
Jennifer Cole, Chilin Shih, Ken Chen, Aaron Cohen, Sandra Chavarria,
Heejin Kim, Taejin Yoon, Sarah Borys and Jeung-Yoon Choi
4:00-4:30 Modeling Prosodic Consistency for Automatic Speech
Recognition: Preliminary Investigations, Ernest Pusateri and James
Glass
4:30-5:00 Assigning Domains to Speech Recognition Hypotheses, Klaus Rüggenmann and Iryna Gurevych
5:00-5:30 Context Sensing using Speech and Common Sense, Nathan Eagle
and Push Singh
----------------------------------------------------------------
Co-chairs:
Srinivas Bangalore, AT&T Labs - Research
Dilek Hakkani-Tür, AT&T Labs - Research
Gokhan Tur, AT&T Labs - Research
Yuqing Gao, IBM TJ Watson Research Center
Hong-Kwang Jeff Kuo, IBM TJ Watson Research Center
Andreas Stolcke, SRI & ICSI
----------------------------------------------------------------
Program Committee:
Frederic Bechet, Univ. of Avignon, France
Jerome Bellegarda, Apple Computer, USA
Jennifer Chu-Carroll, IBM TJ Watson Research Center, USA
Ciprian Chelba, Microsoft, USA
Stephen Cox, Univ. of East Anglia, UK
Sadaoki Furui, Tokyo Institute of Technology, Japan
Allen Gorin, AT&T Labs - Research, USA
Roberto Gretter, ITC-IRST, Italy
Julia Hirschberg, Columbia University, USA
Dan Jurafsky, University of Colorado, USA
Sanjeev Khudanpur, Johns Hopkins University, USA
Helen Meng, CUHK, Hong Kong
Prem Natarajan, BBN, USA
Hermann Ney, RWTH Aachen, Germany
Martha Palmer, University of Pennsylvania, USA
Barbara Peskin, ICSI, USA
Roberto Pieraccini, IBM TJ Watson Research Center, USA
Manny Rayner, NASA, USA
Brian Roark, AT&T Labs - Research, USA
Roni Rosenfeld, Carnegie Mellon University, USA
Stephanie Seneff, MIT, USA
Elizabeth Shriberg, SRI, USA
Amanda Stent, Stony Brook Univ., USA
CALL FOR PAPERS
Robust 2004: COST278 Workshop on Robustness Issues in Conversational Interaction
August 30 and 31, 2004
University of East Anglia, Norwich, UK
http://www.cmp.uea.ac.uk/robust04/
A workshop on robustness issues for conversational interaction,
organized by COST (European Cooperation in the field of Scientific and
Technical Research) action 278, "Spoken Language Interaction in
Telecommunication", will be held on August 30th and 31st, 2004 at the
University of East Anglia, Norwich, UK.
The objective of this two day workshop is to bring together
researchers from both universities and industry to consider different
methods of achieving robustness in conversational interaction.
The workshop is aimed at robustness against all effects known to degrade the performance of the individual components of a conversational interaction system. Different approaches to compensating against these effects will form the main theme of the workshop. A broad list of topics includes (but is not limited to):
- Robustness against environmental noise
- Robustness against unreliable transmission channels
- Robust conversational system design
- Inclusion of non-speech modalities to improve robustness
In addition to regular technical sessions, the workshop will include
invited plenary talks on topics of related general interest. The
workshop will be divided into four sessions during the two days and
will conclude with a panel discussion.
Submission and further details
Prospective authors are invited to submit four-page papers describing
original work in any of the areas relevant to the workshop.
Email enquiries can be sent to robust04@cmp.uea.ac.uk
Participation in the workshop will be restricted to around 50 people.
Important dates
Submission deadline: June 18th 2004
Notification of acceptance: July 9th 2004
Workshop: August 30th and 31st 2004
Rich Transcription 2004 Meeting Recognition Workshop
ICASSP 2004, Montreal
May 17, 2004
NIST conducted a community-wide evaluation of speech-based meeting recognition technologies in March and will hold a one-day workshop, the "Rich Transcription 2004 Meeting Recognition Workshop", on May 17 at ICASSP 2004 in Montreal.
While a portion of the workshop will be devoted to discussion of the
results of the evaluation, the goal of the workshop is to provide an
overview of the state-of-the-art in meeting recognition technologies
and discuss plans for future work and collaborations.
Huge efforts are being expended in mining information in newswire,
news broadcasts, and conversational speech and in developing interfaces
to metadata extracted in these domains. However, until recently,
relatively little has been done to address such applications in the
more challenging and equally important meeting domain.
The development of smart meeting room core technologies that can
automatically recognize and extract important information from
multi-media sensor inputs will provide an invaluable resource for a
variety of business, academic, and governmental applications. Such
metadata will provide the basis for the development of second-tier
meeting applications that can automatically process, categorize, and
index meetings. Third-tier applications will provide a context-aware
collaborative interface between live meeting participants, remote
participants, meeting archives and vast online resources.
The meeting domain has several important properties not found in other domains and not currently the focus of other research programs: multiple forums and vocabularies,
highly-interactive/simultaneous speech, multiple distant microphones,
multiple camera views, and multi-media/multi-modal information
integration.
The Rich Transcription 2004 Spring Meeting Recognition Workshop at ICASSP 2004 on May 17 in Montreal will bring
together the community of researchers working in this new and
challenging domain to discuss the challenges, the current
state-of-the-art, and future plans and collaborations. Discussions will
include the results of the March 2004 Rich Transcription Meeting
Recognition Evaluation including both Speech-to-Text Transcription and
Speaker Segmentation technologies, related research work in the meeting
domain, related governmental programs, and future collaborations.
Workshop Participation
While RT-04 Spring
Recognition Evaluation participants will have automatic slots in
the workshop,
researchers working in related areas (speech technologies, vision
technologies,
behavioral sciences, etc.) in the meeting domain will also present
their work. Additionally, a certain number of non-presenters will be
permitted to
attend the workshop on an invited basis. Please contact us at
rteval@nist.gov
if you are interested in attending.
Evaluation
The RT-04 Spring Recognition Evaluation
is part of the NIST Rich Transcription Evaluation series
and will include both speaker segmentation and
speech-to-text transcription tasks in the meeting domain. The test set will be approximately 90 minutes in length and will comprise meeting excerpts of 8-11 minutes each, collected at CMU, ICSI, the LDC, and NIST.
Colloquium in Honor of Ron Schafer
Georgia Institute of Technology, Atlanta, Georgia
Friday, October 31, 2003, GCATT Building
Last fall a colloquium in honor of Ron Schafer's retirement was held at Georgia Tech. The colloquium was hosted by Russ Mersereau. The morning featured presentations by Ron's PhD thesis advisor Al Oppenheim, his colleagues Larry Rabiner from Bell Labs and Tom Barnwell from Georgia Tech, and former student Mark Smith:
- Dr. A. V. Oppenheim, MIT, "DSP and Ron Schafer in the 1960s"
- Dr. L. R. Rabiner, Rutgers University, "DSP and Ron Schafer in
the 1970s"
- Dr. T. P. Barnwell, Georgia Tech, "DSP and Ron Schafer in the
1980s"
- Dr. M. J. T. Smith, Purdue University, "DSP and Ron Schafer in
the 1990s"
The afternoon included a panel discussion on future directions for
digital signal processing that was emceed by Fred Juang:
- Panel: Future of DSP in Research, Development, and Education
- Emcee: Dr. Fred Juang, Georgia Tech
- Panel Members:
- Dr. S. Burrus, Rice University
- Dr. J. Flanagan, Rutgers University
- Dr. G. Frantz, Texas Instruments
- Dr. A. Katsaggelos, Northwestern University
- Dr. F. Kitson, Hewlett Packard
Larry Rabiner contributed photographs taken at the colloquium:
Ron Schafer retirement
Ron Schafer and Larry Rabiner
The newly retired Ronald W. Schafer
Ron Schafer with Larry Rabiner
STC Newsletter archive photos of R. W. Schafer and L. R. Rabiner. Actually, the STC Newsletter has no archive; these were scanned from the IEEE Transactions on Audio and Electroacoustics.
R. W. Schafer staff photo. L. R. Rabiner AT&T staff photo.
Positions Available in the Speech Group at Eurecom
Postdoc Position Available at Eurecom:
The Speech group at Eurecom is looking for a post-doc who has acquired hands-on experience of speech processing. He/she must have excellent practical skills in signal and speech analysis as well as a good knowledge of optimal classification using Bayesian criteria. He/she must be open-minded toward original solutions proposed after a rigorous analysis of the low-level phenomena in speech processing. Fluency in English is mandatory (written, spoken, and comprehension). He/she should be able to represent Eurecom at periodic meetings. The ability to work in a small team is also required.
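As a hedged, minimal illustration of the kind of "optimal classification using Bayesian criteria" the advert refers to, the sketch below applies a two-class MAP decision rule with one-dimensional Gaussian class likelihoods. All class names and parameters are invented for the example:

```python
# Toy MAP (maximum a posteriori) classifier: pick the class maximizing
# prior * likelihood, with invented 1-D Gaussian class models.
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and std sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_decide(x, priors, params):
    """Return the class c maximizing priors[c] * p(x | c)."""
    return max(priors, key=lambda c: priors[c] * gaussian_pdf(x, *params[c]))

priors = {"speech": 0.5, "noise": 0.5}
params = {"speech": (1.0, 0.5), "noise": (-1.0, 0.5)}  # (mean, std) per class
print(bayes_decide(0.8, priors, params))  # "speech": 0.8 lies near the speech mean
```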
The position is associated with the European project DIVINES, a STREP in the 6th Framework Programme. The aim of the project is to analyse why recognizers are unable to reach human recognition rates, even in the absence of semantic content. Weaknesses will be analyzed at the level of feature extraction and of phone and lexical models. Focus will be put on the intrinsic variabilities of speech in quiet and noisy environments as well as in read and spontaneous speech. The analysis will not be restricted to tests on several databases with different features and models but will go into the detailed behavior of the algorithms and models. New solutions will be suggested and experimented with. The duration of the project is 3 years.
Ph.D. Student Position Available at Eurecom
The Speech group at Eurecom is looking for a top-level PhD student who has a good knowledge of speech processing. Preference is for a student who worked on speech in his/her predoctoral school or on a speech project for his/her graduation project. He/she must have excellent practical skills in signal and speech analysis as well as a good knowledge of optimal classification using Bayesian criteria. Fluency in English is mandatory (written, spoken, and comprehension). The ability to work in a small team is also required.
Application Procedure:
- send a detailed resume (give details on your activity since your PhD graduation)
- send a copy of your thesis report (as a printed document or CDROM); DO NOT attach your thesis to an e-mail!
- send a copy of your diploma
- send the names and email addresses of two referees
- send the list of your publications
Send all materials to Professor Chris J. Wellekens, Dept of Multimedia Communications, 2229 route des Cretes, BP 193, F-06904 Sophia Antipolis Cedex, France.
Contact Professor Chris Wellekens at christian.wellekens@eurecom.fr (http://www.eurecom.fr/~welleken)
Links to Upcoming Conferences and Workshops (Organized by Date)
International Conference on Speech Prosody
Nara, Japan, March 23-26, 2004
http://www.gavo.t.u-tokyo.ac.jp/sp2004/sp2004_fm.html#
ICA2004 18th International Congress on Acoustics
Kyoto, Japan, April 4-9, 2004
http://www.ica2004.or.jp
ITCC04 - International Conference on Information Technology Coding and Computing
Las Vegas, Nevada, April 5-7, 2004
http://www.itcc.info
HLT/NAACL 2004
Boston, MA, May 2-7, 2004
http://www.hlt-naacl04.org/
HLT/NAACL 2004 Workshop on Spoken Language Understanding for
Conversational Systems
Boston, MA, May 7, 2004
http://www.research.att.com/~dtur/NAACL04-Workshop/
NIST Rich Transcription 2004 Meeting Recognition Workshop
Montreal, Canada, May 17, 2004
john.garofolo@nist.gov
ICASSP2004
Montreal, Canada, May 17-21, 2004
http://www.icassp2004.com
Odyssey2004 - ISCA Workshop on Speaker and Language Recognition
Toledo, Spain, May 31 - June 1, 2004
http://www.odyssey04.org/
3rd International Conference MESAQIN 2004
Czech Republic, June 10-11, 2004
http://wireless.feld.cvut.cz/mesaqin/
IEEE 2004 Workshop on Signal Processing Advances in Wireless Communications
Lisbon, Portugal, July 11-14, 2004
http://spawc2004.isr.ist.utl.pt
SCI2004 - 8th World Conference on Systemics, Cybernetics, and Informatics
Orlando, Florida, July 18 - 21, 2004
http://www.iisci.org/sci2004
Robust 2004: COST278 Workshop on Robustness Issues in Conversational Interaction
University of East Anglia, Norwich, U.K., August 30 - 31, 2004
http://www.cmp.uea.ac.uk/robust04/
EUSIPCO2004
Vienna, Austria, Sept. 7-10, 2004
http://www.nt.tuwien.ac.at/eusipco2004/
ICSLP2004 - INTERSPEECH 8th Biennial International Conference on Spoken Language Processing
Jeju Island, Korea, October 4-8, 2004
http://www.icslp2004.org
ISCSLP'04 - 4th International Symposium on Chinese Spoken Language Processing
Hong Kong, China, December 15-18, 2004
http://www.se.cuhk.edu.hk/~iscslp/index.html
ICASSP2005
Philadelphia, Pennsylvania, May, 2005
http://www.icassp2005.org/
EUROSPEECH 2005 9th European Conference on Speech Communication
and Technology
Lisbon, Portugal, September 4-8, 2005
http://www.interspeech2005.org/