Detecting and quantifying apnea based on the ECG
A challenge from PhysioNet and Computers in Cardiology 2000
22 September 2000: The deadline for entries has passed and
no further entries will be accepted. The final scores have now been posted
here, together with links to the
abstracts submitted by entrants for presentation at Computers in Cardiology.
14 March 2003: Several of the participants in this
challenge, together with the organizers, have published a paper that compares
the methods used in the challenge and investigates how several of the most
successful strategies can be combined. This paper can now be read on-line:
[PDF]
Penzel T, McNames J, de Chazal P, Raymond B, Murray A, Moody G.
Systematic comparison of different algorithms for apnoea detection based
on electrocardiogram recordings.
Medical & Biological Engineering & Computing
40:402-407, 2002.
January 2009: Andy Fraser, who with
colleagues James McNames and Andreas Rechtsteiner won event 2 and
achieved a perfect score in event 1, has published a book about hidden
Markov models and their applications, with a chapter detailing the
winning method and variations on it:
Fraser AM. Hidden Markov Models and Dynamical Systems [ISBN 978-0-898716-65-8]. Philadelphia: Society for Industrial and Applied Mathematics, 2008.
A list of the papers about the Challenge presented at
Computers in Cardiology 2000 is available.
Introduction
Obstructive sleep apnea (intermittent cessation of breathing) is a
common problem with major health implications, ranging from excessive daytime
drowsiness to serious cardiac arrhythmias. Obstructive sleep apnea is
associated with increased risks of high blood pressure, myocardial infarction,
and stroke, and with increased mortality rates. Standard methods for detecting
and quantifying sleep apnea are based on respiration monitoring, which often
disturbs or interferes with sleep and is generally expensive. A number of
studies during the past 15 years have hinted at the possibility of detecting
sleep apnea using features of the electrocardiogram. Such approaches are
minimally intrusive, inexpensive, and may be particularly well-suited for
screening. The major obstacle to the use of such methods is that careful
quantitative comparisons of their accuracy against that of conventional
techniques for apnea detection have not been published.
We therefore offer a challenge to the biomedical research community:
demonstrate the efficacy of ECG-based methods for apnea detection using a
large, well-characterized, and representative set of data. The goal of the
contest is to stimulate effort and advance the state of the art in this
clinically significant problem, and to foster both friendly competition and
wide-ranging collaborations. We will award prizes of US$500 to the most
successful entrant in each of two events.[1]
Data for development and evaluation
Data for this contest have kindly been provided by Dr. Thomas Penzel of
Philipps-University, Marburg, Germany, and are available here.
The data to be used in the contest are divided into a learning set
and a test set of equal size. Each set consists of 35 recordings,
containing a single ECG signal digitized at 100 Hz with 12-bit resolution,
continuously for approximately 8 hours (individual recordings vary in length
from slightly less than 7 hours to nearly 10 hours). Each recording includes a
set of reference annotations, one for each minute of the recording, that
indicate the presence or absence of apnea during that minute. These reference
annotations were made by human experts on the basis of simultaneously recorded
respiration signals. Note that the reference annotations for the test set will
not be made available until the conclusion of the contest. Eight of the
recordings in the learning set include three respiration signals (oronasal
airflow measured using nasal thermistors, and chest and abdominal respiratory
effort measured using inductive plethysmography) each digitized at 20 Hz, and
an oxygen saturation signal digitized at 1 Hz. These additional signals can be
used as reference material to understand how the apnea annotations were made,
and to study the relationships between the respiration and ECG signals.
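Because the ECG is sampled at 100 Hz and there is one reference annotation per minute, each annotation corresponds to 6000 ECG samples. The sketch below shows one way to cut a recording into per-minute windows. It assumes that minute boundaries fall on consecutive 60-second intervals starting at the first sample, which the description above does not state explicitly; the function name `minute_windows` is hypothetical.

```python
FS_ECG = 100  # ECG sampling frequency in Hz, as stated above


def minute_windows(n_samples, fs=FS_ECG):
    """Return (start, stop) sample indices for each complete minute.

    Assumption: minute m covers samples [m * 60 * fs, (m + 1) * 60 * fs).
    Any trailing partial minute is dropped.
    """
    per_minute = fs * 60
    n_minutes = n_samples // per_minute
    return [(m * per_minute, (m + 1) * per_minute) for m in range(n_minutes)]


# An 8-hour recording yields 480 one-minute windows of 6000 samples each.
windows = minute_windows(8 * 3600 * FS_ECG)
print(len(windows), windows[0], windows[-1])
```

Features computed over each window (for example, from the beat-to-beat intervals it contains) can then be paired with the reference annotation for the same minute.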
The database does not contain episodes of pure central apnea or of
Cheyne-Stokes respiration; all apneas in these recordings are either
obstructive or mixed. Minutes containing hypopneas (defined as
intermittent drops in respiratory flow below 50%, accompanied by drops
in oxygen saturation of at least 4%, and followed by compensating
hyperventilation) are also scored as minutes containing apnea.
Additional information about the recordings was posted here
after the conclusion of the competition, including (for all
recordings) age, gender, height, weight, AI (apnea index), HI
(hypopnea index), and AHI (apnea-hypopnea index). The subjects of
these recordings are men and women between 27 and 63 years of age,
with weights between 53 and 135 kg (BMI between 20.3 and 42.1); AHI
ranges from 0 to 93.5 in these recordings.
Sleep apnea definitions
Several definitions for clinically significant sleep apnea have been in
clinical use since 1978, when Guilleminault defined "sleep apnea syndrome" as
more than 30 apneas per night. In 1981, Lavie proposed a more selective
criterion of 100 apneas per night. Later criteria were based on an "apnea
index" (the number of apneas per hour, or the number of minutes containing
apnea per hour). Most clinicians regard an apnea index below 5 as normal, and
an apnea index of 10 or more as pathologic. In 1988, He et al. found increased
mortality in untreated patients with apnea indices of 20 or more, and such
patients are now recognized as in need of treatment. Criteria used in current
practice rely not only on an apnea index, but also on symptoms and
cardiovascular sequelae.[2]
Data classes
For the purposes of this challenge, based on these varied criteria, we have
defined three classes of recordings:
- Class A (Apnea): These meet all criteria. Recordings in class A
contain at least one hour with an apnea index of 10 or more, and at
least 100 minutes with apnea during the recording. The learning and
test sets each contain 20 class A recordings.
- Class B (Borderline): These meet some but not all of the criteria.
Recordings in class B contain at least one hour with an apnea
index of 5 or more, and between 5 and 99 minutes with apnea during
the recording. The learning and test sets each contain 5 class B
recordings.
- Class C (Control): These meet none of the criteria, and may be
considered normal. Recordings in class C contain fewer than 5 minutes
with apnea during the recording. The learning and test sets each
contain 10 class C recordings.
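The class definitions above can be sketched as a small function operating on per-minute labels. This is an illustration under stated assumptions, not the organizers' procedure: the apnea index of an hour is taken as the number of minutes containing apnea in that hour, "at least one hour" is interpreted as any 60-minute sliding window, and the function names are hypothetical. Recordings that fit none of the three definitions return `None`.

```python
def max_hourly_apnea_minutes(labels, window=60):
    """Largest number of apnea minutes in any sliding window of `window` minutes."""
    if len(labels) <= window:
        return sum(labels)
    count = sum(labels[:window])
    best = count
    for i in range(window, len(labels)):
        # Slide the window forward by one minute.
        count += labels[i] - labels[i - window]
        best = max(best, count)
    return best


def classify_recording(labels):
    """Classify a recording from per-minute labels (1 = apnea, 0 = no apnea).

    Returns 'A', 'B', 'C', or None if no class definition applies.
    """
    total = sum(labels)
    peak = max_hourly_apnea_minutes(labels)
    if total < 5:
        return "C"
    if total >= 100 and peak >= 10:
        return "A"
    if 5 <= total <= 99 and peak >= 5:
        return "B"
    return None


print(classify_recording([1] * 120 + [0] * 360))  # 120 apnea minutes
print(classify_recording([1] * 20 + [0] * 460))   # 20 apnea minutes
print(classify_recording([0] * 480))              # no apnea
```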
Events and scoring
Each entrant may compete in one or both of the following events:
1. Apnea screening
In this event, your task is to design software that can classify the 35 test
set recordings into class A (apnea) and class C (control or normal) groups,
using the ECG signal to determine if significant sleep apnea is present. Your
classifications for the 5 class B (borderline) recordings will not influence
your score in this event (but you must classify them into either class A or
class C, since you will not know which records belong to class B until the
correct classifications of the 35 test set records are disclosed after the end
of the contest). Your score for this event is simply the number of correct
classifications; thus the maximum score possible is 30.
An example may help to clarify the scoring: A contestant submits her results,
classifying 22 recordings in class A and 13 in class C (for a total of 35). Out
of the 22 recordings that her software has identified as class A, 16 of them
are actually class A, 3 are class B and 3 are class C. Out of the 13
recordings that her software identified as class C, 7 have been correctly
identified, and the other 6 include 4 class A and 2 class B. The score in this
case is 23 (16 correct class A identifications, plus 7 in class C). Class B
cases do not contribute to the final score; rather, they provide a buffer zone
between classes A and C.
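The worked example can be reproduced with a short scoring sketch. The record names (`r01` to `r35`) and the function name `screening_score` are hypothetical; only the counts are taken from the example above.

```python
def screening_score(predicted, truth):
    """Count correct A/C classifications; class B recordings are ignored."""
    score = 0
    for record, true_class in truth.items():
        if true_class == "B":
            continue  # borderline recordings do not affect the score
        if predicted[record] == true_class:
            score += 1
    return score


# (true class, predicted class, number of recordings), from the example above.
groups = [
    ("A", "A", 16), ("A", "C", 4),
    ("B", "A", 3), ("B", "C", 2),
    ("C", "A", 3), ("C", "C", 7),
]

truth, predicted = {}, {}
n = 0
for true_class, predicted_class, count in groups:
    for _ in range(count):
        n += 1
        name = f"r{n:02d}"  # hypothetical record name
        truth[name] = true_class
        predicted[name] = predicted_class

print(screening_score(predicted, truth))  # 23
```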
We have chosen to exclude the class B recordings from the calculation of the
scores because the utility of a screening test depends primarily on the
accuracy with which it classifies the unambiguous cases, both positive and
negative (classes A and C respectively in this instance). If you wish to
attempt to classify recordings into all three groups, you may submit a second
set of classifications, and we will calculate your score in the same way (but
the maximum possible score in this case will be 35). The highest scores
obtained in this way will be published, but will not be the basis for an award.
2. Quantitative assessment of apnea
In this event, your software must generate a minute-by-minute annotation file
for each recording, in the same format as those provided with the learning set,
using the ECG signal to determine when sleep apnea occurs. Your annotations
will be compared with a set of reference annotations to determine your score.
Each annotation that matches a reference annotation earns one point; thus the
highest possible score for this event will be approximately 16800 (480
annotations in each of 35 records). It is important to understand that scores
approaching the maximum are very unlikely, since apnea assessment can be very
difficult even for human experts. Nevertheless, the scores can be expected to
provide a reasonable ranking of the ability of the respective algorithms to
mimic the decisions made by human experts.
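A minimal sketch of this scoring rule follows, assuming each annotation file has already been read into a list with one label per minute. The labels "A" (apnea) and "N" (no apnea), the record names, and the function name `annotation_score` are placeholders chosen for illustration; parsing of the actual annotation format is not shown.

```python
def annotation_score(predicted, reference):
    """One point for each per-minute annotation that matches the reference.

    Both arguments map a record name to a list of per-minute labels.
    Minutes missing from a submission earn no points.
    """
    total = 0
    for record, ref_labels in reference.items():
        pred_labels = predicted.get(record, [])
        total += sum(1 for p, r in zip(pred_labels, ref_labels) if p == r)
    return total


reference = {"x01": list("NNAAN"), "x02": list("ANN")}
predicted = {"x01": list("NAAAN"), "x02": list("ANA")}
print(annotation_score(predicted, reference))  # 6

# Approximate maximum quoted above: 480 annotations in each of 35 records.
print(480 * 35)  # 16800
```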
Obtaining scores
A form that will permit you to submit your classifications and/or annotations
for scoring is now available here. You will receive
a reference number and your score(s) by return e-mail. You may revise your
submissions and try again if you wish, but attempts to exploit this service in
order to discover the correct classifications are contrary to the spirit of the
contest and will result in disqualification.
How to enter
To enter the competition, submit an abstract with a concise
description of your approach to the problem to Computers in Cardiology 2000 no later than
Wednesday, 3 May 2000. Your abstract must include your reference number and
score(s); for this reason, do not wait until the last minute to submit your
classifications and/or annotations for scoring. If your abstract is accepted,
you will be expected to prepare a four-page paper for presentation during the
conference and publication in the conference proceedings. We welcome and
encourage contributions to PhysioNet of software developed during this
competition.
Awards
The author(s) of the top-scoring eligible entry in each event will receive an
award of US$500 in recognition of their achievement. In the event of a
tie, the date of the author's abstract submission will be the tie-breaker.
This rule favors early submission of abstracts, but permits authors to improve
their results if they can after submitting their abstracts. Classifications or
annotations received for scoring after noon GMT on Friday, 22 September 2000
will not be eligible for awards. Submissions from members and affiliates of
our research groups at MIT, Boston University, Harvard Medical School, Beth
Israel Deaconess Medical Center, McGill University, and Philipps-University are
not eligible for awards, although all are welcome to participate.
Workshop/Panel discussion
All entrants are invited to describe their methods during a panel discussion at
Computers in Cardiology in Boston on Sunday, 24 September 2000, when the awards
will be given. Individual presentations of accepted papers will be scheduled
for one or more sessions of the conference during the following days (25-27
September).
Acknowledgements
1. Funding for the awards has been contributed by the Margret and
H.A. Rey Laboratory for Nonlinear Dynamics in Medicine at Boston's
Beth Israel Deaconess Medical Center.
2. We thank Thomas Penzel for the discussion of diagnostic
criteria for sleep apnea syndrome, as well as for making this event possible
by his generous contribution of data.