Find, share, and reuse health data
We recognize that there is value in creating datasets or models that are either derived from MIMIC or which augment MIMIC in some way (for example, by adding annotations). Here are some guidelines on creating these datasets and models:
This repository is under review by NIH for potential modification in compliance with U.S. federal Administration directives.
PhysioNet has introduced updated access policies for certain datasets to comply with the U.S. Department of Justice’s Data Security Program (DSP) under Executive Order 14117. The DSP final rule took effect on April 8, 2025 and full enforcement began July 8, 2025: https://www.justice.gov/opa/media/1396351/dl
The DSP imposes export-control–style restrictions on U.S. persons sharing or transferring bulk sensitive personal data (e.g., genomic, biometric, health, financial, geolocation) and U.S. government-related data with specified countries or "covered persons". The rule applies to interactions with countries including: China (including Hong Kong and Macau), Cuba, Iran, North Korea, Russia, and Venezuela, as well as individuals or entities connected to them.
PhysioNet now prevents access to certain controlled-access datasets for users connecting from IP addresses or affiliations in those regions, or for those classified as “covered persons”. These steps are taken to satisfy legal obligations and are not a judgment on your work as researchers.
We understand these changes may affect ongoing research. PhysioNet is committed to supporting your efforts to understand the policy and explore compliant access options.
We have received inquiries about the use of credentialed and restricted data on PhysioNet, including MIMIC-III, MIMIC-IV, MIMIC-CXR, and their derivatives, with large language models (LLMs) and online services. The PhysioNet Credentialed Data Use Agreement explicitly prohibits sharing access to the data with third parties, including sending it through APIs or using it on online platforms.
Key Requirements:
Zero Data Retention: MIMIC data must not be stored or retained by third-party LLM services.
User Responsibility: Researchers are responsible for ensuring compliance with the Data Use Agreement.
Recommendations:
Strongly Recommended: Use locally deployed LLMs to maintain full control over the data.
If Using Cloud Services or APIs: Verify that the service’s settings ensure zero data retention, no use of data for model training, and no human review. Many services retain data by default. Even when services claim "zero data retention," their requirements may be insufficient due to internal processing, logging, or caching practices. Regularly review platform policies, as they may change without notice. If a service’s data handling practices are unclear or cannot be fully verified, do not use the service.
Important Disclaimer: PhysioNet cannot verify the data practices of external services and does not endorse or recommend specific platforms.
The National Institutes of Health (NIH) is seeking applications for the position of Director, National Institute of General Medical Sciences (NIGMS). The Director, NIGMS, provides leadership, and administers, fosters, and supports research in the basic and general medical sciences and in related natural or behavioral sciences. The Director develops Institute goals, priorities, policies, and program activities, and keeps the Director, NIH, abreast of NIGMS developments, accomplishments, and needs as they relate to the overall mission of the NIH. In exercising the Director’s responsibilities for program planning, implementation and evaluation, the incumbent works with and seeks the advice of a wide range of groups within the scientific community including investigators, institutions, scientific societies, and relevant commercial organizations.
The Director is responsible for managing a high-level, complex organization and serving as the chief visionary for the Institute. The Director actively engages others to create a shared vision of the purpose and direction of the organization and works collaboratively within the Institute, across the NIH, and with external entities to generate, gain commitment for, and accomplish NIGMS goals. The Director must demonstrate a keen awareness of the workings of the public sector and successfully navigate within that environment to promote and reach NIGMS and NIH objectives.
The position is open for application from Friday, November 7, 2025 to Friday, December 12, 2025 [updated]. Additional information on the position and the application process can be found here: Director, National Institute of General Medical Sciences | Office of Human Resources
Hyung-Chul Lee, Chul-Woo Jung
VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients Published: Sept. 21, 2022. Version: 1.0.0
Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark
Large database of de-identified health information from patients admitted to Beth Israel Deaconess Medical Center Published: Jan. 6, 2023. Version: 2.2
Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng
Chest radiographs in DICOM format with associated free-text reports. Published: Sept. 19, 2019. Version: 2.0.0
Caio Uehara Martins, Camila Tirapelli, Hugo Gaêta-Araujo, Jose Augusto Baranauskas, Breno Zancan, Jose Carneiro, Alessandra Macedo
InReDD-Dataset-V1 is a collection of 924 anonymised panoramic dental radiographs curated by the Interdisciplinary Research Group in Digital Dentistry (InReDD) at the University of São Paulo. Published: Nov. 22, 2025. Version: 1.0.0
Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador Martinez, Eduardo Perez Guerrero, Paola Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy Zandee van Rilland, Poonam Hosamani, Kevin Keet, Minjoung Go, Evelyn Ling, David Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay Chaudhari
MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades. Published: Nov. 14, 2025. Version: 1.0.1
Ziming Wei, Luke Sagers, Caroline McKenna, Ted Pak, Chanu Rhee, Michael Klompas, Sanjat Kanjilal
NPA-CP is a freely accessible dataset derived from electronic health record (EHR) information at MGB between 2015 and 2024. The dataset includes 11 different pathogens and can be used to predict hospital-onset infections for these pathogens. Published: Nov. 4, 2025. Version: 1.0.0
The published Bridge2AI-Voice dataset contains derived features from the audio waveforms. Interested users can request access to the original raw audio data by contacting: DACO@b2ai-voice.org
The raw audio data will be disseminated through controlled access only, to protect participants' privacy.
Each year, the IEEE Awards Board selects a distinguished group of individuals to receive IEEE’s highest honors, recognizing exceptional achievements and significant contributions to technology, society, and the engineering profession.
We are honored to share that Professor Roger G. Mark and the late George B. Moody have been named co-recipients of the 2026 IEEE Biomedical Engineering Award for their leadership in ECG signal processing and the creation and distribution of curated biomedical and clinical data. View announcement on the IEEE website.
This recognition highlights the profound and lasting impact that Roger Mark and George Moody have had on biomedical engineering and the global research community. Their vision and contributions continue to underpin our work on PhysioNet and databases such as MIMIC.
Roger G. Mark is Distinguished Professor of Health Sciences and Technology Emeritus at the Institute for Medical Engineering & Science at MIT. His work spans physiological signal processing, patient monitoring, and critical care decision support. He is the co-founder of PhysioNet, launched in 1999 to provide open access to physiologic signals, clinical data, and open-source software for the research community.
George B. Moody made transformative contributions to biomedical signal processing through his work in electrocardiography. He developed the WFDB libraries and much of the code available on PhysioNet, which remains essential for ECG signal processing worldwide. He also created and led the PhysioNet/Computing in Cardiology Challenges for 15 years, fostering global collaboration and innovation.
The overarching goal of the ArchEHR-QA 2025 (pronounced "Archer") shared task is to develop automated responses to patients' questions by generating answers that are grounded in key clinical evidence from their electronic health records (EHRs). The proposed dataset, ArchEHR-QA, comprises hand-curated, realistic patient questions (reflective of patient portal messages), relevant focus areas identified within these questions (as determined by a clinician), corresponding clinician-rewritten versions (crafted to aid in formulating responses), and note excerpts providing essential clinical context.
We are pleased to announce the release of Bridge2AI-Voice v1.0, a dataset designed to advance research into the use of voice as a biomarker of health. This dataset, developed as part of the NIH Bridge2AI initiative, aims to support artificial intelligence research by providing ethically sourced, high-quality voice-derived data linked to clinical information.
Bridge2AI-Voice v1.0 includes 12,523 voice-derived recordings from 306 participants across five North American sites. Participants were selected based on conditions known to affect vocal characteristics, including:
The initial release does not include raw voice recordings. Instead, it provides derived acoustic features, such as spectrograms, along with detailed demographic, clinical, and validated questionnaire data.
This year's Challenge focuses on detecting Chagas disease from ECGs. Chagas disease is a parasitic disease in Central and South America that affects an estimated 6.5 million people and causes nearly 10,000 deaths annually. Timely treatment may prevent or slow damage to the cardiovascular system, but serological testing capacity is limited, so detection through ECGs can help to identify potential Chagas patients for testing and treatment.