EDS-NLP is a collaborative NLP framework that aims primarily at extracting information from French clinical notes. At its core, it is a collection of components or pipes, either rule-based functions or deep learning modules. These components are organized into a novel, efficient and modular pipeline system, built for hybrid and multitask models. We use spaCy to represent documents and their annotations, and PyTorch as a deep-learning backend for trainable components.
EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's components, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.
Check out our interactive demo!
- Rule-based components for French clinical notes
- Trainable components: NER, span classification
- LLM-based components
- Support for multitask deep-learning models with weight sharing
- Fast inference, with multi-GPU support out of the box
- Easy to use, with a spaCy-like API
- Compatible with rule-based spaCy components
- Support for various I/O formats like BRAT, JSON, Parquet, Pandas or Spark
You can install EDS-NLP via pip. We recommend pinning the library version in your projects, or using a strict package manager like Poetry.
pip install edsnlp==0.19.0
Or, if you want to use the trainable components (which rely on PyTorch):
pip install "edsnlp[ml]==0.19.0"
Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID19 in a text, and detects whether they are negated.
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")

terms = dict(
    covid=["covid", "coronavirus"],
)

# Split the documents into sentences, this is needed for negation detection
nlp.add_pipe(eds.sentences())
# Matcher component
nlp.add_pipe(eds.matcher(terms=terms))
# Negation detection (we also support the spaCy-like API!)
nlp.add_pipe("eds.negation")

# Process your text in one call!
doc = nlp("Le patient n'est pas atteint de covid")

doc.ents
# Out: (covid,)

doc.ents[0]._.negation
# Out: True
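To give an intuition for what the example above does, here is a toy, dependency-free sketch of the same idea: match terms in each sentence, then flag a match as negated when a negation cue precedes it in that sentence. This is a simplification for illustration only, not how EDS-NLP implements its matcher or negation components. The names `NEGATION_CUES`, `split_sentences` and `find_mentions` are hypothetical and are not part of the library.

```python
import re

# Hypothetical, deliberately tiny cue list (an assumption for this sketch);
# real rule-based negation detection relies on far richer patterns.
NEGATION_CUES = ["pas", "aucun", "aucune", "sans"]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_mentions(text, terms):
    """Return (label, matched text, negated) triples for every term found."""
    results = []
    for sentence in split_sentences(text):
        for label, patterns in terms.items():
            for pattern in patterns:
                regex = r"\b" + re.escape(pattern) + r"\b"
                for match in re.finditer(regex, sentence, flags=re.IGNORECASE):
                    # Only look at the part of the sentence before the match.
                    preceding = sentence[: match.start()]
                    negated = any(
                        re.search(r"\b" + re.escape(cue) + r"\b", preceding, flags=re.IGNORECASE)
                        for cue in NEGATION_CUES
                    )
                    results.append((label, match.group(0), negated))
    return results


terms = {"covid": ["covid", "coronavirus"]}
text = "Le patient n'est pas atteint de covid. Son frère a le coronavirus."
mentions = find_mentions(text, terms)
print(mentions)
# [('covid', 'covid', True), ('covid', 'coronavirus', False)]
```

Restricting the cue search to the current sentence is the reason a sentence splitter runs before negation detection: without it, the "pas" of the first sentence would also negate the mention in the second one.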
Go to the documentation for more information.
The performance of an extraction pipeline may depend on the population and documents that are considered.
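One way to check how a pipeline behaves on your own documents is to compare its output against a small, manually annotated sample. The sketch below computes precision and recall over sets of `(document id, start, end, label)` tuples. The function name `precision_recall` and the sample data are made up for illustration and do not come from EDS-NLP.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall between two collections of spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: (document id, start offset, end offset, label)
gold = {
    ("doc1", 32, 37, "covid"),
    ("doc2", 10, 21, "covid"),
    ("doc3", 5, 10, "covid"),
}
predicted = {
    ("doc1", 32, 37, "covid"),
    ("doc2", 10, 21, "covid"),
    ("doc4", 0, 5, "covid"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same comparison on samples drawn from different populations or document types gives a rough idea of how much the results vary.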
We welcome contributions! Fork the project and propose a pull request. Take a look at the dedicated page for details.
If you use EDS-NLP, please cite us as below.
@misc{edsnlp,
  author = {Wajsburt, Perceval and Petit-Jean, Thomas and Dura, Basile and Cohen, Ariel and Jean, Charline and Bey, Romain},
  doi    = {10.5281/zenodo.6424993},
  title  = {EDS-NLP: efficient information extraction from French clinical notes},
  url    = {https://aphp.github.io/edsnlp}
}
We would like to thank Assistance Publique – Hôpitaux de Paris, AP-HP Foundation and Inria for funding this project.