CaRe-NLP | NLP4Health Lab Amsterdam

Unstructured data is estimated to account for 80% of all patient data. Nonetheless, it is currently severely underused in healthcare because it is noisy, hard to interpret, and privacy-sensitive. In this project, we develop human-centric, responsible Natural Language Processing (NLP) and Machine Learning (ML) methods that will allow clinicians and patients to safely tap this unstructured data’s potential. Our proposed methods focus on the Dutch healthcare ecosystem and are tailored to support research, education, and patient care by promoting explainable prediction models, ensuring fairness and patient privacy, preventing bias, and coping with data scarcity.

The CaRe-NLP project kicks off in April 2024 and will run for five years, until April 2029.

Scientific objectives

Improve Dutch healthcare research and technology by developing responsible NLP tools for multilingual and Dutch health data.
Enable new research lines in Dutch healthcare by generating synthetic patient electronic health records (EHRs) that include free-text notes and come with differential privacy guarantees.
Improve fairness and inclusiveness by building new (and improving existing) prediction models for common and rare conditions and diseases.
Improve the explainability and transparency of NLP in healthcare by developing post-hoc feature attribution and explainable-by-design NLP methods for healthcare.
Increase human-centeredness in NLP and AI for healthcare by integrating physicians in the loop to validate synthetic data and model predictions.

PhD projects

PhD1: Dutch responsible NLP tools
- Goal: Propose, develop, train, evaluate, and share pretrained large language models (LLMs) to address many problems in healthcare.
PhD2: Synthetic patient electronic health records (EHRs)
- Goal: Propose, develop, train, evaluate, and share methods to generate synthetic patient EHRs.
PhD3: Prediction models with physician-in-the-loop
- Goal: Propose, develop, train, evaluate, and share active learning and physician-in-the-loop methods to improve synthetic data generation.
PhD4: Explainability methods
- Goal: Propose, develop, train, evaluate, and share feature attribution and explainable-by-design methods.