Meet CURE, the dataset to build trustworthy AI for health information retrieval

Written by Clinia
Published 2024-11-26

Today, we are proud to announce CURE (Cross-lingual Understanding and Retrieval Evaluation), the first heMTEB dataset to be fully integrated into MTEB and accessible on Hugging Face.

As a core part of heMTEB (Health-specific Massive Text Embedding Benchmark), CURE is a purpose-built health dataset developed in conjunction with medical experts to evaluate AI models for the task of health information retrieval. As an open-source dataset, it has been developed to effectively and consistently benchmark both proprietary and open-source embedding models for information retrieval questions in point-of-care scenarios.

The use of AI in mission-critical health workflows and information retrieval demands the highest level of precision and reliability, as the information can directly impact patient safety and health outcomes. This is why rigorous health-specific benchmarks developed in conjunction with medical experts are essential to the ongoing evaluation of AI models used in health.

Available today - CURE v1

CURE v1 enables developers and researchers to assess their models' performance for point-of-care information retrieval use cases, where timely and accurate access to health information is essential. It covers 10 medical disciplines and supports the evaluation of retrieval tasks in three language settings: English-English, French-English, and Spanish-English.
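To make "evaluating a retrieval task" concrete, here is a minimal sketch of scoring ranked results against relevance judgments. It assumes nDCG@10, the metric MTEB commonly reports for retrieval tasks; the query IDs, passage IDs, relevance grades, and rankings below are hypothetical toy values, not CURE data, and the helper names are illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query: DCG of the model's ranking divided by the ideal DCG."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy judgments: query ID -> {passage ID: relevance grade}.
# "q_en" stands in for an English query and "q_fr" for a French query,
# both searching the same English passages.
qrels = {
    "q_en": {"d1": 2, "d3": 1},
    "q_fr": {"d2": 1},
}

# Hypothetical model output: passages ranked best-first for each query.
runs = {
    "q_en": ["d1", "d2", "d3"],
    "q_fr": ["d3", "d2", "d1"],
}

scores = {qid: ndcg_at_k(runs[qid], qrels[qid], k=10) for qid in qrels}
mean_ndcg = sum(scores.values()) / len(scores)

for qid, score in scores.items():
    print(f"{qid}: nDCG@10 = {score:.4f}")
print(f"mean nDCG@10 = {mean_ndcg:.4f}")
```

Averaging per-query scores separately for each language setting (English-English, French-English, Spanish-English) would show whether a model's quality holds up when the query language differs from the passage language.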

We developed CURE in conjunction with medical experts to instill confidence and ensure that it supports the real-world needs of health professionals.

  • Address health-specific gaps: CURE was created to strengthen health-specific use cases by focusing on point-of-care health information search scenarios.

  • Built by medical professionals: Collaborating with healthcare professionals across 10 disciplines (including gastroenterology, neuroscience and neurology, and psychiatry and psychology) ensured that the dataset reflects the nuances of specialist medical language and the complexity of real-world use cases.

  • Open-sourced for the health community: With CURE publicly available and open to contributions, the health & medical community can lead the development of AI solutions that address their real-world, in-the-room challenges for their colleagues and the patients they serve.

CURE v1 is now publicly available on Hugging Face as part of MTEB's suite of datasets. We aim to add more medical disciplines to CURE in future releases and more datasets to heMTEB to address specific needs across the whole health and medical spectrum.

At Clinia, we see firsthand the high level of trust and precision required to build technology and AI applications in health. From the very beginning, we have recognized the complexity and mission-critical nature of supporting patients and medical professionals in their journey to better health and effective care. That’s why we ensure that everything we create is truly health-grade, including the models we train and the datasets that guide their development and evaluation.

For more technical information or to explore collaboration and contribute feedback to heMTEB, please contact Daniel Buades Marcos.