MIDAS 2.0, the Metric-based Integrity and Data Assessment System, gives institutions a common method to check data quality, documentation, representativeness, interoperability, governance, and privacy before a dataset is shared, reused, or used to build AI tools.
If you are new to MIDAS, think of it as a practical system that helps you understand whether a health dataset is truly ready for research, repository onboarding, and AI development, not just whether the files exist.
Useful datasets are often held back by uneven documentation, inconsistent metadata, weak interoperability, or unclear privacy safeguards. MIDAS 2.0 exists to replace guesswork with a consistent, evidence-based way to judge whether a dataset is strong enough to be trusted and reused.
MIDAS 2.0 evaluates whether a dataset is not only complete, but also understandable, reusable, representative, and safe.
The framework helps centres identify gaps, improve weak areas, compare datasets on a common scale, and decide how confidently a dataset can be shared, governed, and reused. It also supports repository onboarding and future benchmark AI work.
MIDAS 1.0 proved that high-quality, standardized biomedical datasets could be built in a structured way. MIDAS 2.0 takes the next necessary step. If datasets are going to be compared across centres, onboarded into trusted repositories, and reused for AI, they must be assessed with clearer evidence, stronger privacy safeguards, better interoperability checks, and a framework that works beyond imaging alone.
MIDAS 1.0 showed that biomedical datasets could be built with stronger annotation discipline, richer metadata, and more consistent curation standards instead of being assembled in an ad hoc way. That early work created the foundation for a more mature national framework.
MIDAS 1.0 was an important beginning, but it was not enough for a national-quality ecosystem. A stronger framework was needed because datasets today must do more than look well curated inside one institution. They must be comparable across centres, usable across data types, and safe enough to support responsible sharing and AI development.
MIDAS 2.0 uses a two-stage pathway: dataset custodians first complete the Lite Version themselves and then undergo an independent technical review before the final CQI and PRS are assigned. This keeps the process practical for centres while still requiring evidence-based validation before broader sharing or repository onboarding.
1. The dataset custodian performs a structured self-assessment and records the current state of the dataset against the MIDAS domains.
2. Metadata, SOPs, validation logs, consent information, and privacy documentation are kept ready so every claim can be verified rather than assumed.
3. An independent reviewer checks the submission in greater detail, asks for clarifications if needed, and computes the final quality and privacy scores.
4. The final CQI and PRS determine whether the dataset is ready for broader use, needs targeted improvement, or requires stronger sharing restrictions.
The framework produces two outputs: one score for overall dataset quality and one score for residual privacy risk. Together, they show whether a dataset is strong enough to support trustworthy reuse and what level of safeguards it still needs. This is what turns MIDAS 2.0 from a checklist into a decision framework for certification, repository onboarding, and responsible reuse.
The CQI summarizes how complete, reusable, representative, and well-governed a dataset is across the 15 MIDAS domains.
In simple terms, CQI answers: "How strong and trustworthy is this dataset overall?" A higher CQI means the dataset is better documented, easier to reuse, and more suitable for high-quality research and AI development.
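To make the idea concrete, the paragraph above can be sketched as a weighted aggregation of per-domain scores. This is illustrative only: the actual MIDAS 2.0 domain names, weights, and scoring scale are not specified here, so everything in the snippet is an assumption.

```python
# Hypothetical sketch of a CQI-style aggregation. Domain names, weights,
# and the 0-100 scale below are assumptions for illustration; the real
# assessment spans all 15 MIDAS domains.

def composite_quality_index(domain_scores: dict[str, float],
                            weights: dict[str, float]) -> float:
    """Weighted mean of per-domain scores (assumed 0-100 scale)."""
    if set(domain_scores) != set(weights):
        raise ValueError("every domain needs both a score and a weight")
    total = sum(weights.values())
    return sum(weights[d] * domain_scores[d] for d in weights) / total

# Example using three of the areas named in the framework overview.
scores = {"documentation": 80.0, "interoperability": 60.0, "privacy": 90.0}
weights = {"documentation": 1.0, "interoperability": 1.0, "privacy": 1.0}
print(round(composite_quality_index(scores, weights), 1))  # 76.7
```

A higher composite value corresponds to the "higher CQI" described above: better documented, easier to reuse, and more suitable for research and AI development.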
The PRS estimates how much re-identification or sensitive-attribute risk remains after privacy protections have been applied.
PRS answers a different question: "Even after de-identification and other controls, how much privacy risk still remains?" A lower PRS supports wider reuse, while a higher PRS signals the need for tighter controls.
The baseline risk estimate is then adjusted for how sensitive the data are: higher sensitivity means stricter handling requirements.
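The baseline-plus-sensitivity adjustment described above can be sketched as follows. The tiers and multipliers are invented for illustration; the official PRS baseline and adjustment rules are not defined in this overview.

```python
# Hedged sketch of a PRS-style adjustment. The sensitivity tiers and
# multipliers below are assumptions, not the official MIDAS 2.0 values.

SENSITIVITY_MULTIPLIER = {"low": 1.0, "moderate": 1.25, "high": 1.5}

def privacy_risk_score(baseline_risk: float, sensitivity: str) -> float:
    """Scale a baseline residual-risk estimate (assumed 0-100 scale) by
    data sensitivity, capping at 100. Higher scores signal the need for
    tighter sharing controls."""
    adjusted = baseline_risk * SENSITIVITY_MULTIPLIER[sensitivity]
    return min(adjusted, 100.0)

print(privacy_risk_score(40.0, "high"))  # 60.0
```

The capped, upward-only adjustment mirrors the stated intent: more sensitive data can only raise the residual-risk score, never lower it.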
Before MIDAS 2.0 is used more widely, ICMR is asking experts to review whether the framework is clear, scientifically sound, complete, and practical across different biomedical data contexts. The goal is not only to validate the wording, but also to ensure the framework can be applied consistently before it is used for wider implementation and dataset certification.
Each statement is rated on a 5-point scale, and lower scores must be explained so unclear or impractical sections can be revised.
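The rating rule above can be sketched in a few lines. The "scores of 3 or below need an explanation" cutoff and the consensus threshold are assumptions for illustration, not the official Delphi protocol.

```python
# Sketch of the expert-rating rule described above: a 5-point scale where
# low scores must carry a written rationale. The rationale cutoff (<= 3)
# and the 75% consensus rule are assumptions, not the official protocol.

def validate_rating(score: int, rationale: str = "") -> None:
    """Reject out-of-range scores and low scores without an explanation."""
    if not 1 <= score <= 5:
        raise ValueError("score must be on the 1-5 scale")
    if score <= 3 and not rationale.strip():
        raise ValueError("scores of 3 or below require an explanation")

def consensus_reached(scores: list[int], threshold: float = 0.75) -> bool:
    """Assumed rule: consensus if >= 75% of experts rate the item 4 or 5."""
    return sum(s >= 4 for s in scores) / len(scores) >= threshold

validate_rating(2, "The evidence requirements in this section are ambiguous.")
print(consensus_reached([5, 4, 4, 3, 5]))  # True
```

Requiring a rationale alongside every low score is what lets unclear or impractical sections be revised rather than merely flagged.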
If you are an invited expert, you can log in and continue the validation workflow. If you are new to MIDAS, start with the Delphi Proposal to read the full review document and scoring context.