AI regulation guide: using PICO to generate evidence for AI development


A guide from the multi-agency advisory service on generating the right evidence for your AI product for health and care, and mastering the PICO statement.

Generating the right evidence

Clinical practice can be incredibly nuanced, variable across settings, and changeable over time due to improvements in care. This can make it challenging for developers to collect the right evidence to support their product – that is, evidence that generalises well to the market they want to sell to.

The key is for evidence to be as representative of current clinical practice as possible – this will maximise the likelihood of success through the regulatory and health technology assessment pathway. This applies both to the early datasets used to train and validate a new product, and to subsequent data collected on the clinical effectiveness and safety of a product once it is introduced into real-life clinical workflows.

So how do we generate this evidence?

We recommend using the PICO (‘Population, Intervention, Comparators, Outcomes’) statement: a useful framework that helps you define your clinical research question as precisely as possible. To place a new product clearly and appropriately within health and care pathways, you will need to define:

  • the population for which the technology will be used, thinking specifically about the characteristics of this population and any variables that could influence outcomes (such as age, ethnicity, gender, severity of disease)
  • how the technology / intervention will be used in clinical practice, along with the current standard of care and any alternative technologies with which a product might be compared
  • the most relevant outcomes to measure to determine the effect (e.g., clinical benefits, safety, resource implications) of a product.

It sounds simple on the surface, but this is notoriously tricky, detail-oriented work. Developers often spend too little time refining these details a priori, and face the consequences down the line when large gaps emerge in their evidence base.

Selecting the best outcomes to measure

Often, developers choose outcomes that provide a quick and simple way of measuring a treatment effect – so-called ‘surrogate’ outcomes. This can be a problem, because surrogate outcomes alone do not represent a direct clinical benefit for the patient or person using care.

Imagine you’re developing an implantable device with embedded AI that can reduce the size of a myocardial infarct – you may well consider measuring infarct size as a primary outcome. Yet, to determine the true benefits to individuals, you need to either measure or model downstream consequences, such as heart failure, quality of life, and survival. This is crucial, since these clinical benefits are often what motivates health and care services to adopt a product.
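To make the framework concrete, the PICO elements of this worked example could be captured in a small structured record. This is a hypothetical sketch in Python – the field names and wording are illustrative only, not a regulatory template:

```python
from dataclasses import dataclass

@dataclass
class PICOStatement:
    """Illustrative structured record of a PICO research question."""
    population: str           # who the technology is for, incl. variables that could influence outcomes
    intervention: str         # how the product will be used in clinical practice
    comparators: list[str]    # current standard of care and alternative technologies
    outcomes: list[str]       # clinical benefits, safety, resource implications

# Hypothetical PICO for the implantable-device example above
pico = PICOStatement(
    population=("Adults with myocardial infarction, accounting for age, "
                "ethnicity, gender, and severity of disease"),
    intervention="Implantable device with embedded AI to reduce infarct size",
    comparators=["Current standard of care"],
    outcomes=["Infarct size (surrogate)", "Heart failure",
              "Quality of life", "Survival"],
)

# Print the statement as a simple checklist
for label, value in vars(pico).items():
    print(f"{label}: {value}")
```

Writing the statement down in this form makes it easy to spot which outcomes are surrogates and whether downstream clinical benefits are being measured at all.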

So, to identify potentially relevant outcomes to measure, developers should review core outcome sets (such as those in the COMET database) to determine whether an agreed, standardised set of outcomes exists for the clinical area their product is positioned for.

Collecting evidence on evolving algorithms

Regulators often need to see that the final version of a fixed algorithm has been fully validated, and ideally that this is the version studied in clinical practice in comparison with standard care – but we know this is challenging and can’t always be the case. A further complexity is that AI algorithms may need to be optimised for different sites, based on their different populations or clinical practice, so different versions may apply in different contexts.

Developers often present evidence from different versions of their product together for regulators and evaluators. This can make it difficult to properly evaluate the safety and performance of the final version of a product.

So do make sure you clearly outline which version of your product is used in which studies, and what the differences are between the versions.

Don’t ignore feasibility

A word of caution: remember that generating the evidence you need requires the right resources.

It can be time-consuming and costly to implement large-scale studies that measure longer-term outcomes, so this needs thorough planning. To support developers, NICE offers the META tool service, which identifies possible gaps in evidence generation plans and offers advice and support on potential next steps to address them. This service can help you to think through what evidence generation is feasible for you, and how to use or generate real-world data when conducting large, long-term trials might not be feasible.


Further information

About the multi-agency advisory service

The MHRA website has a range of resources and guidance on the regulation of medical devices.

The Digital Technology Assessment Criteria (DTAC) helps with assessing suppliers and sets out what is expected of developers for entry to the NHS.

Get details on what CQC registration is, and who needs to register.

The Health Research Authority’s website outlines what approvals are required for health research and how to obtain them.


Rebecca Boffa, Jeanette Kusel, Russell Pearson, Moritz Flockenhaus, Omar Moreea, Carly Wheeler, Toni Gasse, and Clíodhna Ní Ghuidhir, on behalf of the MAAS working group, and in collaboration with Emma Hughes, on behalf of the AI Award team at the Accelerated Access Collaborative.

For more information on the MAAS project, contact Clíodhna Ní Ghuidhir (corresponding author) at
