Navigating the complexities in proving the real impact of AI

In recent months, a wide array of AI technologies has come on to the market. These products are reaching impressive milestones, such as accurately identifying pathology in medical images. To support the safe development and deployment of these technologies into frontline care pathways, the NHS AI Lab has been working extensively with the wider AI and healthcare community to understand the gaps and develop robust pathways for implementation.

In this blog, we outline some of the gaps and challenges that innovators face and the ways in which the NHS AI Lab is helping.

“When incorporating a new technology or product into a screening programme it is essential to have evidence that the programme will remain safe and highly effective” - Prof Anne Mackie, Director of Screening for Public Health England.

Evidence generation

The evidence required to show the value of an AI product goes beyond a simple technical validation study, such as proving that a computer can accurately identify cancer in an image. It includes showing that the addition of AI makes for better care at a reasonable cost, and that it fits seamlessly into the care pathway. Through the NHS AI in Health and Care Award, the NHS AI Lab is working with technology companies and the NHS to set a roadmap for generating the evidence needed to show the impact and cost-effectiveness of AI products in given settings. Read more about evaluating AI in health and care.

Representative data

When a company uses machine learning to train and evaluate a software product to read images, they will use large numbers of medical images with linked clinical outcomes data. If these images come from a population that is not representative of the NHS patient population (for example a sample set of images collected in another country) there is a risk that the AI may not perform as well in real life as it did when it was evaluated on that different population.
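One way to picture this risk is to compare the same performance measure across the two populations. The snippet below is a minimal sketch, not a description of any evaluation the NHS AI Lab runs: the function name, the group labels and the handful of records are all made up for illustration, and sensitivity is used only as an example measure.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return {group: sensitivity} from (group, has_disease, ai_flagged) records.

    Sensitivity is the share of people with the disease whom the AI flagged.
    Groups with no disease-positive records are left out of the result.
    """
    true_positives = defaultdict(int)
    disease_positives = defaultdict(int)
    for group, has_disease, ai_flagged in records:
        if has_disease:
            disease_positives[group] += 1
            if ai_flagged:
                true_positives[group] += 1
    return {
        group: true_positives[group] / disease_positives[group]
        for group in disease_positives
    }


# Made-up records purely for illustration; this is not real study data.
records = [
    ("development_population", True, True),
    ("development_population", True, True),
    ("development_population", True, True),
    ("development_population", True, False),
    ("development_population", False, False),
    ("nhs_population", True, True),
    ("nhs_population", True, False),
    ("nhs_population", True, True),
    ("nhs_population", True, False),
    ("nhs_population", False, True),
]

result = sensitivity_by_group(records)
for group, value in result.items():
    print(f"{group}: sensitivity = {value:.2f}")
```

In this toy data the product finds three of four cases in the population it was developed on but only two of four in the second population, which is the kind of gap that evaluation on representative data is meant to expose.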

At present, CE marking (which indicates conformity with health and safety standards for products sold within the European Economic Area (EEA)) has no specific guidance for AI, although broad requirements note that medical devices must be evidenced with data representative of the entire intended purpose and patient population. As such, NHS commissioners are seeking independent validation data to ensure that a product will work as expected.

Independent validation

There is strong evidence that an independent evaluator will be more rigorous than the product developer, whether academic or commercial. For example, in a systematic review of trials on decision support, 75% of those done by the developer showed an impact on clinical practice, compared with only 28% of trials performed by independent evaluators.

Independent validation could provide additional information which is essential for planning and commissioning the entire service. For example, if a product reads images to detect cancers, commissioners will want to know what impact this will have on the rest of their service. Will it detect many additional cancers which require investigation? How many and of what type? Will the commissioner need to pay for additional pathology or surgical capacity to deal with the increase in findings? Will other pathologies go unreported? How will services address those “missed” cases? Will it have any effect on inequalities?

To support commissioners in navigating the questions that arise from the use of AI, the NHS AI Lab developed A Buyer’s Guide to AI in Health and Care, which sets out what needs to be considered to make well-informed decisions about commissioning AI products.

Our pilot: why we are using a national screening programme 

The NHS AI Lab is also undertaking, in partnership with Public Health England (PHE), a pilot exercise on the technical process of validation: testing an AI model to determine whether it works as described in a population representative of the UK. This exercise aims to inform commissioners and clinicians about how an AI product functions in the UK population, whether it will work within the clinical pathway, and what true impact it may have on services and capacity.

The NHS and PHE have an established history of seeking robust evidence and data on the efficacy of new technologies. Expert advisory groups will be asked to ensure the key questions are raised during the pilot and to help answer them.

These questions require broader evaluation. Evaluation for some of these technologies in a clinical setting is being funded through the AI Award.

The UK screening programmes are governed by ministerial policy based on scientific advice from an independent scientific advisory committee, the UK National Screening Committee (UK NSC), and any major change to the screening pathway requires ministerial approval based on UK NSC advice. This approval is for a class of technology, such as AI as a mammogram reader, and not for a specific provider's product, such as company X’s product.

After the UK NSC approves a class of technology for use in the screening programme, standards and specifications for the class of test are drawn up. Individual technology providers need to show that their product meets these requirements before it is recommended to the NHS for use. For these complex technologies, demonstrating compliance means meeting several different standards (such as specificity, acceptability, sensitivity, explainability, and integration with existing systems). A CE marking is not sufficient: NHS providers need assurance that a technology will meet their needs and comply with UK NSC recommendations.
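Two of the standards named above, sensitivity and specificity, can be calculated directly from validation counts. The snippet below is a minimal sketch of how a product's results might be compared with a class-level specification. The counts, the threshold values and the names `ConfusionCounts` and `meets_standard` are assumptions made for illustration; they are not UK NSC figures or an NHS tool.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Validation results for one product, as simple counts."""
    true_positives: int   # disease present, AI flagged it
    false_negatives: int  # disease present, AI missed it
    true_negatives: int   # disease absent, AI did not flag
    false_positives: int  # disease absent, AI flagged anyway


def sensitivity(counts: ConfusionCounts) -> float:
    """Share of people with the disease whom the AI correctly flagged."""
    return counts.true_positives / (counts.true_positives + counts.false_negatives)


def specificity(counts: ConfusionCounts) -> float:
    """Share of people without the disease whom the AI correctly cleared."""
    return counts.true_negatives / (counts.true_negatives + counts.false_positives)


def meets_standard(counts, min_sensitivity, min_specificity):
    """Report, per measure, whether the product reaches the required minimum."""
    return {
        "sensitivity": sensitivity(counts) >= min_sensitivity,
        "specificity": specificity(counts) >= min_specificity,
    }


# Hypothetical counts and thresholds, chosen only to show the calculation.
counts = ConfusionCounts(
    true_positives=90, false_negatives=10,
    true_negatives=850, false_positives=50,
)
checks = meets_standard(counts, min_sensitivity=0.85, min_specificity=0.95)
print(checks)
```

With these invented numbers the product clears the sensitivity threshold (90 of 100 cases found) but not the specificity one (850 of 900 correctly cleared), which shows why a single headline accuracy figure would not be enough. Standards such as acceptability and explainability need other kinds of evidence and are not captured by a calculation like this.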

By exploring guidance on technical validation processes, we will enable the NHS to integrate technology from diverse suppliers more rapidly once the UK NSC has approved the class of technology.

The NHS AI Lab is exploring the processes and methods as part of this study and aims to publish updates in the coming months. To keep up to date with the AI Lab’s imaging work, join our fast-growing community of practice on the NHS AI Virtual Hub.