A Buyer’s Checklist for AI in Health and Care
The NHSX AI Lab has a mission to accelerate the safe adoption of artificial intelligence in health and care. NHSX Artificial Intelligence: How to get it right provides policy context and an overview of the opportunities and challenges presented by AI for our sector.
The unprecedented context of COVID-19 presents an enormous opportunity for the uptake of innovative digital solutions at pace and scale. Health and care organisations are receiving multiple proposals for AI applications that may improve the quality and ease the burden of their work. At the same time, organisations need to be assured that any AI technology they do buy meets the highest standards of safety and efficacy.
Specific characteristics of AI give rise to important considerations for buyers, over and above those related to digital products in general. Therefore, the AI Lab has put together this short reference checklist to assist the decision-making of those responsible for procuring AI solutions in their organisations. It is aimed at procurement teams and chief finance officers, service transformation and commissioning leads, in-house data scientists and analysts, and other chief officers. The checklist is informed by the Code of conduct for data-driven health and care technology released by the Department of Health and Social Care in 2019.
Please get in touch with the Lab at firstname.lastname@example.org if you want to discuss anything in this checklist, or to share your experiences and challenges of buying AI. The Lab will be producing a more comprehensive Buyers’ Guide in the coming months.
1. Is AI the right solution for the type of problem you need to solve?
It is worth thinking carefully about the exact challenges you face, and whether AI is the right solution for those problems. You should work with practitioners and operational staff to define the use case and the availability of data for any proposed AI solution. The guidance Assessing if AI is the right solution for you, from the Government Digital Service and the Office for Artificial Intelligence, may help you think through that process and develop a viable business case.
2. Can this technology be procured through a transparent, fair, competitive process?
Like any technology, an AI solution will need to be procured through a recognised public sector procurement route. Buying AI is no exception to competitive procurement: although the market is still small, most AI companies have established competitors. Guidance from the Office for Artificial Intelligence sets out several specific considerations and innovative approaches to procuring AI.
The proposed technology may already be in use elsewhere in the health and care system. It is worth contacting the NHSX AI Lab, which can connect you with other organisations that may be using the same or a similar AI product.
3. Can this product do what it claims it can?
Data to evaluate the use of AI in health and care is scarce, so you should scrutinise bold claims made by companies about high accuracy predictions or increased efficiencies. The list of questions below is intended to prime the conversation. Several require some technical knowledge of AI; a glossary at the end of this checklist defines the key technical terms.
- What is the “intended use” of the product - i.e. what exactly can it be used for?
- What are the product’s “indications of use” - i.e. what are the exact conditions and situations under which the product can be used?
- What evidence from clinical and other investigations is cited in support of the product’s CE mark and its performance claims?
- What are the product’s performance metrics?
- What is the human input required to work with the product? With that consideration, will the product still achieve the results that are being claimed?
- Is the target prediction for the product correlated to a real-world outcome, and is it plausible that the labelled data used to train the product accurately reflects the ground truth?
- Has the training data been chosen fairly, and is it representative of your patients/service users (e.g. provenance and time of data collection, size of the dataset, correspondence with patient/service user demographics)?
- Has the vendor performed tests on a validation dataset - a hold-out dataset that has not previously been seen by the model? If not, there is a risk that the model has been overfit to its training data.
- What is the failure rate (e.g. the frequency of false positives and false negatives) and what are the consequences of this? Are these errors unpredictable or is there systematic failure which might disadvantage specific groups?
- Has class balance been considered when evaluating performance? Is the training dataset balanced and, if not, has the vendor accounted for that?
- How could you go about evaluating the performance of the AI product in your organisation, and what resourcing would you need for this?
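One practical way to approach the failure-rate and local-evaluation questions above is to run the product on a sample of your own cases whose outcomes are already known, and tabulate false positive and false negative rates separately for each patient group. The Python sketch below assumes you can export each case as a (group, ground truth, prediction) record; the function names, group labels and example records are hypothetical and purely illustrative, not part of any vendor's tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Tally of outcomes for one patient group."""
    tp: int = 0  # true positives
    fp: int = 0  # false positives
    fn: int = 0  # false negatives
    tn: int = 0  # true negatives

    def add(self, truth: bool, predicted: bool) -> None:
        if truth and predicted:
            self.tp += 1
        elif predicted:  # flagged, but the ground truth is negative
            self.fp += 1
        elif truth:  # missed, but the ground truth is positive
            self.fn += 1
        else:
            self.tn += 1

    def false_positive_rate(self) -> float:
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else float("nan")

    def false_negative_rate(self) -> float:
        positives = self.tp + self.fn
        return self.fn / positives if positives else float("nan")


def error_rates_by_group(records):
    """Return {group: (false positive rate, false negative rate)}.

    records: iterable of (group, ground_truth, prediction) tuples,
    where ground_truth and prediction are booleans.
    """
    counts = defaultdict(ConfusionCounts)
    for group, truth, predicted in records:
        counts[group].add(truth, predicted)
    return {
        group: (c.false_positive_rate(), c.false_negative_rate())
        for group, c in counts.items()
    }


# Hypothetical audit records, invented for illustration only.
example_records = [
    ("group_a", True, True), ("group_a", True, False),
    ("group_a", False, False), ("group_a", False, True),
    ("group_b", True, True), ("group_b", True, True),
    ("group_b", False, False), ("group_b", False, False),
]

if __name__ == "__main__":
    for group, (fpr, fnr) in error_rates_by_group(example_records).items():
        print(f"{group}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

With these made-up records, group_a has a false positive rate and a false negative rate of 0.50 each, while group_b has 0.00 for both; a gap like that between groups is the kind of systematic failure worth raising with the vendor. Small samples give noisy rates, so differences should be interpreted with caution.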
4. Are the users of this product primed to use it?
Are the practitioner and operational leads and teams who would use the product sufficiently engaged at this stage to make the solution work? A common reason AI adoption fails is a lack of attention to changes in the end user’s workflow. Widespread practitioner and operational support - best gained at the outset - is critical to successful implementation.
Changing how people perform their day-to-day work is a difficult task that requires persuasive communications, practical training, and on-going support. Are you able to provide these?
5. Does this product meet regulatory standards?
In the UK, applications for NHS approval of health and social care / community care research are made through the Integrated Research Application System (IRAS). For clinical research projects, the relevant Health Research Authority (HRA) Research Ethics Committee for medical devices research must grant ethical approval. In addition, the Medicines and Healthcare products Regulatory Agency (MHRA) should be notified about a clinical investigation.
CE marking is required for all medical devices that are intended for diagnosis, therapy or monitoring of disease. More information on how to comply with legal requirements for medical devices can be found on GOV.UK.
For certain types of AI devices that carry out regulated clinical activity - such as analysis and reporting of X-rays, CT scans or MRI scans - independently of clinicians, registration as a service with the Care Quality Commission (CQC) is required in addition to CE marking.
If a product directly supports the national COVID-19 response, there may be provisions that allow regulatory agencies to operate with greater flexibility. The NHSX AI Lab will simplify the regulatory process by joining up key regulators to create a single gateway for AI products. This work has been funded and will commence in the summer of 2020.
6. What information sharing and data protection protocols would need to be in place to comply with your information governance policy?
The data access and storage implications of implementing an AI technology will likely require putting in place an information sharing agreement and undertaking a data protection impact assessment. Your organisation’s Data Protection Officer will be able to support you with this. Completing these protocols entails mapping the flow of data throughout the process, not only while the AI application is live but also in relation to ongoing storage, including after the technology has been decommissioned. You should also investigate what anonymised data you may need to supply back to the vendor to support their plans for ongoing monitoring and correction of the AI product, as mandated by regulation.
7. What agreements should you put in place to protect any intellectual property generated by your organisation through its use of this AI product?
Because algorithms can iterate and improve as they are fed more data, your organisation’s use of the AI product in question may contribute to its development, depending on the data flows agreed. Intellectual property may be created that increases the product’s value; if this is likely to be the case, you should take advice at the outset to ensure that your organisation secures an acceptable commercial agreement. NHSX’s newly established Centre of Expertise can offer tailored guidance - please contact the Centre via the AI Lab at email@example.com.
8. Do you have the necessary storage and computing requirements?
The data-heavy nature of AI requires extensive storage and computing power, so you should consider these requirements and how you will fulfil them (e.g. cloud services, local servers). The additional costs associated with support from your organisation’s IT personnel should be considered alongside the costs of the product itself. If the new technology will replace an older system, you should take into account the financial implications of dealing with legacy systems, which could either save you money or cost you more.
9. Will your existing systems work effectively alongside the new technology to ensure a clear and reliable workflow?
New technology needs to work alongside existing systems to ensure both safety and efficiency. Back-end integrations are essential for ensuring a clear and reliable workflow. For example, consider the unsatisfactory result of a product that introduces decision support into the clinical workflow of a radiologist, but is integrated only with the imaging storage system and not with the radiology information system. The decision support will not flag findings automatically; instead, the radiologist will have to rely on manually opening a patient’s imaging set to identify anything of concern. A reliable workflow is even more important when multiple organisations are involved in caring for people.
10. Can you manage the maintenance burden of this new technology?
You need to be confident of having skilled and affordable in-house or outsourced technical personnel available to ensure that your product is working as it should far into the future. This includes managing both downtime and future performance:
- If the product fails, can you afford to have downtime, and who is responsible for ensuring that the product is up and running again?
- How will you manage performance drift, upgrades to the model, changes in data format, and future interoperability?
- Do you have the human and financial resources required to ensure the longevity of this software?
- How resilient would your service be if the supplier of your AI product withdrew from the marketplace?
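Managing performance drift in practice usually means re-measuring performance on a regular schedule and comparing it with the level agreed at procurement. The Python sketch below assumes you periodically audit a sample of cases against ground truth and record true positives and false negatives per month; the function names, the audit figures, the baseline sensitivity of 0.90 and the tolerance of 0.05 are all hypothetical placeholders chosen for illustration, not recommended thresholds.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives the product detected."""
    return tp / (tp + fn) if (tp + fn) else math.nan


def months_below_baseline(monthly_counts, baseline, tolerance):
    """Return the months whose sensitivity fell below (baseline - tolerance).

    monthly_counts: dict mapping a month label to (true positives,
    false negatives) from audited cases. Months with no positives are skipped.
    """
    threshold = baseline - tolerance
    flagged = []
    for month in sorted(monthly_counts):
        tp, fn = monthly_counts[month]
        rate = sensitivity(tp, fn)
        if not math.isnan(rate) and rate < threshold:
            flagged.append(month)
    return flagged


# Hypothetical audit results; the figures are invented for illustration.
audit = {
    "2020-07": (45, 5),   # sensitivity 0.90
    "2020-08": (44, 6),   # sensitivity 0.88
    "2020-09": (38, 12),  # sensitivity 0.76
}

if __name__ == "__main__":
    # Placeholder thresholds: agree real values with the vendor and clinical leads.
    print(months_below_baseline(audit, baseline=0.90, tolerance=0.05))
```

A flagged month is a prompt to investigate possible causes (for example, new equipment or a software version change), not proof that the model itself is at fault. Deciding who runs the audit and who acts on a flag is part of settling responsibility for maintenance.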
There are many materials published by public bodies to help guide you through this process. Several key resources are listed below.
The NHSX AI Lab can answer queries as well. You can contact the Lab at firstname.lastname@example.org.
- Artificial Intelligence: How to get it right (NHSX)
- Code of conduct for data-driven health and care technology (DHSC)
- Assessing if AI is the right solution for you (GDS and OAI)
- Medical devices: how to comply with the legal requirements (MHRA)
- Medical device stand-alone software including apps (MHRA)
- Managing medical devices: Guidance for healthcare and social services organisations on managing medical devices in practice (MHRA)
- Using Machine Learning in Diagnostic Services - A report with recommendations from CQC’s regulatory sandbox (CQC)
- Evidence standards framework for digital health technologies (NICE)
Glossary of terms
This glossary defines some of the technical terminology used in this checklist, but is not a comprehensive list of terms used in AI. A good reference for more commonly used terms is a journal article published in Science.
- Class balance - when each target class is represented in roughly equal numbers in the training dataset. Class imbalance is common in medical diagnosis, where, say, only 0.5% of a patient sample may test positive for the target you are looking for (e.g. a disease, tumour, polyp). Most machine learning algorithms work best when the number of samples in each class is roughly equal, as the model then has enough samples of each class to learn to tell one from the other.
- Cloud system provider - a third-party vendor that supplies virtual computing services, including processing power and storage. Common examples include Google Cloud Platform, Amazon Web Services and Microsoft Azure.
- Compute facility - a dedicated computer, housed within virtual or physical servers, that performs AI operations. Cloud system providers offer multiple server types to suit users’ needs. Some models and actions (training or running a model) require more compute power than others.
- Data protection impact assessment - a process to help identify and minimise the data protection risks of a project requiring the processing of personal data.
- False negative - a negative outcome predicted incorrectly (e.g. not identifying a tumour when the ground truth says there is a tumour).
- False positive - a positive outcome predicted incorrectly (e.g. identifying a tumour when the ground truth says there is no tumour).
- Ground truth - the absolute truth measured empirically, against which to assess model performance. This helps to answer how accurate a model is, or how often it is predicting what we want it to predict, correctly and incorrectly. It helps us define false positive/negative and true positive/negative rates.
- Information sharing agreement - a common set of rules to be adopted by organisations sharing personal data with each other. Also known as a data sharing agreement or data sharing protocol.
- Labelled data - a dataset that has labels for the target prediction (e.g. if you’re looking to predict the existence of breast cancer, the labelled dataset may be mammography images, each tagged with whether the resulting diagnosis was positive or not). Labelled datasets are needed to train a machine learning model.
- Model - the output of AI that performs the desired action (e.g. prediction, classification, clustering). Machine learning algorithms build mathematical models based on training data to predict or classify a target without being explicitly programmed to do so.
- Overfit - when a model is trained on a dataset to the point where it can very accurately predict within the training dataset, but has far worse predictive value outside of the training dataset.
- Performance drift - drift occurs over time as machine learning models become outdated. This is generally caused by changes in the target variable because of complex system dependencies including changes in equipment, model, software version and even local protocols. Machine learning models need to be updated to keep up with these changes.
- Target - the variable that the model is aiming to predict.
- Training data - a dataset used to train machine learning models.
- True positive rate - the proportion of actual positive outcomes that a model correctly detects, also known as sensitivity or recall (e.g. identifying a tumour when the ground truth says there is a tumour).
- Validation dataset - a completely new dataset, never used to train or test the model, that can be used to estimate the model’s real-world performance.
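To see why class balance matters when judging performance claims, take the 0.5% prevalence example from the class balance entry above. The Python sketch below uses a hypothetical cohort of 1,000 patients (so 5 positives) and a trivial "model" that always predicts negative; the cohort size and function names are assumptions made for round numbers and readability.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def true_positive_rate(tp: int, fn: int) -> float:
    """Share of actual positives detected (sensitivity)."""
    return tp / (tp + fn)


def true_negative_rate(fp: int, tn: int) -> float:
    """Share of actual negatives correctly ruled out (specificity)."""
    return tn / (tn + fp)


# Hypothetical cohort: 1,000 patients at 0.5% prevalence -> 5 positives.
n_patients = 1000
n_positive = 5

# A trivial "model" that labels every patient negative.
tp, fn = 0, n_positive
fp, tn = 0, n_patients - n_positive

acc = accuracy(tp, fp, fn, tn)       # 0.995: looks excellent
tpr = true_positive_rate(tp, fn)     # 0.0: misses every positive case
tnr = true_negative_rate(fp, tn)     # 1.0
balanced_accuracy = (tpr + tnr) / 2  # 0.5: no better than chance

if __name__ == "__main__":
    print(f"accuracy={acc:.3f}, TPR={tpr:.1f}, balanced accuracy={balanced_accuracy:.1f}")
```

A headline accuracy of 99.5% would therefore say almost nothing about whether the product finds the patients you care about, which is why it is worth asking vendors for true positive and false positive rates rather than accuracy alone. Balanced accuracy, the mean of the true positive and true negative rates, is one commonly used metric that is less misleading under class imbalance; it is shown here as an illustration, not as a regulatory requirement.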