NICE's medical technologies advisory committee considered evidence on artificial intelligence (AI) technologies to aid contouring for radiotherapy treatment planning from several sources, including an early value assessment (EVA) report by the external assessment group (EAG), and an overview of that report. Full details are in the project documents for this guidance.
3.1 AI autocontouring with healthcare professional review may be quicker than other contouring methods, which could reduce the time healthcare professionals spend contouring. This could reduce costs and increase efficiency, which may increase capacity, allow more focus on patient-facing tasks and reduce waiting lists. The Royal College of Radiologists clinical oncology census report 2021 reported workforce pressure because of staff shortages and continued effects from the COVID‑19 pandemic. Clinical experts advised that they spend a lot of time creating and reviewing manual contours. They said that healthcare professionals have reported finding it easier to review and edit AI autocontours than to create contours from scratch.
3.2 Clinical experts advised that AI technologies could improve the consistency of contours and compliance with national and international guidelines. Some AI technologies have been trained using guidelines and may be regularly updated when guidelines update. One expert said that AI technologies helped improve how they were defining structures and may produce smoother contours of 3D structures than manual contouring. AI technologies may also produce contours for structures not routinely contoured in standard care. This could improve treatment planning and quality of care.
3.3 The committee considered that there was strong evidence for the potential usefulness of AI technologies to aid contouring in radiotherapy treatment planning. The relevant evidence consisted of 79 studies, including 27 full-text publications and 52 conference abstracts. Because of the large number of publications, the EAG extracted data from 15 prioritised studies, specifically:
8 prospective studies (DLCExpert, Limbus Contour, MIM Contour ProtégéAI and MRCAT Prostate plus Auto-contouring)
4 retrospective studies (INTContour, MVision Segmentation Service, OSAIRIS, RayStation)
1 mixed retrospective and prospective study (AI-Rad Companion Organs RT)
2 conference abstracts (ART-Plan, AutoContour).
The evidence base for each technology and the EAG's rationale for selecting the prioritised studies are outlined in the assessment report in the project documents for this guidance. The level of evidence varied across technologies, but all technologies had some evidence showing potential benefits of AI autocontouring. Overall, the clinical evidence showed that AI autocontours were generally similar to manual contours, with most rated as clinically acceptable and ready to use or needing only minor edits. AI autocontouring was also consistently quicker than manual contouring even when including time for healthcare professional review and edits. The committee concluded that AI autocontouring with healthcare professional review and edits was likely to be clinically equivalent to manual contouring and quicker to do.
3.4 The evidence showed that although AI autocontouring worked well for most organs at risk (OAR) and clinical target volumes, there were some structures that needed major edits or were unusable. These were typically smaller structures such as the cochlea, optic chiasm, optic nerve, penile bulb and pituitary gland. The clinical experts advised that AI autocontouring performed similarly in clinical practice. AI technologies sometimes have difficulties contouring very small or irregularly shaped organs. AI autocontours may also be less accurate for people with atypical anatomy or who have trouble with positioning during imaging. One expert estimated that for head and neck structures, about 90% to 95% of AI autocontours would be accurate. The clinical experts noted that over time, healthcare professionals learn where specific AI technologies produce less accurate contours. This means they can make edits more quickly because they know that certain areas of the contour are likely to need editing.
3.5 Cost-consequence analysis showed that the potential cost saving from using AI autocontouring as an alternative to manual contouring depended on technology costs, time saving and the grade of the healthcare professional doing the contouring:
Technology costs ranged from £4 to £50 per plan and included software (licence and subscription), hardware, data storage, and upgrade and maintenance costs. Several companies advised that healthcare professional training is also included in these costs.
The clinical evidence reported time savings ranging from 3 minutes to 80 minutes, but the EAG advised that these savings did not always include the time for healthcare professional review and edits. The clinical experts estimated time saving of 10 minutes to 30 minutes depending on the amount of editing needed. The committee noted the importance of clinical acceptability of the AI autocontours because this may affect the number of edits needed when reviewed by a healthcare professional.
Experts advised that contouring of OAR may be done by band 6 or 7 radiographers or speciality training doctors if there are not enough radiographers. Contours are usually reviewed by consultant clinical oncologists. But there may be many people involved in the review and sign-off of contours, which may make it difficult to estimate true resource use.
3.6 The simple cost offset calculator showed that as technology costs increased, the time saving needed for AI technologies to be cost saving or cost neutral also increased. The time saving needed also depended on the grade of the healthcare professional doing the manual contouring. For example:
With the lowest technology cost of £4 per plan and a band 7 radiographer (£65 per hour based on PSSRU Unit Costs of Health and Social Care 2021) doing the contouring, time saved would need to be around 4 minutes for the AI technology to be cost neutral.
With the highest technology cost of £50 per plan and a band 7 radiographer doing the contouring, time saved would need to be around 47 minutes for the AI technology to be cost neutral.
The EAG advised that there were several factors in the analysis that may cause a wide variation in results. It noted that the limited cost-effectiveness evidence made it difficult to draw firm conclusions about the potential cost effectiveness of the time saving compared with manual contouring. Estimates of healthcare professional costs may also vary depending on the source used. The committee concluded that although there were uncertainties in the cost analysis, AI technologies were likely to be cost saving or cost neutral but this largely depended on the technology costs and time saving.
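The arithmetic behind the examples in section 3.6 can be sketched as follows. This is only an illustration, not the EAG's calculator: it assumes that cost neutrality means the technology cost per plan equals the value of the staff time saved, and it ignores any other costs or savings. The function name is hypothetical. Rounding up to the next whole minute is also an assumption, although it reproduces the figures of around 4 minutes and around 47 minutes quoted above (the unrounded values are about 3.7 and 46.2 minutes).

```python
import math


def break_even_minutes(technology_cost_per_plan: float, staff_cost_per_hour: float) -> float:
    """Minutes of staff time that must be saved per plan to offset the technology cost.

    Assumption: the only cost is the per-plan technology cost and the only
    saving is staff time, valued at the hourly staff cost.
    """
    if staff_cost_per_hour <= 0:
        raise ValueError("staff_cost_per_hour must be positive")
    return technology_cost_per_plan / staff_cost_per_hour * 60


# Band 7 radiographer cost per hour in GBP, as quoted in section 3.6.
BAND_7_COST_PER_HOUR = 65.0

# Lowest and highest technology costs per plan in GBP, as quoted in section 3.5.
for cost_per_plan in (4.0, 50.0):
    exact = break_even_minutes(cost_per_plan, BAND_7_COST_PER_HOUR)
    # Rounding up to a whole minute is an assumption, not a documented step.
    print(f"GBP {cost_per_plan:.0f} per plan: {exact:.1f} min "
          f"(rounded up: {math.ceil(exact)} min)")
```

Because the hourly staff cost is the divisor, a higher hourly cost lowers the time saving needed to break even for the same technology cost, which is why the result is sensitive to the source used for staff costs.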
3.7 AI technologies would be used to aid contouring for radiotherapy treatment planning within the existing care pathway. AI autocontours must always be reviewed by trained healthcare professionals and edited as needed before being used. There were no adverse events reported in the evidence or by the clinical experts. So, the committee considered that the risk of AI autocontouring with healthcare professional review and edits is likely to be low. People having contouring should be made aware that AI technologies are being used, and the role of healthcare professionals in the radiotherapy treatment planning process should be explained. Some technology developers or companies said they have tools that healthcare professionals can use to report errors in the performance and outputs of their technology. The committee concluded that there should be ongoing reporting of any errors in AI autocontouring and adverse events associated with these technologies.
3.8 NHS hospitals and trusts should have appropriate information governance policies for using AI technologies. AI technologies must also have national and local Digital Technology Assessment Criteria (DTAC) approval before being used in clinical practice.
3.9 In the future, more widespread use of these technologies could result in skill loss in the workforce. Clinical experts advised that healthcare professionals would nearly always do some editing as part of their review of the autocontours. The committee considered that it is important for healthcare professionals to develop and maintain contouring skills so they can adequately review and edit AI autocontours. Some technologies provided training packages for healthcare professionals to develop and practise their skills.
3.10 The committee considered that the compatibility of AI technologies with current systems may vary in each NHS hospital or trust. Most technologies were identified as being available to work with any system and so should have minimal technical implementation issues. The experts advised that AI technologies should be compatible with existing hospital systems if they use the DICOM (Digital Imaging and Communications in Medicine) format.
3.11 AI models can contain algorithmic bias depending on the population used in training, which may not be representative of populations in clinical practice. This may cause bias based on age, disability, sex and geographical location. Experts advised that the female pelvis and breast cancer in men may be underrepresented in some training datasets. Training datasets may also underrepresent children and young people.
3.12 AI technologies used to aid contouring may work best with certain CT or MRI sequences or with the person being in a specific position. Training datasets may not include data on atypical positioning or atypical anatomy, for example, if someone has had a previous surgery. This may affect how well AI autocontouring works for these populations. Healthcare professionals may consider manual contouring to be more appropriate for some people because it may produce more accurate contours in these specific cases. This is not thought to affect patient care or outcomes but may affect time to produce, review and edit contours.
3.13 Risk of bias should be considered as part of a local assessment process when deciding whether to use AI technologies. Technology developers or companies should provide information on training datasets as part of their product information pack, including the demographics of the populations in those datasets. Clinical experts advised that most AI technologies were not retrained on local training sets, although some had the capacity for this. Ideally, in the future, AI technologies could be trained on a representative national population.
3.14 For all technologies, evidence gaps can be related to the population, the intervention, or the main outcomes. The committee concluded that there was enough evidence of potential benefits from the 9 technologies for them to be used in the NHS once they have DTAC approval, while further evidence is generated to address these gaps. Two other technologies also had evidence of potential benefits but these are awaiting CE or UK Conformity Assessed (UKCA) mark approval so cannot be used yet. Important evidence gaps for all technologies are:
Population: the most assessed anatomical sites were the head and neck, and pelvis or prostate. More evidence is needed on how well AI autocontouring works in different anatomical sites. There was no relevant published evidence on using AI autocontouring in specific population groups, such as children and young people or people with atypical anatomy because of surgery. The committee considered that the demographics of datasets used for training an algorithm may differ from populations in clinical settings. It highlighted the need for evidence generation on how AI technologies work in clinical practice in local populations, including information on population demographics such as age, sex, disability and ethnicity.
Intervention: there were 2 technologies (ART-Plan and AutoContour) with no full-text evidence.
Outcomes: only 4 technologies (DLCExpert, INTContour, Limbus Contour and RayStation) had evidence that included dosimetric analysis. The committee highlighted the need for further evidence on dosimetric analysis for the other technologies. For all technologies there was a need for evidence on long-term patient outcomes and data on adverse events, including the impact of AI autocontouring on radiation toxicity. There were 6 technologies (AI-Rad Companion Organs RT, DLCExpert, Limbus Contour, MIM Contour ProtégéAI, MVision Segmentation Service and OSAIRIS) that had evidence on the time saved using AI technologies compared with atlas-based or manual contouring. Time saved was highlighted as a key potential benefit of these technologies. The committee highlighted that evidence generation on time saving should include the time for healthcare professional review and edits after AI technology use. Clinician acceptability and the number of edits needed for AI autocontours may affect overall time saved and so should be accounted for. The committee also noted that the potential time and cost saving is affected by who edits the contour. So, any evidence generated should include healthcare professional grade and the impact of this on time and cost saving.
3.15 In addition to the key outcomes listed in section 1.4, the committee agreed that real-world evidence on using AI autocontouring in clinical practice could provide more valuable information about:
accuracy and acceptability of autocontours across a range of anatomical sites
how well AI technologies work in an NHS population, including people with limited mobility or atypical anatomy
the frequency of software updates and impact of updates on how well AI autocontouring works.
In the longer term, evidence on patient outcomes such as radiation toxicity and survival outcomes could become available.