3 Committee discussion

The diagnostics advisory committee considered evidence on software with artificial intelligence (AI)‑derived algorithms for detecting and measuring lung nodules and other abnormalities in chest X‑ray images from several sources, including an external assessment report and an overview of that report. Full details are in the project documents for this guidance.

Unmet need

There is an unmet need for faster chest X‑ray reports

3.1 There is an unmet need for quicker reporting of chest X‑rays referred from primary care. Sometimes it takes a long time for chest X‑ray reports to be returned to a GP, which can delay the CT scan, diagnosis and treatment of lung cancer. Factors contributing to this delay include a backlog of chest X‑rays that need review and insufficient radiologist and reporting radiographer capacity. A clinical expert explained that in some areas, lung cancer specialists will not accept a referral for a CT scan unless the chest X‑ray has been done and reported. Software that provide triaging of images could prioritise images with abnormal features that suggest lung cancer for urgent review, which could result in faster referral for a CT scan, diagnosis and treatment if needed.

Anxiety while waiting for final diagnosis

3.2 A patient expert said that people who are told they have a lung nodule or other abnormality may experience anxiety, especially if the chest X‑ray was done without any expectation of finding nodules or abnormalities suggesting cancer. Having more information as soon as possible is important and reduces anxiety for people with suspected lung cancer and their families. The patient expert explained that most people are happy to have all tests done quickly, because immediate results would turn a passive period of waiting into something more proactive and less uncertain. According to the patient expert, people would be happy to have AI‑derived software used as part of the diagnostic pathway as long as there was evidence to show that it is accurate.

Anxiety around use of AI-derived software

3.3 A patient expert said that some people may not trust AI‑derived software. The committee highlighted that trust in AI‑derived software for patients and healthcare professionals was important for its efficient use. This would require standardisation of technologies and further research in the setting of interest.

Software capability

Different software have different capabilities

3.4 AI‑derived software have different capabilities. Some technologies only provide computer-aided detection (CADe) or computer-aided diagnosis (CADx), which can detect or diagnose an abnormality on the X‑ray, whereas others provide both CADe or CADx and computer-assisted triage (CAST; see section 2.4). The committee considered that software that can prioritise the review of images with abnormalities suggesting lung cancer might have the greatest benefit, because this could reduce time to CT scan, diagnosis and treatment (see section 3.1). It heard that AI‑derived software for chest X‑rays cannot compare a previous chest X‑ray with a new one. Such a comparison would be valuable because it could help determine whether a lung abnormality is of any clinical concern.

Clinical effectiveness

Technologies with no published evidence

3.5 The committee considered the summarised evidence. The external assessment group's (EAG's) review found no relevant published evidence for the population referred from primary care. The committee acknowledged the EAG's expanded review criteria, which included studies in populations where the referral criteria were unclear, and the addendum, which included studies comparing X‑ray review by AI‑derived software alone with review by a radiology specialist alone. It noted that there was no evidence for 10 of the 14 technologies: Auto Lung Nodule Detection, ChestLink, ChestView, Chest X‑ray, ClearRead Xray Detect, InferRead DR Chest, Milvue Suite, qXR, SenseCare-Chest DR PRO and VUNO Med‑Chest X‑Ray. The committee recommended more research on these technologies (see section 1.3).

Generalisability of evidence to clinical practice

3.6 In the original review, there were no studies that looked at accuracy in the population of interest, that is, people referred from primary care. There were 6 studies that compared the accuracy of X‑ray review by a radiology specialist alongside AI‑derived software with review by a radiology specialist alone, but it was unclear where the populations had been referred from. In an addendum, there were 5 studies that compared the accuracy of X‑ray review by AI‑derived software alone with review by a radiology specialist alone. One of these studies was in people referred from primary care; the others had mixed populations or unclear referral routes. The primary care referral population is likely to differ from populations in other settings, such as inpatients and people presenting to the emergency department. Those populations could have a higher prevalence of disease and present at a more advanced stage, making it easier to detect abnormalities that indicate cancer. Most chest X‑rays requested by GPs are done to rule out lung cancer, because it is unlikely that the person has lung cancer. So, AI‑derived software trained and evaluated in populations from other or mixed settings are likely to perform differently in people referred from primary care. The committee agreed that the accuracy data was unlikely to be generalisable to a primary care population. It also acknowledged that only 2 UK studies were identified and that diagnostic pathways may differ between countries. Both UK studies assessed the diagnostic accuracy of red dot (Behold.ai), but the referral populations were unclear.

Study design

3.7 The committee expressed concern that all 6 studies included in the original assessment and all 5 studies identified in the addendum were retrospective. Many studies also had small populations. The committee was also concerned that most of the studies had unclear referral routes, with only 1 study done in a population that had been referred for chest X‑ray from primary care. Also, some studies used enhanced data sets or specific exclusion criteria, for example excluding low-quality X‑rays, X‑rays with lung nodules below a certain size, or lateral-view X‑rays, which limited their generalisability to real-world use. Because the design of the studies did not reflect how AI‑derived software may be used in clinical practice in the NHS, the outcomes seen may differ. The committee considered that the studies comparing AI‑derived software alone with radiology specialist review alone do not reflect how the software would be used in practice: regulation requires AI software to be used as a tool alongside clinician reporting unless it is approved for autonomous diagnosis, and none of the AI‑derived software included in this assessment are approved for autonomous diagnosis. The committee concluded that there is a need for prospective studies, in a population referred from primary care, that reflect how the software would be used in clinical practice in the NHS.

Accuracy for detecting lung cancer and nodules

3.8 Only 2 studies, both assessing red dot (Behold.ai) in the UK in populations with unclear referral routes, reported accuracy for detecting lung cancer. Of these, 1 study found that the AI‑derived software had statistically significantly higher sensitivity than X‑ray review without the software. It also found that the AI‑derived software had lower specificity, but the difference was not statistically significant. The other 5 studies from the original review and the 5 studies from the addendum review that reported accuracy for detecting nodules did not report any statistically significant differences in accuracy. In practice, lower specificity would mean more false-positive results. That is, more people who do not have lung cancer would go on to have CT scans, which are associated with anxiety and costs to the NHS. Higher sensitivity for detecting lung cancer would mean more true-positive cases being detected and referred for CT scans, which could lead to treatment at an earlier disease stage and improved outcomes. In contrast, lower sensitivity could mean missing the opportunity to detect cancer early; cancer would then be identified at a later, more advanced stage, which may be associated with worse outcomes and more costly treatment. No studies reported technical failure rates or how many images were rejected by the software because they were of too poor quality to interpret. Because none of the studies looked at more than 1 software, a direct comparison between different software was not possible. The committee concluded that further research was needed on how using AI‑derived software alongside clinician review of chest X‑rays affects the accuracy of detecting lung cancer, and on the technical failure and rejection rates of the software.
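To make the trade-offs above concrete, the standard definitions can be written in terms of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP). The worked figures below are illustrative assumptions, not values from the included studies:

\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}
\]

For example, in a hypothetical batch of 1,000 chest X‑rays with a 2% prevalence of lung cancer (20 cases, 980 non-cases), a fall in specificity from 95% to 90% would increase false positives from about 49 (0.05 × 980) to 98 (0.10 × 980), roughly doubling the number of people without cancer referred for a CT scan.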

Standardisation of reporting

3.9 Clinical experts noted that junior registrars must show that they can achieve a minimum standard before they are allowed to report chest X‑rays, and that there is a need for similar standardised benchmarking for AI‑derived software. This benchmarking could be used to compare the different software and identify those that reach a specified level of accuracy when used alongside clinician review. This would help to protect patients and establish trust in the software. No benchmark standard is currently available. The committee agreed that it would be beneficial for a specialised working group to develop standardisation of AI‑derived software.

Time to read and report a chest X‑ray

3.10 The EAG's review included 2 studies that looked at the time to read and report a chest X‑ray. The comparisons in the studies were done at least partly in laboratory-like conditions rather than in routine clinical practice, so the results may not be generalisable to current practice in the NHS. No studies suggested that reading and reporting were faster with AI‑derived software than without. Because AI‑derived software would be used in addition to review by a radiologist or reporting radiographer, it may not reduce the time to read and report chest X‑rays. The time may also depend on how well the software integrate into the radiologists' workflow within the picture archiving and communication system (PACS). AI‑derived software may support less experienced or trainee radiologists and reporting radiographers, because it may identify nodules or abnormalities that a less experienced reader would not pick up, which could help to improve their skills early on. This could help standardise the quality of reading and reporting of chest X‑rays across radiology specialists with differing levels of experience. The committee was uncertain whether using AI‑derived software would speed up reading and reporting a chest X‑ray in a UK clinical setting. Further evidence should be collected on the impact of AI‑derived software on the time to read and report chest X‑rays, and on whether this differs between readers with different levels of experience.

Triaging of images

3.11 There were no studies that reported time to CT referral or time to diagnosis. Some software can triage chest X‑rays, identifying images with features that suggest lung cancer, which can be prioritised for review, and identifying images that are likely to be normal, which could potentially be reviewed faster. This could improve workflow by focusing radiology resources on the more urgent chest X‑rays, potentially speeding up time to CT referral, diagnosis and treatment when needed. It could also enable a same-day CT scan pathway in the population referred from primary care. However, the committee noted a possible disadvantage of using AI‑derived software to identify normal chest X‑rays with high confidence: the clinician may have a preconceived idea about the image and may be less likely to notice abnormalities. The committee considered studies in the addendum that reported the negative predictive value of AI‑derived software alone compared with clinician review alone for identifying normal X‑rays with high confidence. In the 1 study that reported a comparison of negative predictive values, a higher value was reported for AI‑derived software alone (0.97) than for clinician review alone (0.94). The higher the negative predictive value, the better the AI‑derived software are at identifying X‑rays that have no lung abnormalities. The committee concluded that some software could meet the unmet need for faster chest X‑ray reports, leading to faster CT referral, diagnosis and treatment if needed. It also concluded that ruling out normal X‑rays with high confidence is the most promising area for this software in clinical practice. However, it agreed that more prospective studies are needed on how using AI‑derived software to triage chest X‑rays from people referred from primary care affects time to CT referral and time to diagnosis. An ongoing UK-based randomised controlled trial (LungIMPACT) on the impact of AI‑derived software for triaging to chest CT in people referred by their GP for chest X‑rays may provide useful evidence; its key outcomes are time to CT scan and time to lung cancer diagnosis.
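For context, the negative predictive value (NPV) quoted above is defined over the X‑rays classified as normal. The reading below assumes the reported values apply to the population in which the software would be used, since NPV varies with disease prevalence:

\[
\text{NPV} = \frac{TN}{TN + FN}
\]

Read this way, an NPV of 0.97 means that about 3 in every 100 X‑rays the software calls normal with high confidence would in fact contain an abnormality, compared with about 6 in 100 at the 0.94 reported for clinician review alone.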

Populations that could be particularly impacted by the technologies

3.12 The committee considered groups of people who could particularly benefit from the software. It recognised that detecting lung nodules and other abnormalities can be difficult in people with underlying lung conditions such as asthma or chronic obstructive pulmonary disease (COPD), in people whose family background means they may be at higher risk of lung cancer, and in younger women who do not smoke. If using the software helps to improve lung cancer detection, it would particularly benefit these groups. The committee also considered groups that may be disadvantaged because high-quality X‑rays are difficult to produce in certain conditions, such as scoliosis and morbid obesity; poor-quality images may be rejected because the software cannot interpret them. How the AI‑derived software has been trained, and its ability to adapt, will determine how well it can read images of varying quality. The committee recommended further research to ensure that the software works in these groups and that they would not be disadvantaged.

Cost effectiveness

Conceptual model structure

3.13 The EAG developed a conceptual decision analytic model to inform a potential future full cost-effectiveness evaluation of AI‑derived software for analysing chest X‑rays to identify suspected lung cancer. The model structure was based on the chest X‑ray clinical pathway. The committee agreed that the conceptual model was a good basic framework. But it highlighted some issues that would need clarifying, for example, whether people with lung cancer but no abnormalities identified on a chest X‑ray would be picked up at a later point, such as presentation at an emergency department. Triaging and prioritising images with abnormal features that suggest lung cancer, and the impact of this on time to CT scan, diagnosis and treatment, would need to be captured in a future model. The committee agreed that a linked-evidence approach to economic modelling would be acceptable, that is, using diagnostic accuracy and time to diagnosis data linked to long-term outcome data from separate studies.

Software costs

3.14 The EAG considered the costs of introducing AI‑derived software alongside radiology specialist review of chest X‑rays by developing a simple budget impact analysis. The analysis considered one-off set-up costs, an annual subscription fee based on a volume of 16,945 images, the total cost per year and the cost over the first 5 years. Because the literature reviews did not provide any evidence of changes in resource use resulting from AI‑derived software, these were not considered in the budget impact calculations. Test costs varied between companies, but the EAG cautioned against direct comparison because the AI‑derived software have varying capabilities and some may be used at different points early in the diagnostic pathway.
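The structure of such a budget impact calculation can be sketched as follows. The symbols are illustrative placeholders, not figures from the assessment, and the sketch assumes the annual subscription is priced per image:

\[
C_{\text{year}} = c_{\text{image}} \times 16{,}945, \qquad C_{\text{5yr}} = C_{\text{setup}} + 5 \times C_{\text{year}}
\]

where \(C_{\text{setup}}\) is the one-off set-up cost, \(c_{\text{image}}\) is the per-image subscription rate and 16,945 is the annual image volume used by the EAG.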

Conclusions

Potential benefits and risks

3.15 The potential benefits associated with using AI‑derived software could include:

  • A reduction in the time radiologists or diagnostic radiographers spend reviewing and reporting chest X‑rays, which could release staff resources and help address the high demand for image reading and the workforce capacity limitations.

  • Improved workflow, by identifying normal X‑rays with high confidence and prioritising abnormal X‑rays for immediate review, resulting in:

    • A same-day CT pathway and reduced time from chest X‑ray to diagnosis and treatment, which could improve patient outcomes and quality of life, and save resources in the NHS.

• Increased confidence in reporting of normal X‑rays and reduced time to return a chest X‑ray report to a GP.

  • Higher sensitivity to detect cancerous nodules and other abnormalities that suggest cancer, which could result in more cancers being identified and treated at an earlier stage, improving patient outcomes and quality of life, and saving resources in the NHS.

The potential risks associated with using AI‑derived software could include:

  • The cost of the AI‑derived software, which includes a one-off set-up cost, may not be offset by cost and resource savings later in the pathway.

  • Lower specificity to detect cancerous nodules and other abnormalities that suggest cancer could result in more people without cancer having CT scans, which would have cost and disutility implications.

  • Lower sensitivity to detect cancerous nodules and other abnormalities that suggest cancer could result in lung cancer being missed and identified at a more advanced disease stage, which could lead to more costly treatment and worse patient outcomes.

• Equality concerns in cases where a high-quality X‑ray is difficult to obtain, such as in people with scoliosis or morbid obesity.

  • Software may not reduce, or may increase, the time radiologists or diagnostic radiographers spend reviewing and reporting chest X‑rays, and so the workload and the time to return a chest X‑ray report to a GP would not be reduced.

The clinical effectiveness is unknown

3.16 The committee recalled that there was no evidence on the accuracy of AI‑derived software to detect suspected lung cancer in a population referred from primary care, and no evidence on technical failure rates. It noted that the evidence summarised came from populations with unclear referral criteria, so the results may not be generalisable to a primary care population. The committee concluded that the accuracy of AI‑derived software to identify suspected lung cancer on chest X‑rays from people referred from primary care is uncertain. Therefore, the committee could not determine whether AI‑derived software is clinically effective, and recommended that further evidence be generated on diagnostic accuracy and technical failure rates.

The cost impact is unknown

3.17 The diagnostic accuracy data on AI‑derived software to detect suspected lung cancer in a population referred from primary care is uncertain. So, it is also unknown how these technologies would affect the number of people referred for chest CT scans and the number of lung cancer cases that would be identified. Therefore, the committee was unable to determine whether the benefits would outweigh the risks, and was concerned that AI‑derived software may not be cost effective. It recommended further research on the impact of the software on clinical decision making, on the number of people referred for chest CT scans, and on how AI‑derived software affects healthcare costs and resource use.

Addressing unmet need

3.18 The unmet need concerns the speed with which chest X‑ray reports are returned to the referring GP, particularly for X‑rays that show abnormalities suggesting lung cancer, because a delay can affect time to diagnosis, treatment and patient outcomes. There was no evidence on the impact of AI‑derived software on time to CT scan and time to diagnosis in a population referred from primary care. So, the committee was uncertain whether any of the AI‑derived software could meet the unmet need, although it noted that some of the software probably could reduce delays. The committee recommended further research on the impact of AI‑derived software on review and reporting time, and on time to CT referral and diagnosis.

Use only in research

3.19 The committee recalled that the clinical effectiveness and cost impact of AI‑derived software could not be determined, and that it was uncertain whether the software would address the unmet need. The committee also could not determine whether the benefits would outweigh the risks if the software was adopted for use in the NHS alongside evidence generation. It was concerned that AI‑derived software for chest X‑rays could be cost incurring and may not improve clinical outcomes, and so should not currently be used to inform clinical care in the NHS. The committee therefore decided that AI‑derived software should only be used in research, to resolve some of the uncertainty and allow reassessment in the future when further data is available. The committee was aware that some centres are already using AI‑derived software and concluded that these centres may continue, but only under appropriate research governance. AI‑derived software should only be used alongside clinician review.