Clinical and technical evidence

A literature search was carried out for this briefing in accordance with the interim process and methods statement for medtech innovation briefings. This briefing includes the most relevant or best available published evidence relating to the clinical effectiveness of the technology. Further information about how the evidence for this briefing was selected is available on request by contacting mibs@nice.org.uk.

Published evidence

Four studies are summarised in this briefing. There are several abstracts and other studies with mainly smaller patient populations that have not been summarised in this briefing.

Three retrospective cohort studies with 135 (Jacob et al. 2017), 110 (Ungprasert et al. 2017) and 55 people (Maldonado et al. 2014) have been included. One validation study (Bartholmai et al. 2013) was also included.

The clinical evidence and its strengths and limitations are summarised in the overall assessment of the evidence.

Overall assessment of the evidence

A substantial number of studies have examined the use of lung texture analysis in assessing interstitial lung diseases. One of the strengths of the evidence base in the NHS context is that many of the studies have outcomes that are commonly used in NHS practice. Most studies are retrospective cohort studies using scan data from databases, rather than randomised controlled trials or long-term observational studies. A literature review by Jankharia and Angirish (2021) outlines the investigatory and validation studies that have been done in relation to Computer-Aided Lung Informatics for Pathology Evaluation and Rating (CALIPER) technology. The review includes images from the Imbio lung texture analysis software to show how the results are presented and analysed by radiologists. There are also several abstracts that have not yet been published as full articles and that could potentially be useful in assessing the benefit of implementing lung texture analysis within the NHS.

Lung texture analysis is intended to be used in clinical practice as a supplement to visual analysis of CT scan results, to help increase radiologist agreement and identify features that may have been missed by readers. The evidence base consists of studies that compare quantitative CT analysis to visual CT analysis, pulmonary function tests and other imaging techniques. Also, many diagnoses are incorporated under interstitial lung diseases and the evidence base encompasses a significant number of these conditions.

Jacob et al. (2017)

Intervention and comparators

Computer-based CT analysis (CALIPER), compared with visual CT scoring and pulmonary function tests.

Key outcomes

The following CT features were scored visually: ground glass opacity, reticulation, honeycombing, consolidation, gas trapping and traction bronchiectasis. For CALIPER CT analysis, pulmonary vessel volume was also estimated. Pulmonary function measures examined included: forced expiratory volume in 1 second, forced vital capacity, diffusing capacity for carbon monoxide, carbon monoxide transfer coefficient, residual volume, total lung capacity and composite physiologic index.

Linkages between visual and CALIPER scores for shared interstitial CT patterns were strongest for honeycombing (r=0.77) and fibrosis extent (r=0.62), while they were weakest for ground glass opacities (r=0.21). On CALIPER CT analysis, total interstitial lung disease extent showed stronger linkages with forced vital capacity, diffusing capacity for carbon monoxide and composite physiologic index than corresponding visual CT scores. Of all the CT variables scored visually or by CALIPER, pulmonary vessel volume showed the strongest links with pulmonary function tests.

Strengths and limitations

The current accepted gold standard for identifying clinical deterioration in people with interstitial lung disease is the measurement of forced vital capacity. Therefore, CT analysis by visual and computer-based methods was specifically compared with this outcome measure. Every person included in the study was subject to all 3 interventions: CALIPER CT analysis, visual CT analysis and pulmonary function tests. Two radiologists independently evaluated each CT scan; they had 5 and 7 years of thoracic imaging experience. A significant limitation of the study was that not all people had histopathological proof of diagnosis, although this is reflective of the population usually seen in clinical practice.

Ungprasert et al. (2017)

Intervention and comparator

Computer-based CT analysis (CALIPER) supported by 1 radiologist's visual analysis, compared with pulmonary function tests.

Key outcomes

Spearman correlation coefficients were used to test the correlation between baseline CALIPER measurements (low attenuation areas, ground glass opacity, reticular density, honeycombing and total interstitial abnormalities) and pulmonary function test measurements (total lung capacity, diffusing capacity for carbon monoxide, forced expiratory volume in 1 second, forced vital capacity and oxygen saturation), as well as the correlation between changes in these measurements at 1 year.

At baseline, total interstitial abnormalities as measured by CALIPER had a significant negative correlation with total lung capacity, diffusing capacity for carbon monoxide and oxygen saturation. Analysis by sub-type of interstitial abnormality revealed significant negative correlations of ground glass opacity and reticular density with total lung capacity and diffusing capacity for carbon monoxide. A significant negative correlation between ground glass opacity and oxygen saturation was also observed. At 1‑year follow up, the only significant negative correlations were between change in total interstitial abnormalities and changes in total lung capacity and oxygen saturation.
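As an illustration of the statistic used here, the following is a minimal sketch of a Spearman rank correlation calculation. It is not taken from the study: the function names and the 6 paired values are invented for illustration only, and the study's own analysis software is not described in this briefing. A negative result indicates that higher CALIPER abnormality extent tends to go with lower lung function.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Invented example values, NOT study data:
# total interstitial abnormalities (% of lung) and total lung capacity
# (% of predicted) for 6 hypothetical people.
abnormality_percent = [5, 12, 20, 33, 41, 58]
tlc_percent_predicted = [98, 85, 91, 80, 60, 72]

rho = spearman_rho(abnormality_percent, tlc_percent_predicted)
print(f"Spearman rho = {rho:.2f}")
```

Because the calculation works on ranks rather than raw values, it captures any monotonic relationship and is less sensitive to outliers than a Pearson correlation on the raw measurements. The sketch returns only the coefficient; testing for statistical significance, as reported in the study, would need an additional step.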

Strengths and limitations

One radiologist reviewed the high-resolution CT scans at baseline and subsequent timepoints in a blinded fashion to correct any errors made by the CALIPER software. The study was also the first to show evidence of CALIPER utility beyond idiopathic pulmonary fibrosis and hypersensitivity pneumonitis. The follow up at 1 year allows the evaluation of CALIPER use at different timepoints. A major limitation of the study was that there was no standardised protocol for doing high-resolution CTs and pulmonary function tests; these were done at the discretion of the clinicians, which could introduce selection bias. Also, fewer than 50% of people had these assessments at 1 year, resulting in a small number of eligible people for the follow-up analysis.

Maldonado et al. (2014)

Intervention and comparators

Computer-based CT analysis (CALIPER), compared with visual analysis of CT scan by 2 thoracic radiologists and pulmonary function tests.

Key outcomes

The correlation between total lung volume measured by CALIPER and total lung capacity measured by pulmonary function tests was very good for both timepoints 1 (r=0.77) and 2 (r=0.87). Correlation between radiologist 1 and CALIPER for interstitial lung disease scoring was mild to moderate for timepoint 1 (range 0.29 to 0.63) and timepoint 2 (0.24 to 0.64). Correlation between radiologist 2 and CALIPER for interstitial lung disease scoring was mild to moderate for timepoint 1 (range 0.29 to 0.48) and timepoint 2 (0.28 to 0.64).

Strengths and limitations

The study directly investigated the agreement between CALIPER CT analysis and each radiologist's visual analysis separately. There was a mild to moderate correlation between CALIPER and visual observations for both radiologists. This suggests that CALIPER may be a useful tool to help readers agree about CT scan results. An additional strength of the study is that the pulmonary function tests were done within 30 days of the respective CT scans. A limitation of this study is that it needed people to have had 2 high-resolution CTs. In clinical practice this is usually not possible because most people will only have the images from 1 high-resolution CT available. Although the study aimed to quantify correlation between CALIPER and visual CT analysis, this does not consider errors that could have been made using either of the 2 methods.

Bartholmai et al. (2013)

Intervention and comparators

Computer-based CT analysis (CALIPER), compared with visual analysis of CT scan and pulmonary function tests.

Key outcomes

To verify that the automated classification of lung abnormalities by CALIPER matched expert radiologists' descriptions of disease, regional matching to the severity and character of disease determined by the interpreting radiologist was assessed using Spearman's correlation coefficients. Significant correlations were found between quantitative scores and corresponding visual scores for ground glass opacity (r=0.19 to 0.42, p<0.039), honeycombing (r=0.27 to 0.56, p<0.003) and reticular infiltrate (r=0.18 to 0.47, p<0.05).

Significant correlations were noted between CALIPER measurements and physiologic parameters that are accepted as biomarkers for disease severity in interstitial lung disease. Percentage of reticular infiltrates correlated significantly (p<0.001) with changes in 6‑minute walk total distance (r=-0.32), forced vital capacity (r=-0.63), diffusing capacity for carbon monoxide (r=-0.65) and total lung capacity (r=-0.44). Conversely, significant positive correlations (p<0.001) existed between lung classified as normal by CALIPER and physiologic tests: 6‑minute walk total distance (r=0.32), forced vital capacity (r=0.66), diffusing capacity for carbon monoxide (r=0.59) and total lung capacity (r=0.56).

Strengths and limitations

The study was the first to describe the use of CALIPER in interstitial lung diseases and validate it using correlation with visual analysis and pulmonary function measurements. The most significant limitation of the study was that it was done at the same institute where the technology was developed.

Sustainability

There is no evidence on the sustainability of this technology.

Recent and ongoing studies

No ongoing or in-development trials were identified.