3 Evidence

The diagnostics advisory committee considered evidence on PredictSURE IBD and IBDX to guide treatment of Crohn's disease from several sources. Full details of all the evidence are in the committee papers.

Clinical effectiveness

3.1 The external assessment group (EAG) systematically reviewed evidence to evaluate the prognostic ability and clinical effectiveness of the PredictSURE IBD and IBDX tests to predict severe disease and guide treatment in people with Crohn's disease who:

have newly or recently diagnosed disease
have moderate to severe active disease
are currently not receiving any concomitant steroids, immunomodulators or biological treatments
would not have top-down treatment with current standard care in the NHS.

3.2 The EAG identified 8 primary studies (reported in 12 publications) that met the selection criteria for the literature review (see page 18 of the diagnostics assessment report for details of the selection criteria). The studies were all observational. Of the included studies, 7 reported on the diagnostic performance of IBDX. In these studies, a higher number of positive biomarkers was associated with poorer prognosis. Two of the IBDX studies (Wolfel et al. 2017 and Rieder et al. 2010c) prospectively assessed the prognostic ability of IBDX for predicting complications (fistulas and stenoses) and Crohn's disease-related surgery. A third prospective study reported a correlation of IBDX with either a history of complication or surgery at baseline, or their occurrence during follow up (Rieder et al. 2010b). The other 4 studies were cross sectional, reporting a correlation between the number of positive IBDX biomarkers and outcomes associated with severe disease (a presence or history of complications or surgery) at the time of testing. Only 1 study (Biasci et al. 2019) reported on the predictive ability of PredictSURE IBD to classify people into either a high or low risk of severe Crohn's disease, as defined by the study investigators.

3.3 Of the studies identified for IBDX, 3 were carried out in Germany and 1 study each in Canada, France, and the US. One study, published as an abstract, had an unclear location. The study on PredictSURE IBD was carried out across 4 centres in the UK.

Study quality

3.4 The EAG assessed the risk of bias in the included studies using the quality in prognosis studies (QUIPS) tool. All the studies for IBDX were considered to be at a moderate or unclear risk of bias in the measurement of confounding factors domain. Three studies were considered to be at a moderate risk of bias in the participation domain. The study identified for PredictSURE IBD was assessed to be at a low or unclear risk of bias.

Prognostic accuracy

3.5 None of the studies for IBDX was done only in people with newly diagnosed Crohn's disease. The studies included people with an established diagnosis and people with a recent diagnosis of Crohn's disease. Median duration of disease at the time of testing ranged from 10.6 months (interquartile range [IQR] 1.7 to 52.3) to 9.4 years (IQR 1 to 44). One prognostic study by Rieder et al. (2010c) assessed the ability of IBDX to predict developing a complication or needing surgery in people with no prior complication or surgery at baseline (n=76). People who tested positive for 2 or more out of 6 IBDX markers had a significantly higher risk of complications (hazard ratio [HR] 2.5; 95% confidence interval [CI] 1.03 to 6.1; p=0.043), or surgery (HR 3.6; 95% CI 1.2 to 11.0; p=0.023) during the median follow up of 53.7 months, than people who tested positive for 0 or 1 markers. A prognostic study by Wolfel et al. (2017), reported as a conference abstract, showed that the number of positive IBDX markers did not predict a shorter time to repeat intestinal surgery (n=118; median follow up of 100 months).

3.6 Rieder et al. (2010b) reported that people who had surgery (before or during follow up) had a higher number of positive IBDX markers (median 2.0 [range 1.0 to 3.0]) than those who did not (median 1.0 [range 0.0 to 2.0]; odds ratio [OR] 1.5 [95% CI 1.3 to 1.8]; p<0.001). Similarly, people with a complication had a higher number of positive IBDX markers (median 2.0 [range 1.0 to 3.0]) than those who did not (median 0.0 [range 0.0 to 2.0]; OR 1.5 [95% CI 1.3 to 1.9]; p<0.001). The remaining 4 IBDX studies reported the correlation between IBDX markers and disease phenotype at the time of testing. None of the studies for IBDX estimated sensitivity or specificity.

3.7 Biasci et al. (2019) reported on the prognostic ability of PredictSURE IBD in adults with newly diagnosed Crohn's disease who were not having concomitant treatment. The study included 2 training cohorts (66 people in the biomarker discovery cohort and 39 people in the whole blood classifier cohort) and 1 validation cohort (n=66). The validation cohort and the whole blood classifier cohort were considered relevant to this assessment. In the validation cohort, people categorised as high risk (n=27; 40.9%) had a statistically significantly higher risk of at least 1 treatment escalation than those categorised as low risk (n=39; 59.1%), with a HR of 2.65 (95% CI 1.32 to 5.34; p=0.006). Median duration of follow up was 1.6 years (IQR 1.0 to 3.7) in the high-risk group and 2.4 years (IQR 1.8 to 3.8) in the low-risk group. Sensitivity and specificity for predicting the need for 2 or more escalations in the first 12 months were 77.8% and 70.6% respectively, and within 18 months 72.7% and 73.2%. Negative predictive value for predicting multiple escalations in the first 18 months was 90.9%. Positive predictive value calculated by the EAG was 42.1%.
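
The relationship between the 18-month accuracy figures in section 3.7 can be illustrated with a short sketch. The 2x2 counts used here are hypothetical: they are not reported in this section and were chosen only because they reproduce the quoted percentages. The function name and structure are also illustrative.

```python
def predictive_values(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and NPV as percentages (1 decimal place)."""
    def pct(numerator, denominator):
        return round(100 * numerator / denominator, 1)

    return {
        "sensitivity": pct(tp, tp + fn),  # true positives among people who escalated
        "specificity": pct(tn, tn + fp),  # true negatives among people who did not
        "ppv": pct(tp, tp + fp),          # escalations among people classed high risk
        "npv": pct(tn, tn + fn),          # non-escalations among people classed low risk
    }


# Hypothetical counts (an assumption, not study data): one set of whole numbers
# that is consistent with the 18-month percentages quoted in section 3.7.
values = predictive_values(tp=8, fn=3, tn=30, fp=11)
print(values)
```

Under these assumed counts, a moderate sensitivity and specificity coexist with a high negative predictive value and a much lower positive predictive value, because relatively few people need multiple escalations.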

Comparative evidence

3.8 A sub-study by Lyons (2020), based on the same cohort as Biasci et al. (2019) and published as an abstract, compared the ability of PredictSURE IBD and IBDX to predict the need for multiple treatment escalations in 74 people with Crohn's disease at Addenbrooke's Hospital, Cambridge. Everyone had active disease at enrolment, and all had accelerated step-up treatment. The author concluded that there was no significant difference between the group who tested positive for at least 1 IBDX marker and those who tested positive for 2 or more IBDX markers, in terms of time to, or frequency of, treatment escalation. In comparison, when the cohort was stratified by PredictSURE IBD, people classed as high risk had a significantly shorter time to treatment escalation than people classed as low risk (p=0.001).

Clinical utility

3.9 No evidence was identified on how the tests affect the decision in clinical practice to offer a top-down strategy to people at high risk of severe disease. There was also no evidence on how the tests affect the clinical outcomes of people with severe Crohn's disease.

Cost effectiveness

Systematic review of cost-effectiveness evidence

3.10 The EAG searched for studies on the cost effectiveness of PredictSURE IBD and IBDX in Crohn's disease and economic evaluations of treatments for people with newly diagnosed and moderate to severe Crohn's disease. It did not identify any economic studies for PredictSURE IBD and IBDX, but it did find 11 evaluations relevant to treatment options in Crohn's disease.

3.11 One study by Marchetti et al. (2013) specifically compared the cost effectiveness of top-down (step 1: infliximab plus azathioprine, step 2: additional infliximab plus azathioprine, step 3: methylprednisolone plus azathioprine) and step-up (step 1: methylprednisolone, step 2: methylprednisolone plus azathioprine, step 3: infliximab plus azathioprine) approaches in Italy. The authors concluded that the top-down strategy was more effective and less costly than the step-up strategy. The treatment strategies modelled in the study by Marchetti et al. are not representative of UK NHS practice. The health economics report for NICE's guideline on Crohn's disease explored the cost effectiveness of 9 induction treatment sequences (composed of 4 treatment lines) for Crohn's disease from the NHS perspective. The remaining 9 studies compared individual treatment steps.

3.12 The company submitted an abstract of a study evaluating the cost effectiveness of PredictSURE IBD to guide early use of biologics in Crohn's disease and ulcerative colitis in the UK. The model structure comprised a decision tree then a Markov transition model. Study results were presented at the European Crohn's and Colitis Organisation Conference in February 2020. The results show that, over a 15-year time horizon, top-down treatment guided by PredictSURE IBD produced an incremental cost-effectiveness ratio (ICER) of £7,179 per quality-adjusted life year (QALY) gained when compared with standard care.

Economic analysis

3.13 The EAG developed a de novo model to assess the cost effectiveness of PredictSURE IBD and IBDX to guide treatment in Crohn's disease. There were no detailed data for IBDX so the EAG assessed its cost effectiveness in an exploratory scenario analysis only.

3.14 The economic analysis was done from the UK NHS and personal social services perspective. The model had a lifetime time horizon (65 years) with a cycle length of 2 weeks. Costs and benefits were discounted at 3.5% per year.
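
The discounting described in section 3.14 can be sketched as follows. This is a minimal illustration, not the EAG's implementation: the assumption of 26 cycles per year, the convention that cycle 0 is undiscounted, and all names are illustrative.

```python
ANNUAL_DISCOUNT_RATE = 0.035  # from the text: 3.5% per year
CYCLES_PER_YEAR = 26          # assumption: 2-week cycles, 52 weeks per year


def discount_factor(cycle, rate=ANNUAL_DISCOUNT_RATE, cycles_per_year=CYCLES_PER_YEAR):
    """Discount factor for a cost or benefit accrued in a given 2-week cycle.

    Cycle 0 is treated as undiscounted (an assumption of this sketch).
    """
    years_elapsed = cycle / cycles_per_year
    return 1.0 / (1.0 + rate) ** years_elapsed


def discounted_total(per_cycle_values):
    """Sum a stream of per-cycle costs or QALYs after discounting each cycle."""
    return sum(value * discount_factor(cycle) for cycle, value in enumerate(per_cycle_values))


# Example: a constant cost of 17 per cycle over the first model year.
first_year_cost = discounted_total([17] * CYCLES_PER_YEAR)
print(round(first_year_cost, 2))
```

Discounting by elapsed time within the year is one possible convention; applying a single factor per model year is an equally plausible alternative.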

Model structure

3.15 The model was a hybrid model, with a decision tree for the induction treatment and a Markov transition model for the maintenance treatment. In the induction model, people whose disease does not respond (deterioration; no change; or an improvement of 70 or less in Crohn's Disease Activity Index [CDAI] score) have second-line treatment, according to their treatment allocation (top down or step up). People whose disease responds to the induction treatment (an improvement in CDAI score above 70) move to the maintenance model. They can enter the maintenance model either in remission, mild, or moderate to severe health states. People can then move between these states during maintenance treatment, reflecting the different levels of response to maintenance treatment. People in the mild and moderate to severe states are at risk of relapse and escalating to the next treatment step.
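
The routing rule at the end of induction in section 3.15 can be written as a small function. This is a sketch only: the function name and the sign convention for the CDAI change are assumptions, and the Markov maintenance model itself is not reproduced because its transition probabilities are not given here.

```python
CDAI_RESPONSE_THRESHOLD = 70  # from the text: response is an improvement above 70


def next_step_after_induction(cdai_improvement):
    """Route a person after induction treatment.

    cdai_improvement is baseline CDAI minus post-induction CDAI, so a
    negative value means deterioration (sign convention assumed here).
    """
    if cdai_improvement > CDAI_RESPONSE_THRESHOLD:
        return "maintenance model"
    # Deterioration, no change, or an improvement of 70 or less.
    return "second-line treatment"


for change in (-15, 0, 70, 71, 120):
    print(change, next_step_after_induction(change))
```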

3.16 Escalations from corticosteroids to immunomodulators (step up) and from corticosteroids to tumour necrosis factor (TNF)-alpha inhibitors (top down) were not modelled because in both strategies all people have initial induction treatment with corticosteroids, so they cancel each other out.

3.17 Surgical events were modelled as a standalone outcome, that is, people did not leave their respective health states to enter a surgery health state. Complications and long-term consequences of surgery were not modelled. Time to surgery in the high-risk, top-down cohort was estimated by applying a hazard function generated from the study by Hoekman et al. (2018).

Model inputs

3.18 The population modelled was based on the UK study by Biasci et al. (2019), for which the company provided anonymised individual patient data. There were 105 people in the cohort with Crohn's disease; 88 were newly diagnosed. However, the EAG based its analysis on 40 people in the study whose treatment matched the standard definition of step-up treatment in the UK, that is, people who had first-line treatment with corticosteroids and second-line treatment with immunomodulators (after failure of corticosteroids). This informed the estimates of time to treatment escalation and time to surgery in the base case. To extrapolate time to treatment escalation data to the time horizon of the model, the EAG used individual patient data to generate time to event data for time to first escalation.

3.19 D'Haens et al. (2008) and its 10-year follow-up study by Hoekman et al. (2018) informed estimates for effectiveness of top-down compared with step-up treatment. The study by D'Haens was a 2-year multicentre randomised trial that assessed the clinical efficacy of early combined immunosuppression (top-down treatment) compared with conventional treatment (step-up treatment) in people with newly diagnosed Crohn's disease. People randomised to top-down treatment had induction treatment with infliximab and azathioprine. People had no infliximab maintenance but were allowed infliximab as needed and, if necessary, corticosteroids, to control disease activity. People randomised to step-up treatment had corticosteroids, followed, in sequence, by azathioprine and infliximab. The study by Hoekman et al. (2018) retrospectively reviewed the medical records of people included in the D'Haens trial, which collected data on hospitalisation, flares, surgery, clinical activity and other outcomes, for a median follow up of 10 years.

Effectiveness of induction and maintenance therapies

3.20 Probabilities of response and remission with induction and maintenance therapies were based on data from a pragmatic search and from NICE's guidance on vedolizumab for treating moderately to severely active Crohn's disease after prior therapy. Based on this guidance, the EAG estimated that 21.2% of responders remained in the moderate to severe disease state. The probability of response is the same for top down and step up, except for immunomodulators in the step-up strategy (table 1).

Table 1 Probability of response and remission with induction and maintenance therapies

Treatment strategy	Induction: response	Induction: remission	Maintenance: response	Maintenance: remission
Top down: biologics	32%	13%	2%	28%
Top down: anti-TNF	26%	37%	10%	33%
Step up: biologics	32%	13%	2%	28%
Step up: anti-TNF	26%	37%	10%	33%
Step up: immunomodulator	23%	16%	15%	25%

3.21 The costs considered in the model are the costs of the diagnostic tests, treatment and care of Crohn's disease. The total cost of testing charged by the laboratory was £1,250 for PredictSURE IBD and £347 (estimated) for IBDX. Table 2 shows the dose prices and induction dosages for induction treatment in top-down and step-up strategies, taken from BNF and NHS reference costs, and maintenance treatment dosages based on clinical opinion.

Table 2 Treatment doses and costs for induction and maintenance therapies

Treatment	Dose per unit (mg)	List price per unit	Induction dosages	Maintenance dosages
Ustekinumab	130	£2,147.00	Induction dose at week 0 depends on body weight: 260 mg for under 56 kg; 390 mg for 56 kg to 85 kg; 520 mg for 86 kg or over	90 mg every 8 weeks
Vedolizumab	300	£2,050.00	300 mg at week 0, 2 and 6	300 mg every 8 weeks
Infliximab	100	£377.66	5 mg/kg at week 0, 2 and 6	5 mg/kg every 8 weeks
Adalimumab	40	£308.13	160 mg at week 0; 80 mg at week 2	40 mg every 2 weeks
Azathioprine	50	£0.04	2.5 mg/kg/week for 8 weeks	2.5 mg/kg/week
6-MP	50	£1.97	1.25 mg/kg/week	1.25 mg/kg/week
Methotrexate	25/15	£16.64 or £14.92	25 mg/week for 8 weeks	15 mg/week
Prednisolone	2.5	£0.04	40 mg; tapered by 5 mg per week – 8 weeks total	No maintenance with prednisolone
Intravenous administration (outpatient)	1	First: £199 Follow up: £212	Not applicable	Not applicable

The total cost of managing maintenance health states for 2 weeks was £17 for remission, £27 for a mild state and £122 for a moderate to severe state. This included outpatient, radiology, endoscopy and hospitalisation costs.
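
As an illustration of how the unit prices in table 2 and the 2-week health state costs translate into larger totals, the sketch below works out the induction drug cost for the two fixed-dose biologics and the yearly management cost of each health state. It assumes whole units are used with no sharing of part-used units, excludes administration costs, applies no discounting, and assumes 26 cycles per year. All names are illustrative.

```python
from decimal import Decimal

# Unit strength (mg) and list price per unit, from table 2.
UNITS = {
    "adalimumab": (40, Decimal("308.13")),
    "vedolizumab": (300, Decimal("2050.00")),
}

# Fixed induction doses in mg, from table 2.
INDUCTION_DOSES_MG = {
    "adalimumab": [160, 80],         # weeks 0 and 2
    "vedolizumab": [300, 300, 300],  # weeks 0, 2 and 6
}

# Cost of managing each maintenance health state for one 2-week cycle.
STATE_COST_PER_CYCLE = {"remission": 17, "mild": 27, "moderate to severe": 122}
CYCLES_PER_YEAR = 26  # assumption: 52 weeks per year


def induction_drug_cost(drug):
    """Drug acquisition cost of induction, rounding each dose up to whole units."""
    strength_mg, price = UNITS[drug]
    units = sum(-(-dose // strength_mg) for dose in INDUCTION_DOSES_MG[drug])  # ceiling division
    return units * price


def annual_state_cost(state):
    """Undiscounted cost of spending a full year in one health state."""
    return STATE_COST_PER_CYCLE[state] * CYCLES_PER_YEAR


for drug in UNITS:
    print(drug, induction_drug_cost(drug))
for state in STATE_COST_PER_CYCLE:
    print(state, annual_state_cost(state))
```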

3.22 The EAG used the utility values from NICE's guidance on vedolizumab (based on EQ-5D data from GEMINI studies) in the base case analysis and a mapping algorithm based on NICE's guidance on ustekinumab for moderately to severely active Crohn's disease after prior therapy in a scenario analysis. All utilities were adjusted to account for the age and sex of the modelled population, according to Ara and Brazier 2010. Surgery-related disutility was estimated from the Marchetti study. Table 3 shows the utility values used in the modelling.

Table 3 Utility values used for remission, mild, moderate to severe health states

Health state	NICE guidance on vedolizumab	NICE guidance on ustekinumab
Remission	0.820	0.820
Mild disease	0.730	0.700
Moderate to severe	0.570	0.550

Key assumptions

3.23 The EAG assumed that:

PredictSURE IBD (and IBDX in the scenario analysis) is 100% accurate in categorising people into high and low risk of severe disease.
People categorised as high risk by the test have top-down treatment.
People have the same baseline probability of escalating to the next step in the step-up strategy (estimated from time to first escalation in Biasci et al.) regardless of the number of previous escalations.
30% of people receiving anti-TNF and 20% of people receiving non-anti-TNF biologics have combination treatment with immunomodulators.
Response to anti-TNF does not depend on the prior lines of treatment.
People in the top-down strategy have a longer time to treatment escalation and a longer time to surgery than people in the step-up strategy, based on extrapolation of results from D'Haens et al. (2008) and Hoekman et al. (2018).

3.24 To use the D'Haens study the EAG assumed that:

The relative treatment effect of top-down and step-up strategies in a mixed-risk population is the same as the relative treatment effect in a high-risk population.
Time to relapse is a proxy measure for time to the next treatment escalation.
The effectiveness of treatment strategies in this study is a proxy for the treatment effectiveness of the first step in the top-down (anti-TNF) and step-up (immunomodulators) strategies modelled.

3.25 To estimate the relative treatment effect of top-down and step-up treatment on time to treatment escalation, the EAG digitised the time to relapse Kaplan–Meier data from D'Haens et al. to estimate a hazard function. This was applied to the first treatment step in the high-risk, top-down arm of the model.

3.26 The base case was revised to reflect the assumption that time to treatment escalation restarts on each new treatment rather than reducing over time and as treatment sequences progress. Cost-effectiveness results presented are from the revised base case.

Base case results

3.27 Results of the revised base case (detailed in the addendum to the diagnostics assessment report) superseded results of the primary analysis. In the revised analysis the time to treatment escalation restarts on each new treatment.

3.28 The base case compared the top-down strategy (using PredictSURE IBD to predict who was high risk) with standard care, in which a high-risk person has step-up treatment. In both the deterministic and probabilistic analyses PredictSURE IBD was dominated by standard care, meaning it costs more and produces fewer QALYs.

Deterministic result: incremental cost was £9,084 and incremental QALY was -0.08.
Probabilistic result: incremental cost was £12,132 and incremental QALY was -0.03.

The testing strategy had a less than 10% probability of being cost effective against standard care at the maximum acceptable ICERs of £20,000 and £30,000 per QALY gained.
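
The dominance result in section 3.28 can be checked with a short sketch. The classification rules and the net monetary benefit formula (threshold multiplied by incremental QALYs, minus incremental cost) are standard health economics conventions rather than something taken from the EAG's model, and the function names are illustrative.

```python
def assess(incremental_cost, incremental_qaly):
    """Classify a strategy against its comparator, or return its ICER."""
    worse_or_equal = incremental_cost >= 0 and incremental_qaly <= 0
    better_or_equal = incremental_cost <= 0 and incremental_qaly >= 0
    if worse_or_equal and better_or_equal:
        return "no difference"
    if worse_or_equal:
        return "dominated"  # costs more and/or gives fewer QALYs, never better
    if better_or_equal:
        return "dominant"   # costs less and/or gives more QALYs, never worse
    return incremental_cost / incremental_qaly


def net_monetary_benefit(incremental_cost, incremental_qaly, threshold):
    """Incremental net monetary benefit at a given cost per QALY threshold."""
    return threshold * incremental_qaly - incremental_cost


# Figures from section 3.28.
deterministic = (9084, -0.08)
probabilistic = (12132, -0.03)

for label, (cost, qaly) in (("deterministic", deterministic), ("probabilistic", probabilistic)):
    print(label, assess(cost, qaly), round(net_monetary_benefit(cost, qaly, 20_000), 2))
```

A negative net monetary benefit at both £20,000 and £30,000 per QALY gained is another way of expressing that the testing strategy is not cost effective at those thresholds.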

3.29 The deterministic fully incremental cost-effectiveness analysis explored as a scenario analysis showed that PredictSURE IBD (incremental cost £903, incremental QALY 0) and IBDX (incremental cost £8,181, incremental QALY -0.08) were dominated when compared with standard care of no testing. In the absence of robust evidence on the prognostic accuracy of both tools, the cost-effectiveness analysis only differs in the cost of the tests.
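
One possible reading of the figures in section 3.29, offered as an interpretation rather than something the report states, is that the increments are sequential: IBDX against no testing, then PredictSURE IBD against IBDX. The arithmetic below is consistent with that reading, because the two increments sum to the base-case incremental cost and the smaller increment equals the difference in test prices.

```python
# Test costs from section 3.21 (the IBDX cost is an estimate).
TEST_COST = {"PredictSURE IBD": 1250, "IBDX": 347}

# Incremental costs and QALYs quoted in section 3.29.
INCREMENT_8181 = {"cost": 8181, "qaly": -0.08}
INCREMENT_903 = {"cost": 903, "qaly": 0.0}

# Base-case deterministic result from section 3.28.
BASE_CASE = {"cost": 9084, "qaly": -0.08}

price_difference = TEST_COST["PredictSURE IBD"] - TEST_COST["IBDX"]
summed_cost = INCREMENT_8181["cost"] + INCREMENT_903["cost"]
summed_qaly = INCREMENT_8181["qaly"] + INCREMENT_903["qaly"]

print("difference in test prices:", price_difference)
print("sum of the two increments:", summed_cost, summed_qaly)
```

This fits the statement that, without robust accuracy evidence for both tools, the two tests differ in the analysis only by their cost.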

Cost-effectiveness results: scenario analyses

3.30 The dominance of the step-up strategy may be because of the benefit some people get from having immunomodulators first, before biologics. The clinical experts told the EAG that people on the top-down strategy do not have immunomodulators after 3 lines of biologics. However, the EAG explored a scenario which had immunomodulators as the last treatment option in the top-down arm. Deterministic base case results for this scenario showed that PredictSURE IBD (top-down strategy) generated 0.07 more QALYs than the step-up strategy, at an additional cost of £7,502, producing an ICER of £105,148 per QALY gained. This is higher than £20,000 to £30,000 per QALY gained, the range NICE normally considers an acceptable use of NHS resources.
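
The ICER in section 3.30 cannot be reproduced exactly from the rounded figures quoted, which is expected when the QALY difference is reported to 2 decimal places. The sketch below shows the size of the rounding effect. The implied QALY gain is a back-calculation, not a reported value.

```python
incremental_cost = 7502        # from section 3.30
reported_qaly_gain = 0.07      # from section 3.30, rounded to 2 decimal places
reported_icer = 105148         # from section 3.30

# ICER recomputed from the rounded QALY gain.
icer_from_rounded_inputs = incremental_cost / reported_qaly_gain

# QALY gain implied by the reported ICER (back-calculated, so an estimate).
implied_qaly_gain = incremental_cost / reported_icer

print(round(icer_from_rounded_inputs))
print(round(implied_qaly_gain, 4))
```

The implied QALY gain rounds to the reported 0.07, so the quoted cost, QALY gain and ICER are mutually consistent once rounding is allowed for.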

3.31 The EAG ran a series of individual scenario analyses, most of which showed that PredictSURE IBD was dominated by standard care. The EAG also ran a combination of individual scenarios, because these were thought to have more impact than individual scenarios.

3.32 If the analysis assumed that the condition did not respond to treatment with immunomodulators for any high-risk person in the step-up arm, so they had no benefit from them, PredictSURE IBD had an ICER of £170,180 per QALY gained. The proportion of people who responded to immunomodulators was then varied. This showed that the 2 strategies became clinically equivalent when it was assumed that 97% of high-risk people in the step-up arm do not benefit from immunomodulators.

3.33 The EAG explored a scenario that assumed PredictSURE IBD had lower test accuracy, and the resulting effect of misdiagnosis. In this scenario PredictSURE IBD was more costly and generated a QALY gain of 0.15, producing an ICER of £64,876 per QALY gained. This gain in QALYs, despite the lower accuracy of the test, can be attributed to the assumption that some lower-risk people misdiagnosed as high risk go on to have top-down treatment, without the need for further escalation.

3.34 Assumptions about treatment discontinuation were based on Marchetti 2013, which reported that 76% of people had mucosal healing after 2 years in remission with biologic treatments using a top-down strategy and 40% using a step-up strategy. In a scenario analysis that assumed 76% of people in the top-down arm and 40% of people in the step-up arm discontinued biologics, PredictSURE IBD was less costly and less effective than standard care, producing an ICER of £46,263 per QALY gained. Scenarios combining the effect of misdiagnosis with the same proportion of people discontinuing biologics in both arms, that is, 40% in step up and top down or 76% in step up and top down, produced ICERs of £48,034 and £32,875 per QALY gained respectively.

3.35 Individual scenarios were combined to explore the impact of increasing the effectiveness of the top-down strategy while reducing the treatment cost of biologics. The results of these combined scenarios varied. One scenario combined 3 assumptions, that:

base case risk of relapse for second and later treatment steps is the same
discontinuation of biologic treatment is 76% for top down and step up
100% of people in the step-up arm do not respond to immunomodulators.

This produced an ICER of £29,225 per QALY gained in favour of top down.

3.36 A tornado plot of the one-way sensitivity analyses showed that response to biologics in the top-down arm of the model was a key driver of the deterministic ICER.
