3 Evidence
The Appraisal Committee (section 6) considered evidence submitted by Boehringer Ingelheim and a review of this submission by the Evidence Review Group (ERG; section 7). See the Committee papers for full details of the evidence.
Clinical effectiveness
3.1 The clinical evidence for nintedanib came from 3 multicentre, double‑blind, placebo‑controlled, randomised trials: 2 phase III trials (INPULSIS 1 [n=513] and INPULSIS 2 [n=548]) and a phase IIb dose‑ranging trial (TOMORROW [n=428]). All 3 trials compared nintedanib with placebo over 52 weeks in adults aged 40 years or older with idiopathic pulmonary fibrosis. The primary outcome was the annual rate of decline in forced vital capacity (FVC; ml per year). The trials enrolled people with an FVC of at least 50% of the predicted normal value and a diffusing capacity of the lung for carbon monoxide of 30–79% of the predicted normal value at baseline. The mean percent predicted FVC at baseline was approximately 80% in all 3 trials.
3.2 The key outcomes from the phase III nintedanib trials are presented in tables 1, 2 and 3. The annual rate of decline in FVC with nintedanib was approximately half that with placebo (for example, 114.7 ml/year compared with 239.9 ml/year in INPULSIS 1); this difference was statistically significant (p<0.001). Fewer people randomised to nintedanib died than people randomised to placebo, but this difference was not statistically significant. Results for time to first acute exacerbation were inconsistent across the trials:

In INPULSIS 1, there was no statistically significant difference between nintedanib and placebo.

In INPULSIS 2, the difference favoured nintedanib and was statistically significant.

The pooled analysis of the 2 trials showed a benefit in favour of nintedanib, which was not statistically significant: 4.9% of people in the nintedanib arm had 1 or more acute exacerbations in 52 weeks compared with 7.6% of people in the placebo arm (HR 0.64, p=0.08).
The company noted that the INPULSIS trials were not powered to detect the effect of nintedanib on acute exacerbations.
Table 1 Key outcomes in INPULSIS 1

| Treatment arm | Annual rate of FVC decline (ml/year) | FVC responders^{a} | ≥1 acute exacerbation in 52 weeks^{b} | Death (all cause) |
| --- | --- | --- | --- | --- |
| Nintedanib 150 mg twice daily | −114.7 | 218/309 (70.6%) | 19/309 (6.1%) | 13/309 (4.2%) |
| Placebo | −239.9 | 116/206 (56.9%) | 11/206 (5.4%) | 13/206 (6.4%) |
| Measure of effect: HR, MD or OR (95% CI) | MD: 125.3 (77.7, 172.8); p<0.001 | OR: 1.91 (1.32, 2.79); p<0.001 | HR: 1.15 (0.54, 2.42); p=0.67 | HR: 0.63 (0.29, 1.36); p=0.29 |
Table 2 Key outcomes in INPULSIS 2

| Treatment arm | Annual rate of FVC decline (ml/year) | FVC responders^{a} | ≥1 acute exacerbation in 52 weeks^{b} | Death (all cause) |
| --- | --- | --- | --- | --- |
| Nintedanib 150 mg twice daily | −113.6 | 229/331 (69.6%) | 12/331 (3.6%) | 22/331 (6.7%) |
| Placebo | −207.3 | 140/220 (63.9%) | 21/220 (9.6%) | 20/220 (9.1%) |
| Measure of effect: HR, MD or OR (95% CI) | MD: 93.7 (44.8, 142.7); p<0.001 | OR: 1.29 (0.89, 1.86); p=0.18 | HR: 0.38 (0.19, 0.77); p=0.005 | HR: 0.74 (0.40, 1.35); p=0.30 |
Table 3 Key outcomes in the pooled analysis of INPULSIS 1 and 2^{c}

| Treatment arm | Annual rate of FVC decline (ml/year) | FVC responders^{a} | ≥1 acute exacerbation in 52 weeks^{b} | Death (all cause) |
| --- | --- | --- | --- | --- |
| Nintedanib 150 mg twice daily | −113.6 | 447/638 (70.1%) | 31/638 (4.9%) | 35/638 (5.5%) |
| Placebo | −223.5 | 256/423 (60.5%) | 32/423 (7.6%) | 33/423 (7.8%) |
| Measure of effect: HR, MD or OR (95% CI) | MD: 109.9 (75.9, 144.0); p<0.0001 | OR: 1.58 (1.21, 2.05); p=0.0007 | HR: 0.64 (0.39, 1.05); p=0.08 | HR: 0.70 (0.43, 1.12); p=0.14 |

^{a} People with an absolute decline in percent predicted FVC of <10% at 52 weeks.
^{b} Investigator‑reported acute exacerbations (according to the criteria described in the trial protocol); the hazard ratio is based on analysis of time to first event.
^{c} Source of pooled results: nintedanib summary of product characteristics (individual trial results were presented in the company submission).
Abbreviations: CI, confidence interval; FVC, forced vital capacity; HR, hazard ratio; MD, mean difference; OR, odds ratio.
3.3 Subgroup analyses showed no statistically significant differences in the effectiveness of nintedanib in slowing lung function decline between people with a percent predicted FVC of 50–80% and people with a percent predicted FVC of more than 80%.
3.4 To compare nintedanib with pirfenidone and to inform its economic model, the company did a network meta‑analysis that included the 3 nintedanib trials and 5 placebo‑controlled trials of pirfenidone (SP2, SP3, CAPACITY 1, CAPACITY 2 and ASCEND). The company used different sets of trials for different end points in the model:

It included evidence from all the trials for overall survival.

It excluded the 2 pirfenidone trials in Japanese populations (SP2 and SP3) for acute exacerbations because of heterogeneity (differences from the other studies, including a longer disease duration and a different proportion of people who smoke).

It excluded the ASCEND study of pirfenidone for decline in lung function.
The results are presented in table 4. The base‑case results for overall survival were the same for nintedanib and pirfenidone, and neither drug showed a statistically significant difference in mortality compared with placebo. The base‑case analysis of acute exacerbations showed comparable benefits for nintedanib and pirfenidone compared with placebo, but the company reported uncertainty in the results, which it attributed to heterogeneity in the Japanese trials of pirfenidone (SP2 and SP3). After these trials were excluded from the network meta‑analysis ('scenario 3' of the sensitivity analyses for this outcome), the results showed fewer acute exacerbations with nintedanib than with pirfenidone. The company's analysis of loss of lung function (defined by the company as an absolute decline in percent predicted FVC of at least 10%) gave similar results for nintedanib and pirfenidone in the base‑case network meta‑analysis, and the differences in loss of lung function between each drug and placebo were statistically significant. After the ASCEND trial of pirfenidone was excluded from the network meta‑analysis because of heterogeneity ('scenario 2' of the sensitivity analyses for this outcome), the results suggested that nintedanib was more effective than pirfenidone at reducing loss of lung function; however, the company did not state whether this difference was statistically significant.
3.5 The company evaluated 4 safety outcomes in its network meta‑analysis. It reported that, compared with people receiving placebo, those receiving nintedanib were more likely to have severe gastrointestinal events (p=0.055), to stop the study drug (p=0.014) and to have adverse events that led to stopping the study drug (p=0.007). The differences in stopping the study drug and in adverse events leading to stopping were statistically significant at the 5% level; the p value for severe gastrointestinal events (0.055) was just above this level. Nintedanib was associated with fewer serious cardiac events than placebo and pirfenidone, but the odds ratios were not statistically significant. Nintedanib was associated with more serious gastrointestinal events than pirfenidone (odds ratio 3.96, 95% confidence interval [CI] 1.18 to 14.51, p value not reported). The company reported that, compared with pirfenidone, nintedanib was associated with lower rates of stopping because of adverse events (odds ratio 0.88, 95% CI 0.57 to 1.37, difference not statistically significant).
Table 4 Network meta‑analysis results for nintedanib and pirfenidone

| Comparison | Overall survival: median odds ratio (95% CI), fixed effect model (NMA base case: all evidence) | Acute exacerbations: median odds ratio (95% CI), fixed effect model (NMA scenario 3: heterogeneous trials excluded) | Loss of lung function^{a}: median odds ratio (95% CI), fixed effect model (NMA scenario 2: heterogeneous trial excluded) |
| --- | --- | --- | --- |
| Nintedanib compared with placebo | 0.70 (0.45, 1.10) | 0.56 (0.35, 0.89)^{c} | 0.54 (0.42, 0.69)^{c} |
| Pirfenidone compared with placebo | 0.70 (0.46, 1.05) | 1.01 (0.22, 4.50) | 0.69 (0.47, 1.00)^{c} |
| Nintedanib compared with pirfenidone^{b} | 1.00 (0.55, 1.85) | 0.56 (0.12, 2.68) | 0.78 (0.49, 1.22) |

^{a} Defined as an absolute decline in percent predicted FVC of over 10% by the end of the study follow‑up.
^{b} Results of significance testing not reported.
^{c} Statistically significant.
Abbreviations: CI, confidence interval; FVC, forced vital capacity; NMA, network meta‑analysis.
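The nintedanib‑versus‑pirfenidone odds ratios in table 4 are close to what a simple indirect comparison of the 2 placebo‑controlled results would give. The sketch below uses the Bucher adjusted indirect comparison as a stand‑in; it is not the company's method (its network meta‑analysis reported median odds ratios from a fixed effect model). It assumes approximately normal log odds ratios, back‑calculates each standard error from the reported 95% CI and treats the 2 comparisons as independent. The function names are hypothetical.

```python
import math

Z = 1.959963984540054  # two-sided 95% standard normal quantile


def log_se_from_ci(lower: float, upper: float) -> float:
    """Approximate standard error of a log odds ratio from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z)


def bucher_indirect(or_a, ci_a, or_b, ci_b):
    """Indirect OR (with 95% CI) of A vs B from A vs placebo and B vs placebo.

    Assumes the 2 comparisons are independent, so variances add on the log scale.
    """
    log_or = math.log(or_a) - math.log(or_b)
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    return (
        math.exp(log_or),
        math.exp(log_or - Z * se),
        math.exp(log_or + Z * se),
    )


# Odds ratios vs placebo from table 4: ((nintedanib, CI), (pirfenidone, CI))
outcomes = {
    "overall survival": ((0.70, (0.45, 1.10)), (0.70, (0.46, 1.05))),
    "acute exacerbations": ((0.56, (0.35, 0.89)), (1.01, (0.22, 4.50))),
    "loss of lung function": ((0.54, (0.42, 0.69)), (0.69, (0.47, 1.00))),
}

for name, ((or_n, ci_n), (or_p, ci_p)) in outcomes.items():
    est, lo, hi = bucher_indirect(or_n, ci_n, or_p, ci_p)
    print(f"{name}: {est:.2f} ({lo:.2f}, {hi:.2f})")
```

Run as written, this reproduces the table 4 estimates and confidence limits to within about 0.02. It also shows that the wide interval for acute exacerbations comes mainly from the imprecise pirfenidone‑versus‑placebo estimate (0.22 to 4.50).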
ERG comments
3.6 The ERG highlighted that the 3 nintedanib trials enrolled people with a percent predicted FVC of at least 50% and therefore did not provide evidence for people with more severe disease.
3.7 The ERG was concerned that the company did not fully explain how lung function, physical function or acute exacerbations predict the course and outcome of the disease. Therefore, it was unclear which outcomes were the most clinically meaningful.
3.8 The ERG's key concern with the network meta‑analysis was the potential for bias in favour of nintedanib because the company excluded studies in some scenarios.
Cost effectiveness
3.9 The company provided a Markov model to assess the cost effectiveness of nintedanib compared with pirfenidone or best supportive care in adults with idiopathic pulmonary fibrosis. The company modelled people with a percent predicted FVC of 50% or more (although the marketing authorisation does not have a restriction related to FVC). The model used a lifetime time horizon, with a cycle length of 3 months.
3.10 The 19 health states in the model combined 2 measures: percent predicted FVC (in bands of approximately 10 percentage points) and whether an acute exacerbation had occurred. People entered the model in different health states according to their percent predicted FVC, and without having had an exacerbation. They could remain in the same health state or move to a different health state through:

loss of lung function (representing disease progression, defined as a 10 percentage point decrease in percent predicted FVC)

exacerbation

loss of lung function and exacerbation

death.
Once a person progressed to a health state with a lower percent predicted FVC, they could not return to a health state with better lung function. Once an exacerbation occurred, a person could not move back to a health state without exacerbation. Exacerbation health states had different health outcomes and costs from health states without exacerbation. If a person had a second exacerbation, they did not move into a different health state; instead, they incurred a short‑term cost and disutility associated with the exacerbation. Because there was no evidence on the incidence of recurrent exacerbations, the company assumed that a person who had had at least 1 exacerbation had the same risk of another exacerbation as a person who had never had one. Death could occur from any health state at any point in the model, and people whose percent predicted FVC fell to 39.9% or less were assumed to die.
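The state structure and transition rules described in section 3.10 can be sketched as a validity check on pairs of states. This is a minimal sketch, not the company's model: the FVC band boundaries are hypothetical (the source says only that bands are approximately 10 percentage points wide), so the sketch has 17 states rather than the model's 19, and death is a single absorbing state.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lower bounds (percent predicted FVC) of bands about 10 points wide.
# 8 bands x 2 exacerbation statuses + death = 17 states, not the model's 19.
BAND_LOWER_BOUNDS = [40, 50, 60, 70, 80, 90, 100, 110]
DEATH_BELOW = 40.0  # percent predicted FVC of 39.9% or less is treated as death


@dataclass(frozen=True)
class State:
    band: Optional[int]        # index into BAND_LOWER_BOUNDS; None means dead
    exacerbated: bool = False  # True once at least 1 exacerbation has occurred

    @property
    def is_dead(self) -> bool:
        return self.band is None


DEAD = State(band=None)


def all_states() -> List[State]:
    alive = [State(b, e) for b in range(len(BAND_LOWER_BOUNDS)) for e in (False, True)]
    return alive + [DEAD]


def state_for(ppfvc: float, exacerbated: bool) -> State:
    """Map a percent predicted FVC value and exacerbation history to a state."""
    if ppfvc < DEATH_BELOW:
        return DEAD
    band = max(i for i, lower in enumerate(BAND_LOWER_BOUNDS) if ppfvc >= lower)
    return State(band, exacerbated)


def allowed(src: State, dst: State) -> bool:
    """True if the transition rules described in section 3.10 permit src -> dst."""
    if src.is_dead:
        return dst.is_dead          # death is absorbing
    if dst.is_dead:
        return True                 # death can occur from any state
    if dst.band > src.band:
        return False                # no return to better lung function
    if src.exacerbated and not dst.exacerbated:
        return False                # no return to a state without exacerbation
    return True                     # includes staying put after a 2nd exacerbation
```

Checking every non‑zero entry of a candidate transition matrix against `allowed` is one way to confirm that the modelled cohort cannot regain lung function or lose its exacerbation history.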
3.11 The company modelled the baseline risks of mortality, disease progression (loss of lung function) and acute exacerbations using the results from the placebo arms of the nintedanib clinical trials (INPULSIS and TOMORROW). It based the efficacy of best supportive care on the results from the placebo arms of the INPULSIS trials. The company applied odds ratios from its network meta‑analysis to the baseline risks to estimate the relative effectiveness and safety of nintedanib and pirfenidone compared with best supportive care. To extrapolate beyond the period covered by the clinical trial data, the company fitted the following parametric models:

a log‑logistic model to estimate overall survival

an exponential model to estimate the probability of exacerbation and stopping medication

a logistic regression model to predict loss of lung function.
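Applying an odds ratio to a baseline risk is done on the odds scale and then converted back to a probability. The sketch below assumes a hypothetical 3‑month baseline probability of loss of lung function of 5% (not a figure from the company's model) and applies the nintedanib‑versus‑placebo odds ratio for loss of lung function from table 4 (0.54). The function name is hypothetical.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Adjust a baseline (best supportive care) probability by an odds ratio."""
    if not 0.0 <= baseline_prob < 1.0:
        raise ValueError("baseline_prob must be in [0, 1)")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)   # probability -> odds
    treated_odds = baseline_odds * odds_ratio                # apply relative effect
    return treated_odds / (1.0 + treated_odds)               # odds -> probability


# Hypothetical baseline probability of loss of lung function in one 3-month cycle.
P_BSC = 0.05
# Nintedanib vs placebo odds ratio for loss of lung function (table 4, scenario 2).
OR_NINTEDANIB = 0.54

p_nintedanib = apply_odds_ratio(P_BSC, OR_NINTEDANIB)
print(round(p_nintedanib, 4))
```

Because the baseline risk here is small, the result (about 0.0276) is close to simply multiplying 0.05 by 0.54 (0.027). At higher baseline risks an odds ratio and a risk ratio diverge, which is why the conversion goes through the odds scale.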
3.12 The company included adverse events in the model if they substantially affected costs and quality‑adjusted life years (QALYs), had an incidence of more than 5%, or had an incidence 1.5 times greater than in the comparator arm. The company excluded diarrhoea, even though it was common in the INPULSIS trials (reported in over 60% of people receiving nintedanib compared with 19% of people receiving placebo), because it was usually mild to moderate in severity and led fewer than 5% of people to stop treatment.
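Read as a rule, the inclusion criteria in section 3.12 can be written as a simple predicate. This sketch assumes that meeting any one of the 3 criteria is enough (the wording could be read differently), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    name: str
    incidence_treatment: float      # proportion of people, e.g. 0.60 for 60%
    incidence_comparator: float
    substantial_cost_or_qaly_impact: bool


def meets_inclusion_criteria(ae: AdverseEvent) -> bool:
    """Assumes that meeting any one of the 3 stated criteria is sufficient."""
    return (
        ae.substantial_cost_or_qaly_impact
        or ae.incidence_treatment > 0.05
        or ae.incidence_treatment > 1.5 * ae.incidence_comparator
    )


# Incidences as reported for the INPULSIS trials (over 60% vs 19%);
# the impact flag is an assumption reflecting "mild to moderate" severity.
diarrhoea = AdverseEvent("diarrhoea", 0.60, 0.19, substantial_cost_or_qaly_impact=False)
print(meets_inclusion_criteria(diarrhoea))
```

Under this reading, diarrhoea meets both numerical thresholds (60% is above 5% and above 1.5 × 19% = 28.5%), so its exclusion rested on the company's judgement about severity and treatment stopping rather than on the criteria themselves.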
3.13 The company included the following costs in its model: drug treatments (including concomitant medications), adverse events, liver function tests, resource use (for drug acquisition, patient monitoring, treating acute exacerbations and adverse events), oxygen use, exacerbations, and end‑of‑life care. The company assigned utility values to each health state in the model using EQ‑5D data collected in the INPULSIS trials. The model also incorporated disutilities from exacerbations and treatment‑related adverse events.
3.14 Both nintedanib and pirfenidone had a confidential patient access scheme (price discount) agreed with the Department of Health. At the request of NICE, the company (Boehringer Ingelheim) provided its base‑case results and sensitivity analyses using the list prices of nintedanib and pirfenidone. NICE asked the ERG to present the results of its own exploratory analyses using the list prices and, separately in a confidential appendix, with both discounts incorporated.
3.15 In the company's deterministic base case, best supportive care was associated with 3.27 QALYs, pirfenidone with 3.62 QALYs and nintedanib with 3.67 QALYs. Using the list prices for nintedanib and pirfenidone, nintedanib dominated pirfenidone (that is, nintedanib was more effective and cost less) and had an incremental cost‑effectiveness ratio (ICER) of £149,361 per QALY gained compared with best supportive care. The company did sensitivity and scenario analyses around its base case (using list prices for nintedanib and pirfenidone). The comparison between nintedanib and pirfenidone was sensitive to the use of a stopping rule (under which treatment is stopped if percent predicted FVC declines by 10% or more in 1 year):

When the stopping rule was applied only to people receiving pirfenidone, the ICER for nintedanib was £82,784 per QALY gained compared with pirfenidone.

When the stopping rule was applied to both the nintedanib and pirfenidone arms, the ICER for nintedanib was £17,096 per QALY gained compared with pirfenidone.
The comparison between nintedanib and best supportive care was very sensitive to estimates of mortality risk associated with treatment. Changing the baseline survival risk (by using an alternative method of extrapolation) increased the ICER by approximately:

£91,000 per QALY gained when the company used a Weibull parametric model

£320,000 per QALY gained when it used a Gompertz parametric model.
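To see why the ICER is so sensitive to the extrapolation method, the sketch below compares the standard survival functions of the log‑logistic, Weibull and Gompertz distributions. All parameter values are hypothetical: the shape parameters are arbitrary, and each scale or rate parameter is solved so that every curve gives the same 1‑year survival (90%). Restricted mean survival over a hypothetical 30‑year horizon is approximated with the trapezoidal rule on a 3‑month grid, matching the model's cycle length.

```python
import math

HORIZON_YEARS = 30.0   # hypothetical lifetime horizon
CYCLE_YEARS = 0.25     # 3-month cycles, as in the company's model
TARGET_S1 = 0.90       # hypothetical 1-year survival used to calibrate all curves


def loglogistic(alpha, beta):
    return lambda t: 1.0 / (1.0 + (t / alpha) ** beta)


def weibull(lam, k):
    return lambda t: math.exp(-((t / lam) ** k))


def gompertz(a, b):
    # a: shape (per year); b: hazard at time 0 (per year)
    return lambda t: math.exp(-(b / a) * (math.exp(a * t) - 1.0))


beta, k, a = 1.5, 1.3, 0.3                 # hypothetical shape parameters
cum_haz_1 = -math.log(TARGET_S1)          # cumulative hazard at 1 year
alpha = ((1 - TARGET_S1) / TARGET_S1) ** (-1 / beta)
lam = cum_haz_1 ** (-1 / k)
b = cum_haz_1 * a / (math.exp(a) - 1.0)

curves = {
    "log-logistic": loglogistic(alpha, beta),
    "Weibull": weibull(lam, k),
    "Gompertz": gompertz(a, b),
}


def restricted_mean(surv, horizon=HORIZON_YEARS, step=CYCLE_YEARS):
    """Area under S(t) from 0 to horizon using the trapezoidal rule."""
    n = int(round(horizon / step))
    ys = [surv(i * step) for i in range(n + 1)]
    return sum((ys[i] + ys[i + 1]) * step / 2 for i in range(n))


for name, s in curves.items():
    print(f"{name:12s} S(1)={s(1.0):.3f} S(10)={s(10.0):.3f} RMST={restricted_mean(s):.2f} years")
```

With these made‑up parameters, the heavier‑tailed log‑logistic curve gives the longest restricted mean survival and the Gompertz curve, whose hazard rises exponentially, the shortest, even though all 3 agree at 1 year. Lifetime QALY estimates inherit these differences.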
3.16 When the ERG applied the patient access schemes for nintedanib and pirfenidone to the company base case, pirfenidone was extendedly dominated by nintedanib and best supportive care (meaning that a combination of best supportive care and nintedanib could provide at least as much benefit as pirfenidone at a lower cost). The ICER for nintedanib compared with best supportive care was substantially over £30,000 per QALY gained. In a pairwise comparison, the ICER for nintedanib compared with pirfenidone was between £20,000 and £30,000 per QALY gained. NICE cannot report the exact ICERs because the patient access schemes are confidential.
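The dominance terms used in sections 3.15 and 3.16 come from a standard incremental cost‑effectiveness analysis: options are ranked by cost, options that cost more but give no more QALYs are removed as dominated, and an option is extendedly dominated when its ICER is higher than that of the next, more effective option. The sketch below is a generic implementation, not the company's or the ERG's model. It uses the deterministic base‑case QALYs from section 3.15 with costs expressed relative to best supportive care. The nintedanib cost is back‑calculated from the list‑price ICER (£149,361 × 0.40 QALYs ≈ £59,700); every pirfenidone cost, and the whole second example, is hypothetical because costs are not reported here and the discounted prices are confidential.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Option:
    name: str
    cost: float   # total cost relative to best supportive care (GBP)
    qalys: float  # total QALYs


def icer(more: Option, less: Option) -> float:
    """Incremental cost per QALY gained of `more` over `less`."""
    return (more.cost - less.cost) / (more.qalys - less.qalys)


def incremental_analysis(
    options: List[Option],
) -> Tuple[Dict[str, Optional[float]], Dict[str, str]]:
    """Return ICERs for options on the frontier, and excluded options with reasons."""
    # Cheapest first; at equal cost, the more effective option comes first.
    ranked = sorted(options, key=lambda o: (o.cost, -o.qalys))
    excluded: Dict[str, str] = {}
    frontier: List[Option] = []
    for opt in ranked:
        # Costs at least as much as the last frontier option but is no more effective.
        if frontier and opt.qalys <= frontier[-1].qalys:
            excluded[opt.name] = "dominated"
        else:
            frontier.append(opt)
    # Extended dominance: drop an option whose ICER exceeds that of the next option.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            if icer(frontier[i], frontier[i - 1]) > icer(frontier[i + 1], frontier[i]):
                excluded[frontier[i].name] = "extendedly dominated"
                del frontier[i]
                changed = True
                break
    icers: Dict[str, Optional[float]] = {frontier[0].name: None}
    for prev, cur in zip(frontier, frontier[1:]):
        icers[cur.name] = icer(cur, prev)
    return icers, excluded


# List-price base case: nintedanib cost back-calculated; pirfenidone cost hypothetical.
nintedanib_cost = 149_361 * (3.67 - 3.27)
list_price = [
    Option("Best supportive care", 0.0, 3.27),
    Option("Pirfenidone", nintedanib_cost + 5_000, 3.62),
    Option("Nintedanib", nintedanib_cost, 3.67),
]
print(incremental_analysis(list_price))

# Purely hypothetical costs showing extended dominance of the middle option.
illustrative = [
    Option("Best supportive care", 0.0, 3.27),
    Option("Pirfenidone", 40_000, 3.62),
    Option("Nintedanib", 45_000, 3.67),
]
print(incremental_analysis(illustrative))
```

With these inputs, the first call reports pirfenidone as dominated and an ICER for nintedanib against best supportive care of about £149,361 per QALY gained (by construction); the second reports pirfenidone as extendedly dominated, because its ICER against best supportive care (about £114,000) exceeds the ICER of nintedanib against pirfenidone (£100,000).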
ERG comments and additional analyses
3.17 The ERG's clinical adviser considered that people who had had 1 exacerbation were at higher risk of further exacerbations than those who had not had any.
3.18 The ERG suggested that the population in the company's model may not represent those treated in clinical practice in England because it included people with percent predicted FVC of more than 80% (approximately 45% of people in the model). The ERG noted that clinical advice during the pirfenidone appraisal (see NICE's technology appraisal guidance on pirfenidone for treating idiopathic pulmonary fibrosis) suggested that people with this level of FVC have milder disease than would typically be treated in current practice.
3.19 The ERG suggested that the results of the company's cost‑effectiveness analysis may have been biased, because the company used different scenario analyses from its network meta‑analysis to inform the relative effectiveness of nintedanib and pirfenidone for different outcomes.
3.20 The ERG suggested that the company model overestimated disutilities for adverse events, and suggested alternative estimates, because:

Adverse events in the company's model lasted for 1 year; the ERG considered that gastrointestinal and skin disorders would last for a shorter time and suggested a duration of 1 month based on published data.

Data from a long‑term open‑label extension study of the CAPACITY trials of pirfenidone (the RECAP study) suggested that the incidence of rash was lower than the estimates in the company model.

The company may have overestimated the incidence of photosensitivity associated with pirfenidone, which the ERG suggested patients can prevent by avoiding sun exposure.
The ERG also noted that the disutility associated with new exacerbations reported in the company's submission (−0.14) did not match the disutility the company used in its model (0.0987).
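The effect of the ERG's proposed duration is simple arithmetic: when a disutility applies for a fixed period, the QALY loss per event is the disutility multiplied by the duration in years, so shortening the duration from 1 year to 1 month divides the loss by 12 whatever the disutility. In the sketch below, the adverse event disutility and the 3‑month period applied to exacerbations are assumptions; the 2 exacerbation values are those quoted above, treated as magnitudes.

```python
def qaly_loss(disutility: float, duration_years: float) -> float:
    """Undiscounted QALYs lost when a utility decrement applies for a fixed period."""
    return abs(disutility) * duration_years


AE_DISUTILITY = 0.05  # hypothetical decrement for a gastrointestinal or skin event

loss_1_year = qaly_loss(AE_DISUTILITY, 1.0)         # company: event lasts 1 year
loss_1_month = qaly_loss(AE_DISUTILITY, 1.0 / 12)   # ERG: event lasts 1 month
print(loss_1_year / loss_1_month)

# Exacerbation: submission value vs model value, each applied for a
# hypothetical 3-month period (0.25 years).
print(qaly_loss(-0.14, 0.25), qaly_loss(0.0987, 0.25))
```

On the 3‑month assumption, the submission value implies a loss of 0.035 QALYs per exacerbation and the model value about 0.025 QALYs.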
Additional analyses
3.21 After consultation, the company increased the simple discount on the price of nintedanib. The company updated its analysis with the lower price and also used different results from the network meta‑analysis, in line with the Committee's preferred assumptions (see section 4.6). The company included all the trial evidence in the network meta‑analysis for the outcomes of lung function, serious cardiac events, gastrointestinal events and probability of stopping the drug. The Committee considered the SP2 trial (pirfenidone compared with placebo in a Japanese population), which provided evidence in the network meta‑analysis for overall survival and acute exacerbations, to be an outlier. Therefore, the company excluded SP2 from the network meta‑analysis for these outcomes. The company included a risk of death of 2.79% over 6 months for people with an exacerbation (previously the company had not modelled a link between exacerbation and life expectancy). It also reduced the duration of adverse events from 1 year to 1 month and corrected the disutility associated with new exacerbations (see section 4.11 for the Committee's preferred assumptions on disutilities). The company did not reduce the incidence of photosensitivity, but the ERG advised that this no longer affected the ICERs because the company had reduced the duration of adverse events to 1 month. As in its base case, the company modelled a population with percent predicted FVC of more than 50%, but it did not include the stopping rule. Using the list prices for nintedanib and pirfenidone, nintedanib dominated pirfenidone and was associated with an ICER of £145,310 per QALY gained compared with best supportive care. The ICER for pirfenidone compared with best supportive care was £172,208 per QALY gained.
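The model runs in 3‑month cycles, whereas the added risk of death after an exacerbation is stated over 6 months. A common way to convert it, assuming a constant hazard over the 6 months (the same assumption that underlies an exponential model), is sketched below. How the company actually made this conversion is not stated, and the function name is hypothetical.

```python
import math


def rescale_probability(p: float, from_months: float, to_months: float) -> float:
    """Convert an event probability between time intervals, assuming a constant hazard."""
    rate = -math.log(1.0 - p) / from_months   # constant hazard per month
    return 1.0 - math.exp(-rate * to_months)


P_DEATH_6_MONTHS = 0.0279  # risk of death over 6 months after an exacerbation
p_cycle = rescale_probability(P_DEATH_6_MONTHS, from_months=6, to_months=3)
print(f"{p_cycle:.4f}")
```

This gives about 1.40% per 3‑month cycle, and 2 such cycles compound back to the 6‑month figure (1 − (1 − 0.01405)² ≈ 0.0279). Simply halving the 6‑month risk (1.395%) gives almost the same answer here because the risk is small.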
3.22 After consultation, when the ERG applied the patient access schemes for nintedanib and pirfenidone to the company's revised analysis, nintedanib dominated pirfenidone. The ICER for nintedanib compared with best supportive care remained substantially over £30,000 per QALY gained. The ICER for pirfenidone compared with best supportive care was also substantially over £30,000 per QALY gained. The results were similar when restricting the population to people with percent predicted FVC of 50–79.9% or 80% or more. However, the ERG highlighted that it would expect differences in treatment costs, efficacy and other model parameters for different subgroups defined by percent predicted FVC, and that these differences would likely influence cost effectiveness. Therefore, the results for subgroups should be interpreted with caution.
3.23 The ERG presented results with the stopping rule applied (that is, treatment is stopped if percent predicted FVC declines by more than 10%) for both nintedanib and pirfenidone and, separately, for pirfenidone only. The patient access schemes were included for both drugs. For the population with percent predicted FVC of more than 50% and also for the subset of the population with percent predicted FVC of 50–79.9%, applying the stopping rule for both treatments resulted in nintedanib dominating pirfenidone. When the ERG applied the stopping rule for pirfenidone only, the ICER for nintedanib compared with pirfenidone was between £20,000 and £30,000 per QALY gained in both populations. The ERG did not report the ICERs with the stopping rule for the population with percent predicted FVC of 80% or more. The ICER for pirfenidone compared with best supportive care remained substantially over £30,000 per QALY gained when the stopping rule was applied. NICE cannot report the exact ICERs because the patient access schemes are confidential.
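A stopping rule based on decline in percent predicted FVC can be applied to a percentage‑point (absolute) decline or to a decline relative to baseline, and with an inclusive ("10% or more") or strict ("more than 10%") threshold. The sketch below makes these choices explicit parameters; the function and its defaults are hypothetical and are not taken from the company's or the ERG's model.

```python
def should_stop(baseline_ppfvc: float, ppfvc_after_1_year: float,
                threshold: float = 10.0, relative: bool = False,
                inclusive: bool = True) -> bool:
    """Hypothetical stopping rule on the decline in percent predicted FVC over 1 year.

    relative=False: decline in percentage points (75% -> 65% is 10 points).
    relative=True:  decline as a percentage of baseline (75% -> 67.5% is 10%).
    inclusive=True: stop when decline >= threshold; otherwise when decline > threshold.
    """
    decline = baseline_ppfvc - ppfvc_after_1_year
    if relative:
        decline = 100.0 * decline / baseline_ppfvc
    return decline >= threshold if inclusive else decline > threshold


print(should_stop(75, 66), should_stop(75, 66, relative=True))
```

For a person starting at 75% predicted, a fall to 66% is a 9 percentage‑point decline but a 12% relative decline, so the 2 readings of the same rule lead to different decisions.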