3 The company's submission

The Appraisal Committee (section 6) considered evidence submitted by Boehringer Ingelheim and a review of this submission by the Evidence Review Group (ERG; section 7).

Clinical effectiveness

3.1 The company did a systematic literature review of studies evaluating the efficacy and safety of all second‑line treatments for non‑small‑cell lung cancer. For nintedanib, it identified 1 relevant randomised controlled trial, the LUME‑Lung 1 trial, from which it took the key clinical evidence for the comparison of nintedanib plus docetaxel with placebo plus docetaxel (hereafter referred to as docetaxel alone).

3.2 The LUME‑Lung 1 trial (n=1314) was a phase III, multicentre, placebo‑controlled, double‑blind, randomised (1:1) controlled trial comparing nintedanib plus docetaxel with docetaxel alone. The trial was carried out in 211 centres in 27 countries (including the UK). Eligible patients were adults who had locally advanced, metastatic or locally recurrent non‑small‑cell lung cancer and whose disease had progressed on or after treatment with only 1 prior chemotherapy regimen. Randomisation was stratified by 4 variables: Eastern Cooperative Oncology Group (ECOG) score (0 or 1); previous bevacizumab treatment (yes or no); presence of brain metastases (yes or no); and histology (squamous or non‑squamous). Patients in the nintedanib group received nintedanib (200 mg) twice daily, on days 2 to 21 of a 21‑day cycle, plus docetaxel (75 mg/m2) on day 1 of the 21‑day cycle. If patients experienced adverse events, the trial design specified reducing the dose of nintedanib from 200 mg twice daily to 150 mg twice daily and then to 100 mg twice daily, and reducing the dose of docetaxel from 75 mg/m2 to 60 mg/m2. Patients in the nintedanib group who had at least 4 cycles of nintedanib plus docetaxel could then have nintedanib alone. Patients in the placebo group received placebo twice daily on days 2 to 21 of a 21‑day cycle, and docetaxel dosing as in the nintedanib group. In the placebo group, reducing the dose of docetaxel (from 75 mg/m2 to 60 mg/m2) was permitted if adverse events occurred. Treatment in both groups stopped when patients' disease progressed or if they experienced unacceptable adverse events. The trial investigators followed up patients every 6 weeks before disease progression and every 6 to 8 weeks after disease progression, until the patient died or was lost to follow‑up.

3.3 Progression‑free survival, measured radiologically, was the primary outcome in the LUME‑Lung 1 trial and was defined as time from randomisation to death or disease progression when progression preceded death. Progression‑free survival was determined by a central independent review by radiologists using the modified Response Evaluation Criteria in Solid Tumours (RECIST). The key secondary outcome in LUME‑Lung 1 was overall survival. Overall survival was defined as the time from randomisation to death (irrespective of cause of death). Other secondary outcomes included progression‑free survival by local investigator review, tumour response by both central independent review and investigator review, clinical improvement (defined as lengthening the time to deterioration in body weight), health‑related quality of life, safety, and tolerability.

3.4 The primary progression‑free survival analysis was to be done when 713 patients had experienced (centrally assessed) disease progression or death (cut‑off November 2010) to detect a hazard ratio of 0.78 with 90% statistical power. The primary analysis was based on the intention‑to‑treat population. According to the company, the study remained blinded between final analysis for progression‑free survival and for overall survival. The final analysis of overall survival was done when 1151 patients had died, and was designed to permit investigators to detect an 18% increase in median overall survival or a hazard ratio of 0.85. At final analysis of overall survival, the company did a follow‑up analysis of all events including disease progression or death (February 2013). To be considered statistically significant, the p value had to be less than 0.00043 for primary progression‑free survival, less than 0.05 for final progression‑free survival and less than 0.04984 for the final overall survival analysis.

3.5 The analyses in LUME‑Lung 1 were extended beyond the original specification of the statistical analysis plan to validate findings from a hypothesis‑generating analysis of the LUME‑Lung 2 trial which compared nintedanib plus pemetrexed with placebo plus pemetrexed. This change to the statistical analysis plan was introduced after the initial analysis for primary progression‑free survival analysis, but before database lock for the final overall survival analysis (February 2013). From the analysis of LUME‑Lung 2, the company identified that patients whose disease had progressed within 9 months after the start of their first‑line therapy, and patients who had adenocarcinoma, would benefit most from treatment with nintedanib. A hierarchical overall survival statistical analysis was therefore introduced into the LUME‑Lung 1 trial, by amending the trial statistical analysis plan. In LUME‑Lung 1, the company tested overall survival in an intention‑to‑treat sequential fashion: first, patients with adenocarcinoma whose disease had progressed within 9 months of starting first‑line therapy, followed by all patients with adenocarcinoma, and finally the overall trial population.

3.6 The focus of the company's submission to NICE was on patients with adenocarcinoma because this was the population specified in the marketing authorisation for nintedanib. In LUME‑Lung 1, of the 1314 patients randomised, 759 patients had non‑squamous cell carcinoma of whom 658 had adenocarcinoma. The company considered the baseline characteristics of patients in LUME‑Lung 1 with adenocarcinoma, including sex, age, race, smoking status and ECOG score, to be similar between the treatment groups, and similar to patients seen in clinical practice with adenocarcinoma. Of the patients in the trial with adenocarcinoma, 62.5% were men, the mean age was 58.5 (standard deviation 10.1) years, 76.9% were white, 70.4% had an ECOG performance status of 1, and 7.4% of patients had brain metastases. In the LUME‑Lung 1 trial, 18.0% of the patients with adenocarcinoma in the nintedanib group and 18.2% in the docetaxel alone group had pemetrexed−platinum therapy as first‑line therapy; 0.9% of patients in the nintedanib plus docetaxel group and 0.6% of patients in the docetaxel alone group had pemetrexed−non‑platinum therapy. Data on epidermal growth factor receptor (EGFR) mutations were not routinely collected in the LUME‑Lung 1 trial. During the clarification stage of the appraisal, the company stated that these data had been retrospectively collected from a sample of patients in the LUME‑Lung 1 trial. The results from the sample are considered to be academic in confidence and therefore cannot be reported.

3.7 The results for progression‑free and overall survival for the adenocarcinoma population in LUME‑Lung 1 are given in table 1. The company presented the results of the primary progression‑free survival analysis for the overall trial population and for people with adenocarcinoma whose disease had progressed within 9 months of starting first‑line therapy (see table 1 for the adenocarcinoma group).

Table 1 Progression‑free and overall survival results for the adenocarcinoma population in LUME‑Lung 1 (cut‑off November 2010 and February 2013)

| Outcome | Nintedanib plus docetaxel | Docetaxel alone | Hazard ratio (95% confidence interval) |
|---|---|---|---|
| Progression‑free survival (central independent review): primary analysis at November 2010, 7.1 month follow‑up (median, months) | 4.0 | 2.8 | 0.77 (0.62–0.96) |
| Progression‑free survival (central independent review): final analysis at February 2013, 31.7 month follow‑up (median, months) | 4.2 | 2.8 | 0.84 (0.71–1.00) |
| Overall survival (final analysis at February 2013) (median, months) | 12.6 | 10.3 | 0.83 (0.70–0.99) |

3.8 The company provided Kaplan–Meier curves for patients with adenocarcinoma for progression‑free survival (primary analysis [November 2010] and follow‑up analysis [February 2013]) and overall survival (final analysis, February 2013). The Kaplan–Meier curves for progression‑free survival (primary analysis) separated after 6 weeks and remained separated until approximately 7 months. The Kaplan–Meier curves for overall survival (final analysis) in patients with adenocarcinoma separated after 6 months and remained apart over the entire observation period up to 36 months.

3.9 The company did subgroup analyses at the time of the final overall survival analysis (February 2013). Most pre‑specified and post‑hoc progression‑free survival subgroup analyses showed the effect of nintedanib plus docetaxel to be consistent with the treatment benefit seen in the primary analysis of adenocarcinoma.

3.10 The company collected health‑related quality of life data in the LUME‑Lung 1 trial. These were measured at the screening visit, at 21‑day intervals during treatment, at the end of treatment and at the first follow‑up visit. The investigators used 3 questionnaires: EQ‑5D, European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ‑C30), and EORTC lung cancer‑specific supplementary module (EORTC QLQ‑LC13). Investigators found no differences in global health status, quality of life or self‑reported health‑related quality of life reported for the time to deterioration for coughing, breathlessness or pain between the nintedanib plus docetaxel group and the docetaxel alone group. Health‑related quality‑of‑life scores at the time of randomisation were available for the whole trial population but not for the adenocarcinoma subgroup. Statistically significant improvements were seen in 3 individual pain items ('have pain' [p=0.0332], 'pain in chest' [p=0.0196] and 'pain in arm and shoulder' [p=0.0004]) in favour of nintedanib plus docetaxel, while time to deterioration for diarrhoea was significantly shorter with nintedanib plus docetaxel.

3.11 The company did a mixed treatment comparison to compare nintedanib plus docetaxel with erlotinib because erlotinib was specified as a comparator in the final scope issued by NICE. However, the company commented that it did not consider erlotinib to be the main comparator to nintedanib plus docetaxel because patients considered fit enough to have treatment with nintedanib plus docetaxel would also be considered fit enough to have docetaxel alone rather than erlotinib. The company did a systematic review and identified 9 trials to include in its mixed treatment comparison. The trials included erlotinib, pemetrexed and gefitinib. The company assumed that the effectiveness of docetaxel and pemetrexed does not differ, to allow as many treatments as possible to be compared with nintedanib plus docetaxel.

3.12 The results of the analysis from the mixed treatment comparison for nintedanib plus docetaxel compared with docetaxel alone (4 trials) showed that nintedanib plus docetaxel significantly improved overall survival (hazard ratio [HR] 0.83, 95% confidence interval [CI] 0.70 to 0.99) and progression‑free survival (HR 0.77, 95% CI 0.62 to 0.96) compared with docetaxel alone. Nintedanib plus docetaxel also significantly improved overall survival (HR 0.64, 95% CI 0.46 to 0.90) and progression‑free survival (HR 0.70, 95% CI 0.50 to 1.00) compared with erlotinib. The Bucher indirect comparisons supported these findings (overall survival HR 0.56, 95% CI 0.38 to 0.82; progression‑free survival HR 0.58, 95% CI 0.39 to 0.87) for nintedanib plus docetaxel compared with erlotinib.
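The Bucher method derives an indirect hazard ratio for 2 treatments from their separate comparisons with a common comparator, working on the log scale. The following is a minimal sketch of that calculation and is not the company's analysis. The hazard ratio for nintedanib plus docetaxel compared with docetaxel alone is the overall survival result from LUME‑Lung 1. The figures for erlotinib compared with docetaxel are hypothetical placeholders because the source does not report them, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def bucher_indirect(hr_ab, ci_ab, hr_cb, ci_cb):
    """Indirect HR for A versus C through a common comparator B."""
    log_hr = math.log(hr_ab) - math.log(hr_cb)
    # Variances of the two independent log hazard ratios add.
    se = math.sqrt(se_from_ci(*ci_ab) ** 2 + se_from_ci(*ci_cb) ** 2)
    return (
        math.exp(log_hr),
        math.exp(log_hr - Z_95 * se),
        math.exp(log_hr + Z_95 * se),
    )


# A = nintedanib plus docetaxel, B = docetaxel alone (overall survival).
hr_nd_vs_d, ci_nd_vs_d = 0.83, (0.70, 0.99)
# C = erlotinib versus B: HYPOTHETICAL values for illustration only.
hr_e_vs_d, ci_e_vs_d = 1.20, (0.95, 1.52)

hr, lower, upper = bucher_indirect(hr_nd_vs_d, ci_nd_vs_d, hr_e_vs_d, ci_e_vs_d)
print(f"Indirect HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the variances add, the indirect interval is always wider on the log scale than either direct interval, which is one reason indirect estimates carry more uncertainty.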

3.13 The company provided data on drug‑related adverse events that occurred with an incidence of 5% or more in both treatment groups in the adenocarcinoma subgroup for the duration of the trial. Diarrhoea (43.4% compared with 24.6%), nausea (28.4% compared with 17.7%) and vomiting (19.4% compared with 12.3%) occurred more often with nintedanib plus docetaxel than with docetaxel alone. Deaths from adverse events, not attributed to disease progression, were more common with nintedanib plus docetaxel (6.3%) than with docetaxel alone (2.4%). However, in the nintedanib plus docetaxel group, the median duration of nintedanib plus docetaxel treatments was 4.2 months (with 5 cycles of docetaxel) and the docetaxel alone group received treatment for a median duration of 3.0 months (with 4 cycles of docetaxel). There were more grade 3 or greater adverse events and grade 3 or greater serious adverse events in the nintedanib plus docetaxel group (75.9% and 31.3% respectively) than in the docetaxel alone group (68.5% and 27.6% respectively).

3.14 To compare the adverse events of nintedanib with chemotherapeutic regimens other than docetaxel, the company compiled data on fatigue, nausea and diarrhoea. These were the only safety outcomes reported in a consistent format in more than 1 trial. The company also stated that, because few trials reported these outcomes and because of the low incidence of adverse events, it compared nintedanib plus docetaxel with other treatments using the sensitivity analysis in which the company assumed docetaxel and pemetrexed were equally effective. In the mixed treatment comparison of adverse events, the LUME‑Lung 1 trial did not connect with the other studies. The results suggested that nintedanib plus docetaxel was significantly more likely to lead to diarrhoea than docetaxel alone or pemetrexed, but was not more likely to lead to diarrhoea than erlotinib. The risk of fatigue was similar for all treatments.

Cost effectiveness

3.15 The company provided a partitioned survival Markov model containing 3 health states: progression‑free (on or off treatment); progressed disease; and death. All patients enter the model in the progression‑free state. At the beginning of each time period patients could either remain in the same health state or progress to a worse health state, that is, from progression free to progressed or death, or from progressed disease to death. The model used the partitioned survival method to determine the proportion of patients in each of the 3 health states during each model cycle. The company modelled 3‑weekly cycle lengths, a half‑cycle correction and a time horizon of 15 years. All costs and outcomes were discounted by 3.5% and the company stated that all costs were from the NHS and Personal Social Services perspective, although the company included only NHS costs in the model. In the company's base‑case analysis, it compared nintedanib plus docetaxel with docetaxel alone. In the company's secondary analysis, it compared nintedanib plus docetaxel with erlotinib. The model included people with locally advanced, metastatic or locally recurrent adenocarcinoma whose disease progressed following first‑line chemotherapy. The company assumed that 70% of patients have best supportive care on stopping second‑line treatment, although some people in the progressed‑disease state can have subsequent treatments (5% erlotinib or 25% platinum doublet therapy). The company included the cost of subsequent treatments in the model but made no assumptions about their efficacy.
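The partitioned survival method can be illustrated with a short sketch. It assumes Weibull curves for progression‑free and overall survival with hypothetical parameters (none are taken from the company's model), and applies the 3‑weekly cycles, 15‑year horizon, 3.5% discount rate and half‑cycle correction described above. Discounting at the start of each cycle is a simplifying assumption of this sketch.

```python
import math

CYCLE_YEARS = 3 / 52      # 3-weekly cycle length
HORIZON_YEARS = 15
DISCOUNT_RATE = 0.035


def weibull_survival(t_years, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t_years / scale) ** shape))


def state_occupancy(t_years, pfs_params, os_params):
    """Split the cohort into progression-free, progressed and dead at time t."""
    os_t = weibull_survival(t_years, *os_params)
    # PFS cannot exceed OS, so cap it to keep occupancies non-negative.
    pfs_t = min(weibull_survival(t_years, *pfs_params), os_t)
    return pfs_t, os_t - pfs_t, 1.0 - os_t


def discounted_life_years(pfs_params, os_params):
    """Discounted life years in each living state, with half-cycle correction."""
    n_cycles = round(HORIZON_YEARS / CYCLE_YEARS)
    pf_total = pd_total = 0.0
    for cycle in range(n_cycles):
        start, end = cycle * CYCLE_YEARS, (cycle + 1) * CYCLE_YEARS
        pf_start, pd_start, _ = state_occupancy(start, pfs_params, os_params)
        pf_end, pd_end, _ = state_occupancy(end, pfs_params, os_params)
        discount = 1 / (1 + DISCOUNT_RATE) ** start
        # Half-cycle correction: average the occupancy at cycle start and end.
        pf_total += 0.5 * (pf_start + pf_end) * CYCLE_YEARS * discount
        pd_total += 0.5 * (pd_start + pd_end) * CYCLE_YEARS * discount
    return pf_total, pd_total


# HYPOTHETICAL (scale in years, shape) parameters, not fitted to trial data.
PFS_PARAMS = (0.45, 1.1)
OS_PARAMS = (1.30, 1.2)

pf_years, pd_years = discounted_life_years(PFS_PARAMS, OS_PARAMS)
print(f"Progression-free: {pf_years:.2f} years; progressed: {pd_years:.2f} years")
```

The progressed‑disease proportion is never modelled directly: it is the gap between the overall survival and progression‑free survival curves, so any error in either curve feeds straight into time spent in that state.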

3.16 Kaplan–Meier survival curves for overall survival and progression‑free survival for nintedanib plus docetaxel and for docetaxel alone were available from the LUME‑Lung 1 trial and informed the proportion of patients in the model's 3 health states at each time point. Progression‑free survival data from LUME‑Lung 1 were mature and the proportions of censored patients in both treatment groups were similar. To extrapolate trial data beyond the time horizon of the trial, the company analysed overall survival and progression‑free survival data using parametric survival curves fitted using 2 approaches:

  • Joint models including data from both treatment groups using a term for treatment and the same distributions for each group.

  • Separately modelled curves to each randomised treatment group.

    The company tested the 'fit' of the curves using Akaike information criteria (AIC). The company interpreted the intercept and scale parameters of the separately fitted curves to indicate that the curves should not be forced into the same model, and therefore selected separate curves by treatment group for progression‑free survival and overall survival. The log‑normal model had the lowest AIC among the separate progression‑free survival fits and the Weibull model had the lowest AIC among the separate proportional hazards models for progression‑free survival; therefore, these were selected to model progression‑free survival. The log‑logistic model had the lowest AIC among the separately fitted overall survival models and the Weibull model had the lowest AIC among the separate proportional hazards models for overall survival; therefore, these were selected to model the overall survival data. The company stated that it tested the validity of the data by showing the results to a group of 'key opinion leaders' (clinicians) and by comparing it with registry data from the National Lung Cancer Audit (LUCADA, UK) and Surveillance, Epidemiology, and End Results (SEER, USA).
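Selecting a curve by AIC can be sketched as follows. The log‑likelihoods and parameter counts are hypothetical and are not results from LUME‑Lung 1; they are chosen only to show how the best‑fitting model overall can differ from the best‑fitting proportional hazards model. Treating the exponential, Weibull and Gompertz models as the proportional hazards families is an assumption of this sketch.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L); lower means better fit."""
    return 2 * n_parameters - 2 * log_likelihood


# HYPOTHETICAL maximised log-likelihoods and parameter counts for curves
# fitted to one treatment group; these are not results from LUME-Lung 1.
candidate_fits = {
    "exponential": (-1012.4, 1),
    "Weibull": (-1005.9, 2),
    "Gompertz": (-1008.3, 2),
    "log-normal": (-1001.2, 2),
    "log-logistic": (-1002.7, 2),
}

aic_by_model = {name: aic(ll, k) for name, (ll, k) in candidate_fits.items()}
best_overall = min(aic_by_model, key=aic_by_model.get)

# Restrict to proportional hazards families when a hazard ratio must be applied.
ph_families = ("exponential", "Weibull", "Gompertz")
best_ph = min(ph_families, key=aic_by_model.get)

print(best_overall, best_ph)
```

AIC ranks fit to the observed data only; it says nothing about whether the extrapolated tail is clinically plausible, which is why external validation against registry data or clinical opinion is normally needed as well.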

3.17 Progression‑free and overall survival curves were not available for erlotinib. The company obtained these by taking the progression‑free survival and overall survival curves for nintedanib plus docetaxel and applying the hazard ratio from the mixed treatment comparison to reflect the relative effectiveness of erlotinib to nintedanib plus docetaxel. The company considered that proportional hazards could only be used if the survival distribution was a proportional hazards model using the exponential, Weibull or Gompertz extrapolations. Based on the goodness of fit, a Weibull distribution was chosen for erlotinib and, therefore, erlotinib could only be evaluated in the model if this distribution was selected for both progression‑free survival and overall survival. The cost‑effectiveness analysis that compared nintedanib plus docetaxel with erlotinib used hazard ratios from the mixed treatment comparison base case, with the hazard ratio being 0.7 (95% CI 0.5 to 1.0) for progression‑free survival and 0.64 (95% CI 0.46 to 0.90) for overall survival.
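Under proportional hazards, a comparator curve can be derived from a reference curve by raising the reference survival probability to the power of the hazard ratio. The sketch below illustrates this with hypothetical Weibull parameters. It assumes that the overall survival hazard ratio of 0.64 is expressed as nintedanib plus docetaxel compared with erlotinib, so the reciprocal is applied to obtain an erlotinib curve; this direction is an assumption and is not confirmed by the source.

```python
import math


def weibull_survival(t_months, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t_months / scale) ** shape))


def apply_hazard_ratio(reference_survival, hazard_ratio):
    """Under proportional hazards, S_new(t) = S_ref(t) ** HR (new versus ref)."""
    return reference_survival ** hazard_ratio


# HYPOTHETICAL Weibull parameters for the nintedanib plus docetaxel OS curve.
SCALE_MONTHS, SHAPE = 17.0, 1.2

# Assumed direction: nintedanib plus docetaxel versus erlotinib, so the
# erlotinib curve uses the reciprocal of the reported hazard ratio.
HR_ND_VS_ERLOTINIB = 0.64
hr_erlotinib_vs_nd = 1 / HR_ND_VS_ERLOTINIB

derived = {}
for month in (6, 12, 24):
    s_nd = weibull_survival(month, SCALE_MONTHS, SHAPE)
    derived[month] = (s_nd, apply_hazard_ratio(s_nd, hr_erlotinib_vs_nd))
    print(f"month {month}: reference {s_nd:.3f}, derived {derived[month][1]:.3f}")
```

The derived curve keeps the Weibull shape of the reference curve and changes only its scale, which is why this approach is limited to proportional hazards distributions.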

3.18 The company collected health‑related quality‑of‑life data in the LUME‑Lung 1 trial using EQ‑5D questionnaires, which it used in a longitudinal model to adjust for certain baseline characteristics including ECOG score, prior treatment with bevacizumab, presence of brain metastases, health status and key adverse events. In the progression‑free survival health state, the company estimated utility values from week 0 to 30 in 3‑week intervals in both treatment arms. The company extrapolated the trend it observed up to week 30 to provide data beyond this time point, which it incorporated into its base case. To estimate utility values for the progressed disease state, the company used utility values from the LUME‑Lung 1 trial. In sensitivity analyses, the company used utility values for progression‑free survival and progressed disease from the literature (Chouaid et al. 2013), which included patients with non‑small‑cell lung cancer in the UK, Europe, Canada, Australia and Turkey. The model also incorporated the impact of adverse events on health‑related quality of life using utility decrements associated with each adverse event. The company acknowledged that the model may have double counted disutility as people may have more than 1 adverse event.

3.19 In the model, the company assumed that patients would take two 100 mg capsules of nintedanib. The company also modelled an option of patients taking one 150 mg capsule. The price of both formulations is the same. In the model, nintedanib plus docetaxel was given for a minimum of 4 cycles before nintedanib could be administered alone. The model included no administration cost associated with nintedanib, but a cost of £155 for docetaxel. Intravenous docetaxel was modelled at a dose of 75 mg/m2 on day 1 of a 21‑day cycle. For the comparison of nintedanib plus docetaxel with erlotinib, a 30‑tablet pack of erlotinib was £1631.53 (MIMS list price [2013]). The company noted that erlotinib has a patient access scheme, which it took into account by doing several sensitivity analyses in which a range of discounts were applied to the list price of erlotinib. The company assumed that the cost of best supportive care was £406.63 per 3‑week cycle.

3.20 The company used resource questionnaires and an interview with an oncologist who specialises in lung cancer to determine health state costs. Three main areas of resource use were considered: routine follow‑up (type and frequency of physician visit, laboratory tests and radiological scans); treatment at time of progression (hospitalisations, physician visits, laboratory tests, radiological scans and procedures used); and resource use during best supportive care or palliative care (initial tests, procedures, hospitalisations, physician visits, laboratory tests, radiological scans and procedures). The unit costs of visit procedures and laboratory tests were mainly derived from the National Schedule of reference costs (2012–13) and some visit costs were taken from the Personal Social Services Research Unit.

3.21 The company provided deterministic and probabilistic incremental cost‑effectiveness ratios (ICERs) for nintedanib plus docetaxel compared with docetaxel alone in its original submission and after consultation on the appraisal consultation document. The ICERs generated using the company's original model have been superseded by those using the revised model that included a patient access scheme and was provided after consultation on the appraisal consultation document (see section 3.47). Only the ICERs from the revised model are referred to in this document.

3.22 The company did a range of deterministic sensitivity analyses. These included alternative hazard ratios for progression‑free survival, hazard ratios for overall survival, utility values for progressed disease, model costs for progressed disease, risk of stopping nintedanib and docetaxel per cycle, and percentage of patients switching to best supportive care.

3.23 The company also did various scenario analyses on the survival modelling. Its original base case included separately modelled curves for the trial period and beyond (log‑normal curves for both treatment and placebo arms of modelled progression‑free survival and log‑logistic for both arms of modelled overall survival). One scenario replaced these curves with Weibull distributions. Another scenario incorporated Kaplan–Meier trial data, after which the company chose Weibull parametric curves instead of the curves chosen for the base case to extrapolate both progression‑free survival and overall survival. Another scenario used the LUME‑Lung 1 trial data in the form of Kaplan–Meier curves for the period of the trial only, and not for the 15‑year time horizon; it used a restricted mean for overall survival, acknowledging that although all people in the trial had progressed, not all had died. The restricted mean assumed that all patients died immediately after final data lock. For the remaining scenarios, the company used the progression‑free survival Kaplan–Meier curve from the LUME‑Lung 1 trial and, for overall survival, the Kaplan–Meier curves. It used these for the duration of the time horizon, extrapolated in 2 different ways: using registry data (LUCADA or SEER), or modelled using a parametric curve (log‑normal curve, log‑logistic curve or Weibull curves).

3.24 The company did several other scenario analyses replacing resource use costs (with those from NICE's technology appraisal guidance on afatinib for treating epidermal growth factor receptor mutation-positive locally advanced or metastatic non-small-cell lung cancer), and altering utility values (using published values) and the time horizon.

3.25 In its original submission, the company also provided an analysis for the comparison of nintedanib plus docetaxel compared with erlotinib.

ERG's critique and exploratory analyses

3.26 The ERG considered that the LUME‑Lung 1 trial was well designed, with a low risk of bias and good randomisation, and noted that the trial was unblinded only at the end and provided mature data. The characteristics of patients with adenocarcinoma at baseline were well balanced between the nintedanib plus docetaxel and docetaxel alone groups in the ERG's opinion.

3.27 The ERG was concerned about the generalisability of the results from LUME‑Lung 1 to patients seen in clinical practice in England. It considered that patients in the trial were potentially fitter and younger than those seen in clinical practice in England. The ERG highlighted the following dissimilarities in patient characteristics:

  • The trial excluded patients with clinically significant pleural effusion, or evidence of cavitary or necrotic tumours, with significant coronary disease, or on anticoagulation (except low‑dose heparin) or antiplatelet therapy (except aspirin). The ERG considered the trial population to have a better prognosis than patients seen in clinical practice in England.

  • There were differences in the proportion of patients having third‑line treatments. The ERG commented that patients in England are less likely to have third‑line treatment than those in the trial (55.8%).

  • The proportion of patients in the trial aged 65 years or older was smaller than the proportion seen in clinical practice.

3.28 The ERG noted that, in LUME‑Lung 1, only 18.8% of patients with adenocarcinoma had pemetrexed as first‑line therapy, and that most had platinum‑based therapies. Conversely, the ERG considered that most patients in England would have pemetrexed as first‑line treatment. The company did not include subgroups by first‑line treatment (other than bevacizumab) in its submission.

3.29 The ERG was concerned that the company limited its submission to patients with adenocarcinoma even though only around 50% of patients in the LUME‑Lung 1 trial had adenocarcinoma, which itself was neither a stratification factor at randomisation nor a pre‑defined subgroup. However, the ERG noted that, in the trial, patients with adenocarcinoma constituted most of the patients with non‑squamous cell carcinoma, which was a stratification factor. Also, because baseline characteristics among patients with adenocarcinoma were well‑balanced across the 2 treatment groups, the ERG suggested that the analyses were acceptable.

3.30 The ERG questioned the validity of the hazard ratios calculated by the company using Cox proportional hazards modelling from the LUME‑Lung 1 trial data for progression‑free survival and overall survival. This model requires that the hazard (that is, the risk of an event occurring at a particular time conditional on having survived to that time) is a constant ratio between the patterns of events in the 2 treatment arms at any time since randomisation. The ERG noted that the progression‑free survival curves for the LUME‑Lung 1 trial groups diverge after 6 weeks and then converge after approximately 1 year so the proportional hazards assumption was not likely to be met. The ERG did a similar analysis of the overall survival data to test whether the proportional hazards assumption applied and concluded that it did not. The ERG stated that, because the proportional hazards assumption was not supported by the LUME‑Lung 1 trial data for estimating the relative effectiveness of nintedanib plus docetaxel compared with docetaxel alone, using methods based on proportional hazard assumptions is inappropriate.
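One informal way to examine the proportional hazards assumption is to compare log cumulative hazard curves, ln(−ln S(t)), for the 2 arms: if hazards are proportional, the vertical gap between them is constant over time. The sketch below uses hypothetical survival proportions, not trial data, and a crude tolerance chosen only for illustration. It is not the ERG's method, which the source does not describe in detail.

```python
import math


def log_cumulative_hazard(survival):
    """ln(-ln S(t)); parallel curves across arms indicate proportional hazards."""
    return math.log(-math.log(survival))


def log_hazard_ratio_over_time(surv_treatment, surv_control):
    """Vertical gap between the two log-cumulative-hazard curves at each time."""
    return [
        log_cumulative_hazard(s_t) - log_cumulative_hazard(s_c)
        for s_t, s_c in zip(surv_treatment, surv_control)
    ]


def is_roughly_constant(values, tolerance=0.05):
    """Crude check: the gap varies by no more than the chosen tolerance."""
    return max(values) - min(values) <= tolerance


# HYPOTHETICAL survival proportions at successive time points (not trial data).
control = [0.90, 0.70, 0.45, 0.25, 0.10]
proportional = [s ** 0.8 for s in control]    # constant hazard ratio of 0.8
converging = [0.93, 0.78, 0.52, 0.27, 0.10]   # separates, then converges

ph_holds = is_roughly_constant(log_hazard_ratio_over_time(proportional, control))
ph_fails = is_roughly_constant(log_hazard_ratio_over_time(converging, control))
print(ph_holds, ph_fails)
```

Curves that separate and later converge, as described for progression‑free survival, produce a gap that shrinks towards zero, so a single hazard ratio cannot summarise the treatment effect over the whole follow‑up period.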

3.31 The ERG considered it inappropriate to do a mixed treatment comparison because:

  • The proportional hazards assumption was not supported by the LUME‑Lung 1 trial data for progression‑free or overall survival. Because the LUME‑Lung 1 trial is the only trial providing evidence for nintedanib plus docetaxel, any comparison with this trial means that any estimation of the relative effectiveness of nintedanib plus docetaxel compared with erlotinib (that is, a calculated hazard ratio) lacks credibility and invalidates the comparison.

  • The trials included in the mixed treatment comparisons were heterogeneous in patient baseline characteristics: they varied by age, EGFR mutation status, ECOG score, sex, whether patients had smoked and response to prior therapy. This heterogeneity may mean that the trials are too dissimilar to allow a valid comparison of outcomes in a mixed treatment comparison.

  • The company assumed that docetaxel and pemetrexed were equally effective in the mixed treatment comparison. The ERG was not aware of any evidence that supported this assumption in an adenocarcinoma population.

3.32 The ERG commented on the way in which the company had fitted a variety of parametric functions to the available trial data and used these in its original model to predict the results beyond those available from the trial. The ERG was concerned about the company's approach to curve fitting because the main reason for curve fitting is to anticipate what will happen to patients who remain 'at risk' at the time of the data cut‑off point. In LUME‑Lung 1, however, most patients had died, their disease had progressed or they had stopped treatment at the time of the data cut‑off point. Therefore, extrapolating in this situation could have biased projections because it was based on the few survivors still at risk and could have led to fitting inappropriate functions.

3.33 To extrapolate beyond the end of the trial, the company fitted parametric functions based on descriptive data from SEER and LUCADA in its original model, but it was not possible for the ERG to assess whether this approach was valid. The ERG inferred from the company's submission that the SEER results were related to all‑cause mortality from the date of stage 4 diagnosis. For the LUCADA data, the ERG understood that the data were related to second‑line chemotherapy, but had no information on first‑line treatments. The ERG commented that it was difficult to assess whether the company's chosen parametric survival functions were valid and reflected the patient population in this appraisal because it did not have access to patient level data.

3.34 The ERG identified 11 aspects of the company's original base‑case model that involved errors in data analysis, parameter values or methodology. The ERG corrected these to estimate the ICER, but still considered that the model generated uncertainty in overall survival, progression‑free survival and time to treatment. The ERG applied 11 different amendments to the company's base case. These are outlined in sections 3.35 to 3.45.

3.35 The company's original base‑case assessment of nintedanib plus docetaxel compared with docetaxel alone estimated an undiscounted overall mean survival gain of 4.7 months. The ERG noted that only 15% of this gain occurred in the pre‑progression phase. The ERG stated that this is unusual because, in locally advanced and metastatic cancers, the benefit from treatment normally occurs before disease progression while patients have active treatment. The ERG did its own analysis using the data for overall survival and progression‑free survival from the trial, and noted that overall survival was linear for both groups after 300 days and continued indefinitely. This showed that the extrapolation used in the exponential model is appropriate, and the ERG calculated a long‑term hazard ratio of 0.83 for overall survival in favour of nintedanib plus docetaxel. The ERG produced a cumulative hazard plot that suggested that patients in LUME‑Lung 1 who survived beyond disease progression continued to gain survival benefit associated with treatment. The ERG estimated overall survival using the area under the curve (AUC) by applying the Kaplan–Meier results directly, and then projected long‑term overall survival using the exponential trends. The ERG estimated mean overall survival in the docetaxel treatment arm as 453.0 days (14.9 months) and 545.7 days (17.9 months) for the nintedanib plus docetaxel treatment group, resulting in an estimated mean overall survival difference of 92.7 days (3.05 months), which was considerably lower than the company's estimate of a mean overall survival gain of 4.7 months.
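The approach of taking the area under the Kaplan–Meier curve and then projecting an exponential tail can be sketched as follows. The step function, switch point and tail hazard are hypothetical and do not reproduce the ERG's calculation. The final lines only check the unit conversion of the reported difference in mean overall survival, assuming an average month of 365.25/12 days.

```python
DAYS_PER_MONTH = 365.25 / 12


def km_area(times, survival, t_end):
    """Area under a Kaplan-Meier step function from 0 to t_end (days).

    Returns the area and the survival probability at t_end.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, survival):
        if t > t_end:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_end - prev_t), prev_s


def mean_survival(times, survival, t_switch, tail_rate):
    """Kaplan-Meier area to t_switch plus an exponential tail beyond it."""
    area, s_at_switch = km_area(times, survival, t_switch)
    # An exponential tail with hazard tail_rate contributes S(t_switch) / rate.
    return area + s_at_switch / tail_rate


# HYPOTHETICAL step function (event times in days, survival after each event).
times = [60, 150, 240, 300]
survival = [0.90, 0.70, 0.55, 0.45]
example_mean = mean_survival(times, survival, t_switch=300, tail_rate=0.002)
print(round(example_mean, 1))

# Unit check on the reported ERG estimates: days converted to months.
gain_days = 545.7 - 453.0
gain_months = gain_days / DAYS_PER_MONTH
print(round(gain_days, 1), round(gain_months, 2))
```

Using the observed curve for as long as data are mature and a parametric tail only afterwards limits the influence of the fitted model to the period where few patients remain at risk.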

3.36 The ERG noted that the company's original model base‑case assessment of nintedanib plus docetaxel compared with docetaxel alone indicated a mean gain in (undiscounted) progression‑free survival of 28.6 days. This was based on calibrating a log‑normal hazard distribution to each group in the trial and replacing the trial data with the log‑normal curve for the duration of the model time horizon until all patients' disease had progressed or they died. Here, the extent of advantage in mean progression‑free survival can be readily estimated directly from the Kaplan–Meier analysis results because the progression‑free survival data were mature, by comparing the AUC estimates up to the point when the curves converge. The ERG identified that the curves converged at day 375. The difference in the AUCs at this time was 36.4 days, which suggested that the company's model had underestimated progression‑free survival (28.6 days). The ERG incorporated its own result into the company's model and used a common long‑term exponential model from day 375 onwards.
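The comparison of AUC estimates up to the point of convergence can be sketched as follows. The step curves are hypothetical and were chosen only so that both groups reach the same survival probability at day 375; they are not the LUME‑Lung 1 data, and `km_auc` is an illustrative helper rather than the ERG's own calculation.

```python
def km_auc(times, surv, t_end):
    """Area under a Kaplan-Meier step function from day 0 to t_end.

    times: sorted event times in days
    surv:  survival probability immediately after each event time
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t > t_end:
            break
        area += prev_s * (t - prev_t)  # rectangle up to this event
        prev_t, prev_s = t, s
    return area + prev_s * (t_end - prev_t)  # final partial rectangle


# Hypothetical curves that converge at day 375 (illustration only)
toy_control = ([50, 150, 375], [0.70, 0.30, 0.10])
toy_treated = ([50, 200, 375], [0.75, 0.35, 0.10])

convergence_day = 375
pfs_gain_days = km_auc(*toy_treated, convergence_day) - km_auc(*toy_control, convergence_day)
```

Because the curves coincide after the convergence point, a common long‑term model applied from that day onwards adds the same area to both groups and leaves the difference unchanged.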

3.37 The ERG used a similar approach to estimate duration of treatment in the 2 groups of patients in the LUME‑Lung 1 trial. This increased the discounted cost per patient and the incremental cost per patient increased by 2.2% in both groups.

3.38 The ERG commented that in its original model the company costed both nintedanib plus docetaxel and docetaxel alone using the average number of patients having treatment across each cycle. The ERG commented that adjusting mid cycle is not accurate for docetaxel treatment in either group because patients have treatment on the first day of a 3‑week cycle. The error underestimated the quantity and cost of drugs used in the trial.

3.39 The ERG commented that the company calculated the average cost per dose of docetaxel using body surface area relevant to the UK population, but did not take into account the sex of the patients. The company also only costed the full 75 mg/m2 dose rather than the reduced dose of 60 mg/m2. The ERG considered it more accurate to cost the reduced dose, and then create a weighted average based on the proportions of the 2 doses recorded in the trial. The ERG considered that the nintedanib capsules would likely be dispensed with docetaxel, so any missed dosing was unlikely to have an effect on the dispensing pattern. Therefore, the ERG considered a reduction in cost through a randomised dose intensity index from trial data to be inappropriate. The ERG re‑estimated the overall average cost per dose of docetaxel using separate subgroups for men and women, and also re‑estimated the randomised dose index multiplier to match the balance of full and reduced doses. The ERG estimated an overall mean cost for nintedanib treatment per cycle using the LUME‑Lung 1 trial data.
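A weighted average of this kind can be sketched as below. Every number in the example (body surface areas, vial size, vial price, and the proportions of men and women and of full and reduced doses) is hypothetical, and the use of whole vials is an assumption made for illustration; none of these values comes from the submission or the ERG report.

```python
import math


def dose_cost(bsa_m2, mg_per_m2, vial_mg, vial_price):
    """Cost of one dose, assuming whole vials are used (any remainder is wasted)."""
    vials = math.ceil(bsa_m2 * mg_per_m2 / vial_mg)
    return vials * vial_price


def weighted_mean_cost(groups):
    """groups: list of (weight, cost) pairs; weights need not sum to 1."""
    total_weight = sum(w for w, _ in groups)
    return sum(w * c for w, c in groups) / total_weight


# Hypothetical inputs (illustration only)
BSA = {"men": 1.9, "women": 1.7}          # m2
SEX_SHARE = {"men": 0.7, "women": 0.3}
DOSE_SHARE = {75: 0.9, 60: 0.1}           # mg/m2: proportion of doses given
VIAL_MG, VIAL_PRICE = 20, 10.0

groups = [
    (SEX_SHARE[sex] * DOSE_SHARE[dose], dose_cost(BSA[sex], dose, VIAL_MG, VIAL_PRICE))
    for sex in BSA
    for dose in DOSE_SHARE
]
mean_cost_per_dose = weighted_mean_cost(groups)
```

Costing each sex and dose level separately before averaging matters when vial rounding is non‑linear, because the cost at the average body surface area need not equal the average of the subgroup costs.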

3.40 The cost of treating the adverse event of febrile neutropenia was included in the company's original model at £2012.10 per patient affected. The ERG noted that this is substantially lower than the figure estimated by the NICE Decision Support Unit in 2007 and the updated figure used in the ongoing multiple technology appraisal for erlotinib and gefitinib for treating non‑small‑cell lung cancer that has progressed following prior chemotherapy, which used £5240.40 per episode and a mean cost per patient of £7352.54 (assuming 1.4 episodes per patient).

3.41 The ERG also noted that there were discrepancies in monitoring costs in the progression‑free health state when patients were still on active treatment. In the company's original model, monitoring costs of £188 per cycle were assigned to patients in the nintedanib plus docetaxel group and £205 per cycle to those having docetaxel alone. The ERG noted that this was because the company had incorrectly applied additional physician monitoring every 2 to 3 months for patients who had completed active treatment, to patients still on active treatment with docetaxel.

3.42 In the opinion of the ERG, the company modelled discounting incorrectly, basing the discounting on the 3‑weekly cycle rather than annually.
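The two discounting conventions contrasted in 3.42 can be sketched as follows. The annual rate of 3.5% is an assumption used for illustration because the text does not state the rate, and the function names are hypothetical.

```python
CYCLE_DAYS = 21
DAYS_PER_YEAR = 365.25
ANNUAL_RATE = 0.035  # assumption: rate not stated in the text


def factor_per_cycle(cycle):
    """Discount factor when elapsed time is measured in 3-week cycles."""
    years = cycle * CYCLE_DAYS / DAYS_PER_YEAR
    return (1 + ANNUAL_RATE) ** -years


def factor_annual(cycle):
    """Discount factor that changes only after each completed model year."""
    whole_years = int(cycle * CYCLE_DAYS // DAYS_PER_YEAR)
    return (1 + ANNUAL_RATE) ** -whole_years
```

Under the per‑cycle convention, costs and benefits in the first year are already discounted from cycle 1 onwards, whereas under the annual convention they are left undiscounted until the first full year has elapsed.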

3.43 The main adverse events in the LUME‑Lung 1 trial were grade 3 or 4 diarrhoea and fatigue. The company indicated that the disutility for diarrhoea was low (−0.04), whereas for fatigue it was much higher (−0.21). The ERG also noted that the company indicated a statistically significant difference between effect sizes in the 2 treatment groups, with a disutility of −0.326 for the nintedanib plus docetaxel group and of −0.101 for the docetaxel alone group. The ERG suggested that fatigue was a more serious side effect for those having nintedanib plus docetaxel. The company used an average disutility for the 2 treatment groups, whereas the ERG applied a disutility to the 2 groups separately. In the model, the company assumed that patients who had finished active treatment accrued the costs of having palliative nursing care every week and a bone scan every 3 weeks, in addition to a chest X‑ray every 2 to 3 months and a physician visit once a year. The company's clinical experts suggested that only a chest X‑ray would be needed and not palliative care or a bone scan. In the ERG's opinion, this reflected an error that significantly reduced the care costs of patients in a stable condition after second‑line treatment.

3.44 The ERG noted that the company's model followed the protocol used in the LUME‑Lung 1 trial, which allowed patients to have unlimited docetaxel treatment (exceeding 40 cycles). The ERG explained that, in the UK, patients have up to 4 cycles of docetaxel because of unacceptable adverse events. Although the company's original model allowed the number of cycles to be restricted, the ERG found an error that limited the number of cycles to 5 rather than to 4. When the ERG applied its own model adjustment and restricted the cycles to 4, this changed only the drug acquisition and administration costs. It did not capture whether limiting docetaxel treatment would have an effect on the adverse events profile or patient prognosis, both of which could affect the costs associated with treatment and the quality‑of‑life effects.

3.45 The ERG's original exploratory sensitivity analyses provided an ICER that incorporated all ERG amendments simultaneously to produce an ICER for nintedanib plus docetaxel. It also provided an ICER that included all amendments excluding analyses of the number of cycles of docetaxel. All ICERs from the ERG's exploratory analyses, generated using the company's original model, have been superseded by those using the revised model provided after consultation on the appraisal consultation document in January 2015 (see section 3.53).

3.46 The ERG's original exploratory sensitivity analyses also provided an ICER that applied 7 of the 11 amendments it had identified when analysing nintedanib plus docetaxel compared with docetaxel alone to the modelling of nintedanib plus docetaxel compared with erlotinib. The ERG also took into account the impact of the patient access scheme for erlotinib by assuming different discounts. However, the ERG still concluded that it did not consider erlotinib to be a suitable comparator.

Company's additional evidence in response to consultation

3.47 In response to consultation on the appraisal consultation document, the company provided a revised economic analysis, which contained all of the ERG's revisions (see section 3.34) except the cost of febrile neutropenia and the ERG's overall survival modelling. However, the company also changed its approach to survival modelling, and submitted new cost‑effectiveness analyses based on the following:

  • using the data from Kaplan–Meier curves directly from the LUME‑Lung 1 trial until a chosen point and then extrapolating beyond this for the lifetime horizon of the model

  • choosing the point at which 5% of the original patients in both arms of the trial were still alive in the base case (alternatively, 2.5% and 7.5% in exploratory analyses)

  • calculating the probability of a patient remaining alive in each cycle using a log‑normal parametric curve fitted using data from LUCADA to extrapolate from this point

  • incorporating a patient access scheme, a confidential simple discount on the list price of nintedanib.

    This modelling approach resulted in a deterministic ICER of £46,580 per quality‑adjusted life year (QALY) gained. The probabilistic ICER was £46,517 per QALY gained.
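The extrapolation step in this approach, in which the probability of remaining alive in each cycle is taken from a log‑normal curve, can be sketched as follows. The parameters `mu` and `sigma`, the cut‑off day and the cycle count are hypothetical and do not represent the curve fitted to the LUCADA data; the function names are illustrative only.

```python
import math


def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((ln t - mu) / sigma) for t > 0."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def extrapolate(s_cut, t_cut, n_cycles, cycle_days, mu, sigma):
    """Carry the Kaplan-Meier survival s_cut at day t_cut forward, multiplying by
    the conditional probability of surviving each further cycle."""
    curve, s, t = [], s_cut, t_cut
    for _ in range(n_cycles):
        p_alive = lognormal_survival(t + cycle_days, mu, sigma) / lognormal_survival(t, mu, sigma)
        s *= p_alive
        t += cycle_days
        curve.append(s)
    return curve


# Hypothetical example: 5% alive at day 900, extended by five 21-day cycles
tail = extrapolate(s_cut=0.05, t_cut=900, n_cycles=5, cycle_days=21, mu=5.5, sigma=1.2)
```

Because only conditional probabilities are taken from the parametric curve, the extrapolated tail joins the Kaplan–Meier estimate at the cut‑off without a jump, and the same per‑cycle probabilities apply to any group that starts its extrapolation at the same time point.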

3.48 The company also provided several scenario analyses, all of which incorporated the patient access scheme. When the company used a cut‑off point of 2.5% of the population still alive in both arms in its sensitivity analyses, the ICER was £46,813 per QALY gained. When it used a cut‑off point of 7.5% alive in both arms, the ICER was £49,894 per QALY gained. The company carried out additional scenario analyses using utility values from the last observation carried forward (ICER £47,825 per QALY gained) and using utility values taken from the Chouaid et al. (2013) study (ICER £57,473 per QALY gained).

3.49 The company tested how robust the survival modelling was to various assumptions around survival extrapolation. To estimate the average extension of life associated with nintedanib plus docetaxel compared with docetaxel alone, the company carried out probabilistic sensitivity analyses. Using the updated model, the company noted that 4277 out of 5000 simulations (86%) resulted in an overall survival gain of at least 3 months (table 2).

Table 2 Incremental overall survival for nintedanib plus docetaxel compared with docetaxel monotherapy (taken from page 5 in the company's response to consultation on the appraisal consultation document)

| Overall survival | Incremental life years | Incremental life months |
| --- | --- | --- |
| Mixed: Kaplan–Meier from LUME‑Lung 1 then to extrapolate LUCADA‑Log‑normal (5% patients alive cut‑off) | 0.27 | 3.24 |
| Mixed: Kaplan–Meier from LUME‑Lung 1 then to extrapolate LUCADA‑Log‑normal (2.5% patients alive cut‑off) | 0.27 | 3.24 |
| Mixed: Kaplan–Meier from LUME‑Lung 1 then to extrapolate LUCADA‑Log‑normal (7.5% patients alive cut‑off) | 0.25 | 3.00 |
| Mixed: Kaplan–Meier from LUME‑Lung 1 then to extrapolate LUCADA‑Log‑normal (5% patients alive cut‑off), average of probabilistic sensitivity analyses | 0.27 | 3.24 |
| Separate – Log‑logistic (base‑case) | 0.34 | 4.08 |
| Mixed: Kaplan–Meier from LUME‑Lung 1 then to extrapolate (5% patients alive cut‑off) SEER‑Log‑normal | 0.28 | 3.36 |
| Mixed curves: Kaplan–Meier from LUME‑Lung 1 then to extrapolate (5% patients alive cut‑off) Log‑logistic | 0.34 | 4.08 |

Abbreviations: LUCADA, National Lung Cancer Audit; SEER, Surveillance, Epidemiology and End Results
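The arithmetic behind section 3.49 and table 2 can be checked with a short sketch. The life‑year values and the simulation counts are those reported in the text; the helper `share_at_least` and its example inputs are hypothetical and only illustrate how the proportion of simulations meeting a threshold might be counted.

```python
def share_at_least(gains_months, threshold=3.0):
    """Proportion of simulated overall survival gains that reach the threshold."""
    return sum(g >= threshold for g in gains_months) / len(gains_months)


# Reported result: 4277 of 5000 simulations gave a gain of at least 3 months
reported_share = 4277 / 5000

# Incremental life years from table 2 and the months they imply (years x 12)
life_years = [0.27, 0.27, 0.25, 0.27, 0.34, 0.28, 0.34]
life_months = [round(y * 12, 2) for y in life_years]

# Hypothetical simulated gains, for illustration only
example_share = share_at_least([2.5, 3.0, 3.4, 4.1])
```

The reported share rounds to 86%, and each value in the life months column of table 2 equals the corresponding life years value multiplied by 12.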

3.50 The company also provided the restricted mean for the overall survival gain (2.87 months) for nintedanib plus docetaxel compared with docetaxel alone in LUME‑Lung 1. The company explained that this did not accurately represent true overall survival because 15% of the patients in the trial were still alive at this point.

ERG's critique of the company's additional evidence

3.51 The ERG focused its critique of the company's revised economic model on the overall survival modelling, noting that changing the cost of febrile neutropenia, not included by the company, had only a minor effect on the ICER.

3.52 The ERG raised concerns about the method used by the company to calculate the overall survival in the company's revised economic analyses:

  • By using the same LUCADA data for both arms to extrapolate beyond the trial, the company presumed that the long‑term risk for the 2 treatment groups was equal. This removed any relative differences in survival caused by increasing or decreasing the survival advantage of nintedanib beyond the cut‑off point.

  • By combining the 2 arms to estimate a proportion of patients alive (5%) at the cut‑off for extrapolation, the company introduced a risk of bias. This was because up to this point, patients in the nintedanib plus docetaxel arm were more likely to survive than patients in the docetaxel alone arm, and because of random differences in the number of patients censored, as evidenced by the extrapolations starting at different points in the Kaplan–Meier curve (more than 15% estimated probability of survival for nintedanib plus docetaxel and less than 12% for docetaxel alone). This means that any uncertainty in the parameters estimated for the log‑normal representation of the LUCADA data would have a proportionally larger effect on the nintedanib plus docetaxel group than on the docetaxel alone group, which may lead to larger biases in this group.

3.53 The ERG did exploratory analyses starting the extrapolation using the LUCADA data from the time in the Kaplan–Meier curves of the LUME‑Lung 1 trial when the probability of overall survival was 12.6% in each arm. The resulting ICER (incorporating the patient access scheme) was £56,804 per QALY gained for nintedanib plus docetaxel compared with docetaxel alone. Using this extrapolation, the ERG calculated an overall survival gain of 0.224 incremental life years (2.69 months).
