3 The manufacturer's submission
The Appraisal Committee (section 7) considered evidence submitted by the manufacturer of aflibercept and a review of this submission by the Evidence Review Group (ERG; section 8).
Clinical-effectiveness evidence
3.1 The manufacturer did a systematic literature review of studies evaluating the efficacy and safety of second-line treatments for metastatic colorectal cancer. It identified 1 relevant randomised controlled trial (RCT), the VELOUR trial, from which it obtained the key clinical evidence. The VELOUR trial was a double-blind placebo-controlled phase III study that was conducted in 176 centres in 28 countries, including the UK. Eligible patients were adults who had inoperable metastatic colorectal cancer, and whose disease progressed on or after treatment with only 1 prior oxaliplatin-based chemotherapy regimen. Investigators randomised patients in a 1:1 ratio to either aflibercept plus folinic acid/5-fluorouracil/irinotecan (FOLFIRI) (n=612) or placebo plus FOLFIRI (n=614). They stratified randomisation by patients' wellbeing and ability to perform daily activities using the Eastern Cooperative Oncology Group Performance Status (ECOG PS), and whether or not the patient had received prior therapy with bevacizumab. Patients received either aflibercept at a dose of 4 mg/kg or placebo over 1 hour on day 1, every 2 weeks, both intravenously, immediately followed by FOLFIRI. During the trial, patients could stop 1 study treatment (aflibercept or placebo, or FOLFIRI) but still receive the other components of the regimen. Treatment continued until disease progressed, unacceptable toxicity occurred, or the patient declined further treatment.
3.2 The primary end point in the VELOUR trial was overall survival, defined as time from randomisation to death from any cause. One of the secondary end points was progression-free survival as assessed by an independent review committee based on radiologic progression; it was determined as time from randomisation to first observation of disease progression (at least a 20% increase in the sum of the longest diameter of target tumours, the unequivocal increase in the size of non-target tumours or the appearance of 1 or more new tumours), or death from any cause. In addition, disease progression determined by local investigators was recorded during the trial. Other secondary end points were objective response (complete and partial responses) according to Response Evaluation Criteria In Solid Tumors criteria version 1, and adverse events and abnormal laboratory findings.
3.3 The manufacturer stated that patient characteristics and disease history at baseline were well balanced between the aflibercept and placebo groups. Of the patients randomised in the study, the median age was 61 years, 58.6% were men, 97.8% had a baseline ECOG PS of 0 or 1, and 2.2% had a baseline ECOG PS of 2. The marketing authorisation for aflibercept stipulates prior treatment with an oxaliplatin-containing regimen. In the VELOUR trial, 90.2% of patients randomised to aflibercept plus FOLFIRI and 89.4% of those randomised to placebo plus FOLFIRI had received prior oxaliplatin-based chemotherapy for locally advanced or metastatic disease. Approximately 10% of patients had received prior oxaliplatin-based chemotherapy in the adjuvant setting (that is, as an additional treatment given after the primary treatment). Oxaliplatin-based regimens were given in combination with bevacizumab in 30.4% of patients.
3.4 The manufacturer determined that it needed 863 death events to detect a statistically significant 20% risk reduction in the aflibercept group compared with the placebo group; this determined the study cut-off date. To estimate time-to-event parameters (overall survival and progression-free survival), the manufacturer used survival analysis. It calculated hazard ratios and confidence intervals for the primary and subgroup analyses using a Cox proportional hazards model. It also established heterogeneity of treatment effect among subgroups using a Cox proportional hazards model, and provided an interaction test for each subgroup analysis. If a patient neither died nor had disease progression during the trial, the manufacturer censored the patient at the date when the tumour was last assessed or at the study cut-off date.
3.5 The median follow-up for the overall population at the time of the primary analysis was 22.28 months, with the longest follow-up being 36 months. At the study cut-off date, 403 patients (65.8%) randomised to aflibercept and 460 patients (74.9%) randomised to placebo had died. Median overall survival was estimated to be 1.44 months longer for aflibercept than placebo (aflibercept 13.50 months, placebo 12.06 months), and the corresponding hazard ratio was 0.817 (95.34% confidence interval [CI] 0.713 to 0.937, p=0.0032), suggesting a reduction in the risk of death of 18.3% with aflibercept compared with placebo. The probabilities of overall survival at 6, 12, 18, 24 and 30 months were consistently higher in the aflibercept group than in the placebo group; the probability of overall survival was 4% higher at 6 months, and 85% higher at 30 months.
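The headline figures in this paragraph can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only values quoted in sections 3.1 and 3.5; the variable names are illustrative and are not taken from the manufacturer's submission.

```python
# Values quoted in sections 3.1 and 3.5 (VELOUR primary analysis).
median_os_aflibercept = 13.50   # months
median_os_placebo = 12.06       # months
hazard_ratio = 0.817
deaths_aflibercept, n_aflibercept = 403, 612
deaths_placebo, n_placebo = 460, 614

# Difference in median overall survival, in months.
median_difference = round(median_os_aflibercept - median_os_placebo, 2)

# A hazard ratio of HR corresponds to a (1 - HR) relative reduction in the hazard of death.
relative_risk_reduction_pct = round((1 - hazard_ratio) * 100, 1)

# Total deaths at the cut-off date and the proportion who had died in each group.
total_deaths = deaths_aflibercept + deaths_placebo
pct_died_aflibercept = round(100 * deaths_aflibercept / n_aflibercept, 1)
pct_died_placebo = round(100 * deaths_placebo / n_placebo, 1)

print(median_difference, relative_risk_reduction_pct, total_deaths)
print(pct_died_aflibercept, pct_died_placebo)
```

The total number of deaths matches the 863 events that section 3.4 gives as the trigger for the study cut-off date.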
3.6 The manufacturer noted that the Kaplan–Meier curves for overall survival separated early and continued to separate over time, and suggested that there were patients who experienced a sustained benefit after treatment with aflibercept. Because of this, the manufacturer indicated that the difference in median overall survival of 1.44 months may underestimate the overall clinical benefit of adding aflibercept to FOLFIRI. In addition, the manufacturer calculated hazard ratios for overall survival by 6-month periods up to 18 months after randomisation, and it combined all time points thereafter into a single hazard ratio. This analysis showed that hazard ratios improved over time, implying that the difference in overall survival increased in favour of aflibercept the longer patients received treatment. In response to a clarification request by the ERG, the manufacturer provided hazard ratios and the number of patients at risk of dying 18 months after randomisation by 6-month periods. These hazard ratios continued to decrease over time (suggesting that the difference in overall survival continued to increase in favour of aflibercept), but had confidence intervals that crossed 1.00 (that is, the differences were not statistically significant).
3.7 The manufacturer estimated the mean overall survival by fitting separate parametric functions to the trial data for each treatment group, and extrapolating to provide complete curves (given that calculating the mean required all patients to have died). It modelled each treatment group separately, rather than modelling treatment as a covariate, because the log-cumulative hazard plots (used to evaluate the assumption that a hazard ratio between 2 treatments remains constant over time) were not parallel and crossed. The manufacturer considered that the log-logistic function provided the best fit for overall survival for both treatment groups. The log-logistic function, however, gave a long tail (implying that some patients would live implausibly long), so the manufacturer truncated the curves at 15 years after randomisation (this assumed that all patients die by 15 years). Using this approach, the manufacturer estimated that aflibercept would extend mean overall survival by 4.7 months compared with placebo (aflibercept 22.8 months, placebo 18.1 months); without truncating the survival curves, the difference in mean overall survival was 6.6 months. In response to a clarification request by the ERG, the manufacturer provided estimates with the analysis truncated at 5 and 10 years. The manufacturer designated the results of this analysis as academic in confidence. The manufacturer also provided 'restricted' mean overall survivals for each treatment group based on actual data rather than an extrapolated model (that is, excluding patients who were alive at the end of the trial). This analysis estimated a difference in mean overall survival of 1.92 months in favour of aflibercept.
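To illustrate why the truncation point matters for a long-tailed function, the sketch below computes mean survival as the area under a log-logistic survival curve up to a chosen horizon. It is not the manufacturer's analysis: the scale and shape parameters are hypothetical placeholders, and the trapezoidal integration is simply one convenient way to approximate the area.

```python
import math


def loglogistic_survival(t, alpha, beta):
    """Log-logistic survival S(t) = 1 / (1 + (t / alpha) ** beta).

    alpha is the scale (the median survival time); beta is the shape.
    """
    return 1.0 / (1.0 + (t / alpha) ** beta)


def truncated_mean(alpha, beta, horizon, steps=100_000):
    """Mean survival up to `horizon`: area under S(t) on [0, horizon], trapezoidal rule."""
    width = horizon / steps
    total = 0.5 * (
        loglogistic_survival(0.0, alpha, beta)
        + loglogistic_survival(horizon, alpha, beta)
    )
    for i in range(1, steps):
        total += loglogistic_survival(i * width, alpha, beta)
    return total * width


# HYPOTHETICAL parameters, chosen only for illustration (not fitted to VELOUR data).
ALPHA_MONTHS = 12.0
BETA = 2.0

# Truncation at 5, 10 and 15 years, expressed in months.
for horizon_months in (60, 120, 180):
    mean = truncated_mean(ALPHA_MONTHS, BETA, horizon_months)
    print(f"truncated at {horizon_months} months: mean survival {mean:.2f} months")
```

Because the tail of the log-logistic curve decays slowly, the estimated mean keeps rising as the truncation point moves from 5 to 15 years, which is the sensitivity the truncated analyses were requested to explore.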
3.8 The manufacturer found that aflibercept also prolonged progression-free survival compared with placebo; the difference in median progression-free survival was estimated to be 2.23 months when disease progression was assessed by an independent review committee (aflibercept 6.90 months, placebo 4.67 months, hazard ratio 0.758 [95% CI 0.661 to 0.869]). The manufacturer also provided an estimate of 1.74 months for the difference in median progression-free survival when local investigators determined disease progression. For response rate (complete and partial responses), the results favoured aflibercept, with an estimated response rate of 19.8% (95% CI 16.4 to 23.2) in the aflibercept group and 11.1% (95% CI 8.5 to 13.8) in the placebo group.
3.9 The manufacturer performed prespecified subgroup analyses according to the following:

Baseline characteristics: presence of liver metastasis, location of primary tumour, number of metastatic organs (metastases in 1 organ only, or metastases in more than 1 organ), prior history of hypertension.

Stratification variables: ECOG PS, prior bevacizumab treatment.

Demographic characteristics: age (less than 65 years old, or 65 years or older), sex, race, geographical region.
The manufacturer focused on 2 subgroups in its submission: patients with liver metastases only (prespecified), and a subgroup that excluded patients whose disease had relapsed 6 months or less after starting oxaliplatin-based adjuvant therapy (post hoc). The manufacturer stated that the subgroup of patients with liver metastases only was recognised as a relevant clinical subgroup for metastatic colorectal cancer in Cetuximab for the first-line treatment of metastatic colorectal cancer (NICE technology appraisal guidance 176). For the subgroup that excluded patients whose disease had relapsed 6 months or less after starting oxaliplatin-based adjuvant therapy, the manufacturer performed a post hoc analysis after the results of the VELOUR trial had been compiled. The manufacturer stated that 10% of patients in the trial had cancer that had relapsed within 6 months of starting oxaliplatin-based adjuvant therapy, which the manufacturer interpreted as reflecting patients with aggressive disease who would be unlikely to benefit from anti-vascular endothelial growth factor (VEGF) therapy.
3.10 For all the prespecified subgroups, the manufacturer carried out an analysis of overall survival. It found no evidence of heterogeneity in treatment effect (non-significant interaction test), except in the subgroup of patients with liver metastases only (p value for interaction was 0.0899, statistically significant at the 10% level). The hazard ratio for this subgroup was 0.649 (95.34% CI 0.492 to 0.855) compared with a hazard ratio of 0.868 (95.34% CI 0.742 to 1.015) in patients who had no liver metastases or in whom the cancer spread to the liver and other organs (estimates of survival times are academic in confidence). In response to a clarification request by the ERG, the manufacturer provided the difference in mean overall survival for the subgroup using actual, rather than extrapolated, data; this estimate is academic in confidence. In the post hoc subgroup analysis, which excluded patients whose disease had relapsed 6 months or less after starting oxaliplatin-based adjuvant therapy, the difference in median overall survival was estimated to be 1.9 months in favour of aflibercept. In this subgroup, the unadjusted hazard ratio was 0.78 (95% CI 0.68 to 0.90) compared with 1.09 (95% CI 0.70 to 1.69) in patients whose disease had relapsed 6 months or less after starting adjuvant therapy (p value for interaction 0.1265).
3.11 For progression-free survival, the manufacturer did not find a statistically significant subgroup effect except in patients with liver metastases only (interaction test was statistically significant at the 10% level). These results, and those of the subgroup that excluded patients whose disease had relapsed 6 months or less after starting oxaliplatin-based adjuvant therapy, are academic in confidence.
3.12 The incidence of adverse events of any grade (according to the Common Terminology Criteria for Adverse Events v3.0) was similar in the aflibercept and placebo groups of the VELOUR trial (99.2% and 97.9% respectively), but the incidence of some adverse events was considerably higher in the aflibercept group (for example, 41.4% of patients receiving aflibercept had hypertension [any grade] compared with 10.7% of those receiving placebo). Grade 3–4 adverse events were reported in 83.5% of patients in the aflibercept group and 62.5% of those in the placebo group. The grade 3–4 adverse events that occurred at least twice as frequently in the aflibercept group as in the placebo group, in order of decreasing relative incidence, were: hypertension (19.3% versus 1.5%), proteinuria (7.8% versus 1.2%), hand-foot syndrome (2.8% versus 0.5%), headache (1.6% versus 0.3%), arterial thromboembolic events (1.8% versus 0.5%), weight loss (2.6% versus 0.8%), stomatitis and ulceration (13.8% versus 5.0%), diarrhoea (19.3% versus 7.8%) and decreased platelet count (3.4% versus 1.6%). Typical anti-VEGF adverse reactions and adverse reactions associated with FOLFIRI were more common in the aflibercept group. The manufacturer indicated that most of the adverse events associated with aflibercept plus FOLFIRI were reversible and manageable using current clinical practice, although some (physical weakness, infections, diarrhoea and hypertension) led to permanent discontinuation of study treatment in 26.8% of patients receiving aflibercept compared with 12.1% of those receiving placebo. Furthermore, the European Public Assessment Report notes that more patients in the aflibercept than the placebo groups had their dose of FOLFIRI reduced or their treatment cycle delayed.
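The ordering 'by decreasing relative incidence' can be checked by dividing each aflibercept-group percentage by the corresponding placebo-group percentage. A minimal sketch using only the grade 3–4 figures quoted above (the data structure and names are illustrative):

```python
# (event, % in aflibercept group, % in placebo group), in the order listed in section 3.12.
grade_3_4_events = [
    ("hypertension", 19.3, 1.5),
    ("proteinuria", 7.8, 1.2),
    ("hand-foot syndrome", 2.8, 0.5),
    ("headache", 1.6, 0.3),
    ("arterial thromboembolic events", 1.8, 0.5),
    ("weight loss", 2.6, 0.8),
    ("stomatitis and ulceration", 13.8, 5.0),
    ("diarrhoea", 19.3, 7.8),
    ("decreased platelet count", 3.4, 1.6),
]

# Relative incidence = aflibercept percentage / placebo percentage.
relative_incidence = [(name, afl / pla) for name, afl, pla in grade_3_4_events]

# Check the two properties stated in the text.
ratios = [ratio for _, ratio in relative_incidence]
listed_in_decreasing_order = all(a >= b for a, b in zip(ratios, ratios[1:]))
all_at_least_double = all(ratio >= 2 for ratio in ratios)

for name, ratio in relative_incidence:
    print(f"{name}: {ratio:.1f}x")
print(listed_in_decreasing_order, all_at_least_double)
```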
3.13 To further characterise the adverse events of aflibercept, the manufacturer performed a meta-analysis by pooling safety data from 3 RCTs (VELOUR, VITAL and VANILLA). The VITAL trial evaluated aflibercept plus docetaxel compared with placebo plus docetaxel in patients with non-small-cell lung cancer and, in the VANILLA trial, patients with metastatic pancreatic cancer were randomised to aflibercept plus gemcitabine or placebo plus gemcitabine. Overall, the meta-analysis included data from 2662 patients (1333 receiving aflibercept and 1329 receiving placebo). The analysis was framed so that risk ratios greater than 1 favoured placebo. The manufacturer found that, among patients treated with aflibercept, 0.4% and 0.5% had grade 4 hypertension and nephrotic syndrome respectively. It also found that adding aflibercept to concurrent chemotherapies did not increase the risk of venous thromboembolism, but it did increase the risk of grade 3–4 adverse reactions related to anti-VEGF therapy; the difference in this risk was statistically significant for hypertension (risk ratio [RR] 9.21, 95% CI 5.91 to 14.36), proteinuria (RR 8.37, 95% CI 4.37 to 16.06) and haemorrhage (RR 2.04, 95% CI 1.20 to 3.47). The incidence of adverse reactions typically associated with the background chemotherapy used in the 3 RCTs also increased with the addition of aflibercept, most notably for neutropenia (including neutropenic complications), various gastrointestinal toxicities and physical weakness.
3.14 Data on health-related quality of life were not collected in the VELOUR trial. The manufacturer conducted the 'mCRC utilities study', an observational, cross-sectional study to estimate utility values in patients with metastatic colorectal cancer who would be eligible for treatment with aflibercept plus FOLFIRI as per the licensed indication, or who had progressed to subsequent phases of the disease. The study took place in the Netherlands and the UK, and collected EQ-5D data. The manufacturer used these data as its main source to estimate health-related quality of life for the cost-effectiveness analysis.
ERG critique
3.15 The ERG stated that the manufacturer presented a well-conducted systematic review of clinical evidence, and used a search strategy that was unlikely to have missed any relevant studies. It also stated that the manufacturer included sufficient detail about the VELOUR trial and used appropriate criteria to assess the quality of the trial. The ERG noted, however, that the manufacturer provided minimal details of its meta-analysis of aflibercept's adverse events, and of the quality of the VITAL and VANILLA trials.
3.16 The ERG indicated that VELOUR was a good quality trial and directly related to the decision problem, and that the characteristics of patients at baseline and disease history were well balanced between the aflibercept and placebo groups. However, the ERG considered that patients in the trial were potentially fitter and younger than those seen in UK practice, and so patients in clinical practice may not achieve the level of benefit reported in the trial. The ERG highlighted the following dissimilarities between the VELOUR trial and clinical practice:

In the UK, patients whose disease progresses after a break in treatment during intermittent first-line palliative chemotherapy are likely to be offered repeat treatment with the first-line chemotherapy regimen. If their disease progresses while receiving this treatment, or within 6 to 8 weeks of completing it, they would then move to second-line treatment. Although the manufacturer's submission does not state how many cycles of first-line oxaliplatin-based chemotherapy patients in the VELOUR trial received, the ERG indicated that the trial population may be healthier than patients in clinical practice who may have received several cycles of first-line treatment.

Between 2007 and 2009, around 72% of patients diagnosed with colorectal cancer in the UK were aged 65 years or over. By contrast, in the VELOUR trial, only 33.5% of the aflibercept group and 38.9% of the placebo group were people aged 65 years or over.

The proportion of patients with an ECOG PS of 2 in the VELOUR trial was 2.2%. According to the ERG's clinical adviser, this is lower than the proportion reported in other trials in the second-line setting, or in UK clinical practice.

In the VELOUR trial, 42–44% of patients had metastasis in only 1 organ, which the ERG's clinical adviser considered higher than the proportion seen in clinical practice.
3.17 The ERG noted that the hazard ratios for overall survival by 6-month periods had wide confidence intervals at the later time points of the VELOUR trial because by this time many patients were no longer alive, leaving few patients at risk of dying (around 5% at 30 months). The ERG stated that wide confidence intervals reflect imprecise estimates, and that interpreting hazard ratios towards the end of the trial is highly uncertain, particularly at 30 months and 36 months.
3.18 To estimate mean overall survival using parametric analysis, the manufacturer assumed that the proportional hazards assumption does not hold (that is, it did not accept that the hazard ratio between the 2 treatment groups remained constant over time). The manufacturer stated that this was because the hazard ratios for overall survival decreased over time (treatment effect improved), and because the log-cumulative hazard plots were not parallel and crossed over one another. The ERG, conversely, considered that, while the hazard ratios decreased over time, they remained consistent with the proportional hazards assumption, although it acknowledged that using a proportional hazards approach is subject to judgement. In addition, the ERG noted that the log-cumulative hazard plots were very close to parallel. The ERG stated that rejecting the proportional hazards assumption and assuming a continued separation of the overall survival curves is highly uncertain given that no data were available beyond 36 months' follow-up, and particularly that the progression-free survival curves separate then converge at around 12 months. The ERG suggested that it would be reasonable to assume that the survival curves converge before 5 years (that is, there is no treatment effect after 5 years), in line with clinical experience in treating metastatic colorectal cancer.
3.19 The ERG noted that the estimate of mean overall survival varied considerably depending on the parametric function the manufacturer used, indicating that the manufacturer's estimates of the difference in mean overall survival (4.7 months) were not robust to the choice of distribution. The ERG requested from the manufacturer the mean estimates of overall survival for each treatment group, restricted to patients who had died before the end of the trial (that is, results based on actual data rather than an extrapolated model), which gave a difference of 1.92 months in favour of aflibercept. The ERG indicated that this figure is likely to be an underestimate given that it does not take into account the patients with long survival times.
3.20 The manufacturer used the log-logistic function to estimate mean overall survival, and it truncated the curves at 15 years. The ERG considered that 15 years is too long for the patient population under consideration because the treatment benefit is unlikely to extend beyond 5 years. The ERG requested that the manufacturer produce estimates with the analysis truncated at 5 years and 10 years. When the data were truncated at 5 years, the results from the different functions were more consistent with each other than when the data were truncated at 15 years. The ERG stated that it is unclear whether the mean based on extrapolating the curves and truncating the data at 5 years, or the restricted mean based on actual data, is more valid.
3.21 Progression-free survival in the VELOUR trial was a secondary end point assessed by an independent review committee. The ERG advised that independent review committees may miss symptoms other than tumour growth caused by disease progression, which may have an impact on treatment duration and associated costs. The ERG noted that, when the manufacturer explored in a sensitivity analysis disease progression determined by investigator assessment taking into account symptomatic deterioration (as would happen in clinical practice), aflibercept was found to extend median progression-free survival by 1.74 months.
3.22 The ERG stated that, while there was no evidence of a statistically significant interaction at the 5% level between treatment groups for most of the baseline patient characteristics, the results of the subgroup analyses suggested that patients with less advanced disease in the VELOUR trial (ECOG PS equal to 0, number of organs with metastasis less than or equal to 1, and patients with liver metastases only) may be more likely to benefit from treatment with aflibercept than those with more advanced cancer.
Cost-effectiveness evidence
3.23 The manufacturer did not identify any published economic evaluations relevant to the decision problem. It submitted a de novo economic model to establish the cost effectiveness of aflibercept in patients with metastatic colorectal cancer who are eligible for second-line combination chemotherapy, and who were previously treated with an oxaliplatin-based regimen. The manufacturer performed subgroup analyses for patients with liver metastases only, and for a subgroup that excluded patients who had received oxaliplatin-based therapy in the adjuvant setting and whose disease relapsed within the following 6 months. The manufacturer conducted the analysis from the perspective of the NHS and personal social services and chose a time horizon of 15 years. It used a 2-week treatment cycle to reflect the treatment schedules of aflibercept and FOLFIRI, and applied a half-cycle correction. Costs and health effects were discounted at an annual rate of 3.5%.
3.24 The manufacturer developed a state-transition Markov cohort model simulating 3 states: stable disease, progressed disease and death. The manufacturer further split the stable-disease health state into substates of 'on second-line treatment' and 'discontinued second-line treatment' to distinguish between patients who receive second-line treatment until their disease progresses, and those who stop second-line treatment before their disease progresses. All simulated patients enter the model in the stable-disease health state and in the 'on second-line treatment' substate. Patients can then continue treatment and remain in the 'on second-line treatment' substate, or move to the 'discontinued second-line treatment' substate; they can instead move to the progressed-disease health state (and stop second-line treatment), or death. Patients cannot receive second-line treatment again once treatment is stopped, but they can receive further active therapy (systemic anticancer treatment, radiotherapy or surgery) or best supportive care. The manufacturer stated that the duration of second-line treatment in the model is based on the mean durations in the VELOUR trial to take into account dose delays or the discontinuation of aflibercept or FOLFIRI (for patients who were in the aflibercept group), or FOLFIRI (for patients who were in the placebo group), as observed in the trial. The manufacturer modelled adverse events as events (rather than health states) and it applied a utility decrement (disutility) for each adverse event.
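The general mechanics of a 3-state cohort model of this kind, with 2-week cycles, a half-cycle correction and 3.5% annual discounting over 15 years, can be sketched as follows. This is a simplified illustration rather than the manufacturer's model: it omits the treatment substates and adverse events, it uses constant transition probabilities where the submission used fitted survival curves, and every transition probability and utility value below is a hypothetical placeholder.

```python
CYCLES_PER_YEAR = 26      # 2-week cycles (approximating a year as 52 weeks)
HORIZON_YEARS = 15
DISCOUNT_RATE = 0.035     # annual discount rate stated in the submission

STATES = ("stable", "progressed", "dead")

# HYPOTHETICAL per-cycle transition probabilities; each row sums to 1.
TRANSITIONS = {
    "stable": {"stable": 0.93, "progressed": 0.06, "dead": 0.01},
    "progressed": {"stable": 0.00, "progressed": 0.96, "dead": 0.04},
    "dead": {"stable": 0.00, "progressed": 0.00, "dead": 1.00},
}

# HYPOTHETICAL utility weights per health state.
UTILITY = {"stable": 0.78, "progressed": 0.70, "dead": 0.0}


def advance(distribution):
    """Move the cohort forward by one cycle."""
    return {
        target: sum(distribution[source] * TRANSITIONS[source][target] for source in STATES)
        for target in STATES
    }


def run_model():
    """Return (discounted QALYs per patient, final state distribution)."""
    distribution = {"stable": 1.0, "progressed": 0.0, "dead": 0.0}
    qalys = 0.0
    for cycle in range(CYCLES_PER_YEAR * HORIZON_YEARS):
        following = advance(distribution)
        # Half-cycle correction: average the occupancy at the start and end of the cycle.
        occupancy = {s: 0.5 * (distribution[s] + following[s]) for s in STATES}
        discount = 1.0 / (1.0 + DISCOUNT_RATE) ** (cycle / CYCLES_PER_YEAR)
        cycle_utility = sum(occupancy[s] * UTILITY[s] for s in STATES)
        qalys += discount * cycle_utility / CYCLES_PER_YEAR
        distribution = following
    return qalys, distribution


total_qalys, final_distribution = run_model()
print(f"discounted QALYs per patient: {total_qalys:.3f}")
print(final_distribution)
```

Running the same calculation once per treatment group, with group-specific transition probabilities and costs, would give the incremental QALYs and costs that feed an incremental cost-effectiveness ratio.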
3.25 The manufacturer's model included parameters for overall survival, progression-free survival and time to discontinuing second-line treatment (before or after disease progression). To estimate the survival parameters, the manufacturer fitted alternative parametric functions (Weibull, log-normal, log-logistic and exponential) to observed Kaplan–Meier data from the VELOUR trial, and extrapolated the curves beyond the trial period for overall survival and time to discontinuing treatment, but not for progression-free survival, because the disease had progressed in all patients during the trial. In extrapolating those curves, the manufacturer assumed non-proportional hazards (that is, the hazard ratios between aflibercept plus FOLFIRI and FOLFIRI alone varied over time) so it modelled each treatment group separately. The manufacturer chose the base-case survival functions based on the results of statistical tests, visual inspection of the fit to the data and the clinical plausibility of the extrapolated portion of the curve. For overall survival, the manufacturer used the log-logistic function, and assumed that the survival benefit from treatment with aflibercept plus FOLFIRI increases relative to treatment with FOLFIRI alone until around 12 months after starting treatment, and then decreases over the 15-year time horizon, but does not cease at any point during the extrapolation period (that is, the overall survival curves start converging 12 months after starting treatment but never fully converge later in the extrapolation period). The manufacturer used the Weibull function for progression-free survival and time to treatment discontinuation. The difference in mean progression-free survival estimated by the manufacturer was 1.2 months in favour of aflibercept. Other parametric functions were explored in scenario analyses.
3.26 The manufacturer stated that the model predicted a median overall survival and a median progression-free survival similar to those from the VELOUR trial. The largest difference was for progression-free survival in the FOLFIRI group, which the model overestimated compared with the survival time observed in the trial.
3.27 Adverse events in the model included grade 3–4 adverse events that affected more than 5% of patients in the VELOUR trial, together with 6 rarer adverse events that the manufacturer's clinical advisory board considered important (gastrointestinal perforation, haemorrhage, febrile neutropenia, peripheral neuropathy, urinary tract infections and hand-foot syndrome). The subgroup analyses incorporated data specific to each subgroup.
3.28 The manufacturer applied utility values in the model from its 'mCRC utilities study', in which investigators assigned patients to 1 of the following 3 groups: patients with stable disease who are receiving second-line treatment, and patients who had previously received second-line treatment but stopped it because of an adverse event, or because their disease progressed. Because the sample size of the group of patients who had an adverse event and stopped treatment was very small, the manufacturer did not use the utility estimates from this group, and instead assumed that all patients with stable disease have the same utility, equal to the utility of patients with stable disease who are receiving second-line treatment. The manufacturer got descriptions of health states from patients using the EQ-5D system, and derived the utility weights by applying UK valuation of health states estimated using the time trade-off method. The utility estimate used in the model for patients with progressed disease was 0.708. The manufacturer assumed that the utility in the progressed-disease health state is independent of time spent in the state. The manufacturer explained that, despite the age and health of patients, the utility values used in the model are relatively high because candidates for second-line chemotherapy must be fit enough to receive treatment.
3.29 The manufacturer also identified relevant utility studies from a systematic review of the literature. It did not use the values in those studies to populate the model, but compared them with the estimates from its own utility study, and noted that they were reasonably consistent. The utility estimates in the literature that the manufacturer considered relevant ranged from 0.73 to 0.81 for stable disease, and from 0.68 to 0.69 for progressed disease. One other study, Best et al. (2010), reported utility values of 0.51 for stable metastatic disease and 0.21 for progressed metastatic disease, but the manufacturer did not consider this study relevant because the population included patients receiving adjuvant chemotherapy and patients in remission.
3.30 The manufacturer got the disutilities associated with adverse events from the published literature, and supplemented these with clinical expert opinion. To calculate the average disutility per adverse event, the manufacturer assumed that an adverse event causes the same disutility regardless of the type of cancer. This gave an average disutility per adverse event of −0.0127 for patients receiving aflibercept plus FOLFIRI, and −0.0108 for those receiving FOLFIRI alone.
3.31 The costs of aflibercept plus FOLFIRI and FOLFIRI alone did not depend on the duration of second-line treatment in the model; the manufacturer calculated them separately based on data from the VELOUR trial to reflect the dose delays (for example, because of an adverse event) and dose reductions observed in the trial. It assumed that any unused drug in a vial was discarded (wasted) for aflibercept and irinotecan (a component of FOLFIRI), but explored in scenario analyses other possibilities to model drug wastage. The cost of aflibercept in the model took into account the patient access scheme discount.
3.32 To estimate costs of caring for people with metastatic colorectal cancer ('management costs' including supportive medications, clinician and nurse visits [hospital and community], imaging, laboratory tests, hospitalisations, palliative care, and personal and social care), the manufacturer conducted a retrospective observational study, and undertook a questionnaire-based survey of 6 UK clinical oncologists (both unpublished studies). In the observational study, the manufacturer collected resource-use data from patients who received oxaliplatin-based chemotherapy followed by FOLFIRI as second-line treatment, and used those data to estimate total management costs per 2-week cycle for different groups of patients (the manufacturer advised that every patient would eventually receive end-of-life care regardless of prior treatment, so it did not include resource use associated with end-of-life care in the model). The clinician survey aimed to gather data on community-based care, and on personal and social care. In this, the manufacturer elicited the average treatment practices of each oncologist to get data on managing patients with metastatic colorectal cancer. It also used the results of the survey, together with NHS reference costs, to estimate the costs associated with adverse events. The manufacturer used mean resource use for adverse events, but median resource use for community-based care, and personal and social care. The cost of subsequent therapies that patients could receive after stopping second-line treatment or experiencing disease progression was calculated based on the manufacturer's study of resource use, and was assumed to be independent of the type of second-line treatment.
3.33 The manufacturer's deterministic base-case results estimated that the addition of aflibercept to FOLFIRI provides an additional 0.243 quality-adjusted life years (QALYs). This benefit is achieved with an additional cost of £8816, resulting in an incremental cost-effectiveness ratio (ICER) of £36,294 per QALY gained for aflibercept plus FOLFIRI compared with FOLFIRI alone.
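As an illustration of how an ICER relates incremental costs to incremental QALYs, the following minimal sketch divides one by the other. The function name `icer` is an assumption introduced here and is not part of the manufacturer's model. Because the inputs are the rounded figures quoted in this section, the result only approximates the reported £36,294 per QALY gained.

```python
def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Return the incremental cost-effectiveness ratio (cost per QALY gained).

    Hypothetical helper for illustration only; not taken from the submission.
    """
    if incremental_qalys <= 0:
        raise ValueError("incremental QALYs must be positive for a meaningful ratio")
    return incremental_cost / incremental_qalys


# Rounded base-case figures quoted in the text: £8816 for 0.243 additional QALYs.
base_case = icer(8816, 0.243)
print(f"Approximate ICER: £{base_case:,.0f} per QALY gained")
```

The small gap between this approximation and the reported ICER is presumably the result of rounding in the published incremental cost and QALY figures.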
3.34 The manufacturer presented deterministic sensitivity analyses in which it varied the 20 parameters with the largest impact on the ICER, one at a time. The results showed that the ICER is most sensitive to the parametric function chosen for overall survival, the utility value chosen for the progressed-disease health state, and the number of administrations assumed for second-line treatment drugs. The manufacturer explained that improving overall survival and progression-free survival increased incremental QALYs in favour of aflibercept, but also increased drug costs and the costs incurred from prolonged overall survival after disease progression.
3.35 The manufacturer carried out a probabilistic sensitivity analysis to summarise the uncertainty in the ICER. This showed that the probability of aflibercept plus FOLFIRI being cost effective when compared with FOLFIRI alone is less than 5% if the maximum acceptable ICER is £20,000 per QALY gained, and 22% at £30,000 per QALY gained.
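The probability of being cost effective at a given maximum acceptable ICER is usually read from probabilistic draws as the share of simulations with a positive net monetary benefit. The sketch below illustrates that calculation only. The normal distributions, their spreads and the number of draws are assumptions invented for illustration, not the manufacturer's inputs, so the output should not be compared with the percentages reported above.

```python
import random


def prob_cost_effective(draws, threshold):
    """Share of (incremental cost, incremental QALY) draws with positive
    net monetary benefit at the given threshold (cost per QALY)."""
    positive = sum(1 for cost, qalys in draws if threshold * qalys - cost > 0)
    return positive / len(draws)


# Hypothetical simulated draws: means match the rounded base case,
# standard deviations are placeholders chosen purely for illustration.
rng = random.Random(0)
draws = [(rng.gauss(8816, 1500), rng.gauss(0.243, 0.06)) for _ in range(5000)]

for threshold in (20_000, 30_000, 50_000):
    share = prob_cost_effective(draws, threshold)
    print(f"£{threshold:,} per QALY: {share:.1%} of draws cost effective")
```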
3.36 The manufacturer investigated the structural uncertainty in the model by fitting alternative parametric functions for overall survival and progression-free survival, and by directly applying patient-level data from the VELOUR trial to model progression-free survival (given that disease had progressed in all patients during the trial). It also performed scenario analyses to test the sensitivity of the ICER to alternative assumptions around drug wastage. In these, it explored the possibility of no drug wastage, and of reducing the dose to the nearest number of whole vials for patients who would otherwise use less than 5% of the vial contents. The highest ICER from these analyses was £49,805 per QALY gained (using the Weibull function to model overall survival).
3.37 The manufacturer provided subgroup analyses to establish the cost effectiveness of aflibercept plus FOLFIRI compared with FOLFIRI alone in patients with liver metastases only, and in a subgroup that excluded those who had received oxaliplatin-based therapy in the adjuvant setting and whose disease had relapsed within the following 6 months. In comparison with the deterministic base-case ICER of £36,294 per QALY gained, the ICERs were £30,474 per QALY gained (incremental costs £10,974, incremental QALYs 0.360) and £32,480 per QALY gained (incremental costs £8573, incremental QALYs 0.264) respectively. At a maximum acceptable ICER of £30,000 per QALY gained, the probability of aflibercept plus FOLFIRI being cost effective compared with FOLFIRI alone in both subgroups is around 50% (numerical values not provided in the manufacturer's submission).
ERG critique
3.38 The ERG indicated that the manufacturer's economic evaluation is consistent with the NICE reference case. It noted that the modelled population is based on data from the VELOUR trial, which relate to patients who appear fitter and younger than those seen in clinical practice. In exploratory sensitivity analyses, the ERG investigated the effect of treating a population that better reflects patients with metastatic colorectal cancer in the UK than the VELOUR trial by modelling an older population with a lower health-related quality of life.
3.39 The ERG considered that it is uncertain whether the hazard ratio for overall survival varies over time. The ERG reported that, when assuming in the manufacturer's model that the hazard ratio remains constant over time (that is, when applying the proportional hazards assumption), the ICER increased to £58,784 per QALY gained, with the difference being mainly driven by a reduction in incremental QALYs compared with the manufacturer's base case. The ERG considered that even this scenario may be relatively optimistic because the progression-free survival curves separate and then converge at around 12 months, suggesting that the hazard ratio could increase over time.
3.40 In its cost-effectiveness analysis, the manufacturer assumed that the survival benefit from treatment with aflibercept plus FOLFIRI initially increases relative to treatment with FOLFIRI alone until around 12 months after starting treatment, and then decreases over the rest of the time horizon, but does not cease at any point during the extrapolation period. The ERG noted that the difference in overall survival between aflibercept plus FOLFIRI and FOLFIRI alone decreases at a relatively slow rate after the initial 12 months and, importantly, suggests a continuing treatment effect on overall survival during the entire 15-year horizon. The ERG explained that extrapolating overall survival data from the VELOUR trial, in which the median follow-up time was just under 2 years, over a 15-year time horizon meant that the assumptions underpinning the extrapolation are key to explaining the large differences between the observed median and the extrapolated mean estimates of overall survival. The ERG stressed that extrapolating the overall survival curves beyond the trial period is highly uncertain given that no data were available for more than 3 years' follow-up, and particularly that the progression-free survival curves separated and then converged at around 1 year. The ERG stated that the manufacturer did not explore this uncertainty sufficiently. Specifically, the manufacturer did not explore whether the risk of death in the aflibercept plus FOLFIRI and FOLFIRI alone groups could become the same from the point at which the trial ends (that is, the treatment effect of aflibercept plus FOLFIRI does not continue over the extrapolation period).
In addition, it did not explore whether the overall survival curves for aflibercept plus FOLFIRI and FOLFIRI alone could converge over the extrapolation period (that is, the treatment effect of aflibercept plus FOLFIRI gradually decreases from the point at which the trial ends), similar to the convergence observed with progression-free survival (in this scenario the risk of death may be higher in the aflibercept plus FOLFIRI group during the extrapolation period than in the FOLFIRI alone group). The ERG explored these 2 scenarios in its exploratory analyses.
3.41 Regarding the utility estimates in the model, the ERG had concerns about the generalisability of the manufacturer's 'mCRC utilities study' because the study population appeared to be younger than UK patients, and the proportion of patients who had an ECOG PS of 2 was lower than that seen in UK clinical practice. Moreover, the ERG noted that the study was small, and produced counterintuitive estimates in a subgroup analysis including UK patients only because the mean utility value for patients whose disease progressed was higher than for those who had stable disease and received second-line treatment.
3.42 The ERG was concerned that the utility estimates used in the model from the manufacturer's utility study, as well as those reported in the literature, were high when compared with values used in previous appraisals of metastatic colorectal cancer, or with general UK population norms. The ERG was particularly concerned about the utility value in the model for patients whose disease had progressed. The ERG explained that, because the model predicts longer overall survival than progression-free survival, approximately three-quarters of absolute QALY increment is accrued after disease progression. Furthermore, the ERG stated that the manufacturer's assumption that utility in the progressed-disease health state is independent of time spent in the state is clinically implausible because patients' health-related quality of life decreases as disease progresses and patients get older.
3.43 The ERG identified an error in the manufacturer's model in how disutilities associated with adverse events were applied, which reduced the disutilities in the model. Correcting this error increased the manufacturer's base-case ICER from £36,294 to £37,834 per QALY gained. The ERG applied this correction in its exploratory analyses.
3.44 The costs of aflibercept plus FOLFIRI and FOLFIRI alone did not depend on the duration of second-line treatment in the model; the manufacturer calculated them separately based on data from the VELOUR trial to reflect the dose delays (for example, because of an adverse event) and dose reductions observed in the trial. The ERG stated that an alternative way to reflect dose delays and reductions would be to apply drug costs per administration (including administration costs) directly to the proportion of patients in each health state, in line with how utility values are applied. Adjusting this increased the manufacturer's base-case ICER from £36,294 to £37,539 per QALY gained. The ERG applied this change in its exploratory analyses.
3.45 The manufacturer assumed that, because aflibercept is administered at the same time as FOLFIRI, no extra costs in terms of additional staff or inpatient admissions would be incurred. The ERG indicated that, even if given simultaneously, administering aflibercept involves preparing an additional infusion, which incurs an extra cost compared with FOLFIRI alone. The ERG highlighted that, in Cetuximab, bevacizumab and panitumumab for the treatment of metastatic colorectal cancer after first-line chemotherapy (NICE technology appraisal guidance 242), the pharmacy preparation of cetuximab and bevacizumab was estimated to be £15 per infusion. In addition, the ERG stated that, if aflibercept is given before or after FOLFIRI, instead of at the same time, administering aflibercept will include an additional hour of infusion time compared with administering FOLFIRI alone. The ERG noted that the model is sensitive to the assumptions underlying the administration costs of aflibercept plus FOLFIRI, and it explored these assumptions in sensitivity analyses.
3.46 Regarding resource use for community, and personal and social care, the manufacturer modelled the median estimate from its survey of clinical oncologists, instead of the mean. The ERG indicated that mean values are more commonly used in cost-effectiveness analyses, and that the use of medians may underestimate expected costs. The ERG noted that, when the manufacturer used the mean value in a sensitivity analysis, the base-case ICER increased from £36,294 to £41,222 per QALY gained. The ERG stated that it is unclear in this case whether the median is a better estimate than the mean because there was a small number of survey responders (n=6) and the data were skewed. The ERG noted that the model is sensitive to this parameter and it further explored this in sensitivity analyses.
3.47 The ERG advised that the results of the analysis of the liver metastases only subgroup should be interpreted cautiously. Because the parametric curves for overall survival and progression-free survival were fitted independently for each treatment group based on data for this subgroup from the VELOUR trial, and the subgroup corresponded to approximately 25% of the trial population, the ERG highlighted that the analysis may not have been powered to demonstrate a difference in treatment effect in this subgroup. For the analysis of the subgroup that excluded adjuvant chemotherapy, the ERG indicated that this analysis was performed post hoc, and so its results may be biased. The ERG's clinical advisers also stated that patients who receive adjuvant chemotherapy and whose disease relapses quickly afterwards would not be treated differently from other patients in UK clinical practice.
ERG exploratory analyses
3.48 The ERG investigated the uncertainty around how the manufacturer had chosen to extrapolate overall survival by considering other scenarios for the magnitude and duration of the overall survival benefit associated with second-line treatments. The ERG modelled the following scenarios by assuming that:

The risk of death in the aflibercept plus FOLFIRI and FOLFIRI alone groups becomes the same 30 months after starting treatment.

The risk of death in aflibercept plus FOLFIRI and FOLFIRI alone groups becomes the same 36 months after starting treatment.
The ERG implemented the following scenarios to mimic the converging progression-free survival curves:
The survival curves begin converging 30 months after starting treatment, and come together after a further 12 months, after which point the risk of death in both treatment groups becomes the same until the end of the time horizon.

The survival curves begin converging 30 months after starting treatment, and come together after a further 18 months, after which point the risk of death in both treatment groups becomes the same until the end of the time horizon.

The survival curves begin converging 36 months after starting treatment, and come together after a further 12 months, after which point the risk of death in both treatment groups becomes the same until the end of the time horizon.

The survival curves begin converging 36 months after starting treatment, and come together after a further 18 months, after which point the risk of death in both treatment groups becomes the same until the end of the time horizon.
In all of the above scenarios, the ERG assumed that the treatment effect of aflibercept plus FOLFIRI continues until either 30 months or 36 months. The ERG chose these time points because it identified them as particularly uncertain from the hazard ratios for overall survival by 6-month periods presented by the manufacturer. When the ERG assumed that the risk of death in the aflibercept plus FOLFIRI and FOLFIRI alone groups becomes the same beyond the trial period, the ICERs were £45,570 and £42,718 per QALY gained for a treatment effect of aflibercept plus FOLFIRI lasting until 30 months or 36 months respectively. In the scenario in which the ERG assumed that the survival curves begin converging 30 months or 36 months after starting treatment over a period of 12 months or 18 months, the ICERs ranged from £55,424 per QALY gained (when curves begin converging after 36 months over 18 months) to £66,377 per QALY gained (when curves begin converging after 30 months over 12 months). The ERG explained that, in this scenario, when the curves begin converging over 12 months, the magnitude of the additional survival benefit from treatment with aflibercept plus FOLFIRI is assumed to taper at a higher rate than when the curves begin converging over 18 months, and so convergence over 12 months results in higher ICERs.
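The scenarios above differ only in when the treatment effect stops and how quickly it fades. The following minimal sketch expresses that idea as a hazard ratio that holds until a start time and then moves to 1.0, either immediately or over a taper period. The function name `hazard_ratio_at`, the linear shape of the taper and the placeholder hazard ratio of 0.8 are all assumptions made for illustration; the ERG's actual implementation may differ.

```python
def hazard_ratio_at(month, base_hr, start, taper_months=0.0):
    """Hazard ratio for overall survival at a given month after starting treatment.

    Before `start` the treatment effect (`base_hr`) applies in full.
    From `start` the ratio moves to 1.0 over `taper_months`; a taper of 0
    means the risk of death becomes the same in both groups immediately.
    The linear taper is an illustrative assumption.
    """
    if month < start:
        return base_hr
    if taper_months <= 0 or month >= start + taper_months:
        return 1.0
    fraction = (month - start) / taper_months
    return base_hr + fraction * (1.0 - base_hr)


# Placeholder hazard ratio of 0.8 (hypothetical, not the trial estimate).
for start, taper in [(30, 0), (36, 0), (30, 12), (30, 18), (36, 12), (36, 18)]:
    profile = [round(hazard_ratio_at(m, 0.8, start, taper), 3) for m in (24, 36, 42, 48, 54)]
    print(f"effect until {start} months, taper {taper} months: {profile}")
```

A shorter taper returns the hazard ratio to 1.0 sooner, which is consistent with the ERG's explanation that convergence over 12 months yields higher ICERs than convergence over 18 months.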
3.49 To address its concerns about some of the parameters used in the manufacturer's base-case model, the ERG performed the following sensitivity analyses, varying 1 parameter at a time:

Applying 2 alternative utility values for patients whose disease progressed: 0.21 from Best et al. (2010) and 0.60 from Bevacizumab and cetuximab for the treatment of metastatic colorectal cancer (NICE technology appraisal guidance 118). The ERG stated that the latter may better reflect the values reported in the literature.

Including a cost for preparing an additional infusion of aflibercept, and a cost for an additional hour of infusion time for aflibercept plus FOLFIRI compared with administering FOLFIRI alone. For the preparation cost, the ERG applied a cost of £15, in line with NICE technology appraisal guidance 242 and, for the extra time for infusion, it applied £45, based on NHS reference costs. The ERG explored the impact of these 2 assumptions separately and jointly.
When the ERG used the lower utility values of 0.21 and 0.60, the ICER increased from £36,294 per QALY gained (base-case ICER) to £71,143 and £40,608 per QALY gained respectively. Including a cost for preparing an additional infusion of aflibercept, and a cost for an additional hour of infusion time for aflibercept plus FOLFIRI, together increased the ICER to £39,258 per QALY gained.
3.50 The ERG applied its preferred adjustments and model inputs to the manufacturer's base-case model (hereafter the 'ERG base case'). In this, the ERG corrected the error it identified in the manufacturer's model (section 3.43), and applied the acquisition and administration costs to all patients in the second-line treatment health state of the model (section 3.44). In addition, the ERG assumed that patients entered the model at the age of 70 years and accounted for the impact of age on health-related quality of life by applying a utility decrement for aging. The ICER resulting from the above 3 changes was £41,653 per QALY gained. The ERG then applied its preferred model inputs for the parameters it varied in one-way sensitivity analyses:

an additional administration cost for aflibercept of £15

mean instead of median resource use estimates (section 3.46).
The ERG applied the above with or without: 
a utility value of 0.60 for patients whose disease had progressed.
When the ERG applied the 0.60 utility value, the analysis gave an ICER of £54,368 per QALY gained for aflibercept plus FOLFIRI compared with FOLFIRI alone. Without this modification (that is, using the same value as in the manufacturer's base case), the ICER was £47,965 per QALY gained.
3.51 The ERG presented deterministic results for the scenario analyses (section 3.48) within its base case, and using the utility value of 0.60 for patients whose disease had progressed. It presented results for the overall population, and separately for each subgroup the manufacturer had identified. When the ERG assumed that the risk of death in the aflibercept plus FOLFIRI and FOLFIRI alone groups becomes the same beyond the trial period, the ICERs were £66,506 and £62,894 per QALY gained for a treatment effect of aflibercept plus FOLFIRI lasting until 30 months or 36 months respectively. In the scenario in which the ERG assumed that the survival curves begin converging 30 months or 36 months after starting treatment over a period of 12 months or 18 months, the ICERs ranged from £78,226 per QALY gained (when the curves begin converging after 36 months over 18 months) to £92,089 per QALY gained (when the curves begin converging after 30 months over 12 months). The ERG found that, using median resource‑use estimates from the manufacturer's survey of UK oncologists (that is, as per the manufacturer's base case), instead of mean, consistently decreased the ICERs for the scenario analyses within the ERG base case by approximately £5000 per QALY gained.
3.52 For the subgroup analyses combining the ERG's assumptions of overall survival and the ERG's alternative base case, the ICER for aflibercept plus FOLFIRI compared with FOLFIRI alone ranged from £46,576 to £58,257 per QALY gained for the liver metastases only subgroup, and from £57,224 to £80,187 per QALY gained for the subgroup that excluded patients who had received adjuvant oxaliplatin-based therapy and whose disease had relapsed within the following 6 months.
Manufacturer's response to consultation on the appraisal consultation document
3.53 To address the Committee's considerations of the evidence described in the appraisal consultation document, the manufacturer submitted a response to the consultation, which included:

a revised patient access scheme discount (the details of which are commercial in confidence),

utility data for the stable-disease state from an interim analysis of a phase III study (ASQoP), and

proposed changes to parameters in the model considered by the Committee.
3.54 The ASQoP study was an international single-arm open-label phase III study. The primary objective of the study was to evaluate the safety of aflibercept in patients with metastatic colorectal cancer whose disease progressed following treatment with an oxaliplatin-based regimen. Its secondary objective was to establish health-related quality of life in this population. Because the study was not completed at the time of the second Committee meeting, the manufacturer provided interim results for mean EQ-5D utility values at baseline and after patients received 3 and 5 cycles of treatment. Data from this study were available for the stable-disease state only. The manufacturer derived a utility value of 0.78 for the stable-disease state by using a weighted average of the utility values for patients who received 3 and 5 cycles of treatment.
3.55 In its response, the manufacturer made the following comments on some of the parameters in the model originally considered by the Committee:

The manufacturer considered that it was more clinically plausible to assume that the hazard ratio tapers to 1.0 after the end of the trial over a short period of time than to assume that the hazard ratio immediately changes to 1.0 at the end of the trial (the Committee's preferred extrapolation scenario).

The manufacturer did not agree that the utility value chosen by the ERG for the progressed-disease state in its base case (0.6) was appropriate because it was based on a comparison with population 'norm' data that reflects the general population, which includes people with significant morbidities. The manufacturer stated that the utility value for progressed disease used in its original base case came from a relevant 'real-world' study that met the requirements of the NICE reference case. However, the manufacturer acknowledged that, according to clinical opinion, health-related quality of life declines sharply towards the end of life for patients with metastatic colorectal cancer.

The manufacturer considered that a starting age of 70 years in the model (as in the ERG base case) was too high according to available evidence and feedback from experts, and that a starting age of 60 years was more appropriate. The manufacturer provided the average age of patients with metastatic colorectal cancer receiving second-line treatment in 4 UK observational studies. It stated that these data were closer to the average age of patients in the VELOUR trial (60 years) than the average age used by the ERG (70 years).

The manufacturer argued that the median value, rather than the mean value, from its survey of clinical oncologists was more appropriate for estimating resource use. This was because the data on the parameter for the number of visits received by a patient from a palliative care team contained a clear outlier, which had a significant impact on the ICERs. The manufacturer further stated that the monthly cost of managing a patient whose disease had progressed used in NICE technology appraisal guidance 242 was closer to the median value than the mean.
3.56 The manufacturer revised its original base case by:

applying the revised patient access scheme discount

assuming that, 36 months after starting treatment, the hazard ratio for overall survival tapers to 1.0 over a 12-month period

assuming that patients enter the model at the age of 60 years, and accounting for the impact of age on health-related quality of life by applying a utility decrement for aging

updating the utility value for the stable-disease state to 0.78, the value derived from the ASQoP study

correcting the disutilities associated with adverse events (section 3.43)

including a cost of £15 for preparing an additional infusion of aflibercept, and a cost of £45 for additional administration time (£60 in total).
The manufacturer's deterministic results of the revised base case estimated that the addition of aflibercept to FOLFIRI would provide an additional 0.20 QALYs. This estimated benefit would cost an additional £8500, resulting in an estimated ICER of £42,242 per QALY gained for aflibercept plus FOLFIRI compared with FOLFIRI alone. The probabilistic ICER from this analysis was estimated to be £42,197 per QALY gained, and the probability of aflibercept plus FOLFIRI being cost effective when compared with FOLFIRI alone was around 10% if the maximum acceptable ICER was £30,000 per QALY gained, and 72% at £50,000 per QALY gained.
3.57 The manufacturer performed the following scenario analyses, in which it varied one parameter at a time:

assuming that, 30 months after starting treatment, the hazard ratio for overall survival tapers to 1.0 over a 12-month period

assuming that, 24 months after starting treatment, the hazard ratio for overall survival tapers to 1.0 over a 12-month period

assuming that, 36 months after starting treatment, the hazard ratio changes to 1.0

assuming that, 30 months after starting treatment, the hazard ratio changes to 1.0

assuming that patients enter the model at the age of 65 years (while also applying a utility decrement for aging)

applying the utility value for the stable-disease health state from the 'mCRC utilities study' (the value used in the manufacturer's original base case)

applying a utility value of 0.3 during the last 2 months of life

applying the mean value from its survey of clinical oncologists after excluding the outlier in the data on the number of visits received by a patient from a palliative care team

applying the cost of managing disease progression used in NICE technology appraisal guidance 242.
The ICERs resulting from these scenario analyses ranged from £42,002 per QALY gained (when a utility value of 0.3 was applied during the last 2 months of life) to £47,246 per QALY gained (when, 24 months after starting treatment, the hazard ratio for overall survival begins tapering to 1.0 over a 12-month period).
ERG critique of the manufacturer's revised base case
3.58 The ERG stated that the manufacturer's extrapolation of overall survival in its revised base case was not based on new data, and so the ERG did not consider it any more plausible than the other scenarios previously presented to the Committee.
3.59 The ERG considered that the manufacturer's assumption of a starting age of 60 years in the model was unrealistic, noting that 3 of the 4 observational studies provided by the manufacturer reported an average starting age of 63 years. However, the ERG also accepted that a starting age of 70 years may be high, and that an age of 65 years was a satisfactory compromise.
3.60 The ERG considered it appropriate for the manufacturer to have sourced the stable-disease utility value from the ASQoP study. However, the ERG argued that, because the manufacturer applied this value in the model for patients both on and off treatment, it would have been more appropriate to use the utility value of 0.77 for patients before they started treatment than the value for patients receiving treatment. The ERG indicated that the manufacturer's approach may have biased the utility value if patients receiving treatment were healthier than those who were not on treatment.
3.61 The ERG was concerned that, for the progressed-disease health state, the manufacturer continued to use the utility value from its 'mCRC utilities study', which the ERG considered high. Regarding the scenario analysis in which the manufacturer applied a utility value of 0.3 during the last 2 months of life, the ERG stated that this was not based on empirical evidence.
3.62 The ERG agreed that the estimate from the manufacturer's survey of UK oncologists included an outlier. It considered that using the mean value after excluding this outlier (as in the manufacturer's scenario analysis) was more appropriate than using the median.
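The effect of a single outlier on a survey of only 6 responders can be illustrated with a small worked example. The responses below are invented for illustration and are not the values from the manufacturer's survey; the sketch only shows why the mean, the median and the mean after excluding an outlier can diverge in a small, skewed sample.

```python
from statistics import mean, median

# Hypothetical numbers of palliative care team visits reported by 6 oncologists.
# The last response plays the role of the outlier; all values are illustrative.
responses = [2, 2, 3, 3, 4, 20]

full_mean = mean(responses)             # pulled upwards by the outlier
full_median = median(responses)         # largely unaffected by the outlier
trimmed_mean = mean(sorted(responses)[:-1])  # mean after dropping the largest value

print(f"mean of all responses:        {full_mean:.2f}")
print(f"median of all responses:      {full_median:.2f}")
print(f"mean excluding the outlier:   {trimmed_mean:.2f}")
```

In this invented sample the mean excluding the outlier sits close to the median, whereas the unadjusted mean is roughly twice as large, which mirrors the sensitivity of the ICER to this parameter described in sections 3.46 and 3.62.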
3.63 To address remaining uncertainties, the ERG altered the manufacturer's revised base case by applying the utility value before treatment from the ASQoP study for the stable-disease state; the progressed-disease utility value of 0.6; and the mean resource use estimate from the manufacturer's survey of UK oncologists after excluding the potential outlier; and assuming patients start treatment at the age of 60 or 65 years. The ERG applied these changes together with each of the following extrapolation scenarios:

assuming a hazard ratio of 1.0 from 30 months after starting treatment

assuming a hazard ratio of 1.0 from 36 months after starting treatment

assuming that, 24 months after starting treatment, the hazard ratio tapers to 1.0 over 12 months

assuming that, 30 months after starting treatment, the hazard ratio tapers to 1.0 over 12 months.
When the ERG assumed that patients start treatment at the age of 60 years, the resulting ICERs with the above scenarios were £54,243, £50,991, £55,139 and £51,296 per QALY gained respectively. When it assumed that patients start treatment at the age of 65 years, the ICERs were £54,890, £51,634, £55,791 and £51,941 per QALY gained respectively.
3.64 The ERG presented estimates of the difference in mean overall survival for different time horizons, while assuming a hazard ratio of 1.0 after 30 or 36 months. When the ERG set the time horizon to 5, 10 and 15 years, the differences in mean overall survival were 2.7–2.8, 3.2–3.5 and 3.4–3.7 months respectively.
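The dependence of the difference in mean overall survival on the time horizon can be sketched as the area between two survival curves, integrated up to the horizon. The example below is a simplified illustration only: it assumes a constant monthly hazard (an exponential model) rather than the parametric functions used in the submission, and the hazard of 0.057 per month and hazard ratio of 0.8 are placeholder values, so the output will not match the figures reported above.

```python
import math


def survival(month, hazard, hr=1.0, cutoff=float("inf")):
    """Survival probability under a constant monthly hazard.

    The hazard ratio `hr` applies only before `cutoff`; afterwards the
    hazard ratio is 1.0 (same risk of death as the comparator).
    """
    early = min(month, cutoff)
    late = max(month - cutoff, 0.0)
    return math.exp(-(hr * hazard * early + hazard * late))


def mean_os_difference(horizon_months, hazard, hr, cutoff, step=0.05):
    """Area between the treatment and comparator survival curves (months),
    computed with the trapezoidal rule up to the time horizon."""
    steps = int(round(horizon_months / step))
    total = 0.0
    for i in range(steps):
        a, b = i * step, (i + 1) * step
        gap_a = survival(a, hazard, hr, cutoff) - survival(a, hazard)
        gap_b = survival(b, hazard, hr, cutoff) - survival(b, hazard)
        total += 0.5 * (gap_a + gap_b) * step
    return total


# Placeholder inputs (hypothetical): hazard 0.057 per month, hazard ratio 0.8
# until 30 months, then a hazard ratio of 1.0.
for years in (5, 10, 15):
    gain = mean_os_difference(years * 12, hazard=0.057, hr=0.8, cutoff=30)
    print(f"{years}-year horizon: {gain:.2f} months difference in mean overall survival")
```

Under these assumptions the difference grows with the time horizon but by progressively smaller amounts, because few patients in either group remain alive in the later years.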
3.65 Full details of all the evidence are in the manufacturer's submission and the ERG report.