3 The manufacturer's submission

The Appraisal Committee (appendix A) considered evidence submitted by the manufacturer of tocilizumab and a review of this submission by the Evidence Review Group (ERG; appendix B).

3.1 The manufacturer submitted evidence for the two populations defined in the decision problem: population 1 – children and young people aged 2 years and older with systemic JIA that has not responded adequately to prior NSAIDs and systemic corticosteroids; and population 2 – children and young people aged 2 years and older with systemic JIA that has not responded adequately to prior NSAIDs, systemic corticosteroids and methotrexate. For population 1 the manufacturer compared tocilizumab with methotrexate. For population 2 the manufacturer carried out indirect comparisons of tocilizumab with tumour necrosis factor-alpha (TNF-alpha) inhibitors and anakinra.

3.2 In the manufacturer's submission, evidence of clinical effectiveness was based on one randomised controlled trial (TENDER). The TENDER trial is an ongoing three-part, 5-year, phase III study. Part one consisted of a 12-week international multicentre randomised double-blind placebo-controlled parallel two-group study to evaluate the efficacy and safety of tocilizumab in children with active systemic JIA. Part two is a 92-week single-group open-label extension and part three is a 3-year single-group open-label continuation of the study. The manufacturer stated that based on the inclusion criteria of the TENDER trial, all participants matched population 1 in the scope. The manufacturer also stated that 95% of TENDER trial participants who were either treated with methotrexate or had had methotrexate in the past matched population 2 in the scope because it was these participants whose disease could be regarded as having responded inadequately to methotrexate. Patients were included in the study if they had symptoms of active disease and the manufacturer stated that 'it follows that if patients have tried in the past or are currently administered methotrexate and continue to have persistent disease then they are inadequate responders'. An inadequate response to methotrexate was defined as patients being on a standard dose of methotrexate for a period of 3 months and still showing symptoms of active systemic JIA at baseline.

3.3 TENDER enrolled 112 participants (from 17 countries, including the UK) who were randomised 2:1 to tocilizumab (n = 75) or placebo (n = 37). Tocilizumab was administered every 2 weeks at a dose of 8 mg/kg for participants who weighed at least 30 kg (n = 37) and 12 mg/kg for those who weighed less than 30 kg (n = 38). Ages of patients in the trial ranged from 2 to 17 years, with an average age of 10 years. Patients had to have documented persistent disease activity (at least five active joints, or at least two active joints with fever above 38°C for any 5 out of 14 days of screening) for at least 6 months, and an inadequate response to NSAIDs and corticosteroids because of toxicity or lack of efficacy. An inadequate response to previous treatment was determined by the treating physician's clinical assessment. Before study entry, 78 out of 112 patients (70%) had been treated with methotrexate (36 entered the study on methotrexate that had been previously stopped then restarted; 42 were on their first course of methotrexate, which was ongoing). Approximately 26% (29) of patients were not on methotrexate at baseline but had received and stopped methotrexate previously. Five patients (approximately 5%) had never received methotrexate, and were considered methotrexate naive. Patients taking NSAIDs, corticosteroids and methotrexate were permitted to take part but had to enter the study on a stable dose of the medicines.
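
As an illustration only, the enrolment and baseline methotrexate figures quoted in section 3.3 can be cross-checked with a few lines of arithmetic. The sketch uses only the counts given in that paragraph; the variable and function names are illustrative and do not come from the submission.

```python
# Counts quoted in section 3.3; names are illustrative only.
randomised = 112
tocilizumab_arm = 37 + 38      # 8 mg/kg group plus 12 mg/kg group
placebo_arm = 37

on_mtx_restarted = 36          # on methotrexate at entry, previously stopped then restarted
on_mtx_first_course = 42       # on a first, ongoing course of methotrexate
stopped_mtx = 29               # received methotrexate previously, not on it at baseline
mtx_naive = 5                  # never received methotrexate

on_mtx_at_entry = on_mtx_restarted + on_mtx_first_course


def percent(count, total=randomised):
    """Share of the randomised population, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(tocilizumab_arm + placebo_arm)                # every randomised participant
print(on_mtx_at_entry, percent(on_mtx_at_entry))    # treated with methotrexate at entry
print(percent(stopped_mtx), percent(mtx_naive))     # previously stopped; never treated
print(on_mtx_at_entry + stopped_mtx + mtx_naive)    # the three groups together
```

The three methotrexate groups add up to the full randomised population, and the rounded shares agree with the approximate percentages quoted in the paragraph.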

3.4 The primary outcome measures were the proportion of patients who had a JIA American College of Rheumatology (ACR) 30 response at 12 weeks and absence of fever (defined as no recorded temperature of 37.5°C or above in the preceding 7 days). A JIA ACR30 response is defined as an improvement of at least 30% from the baseline assessments in any three of six core outcome variables, with no more than one of the remaining variables deteriorating by more than 30%. The JIA core outcome variables are: physician global assessment of disease activity (100 mm visual analogue scale [VAS]); parent or patient global assessment of overall well-being (100 mm VAS); number of joints with active arthritis; number of joints with limitation of movement; erythrocyte sedimentation rate; and functional ability (using the Childhood Health Assessment Questionnaire [CHAQ], which measures eight everyday functional activities).
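
The response rule in section 3.4 can be sketched as a small classifier. This is a minimal illustration, not the trial's analysis code. It assumes that a lower value is better for all six core outcome variables, and the variable names, the handling of a zero baseline and the example values are hypothetical.

```python
# Illustrative keys for the six JIA core outcome variables (names are assumptions).
CORE_VARIABLES = (
    "physician_global_vas",
    "parent_patient_global_vas",
    "active_joints",
    "joints_limited_movement",
    "esr",
    "chaq",
)


def relative_change(baseline, follow_up):
    """Fractional change from baseline; negative values mean improvement."""
    if baseline == 0:
        # Assumption: no change if still zero, otherwise treat as a large worsening.
        return 0.0 if follow_up == 0 else float("inf")
    return (follow_up - baseline) / baseline


def jia_acr_response(baseline, follow_up, level=0.30):
    """True if at least three variables improve by `level` or more and
    no more than one variable worsens by more than `level`."""
    changes = [relative_change(baseline[v], follow_up[v]) for v in CORE_VARIABLES]
    improved = sum(1 for c in changes if c <= -level)
    worsened = sum(1 for c in changes if c > level)
    return improved >= 3 and worsened <= 1


# Hypothetical patient: three variables improve by 40% or more, one worsens by 40%.
baseline = {"physician_global_vas": 60, "parent_patient_global_vas": 50,
            "active_joints": 10, "joints_limited_movement": 8, "esr": 40, "chaq": 1.5}
week_12 = {"physician_global_vas": 30, "parent_patient_global_vas": 30,
           "active_joints": 6, "joints_limited_movement": 8, "esr": 56, "chaq": 1.5}

print(jia_acr_response(baseline, week_12))            # responder under the rule
week_12_worse = dict(week_12, chaq=2.1)               # a second variable worsens by 40%
print(jia_acr_response(baseline, week_12_worse))      # no longer a responder
```

The primary outcome also required absence of fever, which this sketch does not model.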

3.5 The secondary outcomes were: individual results for each JIA ACR core outcome variable at 12 weeks; JIA ACR 50/70/90 responses at 12 weeks (that is, an improvement of at least 50%, 70% or 90% respectively from the baseline assessments in any three of the six core outcome variables, and no more than one of the remaining variables worsening by more than 50%, 70% or 90%); corticosteroid reduction; fever; rash; pain; and laboratory outcomes (C-reactive protein [CRP] levels, anaemia and haemoglobin levels, thrombocytosis and leucocytosis).

3.6 Efficacy endpoints were analysed using the intention-to-treat population. All patients were classified as either responders or non-responders. Patients who 'escaped' (patients whose disease did not respond to treatment who switched to an alternative treatment for the disease) or withdrew were classed as non-responders. There was an 'early escape' option to allow children with more severe disease at baseline an opportunity to escape and receive active open-label tocilizumab. Of the 112 patients enrolled, 21 received escape therapy, with 20 of those patients being initially randomised to the placebo arm. The main reasons for escape were fever for at least 3 consecutive days or a JIA ACR30 flare (a worsening of symptoms).
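
The classification rule in section 3.6 (participants who escaped or withdrew count as non-responders, and everyone randomised stays in the denominator) can be sketched as follows. The `Participant` record and the toy cohort are hypothetical and are not trial data.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    arm: str
    met_response_criteria: bool   # ACR30 response and absence of fever at week 12
    escaped: bool = False
    withdrew: bool = False


def is_responder(p):
    """Escape or withdrawal overrides any recorded response."""
    return p.met_response_criteria and not (p.escaped or p.withdrew)


def responder_rate(participants, arm):
    """Intention-to-treat rate: all randomised participants in the arm are counted."""
    in_arm = [p for p in participants if p.arm == arm]
    return sum(is_responder(p) for p in in_arm) / len(in_arm)


# Hypothetical toy cohort, for illustration only.
toy_cohort = [
    Participant("tocilizumab", True),
    Participant("tocilizumab", True),
    Participant("tocilizumab", True, withdrew=True),
    Participant("tocilizumab", False),
    Participant("placebo", True),
    Participant("placebo", False, escaped=True),
    Participant("placebo", True, escaped=True),
    Participant("placebo", False),
]

print(responder_rate(toy_cohort, "tocilizumab"))
print(responder_rate(toy_cohort, "placebo"))
```

Counting escapes as non-responders is a conservative choice for whichever arm the escaping participants were randomised to, because their outcomes after switching treatment are not credited to that arm.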

3.7 The results of the TENDER trial showed that for its primary endpoint (a JIA ACR30 response and absence of fever at week 12), 85.3% of the tocilizumab patients were classed as responders compared with 24.3% of the placebo patients, a statistically significant difference (p < 0.0001). Patients given tocilizumab had a greater chance of achieving JIA ACR30/50/70/90 responses at week 12 in comparison with the placebo patients. The differences in the proportions of tocilizumab patients and placebo patients at each JIA ACR response level were statistically significant (p < 0.0001). The proportion of responders showing an ACR30 response was higher in patients receiving tocilizumab 12 mg/kg (97.4%) compared with those receiving tocilizumab 8 mg/kg (83.8%). The efficacy of tocilizumab with respect to individual ACR core outcome variables was analysed as part of the secondary efficacy analyses; these results are marked by the manufacturer of tocilizumab as academic in confidence and therefore are not presented here.

3.8 The TENDER trial also included the Child Health Questionnaire (CHQ) as an instrument eliciting patient health-related quality of life. The CHQ assesses a child's physical, emotional and social wellbeing from the perspective of a parent or carer. The questionnaire was completed twice during the randomised period of the study: at baseline (visit 1) and at week 12 (visit 7).

3.9 The TENDER trial included data on adverse events. Infusion-related reactions were defined as all events occurring during or within 24 hours of an infusion. In the 12-week controlled phase, 4% of patients from the tocilizumab group experienced adverse events during infusion. One event (angioedema) was considered serious and life-threatening, and the patient stopped study treatment. In the 12-week controlled phase, 16% of patients in the tocilizumab group and 5.4% of patients in the placebo group experienced an adverse event within 24 hours of infusion. In the tocilizumab group, the adverse events included, but were not limited to, rash, urticaria, diarrhoea, epigastric discomfort, arthralgia and headache. One of these adverse events, urticaria, was considered serious. Clinically significant hypersensitivity reactions associated with tocilizumab that meant treatment was stopped were reported in 1 out of 112 patients (less than 1%) treated with tocilizumab during the controlled phase and up to and including the open-label clinical trial.

3.10 For the comparison of tocilizumab and methotrexate for population 1, the manufacturer used a post-hoc analysis to compare patients receiving tocilizumab with the 70% of patients in the placebo group who were receiving methotrexate. The manufacturer presented results that showed methotrexate had limited effect on the primary outcome in patients who were in the placebo group and those treated with tocilizumab. The manufacturer concluded that methotrexate as add-on therapy did not have a significant impact on the JIA ACR responses observed in the tocilizumab arms in the TENDER study. The manufacturer further presented results showing that the proportion of patients on tocilizumab who had an ACR30 response was 0.907, compared with 0.154 for those on methotrexate.

3.11 No head-to-head trials were available analysing the efficacy of tocilizumab compared with TNF-alpha inhibitors or anakinra for population 2. The manufacturer included data from two studies (Ruperto et al. 2007 [NCT00036374] and the ANAJIS [anakinra in patients with systemic-onset juvenile idiopathic arthritis] study) in the indirect comparison analysis. The NCT00036374 trial compared the TNF-alpha inhibitor infliximab with placebo in patients with juvenile rheumatoid arthritis (systemic 16%, pauciarticular 23%, polyarticular 61%) described as having a suboptimal response to methotrexate. Participants were from North and South America and Europe, aged between 4 and 18 years, and were randomised to infliximab (62 patients) or placebo (60 patients). Patients received concomitant methotrexate alongside placebo or active treatment. The study was a randomised double-blind placebo-controlled trial, and the primary outcome was the proportion of patients who had a paediatric ACR30 response based on JIA core outcome variables at week 14.

3.12 The ANAJIS trial recruited children with systemic JIA and compared anakinra with placebo. This was a multicentre study with 24 participants (12 in each arm) aged 2–20 years, from North America and Europe. The study included patients whose systemic JIA had not responded to methotrexate or any of the disease-modifying anti-rheumatic drugs (DMARDs), and the protocol did not permit the administration of any DMARDs during the trial. The outcomes of the randomised controlled phase were reported after 1 month. The primary outcome was the paediatric ACR score, absence of fever and normalisation of CRP levels and erythrocyte sedimentation rate after 1 month.

3.13 The manufacturer undertook an indirect comparison of tocilizumab compared with anakinra. Data from the ANAJIS and the TENDER trials were used. The manufacturer used all patients in the TENDER trial, including those who were methotrexate naive, for the analysis. The relative risk for an outcome of a JIA ACR30 response for patients on tocilizumab compared with anakinra was 2.37 (95% CI 1.10 to 5.10), which was statistically significant. There were no significant differences in JIA ACR30 response and absence of fever between the anakinra and tocilizumab populations. The manufacturer also conducted an indirect comparison of tocilizumab and infliximab using the results from the NCT00036374 trial and the TENDER trial. The outcomes JIA ACR30, 50 and 70 responses were measured. Patients on tocilizumab had a statistically significantly greater chance of having these outcomes than those on infliximab. The relative risks were 2.87 (95% CI 1.49 to 5.55), 5.35 (95% CI 1.91 to 14.97) and 4.61 (95% CI 1.16 to 18.38) for JIA ACR30, 50 and 70 responses respectively.
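
As a plausibility check on the figures in section 3.13, relative risks are usually given confidence intervals that are symmetric on the log scale, so the point estimate should sit close to the geometric mean of the two limits. The sketch below applies that check to the reported values. It assumes this standard construction, which section 3.13 does not confirm, and it does not reproduce the manufacturer's indirect comparison method.

```python
import math

# Relative risks and 95% confidence limits as reported in section 3.13.
reported = {
    "tocilizumab vs anakinra, ACR30": (2.37, 1.10, 5.10),
    "tocilizumab vs infliximab, ACR30": (2.87, 1.49, 5.55),
    "tocilizumab vs infliximab, ACR50": (5.35, 1.91, 14.97),
    "tocilizumab vs infliximab, ACR70": (4.61, 1.16, 18.38),
}

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_scale_summary(lower, upper):
    """Geometric midpoint, implied standard error of log(RR), and whether
    the interval excludes 1 (the usual reading of statistical significance)."""
    midpoint = math.sqrt(lower * upper)
    se_log_rr = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return midpoint, se_log_rr, lower > 1.0


for label, (rr, lower, upper) in reported.items():
    midpoint, se_log_rr, excludes_one = log_scale_summary(lower, upper)
    print(f"{label}: reported {rr}, midpoint {midpoint:.2f}, "
          f"SE(log RR) {se_log_rr:.2f}, excludes 1: {excludes_one}")
```

The wide intervals translate into large implied standard errors on the log scale, which is worth bearing in mind when reading the point estimates.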

3.14 The manufacturer used an adjustment factor derived from a study of etanercept by Prince et al. (2009). This was an observational study of 146 patients, of whom 27% had systemic JIA. The adjustment factor is the difference in the proportion of responders between the total population with JIA and the subpopulation with systemic JIA. This factor was used to correct for ACR response rates in the indirect comparison results that the manufacturer had derived from the NCT00036374 (infliximab) study (in which 16% of JIA patients had systemic JIA) and the TENDER trial. These resulting ACR response rates were assumed to represent the responses achieved with all of the TNF-alpha inhibitors.

3.15 The manufacturer originally submitted a Markov model to evaluate the cost effectiveness of tocilizumab as part of a sequence of treatments. In the tocilizumab versus methotrexate model, patients progressed to anakinra, etanercept and then adalimumab; in the tocilizumab versus anakinra model, patients progressed to etanercept, adalimumab and then abatacept.

3.16 In the manufacturer's original model the Markov chain had 22 states. The model clustered the states into five groups: four groups representing different lines of treatment and the fifth group containing death and uncontrolled disease. Each line of treatment consisted of five health states: ACR responses at the 30, 50, 70 and 90 levels and 'no ACR response'. A patient could move from a particular ACR response in a particular line only to 'no ACR response' in the next line or to death. From 'no ACR response' the patient could move only to one ACR response level within that line of treatment or to 'no ACR response' in the next line. The main assumption of the model was that there were no transitions between ACR response categories (that is, the patient could not move within a given line to a better or worse health state [say, from ACR50 to ACR70]). The analysis assumed that patients stayed in the same health state unless they changed treatment line. After 12 weeks of treatment, the cohort was put on the next treatment in the sequence. Only after being through all four lines did a patient move to the health state 'uncontrolled disease'. The probability of a response or non-response within a line of treatment depended on the treatment. The order in which the treatments were applied did not change these transitions. The probability of death was treatment independent and health-state independent. The probability of withdrawal was health-state independent, but was higher for methotrexate than for other treatment options (all other treatment options had the same probability as each other). All transitions stayed constant over time; that is, they were independent of age or disease duration. In each cycle, the proportion of patients in a given state was calculated. The distribution across states was used to calculate cycle-specific quality-adjusted life years (QALYs) and treatment costs, which were discounted and summed over the length of treatment. 
The manufacturer's original model had a time horizon of 16 years. This means that a patient in the model starting treatment aged 2 years turned 18 and could be considered an adult at the end of the simulation. The model allowed shorter and longer time durations for sensitivity analysis (up to 30 years). The discount rates applied were 3.5% for utilities and costs, and costs were considered from an NHS and personal social services perspective. A half-cycle correction was applied.
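
The state structure described in section 3.16 can be sketched by enumerating the states and the transitions the text allows. This is one reading of the description, not the manufacturer's model: the state labels are illustrative, and the treatment of 'uncontrolled disease' as a state that can only be left through death is an assumption.

```python
LINES = (1, 2, 3, 4)                                  # four lines of treatment
RESPONSE_LEVELS = ("ACR30", "ACR50", "ACR70", "ACR90")
NO_RESPONSE = "no ACR response"
DEATH = "death"
UNCONTROLLED = "uncontrolled disease"


def build_states():
    """Five health states per treatment line, plus death and uncontrolled disease."""
    per_line = [(line, status)
                for line in LINES
                for status in RESPONSE_LEVELS + (NO_RESPONSE,)]
    return per_line + [DEATH, UNCONTROLLED]


def allowed_targets(state):
    """States reachable in one cycle under the reading described above."""
    if state == DEATH:
        return {DEATH}
    if state == UNCONTROLLED:
        return {UNCONTROLLED, DEATH}                  # assumption: left only through death
    line, status = state
    # Leaving a line leads to non-response in the next line,
    # or to uncontrolled disease after the fourth line.
    next_stop = (line + 1, NO_RESPONSE) if line < LINES[-1] else UNCONTROLLED
    if status == NO_RESPONSE:
        return {(line, level) for level in RESPONSE_LEVELS} | {next_stop, DEATH}
    # Responders stay put, withdraw to the next line, or die;
    # they never move between ACR levels within a line.
    return {state, next_stop, DEATH}


print(len(build_states()))
print(sorted(map(str, allowed_targets((1, "ACR50")))))
```

Written this way, the restriction questioned later by the ERG is easy to see: a responder's only exits are withdrawal and death, so no improvement or deterioration within a treatment line is possible.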

3.17 The initial CHAQ score at baseline for the cohort of patients used in the original economic model was equal to that observed in the TENDER trial. The change in the patient CHAQ score was determined by the level of ACR response after 12 weeks. Improvement in each health state as measured by relative ACR change led to an absolute change in the initial CHAQ score. For a given CHAQ score, a utility was assigned to calculate QALYs. The health-state costs varied with the health state and the treatment costs.

3.18 The data inputs for the manufacturer's original model included utility values. To derive utility values, the manufacturer had to map the CHAQ scores to utilities, using a mapping formula derived in adults with rheumatoid arthritis that mapped Health Assessment Questionnaire [HAQ] results onto EQ-5D utilities. The manufacturer recognised that the assumptions that CHAQ is equal to HAQ and that adult EQ-5D is equal to the health-related quality of life of a child are not evidence based, and acknowledged that this mapping method was only used for the analysis to derive QALYs for the economic model because of the lack of other available data.

3.19 Treatment costs in the original model were a composite of the cost of the medication and the cost of administering it. For some drugs, the necessary dosage depends on the body weight of the patient. The manufacturer based the unit costs on UK reference costs, literature and expert opinion. The health-state costs depended only on the ACR response level and were independent from any other health outcomes. The manufacturer stated that 'in all comparisons, the identified adverse events are of minor severity and short duration, and their management would have a minuscule cost impact'. Therefore, it can be assumed that they do not have a considerable bearing on the incremental costs of the two model arms.

3.20 In response to the preliminary recommendations in the appraisal consultation document, in which the Committee was minded not to recommend tocilizumab, the manufacturer submitted a revised cost-effectiveness Markov economic model. In the revised model, health states are defined according to categories of CHAQ score, rather than being based on ACR response categories in which an average CHAQ score is applied. The health states were defined as 'controlled', 'mild', 'moderate' and 'severe'. A simulated patient distribution of CHAQ scores based on the TENDER trial was used to establish the proportion of patients that would fall into each CHAQ category at baseline. The manufacturer used ACR as a potential predictor of the CHAQ score. The manufacturer assigned the following utility values to the health states: 0.19, 0.55, 0.65 and 0.77 to 'severe', 'moderate', 'mild' and 'controlled' respectively.
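
To show how the health-state utility values in section 3.20 feed into QALYs, the sketch below weights each state's utility by the share of the cohort in that state and discounts the yearly totals. The utility values are those quoted in the paragraph. The annual cycle, the cohort distributions and the use of the 3.5% discount rate (stated for the original model) are assumptions made for illustration.

```python
# Utility values reported for the revised model's health states.
UTILITY = {"severe": 0.19, "moderate": 0.55, "mild": 0.65, "controlled": 0.77}


def mean_utility(distribution):
    """Cohort-average utility for a distribution of shares over the four states."""
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("state shares must sum to 1")
    return sum(share * UTILITY[state] for state, share in distribution.items())


def discounted_qalys(distribution_by_year, rate=0.035):
    """Sum of yearly cohort utilities, discounted from year 0 (assumed annual cycles)."""
    return sum(mean_utility(d) / (1 + rate) ** year
               for year, d in enumerate(distribution_by_year))


# Hypothetical cohort distributions, not taken from the TENDER trial.
evenly_spread = {"severe": 0.25, "moderate": 0.25, "mild": 0.25, "controlled": 0.25}
all_controlled = {"severe": 0.0, "moderate": 0.0, "mild": 0.0, "controlled": 1.0}

print(round(mean_utility(evenly_spread), 2))
print(round(discounted_qalys([evenly_spread, all_controlled]), 3))
```

The gap between the 'severe' and 'moderate' utilities (0.19 versus 0.55) is much larger than the gaps between the other states, so the share of patients leaving the 'severe' state has the largest effect on the QALY total in this kind of calculation.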

3.21 In the revised economic model, incremental analyses were presented by the manufacturer that compared the sequences of tocilizumab followed by infliximab with infliximab followed by tocilizumab, and then compared tocilizumab followed by anakinra with anakinra followed by tocilizumab followed by anakinra.

3.22 When the manufacturer submitted its comments on the appraisal consultation document, it also submitted a patient access scheme, which is a discount on all invoices of tocilizumab. The manufacturer applied the discounted value of tocilizumab to the revised version of the economic model. This document only details the results for tocilizumab with the patient access scheme.

3.23 The manufacturer conducted two separate incremental analyses in its revised base case with the patient access scheme, one for infliximab-containing sequences and one for anakinra-containing sequences. In these sequences tocilizumab was used either before or after infliximab or anakinra, and the sequence was compared with infliximab alone or anakinra alone respectively. The ICER for tocilizumab followed by infliximab compared with infliximab alone was £18,194 per QALY gained. For infliximab followed by tocilizumab compared with infliximab alone, the ICER was £30,630 per QALY gained. In the anakinra-containing sequences the ICER was £16,923 per QALY gained when tocilizumab was used first followed by anakinra compared with anakinra alone. Anakinra followed by tocilizumab dominated anakinra alone.
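
For readers less familiar with the terms used in section 3.23, an ICER is the difference in cost between two strategies divided by the difference in QALYs, and a strategy 'dominates' another when it costs no more and is at least as effective. A minimal sketch follows; the function name and the cost and QALY inputs are hypothetical and are not values from the manufacturer's model.

```python
def compare(cost_new, qalys_new, cost_ref, qalys_ref):
    """Return (verdict, icer). The icer is None when no ratio is reported."""
    d_cost = cost_new - cost_ref
    d_qalys = qalys_new - qalys_ref
    if d_cost == 0 and d_qalys == 0:
        return ("no difference", None)
    if d_cost <= 0 and d_qalys >= 0:
        return ("dominates", None)        # no more costly and at least as effective
    if d_cost >= 0 and d_qalys <= 0:
        return ("dominated", None)        # no cheaper and no more effective
    # Either more costly and more effective, or cheaper and less effective.
    # In the second case the ratio is the saving per QALY forgone.
    return ("ICER", d_cost / d_qalys)


# Hypothetical inputs, for illustration only.
print(compare(30_000, 6.0, 20_000, 5.5))   # extra cost buys extra QALYs
print(compare(18_000, 6.0, 20_000, 5.5))   # cheaper and more effective
```

A dominant strategy is reported without a ratio because a negative ICER has no useful interpretation.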

3.24 The manufacturer also conducted a sensitivity analysis in the revised model that included:

The uncertainty around the adjustment factor derived from the etanercept study used to take account of the other juvenile idiopathic arthritis subgroups in the infliximab study. The base-case sensitivity analysis assuming an increase of the adjustment factor by 30% resulted in an ICER of £20,240 per QALY gained when the tocilizumab then infliximab strategy was compared with infliximab alone. The ICER was £16,923 per QALY gained when the tocilizumab then anakinra strategy was compared with anakinra alone. The respective ICERs when the adjustment factor was decreased by 30% were £16,407 and £16,923 per QALY gained.
A stopping rule for tocilizumab after treatment duration of 2 years. The base-case sensitivity analysis assumed that treatment with tocilizumab was stopped after 2 years. This showed that the tocilizumab then infliximab strategy dominated infliximab alone, and the tocilizumab then anakinra strategy also dominated anakinra alone.
A decreased frequency of administration of tocilizumab from a 2-weekly regimen to a 4-weekly regimen after treatment duration of 6 months. This base-case sensitivity analysis showed that the tocilizumab then infliximab strategy dominated infliximab alone, and the tocilizumab then anakinra strategy also dominated anakinra alone.

3.25 In response to the preliminary recommendations in the appraisal consultation document, in which the Committee was minded not to recommend tocilizumab, the manufacturer also submitted information on radiographic evidence of progression of joint damage for patients with systemic JIA receiving tocilizumab. The manufacturer stated that the radiographic results from the TENDER trial were not yet available but presented results from a case series (Inaba et al. [2011 and 2007]) that included seven children with a mean age of disease onset of 4.1 years and a mean age of start of treatment of 9.4 years. The mean follow-up of treatment was 56 months. There were radiographic improvements in 57% of joints, worsening in 13% and no change in 30%. The authors of this study noted limitations of the small sample size and radiographic deterioration in some joints, despite stabilisation of systemic inflammatory responses. The authors had concluded that further studies with a larger number of participants were needed. The manufacturer also presented data from a study from Japan (Kaneko et al. [2009]), which included 46 patients with a mean age of 4 years who had systemic JIA receiving 8 mg/kg of tocilizumab every 2 weeks. The study noted that markers of systemic inflammation and numbers of tender/swollen joint counts were markedly improved following treatment with tocilizumab. However, progression of joint damage was observed in weight-bearing joints such as hip (85%) and knee (57%), along with growth disturbances and osteopenia. Radiographic progression was not seen in small joints.

3.26 In response to the preliminary recommendations in the appraisal consultation document, in which the Committee was minded not to recommend tocilizumab, the manufacturer also submitted long-term follow-up data, some of which were marked as academic in confidence and therefore are not presented here. Data from one trial in Japan (Yokota et al. [2005]) in which 11 patients were given tocilizumab every 2 weeks and were followed up for 10–35 months showed that one patient withdrew because of duodenal perforation after 10 months. The authors suggested that this could be because of long-term steroid and NSAID use. The most serious adverse events were pneumonia in 2 patients.

3.27 The manufacturer also responded to the Committee's request for clarification on how the CHAQ responses were elicited from the 21 children in the TENDER trial under the age of 5 years. The manufacturer stated that the parents of the children filled in the CHAQ on their behalf.

3.28 The ERG noted that the TENDER trial compared tocilizumab plus standard care with placebo plus standard care. The ERG observed that the comparator in this study did not match that specified in the scope and decision problem. For population 1 (that is, children with systemic JIA that has not responded adequately to prior NSAIDs and systemic corticosteroids) the comparator in the scope is methotrexate. The manufacturer, in its submission, had used a post-hoc analysis to compare patients receiving tocilizumab with those patients in the placebo group also receiving methotrexate. The ERG noted that this was not methodologically acceptable because the trial participants were not originally randomised into those populations. In the TENDER trial, 5% of patients were methotrexate naive. The ERG considered that this population would represent population 1 in the decision problem, but the analyses were inadequate. The ERG thus considered that there was insufficient evidence for any comparison of tocilizumab with methotrexate.

3.29 For population 2 (children with systemic JIA that has not responded adequately to prior NSAIDs, corticosteroids and methotrexate) the manufacturer's original submission provided data for an indirect comparison of tocilizumab with anakinra, using data from the TENDER trial and a trial of anakinra versus placebo. The ERG considered that the 5% of participants in the TENDER trial who were methotrexate naive should be excluded from these analyses. The manufacturer's original submission only provided data for all participants in the TENDER trial. However, in response to the request for clarification, some data were provided in which methotrexate-naive patients were excluded. These data were not reported for the TENDER trial, but only for the indirect comparison with anakinra. When conducting analyses, the ERG used data for this population when possible. For the comparators, the ERG noted that the manufacturer had decided to broaden the inclusion criteria to include all subtypes of juvenile arthritis, not just systemic JIA. The manufacturer had taken this approach because of the lack of clinical evidence for systemic JIA. The ERG was concerned that this approach had been taken despite the manufacturer's clinical specialists stressing the differences between systemic JIA and other subtypes, and advising against comparing the evidence from different JIA populations.

3.30 The ERG also noted the assumption in the original economic model that patients move to a certain ACR response and stay in that state until they either withdraw (move to the next treatment line) or die. The ERG thought that, given the nature of the disease, this assumption was unlikely to be correct.

3.31 The ERG noted the lack of health-related quality-of-life data both in the TENDER trial and in the literature, and recognised that very large assumptions (such as assuming that the CHAQ score of a child is equal to the HAQ score of an adult and that adult EQ-5D is equivalent to the health-related quality of life of a child) were needed to assign a utility to each health state in the model. Because of the lack of data in the trial and the literature, the ERG considered the approach used by the manufacturer to be reasonable and acceptable.

3.32 The ERG noted the revised modelling approach taken by the manufacturer to address the requests of the Committee after the first Appraisal Committee meeting, regarding the issue of mutually exclusive health states. The ERG considered that by defining health states based on a CHAQ score, and using ACR scores to define the transitions, the manufacturer's revised model adheres to common modelling practice. The ERG also noted the manufacturer's approach of linear extrapolation to assigning costs to health states. The ERG did not fully agree with the manufacturer that an individual patient simulation had been performed. However, the ERG was of the opinion that given the purpose of the model, it was an acceptable and practical approach to conduct further economic analyses.

3.33 The ERG questioned the cost estimates for health states in the original model as defined by expert opinion, because they present a cost for non-responders (£3300) that is more than six times higher than the cost for an ACR30 response (£500), whereas an ACR90 response is associated with only a 30% decrease in cost (to £350) compared with an ACR30 response. However, in the manufacturer's revised model the ERG considered the linear extrapolation approach to assigning costs to health states to be acceptable.

3.34 The ERG noted that the sequences of treatments had been reduced from four treatments in the original model to two treatments in the revised model. The ERG further noted that although there are significant problems in estimating the effect of treatments after first line, it would have been better to have considered all options of including up to four treatments. The ERG conducted exploratory analyses of the full incremental analyses that the manufacturer had presented with the patient access scheme. The ERG's analyses showed that infliximab followed by anakinra compared with infliximab alone resulted in an ICER of £15,819 per QALY gained. The tocilizumab then infliximab strategy compared with the infliximab then anakinra strategy produced an ICER of £22,018 per QALY gained. The tocilizumab then anakinra strategy compared with the tocilizumab then infliximab strategy resulted in an ICER of £67,714 per QALY gained.
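
The ERG's results in section 3.34 follow the pattern of a full incremental analysis, in which each strategy is compared with the next less costly strategy that has not been ruled out. The sketch below shows one common way to do this. The strategy labels, costs and QALYs are hypothetical, and the procedure is a generic illustration rather than the ERG's own calculation.

```python
def slope(a, b):
    """ICER of strategy b compared with strategy a; each is (name, cost, qalys)."""
    return (b[1] - a[1]) / (b[2] - a[2])


def incremental_analysis(strategies):
    """strategies maps name -> (cost, qalys).
    Returns [(name, icer_vs_previous)] for the strategies that are not ruled out."""
    ranked = sorted(strategies.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    frontier = []
    for name, (cost, qalys) in ranked:
        if frontier and qalys <= frontier[-1][2]:
            continue                      # dominated: costs at least as much, no extra QALYs
        frontier.append((name, cost, qalys))
    changed = True
    while changed:                        # remove extendedly dominated strategies
        changed = False
        for i in range(1, len(frontier) - 1):
            if slope(frontier[i - 1], frontier[i]) > slope(frontier[i], frontier[i + 1]):
                del frontier[i]
                changed = True
                break
    results = [(frontier[0][0], None)]    # least costly strategy is the reference
    for previous, current in zip(frontier, frontier[1:]):
        results.append((current[0], slope(previous, current)))
    return results


# Hypothetical strategies: name -> (cost in pounds, QALYs).
example = {
    "A": (10_000, 5.0),
    "B": (15_000, 5.25),
    "C": (16_000, 5.125),   # costs more than B and yields fewer QALYs
    "D": (30_000, 5.75),
    "E": (20_000, 5.375),   # higher ICER than the more effective D
}
print(incremental_analysis(example))
```

In this toy example C is dominated by B, and E is removed by extended dominance because its ICER against B is higher than D's ICER against E. The remaining ICERs rise with each step, as they do in the ERG's sequence.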

3.35 Finally, the ERG noted that the revised economic model only allows comparison of two sequences; therefore a probabilistic sensitivity analysis could not be done across all options. The ERG also noted that the manufacturer did not provide the covariance matrix for the regression equation used to determine the correlations between coefficients. The ERG was unable to run a full probabilistic sensitivity analysis. The ERG ran the probabilistic sensitivity analysis twice, once for the infliximab then anakinra strategy compared with infliximab alone, and once for the tocilizumab then infliximab strategy compared with infliximab alone. The ERG noted in its exploratory analyses (which included the patient access scheme and the probabilistic sensitivity analysis) that the tocilizumab then infliximab strategy resulted in an ICER of £32,331 per QALY gained compared with the infliximab then anakinra strategy. The infliximab then anakinra strategy had an ICER of £22,350 per QALY gained compared with infliximab alone.

3.36 Full details of all the evidence are in the manufacturer's submission and the ERG report, which are available from www.nice.org.uk/guidance/TA238