4 Evidence and interpretation

The Appraisal Committee (appendix A) considered evidence from a number of sources (appendix B).

4.1 Clinical effectiveness

Efficacy

4.1.1 The Assessment Group for this appraisal (School of Health and Related Research, University of Sheffield [ScHARR]) reviewed data from published randomised controlled trials (RCTs) in postmenopausal women in which fracture or health-related quality of life was an endpoint and where one of the five drugs of interest was compared with a relevant comparator, such as no treatment, placebo or one of the other included interventions. The majority of studies used placebo or no treatment as a control. Most studies ensured that women in all trial arms had normal calcium levels (that is, normal serum concentrations) or adequate supplementation, and some studies used additional dietary supplementation with vitamin D.

4.1.2 For this appraisal, reductions in RR associated with treatment were pooled regardless of the baseline BMD and fracture status of the participants in the studies. It was also assumed that these reductions in RR remained constant at all ages, although little evidence was available for the effectiveness of the drugs in women aged 80 years or older.

4.1.3 For vertebral fractures, some studies used clinical (that is, symptomatic) fractures as their endpoint whereas others used fractures that were identified radiographically. Vertebral fractures identified radiographically, which are termed 'radiographic fractures' or 'morphometric fractures', include both symptomatic and asymptomatic fractures. There are different definitions of a vertebral radiographic fracture, but those definitions that require a 20% reduction in vertebral height are generally recognised as producing more reliable results than those that require a 15% reduction.

4.1.4 For non-vertebral fracture types, individual data on hip, leg, pelvis, wrist, hand, foot, rib and humerus fractures were sometimes provided, whereas some studies only presented data for all non-vertebral fractures grouped together.

Alendronate

4.1.5 Sixteen RCTs of alendronate in postmenopausal women were included in the assessment report: two studies in women with low or normal BMD; one in women with osteopenia; eight in women with osteopenia or osteoporosis; four in women with osteoporosis; and one in women with established osteoporosis. Overall, 14 studies compared alendronate with placebo or with no treatment. All the studies were conducted in women who had adequate levels of calcium, from either dietary intake or calcium supplementation.

4.1.6 Two studies, one comparing alendronate with oestrogen alone or with oestrogen and alendronate combined, and the other comparing alendronate with teriparatide (which has a marketing authorisation only for secondary and not primary prevention), found no statistically significant differences between the groups in numbers of clinically apparent fractures of any type in women with osteoporosis. However, back pain was reported less frequently by women in the teriparatide group compared with women in the alendronate group (6% versus 19%, p = 0.012).

4.1.7 In addition to the 16 RCTs, a 2-year study demonstrated the equivalence of weekly and daily doses of alendronate, in terms of clinical fracture incidence and gastrointestinal adverse events. However, this study was not included in the analysis because it did not include the specified comparators.

4.1.8 The meta-analysis for alendronate relative to placebo, carried out by the Assessment Group, resulted in an RR of vertebral fracture of 0.56 (95% confidence interval [CI] 0.46 to 0.68, four RCTs, n = 7039), an RR of hip fracture of 0.62 (95% CI 0.40 to 0.98, three RCTs, n = 7455), an RR of wrist fracture of 0.67 (95% CI 0.34 to 1.31, four RCTs, n = 7931) and an RR for other non-vertebral fractures of 0.81 (95% CI 0.68 to 0.97, six RCTs, n = 9973).

4.1.9 A post-hoc analysis of data from the largest study on alendronate, the 'Fracture intervention trial' (FIT) RCT (non-vertebral fracture population), suggested that alendronate may be less effective at reducing fractures in women with T-scores above (that is, better than) −2.5 SD than in women with osteoporosis. These results were not statistically significant.

4.1.10 Gastrointestinal adverse events, including nausea, dyspepsia, mild oesophagitis/gastritis and abdominal pain, were reported in at least one third of the participants in studies of alendronate. However, only one study found the increased frequency of these symptoms to be statistically significant relative to placebo. This is consistent with post-marketing studies that indicate that approximately one third of alendronate users experience gastrointestinal adverse events. To avoid oesophagitis, the summary of product characteristics now recommends that alendronate should be taken on rising for the day, with a full glass of water. It is possible that these instructions were not followed in all of the studies, particularly the earlier ones.

4.1.11 Prescription-event monitoring studies in patients for whom alendronate was prescribed (n = 11,916) by GPs in England demonstrated a high incidence of dyspepsia, particularly in the first month of treatment. Consultations for dyspepsia ranged from 32.2 per 1000 patient-months in the first month of treatment to 10.9 per 1000 patient-months in months 2 to 6. Because these studies lacked a comparator, it is not possible to assess the extent to which these rates of upper gastrointestinal events may be above baseline levels in those not taking bisphosphonates.

4.1.12 One study reported health-related quality of life outcomes. At 12 months there were statistically significant improvements in the alendronate group compared with the control group in scores for pain, social isolation, energy level and physical ability.

Etidronate

4.1.13 Twelve RCTs of etidronate in postmenopausal women were reviewed: three studies in women with low-to-normal BMD; two in women with osteopenia or osteoporosis; one in women with osteoporosis; one in women with osteoporosis or established osteoporosis; and five in women with established osteoporosis. Four studies included active comparators, and eight compared etidronate with placebo or with no treatment (although in six of these, study participants in all arms received calcium, either alone or with vitamin D). Some studies did not use the exact treatment regimen that currently has a UK marketing authorisation (that is, 90-day cycles of etidronate 400 mg/day for 14 days, followed by calcium carbonate 1.25 g/day for the remaining 76 days). None of the studies reported health-related quality of life outcomes.

4.1.14 The meta-analysis of RCTs for etidronate relative to placebo, carried out by the Assessment Group, resulted in an RR of vertebral fracture of 0.40 (95% CI 0.20 to 0.83, three RCTs, n = 341), an RR of hip fracture of 0.50 (95% CI 0.05 to 5.34, two RCTs, n = 180), and an RR for other non-vertebral fractures of 1.04 (95% CI 0.64 to 1.69, four RCTs, n = 410). There were no data for wrist fracture.

4.1.15 An observational study in a general practice setting in the UK reported on fracture rates in people with a diagnosis of osteoporosis who were receiving etidronate compared with those who were not taking a bisphosphonate. People taking etidronate had an RR of non-vertebral fracture of 0.80 (95% CI 0.70 to 0.92). The RR of hip fracture was 0.66 (95% CI 0.51 to 0.85) and that of wrist fracture was 0.81 (95% CI 0.58 to 1.14).

4.1.16 Higher rates of gastrointestinal adverse effects were found in the etidronate groups of four RCTs, although the differences were not always statistically significant. However, non-RCT evidence and testimonies from clinical specialists and patient experts suggested that etidronate may be associated with fewer gastrointestinal adverse effects than other bisphosphonates.

4.1.17 The systematic review carried out by ScHARR in 2006 identified a cohort study conducted in the UK that indicated that etidronate may be associated with a much lower rate of upper gastrointestinal adverse effects than alendronate or risedronate.

Risedronate

4.1.18 Seven RCTs of risedronate in postmenopausal women were reviewed: one study in women with normal BMD; one in women with osteopenia; one in women with osteopenia or osteoporosis; one in women with osteoporosis or specific risk factors for hip fracture, such as a recent fall; and three in women with established osteoporosis. All compared risedronate with placebo (although, with the exception of those in the normal BMD study, all women also received calcium) and none reported on health-related quality of life outcomes.

4.1.19 The meta-analysis for risedronate relative to placebo, carried out by the Assessment Group, resulted in an RR of vertebral fracture of 0.61 (95% CI 0.50 to 0.75, three RCTs, n = 2301), an RR of hip fracture of 0.74 (95% CI 0.59 to 0.93, three RCTs, n = 11,770), an RR of wrist fracture of 0.68 (95% CI 0.43 to 1.08, two RCTs, n = 2439) and an RR for other non-vertebral fractures of 0.76 (95% CI 0.64 to 0.91, five RCTs, n = 12,399).

4.1.20 In all of the studies, rates of gastrointestinal adverse events were similar in the risedronate and placebo groups.

4.1.21 Prescription-event monitoring studies in patients for whom risedronate was prescribed (n = 13,643) by GPs in England suggested a high incidence of dyspepsia, particularly in the first month of treatment. Consultations for dyspepsia ranged from 26.9 per 1000 patient-months in the first month of treatment to 8.1 per 1000 patient-months in months 2 to 6.

Alendronate and risedronate: meta-analysis

4.1.22 A meta-analysis of pooled data from the alendronate and risedronate studies, carried out by ScHARR in 2006, resulted in an RR of vertebral fracture of 0.58 (95% CI 0.51 to 0.67, seven RCTs, n = 9340), an RR of hip fracture of 0.71 (95% CI 0.58 to 0.87, six RCTs, n = 19,233), an RR of wrist fracture of 0.69 (95% CI 0.45 to 1.05, six RCTs, n = 10,370) and an RR for other non-vertebral fractures of 0.78 (95% CI 0.69 to 0.88, 11 RCTs, n = 22,372).
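
As an illustration of how pooling of this kind can work, the sketch below combines the two drug-level vertebral fracture estimates from sections 4.1.8 and 4.1.19 using inverse-variance fixed-effect weighting on the log RR scale. This is an assumption made for illustration only: ScHARR pooled the individual studies, and the pooling model it used is not stated here. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_rr_and_se(rr, ci_low, ci_high):
    """Return (log RR, standard error), recovering the SE from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(rr), se


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (rr, ci_low, ci_high) tuples."""
    weights = []
    weighted_logs = []
    for rr, ci_low, ci_high in estimates:
        log_rr, se = log_rr_and_se(rr, ci_low, ci_high)
        weight = 1.0 / se ** 2  # more precise estimates receive more weight
        weights.append(weight)
        weighted_logs.append(weight * log_rr)
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Vertebral fracture RRs as reported in sections 4.1.8 and 4.1.19.
alendronate = (0.56, 0.46, 0.68)
risedronate = (0.61, 0.50, 0.75)
rr, lo, hi = pool_fixed_effect([alendronate, risedronate])
print(f"pooled RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Under these assumptions the result lies close to the pooled vertebral fracture estimate reported above, although agreement for other fracture sites is not guaranteed because the study-level data are not reproduced here.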

Raloxifene

4.1.23 Three RCTs of raloxifene in postmenopausal women were identified, but only two were included in the Assessment Group's meta-analysis: the largest study (the 'Multiple outcomes of raloxifene evaluation' [MORE] study) was carried out in women with osteoporosis, of whom 37% had a vertebral fracture at entry, and a smaller study was conducted in women with established osteoporosis. Both compared raloxifene with placebo (in both studies, women in both arms received calcium and vitamin D). Both studies examined raloxifene at dosages of 60 mg/day (the dosage specified in the UK marketing authorisation for the treatment of postmenopausal osteoporosis) and 120 mg/day. Neither reported on health-related quality of life outcomes. The mean age of women in the studies was 67–68 years. The MORE study was extended further to assess fracture, breast cancer, and cardiovascular and uterine safety outcomes. A third study examined the additive effect of raloxifene compared with placebo in women with a femoral neck T-score of −2 SD or below, with or without prior fracture, who were also receiving fluoride, calcium and vitamin D. Because of the use of fluoride as a co-intervention, these results were not included in the Assessment Group's meta-analysis.

4.1.24 The meta-analysis for raloxifene relative to placebo, carried out by the Assessment Group, resulted in an RR of vertebral fracture of 0.65 (95% CI 0.53 to 0.79, one RCT, n = 4551), an RR of hip fracture of 1.13 (95% CI 0.66 to 1.96, two RCTs, n = 6971), an RR of wrist fracture of 0.89 (95% CI 0.68 to 1.15, one RCT, n = 6828), and an RR for other non-vertebral fractures of 0.92 (95% CI 0.79 to 1.07, one RCT, n = 6828).

4.1.25 The most serious adverse effect associated with raloxifene was the approximately three-fold increased risk of VTE. Statistically significantly higher incidences of hot flushes, arthralgia, dizziness, leg cramps, influenza-like symptoms, endometrial cavity fluid, peripheral oedema and worsening diabetes were also found with raloxifene compared with placebo. The impact of raloxifene on cardiovascular disease is unclear, but there is evidence that it lowers serum concentrations of fibrinogen as well as both total and low-density lipoprotein (LDL) cholesterol levels (that is, serum concentrations) without increasing high-density lipoprotein (HDL) cholesterol.

4.1.26 The MORE study showed that raloxifene protected against breast cancer, with the RR at 4 years for all types of breast cancer reported as 0.38 (95% CI 0.24 to 0.58), and that for invasive breast cancer as 0.28 (95% CI 0.17 to 0.46).

Strontium ranelate

4.1.27 Three RCTs of strontium ranelate in postmenopausal women were identified: one study in women with osteoporosis and two in women with osteoporosis or established osteoporosis. All three studies compared strontium ranelate with placebo, and provided calcium and vitamin D supplementation to ensure an adequate intake.

4.1.28 The Assessment Group reported the results of a published meta-analysis that gave an RR for vertebral fracture of 0.60 (95% CI 0.53 to 0.69, two RCTs, n = 6551) and an RR for all non-vertebral fractures (including wrist fracture) of 0.84 (95% CI 0.73 to 0.97, two RCTs, n = 6551). Efficacy in reducing the rate of hip fracture was examined in one study; the RR for hip fracture in the whole study population was 0.85 (95% CI 0.61 to 1.19, one RCT, n = 4932). A post-hoc subgroup analysis in women aged 74 or older with a T-score of −2.4 SD or below resulted in an RR for hip fracture of 0.64 (95% CI 0.412 to 0.997, one RCT, n = 1977).

4.1.29 In general, strontium ranelate was not associated with an increased risk of adverse effects and for the most part adverse effects were mild and transient; nausea, diarrhoea and creatine kinase elevations were the most commonly reported. A serious adverse event associated with strontium ranelate treatment was an increased incidence (RR = 1.42) of VTE and pulmonary embolism. This finding has been investigated further with the extension of ongoing studies and by post-marketing surveillance.

4.1.30 One study published results on health-related quality of life outcomes. It reported that strontium ranelate had quality of life benefits compared with placebo, as assessed by the QUALIOST osteoporosis-specific questionnaire and by the general health perception score of the short form (SF)-36 general scale.

Persistence and compliance

Bisphosphonates

4.1.31 Data from 14 RCTs indicated that between 81% and 100% of patients persisted with bisphosphonates in the first year of treatment, with lower rates of persistence of between 51% and 89% in the third year of treatment (eight RCTs).

4.1.32 A prescription-event monitoring study of patients for whom alendronate was prescribed (n = 11,916) by GPs in England indicated that 24% discontinued treatment within 1 year. In a similar study of patients for whom risedronate was prescribed (n = 11,742) in primary care in England, 30% appeared to have discontinued treatment within 6 months. In another 12 studies reviewed, persistence at 1 year ranged from 16% to 90%.

Raloxifene

4.1.33 Paid claims data from the USA suggested that only 18% of women starting raloxifene treatment continued to take their medication uninterrupted, and an investigation of a pharmacy prescription database indicated that only 44% were continuing treatment at the end of year 2.

Strontium ranelate

4.1.34 Compliance data were reported for two RCTs of strontium ranelate and were similar in the strontium ranelate and placebo arms (ranging from 83% to 93%) at up to 3 years.

Acid-suppressive medication and fracture risk

4.1.35 Two cohort and two case–control studies reported on a potential relationship between acid-suppressive medication (proton pump inhibitors or histamine H2 receptor antagonists) and fracture risk. One of the case–control studies, which used the UK General Practice Research Database (GPRD), found that 1 year or more of acid-suppressive medication was associated with an increase in fracture risk. The other case–control study reported a reduction of fracture risk associated with use of histamine H2 receptor antagonists, and that use of other acid-suppressive medication might increase fracture risk. Both studies, however, were unable to demonstrate convincingly that fracture risk was independent of underlying disease that might determine differences in fracture risk.

4.1.36 A prospective cohort study excluded women taking medication for fracture prevention and reported an increase in non-vertebral fracture in those taking acid-suppressive medication compared with those who were not. Findings appeared similar for users of proton pump inhibitors or histamine H2 receptor antagonists, but differences in fracture risk were not statistically significant for those using proton pump inhibitors compared with those not using acid-suppressive medication. One large retrospective cohort study using the UK GPRD compared women taking acid-suppressive medication plus bisphosphonates with those taking bisphosphonates alone. This GPRD study reported an increase in fracture risk for some fracture sites with concomitant use of acid-suppressive medication and bisphosphonates, but a reduction in risk for other fracture sites. The information on patients included in this GPRD study was incomplete and details of adjustments for confounders were not reported. The two cohort studies were not fully published, and their analysis may have been prone to confounding.

Additional submission from the manufacturer of strontium ranelate

4.1.37 Following the Court of Appeal Order of April 2010, NICE requested an additional submission from the manufacturer of strontium ranelate (Servier), setting out their views on the most appropriate estimate of strontium ranelate's efficacy in reducing the rate of hip fracture.

4.1.38 Servier explained that the pivotal phase III RCT (Treatment of Peripheral Osteoporosis Study [TROPOS]) was started before the increased regulatory emphasis on the prevention of hip fracture as a key measure of efficacy of treatments for osteoporosis (because of the significant morbidity associated with hip fracture). TROPOS had not been designed or powered to demonstrate the effect of strontium ranelate treatment on rates of hip fracture. In support of its application for regulatory approval of strontium ranelate, Servier was therefore asked by the European Medicines Agency (EMA) to investigate the efficacy of strontium ranelate in reducing the rate of hip fracture in a post-hoc subgroup analysis of TROPOS participants who met the definition of established osteoporosis (that is, a BMD T-score of −2.5 or below and one or more associated fractures). Instead of the requested subgroup, Servier provided the EMA with data for a different subgroup of trial participants whom they identified as being at high risk for hip fracture. This subgroup comprised women aged 74 or older who had a femoral T-score of −2.4 or below[2]. This subgroup represented 42% of TROPOS participants and had an RR of hip fracture of 0.64 (95% CI 0.412 to 0.997).

4.1.39 Servier described the method used to identify this high-risk subgroup. The placebo arms of two RCTs (TROPOS and another trial designed to assess the efficacy of strontium ranelate in reducing vertebral fractures, Spinal Osteoporosis Therapeutic Intervention [SOTI]) were pooled and the influence on fracture rates of three of the main risk factors for fragility fracture – age, BMD and prior fracture – was explored. Servier found that, in the pooled placebo arms of these two RCTs, prior fracture had no effect on the rate of hip fracture, so this factor was not considered further. To select an age group in which the risk of hip fracture was elevated, Servier investigated various possible age cut-offs, and identified the age at which the difference in the rate of hip fracture between women older and younger than the cut-off was greatest. This process led to the selection of an age cut-off of 74 years. Servier stated that this cut-off was consistent with epidemiological data, in particular a study by Donaldson et al. (1990), which Servier interpreted as showing a rising rate of hip fracture among women in the general population above the age of 74. The selected BMD cut-off was closely aligned to the WHO definition of osteoporosis (a T-score of −2.5 SD or below; see section 2.3). Servier emphasised that, having identified factors related to a high risk of hip fracture by screening the pooled data from the placebo arms of two RCTs, a single post-hoc analysis of the effect of strontium ranelate in this subgroup had been performed, without the need for multiple exploratory analyses of fracture risk reduction adopting different criteria for the subgroup selection.

4.1.40 After Servier had submitted data on efficacy in its chosen subgroup to the EMA, the EMA requested further analyses to confirm the effect of strontium ranelate on the rate of hip fracture. Servier provided additional evidence, including data from longer follow-up periods and analyses of trial participants with demonstrated compliance to treatment. Servier indicated that this additional evidence supported the view that an RR of 0.64 is a valid estimate of the efficacy of strontium ranelate in reducing the rate of hip fracture.

4.1.41 In their additional submission to NICE following the Court of Appeal Order in April 2010, Servier also suggested a hypothesis for a possible increased effect of strontium ranelate in older women: most osteoporosis drugs work by reducing the loss of existing bone, but strontium ranelate also stimulates the creation of new bone. Because the creation of new bone is increasingly impaired as women age, Servier stated that it is possible that strontium ranelate is able to provide additional benefit to older women.

4.1.42 Servier argued that the RR of 0.64 derived from the post-hoc analysis of the high-risk subgroup should be used in cost-effectiveness analyses to quantify the effect of strontium ranelate in reducing the rate of hip fracture because, in its view, it represents a more robust estimate of efficacy than the RR for the whole trial population. Servier stated that, unlike the analysis of the whole trial population, the subgroup analysis was suitably powered to demonstrate the effect of strontium ranelate in reducing the rate of hip fracture. Because of this, in Servier's opinion, the estimate was statistically robust.

4.1.43 Servier's view was that the estimate derived from the high-risk subgroup could be assumed to apply to all women taking strontium ranelate, but it acknowledged issues surrounding extrapolation from the high-risk subgroup to a broader population. Servier therefore indicated that it might also be concluded that the RR of 0.64 could only be applied to a population corresponding to the high-risk subgroup.

Review of Servier's additional submission by the Decision Support Unit

4.1.44 The Decision Support Unit (DSU) was commissioned to review Servier's additional submission, and to comment on the scientific validity of the post-hoc subgroup analysis provided by Servier. The DSU advised that any set of data will show some variation in response to treatment across different subgroups simply by chance. The DSU explained that, because of this, the correct statistical procedure for establishing a subgroup of trial participants with a significantly different response to treatment is via a test for interaction (that is, a formal test, using regression methods, of the hypothesis that the effect is different in one group of participants from that observed in the rest of the trial population). The DSU noted that no such test had been reported by Servier.
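
The idea of a test for interaction can be sketched as follows. The DSU referred to a regression-based test; the sketch below substitutes a simpler approach, a z-test on the difference between two log RRs, which needs only summary estimates. The RR for the high-risk subgroup is taken from section 4.1.38. The RR for the remaining participants was not reported in this section, so the value used is HYPOTHETICAL and serves only to make the example run. Function names are illustrative.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)


def log_rr_and_se(rr, ci_low, ci_high):
    """Return (log RR, standard error), recovering the SE from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(rr), se


def interaction_test(subgroup, complement):
    """z-test of whether two independent log RRs differ; returns (z, two-sided p)."""
    log_a, se_a = log_rr_and_se(*subgroup)
    log_b, se_b = log_rr_and_se(*complement)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


high_risk = (0.64, 0.412, 0.997)  # reported for the subgroup in section 4.1.38
others = (1.10, 0.60, 2.02)       # HYPOTHETICAL: not a figure from the appraisal
z, p = interaction_test(high_risk, others)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these illustrative inputs the difference between subgroups would not reach conventional statistical significance, even though the subgroup's own confidence interval excludes 1. This is the distinction the DSU drew between a significant effect within a subgroup and a significantly different effect between subgroups.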

4.1.45 The DSU stated that the method used by Servier to identify the high-risk subgroup (see section 4.1.39) was logically likely to yield an unduly large relative effect and would therefore lead to a biased estimate of RR. This was because the method used to identify the age cut-off to define the subgroup was 'data-dependent' – that is, most of the data that were used to define the subgroup (the rate of hip fracture in the placebo arm of TROPOS) were also used to estimate the efficacy of strontium ranelate in the selected subgroup. In this way, the rate of hip fracture in the placebo group was certain to be high, relative to other potential age cut-offs, with no guarantee that this was also the case in the strontium ranelate group. Therefore, the DSU stated that the treatment effect implied by the RR derived from the subgroup was likely to be artificially inflated (that is, the RR was likely to be too far below 1).

4.1.46 The DSU also noted that, whilst Servier indicated that there were epidemiological data to support the chosen age cut-off (see section 4.1.39), the study by Donaldson et al. (1990) suggested that the rate of hip fracture rises to a notable level after 75 years of age, not 74.

4.1.47 The DSU advised that Servier's argument of enhanced statistical power in the subgroup analysis was incorrect. The DSU explained that, in an analysis of RR, statistical power is dependent on the number of events (in this case, hip fractures) and that choosing a smaller group of participants will tend to reduce, rather than increase, power unless the RR is markedly greater in that subgroup. Because of this, the DSU disagreed with Servier's claim that the subgroup analysis was 'fully powered'.
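
The dependence of power on the number of events can be illustrated with a rough normal approximation. The sketch assumes equal-sized arms and rare events, so that the variance of the log RR is approximately the sum of the reciprocals of the event counts in each arm. All event counts are HYPOTHETICAL and the function name is illustrative; none of this is taken from TROPOS.

```python
import math
from statistics import NormalDist


def approx_power(control_events, true_rr, alpha=0.05):
    """Approximate power of a two-sided test of RR = 1 (rare events, equal arms)."""
    treated_events = control_events * true_rr  # expected events on treatment
    se = math.sqrt(1.0 / control_events + 1.0 / treated_events)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the estimate falls beyond the critical value in the
    # direction of the true effect (the opposite tail is ignored).
    return NormalDist().cdf(abs(math.log(true_rr)) / se - z_alpha)


# HYPOTHETICAL event counts: 100 control-arm events in the whole population,
# 60 in a subgroup.
whole_population = approx_power(100, 0.85)
subgroup_same_rr = approx_power(60, 0.85)
subgroup_stronger_rr = approx_power(60, 0.64)
print(f"{whole_population:.2f} {subgroup_same_rr:.2f} {subgroup_stronger_rr:.2f}")
```

With the same true RR, the subgroup with fewer events has lower power than the whole population. Power rises in the subgroup only when the assumed effect there is markedly stronger, which matches the DSU's explanation.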

4.1.48 The DSU was asked to comment on the most appropriate approach, from a statistical viewpoint, to the use of data from the whole trial population of TROPOS and the high-risk subgroup, in determining the relative efficacy of strontium ranelate. The DSU responded that, if the relative effect were to be applied to women in the general population, an intention-to-treat analysis of all randomised trial participants would yield the most appropriate estimate of efficacy. The DSU also commented that, if more than one trial is available, a pooled analysis of RRs from the intention-to-treat data of all relevant trials would be preferable. A meta-analysis of the data from SOTI and TROPOS would have provided the most appropriate overall measure of efficacy.

4.1.49 The DSU also advised that even as an estimate of efficacy in the high-risk subgroup, the RR of 0.64 was likely to be too extreme because of the likelihood of selection bias arising from the way in which the subgroup had been identified (see section 4.1.45). The DSU also emphasised that, to estimate the cost effectiveness of strontium ranelate in a particular subgroup, it would not be sufficient simply to adopt an RR of hip fracture from that group. It would also be important to populate the rest of the economic decision model with evidence specific to the subgroup in question.

4.1.50 NICE invited Servier to respond to the DSU's report. Servier provided a document reiterating its previous views that the subgroup analysis performed to evaluate the efficacy of strontium ranelate in reducing the rate of hip fracture was based on sound scientific principles and valid statistical methods. Servier did not respond to other specific issues raised in the DSU report.

4.2 Cost effectiveness

Manufacturers' models

4.2.1 For proprietary alendronate, compared with no treatment, the manufacturer's model provided an incremental cost-effectiveness ratio (ICER) of £8622 per quality-adjusted life year (QALY) gained for 70-year-old women with a T-score below −2.5 SD. The manufacturer's results were more favourable than the results of the Assessment Group's 2003 model. This could be because the manufacturer's model was not adjusted for baseline fracture prevalence, or because it used different utilities for vertebral fractures, different efficacy data, different risk groups and a longer time horizon.
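
For readers unfamiliar with the measure, an ICER is the difference in cost between two strategies divided by the difference in QALYs. The minimal sketch below shows the arithmetic only. The costs and QALYs are HYPOTHETICAL and are not outputs of any model described in this section; the function name is illustrative.

```python
def icer(cost_treatment, cost_comparator, qaly_treatment, qaly_comparator):
    """Incremental cost per QALY gained for treatment versus comparator."""
    delta_cost = cost_treatment - cost_comparator
    delta_qaly = qaly_treatment - qaly_comparator
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when there is no QALY difference")
    return delta_cost / delta_qaly


# HYPOTHETICAL per-woman lifetime totals, in pounds and QALYs.
example = icer(cost_treatment=1500.0, cost_comparator=1000.0,
               qaly_treatment=8.05, qaly_comparator=8.00)
print(f"ICER: about {example:,.0f} pounds per QALY gained")
```

A negative ratio needs separate interpretation, because it can arise either when a treatment is cheaper and more effective or when it is more costly and less effective.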

4.2.2 For etidronate, compared with no treatment, the manufacturer's model provided an ICER of £18,634 per QALY gained for 70-year-old women with a T-score below −2.5 SD. The manufacturer's model included morphometric vertebral fractures and corticosteroid use as risk factors for further fractures. It is unclear whether the manufacturer's ICER was for women with or without a prior osteoporotic fragility fracture.

4.2.3 For risedronate, compared with no treatment, the manufacturer provided data from two models. The ICER derived from the manufacturer's own model was £577 per QALY gained for women aged 74 years. In the second model provided by the manufacturer, which was commissioned from an external body, the ICER was more than £35,000 per QALY gained for all women without a prior osteoporotic fragility fracture and with a T-score of −2.5 SD. However, for women at slightly higher risk of fracture and aged 70 years or older, the corresponding ICER was £13,500 per QALY gained or less. The ICER calculated using the manufacturer's own model was difficult to verify from the information given. The ICERs generated by the second model were more consistent with the figures provided by the Assessment Group's 2003 model, although they did differ somewhat. This may be because of different cost and RR inputs.

4.2.4 For raloxifene, compared with no treatment, the manufacturer provided data for different age groups and different risk levels. All of the analyses included the breast cancer benefits. It was not clear how the different risk levels were defined. The ICERs ranged from £12,000 to £22,000 per QALY gained, and were more favourable than the Assessment Group's 2003 analysis, even when the Assessment Group included the breast cancer benefits. In the Assessment Group's 2003 model, the RR for the breast cancer effect was higher (0.38) than the RR for invasive breast cancer used in the manufacturer's model (0.28), and the breast cancer risk was adjusted for the association between low BMD and decreased risk of breast cancer. Additionally, the manufacturer's model was not adjusted for baseline fracture prevalence, and included different utilities for vertebral fractures, different efficacy data, different risk groups, and a longer time horizon than the Assessment Group's model.

4.2.5 For strontium ranelate, compared with no treatment, the manufacturer provided a model developed by an external organisation. The ICER was £45,028 per QALY gained for 65-year-old women with a T-score of −2.5 SD and £26,686 per QALY gained for 80-year-old women with a T-score of −2.5 SD. The manufacturer's results were more favourable than the Assessment Group's 2005 results because different modelling assumptions were used. For example, fewer health-state transition possibilities were incorporated. Compared with the Assessment Group's model, the manufacturer's model used more favourable efficacy data for hip fracture from the post-hoc 'high-risk' subgroup of women (see sections 4.1.28 and 4.1.38 to 4.1.43), and slightly more favourable efficacy data for wrist and proximal humerus fracture. Higher hip-fracture costs were used in the manufacturer's model.

The Assessment Group's model

4.2.6 The Assessment Group provided a cost–utility model with two components (described in detail in the 2005 Strontium Ranelate Assessment Report). As a first step, the model calculated absolute fracture risk from the epidemiological literature on a number of independent clinical risk factors. These data were prepared under the auspices of the WHO and were provided for this appraisal under an academic-in-confidence agreement. As a second step, the model applied RR reductions for fracture taken from the meta-analysis described in section 4.1.22. A single estimate of efficacy was used for alendronate and risedronate based on pooled data for these two drugs. Following advice from the original Osteoporosis Guideline Development Group[3], it was assumed that RRs remained constant across all ages, T-scores and fracture status. The most recent analyses carried out by ScHARR were based on the price of non-proprietary alendronate in February 2008 (£53.56 per year for once-weekly 70 mg tablets; £108.20 per year for daily 10 mg tablets).
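
The two-step structure described above (an absolute baseline risk, then an RR applied to it) can be sketched as simple arithmetic. The baseline risk used below is HYPOTHETICAL, because the WHO risk data were supplied in confidence and are not reproduced here. The RR and drug price are those quoted in sections 4.1.22 and 4.2.6. Function names are illustrative.

```python
ANNUAL_DRUG_COST_GBP = 53.56  # once-weekly alendronate 70 mg, February 2008
POOLED_HIP_RR = 0.71          # pooled alendronate and risedronate estimate


def treated_risk(baseline_risk, rr):
    """Second step of the model: scale the absolute baseline risk by the RR."""
    return baseline_risk * rr


def fractures_avoided_per_1000(baseline_risk, rr):
    """Absolute risk reduction per 1000 women treated for one year."""
    return 1000 * (baseline_risk - treated_risk(baseline_risk, rr))


# HYPOTHETICAL annual hip-fracture risk of 2%; not a figure from the appraisal.
baseline = 0.02
avoided = fractures_avoided_per_1000(baseline, POOLED_HIP_RR)
drug_cost = 1000 * ANNUAL_DRUG_COST_GBP
print(f"{avoided:.1f} hip fractures avoided per 1000 women; "
      f"drug cost {drug_cost:,.0f} GBP")
```

Because the RR is held constant, the absolute benefit scales directly with baseline risk. This is why the same drug can appear cost effective in a high-risk group and not in a low-risk one.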

4.2.7 All osteoporotic fragility fractures in women aged 50 years or older were included in the modelling. The RR for hip fracture was assumed to apply also to pelvis and other femoral fractures. The RR for non-vertebral fracture was assumed to apply also to proximal humerus, rib, sternum, scapula, tibia, fibula and wrist fractures. Where confidence intervals for RRs spanned unity, it was assumed that there was no effect of treatment, except in the case of strontium ranelate. In this case, an RR of 0.85 for hip fracture was used to acknowledge the effect reported in the high-risk subgroup of the study. The model used UK-specific epidemiological data on femoral neck BMD.

4.2.8 The model assumed an initial utility in the year of fracture and a higher utility in subsequent years. The time horizon for predicting morbidity was 10 years, consisting of 5 years of treatment with sustained efficacy plus 5 years of linear decline to no effect. However, treatment-related decreases in mortality rate extended beyond the 10-year time horizon. For this, the life expectancy for a woman at the threshold T-score for osteoporosis was calculated from standard life tables, and any increase in mortality rate due to fracture would continue until death or an age of 110 years. In the base case, vertebral-fracture utility was assumed to be lower than hip-fracture utility, and a sensitivity analysis was carried out in which the utility for vertebral fracture was assumed to be the same as that for hip fracture. The percentage of women assumed to move from community living to a nursing home following a hip fracture increased with increasing age. An age-dependent gradient of hip-fracture risk was used, and an association between vertebral or proximal humerus fracture and increased mortality in women with osteoporosis was included. No follow-up BMD scans were included in the model; this reflects current clinical practice in the UK.
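The efficacy profile described in section 4.2.8 (5 years of sustained efficacy followed by 5 years of linear decline to no effect) can be sketched as below. This is a minimal illustration only: the function name, the use of annual cycles and the way the linear decline is split into equal yearly steps are assumptions made here, not details taken from the Assessment Group's model. The RR of 0.85 in the usage lines is the hip-fracture value quoted for strontium ranelate in section 4.2.7.

```python
def relative_risk_in_year(rr_on_treatment: float, year: int,
                          treatment_years: int = 5,
                          offset_years: int = 5) -> float:
    """Return the fracture RR applied in a given model year (1-based).

    Assumed discretisation: full effect during treatment, then the
    treatment effect (1 - RR) shrinks by an equal amount each year
    until it reaches zero at the end of the offset period.
    """
    if year < 1:
        raise ValueError("year must be 1 or greater")
    full_effect = 1.0 - rr_on_treatment
    if year <= treatment_years:
        remaining = 1.0                      # sustained efficacy
    elif year <= treatment_years + offset_years:
        years_off = year - treatment_years   # 1..offset_years
        remaining = 1.0 - years_off / offset_years
    else:
        remaining = 0.0                      # no residual effect
    return 1.0 - full_effect * remaining


# Usage: RR of 0.85 during treatment, returning to 1.0 by year 10.
profile = [relative_risk_in_year(0.85, y) for y in range(1, 12)]
```

Under these assumptions the RR stays at 0.85 for years 1 to 5 and then rises in equal steps to 1.0 (no effect) in year 10.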

4.2.9 The model included an assumption about the costs and disutility associated with treatment-related side effects for all drugs, based on the findings of prescription-event monitoring studies in patients treated with alendronate. For the base case, the model assumed 50% persistence with treatment. In addition to the base case, the Assessment Group undertook a number of sensitivity analyses using alternative assumptions, including: persistence with treatment (25% or 75% at 5 years); reduction in the efficacy of the drugs at reducing the risk of fracture associated with risk factors other than age, prior fracture and low BMD to 0% or 50% (with a consequent upward adjustment of the RR for the risk factors of age, prior fracture and low BMD); disutility of vertebral fracture; updated fracture costs; and the disutility and costs of treatment-related side effects. It was assumed that women who experience bisphosphonate-related side effects had 91% of the utility of women who do not have such side effects. In the base case analysis for all of the drugs under consideration this was applied to 2.35% of women in the first treatment month and 0.35% of women thereafter and, in sensitivity analyses for bisphosphonates, to 24% of women in the first treatment month and 3.5% of women thereafter. In the case of strontium ranelate, the effect on VTE was not included in the model. Discount rates of 6% per year for costs and 1.5% per year for health benefits were applied, in accordance with NICE methods relevant to this appraisal.
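To illustrate how the differential discount rates in section 4.2.9 operate, the sketch below discounts a stream of annual costs at 6% and a stream of annual QALYs at 1.5%. It assumes annual discounting with the first year left undiscounted. That timing convention, the function name and the QALY figures are illustrative assumptions and are not taken from the Assessment Group's model. The annual cost is the February 2008 price of once-weekly alendronate quoted in section 4.2.6.

```python
COST_DISCOUNT_RATE = 0.06      # per year, costs (section 4.2.9)
BENEFIT_DISCOUNT_RATE = 0.015  # per year, health benefits (section 4.2.9)


def present_value(annual_values, rate):
    """Discount a list of annual values; year 0 is not discounted (assumption)."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(annual_values))


# Drug cost only, for 5 years of once-weekly alendronate at 53.56 GBP per year.
drug_costs = [53.56] * 5
# Hypothetical QALY gains per year over a 10-year horizon (not from the model).
qaly_gains = [0.002] * 10

discounted_cost = present_value(drug_costs, COST_DISCOUNT_RATE)
discounted_qalys = present_value(qaly_gains, BENEFIT_DISCOUNT_RATE)
```

Because costs are discounted at a higher rate than benefits, later costs lose weight faster than later health gains, which tends to favour interventions whose benefits accrue over a long period.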

4.2.10 For raloxifene, 4-year follow-up data from the MORE study were used, and it was assumed that women with low BMD have a lower breast cancer risk than women with normal BMD. The cost effectiveness was modelled excluding the breast cancer benefit, the risk of VTE and the effect on cardiovascular events.

4.2.11 The independent clinical risk factors for fracture used in the model were based on the data prepared under the auspices of the WHO (see section 4.2.6) and included BMI, prior fracture, previous or current use of corticosteroids, parental history of fracture, current smoking, alcohol intake of more than 2 units per day, and rheumatoid arthritis. The study provided prevalence data for the different risk factors, and risk ratios for hip fracture and osteoporotic fracture for each risk factor, including T-score and age. Using these risk ratios, absolute risk of fracture was calculated.

4.2.12 The estimates of cost effectiveness were generated for different levels of absolute risk derived from a large number of combinations of T-score (in bands 0.5 SD wide), age and number of independent clinical risk factors for fracture. For practical reasons relating to the number of potential combinations, single-point RRs of fracture, calculated from the log-normal efficacy distributions, were used in the model. Results were presented for population groups categorised according to age, T-score and number of independent clinical risk factors.

4.2.13 As women without fracture do not usually present to clinicians, the Assessment Group also estimated the impact that the costs of identifying women at risk would have on the cost effectiveness of the drugs. This required both a calculation of the ICER for treatment, and a calculation of the distribution of risk assessment cost over the population who would benefit from treatment. A net-benefit approach was used to do this. The net-benefit approach is analogous to the more traditional cost per QALY gained approach, but also requires a value of willingness to pay (WTP) for an additional QALY gained. For the calculation of the net benefit of an intervention, the WTP is first multiplied by the incremental QALY gained associated with the intervention, then the incremental cost associated with the intervention is subtracted. For this appraisal, the total net benefit for each age group and DXA scanning approach was calculated by subtracting the cost of DXA scanning from the net benefit of treating all women who can be treated cost effectively.
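The net-benefit calculation described above can be written compactly as follows. The symbols are introduced here purely for illustration and do not appear in the assessment reports: λ is the WTP per QALY gained, ΔQ and ΔC are the incremental QALYs and incremental cost of treatment, n_g is the number of women in group g, G is the set of groups that can be treated cost effectively, and C_DXA is the total cost of DXA scanning for the age group and scanning approach in question.

```latex
\[
  \mathrm{NB} = \lambda\,\Delta Q - \Delta C ,
  \qquad
  \mathrm{NB} \ge 0 \iff \frac{\Delta C}{\Delta Q} \le \lambda
  \quad (\Delta Q > 0)
\]
\[
  \mathrm{NB}_{\text{total}}
  = \sum_{g \in G} n_g \, \mathrm{NB}_g \; - \; C_{\mathrm{DXA}}
\]
```

The first line shows why the approach is analogous to a cost per QALY gained threshold: when treatment produces a QALY gain, a non-negative net benefit is equivalent to an ICER at or below the WTP value.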

4.2.14 A stepped net-benefit approach was used to estimate, in reverse order, the cost effectiveness of risk assessment, DXA scanning and treatment of women without a prior fracture. A WTP value of £20,000 per QALY gained was applied in the modelling.

  • Step 1. ICERs for treatment versus no treatment were calculated for each intervention for various combinations of age, T-score and number of independent clinical risk factors for fracture (see section 4.2.11). The net benefit of treatment per woman was calculated using the following formula: Net benefit = (£20,000 × incremental QALYs gained) – incremental costs. For women for whom the ICER for treatment was more than £20,000 per QALY gained, the net benefit was set to zero.

  • Step 2. The net benefit per woman was multiplied by the number of women in the population estimated to fall within each combination of age, T-score and number of independent clinical risk factors for fracture (based on the data used to develop the algorithm prepared for the WHO). The net benefits for each group were then added together to give a total net benefit of treatment for women with no, one, two or three independent clinical risk factors within each age group.

  • Step 3. The cost of DXA scanning all of the women in each age/independent clinical risk factor group was subtracted from the net benefit of treatment for that group (calculated as described in step 2). This provides the net benefit of treatment and DXA scanning for the group, assuming that the number of independent clinical risk factors is known. A positive net benefit indicates that DXA scanning of women in that age/independent clinical risk factor group and treating those groups of women in whom the ICER for treatment is £20,000 per QALY gained or less provides an ICER for the entire strategy of less than £20,000 per QALY gained.

  • Step 4. When the resulting values of net benefit of treatment and scanning were negative they were set to zero. For each age group, the total net benefit of scanning and treatment was calculated by adding together the net benefits for each age/independent clinical risk factor group. The cost of opportunistic assessment for all women in this age group was then subtracted to give the net benefit of risk assessment, scanning and treatment. A positive net benefit indicates an ICER of less than £20,000 per QALY gained for risk assessment, DXA scanning and treating women (at a specific T-score related to the ICER for treatment only) of that particular group. Cost per QALY gained data were presented for each strategy.
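Steps 1 to 4 can be sketched for a single age group as below. This is a minimal illustration under stated assumptions, not the Assessment Group's implementation: the data layout, the function names and all numerical inputs (population counts, incremental QALYs and costs, DXA cost and assessment cost per woman) are hypothetical. Only the WTP value of £20,000 and the order of the steps come from the text.

```python
WTP = 20_000.0  # GBP per QALY gained (section 4.2.14)


def net_benefit_per_woman(inc_qalys, inc_cost, wtp=WTP):
    """Step 1: net benefit of treatment, set to zero when not cost effective.

    For a positive QALY gain, an ICER above the WTP is the same as a
    negative net benefit, so flooring at zero applies the rule in step 1.
    """
    return max(wtp * inc_qalys - inc_cost, 0.0)


def stepped_net_benefit(crf_groups, dxa_cost, assessment_cost):
    """Net benefit of risk assessment, DXA scanning and treatment for one age group.

    crf_groups maps the number of clinical risk factors to a list of
    (n_women, inc_qalys, inc_cost) tuples, one per T-score band.
    """
    total = 0.0
    n_age_group = 0
    for bands in crf_groups.values():
        n_group = sum(n for n, _, _ in bands)
        # Step 2: population-weighted net benefit of treatment.
        treatment_nb = sum(n * net_benefit_per_woman(q, c) for n, q, c in bands)
        # Step 3: subtract the cost of scanning every woman in the group.
        scan_and_treat_nb = treatment_nb - dxa_cost * n_group
        # Step 4: negative values are set to zero before summing.
        total += max(scan_and_treat_nb, 0.0)
        n_age_group += n_group
    # Step 4 (continued): subtract assessment cost for all women in the age group.
    return total - assessment_cost * n_age_group


# Hypothetical inputs for one age group (none of these values are from the model).
example_groups = {
    0: [(1000, 0.001, 50.0), (200, 0.01, 100.0)],
    1: [(300, 0.02, 100.0)],
}
result = stepped_net_benefit(example_groups, dxa_cost=35.0, assessment_cost=2.0)
```

In this made-up example, scanning women with no clinical risk factors costs more than the net benefit of treating the subset who can be treated cost effectively, so that group contributes zero. A positive overall result corresponds to an ICER of less than £20,000 per QALY gained for the whole strategy.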

The Assessment Group's model: results for alendronate

4.2.15 First, the Assessment Group calculated ICERs (cost per QALY gained for alendronate compared with no treatment) without identification costs for all combinations of age, T-score and number of independent clinical risk factors for fracture. The cost per QALY gained, compared with no treatment, became more favourable with increasing age and number of independent clinical risk factors, and decreasing T-score (that is, with increasing annual absolute risk of fracture).

4.2.16 Then, the Assessment Group presented the results of the economic analyses in the form of identification and treatment strategies (based on age, T-score and number of independent clinical risk factors for fracture) that resulted in an ICER of £20,000 or less (cost per QALY gained compared with no treatment). The analyses shown below included the following assumptions: persistence at 5 years set to 50%; the efficacy of bisphosphonates on fracture risks associated with factors other than age, BMD and prior fracture status set to 50% of that observed for the total population in the trials (with a consequent upward adjustment of the RR associated with age, BMD and prior fracture); costs set to health resource group values including home-help costs; utility multiplier associated with vertebral fracture set to 0.792 in the first year of fracture and 0.909 in subsequent years (as for hip fracture); costs of bisphosphonate-related gastrointestinal symptoms incurred over 5 years; utility multiplier associated with bisphosphonate-related gastrointestinal symptoms set to 0.91 (included utility losses for non-compliant patients); and alendronate at a cost of £53.56 or £108.20 per year.

4.2.17 For alendronate priced at £53.56 per year (once-weekly treatment), and when assuming that 24% of women in the first treatment month and 3.5% of women thereafter experienced bisphosphonate-related side effects, the model produced the following results:

  • A strategy of risk assessment, DXA scanning and treatment with alendronate in women younger than 65 years resulted in an ICER of more than £20,000 per QALY gained.

  • A strategy of risk assessment, DXA scanning and treatment with alendronate in women who are confirmed to have osteoporosis (that is, a T-score of −2.5 SD or below) resulted in an ICER of less than £20,000 per QALY gained for all women aged 70 years or older, and for women aged 65–69 years who have an independent clinical risk factor for fracture.

4.2.18 In a sensitivity analysis for alendronate priced at £53.56 per year (with other assumptions as in sections 4.2.16 and 4.2.17), acid-suppressive medication was assumed to affect fracture risk. The data inputs for this were taken from one GPRD study (see section 4.1.35) and represent the midpoint values pooled for patients using acid-suppressive medication. This sensitivity analysis produced the following results:

  • A strategy of risk assessment, DXA scanning and treatment with alendronate in women younger than 70 years resulted in an ICER of more than £20,000 per QALY gained.

  • A strategy of risk assessment, DXA scanning and treatment with alendronate in women who are confirmed to have osteoporosis (that is, a T-score of −2.5 SD or below) resulted in an ICER of less than £20,000 per QALY gained for all women aged 70 years or older.

    The ICER for treatment with alendronate (but excluding identification costs) for a woman aged 70–74 years with a T-score of −2.5 SD (using the assumptions described in sections 4.2.16 and 4.2.17) was £5496 per QALY gained without acid-suppressive medication and £13,236 per QALY gained with acid-suppressive medication. If this woman has an independent clinical risk factor for fracture, the ICERs would be £1567 per QALY gained without and £7727 per QALY gained with acid-suppressive medication.

4.2.19 For alendronate priced at £108.20 per year (daily treatment), and when assuming that 24% of women in the first treatment month and 3.5% of women thereafter experienced bisphosphonate-related side effects, the model produced the following results:

  • A strategy of risk assessment, DXA scanning and treatment with alendronate in women younger than 70 years resulted in an ICER of more than £20,000 per QALY gained.

  • A strategy of risk assessment, DXA scanning and treatment with alendronate in women who are confirmed to have osteoporosis (that is, a T-score of −2.5 SD or below) resulted in an ICER of less than £20,000 per QALY gained for all women aged 75 years or older and for women aged 70–74 years who have an independent clinical risk factor for fracture. For women aged 70–74 years but with no independent clinical risk factor, the T-score needs to be −3 SD or below to give an ICER of less than £20,000 per QALY gained.

The Assessment Group's model: results for other drugs

4.2.20 Risedronate, raloxifene and strontium ranelate were dominated by alendronate (based on the price of £53.56 per year for alendronate); that is, these three drugs have a higher acquisition cost than alendronate, but are not more efficacious. Analyses were conducted as for alendronate (see section 4.2.16). For risedronate, base-case assumptions for bisphosphonate-related side effects were modelled; that is, 2.35% of women in the first treatment month and 0.35% thereafter experienced side effects (see section 4.2.9). In addition, a sensitivity analysis was performed, using the assumption that 24% of women in the first treatment month and 3.5% of women thereafter experienced bisphosphonate-related side effects. For raloxifene and strontium ranelate, base-case assumptions for side effects were used. In previous economic modelling and before the most recent price reduction for non-proprietary alendronate, etidronate's cost effectiveness was comparable to that of non-proprietary alendronate, but the calculations were based on a weaker clinical evidence base than for alendronate. Therefore, the modelling for etidronate was not updated after the most recent price reduction for alendronate.

4.2.21 For risedronate, raloxifene and strontium ranelate, additional analyses were conducted to explore identification and treatment strategies that could be cost effective for these interventions when compared with no intervention. All results showed less favourable cost effectiveness than non-proprietary alendronate. For example, for women aged 65–69 years with an independent clinical risk factor for fracture, the ICERs (without considering costs related to risk assessment and DXA scanning) for risedronate and strontium ranelate (each compared with no treatment) were more than £45,000 and £90,000 per QALY gained, respectively. For these women, treatment with weekly non-proprietary alendronate, including risk assessment and DXA scanning costs, resulted in an ICER of less than £20,000 per QALY gained.

The Assessment Group's model: results for other drugs in second-line use

4.2.22 Further analyses were carried out assuming second-line use; that is, costs for risk assessment or DXA scanning were excluded because BMD was assumed to be known from the first-line management.

4.2.23 In the economic modelling carried out for this appraisal in 2006, lower ages and higher T-scores resulted in ICERs of less than £20,000 per QALY gained for etidronate compared with risedronate; that is, etidronate was more cost effective than risedronate. Because of the concerns expressed about the weaker clinical evidence base for etidronate, the modelling for this bisphosphonate was not updated.

4.2.24 For risedronate in second-line use, when assuming that 2.35% of women in the first treatment month and 0.35% of women thereafter experienced bisphosphonate-related side effects, the model produced the following results:

  • Treatment with risedronate in women younger than 65 years resulted in an ICER of more than £20,000 per QALY gained.

  • Treatment with risedronate in women who have the combinations of T-score, age and number of independent clinical risk factors for fracture indicated in the table below resulted in an ICER of less than £20,000 per QALY gained. Including women aged 65–69 years with no independent clinical risk factors for fracture increased the ICER to more than £20,000 per QALY gained.

T-scores (SD) at (or below) which risedronate in second-line use resulted in an ICER of less than £20,000 per QALY gained

Columns show the number of independent clinical risk factors for fracture (section 1.5).

| Age (years) | 0 | 1 | 2 |
| --- | --- | --- | --- |
| 65–69 | a | −3.5 | −3.0 |
| 70–74 | −3.5 | −3.0 | −2.5 |
| 75 or older | −3.0 | −3.0 | −2.0b |

a ICER more than £20,000 per QALY gained.

b Women with osteopenia are not included in the guidance (see sections 1 and 4.3.6).

4.2.25 For raloxifene in second-line use, the model produced the following result:

  • Treatment with raloxifene in women of any age resulted in an ICER of more than £20,000 per QALY gained.

4.2.26 For strontium ranelate in second-line use, the model produced the following results:

  • Treatment with strontium ranelate in women younger than 65 years resulted in an ICER of more than £20,000 per QALY gained.

  • Treatment with strontium ranelate in women who have the combinations of T-score, age and number of independent clinical risk factors for fracture indicated in the table below resulted in an ICER of less than £20,000 per QALY gained. Including women aged 65–69 years with no independent clinical risk factors for fracture increased the ICER to more than £20,000 per QALY gained.

T-scores (SD) at (or below) which strontium ranelate in second-line use resulted in an ICER of less than £20,000 per QALY gained

Columns show the number of independent clinical risk factors for fracture (section 1.5).

| Age (years) | 0 | 1 | 2 |
| --- | --- | --- | --- |
| 65–69 | a | −4.5 | −4.0 |
| 70–74 | −4.5 | −4.0 | −3.5 |
| 75 or older | −4.0 | −4.0 | −3.0 |

a ICER more than £20,000 per QALY gained.
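The two second-line threshold tables (sections 4.2.24 and 4.2.26) can be read as a simple lookup, sketched below. This is an illustration of how the tables are structured, not a clinical decision tool. The dictionary layout and function names are assumptions; the threshold values are transcribed from the tables, with `None` standing for footnote a (ICER more than £20,000 per QALY gained). The tables give no values for more than two risk factors, so the sketch rejects such inputs rather than guessing.

```python
# T-score thresholds (SD) for 0, 1 and 2 independent clinical risk factors.
# None means the ICER was more than 20,000 GBP per QALY gained (footnote a).
THRESHOLDS = {
    "risedronate": {
        "65-69": (None, -3.5, -3.0),
        "70-74": (-3.5, -3.0, -2.5),
        # -2.0 falls in the osteopenia range, which the guidance
        # does not cover (footnote b).
        "75+": (-3.0, -3.0, -2.0),
    },
    "strontium ranelate": {
        "65-69": (None, -4.5, -4.0),
        "70-74": (-4.5, -4.0, -3.5),
        "75+": (-4.0, -4.0, -3.0),
    },
}


def age_band(age):
    """Map an age in whole years to a table row, or None if younger than 65."""
    if age < 65:
        return None
    if age <= 69:
        return "65-69"
    if age <= 74:
        return "70-74"
    return "75+"


def icer_below_20000(drug, age, t_score, n_risk_factors):
    """Return True if the tables report an ICER below 20,000 GBP per QALY."""
    if n_risk_factors not in (0, 1, 2):
        raise ValueError("tables cover 0, 1 or 2 risk factors only")
    band = age_band(age)
    if band is None:
        return False  # younger than 65: ICER above 20,000 for both drugs
    threshold = THRESHOLDS[drug][band][n_risk_factors]
    return threshold is not None and t_score <= threshold
```

For example, a woman aged 72 years with one risk factor and a T-score of −3.0 SD meets the risedronate threshold (−3.0) but not the strontium ranelate threshold (−4.0).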

4.2.27 If it was assumed that acid-suppressive medication affects fracture risk, the ICER for treatment with risedronate (compared with no treatment, but excluding identification costs) for a woman aged 75 years with a T-score of −3 SD increased from £16,374 to £23,351 per QALY gained (using base-case assumptions about side effects). The corresponding ICER for strontium ranelate was £37,880 per QALY gained compared with no treatment (using base-case assumptions about side effects). For a woman aged 75 years with a T-score of −3.5 SD and one independent clinical risk factor for fracture, the ICER for risedronate increased from £5116 to £10,505 per QALY gained when acid-suppressive medication was assumed to affect fracture risk (using base-case assumptions about side effects). The corresponding ICER for strontium ranelate was £20,935 per QALY gained compared with no treatment (using base-case assumptions about side effects).

Consultee comments on the Assessment Group's economic model

4.2.28 Following the outcome of the judicial review and the court ruling of March 2009, NICE was able to offer the Assessment Group's executable economic model for consultation. Consultees and commentators who requested the model and returned the necessary confidentiality undertakings received a CD-ROM containing the executable version of the economic model, a document with instructions for running the model and a pro-forma for commenting on the model. Comments on the Assessment Group's model were received from Servier Laboratories (the manufacturer of strontium ranelate), the Bone Research Society (BRS), the National Osteoporosis Society (NOS) and the Society for Endocrinology. Comments received from each of these consultees are summarised in the sections below.

4.2.29 These four consultees expressed the view that the documentation provided with the Assessment Group's model was insufficient, that the model supplied to them was incomplete and that some inputs could not be altered. They also stated that the application of the fracture risk algorithm developed under the auspices of the WHO could not be assessed. They felt that the model could not be validated and that its validity had not been demonstrated in documents made available during development of the guidance.

4.2.30 Servier commented that the fracture risks entered in the Assessment Group's model differed from estimates that Servier calculated using the FRAX fracture risk calculation tool (see section 4.3.47 for further information about the FRAX tool). Servier commented that mortality risk associated with clinical risk factors had been omitted from the model. In Servier's opinion, these differences called into question whether the WHO fracture risk algorithm had been applied correctly in the Assessment Group's model.

4.2.31 Other comments questioned the use of a fixed value for BMI in the model. Consultees commented that no clear explanation was provided of the rationale for the choice of BMI value, that a range of BMI values should have been used, and that the use of a fixed BMI value resulted in underestimation of the cost effectiveness of treatment for some women at risk of fracture.

4.2.32 Servier commented on the selection and weighting of the independent clinical risk factors for fracture used in the Assessment Group's model. Servier, BRS and NOS suggested that the risk associated with alcohol intake was incorrect in the model and that this would have adversely affected estimation of the cost effectiveness of treatment for women at risk of fracture. They suggested that a threshold alcohol intake of 3 or more units per day, as used in the FRAX fracture risk calculation tool, should have been applied. They also stated that the Assessment Group's model and the guidance were inconsistent with each other, and that these differences resulted in the risk of fracture being underestimated in the model. Servier also noted that the Assessment Group's model gave an equal weighting to each of the independent clinical risk factors for fracture. Servier suggested that this was a less precise approach than that used in the FRAX tool, which used different weightings (some higher and some lower than those in the Assessment Group's model) for each fracture risk for specific risk factors. Servier stated that the FRAX tool assesses fracture risk and cost effectiveness more accurately and 'deals more fairly' with variation between women at risk of fracture. Servier also noted that one of the risk multipliers for fracture risk included in the Assessment Group's model was not consistent with that given in the assessment report.

4.2.33 Servier and NOS noted that the Assessment Group's model had a time horizon limited to 10 years and criticised how mortality beyond 10 years had been taken into account in the economic evaluation. Servier expressed the view that, as a consequence of this, the model was inaccurate and underestimated the cost effectiveness of treatment for women at risk of fracture. Servier also identified two values ('wristbonusat2.5' and 'phbonusat2.5', related to QALY calculations) that were included in the model, but not described in assessment reports.

4.2.34 Servier commented that using the same disutility for side effects associated with strontium ranelate and bisphosphonates was not correct, as the side effects of strontium ranelate are different from those of the bisphosphonates.

4.2.35 BRS and NOS thought that the proportion of women with low BMD in England and Wales was substantially underestimated in the Assessment Group's model. These consultees were also concerned that although both smoking and previous or current glucocorticoid (corticosteroid) use had been included as additional independent clinical risk factors for fracture in the Assessment Group's model, they were not defined as risk factors in the recommendations (see section 1.5). In addition, both consultees thought that interactions between several clinical risk factors were not incorporated in the model, thereby reducing the cost effectiveness of treatment for women at risk of fracture, especially younger women.

4.2.36 All four consultees commented on elements of the Assessment Group's economic evaluation that had been considered and agreed by the Appraisal Committee before it directed the Assessment Group to develop the economic model using specific assumptions. These Committee-directed assumptions included the compliance rate, costs associated with fracture, utility values used for vertebral fracture, and the strategy for identifying women at high risk. Servier also commented on the discount rates used in the model.

4.2.37 Servier reported that it had prepared a 'comparative' model which was run using assumptions similar to those in the Assessment Group's model. This model was referred to in a report to support the mathematical foundation of revised analyses discussed as part of Servier's comments on the Assessment Group's model. This report was made available to the Appraisal Committee to inform its consideration of comments by Servier on the DSU report (see below and section 4.3).

Decision Support Unit (DSU) report on consultee comments on the Assessment Group's economic model

4.2.38 The DSU was commissioned to review the comments from consultees on the Assessment Group's executable economic model and report to the Appraisal Committee. The DSU considered issues that were relevant to the economic model. Key issues were grouped under the common themes of model transparency and ability to assess its validity, methodology (approach) and model inputs.

4.2.39 The DSU assessed comments on the transparency and validity of the Assessment Group's model. Consultees had observed that some model inputs were fixed, and had expressed the view that the model provided for consultation was incomplete and not fully executable. The DSU confirmed that certain inputs were intentionally fixed. It stated that the ability to alter these inputs was not a feature of the model, and was not necessary for some parameters with minimal uncertainty, which are commonly fixed in other economic models. In response to comments on the consultees' inability to assess the application of the WHO algorithm, the DSU explained that the WHO algorithm itself was not embedded within the model. The DSU confirmed that absolute fracture risks were correctly calculated using the WHO algorithm before being entered into the model. The DSU noted that documentation had been provided to consultees in the form of publicly available reports and peer-reviewed manuscripts produced by the Assessment Group, and that instructions on the operation of the Assessment Group's model were also offered to consultees and commentators.

4.2.40 With regard to comments on the modelling approach adopted in the Assessment Group's model, the DSU responded by confirming that alcohol consumption of more than 2 units per day was included in the model, and that the coefficients used in the model were consistent with the WHO algorithm (as supplied to the Assessment Group at the time the model was developed). The DSU also explored how the model considered corticosteroid-related fracture risk, and confirmed that corticosteroid use was included in the model and that the coefficient used for this risk factor was consistent with that calculated using the WHO algorithm. The DSU noted that the fracture risk of women using corticosteroids would have contributed to the overall fracture risk of the whole modelled population and thereby reduced the ICER associated with the treatment of all women at risk of fracture.

4.2.41 The DSU confirmed that each clinical risk factor for fracture was given equal weighting in the model. In response to consultee comments expressing the view that this was a less precise approach than that used in the FRAX tool, the DSU noted two points. Firstly, no individual risk calculation tool was publicly available when the model was developed. Secondly, the DSU referred to the 2005 Strontium Ranelate Assessment Report, which compared suggested treatment thresholds (combinations of age, T-score and number of independent clinical risk factors for fracture) from the Assessment Group's model with treatment thresholds indicated by absolute fracture risk. The DSU suggested that the use of absolute fracture risk alone did not accurately predict cost effectiveness, and therefore would not provide a robust basis for the Committee's decision-making.

4.2.42 Consultee comments on the modelling approach also addressed the time horizon and population data used and the grouping of age in 5-year bands. The DSU confirmed that the consequences of fracture were considered beyond 10 years, and provided further explanation of the modelling approach. The DSU additionally undertook exploratory analyses of the impact of mortality after the 10-year time horizon and of incorporating mortality associated with vertebral fracture and proximal humerus fracture. It reported that the change in the results produced by the model was minimal when mortality risk beyond 10 years was doubled. The DSU also confirmed that UK epidemiological data from a study by Holt et al. were used in the Assessment Group's model, and undertook an exploratory analysis around the assumptions of the distribution of T-scores used in the model. For some age bands modelled, the T-scores did not follow a statistically normal distribution, but the DSU noted that the assumption of a normal distribution made it more likely that treatments for women at risk of fracture would be judged to be cost effective. The DSU considered a comment on the calculation of cost-effectiveness estimates averaged for the 5-year age bands implemented in the model. It disagreed with the alternative suggested by the consultee and noted that the Committee had considered and agreed that initial identification by age band was a workable strategy for selecting women at risk of fracture in clinical practice. It also noted that alternative strategies (which did not use age bands) may in fact be more resource-consuming and less likely to be judged as cost effective.

4.2.43 The DSU reviewed consultee comments on inputs used in the Assessment Group's model. It confirmed that the WHO algorithm (as supplied) had been correctly implemented in the model to produce estimates of fracture risk for each T-score band. The DSU suggested that the differences in the estimates of fracture risk obtained using the FRAX fracture risk calculation tool and the Assessment Group's model did not necessarily suggest that the WHO algorithm had been incorrectly applied (see section 4.3.47), and that these differences could occur for a number of reasons. For example, the use of a midpoint age to represent an age band of 5 years could lead to differences in estimates of fracture risk. The DSU confirmed that no increase in mortality associated with clinical risk factors was used in the model. The DSU suggested that inclusion of such mortality effects would be likely to increase the ICERs for women with those clinical risk factors. The DSU explained that this is because fewer QALY benefits would accrue in the model for women who die of causes related to risk factors. In response to a further comment from Servier, the DSU agreed that, for women without clinical risk factors, the inclusion of these mortality effects in the model may have the opposite effect (that is, a decrease in ICERs). Therefore the overall effect of including the increased mortality associated with clinical risk factors would be small.

4.2.44 The DSU also confirmed that a fixed value for BMI of 26 kg/m² was used in the Assessment Group's model. This was the mean BMI in the UK epidemiological dataset from the Holt study used in the model. In an exploratory analysis using the WHO algorithm, the DSU showed that using a BMI of 26 kg/m² resulted in higher estimated fracture risk than a BMI of 20 or 32 kg/m² when BMD is known, and this was confirmed by the estimates supplied by one consultee. The DSU suggested that the BMI value used in the model may favour treatment of women at risk of fracture compared with alternative BMI values. The DSU also pointed out that BMI is a weak predictor of fracture when BMD is known (as specified in the identification strategy in the guidance).

4.2.45 The DSU investigated the risk multipliers used for fracture risk in the Assessment Group's model and the consultee comment that interactions between clinical risk factors had been omitted. It confirmed that the risk multipliers used for fracture risk had been correctly calculated from the WHO algorithm and that all interactions between risk factors had been included. The DSU also noted that the inconsistency between one of the risk multipliers for fracture risk included in the Assessment Group's model compared with the assessment report was the result of a typographical error. Accordingly there was no impact on the results of the model.

4.2.46 The DSU did not respond in detail to comments on assumptions in the model that had already been documented and agreed by the Appraisal Committee and which were available to consultees and commentators earlier in the development of the appraisal guidance. The DSU did, however, list these issues in its report and cited where they had been considered by the Committee or had been available for comment during development of the guidance. Features of the economic evaluation previously discussed and agreed by the Committee included the following (the sections of this document where these points are covered are given in parentheses):

  • discount rates used in model (4.2.9)

  • treatment compliance (4.2.9)

  • costs associated with fracture (4.2.16)

  • strategy for identifying women at high risk of fracture (4.2.16)

  • utility values used for vertebral fracture (4.2.16 and 4.3.12)

  • equal disutility for the side effects of strontium ranelate and bisphosphonates (4.2.9)

  • sensitivity analyses on disutility (4.3.14).

4.2.47 The DSU concluded that, in its view, adequate documentation on the Assessment Group's model had been provided for consultees. It highlighted that the WHO algorithm used to generate estimates of fracture risk was not integrated within the Assessment Group's model; rather, the fracture risks derived from the algorithm were entered into the model. Comparisons with fracture risks derived using the FRAX fracture risk calculation tool were made by several consultees on the basis that the WHO algorithm supplied to the Assessment Group and the FRAX tool are assumed to be identical. The DSU could not verify these analyses without access to the FRAX algorithm. The DSU agreed that some parameters in the Assessment Group's model were fixed. These included those with minimal uncertainty, as well as those that are commonly fixed in other economic models. Sensitivity analyses conducted by the DSU suggested that none of the consultees' suggestions relating to the modelling approach would lead to significant improvements in the cost effectiveness of treatment for women at risk of fracture. The DSU concluded that, in its view, no issues raised by consultees would either affect the validity of the Assessment Group's model or raise significant doubts about the appropriateness of using the model to inform the deliberations of the Committee.

4.3 Consideration of the evidence

4.3.1 The Appraisal Committee reviewed the data available on the clinical and cost effectiveness of alendronate, etidronate, risedronate, raloxifene and strontium ranelate, having considered evidence on the nature of osteoporosis and the value placed on the benefits of these drugs by women with the condition, those who represent them, and clinical specialists. It also considered the consultation comments received in response to the previous appraisal consultation documents, the extra analysis undertaken by ScHARR in November 2006 and February 2008, and comments received from consultees and commentators after an appeal against an earlier final appraisal determination was upheld in December 2007. Following the outcome of a judicial review and court ruling in March 2009, the Committee considered the comments received from consultees after release of the Assessment Group's executable economic model, a report by the DSU reviewing these comments, and responses from the consultees to the DSU report. It also took into account the effective use of NHS resources. The Committee was aware of a previous decision of the National Screening Committee not to recommend screening to prevent osteoporotic fracture because of concerns about the accuracy of BMD assessment for the prediction of fracture and because there was no trial evidence indicating that such screening would reduce the incidence of fractures.

4.3.2 The Committee considered the clinical effectiveness data for the bisphosphonates (alendronate, etidronate and risedronate), strontium ranelate and raloxifene. It noted that all these drugs have proven efficacy in reducing the incidence of vertebral fragility fractures in women with osteoporosis, but that there were differences between the drugs in the degree of certainty that treatment results in a reduction in hip fracture (considered a crucial goal in osteoporosis management). In the case of alendronate and risedronate, the Committee accepted that there was sufficiently robust evidence to suggest a reduction in hip-fracture risk. The Committee noted that the available RCTs for etidronate were of insufficient size to show statistically significant reductions in hip-fracture risk, but that observational data lent support to a reduction in hip-fracture risk.

4.3.3 The Committee noted that strontium ranelate was effective in preventing vertebral and non-vertebral fractures, and the drug resulted in a non-significant 15% reduction in hip-fracture risk. The Committee was also aware of the result of a post-hoc subgroup analysis showing a statistically significant reduction in the incidence of hip fractures in women aged 74 or older who had a T-score of −2.4 SD or below.

4.3.4 The Committee noted that the evidence for raloxifene showed an effect on risk of vertebral fractures, but did not show an effect on risk of hip fractures. In addition, there was evidence for a beneficial side effect of raloxifene on the incidence of breast cancer.

4.3.5 The Committee did not consider it appropriate to make recommendations for the treatment of women on long-term corticosteroid treatment because this patient group is at greatly increased risk of fracture and therefore requires special consideration. The Committee was aware that for women without prior fracture but on corticosteroid treatment, the fracture risk is as high as, or even higher than, the fracture risk for women with a prior fracture. The Committee therefore felt that it would be disadvantageous for this group to be included in the current guidance.

4.3.6 Recommendations for the treatment of women with osteopenia (T-score of between –1 and –2.5 SD below peak BMD) were not made, for two reasons. Firstly, it was agreed after the scope was issued in 2002 that the outcome in this appraisal should be 'the prevention of osteoporotic fractures' and this has been understood by the Committee to be a fragility fracture experienced by women with osteoporosis, not osteopenia. Secondly, not all of the drugs under appraisal have a UK marketing authorisation for treatment of women with osteopenia.

Cost-effectiveness modelling

4.3.7 Because women who have not had a fracture would not normally present to clinicians, the Committee considered it necessary to consider the costs involved in the assessment of fracture risk and of DXA scanning in its appraisal of the drugs for the primary prevention of osteoporotic fragility fractures.

4.3.8 The Committee acknowledged the efforts of the Assessment Group to build on the model used previously, particularly in using epidemiological data and a fracture risk algorithm developed under the auspices of the WHO to calculate transition probabilities and to model the identification approaches. The Committee noted that fracture risk is clearly related to age, low BMD and prior fracture. The Committee accepted that most of the independent clinical risk factors for fracture listed in section 4.2.11 are likely to be associated with an increased fracture risk. The Committee was not persuaded that 'current smoking' is a statistically significant risk factor in women, but noted that alcohol consumption of 4 or more units per day is a statistically significant risk factor. However, even for the statistically significant risk factors, the Committee was concerned that there was not sufficient evidence for a proven treatment effect on fracture risk related to risk factors other than low BMD, age and prior fracture.

4.3.9 With these caveats in mind, the Committee concluded that the Assessment Group's model was a useful basis for exploring the estimates of cost effectiveness; the model used data for a wide age range (age 50–75 years and older) and all osteoporotic fracture sites. Although the Assessment Group's model considered a shorter time period (10 years for predicting morbidity, see section 4.2.8) than the manufacturers' models, the Committee thought that this was appropriate considering the age groups involved and the uncertainties around health effects over a longer period.

4.3.10 The Committee discussed the assumptions underpinning the economic modelling undertaken by the Assessment Group. It noted that the most recent modelling explored some of the uncertainties identified by the Committee surrounding the results of the previous modelling; these related to the costs and disutility associated with treatment-related side effects and to non-compliance or non-persistence with treatment in a proportion of patients. The Committee also noted the effect of the recent price reductions for non-proprietary alendronate (70 mg weekly and 10 mg daily doses) on the cost effectiveness of the drug.

4.3.11 The Committee considered the base-case assumptions and those used in additional analyses. The Committee noted that the costs associated with fractures used in the base-case analysis were those used in the original assessment report developed in 2003 and considered that these were likely to be outdated. The Committee agreed that costs based on health resource groups, including home-help costs, were likely to provide the most accurate reflection of the cost of fractures to the NHS and personal social services, and it decided to incorporate these costs into the base-case analysis.

4.3.12 The Committee considered the utility multiplier used in the base-case analysis for the first year after a vertebral fracture and noted that it was based on a hospitalised patient group and not on a typical group of patients with vertebral fractures. Consequently it was considerably lower than the utility value modelled for a hip fracture. Although the Committee acknowledged that vertebral fracture can lead to greatly reduced quality of life, it considered that the true utility decrement associated with a vertebral fracture would not greatly outweigh the utility decrement associated with a hip fracture. Therefore, the Committee considered it reasonable to assume that the disutility in the first year after a vertebral fracture was equivalent to the disutility in the first year after a hip fracture and decided to include this assumption in the base-case analysis.

4.3.13 The Committee was not persuaded that the drugs under consideration had been unequivocally shown to reduce fracture risk that was attributable to risk factors not mediated through low BMD and age. The Committee concluded that the uncertainty surrounding the efficacy of the drugs on risk factors not mediated through low BMD and age should be factored into its decision-making by using an analysis that assumed 50% efficacy of the drugs on fractures associated with risk factors other than age and low BMD. Although the Committee recognised that 50% was necessarily an arbitrary figure, the use of either 0% or 100% was considered both extreme and less plausible. In the analysis accepted by the Committee, the assumption of 50% efficacy of the drugs on fracture risk associated with other risk factors was adjusted by using a correspondingly greater efficacy of the drugs on fractures associated with the key independent clinical risk factors (age, BMD and prior fracture).

4.3.14 The Committee considered the assumptions used in the modelling for the side effects of bisphosphonates, in which women who experience bisphosphonate-related side effects had 91% of the utility of women who did not have such side effects. In the base case, this was applied to 2.35% of patients in the first treatment month and 0.35% of patients thereafter. Taking into account the persistence data (sections 4.1.31 and 4.1.32) and the comments received from consultees and commentators that about 25–30% of women experience gastrointestinal side effects when first taking a bisphosphonate, the Committee agreed that it was important to consider the results of a sensitivity analysis assuming that 24% of women were experiencing bisphosphonate-related side effects in the first treatment month and 3.5% of women thereafter.
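
The difference between the base-case and sensitivity-analysis assumptions for side effects can be made concrete with a short calculation. The sketch below uses the utility multiplier and proportions quoted above, but the monthly cycle structure, the baseline utility of 1.0 and the function name are assumptions made for illustration; they do not describe how the Assessment Group's model applied the disutility.

```python
UTILITY_MULTIPLIER = 0.91  # utility of affected women relative to unaffected women (from the text)

def first_year_qaly_loss(p_first_month, p_thereafter, baseline_utility=1.0):
    """Approximate QALY loss per treated woman in year 1 from side effects.
    Assumes, for illustration only, monthly cycles and a constant baseline utility."""
    loss_per_affected_month = baseline_utility * (1 - UTILITY_MULTIPLIER) / 12
    # Expected number of months spent with side effects, per treated woman
    affected_months = p_first_month * 1 + p_thereafter * 11
    return loss_per_affected_month * affected_months

base_case = first_year_qaly_loss(0.0235, 0.0035)   # 2.35% then 0.35%
sensitivity = first_year_qaly_loss(0.24, 0.035)    # 24% then 3.5%
print(f"base case: {base_case:.6f} QALYs, sensitivity: {sensitivity:.6f} QALYs")
```

Under these simplifying assumptions the sensitivity analysis implies roughly ten times the first-year QALY loss of the base case.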

4.3.15 The Committee acknowledged that the modelling of the identification strategies required assumptions about the value of a QALY gained that could be considered an acceptable use of NHS resources. The Committee acknowledged this to be £20,000, as modelled, because there were no additional factors, as referred to in the 'Guide to the methods of technology appraisal' (see www.nice.org.uk), that could be considered to increase this value in this situation: that is, in primary prevention where an asymptomatic group of adult patients with a high number needed to treat to avoid a fracture is under consideration.

4.3.16 The Committee discussed a number of concerns surrounding other issues that were not represented in the model but which may have had an impact on the cost-effectiveness estimates. These included: possible long-term adverse effects of bisphosphonates on the formation of new bone; the probability that more GP time would be involved in identifying women with risk factors associated with osteoporosis; the likelihood that DXA scanning outside a clinical trial environment would not be as effective as in the clinical trials; and the possibility that the proportion of women who experience side effects may exceed the model's base-case assumptions. Finally, the Committee noted that current discount rates used by the Treasury, the Department of Health and NICE result in a cost-effectiveness calculation less favourable to the drugs than the discount rates used in the analysis considered by the Committee. Although a quantitative analysis of the uncertainties surrounding all these issues was not available, the Committee agreed that, for first-line treatment with a bisphosphonate, these uncertainties could be collectively approximated through the sensitivity analysis for side effects (see section 4.3.14). The Committee was persuaded, however, that the results of the sensitivity analysis need only apply to first-line treatment with a bisphosphonate, because many of the factors that led to the adoption of the sensitivity analysis did not apply for second-line treatment.

Alendronate

4.3.17 The Committee considered the results of the economic model following the price reduction for non-proprietary alendronate, the newly included assumptions and the sensitivity analyses (see sections 4.3.8 to 4.3.14). The Committee agreed that, when considering the use of alendronate as a first-line treatment, the sensitivity analysis that captured the uncertainties in the economic model (see section 4.3.14) was the most appropriate. This led the Committee to conclude that alendronate (based on the price of £53.56 per year for once-weekly treatment) would be an appropriate use of NHS resources for the treatment of postmenopausal women who are confirmed to have osteoporosis (that is, a T-score of −2.5 SD or below) who are aged 65 years or older and who have at least one independent clinical risk factor for fracture. The Committee also concluded that, in addition, women aged 70 years and older could be considered for DXA scanning if there was an indication that they might have low BMD, and treated with alendronate if osteoporosis was confirmed. The Committee's reason for restricting DXA scanning for women aged 70 years or older (who visit their GP for any reason) to those with indicators of low BMD was to avoid unnecessarily scanning many women who are well and asymptomatic and who are relatively unlikely to have a low BMD. The Committee was advised by the clinical specialists from the original Guideline Development Group for the NICE clinical guideline on osteoporosis that, in women aged 75 years or older with two or more clinical risk factors, a DXA scan may not be required if the clinician considers it to be clinically inappropriate or unfeasible. This is because a very high proportion of these women would be likely to have a T-score of −2.5 SD or below.

4.3.18 Having reviewed the evidence on independent clinical risk factors for fracture and the views of the clinical specialists, the Committee agreed that the appropriate independent clinical risk factors for fracture are: parental history of hip fracture, alcohol intake of 4 or more units per day, and rheumatoid arthritis. The Committee also concluded that there are indicators of low BMD, and these are low BMI (defined as less than 22 kg/m²) and medical conditions such as ankylosing spondylitis, Crohn's disease, conditions that result in prolonged immobility, and untreated premature menopause. The Committee acknowledged that rheumatoid arthritis is a condition that indicates low BMD, but was also proven to be an independent clinical risk factor for fracture. The Committee noted that prior fracture and long-term systemic corticosteroid use are also relevant clinical risk factors; women with prior fracture or who are on long-term systemic corticosteroid treatment will be considered in NICE technology appraisal guidance on the secondary prevention of osteoporosis.

4.3.19 The Committee noted that the prices of the different brands of alendronate vary greatly and concluded that alendronate should be prescribed on the basis of the lowest acquisition cost available.

4.3.20 The Committee considered postmenopausal women below the age of 65 years for whom opportunistic identification was not normally cost effective. The Committee recognised that a small number of postmenopausal women below the age of 65 years who present to healthcare practitioners with conditions that are indicators of low BMD are at high risk of osteoporotic fracture and would not need opportunistic identification. Therefore the Committee concluded that women under 65 years of age with rheumatoid arthritis, ankylosing spondylitis, Crohn's disease or any condition that has resulted in prolonged immobility, provided that they also have an independent clinical risk factor for fracture, should be considered for DXA scanning, and treated with alendronate if osteoporosis is confirmed.

Considerations for the other drugs under appraisal

4.3.21 The Committee noted that risedronate, etidronate, raloxifene and strontium ranelate were dominated by alendronate (based on the price of £53.56 per year for alendronate); that is, these drugs have a higher acquisition cost than alendronate, but are not more efficacious. The Committee was also aware that, for women for whom weekly non-proprietary alendronate could be recommended based on cost effectiveness, the ICERs for risedronate and strontium ranelate were very high, even without inclusion of identification costs (see examples in section 4.2.21).
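
The notions of dominance and of an ICER used above can be sketched as follows. Apart from the alendronate price of £53.56 per year quoted in the text, every figure in this example is hypothetical, and the function is an illustration of the general definitions rather than a description of the Assessment Group's model.

```python
def compare(cost_new, qaly_new, cost_ref, qaly_ref):
    """Classify a new option against a reference option.
    Returns 'dominated', 'dominant', 'equivalent', or the ICER (pounds per QALY gained)."""
    d_cost = cost_new - cost_ref
    d_qaly = qaly_new - qaly_ref
    if d_cost == 0 and d_qaly == 0:
        return "equivalent"
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"   # costs at least as much and is not more effective
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"    # costs no more and is at least as effective
    # Note: if both differences are negative, the ratio is savings per QALY lost
    return d_cost / d_qaly

# Hypothetical drug costing 250 pounds per year with the same QALYs as alendronate
print(compare(250.0, 0.010, 53.56, 0.010))
# Hypothetical option costing 1000 pounds more for 0.05 extra QALYs
print(compare(1000.0, 0.05, 0.0, 0.0))
```

An ICER returned by such a function would then be judged against the threshold the Committee considered acceptable.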

4.3.22 The Committee considered an approach where the higher costs of risedronate and strontium ranelate were incorporated into the analysis by combining costs based on the estimated use of alendronate, risedronate and strontium ranelate. However, the overall cost effectiveness of such a combined approach for fracture prevention would be less favourable than that of alendronate. As a consequence, some women who would be eligible for treatment with alendronate as recommended in section 1.1 would not be offered treatment using such a combined approach. For this reason, the Committee did not consider the combined approach to be appropriate.

4.3.23 The Committee considered treatment options available for a woman who is intolerant to alendronate or unable to comply with instructions for administration despite reasonable measures to support continuation of alendronate treatment. The Committee noted that all other treatment options have higher acquisition costs and/or different effectiveness profiles, which would reduce the cost effectiveness of preventive treatment if these drugs were used. The Committee observed that the identification costs associated with finding women who could be cost-effectively treated with one of the other drugs would be negligible, because they would have already undergone an assessment and had a DXA scan in order to be assessed for first-line treatment with alendronate. Therefore, it agreed that the recommendations for this situation should be based on the modelling that excluded identification costs. The Committee also agreed that, when considering second-line or subsequent treatment, the base-case assumptions for side effects could be applied; that is, a 0.91 utility multiplier should be applied to 2.35% of patients in the first treatment month and 0.35% of patients thereafter.

4.3.24 The Committee considered women who cannot take alendronate because of a contraindication or a disability that prevents them from complying with the instructions for administration. Because such a contraindication or disability would be known before the risk assessment, this would comprise a first-line treatment situation, where identification costs are included. Alternative drugs become cost effective at a higher age and lower BMD in a first-line treatment situation, compared with a second-line treatment situation where identification costs are not included. However, such an approach was considered inappropriate by the Committee because it would unfairly disadvantage women who cannot take alendronate because of a contraindication or a disability. Therefore the Committee concluded that women who cannot take alendronate for these reasons should have access to alternative drugs in the same way as women who cannot tolerate alendronate (that is, second-line treatment, where the analysis excluded identification and assessment costs).

Risedronate

4.3.25 The Committee concluded that risedronate could be recommended for women who are unable to comply with the special instructions for the administration of alendronate, or have a contraindication to or are intolerant of alendronate, and who have a combination of T-score, age and number of independent clinical risk factors for fracture where treatment with risedronate resulted in an ICER of less than £20,000 per QALY gained without the consideration of identification costs, as outlined in sections 4.2.23–24. The Committee agreed that in women aged 75 years or older, where the T-score needed to make treatment cost-effective was −2.5 SD or below, a DXA scan may not be required if the clinician considers it to be clinically inappropriate or unfeasible (see section 4.3.17).

Etidronate

4.3.26 The Committee considered the cost effectiveness of etidronate, and noted that in previous modelling etidronate had a better cost-effectiveness profile than risedronate; since then there has been no change in the evidence base that would affect the relative position of these two drugs. In view of its concerns surrounding the clinical evidence base for etidronate, and taking into account the views of clinical specialists and consultees, the Committee decided that etidronate should not be recommended in preference to risedronate. However, the Committee agreed that guidance on the use of etidronate should be included in the recommendations, and concluded that etidronate can be recommended as an alternative treatment option for women who cannot take alendronate, as outlined for risedronate in section 4.3.25. In deciding between risedronate and etidronate, clinicians and patients need to balance the overall effectiveness profile of the drugs against their tolerability and adverse effects in individual patients.

Strontium ranelate

4.3.27 Following the Court of Appeal Order of April 2010, the Committee considered the clinical and cost effectiveness of strontium ranelate, focusing on the most appropriate estimate for the efficacy in reducing the rate of hip fracture. The Committee considered the additional submission from Servier (see sections 4.1.37 to 4.1.43), a report by the DSU (see sections 4.1.44 to 4.1.49) reviewing this new submission and Servier's response to the DSU report (see section 4.1.50). At its meeting on 20 October 2010, the Committee heard from representatives of Servier and a representative of the DSU.

4.3.28 The Committee first considered whether it was plausible that strontium ranelate has a greater or lesser relative benefit in any subgroup of the population for which it has a marketing authorisation (that is, whether a different RR for hip fracture could be assumed to apply to some women). The Committee was aware of the advice received from the original Osteoporosis Guideline Development Group that drugs for osteoporosis have constant RR reductions irrespective of age, BMD and prior fracture status (see section 4.2.6).

4.3.29 The Committee noted the DSU's advice that the correct statistical procedure for investigating if a subgroup of trial participants has a significantly different response to treatment is a test for interaction (see section 4.1.44). No test for interaction had been undertaken for the high-risk subgroup from TROPOS. The Committee also noted that it had not received evidence of a differential benefit, supported by a test for interaction, in any subgroup of any trial of osteoporosis drugs.
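
One common form of a test for interaction compares the log relative risks in two subgroups using a z-test. The sketch below assumes this form; it is not necessarily the procedure the DSU had in mind. The event counts are hypothetical and are not taken from TROPOS or any other trial.

```python
import math

def log_rr_and_se(events_trt, n_trt, events_ctl, n_ctl):
    """Log relative risk and its standard error (large-sample approximation)."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return math.log(rr), se

def interaction_test(subgroup_a, subgroup_b):
    """z statistic and two-sided p value for a difference in log RR between subgroups."""
    log_a, se_a = log_rr_and_se(*subgroup_a)
    log_b, se_b = log_rr_and_se(*subgroup_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return z, p

# Hypothetical counts: (events on treatment, n on treatment, events on control, n on control)
older = (30, 1000, 45, 1000)
younger = (40, 1500, 42, 1500)
z, p = interaction_test(older, younger)
print(f"z = {z:.2f}, p = {p:.3f}")
```

In this invented example the subgroup RRs look different, yet the test does not show a statistically significant interaction, which is why an apparent subgroup effect alone is weak evidence of differential benefit.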

4.3.30 The Committee noted Servier's view that an age cut-off of 74 years was justified by the epidemiological findings of Donaldson et al. (see section 4.1.39). It understood from the DSU that this paper suggests that the rate of hip fracture rises to a notable level after 75 years of age (see section 4.1.46). The Committee also noted that Donaldson et al. state that the absolute risk of hip fracture increases 'steadily' with age: although women are at greater risk of hip fracture as they grow older, there is no particular age at which the risk jumps from low to high. The Committee therefore concluded that Donaldson et al.'s study did not provide support for the use of a specific age cut-off of 74 years.

4.3.31 The Committee recognised the hypothesis advanced by Servier that there may be biological grounds for assuming an additional effect for strontium ranelate in older women (see section 4.1.41). However, it considered that it should be possible to demonstrate any such effect by statistical and biochemical tests, and it heard from Servier's representatives that no such evidence had been collected. The Committee concluded that a hypothesis alone, without supporting evidence, was insufficient to demonstrate a differential benefit for strontium ranelate in older women.

4.3.32 For these reasons, the Committee concluded that it could not justify discounting previous advice that drugs for osteoporosis are assumed to have the same relative effect regardless of age, BMD and prior fracture status. Therefore, it agreed that it was most appropriate for the cost-effectiveness model to rely on a single RR to quantify the effect of strontium ranelate in preventing hip fractures in all postmenopausal women with osteoporosis. As a result, the Committee did not concur with the view that it might choose to provide a specific recommendation only for women corresponding to the high-risk subgroup analysed by Servier, based on an assumption of differential effectiveness of strontium ranelate in those women.

4.3.33 The Committee then considered the value that represents the most appropriate estimate of effect (RR) for strontium ranelate in preventing hip fractures. It discussed Servier's view that the best estimate of effect for the whole population would be that observed in the high-risk subgroup – an RR of 0.64 (see section 4.1.42). The Committee emphasised that, in order to adopt this figure for the whole population, it would first need to be confident that it was a robust estimate of treatment effect. It discussed the process by which the high-risk subgroup had been selected by Servier. It noted that the pooled data from the placebo arms of TROPOS and SOTI had been screened to establish a subgroup at increased risk of hip fracture (see section 4.1.39). The Committee agreed with the DSU's advice that the method used to identify the age cut-off for the subgroup was 'data-dependent' and, therefore, the RR for strontium ranelate derived from this approach was likely to be inflated (see section 4.1.45).

4.3.34 The Committee also discussed whether it would be appropriate to use an RR derived from a subgroup of trial participants to quantify the effect of a drug in the whole population for which it has a marketing authorisation. It considered Servier's assertion that, in contrast to the whole trial population, the high-risk subgroup of TROPOS provided a statistically robust demonstration of the effect of strontium ranelate in preventing hip fractures (see section 4.1.42). It acknowledged that TROPOS did not include enough participants to demonstrate a statistically significant benefit for strontium ranelate in preventing hip fractures and that, because of this, it would be appropriate to consider using an estimate of effect that was more precise (that is, subject to less statistical uncertainty) than that derived from the whole trial population. The Committee accepted the DSU's advice that the precision of an RR is primarily influenced by the absolute number of observed events (in this case the absolute number of fractures), which would be greatest in the whole trial population. Additionally, it noted that the size of the groups – and, therefore, the rate of events – is important, so that, in theory, it is possible that an estimate of effect from a subgroup may be more statistically precise than the estimate from the whole trial population from which the subgroup is derived. However, in the case of TROPOS, the estimates from the subgroup and the whole trial population had 95% confidence intervals of very similar width. Therefore, the Committee did not accept that the RR in the subgroup was more precise than the RR in the whole trial population. As a result, the Committee concluded that there was no reason to assume that the subgroup analysis was any more statistically robust than the analysis of the whole trial population. The Committee also noted that it is incorrect to infer that one estimate is more accurate than another just because it achieved conventional standards of statistical significance whereas the other did not.
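
The dependence of precision on the number of observed events can be illustrated by comparing confidence interval widths. The sketch below uses the standard large-sample interval for a relative risk; the counts are hypothetical and do not reproduce the TROPOS results, in which the two intervals were of very similar width.

```python
import math

def rr_with_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk with an approximate 95% confidence interval (log-scale method)."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

def log_width(lower, upper):
    """Width of the interval on the log scale, used here as a measure of precision."""
    return math.log(upper) - math.log(lower)

# Hypothetical whole population (130 events) and hypothetical subgroup (75 events)
whole = rr_with_ci(60, 2500, 70, 2500)
subgroup = rr_with_ci(30, 1000, 45, 1000)
for label, (rr, lo, hi) in (("whole", whole), ("subgroup", subgroup)):
    print(f"{label}: RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f}), log width {log_width(lo, hi):.2f}")
```

In this invented example the whole population, with more events, gives the narrower interval. A subgroup with a sufficiently high event rate could in principle reverse this, as the text notes.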

4.3.35 Taking all this into account, the Committee decided that it would not be appropriate to adopt an RR of 0.64 in assessing the cost effectiveness of strontium ranelate, because the method for the subgroup selection was likely to favour strontium ranelate, and because the RR derived in this way was no more precise than the RR from the overall population.

4.3.36 The Committee further noted that when values derived from subgroups have been considered in NICE technology appraisals, the evidence has been used to inform specific recommendations applying only to groups of people with the same characteristics as those in the trial subgroup. The Committee reiterated its conclusion that it had not received unambiguous evidence of differential benefit from strontium ranelate in any particular group (see sections 4.3.27 to 4.3.32). The Committee was aware that, in order to make recommendations for cost-effective treatment to prevent fractures in postmenopausal women with osteoporosis, it was necessary to consider separate populations defined by age, T-score and independent clinical risk factors. However, these populations are defined because the absolute likelihood of fracture increases in the presence of these risk factors and not because of variations in the relative benefit of treatment.

4.3.37 The Committee next considered the possibility of adopting an RR of 1.00 to quantify the effect of strontium ranelate in reducing hip fractures. It noted that the 95% confidence interval around the RR from the whole TROPOS population spanned unity (the upper limit was greater than 1). This means that, at the 95% confidence level, the observed results could from a statistical point of view be interpreted as being consistent with strontium ranelate having no effect. The Committee noted that, when the other drugs within this appraisal had been associated with RRs with 95% confidence intervals spanning 1, the model had assumed no effect (RR = 1.00). Therefore, it might be considered consistent to apply the same logic to the estimation of the effectiveness of strontium ranelate. However, the Committee heard the DSU's advice that it is important to base cost-effectiveness analysis on the most plausible estimate for each parameter, with associated uncertainty explored in sensitivity analysis. The Committee also agreed with the DSU's view that the available evidence suggests that strontium ranelate is effective in reducing the risk of hip fracture. For these reasons, the Committee concluded that it would be inappropriate to assume that strontium ranelate has no effect on the incidence of hip fractures, and rejected the use of an RR of 1.00 in the model.

4.3.38 Finally, the Committee discussed using an effect estimate of 0.85 – the RR of hip fracture observed in the whole TROPOS population. It noted the DSU's advice that, in the absence of a robust demonstration of differential benefit in one or more subgroup of a trial, it is most appropriate to rely on an intention-to-treat analysis of the whole trial population (see section 4.1.48). Having concluded that it had not seen evidence of a differential benefit for a specific subgroup in TROPOS, and having rejected the use of the alternative values 0.64 and 1.00 for the whole population, the Committee concluded it had no reason to depart from this principle. It therefore concluded that an RR of 0.85 represented the most appropriate estimate of effect for strontium ranelate in preventing hip fractures in postmenopausal women with osteoporosis. As a result, the Committee agreed that the Assessment Group had been correct to use an RR of 0.85 in its cost-effectiveness calculations to reflect the effect of strontium ranelate in reducing the rate of hip fractures (see section 4.2.7).

4.3.39 The Committee concluded that strontium ranelate can be recommended for women who are unable to comply with the special instructions for the administration of alendronate and either risedronate or etidronate, or have a contraindication to or are intolerant of alendronate and either risedronate or etidronate, and who have a combination of T-score, age and number of independent clinical risk factors for fracture where treatment with strontium ranelate resulted in an ICER less than £20,000 per QALY gained without the consideration of identification costs, as outlined in section 4.2.26.

4.3.40 The Committee agreed a definition of alendronate, risedronate or etidronate intolerance as: persistent upper gastrointestinal disturbance that is sufficiently severe to warrant discontinuation of treatment with alendronate or risedronate and that occurs even though the instructions for administration have been followed correctly.

Raloxifene

4.3.41 The Committee discussed the reported benefits of raloxifene on breast cancer risk, and heard from the clinical specialists that the possibility of preventing vertebral fractures and breast cancer simultaneously could be attractive, particularly to younger postmenopausal women. The Committee also heard from the specialists that evidence on the effect of raloxifene in reducing cardiovascular risk is not considered to be robust and that there is some concern over the increased risk of VTE (see section 4.1.25).

4.3.42 The Committee noted that a higher proportion of the overall benefit associated with raloxifene was attributable to its effect on the prevention of breast cancer than to its effect on the prevention of osteoporotic fragility fractures. The Committee agreed that, in principle, the side effects of using a drug should be considered; however, there were a number of reasons why the Committee considered that the breast cancer benefit should not be the sole factor in deciding whether raloxifene is a cost-effective option for treatment for the primary prevention of osteoporotic fragility fractures, as follows:

  • From the evidence presented, raloxifene was not as effective as the bisphosphonates for treating osteoporosis.

  • Full assessment of raloxifene's effect on the prevention of breast cancer and its cost effectiveness in this indication would require consideration of how it compares with other drugs that could be used for breast cancer prevention.

4.3.43 The Committee noted that treatment with raloxifene did not result in an ICER of less than £20,000 per QALY gained in any age group, even when identification costs were excluded from the modelling. Therefore, the Committee did not consider raloxifene to be a cost-effective use of NHS resources for the primary prevention of osteoporotic fragility fractures in postmenopausal women.

Women who cannot take alendronate

4.3.44 The Committee carefully considered the position of women who cannot take alendronate because of a condition which either makes alendronate contraindicated or which prevents individuals from complying with the instructions for administration for alendronate. In doing so the Committee noted that at least some women in this patient group were likely to be 'disabled' as defined by the Disability Discrimination Act 1995. The Committee was aware of its duties under that Act to avoid unlawful discrimination, to have due regard to the need to promote equality of opportunity for disabled people, and the need to take steps to take account of disabled people's disabilities, as well as its broader legal duties to ensure that its guidance is fair and reasonable.

4.3.45 The Committee noted that the drugs other than alendronate are cost effective only for patients at higher risk of fracture than the risk levels at which alendronate is cost effective. If these other drugs are recommended for use by patients who cannot take alendronate only when those patients meet the criteria at which these alternative drugs become cost effective, these patients will not receive preventative treatment unless they are at higher risk of fracture than the risk levels at which alendronate is recommended. The Committee therefore considered whether, for women who cannot receive alendronate, the other drugs should be recommended at the same risk levels as alendronate (that is using the criteria established as being cost effective for alendronate) in order to provide access to preventative treatment for all patients with the same level of risk. The Committee reviewed the ICERs for risedronate and strontium ranelate within the criteria established to be cost effective for alendronate. The Committee noted that the prices for risedronate and strontium ranelate are approximately five to six times higher than the price for non-proprietary weekly alendronate, and that the ICERs for these drugs compared with no treatment were very high. For example, the ICER for strontium ranelate for women aged 65–69 years with an independent clinical risk factor for fracture was approximately £90,000 per QALY gained (see section 4.2.21). The Committee noted that strontium ranelate would be the most likely choice to be considered for women who are unable to comply with the instructions for administration of alendronate, because the instructions for administration of alendronate and risedronate are similar. The Committee took the view that recommending drugs other than alendronate using the same criteria as alendronate for women who cannot take alendronate would not be justified in this case because of the very high ICERs for the alternative drugs. 
In reaching this decision the Committee had regard to the fact that the impact of refusing the more favourable recommendation is that there is no generally recommended preventative treatment for a particular group of patients who are at the lower end of fracture risk for which treatment was considered, but that the alternative drugs are recommended when these patients are at higher risk of fracture.
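
The comparison of ICERs against the £20,000 per QALY gained threshold referred to in this section can be sketched as follows. This is a minimal illustration only: the incremental costs and QALY gains are hypothetical, are not outputs of the Assessment Group's model, and the function name is an assumption.

```python
THRESHOLD_GBP_PER_QALY = 20_000  # threshold referred to in the guidance


def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio versus the comparator (no treatment)."""
    if incremental_qalys <= 0:
        raise ValueError("ICER is only meaningful here for a positive QALY gain")
    return incremental_cost / incremental_qalys


# HYPOTHETICAL inputs: same QALY gain, but one drug costs six times as much.
qaly_gain = 0.01
icer_lower_price = icer(150.0, qaly_gain)
icer_higher_price = icer(900.0, qaly_gain)

for label, value in (("lower-priced drug", icer_lower_price),
                     ("higher-priced drug", icer_higher_price)):
    verdict = "below" if value < THRESHOLD_GBP_PER_QALY else "not below"
    print(f"{label}: about £{value:,.0f} per QALY gained ({verdict} threshold)")
```

Under these assumed inputs, an identical health gain at a higher acquisition cost moves the ICER from below to well above the threshold, which is the pattern the Committee describes for the more expensive alternatives at the risk levels where alendronate is cost effective.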

4.3.46 The Committee considered that it is important to maximise the number of patients who are able to take alendronate. Some women will be unable to take alendronate in any circumstances because of contraindication, intolerance or inability to comply with the instructions for administration. However, some women who have a disability that makes it difficult for them to comply with the instructions for administration of alendronate would be able to receive the drug if they received assistance in taking it. The Committee concluded that all reasonable steps should be taken to provide women who have a disability that makes it difficult for them to comply with the instructions for administration of alendronate with such practical support and assistance with administration (for example, through district nurse visits or other home support services) as will enable them to take the drug.

FRAX fracture risk calculation tool

4.3.47 The Committee was aware of the availability of the FRAX internet-based tool, which can be used to calculate a 10-year absolute risk of fracture, developed under the auspices of the WHO. This assessment tool was based on the same epidemiological data that were used in the Assessment Group's model. However, the Committee was not persuaded that recommendations about treatment should be based on absolute risk as calculated using FRAX. Firstly, the Committee did not agree that all clinical risk factors included in the WHO algorithm were appropriate (see sections 4.2.11 and 4.3.8). Secondly, the Committee was aware that absolute fracture risk is not directly related to cost effectiveness, as outlined in the 2005 Strontium Ranelate Assessment Report. This is because absolute fracture risk is the total for all fracture sites, but different fracture sites have different impacts on quality of life, costs and mortality. Therefore cost effectiveness is dependent on the contribution from each fracture site to the total fracture risk. Thirdly, the Committee had agreed that treatment benefit had not been proven for fracture risk associated with all independent clinical risk factors (section 4.3.8). Therefore, the Committee concluded that using a combination of T-score, age and number of independent clinical risk factors for fracture is more appropriate for defining treatment recommendations in this appraisal.
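
The second reason given in section 4.3.47 (that the same total absolute fracture risk can correspond to different cost-effectiveness results, depending on the contribution of each fracture site) can be sketched as follows. All risks and per-fracture QALY losses are hypothetical values chosen for illustration; they are not taken from the WHO algorithm, FRAX or the Assessment Group's model.

```python
# HYPOTHETICAL QALY loss per fracture, by site (illustrative values only).
QALY_LOSS_PER_FRACTURE = {"hip": 0.30, "vertebral": 0.15, "wrist": 0.03}

# Two HYPOTHETICAL 10-year risk profiles with the same total absolute risk.
profile_a = {"hip": 0.10, "vertebral": 0.05, "wrist": 0.05}
profile_b = {"hip": 0.02, "vertebral": 0.05, "wrist": 0.13}


def total_risk(profile):
    """Total absolute fracture risk summed over all sites."""
    return sum(profile.values())


def expected_qaly_loss(profile):
    """Expected QALY loss: site-specific risk weighted by site-specific impact."""
    return sum(risk * QALY_LOSS_PER_FRACTURE[site] for site, risk in profile.items())


for name, profile in (("A", profile_a), ("B", profile_b)):
    print(f"profile {name}: total risk {total_risk(profile):.2f}, "
          f"expected QALY loss {expected_qaly_loss(profile):.4f}")
```

Under these assumptions the two profiles have equal total risk, but the profile weighted towards hip fracture carries a larger expected QALY loss. The health gain available from treatment, and hence the ICER, would therefore differ even though the absolute risk is the same.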

Evidence on use of acid suppressive medication and fracture risk

4.3.48 The Committee was made aware of data indicating that acid-suppressive medication leads to a small increase in fracture risk and that co-administration of acid-suppressive medication and bisphosphonates may lead to an increased fracture risk compared with bisphosphonate administration alone. The Committee was not persuaded by this evidence; it noted that the data are observational and have not been reported in full, and are different for different fracture sites and for different acid suppressors. Furthermore, the Committee was informed, during consultation, of analyses showing that acid-suppressive medication given in addition to risedronate did not increase fracture risk. The Committee concluded that caution should be exercised when considering the evidence about co-prescription of acid-suppressive medication and bisphosphonates.

4.3.49 The Committee noted sensitivity analyses that included the assumption of an increase in fracture risk for women for whom acid-suppressive medications are co-prescribed (see section 4.2.18). The analysis for treatment strategies did not decrease the T-scores at which the ICERs for alendronate fell below £20,000 to the T-scores established for strategies including strontium ranelate. The Committee also noted that the ICERs for treatment compared with no treatment for an individual woman with a relevant combination of age and T-score were not more favourable for strontium ranelate than for risedronate even if an effect of acid-suppressive medication was assumed. The Committee considered that the evidence for this effect was not sufficiently robust. However, it concluded that the relative positions of alendronate, risedronate and strontium ranelate would remain unchanged even if an effect of acid-suppressive medication was assumed. The Committee therefore concluded that it was not necessary to change its recommendations (section 1) to take account of acid-suppressive medication.

Calcium and vitamin D prerequisites for treatment

4.3.50 The Committee discussed the effect of calcium and vitamin D on the clinical effectiveness of the drugs considered. In the studies that formed the basis of this guidance, all participants were said to have adequate calcium and vitamin D levels. The Committee appreciated that the general population, particularly the elderly population, cannot be assumed to have an adequate dietary intake of calcium and vitamin D. It was also considered important to note that adequate levels (normal serum concentrations) of calcium and vitamin D are needed to ensure optimum effects of the treatments for osteoporosis. The Committee concluded that calcium and/or vitamin D supplementation should be provided unless clinicians are confident that women who receive treatment for osteoporosis have an adequate calcium intake and are vitamin D replete.

Consultation on the Assessment Group's economic model

4.3.51 Following the outcome of the judicial review and the court ruling of March 2009, the Appraisal Committee considered the comments received from consultees on the Assessment Group's executable economic model, a report by the DSU reviewing these comments, and responses from the consultees to the DSU report.

4.3.52 The Committee considered the comments from consultees that the Assessment Group's model was not sufficiently transparent, lacked adequate documentation and could not be validated. It noted the number of consultations that took place during the appraisal guidance development, that the consultation documents had included descriptions of the model, and that assumptions and parameter values used had also been provided to consultees. The Committee was aware that instructions on how to run the model were released with the model and that consultees were able to run the model with changed input parameters. The Committee was satisfied with the exploration by the DSU of the functionality and validity of the model. The Committee noted that Servier stated that it had constructed its own economic model in order to validate the Assessment Group's model and to demonstrate the mathematical rationale to support its comments. The Committee noted that the results from Servier's model were very similar to those from the Assessment Group's model when similar assumptions and parameter inputs were used. The Committee was not persuaded by the consultees' doubt about the validity of the model, particularly since differences between the results obtained using Servier's model and the Assessment Group's model were largely because of differences in the assumptions used.

4.3.53 The Committee considered the comments from consultees that some inputs in the Assessment Group's model could not be changed and that it was unclear how fracture risk was calculated. The Committee noted that some of the fixed input parameters were inputs that do not need changing (such as the discount rate and standard mortality rates). Other fixed input values, such as the BMI and issues around the time horizon, were discussed separately (see sections 4.3.56 and 4.3.59 respectively). The Committee concluded that it was reasonable for some inputs in the model to be fixed. The Committee noted that fracture risks were calculated by the Assessment Group using the WHO algorithm in a separate spreadsheet and then entered into the model. It understood that the WHO algorithm itself was provided to the Assessment Group in 2005 as academic in confidence and that at that time NICE did not have permission from the owner of the algorithm to release it to consultees. The Committee understood that although the WHO fracture risk algorithm itself was not embedded in the economic model, the model could not be released because the algorithm could have been back-calculated from the fracture risks entered in the model and because the numbers of women with risk factors from the algorithm were included in the model.

4.3.54 The Committee considered the comments from consultees that the fracture risks entered into the model, calculated using the WHO algorithm, were different from fracture risks estimated using the FRAX tool. The Committee was aware that some differences could be because of the Assessment Group's use of midpoint ages in each 5-year age grouping. It also heard that the Assessment Group had verified the application of the WHO algorithm as provided in 2005, including all interactions between clinical risk factors, and was satisfied that the DSU had adequately assessed its application as being correct in the model. Because neither the DSU nor NICE has access to the algorithm used for the construction of the FRAX tool, the Committee was not in a position to comment further on differences between the two ways of estimating fracture risk. It concluded that differences between fracture risk estimates produced using the FRAX tool and those used in the Assessment Group's model were not in themselves a reason to doubt the correct use of the WHO algorithm within the Assessment Group's model.

4.3.55 The Committee considered the comments from consultees that mortality associated with clinical risk factors had been omitted from the Assessment Group's model, and noted the confirmation from the DSU that this was the case. It was persuaded that the inclusion of such additional mortality effects would increase the complexity of the model, and may increase the ICERs for the treatment of women with such clinical risk factors but decrease the ICERs for the treatment of women without such risk factors. The Committee agreed that the overall effect of including mortality associated with clinical risk factors in the model was unlikely to lead to a marked change in the overall results.

4.3.56 The Committee reviewed the consultee comments relating to the fixed BMI value of 26 kg/m2 used in the Assessment Group's model. It noted the rationale for selecting this value (see section 4.2.44). It also noted that in the DSU's exploratory analysis using the WHO algorithm, no increase in fracture risk was identified for women with a higher or lower BMI when BMD was known. The Committee was aware of its recommendation to assess BMD in all women under the age of 75 years for whom treatment is being considered, and noted that BMI is a weak predictor of fracture when BMD is known. Therefore the Committee concluded that the use of a fixed BMI value of 26 kg/m2 did not lead to an unfavourable assessment of the cost effectiveness of the interventions.

4.3.57 The Committee considered comments from consultees that the fracture risk associated with alcohol consumption used in the model was incorrect. It noted that the DSU had determined that the WHO algorithm had been correctly implemented, and understood that alcohol consumption of more than 2 units per day was included as a risk factor in the model. The Committee also noted that in its recommendations it had chosen to use a higher level of alcohol consumption in the risk identification strategy, because only alcohol consumption of 4 or more units per day was identified as a statistically significant risk factor for fracture for women – the population considered in the guidance. The Committee also considered a consultee comment that stated that it was unclear whether smoking and corticosteroid use had been included in the model as risk factors. It noted that the DSU had determined that the WHO algorithm had been correctly implemented with regard to both smoking and corticosteroid use in the model. The Committee noted that the effect of smoking in women was not statistically significant when assessing risk of osteoporotic fractures taken as a whole. The Committee was therefore satisfied that risks associated with corticosteroids, smoking and alcohol consumption had been faithfully applied in the Assessment Group's model, and agreed that the levels of alcohol consumption and smoking that should be used in the risk identification strategy were a matter for the Committee to consider and determine. The Committee took the view that it is not appropriate to identify women at high risk of fracture on the basis of risks that were not statistically significant (such as smoking and consumption of fewer than 4 units of alcohol per day) and that, in addition, the impact of these risk factors could arguably be approached by a strategy of smoking cessation and reducing alcohol consumption. 
The Committee noted comments from the consultees that the Assessment Group's model should have been amended to reflect the Committee's agreed inclusion of risk factors. However, the Committee took the pragmatic view that such amendments would have added unnecessarily to the mathematical complexity of an already complex clinical situation. It noted that women who had taken corticosteroids were included in the model and therefore contributed to the underlying fracture risk, with the effect of reducing the ICERs for the treatment of the population of women considered in the recommendations.

4.3.58 The Committee considered consultee comments that giving equal weighting to different clinical risk factors for fracture in the Assessment Group's model was inaccurate. The Committee considered the complex results presented originally in the 2005 Strontium Ranelate Assessment Report related to the inclusion of different risk factors and combinations of risk factors. The Committee noted that it had previously agreed that a clinically workable risk identification and treatment strategy should include the grouping of risk factors as the only practical way forward. At the time of the model's development, no individual risk calculation tool was available. Even if such a tool had been used in the development of the guidance, the prediction of cost effectiveness from overall absolute fracture risk alone, as suggested by consultees, would not be appropriate, for two reasons. Firstly, risk factors have different effects on different fracture types, and the cost effectiveness of treatment depends on the relative contributions of each risk factor to fracture risk. Secondly, the effectiveness of the drugs in reducing fracture risk was limited to only some of the clinical risk factors (age, T-score of −2.5 SD or below and prior fracture). The Committee heard from the DSU that there was considerable uncertainty about the cost effectiveness of treating women based on absolute risk alone (see section 4.3.47). Therefore, the Committee concluded that, when developing the guidance, simplification of the model was justified in order to produce workable recommendations.

4.3.59 The Committee reviewed a comment from a consultee that the methods used to model effects beyond 10 years were not adequately described. It noted that the DSU confirmed that consequences beyond 10 years were considered in the Assessment Group's model, and an expanded description of the methods used was provided in an annex to the DSU report. The Committee also noted that the DSU carried out a sensitivity analysis in order to establish the impact of any possible underestimation of the mortality risk after the 10-year time horizon. It noted that doubling of the mortality risk led to only very small changes in the results. The Committee therefore concluded that mortality effects beyond the 10-year time horizon had been reasonably accounted for in the model and that sufficient description of these methods had been made available to consultees.

4.3.60 The Committee considered comments from consultees that the population data on the distribution of BMD (T-score) were not appropriate. It noted the DSU response confirming that the UK epidemiological dataset from the Holt study had been correctly implemented in the Assessment Group's model, and that the assumptions about the normality of the distributions used were likely to favour treatment for women at risk of fracture. The Committee also noted that the particular UK epidemiological dataset used in the model had been originally suggested by consultees for this appraisal. The Committee concluded that the population data had been used appropriately in the model.

4.3.61 The Committee considered a comment from a consultee that using a single estimate of cost effectiveness for 5-year age groupings of women at risk of fracture could exclude women from being offered treatment. It noted that this identification method was a Committee decision, and that identification strategies based on other factors could make treatments less cost effective.

4.3.62 The Committee reviewed comments from a consultee that the application of the same disutility for the side effects associated with strontium ranelate and bisphosphonates was not correct, as the side effects of strontium ranelate are different from those of the bisphosphonates. The Committee was aware that the side effects observed for strontium ranelate in the clinical trials did not include gastrointestinal effects, but did include an increased risk of VTE. Because the increased risk of VTE was not included in the Assessment Group's model, the Committee had agreed that it was appropriate to include a disutility equivalent to the bisphosphonate base-case side-effect disutility to take account of this adverse effect.

4.3.63 The Committee reviewed comments from consultees about model assumptions or inputs that the Committee had directed the Assessment Group to use. It noted that issues such as treatment compliance, discount rates, costs of fracture, utility values for vertebral fracture and side-effect profiles used in the model had been considered and agreed by the Committee and reported in the guidance. The Committee also agreed that it had considered identification strategies for women at risk of fracture and, noting the advice of clinical specialists, it had recommended that women should have their BMD assessed by DXA scanning, except in certain circumstances as defined in the guidance. The Committee concluded that views expressed by consultees on the choice of modelling assumptions, input parameters and risk identification strategy were not about the operation of the Assessment Group's model, but were about Committee decisions that had already been discussed during development of the guidance.

4.3.64 The Committee also considered the consultees' view that the FRAX tool provides a 'mechanism to compute cost-effectiveness' according to clinical risk factors and that each of the current recommendations covers a wide range of absolute risk values, depending on the individual risk factors involved. The Committee understood that the FRAX tool is not an economic model, but a tool to estimate fracture risk. The Committee acknowledged that the current set of recommendations involved necessary simplifications from the more complex algorithm used to develop the Assessment Group's model. It was also aware that a direct prediction of cost effectiveness from absolute fracture risk alone would be inappropriate (see section 4.3.47).

4.3.65 The Committee concluded that the Assessment Group had provided an executable economic model and had implemented the WHO algorithm (as supplied) correctly. The Committee agreed with the DSU's comments that alterations to the modelling approach, as suggested by consultees, would not lead to significant improvements in the cost effectiveness of treatment for women at risk of fracture. The Committee confirmed that the model provided a suitable framework to allow it to make recommendations on the cost-effective use of treatment for women at risk of fracture. The Committee noted that assumptions used in the Assessment Group's model had been considered and agreed by the Committee in developing the guidance. It agreed that it would not be useful to request further analysis from the Assessment Group at this stage. The Committee further agreed that any exploration of how absolute fracture risk could be used in making treatment decisions would require a new assessment and appraisal. Therefore the Committee concluded that the recommendations based on the Assessment Group's model were appropriate, and that the recommendations should remain unchanged.

4.3.66 The Committee noted the comments from some consultees that the guidance should be reviewed soon because the price of some of the appraised drugs had changed. The Committee noted that any possible price reductions could be offset by the use of the currently applicable discount rate, and that any review should also take into consideration how NICE might assess diagnostic tools such as absolute fracture risk prediction tools.



[2] T-scores were calculated according to trial-specific normative data, using a threshold of −3.0 or below, which is equivalent to a T-score of −2.4 or below when measured according to the standards subsequently adopted by the WHO. The latter classification is used throughout this guidance document.

[3] The remit of the original osteoporosis guideline has since been amended; see Osteoporosis and Osteoporosis fragility fracture risk.

  • National Institute for Health and Care Excellence (NICE)