4 Evidence

The diagnostics advisory committee (section 8) considered evidence on tests used in secondary care to help identify people at high risk of ovarian cancer from several sources. Full details of all the evidence are in the committee papers.

Clinical effectiveness

4.1 Fifty-one diagnostic cohort studies were identified (in 65 publications) that reported data on 1 or more of the included tests or risk scores. Also, an unpublished interim report of phase 5 of the International Ovarian Tumor Analysis (IOTA) study was available to the external assessment group (EAG) and committee as academic in confidence. No randomised controlled trials or controlled clinical trials were identified; neither were studies that reported how test results affect clinical management decisions. Ten studies had inclusion criteria that allowed people under 18 years to take part, but the number of participants in this age group was not reported.

4.2 All the included studies reported the accuracy of tests and risk scores to assess people with an adnexal or pelvic mass. When summary estimates of sensitivity and specificity were calculated from multiple studies, sensitivity and specificity were pooled separately using random-effects logistic regression. The bivariate/hierarchical summary receiver operating characteristic model was not used because data sets were either too small or too heterogeneous.
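As context for how per-study accuracy feeds into these pooled estimates, each study contributes a sensitivity and specificity calculated from its 2×2 cross-classification of test results against histopathology. A minimal sketch, using hypothetical counts and a simple Wilson interval rather than the EAG's random-effects logistic regression:

```python
from math import sqrt

def sens_spec(tp, fn, tn, fp):
    """Per-study sensitivity and specificity from a 2x2 table
    (reference standard: histopathology)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec

def wilson_ci(k, n, z=1.96):
    """Wilson score 95% CI for a proportion. This is an illustrative
    choice; the intervals in the assessment come from the pooling model."""
    p = k / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - half, centre + half

# Hypothetical study counts, for illustration only
sens, spec = sens_spec(tp=70, fn=30, tn=180, fp=20)
print(round(sens, 2), round(spec, 2))  # 0.7 0.9
```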

4.3 Histopathology was the reference standard used to assess test accuracy in all of the identified studies. The target condition (that is, what was considered a positive reference standard test result) varied between the included studies. Some studies classified borderline ovarian tumours as positive, but others did not (and either classified them as disease negative or excluded them from analyses). Furthermore, studies varied as to whether they included people with metastases to the ovaries and germ cell tumours in analyses.

4.4 The methodological quality of the diagnostic cohort studies was assessed using the QUADAS‑2 tool. Fifteen studies had a high risk of bias in the 'flow and timing' domain, most commonly because not all patients were included in the analyses and patients did not all have the same reference standard. Regarding applicability, 26 studies were rated as 'high' concern on at least 1 domain. The EAG commented that areas of concern for applicability included how the index test was applied and whether this could be considered representative of routine practice. A further issue for the applicability of studies was how the target condition was defined. One study, which reported the development and validation of the ADNEX model (Van Calster et al. 2014), was also assessed using the PROBAST tool, which was developed to assess the methodological quality of prediction modelling studies.

Assessment of test accuracy

Risk of malignancy index 1 (RMI I) at decision thresholds other than 250

4.5 Ten studies reported diagnostic accuracy of the RMI I using a decision threshold of 250 (the comparator for this assessment) and at least 1 further threshold value. Two studies were done in the UK, 2 elsewhere in Europe and 6 in non-European countries. CA125 assays from various manufacturers were used in the studies.

4.6 In studies that directly compared RMI I at a threshold of 250 and 200, no statistically significant difference between the sensitivity and specificity of RMI I at these thresholds was seen in any of the target condition categories (see table 4).

Table 4 Comparative accuracy of RMI I at thresholds of 200 and 250

| Source | Subgroup | Index test | Sensitivity % (95% CI) | Specificity % (95% CI) |
| --- | --- | --- | --- | --- |
| Target condition: All malignant tumours including borderline | | | | |
| Summary estimates (6 studies; n=1,079) | All | RMI I (200) | 70.8 (65.6 to 75.6) | 91.2 (88.9 to 93.1) |
| | | RMI I (250) | 69.0 (63.7 to 73.9) | 91.6 (89.3 to 93.5) |
| Target condition: Ovarian malignancies including borderline | | | | |
| Yamamoto et al. 2009 (n=253) | All | RMI I (200) | 80.0 (65.2 to 89.5) | 86.4 (81.8 to 89.9) |
| | | RMI I (250) | 72.5 (57.2 to 83.9) | 88.7 (84.4 to 92.0) |
| Target condition: All malignant tumours excluding borderline | | | | |
| Summary estimates (2 studies; n=248) | All | RMI I (200) | 73.5 (64.3 to 81.3) | 89.6 (83.2 to 94.2) |
| | | RMI I (250) | 66.4 (56.9 to 75.0) | 93.3 (87.7 to 96.9) |

Abbreviations: CI, confidence interval; RMI I, risk of malignancy index 1.
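For reference, the RMI I score to which the thresholds in table 4 are applied combines an ultrasound score, a menopausal status score and the serum CA125 level (Jacobs et al. 1990). A sketch, assuming the standard scoring:

```python
def rmi_i(ultrasound_features: int, postmenopausal: bool, ca125: float) -> float:
    """RMI I = U x M x CA125 (Jacobs et al. 1990).
    U: 0 if no ultrasound features present, 1 if one, 3 if two or more
    (features: multilocular cyst, solid areas, metastases, ascites,
    bilateral lesions). M: 1 if premenopausal, 3 if postmenopausal.
    ca125: serum CA125 in U/ml."""
    u = 0 if ultrasound_features == 0 else (1 if ultrasound_features == 1 else 3)
    m = 3 if postmenopausal else 1
    return u * m * ca125

# Classification at the two thresholds compared in table 4
score = rmi_i(ultrasound_features=2, postmenopausal=True, ca125=35.0)
print(score, score >= 200, score >= 250)  # 315.0 True True
```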

Risk of ovarian malignancy algorithm (ROMA)

4.7 Fourteen studies (in 22 publications) reported diagnostic accuracy data for the ROMA using either Abbott ARCHITECT assays (9 studies) or Roche Elecsys assays (5 studies). No studies were identified that used the Fujirebio Lumipulse G automated CLEIA system.

ARCHITECT HE4 (Abbott Diagnostics)

4.8 All of the 9 ROMA studies which used Abbott ARCHITECT assays were done outside the UK: 3 in European countries, 4 in Asia, 1 in the US and 1 in Oman. No direct comparisons (that is, when both tests were assessed in the same patient cohort) between ROMA and RMI I (threshold of 250) were identified.

4.9 Three studies made a direct comparison between ROMA using Abbott ARCHITECT assays and RMI I (threshold of 200), shown in table 5. One study (Al Musalhi et al. 2016) did not exclude participants from analysis based on their final histopathological diagnosis, but the other 2 studies did. Sensitivity was highest when people with borderline tumours and non-epithelial ovarian cancers were excluded from analysis, and lowest when all participants (regardless of final histopathological diagnosis) were included. The reverse was true for specificity. When all participants were included in the analysis (Al Musalhi et al. 2016) there was no statistically significant difference between the sensitivity and specificity estimates of ROMA and RMI I (threshold of 200). This was also true for the summary sensitivity estimate when the target condition was 'epithelial ovarian malignancies excluding borderline'; however, specificity was statistically significantly lower for ROMA compared with RMI I (threshold of 200).

Table 5 Comparative accuracy of ROMA (using Abbott ARCHITECT assays) and RMI I (threshold of 200)

| Source | Subgroup | Index test | Sensitivity % (95% CI) | Specificity % (95% CI) |
| --- | --- | --- | --- | --- |
| Target condition: All malignant tumours including borderline | | | | |
| Al Musalhi et al. 2016 | All (n=213) | ROMA¹ | 75.0 (60.4 to 86.4) | 87.9 (81.9 to 92.4) |
| | | RMI I (200) | 77.1 (62.7 to 88.0) | 81.8 (75.1 to 87.4) |
| | Premenopausal (n=162) | ROMA¹ | 52.4 (29.8 to 74.3) | 90.1 (83.9 to 94.5) |
| | | RMI I (200) | 57.1 (34.0 to 78.2) | 85.1 (78.1 to 90.5) |
| | Postmenopausal (n=51) | ROMA¹ | 92.6 (75.7 to 99.1) | 79.2 (57.8 to 92.9) |
| | | RMI I (200) | 91.7 (73.0 to 99.0) | 66.7 (46.0 to 83.5) |
| Target condition: Epithelial ovarian malignancies including borderline | | | | |
| Winarto et al. 2014 | All (n=128) | ROMA | 91.0 (81.5 to 96.6) | 42.6 (30.0 to 55.9) |
| | | RMI I (200) | 80.6 (69.1 to 89.2) | 65.6 (52.3 to 77.3) |
| Target condition: Epithelial ovarian malignancies excluding borderline | | | | |
| Summary estimate (2 studies) | All (n=1,172) | ROMA | 96.4 (93.6 to 98.2) | 53.3 (50.0 to 56.7) |
| | | RMI I (200) | 93.4 (90.0 to 95.9) | 80.3 (77.5 to 82.9) |

¹ Manufacturer's suggested thresholds not used.

Abbreviations: CI, confidence interval; RMI I, risk of malignancy index 1; ROMA, risk of ovarian malignancy algorithm.
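The ROMA reported in table 5 is a logistic model of log-transformed HE4 and CA125 with menopause-specific coefficients, expressed as a percentage risk. The sketch below uses placeholder coefficients (the published coefficients are assay specific and are not reproduced here):

```python
from math import exp, log

def roma_percent(he4, ca125, postmenopausal, coeffs):
    """ROMA: a logistic model of ln(HE4) and ln(CA125) with
    menopause-specific coefficients, reported as a percentage.
    `coeffs` maps menopausal status to (intercept, b_he4, b_ca125);
    the values passed below are illustrative placeholders, not the
    assay-specific published coefficients."""
    a, b_he4, b_ca125 = coeffs["post" if postmenopausal else "pre"]
    pi = a + b_he4 * log(he4) + b_ca125 * log(ca125)
    return 100 * exp(pi) / (1 + exp(pi))

# Illustrative placeholder coefficients only
ILLUSTRATIVE = {"pre": (-10.0, 2.0, 0.5), "post": (-8.0, 1.0, 0.8)}
p = roma_percent(he4=60.0, ca125=35.0, postmenopausal=True, coeffs=ILLUSTRATIVE)
print(0 <= p <= 100)  # True
```

The score is then compared against menopause-specific decision thresholds to classify people as high or low risk.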

4.10 Further identified studies assessed the performance of the ROMA score (using the Abbott ARCHITECT assays and at the company's suggested thresholds) without comparison with RMI I, across a range of target conditions. These included epithelial ovarian malignancies (both including and excluding borderline tumours). One study reported that the sensitivity of the ROMA was higher when the target condition was stage III or IV epithelial ovarian cancer, rather than stage I or II. Also, accuracy data at ROMA thresholds different from those suggested by the manufacturer were identified, but the EAG commented that no alternative threshold offered a clear performance advantage.

Elecsys HE4 immunoassay (Roche Diagnostics)

4.11 All of the 5 ROMA studies that used Roche Elecsys assays were done outside the UK: 1 in a European country, 3 in Asia and 1 in the US. No direct comparisons (that is, when both tests were assessed in the same cohort) between ROMA and RMI I (threshold of 250) were identified. One study (Yanaranop et al. 2016) made a direct comparison between ROMA using Roche Elecsys assays and RMI I (threshold of 200). In this study, people with a final histological diagnosis of borderline ovarian tumour were classified as disease negative. Differences between the ROMA and RMI I (threshold of 200) sensitivity (83.8% compared with 78.4%) and specificity (68.6% compared with 79.6%) values were not statistically significant. The data were similar when stratified by menopausal status. When people with non-epithelial ovarian cancer were excluded from analysis in this study (target condition epithelial ovarian malignancies), sensitivity for both ROMA and RMI I (threshold of 200) increased, but not statistically significantly. Sensitivity was higher for ROMA when the target condition was stage II to IV epithelial ovarian malignancies (97.2%; 95% confidence interval [CI] 85.5 to 99.9%) when compared with stage I epithelial ovarian malignancies (76.7%; 95% CI 57.7 to 90.1%). This was also the case for RMI I (threshold of 200).

4.12 Four further studies assessed the ROMA score (using Roche Elecsys assays) without comparison with RMI I. Two of these studies included all participants in analyses (Janas et al. 2015; Shulman et al. 2016; target condition all malignant tumours including borderline), shown in table 6.

Table 6 Diagnostic accuracy of ROMA (using Roche Elecsys assays and manufacturer's suggested thresholds)

| Source | Subgroup | Sensitivity % (95% CI) | Specificity % (95% CI) |
| --- | --- | --- | --- |
| Target condition: All malignant tumours including borderline | | | |
| Summary estimate (2 studies; n=1,252) | All | 79.1 (74.2 to 83.5) | 79.1 (76.3 to 81.6) |
| Janas et al. 2015 | Premenopausal (n=132) | 90.0 (55.5 to 99.7) | 82.0 (74.0 to 88.3) |
| | Postmenopausal (n=127) | 78.6 (65.6 to 88.4) | 76.1 (64.5 to 88.4) |

Abbreviation: CI, confidence interval.

4.13 Two studies assessed the performance of the ROMA score (using the Roche Elecsys assays and at the company's suggested thresholds) without comparison with RMI I and with a target condition of ovarian malignancies excluding borderline tumours. The sensitivity estimates from these studies were very different (95.5% and 53.8%) and no summary estimate was calculated. Also, accuracy data at ROMA thresholds different from those suggested by the manufacturer were identified, but the EAG commented that no alternative threshold offered a clear performance advantage.

Lumipulse G HE4 (Fujirebio Diagnostics)

4.14 None of the included studies assessed the ROMA score using the Fujirebio Lumipulse G HE4 assay. The EAG identified 2 studies that used a ROMA score calculated using a manual Fujirebio tumour marker enzyme immunometric assay (EIA); however, this assay was outside the scope of this assessment.

Simple Rules

4.15 Seventeen published studies had data on the diagnostic accuracy of Simple Rules. Eleven of these studies were done in Europe, including 3 in the UK. Two studies were multinational and included UK participants, 2 studies were done in Thailand, 1 was done in Brazil and 1 study did not provide detail on location. Also, the provided interim report (academic in confidence) had diagnostic accuracy results for Simple Rules. In studies included in summary estimates of sensitivity and specificity, Simple Rules was done by a level 2 or 3 examiner as defined by the European Federation of Societies for Ultrasound in Medicine and Biology (EFSUMB) classification system; 1 study also reported data from level 1 examiners.

4.16 Four published studies and the unpublished interim report provided a direct comparison of the accuracy of Simple Rules and RMI I at a threshold of 200. The summary estimate of sensitivity was statistically significantly higher for Simple Rules (93.9%; 95% CI 92.8 to 94.9%) when compared with RMI I (threshold of 200; 66.9%; 95% CI 64.8 to 68.9%); however, the summary specificity estimate was statistically significantly lower (74.2% [95% CI 72.6 to 75.8%] compared with 90.1% [95% CI 88.9 to 91.2%]). All these studies included all participants in analysis, regardless of their final histopathological diagnosis (target condition all malignant tumours including borderline). The unpublished interim report also directly compared Simple Rules and RMI I (threshold of 250; academic in confidence).

4.17 A further 4 studies had data on the accuracy of Simple Rules for the same target condition but without a direct comparison with RMI I. There was no statistically significant change in sensitivity (94.2%; 95% CI 93.3 to 95.1%) or specificity (76.1%; 95% CI 74.9 to 77.3%) when data from these studies were included in the summary estimates of Simple Rules accuracy (a total of 8 published studies and the unpublished interim work).

4.18 Three studies directly compared Simple Rules and RMI I (threshold of 200) stratified by menopausal status. There was no statistically significant difference between the sensitivity and specificity estimates for Simple Rules produced for the pre- and postmenopausal subgroups. However, if data from a further study (which did not report a direct comparison with RMI I) were added, the summary estimate for specificity was statistically significantly higher for people who are premenopausal (79.3%; 95% CI 77.0 to 81.5%), when compared with people who are postmenopausal (67.3%; 95% CI 63.5 to 70.9%).

4.19 In the above estimates of accuracy for Simple Rules, inconclusive results were treated as malignancy positive. Test accuracy data were also available from some studies in which inconclusive results were instead classified by expert subjective assessment of the ultrasound images. Assessment of inconclusive results from Simple Rules using expert subjective assessment (rather than assuming them to be malignant) statistically significantly increased the specificity of the test, but statistically significantly lowered sensitivity.

The ADNEX model

4.20 Six published studies had data on the diagnostic accuracy of the ADNEX model. One was done entirely in the UK and 2 were multicentre studies that included UK participants. The remaining 3 studies were done elsewhere in Europe. A further unpublished interim report (provided as academic in confidence) also had data on the diagnostic accuracy of the ADNEX model. Four of the studies did not report details about the people doing the ultrasound scans. In 1 study, ultrasound scans were done by EFSUMB level 2 ultrasound examiners (non-consultant gynaecology specialists, gynaecology trainee doctors and gynaecology sonographers) and in another study they were done by EFSUMB level 2 or 3 practitioners with 8 to 20 years' experience in gynaecological sonography.

4.21 The EAG focused on test accuracy at the 10% threshold. One published study and the unpublished interim report made a direct comparison between the ADNEX model and RMI I (threshold of 200). Sensitivity was statistically significantly higher for ADNEX (96.0%; 95% CI 94.5 to 97.1%) than RMI I (threshold 200; 66.0%; 95% CI 62.9 to 69.0%), but specificity was statistically significantly lower (67.0% [95% CI 64.2 to 69.6%] compared with 89.0% [95% CI 87.0 to 90.7%]). Also, a further 2 studies reported on the accuracy of the ADNEX model in the same target population (all malignant tumours including borderline) but without direct comparison with RMI I. Inclusion of data from these studies in summary estimates did not cause a statistically significant change to sensitivity (96.3%; 95% CI 95.3 to 97.1%) or specificity (69.1%; 95% CI 67.4 to 70.8%) of the ADNEX model. The unpublished interim report also directly compared the ADNEX model and RMI I (threshold of 250; academic in confidence).

4.22 Two further studies had data on the accuracy of the ADNEX model without comparison with RMI I. These studies excluded people with histopathological diagnoses other than primary ovarian cancer from analysis (target condition ovarian malignancies including borderline). The summary estimate of sensitivity from these studies did not differ significantly from that of studies that included all participants in analysis; however the summary estimate of specificity (77.6%; 95% CI 73.6 to 81.2%) was statistically significantly higher.

4.23 Data stratified by menopausal status were available from 1 study. No statistically significant effect on sensitivity was reported, but specificity was statistically significantly higher for people who were premenopausal than for people who were postmenopausal.

4.24 One published study and the unpublished interim analysis directly compared the ADNEX model and Simple Rules (inconclusive results assumed to be malignant). The summary estimate of sensitivity was statistically significantly higher for ADNEX (96.0%; 95% CI 94.5 to 97.1%) than Simple Rules (92.8%; 95% CI 90.9 to 94.3%). Summary estimates of specificity were similar.

Overa (MIA2G)

4.25 Three studies (in 4 publications) had data on the diagnostic performance of Overa (MIA2G). All the studies were done in the USA and used a score of 5 units as a threshold. No studies were identified that directly compared Overa (MIA2G) with RMI I (at any threshold). However, 1 study assessed the accuracy of the Overa (MIA2G) and ROMA (using Roche Elecsys assays and manufacturer suggested thresholds for ROMA) in the same population with a target condition of all malignancies including borderline. Overa (MIA2G) had a statistically significantly higher sensitivity (91.0% [95% CI 86.8 to 94.0%] compared with 79.2% [73.7 to 83.8%]) and statistically significantly lower specificity (65.5% [95% CI 62.0 to 68.8%] compared with 78.9% [75.8 to 81.7%]) than the ROMA in this study.

4.26 Two further studies reported the diagnostic accuracy of Overa (MIA2G) without comparison with other risk scores. The summary estimate of sensitivity was 90.2% (95% CI 84.6 to 94.3%), and specificity was 65.8% (95% CI 61.9 to 69.5%). One of these studies assessed subgroups of people who were pre- and postmenopausal; there was no statistically significant difference between these groups.

Cost effectiveness

Systematic review of cost-effectiveness evidence

4.27 The EAG did a systematic review to identify existing studies that assessed the cost effectiveness of the included tests and risk scores to help identify people with ovarian cancer. Five studies were identified; however, 2 of these related to the use of tests in screening, so were not applicable to the scope of this assessment. One of the studies (Havrilesky et al. 2015) included the ROMA and the Multivariate Index Assay algorithm (MIA; from Vermillion, who also produce the Overa [MIA2G; multivariate index assay 2nd generation]). Both were dominated (that is, they cost more and produced fewer life years) by the use of CA125 alone or by a strategy of referring all people for specialist care (without testing). Conversely, in Forde et al. (2016) MIA dominated the use of CA125 alone (that is, it was cost saving and produced more quality-adjusted life years [QALYs]). No identified studies assessed the cost effectiveness of all the tests and risk scores included in this assessment.

Modelling approach

4.28 The EAG developed a de novo economic model designed to assess the cost effectiveness of the following tests and risk scores when used in secondary care to help decide whether to refer people with suspected ovarian cancer to a specialist multidisciplinary team (MDT):

  • RMI I – threshold of 250

  • ROMA – using Abbott ARCHITECT assays

  • ROMA – using Roche Elecsys assays

  • Overa (MIA2G) – threshold of 5 units

  • IOTA Simple Rules – inconclusive results assumed to be malignant

  • IOTA ADNEX model – threshold of 10%

  • RMI I – threshold of 200.

4.29 The model did not include assessment of the ROMA using Fujirebio Diagnostics' Lumipulse G HE4 assay because no studies were identified that provided data on the accuracy of the ROMA using this assay. In the base-case analysis the starting cohort was assumed to be 40 years old, consistent with the modelling produced for the NICE guideline on ovarian cancer. All costs and effects included in the model were discounted at an annual rate of 3.5%.
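The 3.5% discounting applied to modelled costs and effects can be sketched as follows (annual cycles assumed, with year 0 undiscounted):

```python
def discounted(values, rate=0.035):
    """Discount a stream of yearly costs or QALYs at 3.5% per year,
    as in the base-case analysis (year 0 is undiscounted)."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))

# A constant 1 QALY per year over 3 years is worth slightly less
# than 3 undiscounted QALYs
print(round(discounted([1.0, 1.0, 1.0]), 3))
```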

Model structure

4.30 The EAG developed a decision tree and Markov model for the assessment. The decision tree was used to model short-term outcomes (up to 30 days after surgery) and the Markov model for longer-term outcomes over a lifetime horizon. In the decision tree, the alternative tests and risk scores were assessed by their ability to help decision-making about referral to a specialist MDT. After the referral, people in the decision tree were classified as being in 1 of the following states: early ovarian cancer, advanced ovarian cancer, benign mass, colorectal cancer or death (to account for mortality 30 days after surgery).

4.31 Longer-term costs and QALYs (over a lifetime horizon) were estimated using a Markov cohort model. This model included separate states for people with ovarian cancer who were treated in a specialist MDT and those who were not, to allow a beneficial effect for treatment in a specialist MDT to be applied. This treatment effect was a hazard ratio of 0.90 (95% CI 0.82 to 0.99) applied to overall survival of people with ovarian cancer (for people with both early and advanced stage ovarian cancer) and to progression-free survival for people with early stage ovarian cancer. This effect size was taken from a Cochrane review (Woo et al. 2012) which reported this hazard ratio for overall survival of people with ovarian cancer who had treatment in institutions with gynaecologic oncologists on site compared with community or general hospitals. The EAG assumed that this hazard ratio would also apply for progression-free survival, based on data in Woo et al. (2012).
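One way the specialist-MDT treatment effect (hazard ratio 0.90) can be applied to a transition probability in a Markov model is via the rate scale. This is a common conversion and an assumption here, not necessarily the EAG's exact implementation:

```python
def apply_hazard_ratio(p_event, hr):
    """Adjust an annual event probability by a hazard ratio on the
    rate scale: p' = 1 - (1 - p)**hr. With hr < 1 (a protective
    effect, such as the 0.90 applied for specialist-MDT treatment),
    the adjusted event probability is lower."""
    return 1 - (1 - p_event) ** hr

p = 0.20                                  # hypothetical annual death probability
p_mdt = apply_hazard_ratio(p, hr=0.90)    # probability with specialist-MDT care
print(p_mdt < p)  # True
```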

Model inputs

4.32 The accuracy estimates for the assessed tests and risk scores used in the model were taken from the clinical-effectiveness review and are shown in table 7. The EAG used diagnostic accuracy estimates derived from studies in which the target condition was 'all malignant tumours including borderline'; that is, studies that did not exclude participants from analysis on the basis of their final histological diagnosis. This was because the EAG considered that this population would produce estimates of test performance most representative of clinical practice. The prevalence of malignancies used in the model (21.3%; comprising ovarian malignancies, including borderline, and non-ovarian malignancies) was a summary estimate calculated from diagnostic cohort studies identified in the clinical-effectiveness review.

Table 7 Diagnostic accuracy estimates used in the model

| Test | Sensitivity (standard error) | Specificity (standard error) | Source |
| --- | --- | --- | --- |
| RMI I – threshold of 250 | 64.4% (1.4%) | 91.8% (0.7%) | Summary estimate from 1 unpublished study (IOTA 2017) and 6 studies (Davies et al. 1993; Jacobs et al. 1990; Lou et al. 2010; Morgante et al. 1999; Tingulstad et al. 1996; Ulusoy et al. 2007) |
| ROMA Abbott ARCHITECT | 75.0% (6.6%) | 87.9% (2.7%) | Al Musalhi et al. (2016) |
| ROMA Roche Elecsys | 79.1% (2.4%) | 79.1% (1.4%) | Summary estimate from 2 studies (Janas et al. 2015; Shulman et al. 2016) |
| Overa (MIA2G) – threshold of 5 units | 90.2% (2.5%) | 65.8% (1.9%) | Summary estimate from 2 studies (Coleman et al. 2016; Zhang et al. 2015) |
| IOTA Simple Rules – inconclusive assumed to be malignant | 94.2% (0.5%) | 76.1% (0.6%) | Summary estimate from 1 unpublished study (IOTA 2017) and 8 studies (Abdalla et al. 2013; Alcazar et al. 2013; Knafel et al. 2015; Meys et al. 2016; Sayasneh et al. 2013; Silvestre et al. 2015; Testa et al. 2014; Timmerman et al. 2010) |
| IOTA ADNEX model – threshold of 10% | 96.3% (0.5%) | 69.1% (0.9%) | Summary estimate from 1 unpublished study (IOTA 2017) and 3 studies (Meys et al. 2016; Sayasneh et al. 2016; Van Calster et al. 2014) |
| RMI I – threshold of 200 | 68.1% (0.9%) | 90.1% (0.5%) | Summary estimate from 1 unpublished study (IOTA 2017) and 12 studies (Abdalla et al. 2013; Al Musalhi et al. 2016; Davies et al. 1993; Jacobs et al. 1990; Lou et al. 2010; Meys et al. 2016; Morgante et al. 1999; Sayasneh et al. 2013; Testa et al. 2014; Tingulstad et al. 1996; Ulusoy et al. 2007; Van Gorp et al. 2012) |

Abbreviations: RMI I, risk of malignancy index 1; ROMA, risk of ovarian malignancy algorithm.
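The assessment does not report predictive values, but they follow from the table 7 estimates and the modelled prevalence of 21.3% by Bayes' rule; for example, for the ADNEX model:

```python
def ppv_npv(sens, spec, prev):
    """Positive and negative predictive value from sensitivity,
    specificity and prevalence (Bayes' rule)."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# ADNEX estimates from table 7 at the modelled prevalence of 21.3%
ppv, npv = ppv_npv(sens=0.963, spec=0.691, prev=0.213)
print(round(ppv, 2), round(npv, 2))  # 0.46 0.99
```

This illustrates the trade-off in the model: a high-sensitivity test such as ADNEX misses very few malignancies (high NPV) at the cost of more false positive referrals (moderate PPV).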

Costs

4.33 The costs associated with the use of the different risk scores used in the model are shown in table 8. Costs were taken from companies, published literature and routine sources of NHS costs. Further costs used in modelling were taken from modelling done for the NICE guideline on ovarian cancer, relevant NHS reference costs, Personal Social Services Research Unit publications and further identified literature. No costs related to the training needed for the use of Simple Rules and ADNEX model were included in base-case analysis. However, the effect of additional costs (to reflect potential training costs) for these tests was investigated in scenario analysis.

Table 8 Risk score costs used in modelling

| Test | Ultrasound cost¹ (£) | Test cost per kit (£) | Total HE4 test-related costs² (£) | CA125 cost³ (£) | Total cost (£) |
| --- | --- | --- | --- | --- | --- |
| ADNEX | 76.75 | – | – | 25.58 | 102.34 |
| Overa (MIA2G) | 76.75 | 99.00 | – | – | 175.80 |
| RMI I | 76.75 | – | – | 25.58 | 102.34 |
| ROMA (Abbott ARCHITECT) | 76.75 | 21.33 | 6.64 | 25.58 | 130.31 |
| ROMA (Roche Elecsys) | 76.75 | 15.95 | 7.81 | 25.58 | 126.09 |
| Simple Rules | 76.75 | – | – | – | 76.75 |

¹ Calculated from the cost of transvaginal ultrasound scans used in economic modelling for the NICE guideline on ovarian cancer and inflated to 2015/16 values.

² Includes capital, quality control, maintenance, shipping, calibration and personnel costs, as set out in appendix 6 of the diagnostics assessment report.

³ Cost of doing a CA125 assay calculated from the NICE guideline on ovarian cancer (adjusted for inflation).

Abbreviations: RMI I, risk of malignancy index 1; ROMA, risk of ovarian malignancy algorithm.

Health-related quality of life and quality-adjusted life year decrements

4.34 Utility estimates used in modelling are shown in table 9.

Table 9 Utility scores used in modelling

| Health state | Utility value estimate | Source |
| --- | --- | --- |
| Benign mass (assumed equal to general population) | Age dependent | Ara et al. (2010) |
| Early ovarian cancer: treated by specialist MDT | 0.83 | Havrilesky et al. (2009) |
| Early ovarian cancer: not treated by specialist MDT | Equal to treated by specialist MDT | Assumption |
| Advanced ovarian cancer: treated by specialist MDT | 0.63 | Grann et al. (1998) |
| Advanced ovarian cancer: not treated by specialist MDT | Equal to treated by specialist MDT | Assumption |
| Colorectal cancer: Dukes' A | 0.74 | Ness et al. (1999) |
| Colorectal cancer: Dukes' B | 0.67 | Ness et al. (1999) |
| Colorectal cancer: Dukes' C | 0.50 | Ness et al. (1999) |
| Colorectal cancer: Dukes' D | 0.25 | Ness et al. (1999) |

Abbreviation: MDT, multidisciplinary team.

Base-case results

4.35 The following assumptions were applied in the base-case analysis:

  • All non-ovarian malignancies were assumed to be colorectal cancer.

  • People with a false negative diagnosis were more likely to have early-, rather than advanced-, stage ovarian cancer.

  • Inconclusive results from Simple Rules were assumed to be malignant.

  • All people with a false positive or false negative diagnosis were operated on for a benign mass.

  • No disutility was applied for people who were incorrectly told that they have ovarian cancer (false positives).

4.36 In the base-case model analysis, the EAG did a pairwise analysis comparing the costs and QALYs resulting from using the included tests and risk scores with RMI I (threshold of 250), and also a fully incremental analysis (table 10). Use of Simple Rules (inconclusive results assumed to be malignant) was the cheapest and second most effective strategy, and dominated RMI I (at thresholds of both 200 and 250). Use of the ADNEX model was most effective (that is, produced the most QALYs) and, when compared with Simple Rules, produced an incremental cost-effectiveness ratio (ICER) of £15,304 per QALY gained. Use of the ROMA (with either assay) and Overa (MIA2G) was dominated.

Table 10 Base-case analysis results

| Test or risk score | Difference in costs vs RMI I (threshold of 250) | Difference in QALYs vs RMI I (threshold of 250) | Difference in costs / difference in QALYs | Full incremental analysis |
| --- | --- | --- | --- | --- |
| Simple Rules – inconclusive assumed to be malignant | −£2 | 0.021 | Dominant | Cheapest |
| RMI I – threshold of 250 | £0 | 0 | N/A | Dominated |
| RMI I – threshold of 200 | £4 | 0.002 | £2,483 | Dominated |
| ADNEX – threshold of 10% | £30 | 0.023 | £1,274 | £15,304 |
| ROMA – Abbott ARCHITECT | £38 | 0.005 | £7,506 | Dominated |
| ROMA – Roche Elecsys | £44 | 0.007 | £6,409 | Dominated |
| Overa (MIA2G) – threshold of 5 units | £105 | 0.017 | £6,038 | Dominated |

Abbreviations: QALY, quality-adjusted life year; RMI I, risk of malignancy index 1; ROMA, risk of ovarian malignancy algorithm.
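The pairwise and incremental figures in table 10 follow from the standard ICER definition. Recomputing them from the rounded table values (the report's own figures are calculated from unrounded inputs, so they differ slightly):

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return delta_cost / delta_qaly

# Pairwise vs RMI I (threshold of 250), using rounded table 10 values
print(round(icer(30, 0.023)))  # ADNEX: ~1304 (table: 1274, from unrounded inputs)

# Full incremental analysis: ADNEX vs the next non-dominated option
# (Simple Rules), again from rounded values
print(round(icer(30 - (-2), 0.023 - 0.021)))  # ~16000 (table: 15304)
```

A strategy is dominated when another option costs less and produces more QALYs, which is why the ROMA and Overa (MIA2G) rows drop out of the full incremental analysis.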

4.37 At a maximum acceptable ICER of £20,000 per QALY gained, the ADNEX model and Simple Rules had probabilities of being cost effective of 60% and 39% respectively. At a maximum acceptable ICER of £30,000 per QALY gained, these probabilities were 75% (ADNEX) and 23% (Simple Rules). The probability of RMI I (threshold of 250) being cost effective at both thresholds was about 1%, and the probabilities for the other tests and risk scores were less than 1%.

Sensitivity analysis

4.38 Use of the ADNEX model remained cost effective at £20,000 and £30,000 per QALY gained in one-way deterministic sensitivity analysis when most parameters were altered. Simple Rules became cost effective in some analyses, typically when the costs of using the ADNEX model were increased (or Simple Rules costs were decreased) or the diagnostic accuracy of the Simple Rules was improved relative to ADNEX. Also, when the upper bound value of the overall-survival hazard ratio for people with an ovarian malignancy treated by a specialist MDT (rather than in secondary care) was used (that is, when the beneficial effect of surgery done by a specialist MDT was at its lowest level in the model), Simple Rules became cost effective at both £20,000 and £30,000 per QALY gained.

Alternative scenario analyses

4.39 The EAG did several scenario analyses to test assumptions made about parameter values used in the base-case model analysis. Use of the ADNEX model remained cost effective in most scenario analyses. However, in some scenarios Simple Rules (inconclusive results assumed to be malignant) was cost effective. These included when a disutility (of arbitrary value) was applied for 1 year for people with a false positive diagnosis, and when the benefit of treatment in specialist care was reduced.

4.40 In a scenario analysis in which a higher cost of surgery done by a specialist MDT was used, RMI I (threshold of 250) was cost effective at a maximum acceptable ICER of £20,000 per QALY gained and Simple Rules was cost effective at a maximum acceptable ICER of £30,000 per QALY gained. In this scenario, an additional cost of £2,500 was added to the average cost of surgery done by a specialist MDT, to reflect expert opinion that some patients referred to a specialist MDT will have extensive surgery for ovarian cancer.

Subgroup analyses

4.41 Results from subgroup analyses were similar to the base-case analyses when the starting age of the cohort was 50 years and also when only early stage cancer was considered. However, when the analysis was run for advanced stage cancer, Simple Rules (rather than ADNEX) was cost effective at maximum acceptable ICERs of £20,000 and £30,000 per QALY gained. No changes to sensitivity or specificity values for tests were made in these subgroup analyses (because of a lack of data on test performance in these populations).

4.42 The EAG also did subgroup analyses for populations who were pre- and postmenopausal. Sensitivity and specificity estimates for tests or risk scores in these subgroups were taken from the clinical-effectiveness review; but relatively few studies were available to inform these estimates. A different starting age of the cohort and prevalence of malignancy (compared with the base-case analysis) was also used for these subgroups. The ADNEX model was cost effective at thresholds of £20,000 and £30,000 per QALY gained for both these subgroup analyses.
