4 Evidence

The diagnostics advisory committee (section 6) considered evidence from several sources on the biomarker tests (PartoSure, Actim Partus and Rapid fetal fibronectin [fFN] 10Q Cassette Kit) for diagnosing preterm labour. Full details of all the evidence are in the committee papers.

Clinical effectiveness

4.1 The external assessment group (EAG) did 2 systematic reviews of the clinical-effectiveness evidence for Actim Partus, PartoSure and quantitative fetal fibronectin using the Rapid fFN 10Q Cassette Kit: 1 for diagnostic accuracy and 1 for clinical outcomes.

4.2 The EAG also did a non-systematic update of its report to include 7 new studies for PartoSure that were submitted with stakeholder comments on the diagnostics assessment report.

4.3 For the diagnostic accuracy review, studies were included if:

  • they recruited women with signs and symptoms of preterm labour who were not in established labour and who had intact amniotic membranes

  • the population was described as preterm

  • twin or multiple pregnancies made up 20% or less of the total population recruited

  • at least 1 index test was reported and at least 1 of the following reference standards or comparators was included:

    • preterm delivery within 48 hours

    • preterm delivery within 7 days

    • clinical assessment of symptoms

    • fetal fibronectin at a threshold of 50 nanograms/millilitre (ng/ml)

  • they were prospective or retrospective diagnostic accuracy studies with randomly or consecutively recruited women; both single- and two‑gate[1] designs were eligible.

4.4 All studies included in the diagnostic accuracy review were appraised using the QUADAS‑2 tool. In total, 20 studies met the inclusion criteria for the diagnostic accuracy review.

4.5 The EAG also searched for studies in which clinical outcomes were reported, but did not identify any studies. The inclusion criteria were restricted to controlled studies only because the EAG considered that uncontrolled study designs are likely to be susceptible to bias.

Study characteristics

4.6 Of the 20 diagnostic accuracy studies, data for more than 1 index test were reported in 2 studies (Hadzi‑Lega et al. 2017, APOSTEL‑1 2016), 16 studies assessed Actim Partus, 4 assessed PartoSure and 2 assessed fetal fibronectin at thresholds other than 50 ng/ml (APOSTEL‑1, EUIFS 2016). All 20 studies assessed the index tests against a reference standard of preterm delivery within 7 days, and 7 studies also assessed the index tests against a reference standard of preterm delivery within 48 hours.

4.7 The characteristics of women in the studies varied and this introduced heterogeneity:

  • Mean maternal age was 25 to 31 years.

  • The proportion of multiple pregnancies was 0 to 20%.

  • The mean number of previous term pregnancies was 0.4 to 2.9 per person.

  • The proportion of previous preterm deliveries was 0 to 30%.

  • The proportion of previous miscarriages was 4 to 27%.

  • The prevalence of preterm birth within 7 days was 1.7 to 73.3%.

  • The prevalence of preterm birth within 48 hours was 2.4 to 58.3%.

  • The women were 20 weeks to 37 weeks pregnant.

4.8 The reporting of whether delivery was spontaneous or as a result of medical intervention varied between studies. Only 11 studies provided details on delivery; in 4, the authors stated that they excluded women from test accuracy calculations if birth occurred because of medical intervention before the 7‑day or 48‑hour reference standard.

Diagnostic accuracy

Delivery within 7 days: Actim Partus

4.9 In the 16 studies that included data for Actim Partus, the prevalence of birth within 7 days of testing was 1.7 to 73.3%. Across the studies, sensitivity estimates were 33.3 to 94.7%. The 3 studies (Cooper 2012, Danti 2011, Riboni 2011) with the lowest sensitivity estimates also had a lower prevalence of preterm birth (1.7 to 6.7%) than the other studies (9.8 to 73.3%). Specificity was 50.0 to 93.5%. The EAG did not identify any major differences in methods or participant characteristics in the 3 studies with the lowest specificity estimates. The pooled analysis of these 16 studies estimated a sensitivity of 77% (95% confidence interval [CI] 68 to 83%) and a specificity of 81% (95% CI 76 to 85%).

4.10 There were 6 studies that tested each sample with both Actim Partus and fetal fibronectin at a threshold of 50 ng/ml. Using delivery within 7 days as the reference standard, sensitivity for Actim Partus was lower than for fetal fibronectin in 1 study (APOSTEL‑1 2016), higher in 2 studies (Ting 2007, Tripathi 2016) and the same for both tests in the remaining 3 studies (Cooper 2012, Eroglu 2007, Riboni 2011). Specificity was higher for Actim Partus than for fetal fibronectin in 4 of the 6 studies, and lower in the 2 remaining studies (Cooper, Tripathi). Cooper only reported test accuracy results for a proportion of the total Actim Partus group (58 fewer women had results for the qualitative fFN test).

4.11 In response to stakeholder comments on the diagnostics assessment report, the EAG updated the diagnostic accuracy estimates for Actim Partus to include 18 studies identified by a company. Compared with the original review, the updated pooled sensitivity decreased from 77% (95% CI 68 to 83%) to 74.3% (95% CI 64.2 to 82.3%), and pooled specificity increased slightly from 81% (95% CI 76 to 85%) to 81.2% (95% CI 76.2 to 85.4%).

Delivery within 7 days: PartoSure

4.12 In the 4 studies that included diagnostic accuracy for PartoSure, the prevalence of birth within 7 days of testing was 2.4 to 17.2%. Specificity was similar across studies, 90.2 to 97.5%, but sensitivity was 0 to 100%. Werlen et al. (2015) reported 0% sensitivity because only 1 of 41 women tested positive and this was a false positive. The pooled analysis of the 4 studies estimated a sensitivity of 83% (95% CI 61 to 94%) and a specificity of 95% (95% CI 89 to 98%).

4.13 Nikolova et al. (2015) assessed fetal fibronectin at a threshold of 50 ng/ml and PartoSure in the same samples (66 of the total 203 women). Against the 7‑day reference standard, sensitivity for PartoSure was 80% (95% CI 63.1 to 91.6%) and for fetal fibronectin it was 50% (95% CI 21.1 to 79.0%). Specificity for PartoSure was 94.6% (95% CI 90.1 to 97.5%) and for fetal fibronectin it was 72.2% (95% CI 58.4 to 83.5%).

4.14 In response to stakeholder comments on the diagnostics assessment report, the EAG updated the diagnostic accuracy estimates for PartoSure to include 9 studies identified by a company. Compared with the original review, the updated pooled sensitivity decreased from 83% (95% CI 61 to 94%) to 68.5% (95% CI 51.2 to 81.9%), and pooled specificity increased slightly from 95% (95% CI 89 to 98%) to 96.6% (95% CI 95.1 to 97.6%).

Delivery within 7 days: quantitative fetal fibronectin

4.15 There were 2 studies (APOSTEL‑1 2016, EUIFS 2016) that included diagnostic accuracy for quantitative fetal fibronectin. The prevalence of preterm birth within 7 days was 10.5% (EUIFS) to 19.7% (APOSTEL‑1). In both studies sensitivity decreased as the threshold increased and specificity increased as the threshold increased.

4.16 The EAG reviewed 1 unpublished study (Ravi et al.) that included evidence for quantitative fetal fibronectin, submitted with stakeholder comments on the diagnostics assessment report. This presented sensitivity values lower than APOSTEL‑1 and EUIFS, but higher specificity values. The details of the study cannot be reported here because they are confidential.

Delivery within 48 hours: Actim Partus

4.17 There were 6 studies that assessed the diagnostic accuracy of Actim Partus. The prevalence of delivery within 48 hours of testing was 5.3 to 58.3%. Sensitivity was 65.7 to 100.0% and specificity was 56.0 to 82.4%. The pooled analysis of the 6 studies estimated a sensitivity of 87% (95% CI 74 to 94%) and a specificity of 73% (95% CI 62 to 82%).

Delivery within 48 hours: PartoSure

4.18 Only 1 study (Werlen et al. 2015) assessed the diagnostic accuracy of PartoSure against the 48‑hour reference standard. The prevalence of preterm birth was 2.4%. Sensitivity was 0% (95% CI 0 to 97.5%) and specificity was 97.5% (95% CI 86.8 to 99.9%). Sensitivity was 0% because only 1 of 41 women tested positive and this was a false positive.
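The figures in 4.18 can be checked by hand. The following is a minimal sketch, assuming the 2x2 counts implied by the text (41 women, 1 delivery within 48 hours from the 2.4% prevalence, and 1 positive result that was a false positive) and assuming the intervals are exact (Clopper-Pearson) binomial intervals, which the source does not state. The function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f):
    """Bisection on [0, 1] for a function that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial interval for k successes out of n."""
    lower = 0.0 if k == 0 else _root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Counts reconstructed from 4.18 (an assumption, not reported directly):
# true positives, false negatives, false positives, true negatives.
tp, fn, fp, tn = 0, 1, 1, 39

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
sens_ci = clopper_pearson(tp, tp + fn)
spec_ci = clopper_pearson(tn, tn + fp)

print(f"Sensitivity {sensitivity:.1%}, 95% CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%}")
print(f"Specificity {specificity:.1%}, 95% CI {spec_ci[0]:.1%} to {spec_ci[1]:.1%}")
```

With these counts the sketch gives a sensitivity of 0% with an upper limit of 97.5%, and a specificity of 97.5% with limits close to 86.8% and 99.9%, in line with the reported values. The very wide sensitivity interval reflects that only 1 woman in the study delivered within 48 hours.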

Accuracy of the comparator (fetal fibronectin with a threshold of 50 ng/ml)

4.19 The EAG identified studies in the diagnostic accuracy systematic review that included data for fetal fibronectin at a threshold of 50 ng/ml. The generalisability of these data was assessed by comparing them with results from 3 recently published systematic reviews of fetal fibronectin.

4.20 Of the 20 studies included in the diagnostic accuracy review, 8 included accuracy data for fetal fibronectin at a threshold of 50 ng/ml. Of these, 2 studies (APOSTEL‑1 2016, EUIFS 2016) used a quantitative fetal fibronectin test, 3 used the QuikCheck qualitative fetal fibronectin test (Eroglu 2007, Nikolova 2015, Tripathi 2016), 1 used an ELISA-based laboratory test (Riboni 2011), and in the remaining 2 studies (Cooper 2012, Ting 2007), it was not clear which fetal fibronectin test was used.

4.21 For the 8 studies looking at the diagnostic accuracy of fetal fibronectin at a threshold of 50 ng/ml against a 7‑day reference standard, sensitivity was 23.8 to 91.3% and specificity was 62.2 to 99.1%. These results were similar to those from the 3 existing systematic reviews.

Diagnostic accuracy data informing the economic model

4.22 The EAG concluded that there was too much heterogeneity in the pooled results to use them for indirect comparisons between tests in the economic modelling. It decided to prioritise studies that reported results for more than 1 test in the same population. Therefore, 2 studies (APOSTEL‑1 2016, Hadzi‑Lega et al. 2017) were used in the base case for the economic model.

Cost effectiveness

Review of economic evidence

4.23 The EAG did a systematic search to identify studies that investigated the cost effectiveness of Actim Partus, PartoSure and quantitative fetal fibronectin. One study (Gibson et al. 2013) assessed the effect of a fetal fibronectin test (at thresholds of 10, 50, 200 and 500 ng/ml) on the use of antenatal corticosteroids. There were a further 3 observational cost–minimisation studies that reported costs and resource use data, but these were published over 10 years ago and it was not certain whether the protocols used in the studies reflected current clinical practice.

4.24 The EAG also identified 6 economic models. A cost–minimisation modelling approach was used in 2 studies. Chuck and Nguyen (2015) looked at the cost of adopting fetal fibronectin in Alberta, Canada and estimated that introducing the test between 2008 and 2013 increased costs by US $4 million. Conversely, the Deshpande et al. (2013) study was done in the UK and found that the Rapid fetal fibronectin test saved the NHS £23.88 per patient compared with clinical examination alone.

4.25 Cost-effectiveness modelling was used in 3 studies (Boyd et al. 2011, Mozurkewich et al. 2000 and van Baaren et al. 2017) and in NICE's guideline on preterm labour and birth. The NICE guideline model was hypothetical and assessed what the specificity and sensitivity of the tests (cervical length measurement by ultrasound, Actim Partus and fetal fibronectin) would need to be for them to be considered cost effective compared with a no-test, treat-all strategy. It accounted for the effect of test accuracy on cost effectiveness at different gestational ages and found that testing was not cost effective below 30 weeks of pregnancy. The main assumptions in the NICE guideline model were:

  • the choice of diagnostic strategy had no significant effect on the mother's health outcomes

  • clinicians did not deviate from the diagnostic protocol

  • neonatal morbidity outcomes were based on respiratory distress syndrome and intraventricular haemorrhage

  • the lifetime quality of life and costs were the same for both full-term and preterm babies.

Modelling approach

4.26 The EAG developed a de novo economic model to evaluate the cost effectiveness of quantitative fetal fibronectin using the Rapid fFN 10Q Cassette Kit at thresholds other than 50 ng/ml, Actim Partus and PartoSure compared with fetal fibronectin at 50 ng/ml. The model was based on the NICE guideline model, but several parameters used to populate the model were updated. The base case took the perspective of the NHS and personal social services and had a lifetime time horizon (100 years). A discount rate of 3.5% was applied to both costs and effects.
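The effect of the 3.5% discount rate over a long time horizon can be illustrated with a short calculation. This is a minimal sketch, not the EAG's implementation: the constant yearly value of 1.0 is hypothetical, and treating the first year as undiscounted is an assumed timing convention that the source does not specify.

```python
def discount_factor(year, rate=0.035):
    """Weight applied to a cost or QALY accruing `year` years from now."""
    return 1.0 / (1.0 + rate) ** year


def discounted_total(annual_values, rate=0.035):
    """Present value of a stream of yearly values (year 0 is undiscounted)."""
    return sum(v * discount_factor(t, rate) for t, v in enumerate(annual_values))


# Hypothetical stream: 1.0 per year for every year of a 100-year horizon.
stream = [1.0] * 100

undiscounted = sum(stream)
discounted = discounted_total(stream)

print(f"Undiscounted total: {undiscounted:.1f}")
print(f"Discounted total:   {discounted:.1f}")
```

Under these assumptions, 100 undiscounted years shrink to fewer than 30 discounted years, which shows how strongly discounting compresses outcomes that accrue late in a lifetime horizon.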

4.27 The population was women with intact membranes presenting with symptoms of threatened preterm labour between 24 and 36 weeks of pregnancy. It was assumed that, before entering the model, a clinical assessment had been done that could not rule out preterm labour.

Model structure

4.28 A decision tree structure that included a diagnostic phase followed by treatment and long-term outcomes was used. The model started with an assessment of preterm labour, and then modelled the decision of whether to admit to hospital or discharge home, and whether to offer corticosteroids. It evaluated:

  • the interventions (Actim Partus, PartoSure and quantitative fetal fibronectin at thresholds of 10, 200 and 500 ng/ml)

  • a no-test, treat-all strategy, which assumes that all women entering the model are admitted to hospital

  • the comparator (fetal fibronectin at a threshold of 50 ng/ml).

    Longer-term costs and quality-adjusted life years (QALYs) were then calculated for each branch of the decision tree.

Model inputs

4.29 The model was populated with data from the diagnostic accuracy review, published literature and expert opinion. Estimates of diagnostic accuracy for fetal fibronectin and Actim Partus were taken from APOSTEL‑1, which included a direct comparison of the 2 tests. None of the studies directly compared PartoSure with fetal fibronectin, so diagnostic accuracy was estimated using data from APOSTEL‑1 and Hadzi‑Lega et al. Scenario analyses were done using data from alternative sources: Cooper et al. 2012, Abbott et al. 2013 and an EAG meta-analysis. The diagnostic accuracy estimates are in table 1.

Table 1 Diagnostic accuracy values used in the economic model

| Study | Diagnostic test (threshold) | Sensitivity | Specificity |
|---|---|---|---|
| Bruijn et al. 2016 (APOSTEL‑1), n=350 (base case) | fFN (10 ng/ml) | 0.957 | 0.423 |
| | fFN (50 ng/ml) | 0.913 | 0.648 |
| | fFN (200 ng/ml) | 0.710 | 0.836 |
| | fFN (500 ng/ml) | 0.420 | 0.957 |
| | Actim Partus | 0.783 | 0.893 |
| Hadzi‑Lega et al. 2017, n=57 (base case) | PartoSure | 0.833 | 0.902 |
| | Actim Partus | 0.833 | 0.765 |
| Cooper et al. 2012, n=349 (scenario) | Actim Partus | 0.333 | 0.741 |
| | fFN (50 ng/ml) | 0.333 | 0.898 |
| Abbott et al. 2013, n=299 (scenario) | fFN (10 ng/ml) | 0.778 | 0.576 |
| | fFN (50 ng/ml) | 0.778 | 0.790 |
| | fFN (200 ng/ml) | 0.778 | 0.931 |
| | fFN (500 ng/ml) | 0.556 | 0.972 |
| EAG meta-analysis, n=963 (scenario) | Actim Partus | 0.832 | 0.879 |
| | fFN (50 ng/ml) | 0.683 | 0.872 |

Abbreviations: fFN, fetal fibronectin; ng/ml, nanograms per millilitre; EAG, external assessment group.

Costs

4.30 The following costs, from companies, published literature and routine sources of NHS costs, were used in the model:

  • fetal fibronectin test: £66 (includes 15 minutes of midwife time)

  • Actim Partus test: £35 (includes 10 minutes of midwife time)

  • PartoSure test: £52 (includes 10 minutes of midwife time)

  • maternal steroid injection: £5

  • tocolytics (atosiban plus infusion equipment): £362

  • inpatient hospital stay: £1,325

  • in utero transfer: £965

  • long-term healthcare costs of intraventricular haemorrhage: £114,648

  • neonatal hospital costs for respiratory distress syndrome: £5,587

  • neonatal hospital costs: baby dies before discharge: £22,834.

Health-related quality of life and QALY decrements

4.31 Health-related quality-of-life estimates for babies were included in the base case, and a scenario analysis also included maternal health-related quality-of-life estimates. A utility for severe persistent asthma was applied to 56% of children with respiratory distress syndrome based on clinical expert opinion. Utilities used in the model are shown in table 2.

Table 2 Utilities used in the economic model

| Variable | Patient | Source | Utility |
|---|---|---|---|
| 'Severe' RDS (severe persistent asthma used as proxy) | Child | Carroll and Downs 2009 | 0.85 |
| IVH grades 3 to 4 (moderate cerebral palsy used as proxy) | Child | Carroll and Downs via Bastek et al. 2012 | 0.76 |
| Death | Child | Vandenbussche et al. 1999 | 0 |
| Preterm survivor | Child | Cooke 2004 | 0.879 |
| Mother with previous adverse child outcome | Mother | Couto et al. 2009 | 0.644 |
| Mother with no adverse child outcome | Mother | Couto et al. 2009 | 0.834 |

Abbreviations: IVH, intraventricular haemorrhage; RDS, respiratory distress syndrome.

Base-case results

4.32 The following assumptions were applied in the base-case analysis:

  • The population entered the model after a clinical examination which did not rule out preterm labour.

  • QALY outcomes were the same for women with false positive results (who did not deliver before the reference standard), true negative results and false negative results.

  • All treatment decisions were driven by the test result, and clinical judgement did not override this.

  • Diagnostic accuracy was equivalent across all gestational ages.

  • The prevalence of preterm birth within 7 days of testing was 3%, and the preterm birth rate was 12.1%.

  • Antenatal corticosteroids were only effective within 7 days of delivery; babies born more than 7 days after treatment did not benefit.

  • In utero transfers were only available for women presenting to a level 1 or 2 hospital at less than 28 weeks of pregnancy.

  • Tocolysis was used for all in utero transfers at less than 28 weeks of pregnancy.

  • Only intraventricular haemorrhage resulted in longer-term costs.

  • Babies who survived beyond 1 year had a long-term quality of life equivalent to the average for preterm babies who survived.
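The diagnostic phase of a decision tree like this one splits women into 4 branches according to the test result and whether birth occurs within 7 days. The following is a minimal sketch of that split, using the base-case prevalence of 3% and the APOSTEL‑1 accuracy values from table 1. The function and variable names are illustrative, and the sketch is not the EAG's model: it covers only the branch probabilities, not the costs or QALYs attached to each branch.

```python
def branch_probabilities(prevalence, sensitivity, specificity):
    """Proportion of women in each test-result branch of the tree."""
    return {
        "TP": prevalence * sensitivity,              # positive, delivers within 7 days
        "FN": prevalence * (1 - sensitivity),        # negative, delivers within 7 days
        "FP": (1 - prevalence) * (1 - specificity),  # positive, does not deliver
        "TN": (1 - prevalence) * specificity,        # negative, does not deliver
    }


PREVALENCE = 0.03  # base-case prevalence of preterm birth within 7 days

# (sensitivity, specificity) from APOSTEL-1 in table 1
TESTS = {
    "fFN 50 ng/ml": (0.913, 0.648),
    "Actim Partus": (0.783, 0.893),
}

for name, (sens, spec) in TESTS.items():
    b = branch_probabilities(PREVALENCE, sens, spec)
    test_positive = b["TP"] + b["FP"]
    print(f"{name}: test positive {test_positive:.1%}, false negative {b['FN']:.2%}")
```

With these inputs about 37% of women test positive with fetal fibronectin at 50 ng/ml compared with about 13% with Actim Partus, while the proportion of false negatives rises from about 0.3% to about 0.7%. Because the base case assumes that treatment decisions follow the test result, this trade-off is one way to read why a more specific but less sensitive test can be both cheaper and slightly less effective than the comparator.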

4.33 The base-case results were given for groups of women presenting at 33, 30 and 26 weeks of pregnancy. Results were also stratified according to the level of neonatal care available at the place of birth.

4.34 Base-case results for women presenting at 30 weeks of pregnancy at a level 2 hospital are shown in table 3. Most tests were cheaper and less effective than fetal fibronectin at 50 ng/ml, which means that their results are in the south-west quadrant of the cost-effectiveness plane. The exceptions were the treat-all strategy and fetal fibronectin at 10 ng/ml, which resulted in very small QALY gains at additional cost.

Table 3 Base-case results for women presenting at 30 weeks of pregnancy at a level 2 hospital

| Test | Total costs (£) | Total QALYs | ICER (£ per QALY) (versus fFN 50 ng/ml) |
|---|---|---|---|
| PartoSure | 4,895 | 22.010 | 81,925* |
| fFN 500 ng/ml | 5,004 | 21.992 | 17,013* |
| Actim Partus | 5,055 | 22.010 | 56,033* |
| fFN 200 ng/ml | 5,159 | 22.006 | 25,213* |
| fFN 50 ng/ml | 5,401 | 22.016 | |
| fFN 10 ng/ml | 5,690 | 22.018 | 140,270 |
| Treat all | 6,171 | 22.020 | 186,757 |

* ICER represents cost saved per QALY lost; the result is in the south-west quadrant of the cost-effectiveness plane.

Abbreviations: fFN, fetal fibronectin; ICER, incremental cost-effectiveness ratio; QALY, quality-adjusted life year; ng/ml, nanograms per millilitre.
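The ICERs in table 3 follow from the incremental costs and QALYs of each test compared with fetal fibronectin at 50 ng/ml. The following is a minimal sketch of that calculation using the totals as printed in the table. The function name and quadrant labels are illustrative, and this is not the EAG's code.

```python
COMPARATOR = "fFN 50 ng/ml"

# (total cost in GBP, total QALYs) as printed in table 3
RESULTS = {
    "PartoSure": (4895, 22.010),
    "fFN 500 ng/ml": (5004, 21.992),
    "Actim Partus": (5055, 22.010),
    "fFN 200 ng/ml": (5159, 22.006),
    "fFN 50 ng/ml": (5401, 22.016),
    "fFN 10 ng/ml": (5690, 22.018),
    "Treat all": (6171, 22.020),
}


def compare(test, results=RESULTS, comparator=COMPARATOR):
    """Incremental cost, incremental QALYs, quadrant and ICER versus the comparator."""
    cost, qalys = results[test]
    ref_cost, ref_qalys = results[comparator]
    d_cost, d_qalys = cost - ref_cost, qalys - ref_qalys
    if d_cost < 0 and d_qalys < 0:
        quadrant = "south-west"  # cheaper and less effective
    elif d_cost > 0 and d_qalys > 0:
        quadrant = "north-east"  # costlier and more effective
    elif d_cost <= 0 and d_qalys >= 0:
        quadrant = "south-east (dominant)"
    else:
        quadrant = "north-west (dominated)"
    icer = d_cost / d_qalys if d_qalys != 0 else None
    return d_cost, d_qalys, quadrant, icer


for test in RESULTS:
    if test == COMPARATOR:
        continue
    d_cost, d_qalys, quadrant, icer = compare(test)
    print(f"{test}: cost {d_cost:+d}, QALYs {d_qalys:+.3f}, {quadrant}, ICER {icer:,.0f}")
```

The recomputed ratios are close to, but not identical to, the published ICERs (for example, about £84,300 rather than £81,925 for PartoSure), presumably because the QALY totals in the table are rounded to 3 decimal places. In the south-west quadrant the ratio is the cost saved per QALY lost, so a higher value is more favourable to the cheaper test. In the north-east quadrant a lower value is more favourable.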

4.35 The incremental cost-effectiveness ratios (ICERs) for the tests for women in a level 2 hospital at 26 weeks of pregnancy reduced compared with the base-case results. This reduction applied to tests that were cheaper and less effective than the comparator (which became less cost effective) and to tests that cost more and were more effective than the comparator (which became more cost effective).

4.36 The ICERs for the tests for women in a level 2 hospital at 33 weeks of pregnancy increased compared with the base-case results. This increase applied to tests that were cheaper and less effective than the comparator (which became more cost effective) and to tests that cost more and were more effective than the comparator (which became less cost effective).

4.37 The EAG updated the results of the analyses using data from 2 additional studies (Nikolova et al. and Wing et al. 2017) that were highlighted in stakeholder comments on the diagnostics assessment report. For PartoSure compared with fetal fibronectin at 50 ng/ml, the addition of accuracy data from these studies resulted in a greater QALY loss than was seen in the EAG's base case. However, using the accuracy estimates from Nikolova et al. resulted in greater cost savings and using the accuracy estimates from Wing et al. resulted in lower cost savings.

Alternative scenario analyses

4.38 The effect of changing assumptions about the accuracy of the tests was explored in 2 scenario analyses. Alternative diagnostic accuracy data from Cooper et al. (2012) were used to calculate ICERs for Actim Partus compared with fetal fibronectin at a threshold of 50 ng/ml. In this analysis Actim Partus was dominated by fetal fibronectin (that is, fetal fibronectin was more effective and less expensive). Also, alternative diagnostic accuracy data were obtained for fetal fibronectin at thresholds of 10, 200 and 500 ng/ml compared with fetal fibronectin at a threshold of 50 ng/ml from an unpublished study by Abbott et al. The results of this analysis are academic in confidence.

4.39 The scenario analysis with the greatest effect on the ICERs was limiting the time horizon of the analysis to the first year after birth. The ICERs became less favourable for all interventions and increased by more than 20 times the base-case value.

4.40 Assuming that antenatal steroids have partial benefits if given more than 7 days before birth also had a considerable effect on all the ICERs. This produced more favourable ICERs for PartoSure, the treat-all strategy and fetal fibronectin at a threshold of 10 ng/ml. However, it produced less favourable ICERs for Actim Partus and fetal fibronectin at thresholds of 200 ng/ml and 500 ng/ml.

Sensitivity analyses

4.41 Deterministic sensitivity analyses were done by varying the base-case parameters by 20%. These analyses found that the ICERs were most sensitive to changes in health-related quality of life of preterm babies who survived. Other parameters affecting the ICERs were cost of hospital admission, prevalence of preterm birth within 7 days, effectiveness of steroid treatment and baseline mortality risk.

4.42 The EAG also ran probabilistic sensitivity analyses and presented the results as cost-effectiveness acceptability curves. Probabilistic ICERs were not presented.



[1] A single-gate study recruits patients whose disease status is unknown before testing (a consecutive series), whereas a two-gate study recruits patients with the target condition and patients who do not have the target condition (a case-control study).

  • National Institute for Health and Care Excellence (NICE)