9 Evidence synthesis

NICE process and methods

9.1 Estimates of the relative effectiveness of service delivery interventions
9.2 Evidence on uptake and compliance outcomes
9.3 Estimates of relative effectiveness for questions about access and availability
9.4 Formal consensus techniques

9.1 Estimates of the relative effectiveness of service delivery interventions

It is helpful to distinguish between 2 general types of service delivery questions. One type concerns different pathways of care, different service configurations, interventions to be managed by different types of staff, whether a 'care team' approach is needed, and so on. These are questions for which trial evidence could in principle be found, so standard approaches to evidence identification and synthesis (for example, those described in 'The guidelines manual' and by the NICE Decision Support Unit) could be used. However, for service guidance it is unlikely that 1 type of study or piece of evidence will be sufficient to inform recommendations. Non-standard approaches to evidence synthesis will therefore also need to be considered to enable the Committee to develop recommendations. Two specific problems that will often need to be addressed are:

uncertainty about the quality and relevance of existing evidence on clinical outcomes
the need to consider evidence on intermediate or surrogate outcomes, such as uptake of services or compliance, rather than (or in addition to) evidence on clinical outcomes.

A second type of service delivery issue relates to questions about the feasibility of providing access to services and procedures, or making them available within a certain timeframe, rather than whether the services or procedures are effective. In these questions, estimates of the effect of providing the service, compared with not providing it, are needed for decision making, whether based on cost-effectiveness analysis or on other criteria.

It should be emphasised that some service delivery guidance may combine access and availability issues with standard effectiveness issues.

Guidance on how to approach both kinds of problem, as well as on using consensus techniques when estimates based on published data cannot be obtained, is given in the following sections.

Finding studies that provide unbiased estimates of the effectiveness of service interventions is often difficult, for the following reasons:

Service delivery interventions are inherently 'variable'. Even with a standard protocol, the precise way in which they are implemented at different sites or by different people is necessarily situation- and/or individual-dependent. This could be manifested by centre effects in multi-centre trials.
The relative benefit of a new intervention over 'standard' or pre-existing care is likely to depend on the 'intensity' of the current care. For example, the beneficial effect of a new patient reminder system on the uptake of screening for breast cancer depends on what the current arrangements are, and on current uptake. The effect of introducing a reminder system in the USA, where there is no systematic screening programme, will be quite different from the effect of adding the reminder system to existing infrastructure in the UK. In other words, results from studies carried out within other healthcare systems might not be easily generalised to the UK.

In these circumstances a standard systematic review is likely to identify a range of studies on interventions that are similar to the interventions being considered, but not necessarily the same. The Committee will then need to consider issues of fidelity and applicability carefully, and ensure that these are accounted for in the 'Linking evidence to recommendations' section of the guidance.

In most cases, the expert opinion of the Committee will be used to explore and estimate any impact on confidence in the results of such evidence, but quantitative methods can also be used. If they are, the NICE Clinical Guidelines Technical Support Unit should be contacted for advice on using these methods and on which types of evidence could be searched for.

9.2 Evidence on uptake and compliance outcomes

In some service delivery evaluations, measures of service uptake, patient satisfaction or compliance of health service staff are recorded, rather than data on clinical outcomes for patients. This is typically the case, for example, when the intervention is directed at changing staff behaviour or patient referral routes.

Such evidence can be used when analysing the effectiveness or cost effectiveness of a service delivery intervention, but only if there is also an estimate available – from whatever source – of the underlying clinical effect of the procedure or treatment. It is then possible to combine estimates of the efficacy or effectiveness of the clinical intervention with estimates of the effectiveness of the service delivery intervention in ensuring that the clinical intervention is implemented. It is possible to combine evidence from trials reporting process outcomes alone, trials reporting clinical outcomes alone, and trials reporting both.

The NICE Clinical Guidelines Technical Support Unit can be consulted for advice on how the 2 kinds of evidence can be combined within a single modelling framework.
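The simplest form of this combination can be sketched as follows. The sketch assumes that the clinical benefit per person treated is the same for people who take up the service only because of the service delivery intervention as for those already using it; this is a strong simplification, and the function name and all input values are hypothetical illustrations rather than part of NICE methods.

```python
def expected_clinical_gain(uptake_current, uptake_new, benefit_per_treated):
    """Expected clinical gain per eligible person from a change in uptake.

    uptake_current, uptake_new: proportions of eligible people receiving the
        clinical intervention before and after the service change.
    benefit_per_treated: absolute clinical benefit per person treated
        (for example, absolute risk reduction), taken from a separate source.
    Assumption: the benefit per treated person does not depend on who is
    newly brought into treatment.
    """
    for p in (uptake_current, uptake_new):
        if not 0.0 <= p <= 1.0:
            raise ValueError("uptake must be a proportion between 0 and 1")
    return (uptake_new - uptake_current) * benefit_per_treated


# Hypothetical inputs: uptake rises from 60% to 75%, and the clinical
# intervention prevents 2 events per 100 people treated.
gain = expected_clinical_gain(0.60, 0.75, 0.02)
print(f"Events prevented per eligible person: {gain:.4f}")
```

A full analysis would also carry the uncertainty in both estimates through to the result, which is the kind of modelling on which the Technical Support Unit can advise.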

9.3 Estimates of relative effectiveness for questions about access and availability

For questions about access and availability, there is a particular difficulty in deriving an estimate of relative effectiveness, in addition to the difficulties described in the previous sections. This would be the case, for example, where a procedure such as endoscopy for upper gastrointestinal bleeding is indicated. The question is not about whether endoscopy should be done, but whether or not the procedure can be safely delayed (for example, at night or at weekends) in patients whose symptoms suggest they are at lower risk.

Studies based on individual patient 'audit' data that relate outcomes to treatment parameters while controlling for patient characteristics are difficult to interpret. This is because patients in whom the treatment was withheld or delayed are always likely to be those who were considered to be at lower risk.

It is likely that better estimates of the effectiveness of such interventions can be derived from nationally collected data in which between-unit variation in outcomes, or variation between different time periods, can be related to the local policies and practices (for example, staffing levels) in operation at the time. For example, mortality rates within 1 or 2 days of hospital admission could be compared between weekends and weekdays, and hospitals where weekend cover is the same as weekday cover could also be compared with those where it is not. Comparisons of this type have been published, for example by Dr Foster. Although these surveys avoid the problems of individual audit data, they are still observational, and the use of aggregated data introduces further potential biases. The design of the data collection, and the analysis and interpretation of the data obtained, require major input from clinical epidemiologists, expert clinicians, methodologists, operational research experts and people with relevant operational experience in the NHS.
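As an illustration only, a crude weekend versus weekday comparison of this kind could be sketched as a risk ratio with an approximate 95% confidence interval calculated on the log scale. The function name and the counts are hypothetical, and the calculation is unadjusted: it does nothing to address differences in case mix or the other biases noted above.

```python
import math


def risk_ratio(deaths_a, admissions_a, deaths_b, admissions_b, z=1.96):
    """Crude risk ratio (group A versus group B) with an approximate CI.

    Uses the standard large-sample standard error of the log risk ratio.
    Returns (ratio, lower limit, upper limit).
    """
    risk_a = deaths_a / admissions_a
    risk_b = deaths_b / admissions_b
    ratio = risk_a / risk_b
    se_log = math.sqrt(
        1 / deaths_a - 1 / admissions_a + 1 / deaths_b - 1 / admissions_b
    )
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts of deaths within 2 days of admission.
rr, lower, upper = risk_ratio(
    deaths_a=120, admissions_a=4000,    # weekend admissions
    deaths_b=250, admissions_b=10000,   # weekday admissions
)
print(f"Risk ratio {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The same function could be applied separately to hospitals with and without equivalent weekend cover, although any difference between the 2 ratios would still need expert interpretation.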

A service delivery issue that is quite often examined in this way is the relationship between performance indicators and 'volume' (that is, number of cases seen per year). Such data are also used to establish 'institutional rankings'. Data of this type tend to show considerable overdispersion: in other words, there is far more variation between units than would be expected by chance. To determine whether individual units are performing at a level that requires some intervention, control charts can be used. There are also methods and processes for interpreting the relationships between performance and volume; these need to take into account general between-unit variation when trying to infer causal effects.
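The effect of overdispersion on control limits can be illustrated with a minimal sketch. It assumes binomial variation around the overall event rate, estimates an overdispersion factor as the mean of the squared standardised differences, and widens the control limits by the square root of that factor, which is one commonly described approach. The function name, the 3 standard deviation limit and the unit data are hypothetical.

```python
import math


def control_chart_flags(events, cases, z_limit=3.0):
    """Flag units lying outside control limits around the overall rate.

    Returns (overdispersion factor, naive flags, adjusted flags).
    Naive flags assume purely binomial variation; adjusted flags divide
    each z-score by sqrt(phi), where phi is the overdispersion factor.
    """
    overall = sum(events) / sum(cases)
    z_scores = []
    for e, n in zip(events, cases):
        se = math.sqrt(overall * (1 - overall) / n)  # binomial SE for this unit
        z_scores.append((e / n - overall) / se)
    phi = sum(z * z for z in z_scores) / len(z_scores)
    inflate = math.sqrt(max(phi, 1.0))  # never narrow the limits
    naive = [abs(z) > z_limit for z in z_scores]
    adjusted = [abs(z) / inflate > z_limit for z in z_scores]
    return phi, naive, adjusted


# Hypothetical data for 5 units: adverse events and cases seen per year.
events = [10, 22, 15, 60, 8]
cases = [200, 400, 300, 500, 150]
phi, naive, adjusted = control_chart_flags(events, cases)
print(f"Overdispersion factor: {phi:.2f}")
print("Flagged without adjustment:", naive)
print("Flagged after adjustment:  ", adjusted)
```

With these illustrative figures, 1 unit lies outside the unadjusted limits but none does once the limits allow for between-unit variation, which shows why overdispersion matters when deciding whether a unit needs intervention.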

9.4 Formal consensus techniques

Formal consensus techniques are increasingly being used in developing clinical guidelines because of their explicit structure, process and output. A number of well-established formal consensus methods have been used in the health field; the 3 main approaches are the Delphi method, the nominal group technique and the consensus development conference. The Health Technology Assessment report 'Consensus development methods, and their use in clinical guideline development' (Murphy et al. 1998^[3]) provides a useful summary of the strengths and limitations of each technique.

Since the concepts of appropriate and necessary care are fundamental to an efficient and equitable healthcare delivery system, the RAND Appropriateness Method (RAM) is often described as the preferred approach for developing service guidance. One of the advantages of RAM is that the process of developing consensus statements can be presented as a service pathway. In addition, the interactions and discussions during development can be structured to fit the current 'Evidence to recommendations' framework that is used in NICE clinical guidelines.
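As a hedged illustration of how panel ratings are typically summarised in RAM, the sketch below classifies a statement from panellists' ratings on a 1 to 9 scale. The thresholds (median 7 to 9 appropriate, 1 to 3 inappropriate, otherwise uncertain) and the disagreement rule (at least 3 ratings in each extreme third, as usually stated for a 9-member panel) are assumptions based on commonly described practice, not requirements set out in this manual; the function name and ratings are hypothetical.

```python
from statistics import median


def classify_ratings(ratings, min_each_extreme=3):
    """Classify one statement from panel ratings on a 1-9 scale.

    min_each_extreme: assumed number of ratings needed in both the 1-3
        and 7-9 ranges for the panel to be treated as in disagreement.
    """
    if not ratings or any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be non-empty and lie between 1 and 9")
    low = sum(1 for r in ratings if r <= 3)
    high = sum(1 for r in ratings if r >= 7)
    if low >= min_each_extreme and high >= min_each_extreme:
        return "uncertain (disagreement)"
    m = median(ratings)
    if m >= 7:
        return "appropriate"
    if m <= 3:
        return "inappropriate"
    return "uncertain"


# Hypothetical ratings from a 9-member panel for 3 statements.
print(classify_ratings([7, 8, 8, 9, 7, 6, 8, 9, 7]))
print(classify_ratings([1, 2, 2, 3, 8, 8, 9, 9, 5]))
print(classify_ratings([4, 5, 5, 6, 4, 5, 6, 5, 4]))
```

Whatever rule is adopted, it should be fixed before rating begins and reported with the other methods in the guidance document.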

Developers should consult NICE if formal consensus methods are to be used. If formal consensus is used, the methods used should be clearly described in the guidance document.

^[3] Murphy MK, Black NA, Lamping DL et al. (1998) Consensus development methods, and their use in clinical guideline development. Health Technology Assessment 2 (3).