Developing NICE guidelines: the manual – appendices A to O

Appendix A: Service delivery – developing review questions, evidence reviews and synthesis

The scope should identify key areas that the guidance will cover. There are various types of review question that may be considered for service guidance; for example, these may cover:

- the content, configuration or integration of services, including the allocation of:
  - medical equipment or tools
  - staff, such as:
    - skill mix and experience of staff
    - training requirements of staff
    - staffing levels (numbers and staff mix)
- access to services for patients, including:
  - the availability of services
  - the uptake of services
- timing and delivery of services, including:
  - diagnosis
  - treatment
  - transfer and referral
  - waiting times
- location of services, in terms of:
  - setting for delivery
  - economies of scale
  - geographic variation
- feasibility, with regard to:
  - resource constraints (including capacity, queues and waiting lists)
  - policy constraints.

Each question will compare possible service configurations with a current service configuration, with respect to effectiveness and cost effectiveness. The configurations compared may be existing variations on current services (national or international) or a proposed service configuration.

Key outcomes of service delivery questions are likely to include measures of:

- service effectiveness:
  - health outcomes, including health-related quality of life
  - process outcomes (both directly and indirectly linked to outcomes)
  - compliance rates of staff
  - system failures
- service experience:
  - patient experience
  - family or carer experience
  - staff experience
- service resource use:
  - staff
  - equipment
  - time
  - costs
- service efficiency/optimisation:
  - cost effectiveness (cost–utility analysis)
  - cost consequence
  - cost saving
  - cost minimisation
- service equity (including health and geographical inequalities).

A key difference for service guidance compared with other guidelines is that, to adequately address the question, it is necessary to explore the underlying health and/or service concern first, and then assess the effectiveness of the various health service interventions in addressing this underlying issue. This requires an iterative approach to developing the review questions. The first step is to develop questions to explore the underlying problem, followed by developing questions around potential solutions and service models.

These types of review questions will often require the consideration of supplementary methodological approaches for identifying, assessing, synthesising and interpreting the evidence.

Evidence reviews will be iterative, with new searches and/or analysis being planned depending on the outcome of the initial reviews. For example, a search for studies exploring the effectiveness of a particular intervention may not produce any results. The next step would be to consider whether to search for evidence for a similar condition or another healthcare system. Alternatively, primary data may need to be identified or requested to inform recommendations. The guideline committee and NICE staff with responsibility for quality assurance should be consulted on the suitability of different types of evidence for developing recommendations.

Estimates of the relative effectiveness of service delivery interventions

It is helpful to distinguish between two general types of service delivery question. The first type concerns different pathways of care, different service configurations, interventions to be managed by different types of staff, whether a 'care team' approach is needed, and so on. These are questions for which trial evidence could in principle be found, so standard approaches to evidence identification and synthesis (for example, those described in this manual and on the NICE Decision Support Unit website) could be used. However, for service guidance it is unlikely that one type of study or piece of evidence will be sufficient to inform recommendations. Non-standard approaches to evidence synthesis will therefore also need to be considered to enable the guideline committee to develop recommendations. Two specific problems that will often need to be addressed are:

- uncertainty about the quality and relevance of existing evidence on outcomes
- the need to consider evidence on process, intermediate or surrogate outcomes, such as uptake of services or compliance, rather than (or in addition to) evidence on clinical outcomes.

A second type of service delivery issue relates to questions about the feasibility of providing access to services and procedures, or making them available within a certain time frame, rather than whether the services or procedures are effective. In these questions, estimates of the effect of providing the service, compared with not providing it, are needed for decision-making, whether based on cost-effectiveness analysis or on other criteria.

It should be emphasised that some service delivery guidance may present a combination of both access and availability issues as well as standard effectiveness issues.

Guidance on how to approach both kinds of problem, as well as on using consensus techniques when estimates based on published data cannot be obtained, is given in the following sections.

Finding studies that provide unbiased estimates of the effectiveness of service interventions is often difficult, for the following reasons:

- Service delivery interventions are inherently 'variable'. Even with a standard protocol, the precise way in which they are implemented at different sites or by different people necessarily depends on the situation and the individual. This may show up as centre effects in multicentre trials.
- The relative benefit of a new intervention over 'standard' or pre-existing care is likely to depend on the 'intensity' of the current care. For example, the beneficial effect of a new patient reminder system on the uptake of screening for breast cancer depends on what the current arrangements are, and on current uptake. Introducing a reminder system in the USA, where there is no systematic screening programme, will have a quite different effect from adding the reminder system to existing infrastructure in the UK. In other words, results from studies carried out within other healthcare systems might not be easily generalised to the UK.

In these circumstances a standard systematic review is likely to identify a range of studies on interventions that are similar to the interventions being considered, but not necessarily the same, or which are described variably with respect to their components. In this case, the guideline committee will need to consider carefully fidelity and applicability issues, and ensure these are accounted for in the 'committee discussion' section of the guidance.

In most cases, the expert opinion of the guideline committee will be used to explore and estimate how these issues affect confidence in the results of such evidence, but quantitative elicitation methods can also be used. If quantitative elicitation methods are to be used, the NICE Guidelines Technical Support Unit (TSU) should be contacted for advice on methods and on which types of evidence could be searched for.

Evidence on uptake and compliance outcomes

In some service delivery evaluations, measures of service uptake, patient satisfaction or compliance of health service staff are recorded, rather than data on clinical outcomes for patients. This is typically the case, for example, when the intervention is directed at changing staff behaviour or patient referral routes.

Such evidence can be used when analysing the effectiveness or cost effectiveness of a service delivery intervention, but only if an estimate of the underlying effect of the procedure or treatment is also available, from whatever source. It is then possible to combine estimates of the efficacy or effectiveness of the procedure or treatment with estimates of how effectively the service delivery intervention ensures that it is implemented. Evidence can be combined from trials reporting process outcomes alone, trials reporting clinical outcomes alone, and trials reporting both.

The NICE TSU can be consulted for advice on how the two kinds of evidence can be combined within a single modelling framework.
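As a rough illustration of how a process outcome and a treatment effect can be chained together, the Python sketch below multiplies a change in uptake by the absolute benefit of the underlying procedure. The function name and all input values are hypothetical. The sketch assumes that people newly reached by the service benefit as much as those already treated, and it ignores uncertainty in both estimates; a full analysis would propagate that uncertainty within a single model.

```python
def events_avoided_per_1000(uptake_current, uptake_new, risk_untreated, relative_risk):
    """Expected adverse events avoided per 1000 eligible people.

    uptake_current, uptake_new: proportion receiving the procedure before and
        after the service delivery intervention (process outcome).
    risk_untreated: probability of the adverse event without the procedure.
    relative_risk: relative risk of the event with the procedure (clinical effect).
    """
    for value in (uptake_current, uptake_new, risk_untreated):
        if not 0.0 <= value <= 1.0:
            raise ValueError("proportions and risks must lie between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")

    # Absolute risk reduction for one person who receives the procedure.
    absolute_risk_reduction = risk_untreated * (1.0 - relative_risk)
    # Additional share of the eligible population reached by the new service.
    extra_uptake = uptake_new - uptake_current
    return 1000.0 * extra_uptake * absolute_risk_reduction


# Hypothetical inputs, chosen only for illustration.
avoided = events_avoided_per_1000(
    uptake_current=0.60, uptake_new=0.75, risk_untreated=0.10, relative_risk=0.80
)
print(f"{avoided:.1f} events avoided per 1000 eligible people")
```

The same structure shows why uptake evidence alone is not enough: if no estimate of `relative_risk` is available, the change in uptake cannot be translated into a health benefit.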

Estimates of relative effectiveness for questions about access and availability

For questions about access and availability, there is a particular difficulty in deriving an estimate of relative effectiveness, over and above those described in the previous section. This would be the case, for example, where a procedure such as endoscopy for upper gastrointestinal bleeding is indicated. The question is not about whether endoscopy should be done, but whether or not the procedure can be safely delayed (for example, at night or at weekends) in patients whose symptoms suggest they are at lower risk.

Studies based on individual patient 'audit' data, which relate outcomes to treatment parameters while controlling for patient characteristics, are difficult to interpret. This is because patients whose treatment was withheld or delayed are likely to be those who were considered to be at lower risk, so any difference in outcomes may reflect patient selection rather than the effect of the delay.
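The selection problem in audit data can be illustrated with a small deterministic calculation. In the Python sketch below, every number is hypothetical: delay is assumed to raise mortality in both risk groups, but because delay is mostly applied to lower-risk patients, the crude comparison makes delayed treatment look safer. Adjusting for recorded patient characteristics helps only to the extent that they capture the clinician's judgement of risk.

```python
def crude_mortality(strata):
    """Return (crude mortality if delayed, crude mortality if prompt).

    Each stratum is a dict with hypothetical keys:
      n            number of patients in the risk group
      p_delayed    proportion of the group whose procedure was delayed
      risk_prompt  mortality risk with prompt treatment
      risk_delayed mortality risk with delayed treatment
    Expected counts are used, so the result is deterministic.
    """
    delayed_n = delayed_deaths = prompt_n = prompt_deaths = 0.0
    for s in strata:
        n_delayed = s["n"] * s["p_delayed"]
        n_prompt = s["n"] - n_delayed
        delayed_n += n_delayed
        delayed_deaths += n_delayed * s["risk_delayed"]
        prompt_n += n_prompt
        prompt_deaths += n_prompt * s["risk_prompt"]
    return delayed_deaths / delayed_n, prompt_deaths / prompt_n


# Hypothetical audit population: delay is harmful in both groups,
# but is used far more often in the lower-risk group.
strata = [
    {"n": 1000, "p_delayed": 0.8, "risk_prompt": 0.02, "risk_delayed": 0.03},  # lower risk
    {"n": 1000, "p_delayed": 0.1, "risk_prompt": 0.10, "risk_delayed": 0.13},  # higher risk
]

crude_delayed, crude_prompt = crude_mortality(strata)
print(f"crude mortality, delayed: {crude_delayed:.3f}")
print(f"crude mortality, prompt:  {crude_prompt:.3f}")
```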

It is likely that better estimates of the effectiveness of such interventions can be derived from nationally collected data in which between-unit variation in outcomes, or variation between different time periods, can be related to the local policies and practices (for example, staffing levels) in operation at the time. For example, mortality rates within 1 or 2 days of hospital admission could be compared between weekends and weekdays, and hospitals where weekend cover was the same as weekday cover could also be compared with those where it was not. A number of comparisons of this type have been published, for example by Dr Foster. Although these surveys avoid the problems of individual audit data, they are still observational and the use of aggregated data introduces further potential biases. The design of the data collection, and the analysis and interpretation of the data obtained, require major input from clinical epidemiologists, expert clinicians, methodologists, operational research experts and people with relevant operational experience in the NHS.
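A weekend versus weekday comparison of this kind can be summarised as a risk ratio with a confidence interval. The Python sketch below uses the standard large-sample interval on the log scale; the function name and the counts are hypothetical. It is a crude, unadjusted comparison and does not address case mix or the aggregation biases noted above.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a large-sample 95% interval.

    The interval is computed on the log scale using the usual standard error
    sqrt(1/events_a - 1/total_a + 1/events_b - 1/total_b).
    """
    if min(events_a, events_b) <= 0:
        raise ValueError("both groups need at least one event")
    ratio = (events_a / total_a) / (events_b / total_b)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: deaths within 2 days of admission.
rr, lower, upper = risk_ratio(
    events_a=120, total_a=4000,    # weekend admissions
    events_b=250, total_b=10000,   # weekday admissions
)
print(f"risk ratio {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these illustrative counts the interval includes 1, so the apparent weekend excess would not by itself be distinguishable from chance.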

A service delivery issue that is quite often examined in this way is the relationship between performance indicators and 'volume' (that is, number of cases seen per year). Such data are also used to establish 'institutional rankings'. Data of this type tend to show considerable overdispersion: in other words, there is far more variation between units than would be expected by chance. To determine whether individual units are performing at a level that requires some intervention, control charts can be used. There are also methods for interpreting the relationship between performance and volume; these need to take general between-unit variation into account when trying to infer causal effects.
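The effect of overdispersion on control limits can be sketched as follows. The Python example below computes a z-score for each unit against the pooled event rate, estimates an overdispersion factor as the mean squared z-score, and widens the control limits by its square root. This is one simple multiplicative adjustment, not a method prescribed by this manual; the function name, unit labels and counts are hypothetical.

```python
import math


def flag_units(units, z_limit=3.0, adjust_overdispersion=True):
    """Flag units whose event rate lies outside funnel-style control limits.

    units: dict mapping unit name -> (events, cases).
    Returns (phi, flagged), where phi is the estimated overdispersion factor
    (mean squared z-score) and flagged is a sorted list of unit names.
    """
    total_events = sum(events for events, _ in units.values())
    total_cases = sum(cases for _, cases in units.values())
    pooled_rate = total_events / total_cases
    if not 0.0 < pooled_rate < 1.0:
        raise ValueError("pooled rate must be strictly between 0 and 1")

    z_scores = {}
    for name, (events, cases) in units.items():
        # Binomial standard error of the unit's rate under the pooled rate.
        se = math.sqrt(pooled_rate * (1.0 - pooled_rate) / cases)
        z_scores[name] = (events / cases - pooled_rate) / se

    phi = sum(z * z for z in z_scores.values()) / len(z_scores)
    # Only widen the limits; never narrow them when phi is below 1.
    inflation = math.sqrt(max(phi, 1.0)) if adjust_overdispersion else 1.0
    flagged = sorted(n for n, z in z_scores.items() if abs(z) > z_limit * inflation)
    return phi, flagged


# Hypothetical units: (events, cases seen per year).
units = {"A": (10, 200), "B": (12, 200), "C": (8, 200), "D": (30, 200), "E": (10, 200)}

phi, unadjusted = flag_units(units, adjust_overdispersion=False)
_, adjusted = flag_units(units)
print(f"overdispersion factor: {phi:.2f}")
print("flagged without adjustment:", unadjusted)
print("flagged with adjustment:   ", adjusted)
```

In this tiny data set a single extreme unit dominates the overdispersion estimate, so the adjusted limits no longer flag it. Published approaches commonly limit the influence of extreme z-scores (for example, by winsorising) before estimating the factor; that step is omitted here for brevity.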

This page was last updated: 23 October 2025