4 Developing review questions and planning the systematic review

At the start of guideline development, the key clinical issues listed in the scope need to be translated into review questions. In some instances, this may be done as part of the scoping process (see chapter 2). The review questions must be clear, focused and closely define the boundaries of the topic. They are important both as the starting point for the systematic literature review and as a guide for the development of recommendations by the Guideline Development Group (GDG). The development of the review questions should be completed soon after the GDG is convened.

This chapter describes how review questions are developed, formulated and agreed. It describes the different types of review question that may be used, and provides examples. It also provides information on how to plan the systematic review.

4.1 Number of review questions

The exact number of review questions for each clinical guideline depends on the topic and the breadth of the scope (see chapter 2). However, the number of review questions must be manageable for the GDG and the National Collaborating Centre (NCC) or the NICE Internal Clinical Guidelines Programme^[6] within the agreed timescale. For standard clinical guidelines that take 10–18 months to develop (from the time the scope is signed off to submission of the draft guideline), between 15 and 20 review questions is a reasonable number. This number is based on the estimate that, on average, it is feasible for a maximum of two systematic reviews to be presented at any one GDG meeting. However, review questions vary considerably in the number of relevant studies and the complexity of the question and analyses, and the numbers of questions given here are only a guide. For example, a single review question might involve a complex comparison of several treatment options with many individual studies. At the other extreme, a question might address the effects of a single intervention and have few relevant studies.
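The arithmetic behind this estimate can be sketched as follows. This is an illustration only: it assumes that each review question needs exactly one systematic review presentation, which the text above notes is not always the case, and the function name `min_meetings` is hypothetical.

```python
import math


def min_meetings(n_questions: int, reviews_per_meeting: int = 2) -> int:
    """Fewest GDG meetings needed to present one review per question.

    Assumes (for illustration) one systematic review per review question
    and a fixed ceiling on reviews presented at each meeting.
    """
    if n_questions < 0 or reviews_per_meeting <= 0:
        raise ValueError("n_questions must be >= 0 and reviews_per_meeting > 0")
    return math.ceil(n_questions / reviews_per_meeting)


# The range of review questions suggested for a standard clinical guideline.
for n in (15, 20):
    print(f"{n} review questions need at least {min_meetings(n)} meetings")
```

Under these assumptions, 15 to 20 review questions imply at least 8 to 10 GDG meetings given over to presenting evidence.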

4.2 Developing review questions from the scope

Review questions should address all areas covered in the scope, and should not introduce new aspects not specified in the scope. They will contain more detail than, and should be seen as building on, the key clinical issues in the scope.

Review questions are usually drafted by the NCC team. They should then be refined and agreed by all GDG members through discussions at GDG meetings. The different perspectives among GDG members will help to ensure that the right review questions are identified, thus enabling the literature search to be planned efficiently. On occasion the questions may need refining once the evidence has been searched; such changes should be documented.

Review questions then inform the development of protocols used by NCCs to detail how questions will be addressed.

4.2.1 Economic aspects

This chapter relates to the specification of questions for reviewing the clinical evidence. Evidence about economic aspects of the key clinical issues should also be sought from published economic evaluations and by conducting new modelling where appropriate. Methods for identifying and reviewing the economic literature are discussed in chapters 5 and 6; health economics modelling is discussed in chapter 7. When developing review questions, it is important to consider what information is required for any planned economic modelling. This might include, for example, information about quality of life, rates of adverse effects or use of health services.

4.3 Formulating and structuring review questions

A good review question is clear and focused. It should relate to a specific patient problem, because this helps to identify the clinically relevant evidence. The exact structure of the review question will depend on what is being asked, but it is likely to fall into one of three main areas:

intervention
diagnosis
prognosis.

Patient experience is a component of each of these and should inform the development of a structured review question. In addition, review questions that focus on a specific element of patient experience may merit consideration in their own right.

4.3.1 Review questions about interventions

Usually, most review questions for a particular clinical guideline relate to interventions. Each intervention listed in the scope is likely to require at least one review question, and possibly more depending on the populations and outcomes of interest.

A helpful structured approach for developing questions about interventions is the PICO (population, intervention, comparator and outcome) framework (see box 4.1). This divides each question into four components:

population (the population under study)
intervention (what is being done)
comparators (other main treatment options)
outcome (measures of how effective the interventions have been).

Box 4.1 Features of a well-formulated review question on the effectiveness of an intervention using the PICO framework

Population: Which populations of patients are we interested in? How can they be best described? Are there subgroups that need to be considered?

Intervention: Which intervention, treatment or approach should be used?

Comparators: What is/are the main alternative(s) to compare with the intervention being considered?

Outcome: What is really important for the patient? Which outcomes should be considered? Examples include intermediate or short-term outcomes; mortality; morbidity and quality of life; treatment complications; adverse effects; rates of relapse; late morbidity and re-admission; return to work, physical and social functioning; resource use.

For each review question, the GDG should take into account the various confounding factors that may influence the outcomes and effectiveness of an intervention. They should also specify the healthcare setting for the question if necessary. To facilitate this process, outcomes and other key criteria that the GDG considers to be important should be listed. Once the review question has been framed, key words can be identified as potential search terms for the systematic review. Examples of review questions on the effectiveness of interventions are presented in box 4.2.

Box 4.2 Examples of review questions on the effectiveness of interventions

For people with IBS (irritable bowel syndrome), are antimuscarinics or smooth muscle relaxants effective compared with placebo or no treatment for the long-term control of IBS symptoms? Which is the most effective antispasmodic?

(Adapted from: Irritable bowel syndrome in adults: diagnosis and management of irritable bowel syndrome in primary care. NICE clinical guideline 61 [2008].)

Which first-line opioid maintenance treatments are effective and cost-effective in relieving pain in patients with advanced and progressive disease who require strong opioids?

(Adapted from: Opioids in palliative care. NICE clinical guideline 140 [2012].)
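As an illustration of how a framed question yields candidate search terms, the first example in box 4.2 can be broken down into its PICO components. The sketch below is not part of the NICE methods: the class `PicoQuestion`, its field names and the `search_terms` helper are hypothetical, and the split of the question into components is one possible reading of the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PicoQuestion:
    """A review question about an intervention, split into PICO components."""
    population: str
    interventions: List[str]
    comparators: List[str]
    outcomes: List[str]
    setting: Optional[str] = None  # healthcare setting, if it needs specifying

    def search_terms(self) -> List[str]:
        """Return candidate key words in PICO order, without duplicates."""
        candidates = [self.population, *self.interventions,
                      *self.comparators, *self.outcomes]
        seen = set()
        unique = []
        for term in candidates:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                unique.append(term)
        return unique


# One possible breakdown of the irritable bowel syndrome question in box 4.2.
ibs_question = PicoQuestion(
    population="irritable bowel syndrome",
    interventions=["antimuscarinics", "smooth muscle relaxants"],
    comparators=["placebo", "no treatment"],
    outcomes=["long-term control of IBS symptoms"],
)

for term in ibs_question.search_terms():
    print(term)
```

The terms produced this way are only a starting point; the search strategy itself would still be developed by the methods described in later chapters.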

Review questions about drugs will usually only consider drugs with a UK marketing authorisation for some indication. Use of a drug outside its licensed indication (off-label use) may be considered if this use of the drug is common in the UK (see also section 9.3.6.3). Drugs with no UK marketing authorisation for any indication will not usually be considered in a guideline.

A review question relating to an intervention is usually best answered by a randomised controlled trial (RCT), because this is most likely to give an unbiased estimate of the effects of an intervention. Further information on the side effects of a drug may be obtained from other sources. Some advice on finding data on the adverse effects of an intervention is available in the Cochrane handbook for systematic reviews of interventions.

There are, however, circumstances in which an RCT is not necessary to confirm the effectiveness of a treatment (for example, giving insulin to a person in a diabetic coma compared with not giving insulin) because we are sufficiently certain from non-randomised evidence that an important effect exists. This is the case only if all of the following criteria are fulfilled:

An adverse outcome is likely if the person is not treated (evidence from, for example, studies of the natural history of a condition).
The treatment gives a dramatic benefit that is large enough to be unlikely to be a result of bias (evidence from, for example, historically controlled studies).
The side effects of the treatment are acceptable (evidence from, for example, case series).
There is no alternative treatment.
There is a convincing pathophysiological basis for treatment.

4.3.2 Review questions about diagnosis

Review questions about diagnosis are concerned with the performance of a diagnostic test or test strategy. A diagnostic test is a means of determining whether a patient has a particular condition (disease, stage of disease or subtype of disease). Diagnostic tests can include physical examination, history taking, laboratory or pathological examination and imaging tests.

Broadly, review questions that can be asked about a diagnostic test are of three types:

questions about the diagnostic accuracy of a test or a number of tests individually against a comparator (the reference standard)
questions about the diagnostic accuracy of a test strategy (such as serial testing) against a comparator (the reference standard)
questions about the clinical value of using the test.

Questions about a diagnostic test consider the ability of the test to predict the presence or absence of disease. In studies of the accuracy of a diagnostic test, the results of the test under study (the index test[s]) are compared with those of the best available test (the reference standard) in a sample of patients. When deciding on the question, it is important to be clear about the exact proposed use of the test; for example, as an initial 'triage' test or after other tests.

The PICO framework described in the previous section is useful when formulating review questions about diagnostic test accuracy (see box 4.3). The healthcare setting of the test should be specified. The intervention is the test under investigation (the index test[s]), the comparison is the reference standard, and the outcome is a measure of how accurately the index test identifies the presence or absence of the particular disease or disease stage (for example, sensitivity or specificity). The target condition that the test is intended to identify should be specified in the review question.

Box 4.3 Features of a well-formulated review question on diagnostic test accuracy using the PICO framework

Population: To which populations of patients would the test be applicable? How can they be best described? Are there subgroups that need to be considered?

Intervention (index test[s]): The test or test strategy being evaluated.

Comparator: The test with which the index test(s) is/are being compared, usually the reference standard (the test that is considered to be the best available method to establish the presence or absence of the condition of interest – this may not be the one that is routinely used in practice).

Target condition: The disease, disease stage or subtype of disease that the index test(s) and the reference standard are being used to establish.

Outcome: The diagnostic accuracy of the test or test strategy for detecting the target condition. This is usually reported as test parameters, such as sensitivity, specificity, predictive values, likelihood ratios, or – where multiple cut-off values are used – a receiver operating characteristic (ROC) curve.
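The test parameters listed under 'Outcome' are all derived from a 2×2 table that cross-classifies index test results against the reference standard. A minimal sketch of the standard definitions is given below; the function name `diagnostic_accuracy` and the counts in the example are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Summarise a 2x2 table of index test results against the reference standard.

    tp: index test positive, reference standard positive
    fp: index test positive, reference standard negative
    fn: index test negative, reference standard positive
    tn: index test negative, reference standard negative
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    with_condition, without_condition = tp + fn, fp + tn
    test_positive, test_negative = tp + fp, fn + tn
    if 0 in (with_condition, without_condition, test_positive, test_negative):
        raise ValueError("each row and column of the table needs at least one patient")

    sensitivity = tp / with_condition
    specificity = tn / without_condition
    return {
        "prevalence": with_condition / (with_condition + without_condition),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictive_value": tp / test_positive,
        "negative_predictive_value": tn / test_negative,
        # Likelihood ratios are unbounded when the denominator is zero.
        "positive_likelihood_ratio":
            sensitivity / (fp / without_condition) if fp else math.inf,
        "negative_likelihood_ratio":
            (fn / with_condition) / specificity if tn else math.inf,
    }


# Hypothetical counts: 100 patients with the target condition, 200 without.
results = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity and the likelihood ratios do not depend on how common the target condition is in the sample, whereas the predictive values do. This is one reason why the population and healthcare setting need to be specified in the review question.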

Examples of review questions on the accuracy of a diagnostic test are given in box 4.4. A review question relating to diagnostic test accuracy is usually best answered by a cross-sectional study in which both the index test(s) and the reference standard are performed on the same sample of patients. Case–control studies are also used to assess diagnostic test accuracy, but this type of study design is more prone to bias (and often results in inflated estimates of diagnostic test accuracy). Further advice on conducting reviews of diagnostic test accuracy can be found in the Cochrane handbook for diagnostic test accuracy reviews.

Box 4.4 Examples of review questions on diagnostic test accuracy

Review question:

In children and young people under 16 years of age with a petechial rash, can non-specific laboratory tests (C-reactive protein, white blood cell count, blood gases) help to confirm or refute the diagnosis of meningococcal disease?

Formulation of question:

Population: All children and young people from birth up to their 16th birthday who have or are suspected of having bacterial meningitis or meningococcal septicaemia.

Index test(s): Non-specific laboratory tests (C-reactive protein, white blood cell count, blood gases).

Reference standard: Microscopy, lumbar puncture or clinical follow-up.

Outcomes: Event rates; prevalence; sensitivity; specificity; positive predictive value; negative predictive value.

(Adapted from: Bacterial meningitis and meningococcal septicaemia: management of bacterial meningitis and meningococcal septicaemia in children and young people younger than 16 years in primary and secondary care. NICE clinical guideline 102 [2010].)

Although the assessment of test accuracy is an important component of establishing the usefulness of a diagnostic test, the clinical value of a test lies in its usefulness in guiding treatment decisions, and ultimately in improving patient outcomes. 'Test and treat' studies compare outcomes of patients who undergo a new diagnostic test (in combination with a management strategy) with those of patients who receive the usual diagnostic test and management strategy. These types of study are not very common. If there is a trade-off between costs, benefits and harms of the tests, a decision-analytic model may be useful (see Lord et al. 2006).

Review questions aimed at establishing the clinical value of a diagnostic test in practice can be structured in the same way as questions about interventions. The best study design is an RCT. Review questions about the safety of a diagnostic test should also be structured in the same way as questions about interventions.

4.3.3 Review questions about prognosis

Prognosis describes the likelihood of a particular outcome, such as the progression of a disease, or the survival time for a patient after the diagnosis of a disease or with a particular set of risk markers. A prognosis is based on the characteristics of the patient ('prognostic factors'). These prognostic factors may be disease-specific (such as the presence or absence of a particular disease feature) or demographic (such as age or sex), and may also include the likely response to treatment and the presence of comorbidities. A prognostic factor does not need to be the cause of the outcome, but should be associated with (in other words, predictive of) that outcome.

Prognostic information can be used within clinical guidelines to:

provide information to patients about their prognosis
classify patients into risk categories (for example, cardiovascular risk) so that different interventions can be applied
define subgroups of populations that may respond differently to interventions
identify factors that can be used to adjust for case mix (for example, in explorations of heterogeneity)
help determine longer-term outcomes not captured within the timeframe of a clinical trial (for example, for use in an economic model).

Review questions about prognosis address the likelihood of an outcome for patients from a population at risk for that outcome, based on the presence of a proposed prognostic factor.

Review questions about prognosis may be closely related to questions about aetiology (cause of a disease) if the outcome is viewed as the development of the disease itself based on a number of risk factors. They may also be closely related to questions about interventions if one of the prognostic factors is treatment. However, questions about interventions are usually better addressed by controlling for prognostic factors.

Examples of review questions relating to prognosis are given in box 4.5.

Box 4.5 Examples of review questions on prognosis

Are there factors related to the individual (characteristics either of the individual or of the act of self-harm) that predict outcome (including suicide, non-fatal repetition, other psychosocial outcomes)?

(From: Self-harm: the short-term physical and psychological management and secondary prevention of self-harm in primary and secondary care. NICE clinical guideline 16 [2004].)

For women in the antenatal and postnatal periods, what factors predict the development or recurrence of particular mental disorders?

(From: Antenatal and postnatal mental health: clinical management and service guidance. NICE clinical guideline 45 [2007].)

For people who are opioid dependent, are there particular groups that are more likely to benefit from detoxification?

(From: Drug misuse: opioid detoxification. NICE clinical guideline 52 [2007].)

A review question relating to prognosis is best answered using a prospective cohort study. A cohort of people who have not experienced the outcome in the review question (but for whom the outcome is possible) is followed to monitor the number of outcome events occurring over time. The cohort will contain people who possess or have been exposed to the prognostic factor, and people who do not possess or have not been exposed to it. The cohort may be taken from one arm (usually the control arm) of an RCT, although this often results in a highly selected, unrepresentative group. Case–control studies are not suitable for answering questions about prognosis, because they give only an odds ratio for the occurrence of the event for people with and without the prognostic factor – they give no estimate of the baseline risk.
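The point about baseline risk can be illustrated with a small worked example. The sketch below uses invented counts and hypothetical variable names. It assumes a case–control study that includes every person who had the event and samples an equal number of people who did not. The odds ratio is preserved, but the proportion of events among sampled people with the prognostic factor no longer estimates their risk.

```python
def odds_ratio(events_with: float, no_events_with: float,
               events_without: float, no_events_without: float) -> float:
    """Odds ratio for the event, comparing people with and without the factor."""
    return (events_with * no_events_without) / (no_events_with * events_without)


# Hypothetical prospective cohort: 1000 people with the prognostic factor
# and 1000 without, all followed up for the outcome.
events_with, total_with = 100, 1000
events_without, total_without = 50, 1000

risk_with = events_with / total_with            # absolute (baseline) risk: 0.10
risk_without = events_without / total_without   # absolute (baseline) risk: 0.05
cohort_or = odds_ratio(events_with, total_with - events_with,
                       events_without, total_without - events_without)

# Hypothetical case-control study drawn from the same population: all 150
# cases are included, plus 150 controls sampled from the 1850 people without
# the event (expected counts, so the values are fractional).
n_cases = events_with + events_without
n_non_events = (total_with - events_with) + (total_without - events_without)
sampling_fraction = n_cases / n_non_events
controls_with = (total_with - events_with) * sampling_fraction
controls_without = (total_without - events_without) * sampling_fraction

case_control_or = odds_ratio(events_with, controls_with,
                             events_without, controls_without)
apparent_risk_with = events_with / (events_with + controls_with)

print(f"cohort risk with factor:        {risk_with:.3f}")
print(f"cohort odds ratio:              {cohort_or:.3f}")
print(f"case-control odds ratio:        {case_control_or:.3f}")
print(f"case-control 'risk' with factor: {apparent_risk_with:.3f}")
```

In this example the two odds ratios agree, but the apparent 'risk' from the case–control sample reflects the ratio of cases to controls chosen by the investigators rather than the risk in the population.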

4.3.4 Using patient experience to inform review questions

The PICO framework should take into account the patient experience. Patient experience, which may vary for different patient populations ('P'), covers a range of dimensions, including:

patient views on the effectiveness and acceptability of given interventions ('I')
patient preferences for different treatment options, including the option of forgoing treatment ('C')
patient views on what constitutes a desired, appropriate or acceptable outcome ('O').

The integration of relevant patient experiences into each review question therefore helps to make the question patient-centred as well as clinically appropriate. For example, a review question that looks at the effectiveness of aggressive chemotherapy for a terminal cancer is more patient-centred if it integrates patient views on whether it is preferable to prolong life or to have a shorter life but of better quality.

It is also possible for review questions to ask about specific elements of the patient experience in their own right, although the PICO framework may not provide a helpful structure if these do not involve an intervention designed to treat a particular condition. Such review questions should be clear and focused, and should address relevant aspects of the patient experience at specific points in the care pathway that are considered to be important by the patient and carer members and others on the GDG. Such questions can address a range of issues, such as:

patient information and support needs
elements of care that are of particular importance to patients
the specific needs of groups of patients who may be disadvantaged compared with others
which outcomes reported in intervention studies are most important to patients.

As with the development of all structured review questions, questions that are broad in scope and lack focus (for example, 'What is the patient experience of living with condition X?') should be avoided. Examples of review questions relating to patient experience are given in box 4.6.

Box 4.6 Examples of review questions on patient experience

What information and support should be offered to children with atopic eczema and their families/carers?

(From: Atopic eczema in children: management of atopic eczema in children from birth up to the age of 12 years. NICE clinical guideline 57 [2007].)

What elements of care on the general ward are viewed as important by patients following their discharge from critical care areas?

(From: Acutely ill patients in hospital: recognition of and response to acute illness in adults in hospital. NICE clinical guideline 50 [2007].)

Are there cultural differences that need to be considered in delivering information and support on breast or bottle-feeding?

(From: Postnatal care: routine postnatal care of women and their babies. NICE clinical guideline 37 [2006].)

A review question relating to patient experience is likely to be best answered using qualitative studies and cross-sectional surveys, although information on patient experience is also becoming increasingly available as part of wider intervention studies.

4.3.5 Review questions about service delivery

Clinical guidelines may cover issues of service delivery. Examples of review questions relating to service delivery are given in box 4.7.

Box 4.7 Examples of review questions on service delivery

In patients with hip fractures what is the clinical and cost effectiveness of early surgery (within 24, 36 or 48 hours) on the incidence of complications such as mortality, pneumonia, pressure sores, cognitive dysfunction and increased length of hospital stay?

In patients with hip fracture what is the clinical and cost effectiveness of hospital-based multidisciplinary rehabilitation on functional status, length of stay in secondary care, mortality, place of residence/discharge, hospital readmission and quality of life?

What is the clinical and cost effectiveness of surgeon seniority (consultant or equivalent) in reducing the incidence of mortality, the number of patients requiring reoperation, and poor outcome in terms of mobility, length of stay, wound infection and dislocation?

(From: Hip fracture: the management of hip fracture in adults. NICE clinical guideline 124 [2011].)

The most appropriate study design to answer review questions about service delivery is an RCT. However, a wide variety of methodological approaches and study designs have been used.

4.4 Planning the systematic review

For each systematic review, the systematic reviewer (with input from other technical staff at the NCC) should prepare a review protocol that outlines the background, the objectives and the planned methods. This protocol will explain how the review is to be carried out and will help the reviewer to plan and think through the different stages, as well as providing some protection against the introduction of bias. In addition, the review protocol should make it possible for the review to be repeated by others at a later date. A protocol should also make it clear how equality issues have been considered in planning the review work, if appropriate.

4.4.1 Structure of the review protocol

The protocol should be short (no longer than one page) and should describe any differences from the methods described in this guidelines manual (chapters 5–7), rather than duplicating the methodology stated here. It should include the components outlined in table 4.1.

Table 4.1 Components of the review protocol

Component	Description
Review question	The review question as agreed by the GDG.
Objectives	Short description; for example 'To estimate the effectiveness and cost effectiveness of…' or 'To estimate the diagnostic accuracy of…'.
Criteria for considering studies for the review	Defined using the PICO framework, including the study designs selected.
How the information will be searched	The sources to be searched and any limits that will be applied to the search strategies; for example, publication date, study design, language. (Searches should not necessarily be restricted to RCTs.)
The review strategy	The methods that will be used to review the evidence, outlining exceptions and subgroups. Indicate if meta-analysis will be used and how it will be conducted.

The review protocol is an important opportunity to look at issues relating to equalities that were identified in the scope, and to plan how these should be addressed. For example, if it is anticipated that the effects of an intervention might vary with patient age, the review protocol should outline the plan for addressing this in the review strategy.

4.4.2 Process for developing the review protocol

The review protocol should be produced after the review question has been agreed by the GDG and before starting the review (that is, usually between two GDG meetings). The protocol should be approved by the GDG at the next meeting.

All review protocols should be included as appendices in the draft of the full guideline that is prepared for consultation (see also chapter 10). Any changes made to a protocol in the course of the work should be described. Review protocols will also be published on the NICE website 5–7 weeks before consultation on the guideline starts.

4.5 Further reading

Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. Centre for Reviews and Dissemination, University of York

Cochrane Diagnostic Test Accuracy Working Group (2008) Cochrane handbook for diagnostic test accuracy reviews, version 1.0.1 (updated March 2009). The Cochrane Collaboration

Higgins JPT, Green S, editors (2008) Cochrane handbook for systematic reviews of interventions, version 5.1.0 (updated March 2011). The Cochrane Collaboration

Lord SJ, Irwig L, Simes RJ (2006) When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials? Annals of Internal Medicine 144: 850–5

National Institute for Health and Clinical Excellence (2011) Diagnostics assessment programme manual. London: National Institute for Health and Clinical Excellence

Richardson WS, Wilson MS, Nishikawa J et al. (1995) The well-built clinical question: a key to evidence-based decisions. American College of Physicians Journal Club 123: A12–3

^[6] Information throughout this manual relating to the role of the National Collaborating Centres in guideline development also applies to the Internal Clinical Guidelines Programme at NICE.