Process and methods

4 Developing review questions and planning the evidence review

At the start of guideline development, the key issues and draft questions listed in the scope should be translated into review questions and review protocols.

Review questions and review protocols define the scope of the review and therefore must be clear and focused. They provide the framework for the design of the literature searches, inform the planning, methods and process of the evidence review, and act as a guide for the development of recommendations by the committee.

This chapter describes how review questions are developed and agreed. It describes the different types of review question and provides examples. It also provides information on the different types of evidence and how to plan the evidence review. The best approach may vary depending on the topic. Options should be considered by the developer, and the chosen approach discussed and agreed with NICE staff with responsibility for quality assurance. The approach should be documented in the review protocol (see the appendix on review protocol template) and the guideline, together with the reasons for the choice.

4.1 Number of review questions

The number of review questions for each guideline depends on the topic and the breadth of the scope. However, it is important that the total number of review questions:

  • provides sufficient focus for the guideline, and covers all key areas outlined in the scope

  • can be completed in the time and with the resources available.

Review questions can vary considerably in both the number of included studies and the complexity of the question and analyses. For example, a single review question might involve a complex comparison of several interventions, in a number of population subgroups, across multiple outcomes, drawing on many studies that use different study designs (including systematic reviews). At the other extreme, a review question might investigate the effects of a single intervention compared with a single comparator, with few or no primary studies meeting the inclusion criteria. The number of review questions for each guideline, and the time and resources needed for them, will therefore vary with the topic and its complexity.

4.2 Developing review questions from the scope

The review questions should cover all key areas specified in the scope but should not introduce new areas. They should build on the draft questions in the scope and usually contain more detail.

Review questions are usually drafted by the developer and then refined and agreed with the committee members. This enables the literature search to be planned efficiently. Sometimes the questions need refining before the review protocols are developed, or very occasionally after the evidence has been searched. Any such changes to review questions (with reasons) should be agreed with a member of NICE staff with a quality assurance role. All changes should be clearly documented in the review protocol and evidence review so that the process is auditable.

4.3 Formulating and structuring different review questions

Review questions should be clear and focused. The exact structure of each question depends on what is being asked. The aims of questions will differ, but are likely to cover at least one of the following:

  • extent and nature of the issue as described in the scope

  • associations between factors or variables and the outcome of interest, the epidemiology or aetiology of a disease or condition

  • interventions that work best in ideal circumstances and might work in specific circumstances or settings (the extent to which something works, how and why)

  • technologies or tests that work best to diagnose certain diseases or conditions

  • a relevant programme theory, theory of change, or mechanisms of action likely to explain behaviour or effects

  • views and experiences of people using services or affected by the recommendation (for example, the acceptability and accessibility of interventions, or people's values and preferences)

  • practitioners', providers' or commissioners' views, experiences and working practices (including any factors hindering the implementation of the intervention and factors supporting implementation)

  • potential for an intervention to do harm or have unintended consequences.

Conceptual frameworks or logic models can be useful when developing review questions.

When developing review questions, it is important to consider what information is needed for any planned economic modelling. This might include information about quality of life, rates of (and inequalities in) adverse effects, and use of health and social care services. The nature and type of review question determine the type of evidence that is most suitable (Petticrew and Roberts 2003). Examples of different types of review question, and the type of evidence that might best address them, are given throughout this chapter. Developers should consider whether particular review questions might be addressed through analysis of primary data, based on an understanding of the evidence base (and its possible limitations) and the different sources available (see the section on stages of scope development in the chapter on the scope).

Review questions about the effectiveness of an intervention

A helpful structured approach for developing questions about interventions is the PICO (population, intervention, comparator and outcome) framework (see box 4.1). The setting for the question should also be specified if relevant.

However, other frameworks exist (such as SPICE; setting, perspective, intervention, comparison, evaluation) and can be used as appropriate.

Box 4.1 Formulating a review question on the effectiveness of an intervention using the PICO framework

Population: Which population are we interested in? How best can it be described? Are there subgroups that need to be considered?

Intervention: Which intervention, treatment or approach should be examined?

Comparators: Are there alternatives to the intervention being examined? If so, what are these (for example, other interventions, standard active comparators, usual care or placebo)?

Outcome: Which outcomes should be considered to assess how well the intervention is working (including outcomes on both benefits and harms)? What is really important for people using services? Core outcome sets should be used if suitable based on quality and validity; one source is the COMET database. The Core Outcome Set Standards for Development (COS STAD) and Core Outcome Set Standards for Reporting (COS STAR) should be used to assess the suitability of identified core outcome sets.
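The PICO elements in box 4.1 behave like a small structured record, which some developers find useful when drafting and tracking questions. The sketch below assumes a hypothetical `PicoQuestion` class (not a NICE tool) that stores the elements and renders them as one plain-English question; the optional setting reflects the advice to specify it only if relevant.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PicoQuestion:
    """Hypothetical record of the PICO elements of an intervention review question."""

    population: str
    intervention: str
    comparators: list[str]
    outcomes: list[str]  # should cover both benefits and harms
    setting: str | None = None  # specify only if relevant to the question
    subgroups: list[str] = field(default_factory=list)

    def as_question(self) -> str:
        """Render the elements as one plain-English review question."""
        comparison = " or ".join(self.comparators)
        where = f" in {self.setting}" if self.setting else ""
        outcomes = ", ".join(self.outcomes)
        return (
            f"For {self.population}{where}, is {self.intervention} effective "
            f"compared with {comparison} for {outcomes}?"
        )


if __name__ == "__main__":
    ibs = PicoQuestion(
        population="people with IBS",
        intervention="an antispasmodic",
        comparators=["placebo", "no treatment"],
        outcomes=["long-term control of IBS symptoms"],
    )
    print(ibs.as_question())
```

Rendering the IBS example from box 4.2 this way gives "For people with IBS, is an antispasmodic effective compared with placebo or no treatment for long-term control of IBS symptoms?". The wording template is illustrative only; real questions are agreed with the committee.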

For each review question, factors that may affect the outcomes and effectiveness of an intervention, including any wider social factors that may affect health and any health inequalities, should be considered. Outcomes (on both benefits and harms) and other factors that are important should be specified in the review protocol. In general, a range of 5 to 9 outcomes should be defined. Guidance on prioritising outcomes is provided by the GRADE working group.
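GRADE suggests rating the importance of each candidate outcome on a 9-point scale: 7 to 9 as critical, 4 to 6 as important but not critical, and 1 to 3 as of limited importance. The sketch below applies that banding to committee ratings. Using the median of members' scores is an assumed aggregation rule for illustration, and the outcome names and scores are hypothetical.

```python
from __future__ import annotations

from statistics import median


def grade_importance(score: float) -> str:
    """Map a rating on the GRADE 1 to 9 scale to its importance category."""
    if not 1 <= score <= 9:
        raise ValueError("ratings must be between 1 and 9")
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important but not critical"
    return "of limited importance"


def prioritise_outcomes(ratings: dict[str, list[int]]) -> dict[str, str]:
    """Classify each outcome by the median of the committee's ratings (assumed rule)."""
    return {outcome: grade_importance(median(scores)) for outcome, scores in ratings.items()}


def outcome_count_in_range(ratings: dict[str, list[int]], low: int = 5, high: int = 9) -> bool:
    """Check the number of outcomes against the 5 to 9 range suggested in the text."""
    return low <= len(ratings) <= high


if __name__ == "__main__":
    # Hypothetical ratings from three committee members
    ratings = {
        "mortality": [9, 8, 9],
        "quality of life": [6, 7, 8],
        "adverse events": [5, 6, 7],
        "length of stay": [3, 4, 5],
        "travel time": [1, 2, 3],
    }
    print(prioritise_outcomes(ratings), outcome_count_in_range(ratings))
```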

Box 4.2 Examples of review questions on the effectiveness of interventions
  • What types of mass-media intervention help prevent children and young people from taking up smoking? Are the interventions delaying rather than preventing the onset of smoking?

  • Which of the harm-reduction services offered by needle and syringe programmes (including advice and information on safer injecting, onsite vaccination services, and testing for hepatitis B and C and HIV) are effective in reducing blood-borne viruses and other infections among people who inject drugs?

  • What types of intervention and programme are effective in increasing physical activity levels among children under 8 (particularly those who are not active enough to meet the national recommendations for their age), or help to improve their core physical skills?

  • Does brief advice from GPs increase adult patients' physical activity levels?

  • What are the most effective school-based interventions for changing young people's attitudes to alcohol use?

  • For people with IBS (irritable bowel syndrome), are antimuscarinics or smooth muscle relaxants effective compared with placebo or no treatment for the long-term control of IBS symptoms? Which is the most effective antispasmodic?

  • Which first-line opioid maintenance treatments are effective and cost effective in relieving pain in patients with advanced and progressive disease who require strong opioids?

  • What are the most effective methods of care planning, focusing on improving outcomes for people with dementia and their carers?

  • What is the effectiveness and cost effectiveness of intermediate care and reablement for people living with dementia?

Review questions about pharmacological management will usually only include medicines with a UK marketing authorisation for some indication, based on regulatory assessment of safety and efficacy. Use of a medicine outside its licensed indication (off‑label use) may be considered in some circumstances; for example, if this use is common practice in the UK, if there is good evidence for this use, and there is no other medicine licensed for the indication (see also the section on recommendations on medicines, including off-label use of licensed medicines). Medicines with no UK marketing authorisation for any indication will not usually be considered in a guideline because there is no UK assessment of safety and efficacy to support their use.

A review question about the effectiveness of an intervention is usually best answered by a randomised controlled trial (RCT), because a well-conducted RCT is most likely to give an unbiased estimate of effects. However, RCT evidence on the effectiveness of an intervention may not always be available. In addition, for many health and social care interventions it can be difficult or unethical to assign populations to control and intervention groups (for example, for interventions which aim to change policy). In such cases, other study designs (such as non-randomised studies) might be appropriate for assessing association, possible cause and effect, or longer-term outcome data. There are also some reviews that may include analysis of large, high-quality primary data sources (such as patient registries). The Medical Research Council (MRC) has produced guidance on evaluating complex interventions and using natural experiments to evaluate health interventions delivered at population level. Advice on finding data on the adverse effects of an intervention is also available in the Cochrane handbook for systematic reviews of interventions and the SuRe Info (Summarized Research in Information Retrieval for HTA) resource.

There are also circumstances in which non-RCT evidence could be used to establish the benefits and harms of an intervention. For example, the impact or effectiveness of a nationally implemented policy (such as the ban on smoking in enclosed public places and workplaces) may be more appropriately addressed by a natural experiment or interrupted time series study. Non-RCT evidence may also be useful to establish the benefits and harms of interventions if:

  • there are long-term adverse outcomes (cohort or longitudinal studies could be used)

  • the intervention gives a large benefit or shows a clear dose–response gradient that is unlikely to be a result of bias (a very large cohort study could demonstrate this)

  • there is a convincing mechanism of action (such as a pathophysiological basis) for the intervention.
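For the interrupted time series designs mentioned above, one commonly used analysis (not one this manual prescribes) is segmented regression. A minimal sketch of the model, assuming a single intervention start time T_0 and ignoring autocorrelation and seasonality, which a real analysis would need to handle:

```latex
Y_t = \beta_0 + \beta_1 T_t + \beta_2 X_t + \beta_3 (T_t - T_0)\, X_t + \varepsilon_t,
\qquad
X_t = \begin{cases} 1 & T_t \ge T_0 \\ 0 & T_t < T_0 \end{cases}
```

Here Y_t is the outcome at time t (for example, a monthly admission rate), beta_1 is the underlying trend before the policy, beta_2 is the immediate change in level when the policy starts, and beta_3 is the change in slope afterwards.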

When review questions are about the effectiveness of interventions, additional types of evidence may be needed to answer different aspects of the question. For example, additional evidence might address the views of people using services or the communities where services are based, or barriers to use as reported by practitioners or providers. In this case, 2 related review questions (a quantitative intervention review question and a qualitative review question) may be used to address the issues. Sometimes a single review may be suitable if it uses different sources of evidence or types of data (for example, a review may map current practice, or combine or integrate quantitative information with qualitative data, that is, a mixed methods review). A review on effectiveness may also include evidence of the intervention's mechanism of action, that is, evidence of how the intervention works. Any of these questions about the effectiveness of interventions may be addressed either by 2 related review questions or by a mixed methods review.

Box 4.3 Examples of mixed methods reviews

For further guidance and examples of mixed methods reviews, see chapter 8 of the JBI Manual for Evidence Synthesis.

What is the clinical effectiveness of larvae therapy for wound healing, and what do patients perceive the acceptability of larvae therapy to be?

What is the clinical effectiveness of self-management in adolescents with asthma, and what factors are perceived by them as important to maintain adherence to their self-management plan?

What is the effectiveness of integrated working among registered social workers and other practitioners to support adults with complex needs, and based on their views and experiences what are the barriers to integrated working?

Review questions that consider implementation

Review questions on effectiveness may also consider implementation, for example, 'What systems and processes should be in place to increase shared decision-making?'

Review questions that consider cost effectiveness

For more information on review questions that consider cost effectiveness, see the chapter on incorporating economic evaluation.

Review questions about the accuracy of diagnostic tests

Review questions about diagnosis are concerned with the performance of a diagnostic test or test strategy. Diagnostic tests can include identification tools, physical examination, signs and symptoms, history‑taking, laboratory or pathological examination and imaging tests.

Broadly, review questions that can be asked about a diagnostic test are of 2 types:

  • questions about the diagnostic accuracy (or diagnostic yield) of a test or a number of tests individually against a comparator (the reference standard)

  • questions about the diagnostic accuracy (or diagnostic yield) of a test strategy (such as serial testing) against a comparator (the reference standard).
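To see why a test strategy is reviewed as a question in its own right, it helps to look at how combining tests changes accuracy. The sketch below uses the standard formulas that hold only if the 2 tests are conditionally independent given true disease status; real tests are often correlated, which is one reason direct evidence on the strategy is preferable. The accuracy values in the usage note are hypothetical.

```python
from __future__ import annotations


def serial_test_accuracy(sens1: float, spec1: float, sens2: float, spec2: float) -> tuple[float, float]:
    """Serial strategy: test 2 is done only after a positive test 1, and a person is
    classed positive only if both tests are positive.
    Assumes conditional independence of the tests given disease status."""
    sensitivity = sens1 * sens2  # a true case must be detected by both tests
    specificity = 1 - (1 - spec1) * (1 - spec2)  # a non-case is misclassified only if both tests are false positives
    return sensitivity, specificity


def parallel_test_accuracy(sens1: float, spec1: float, sens2: float, spec2: float) -> tuple[float, float]:
    """Parallel strategy: both tests are done, and a person is classed positive if either is positive.
    Same conditional independence assumption."""
    sensitivity = 1 - (1 - sens1) * (1 - sens2)  # a case is missed only if both tests miss it
    specificity = spec1 * spec2  # a non-case must test negative on both
    return sensitivity, specificity
```

With hypothetical tests of sensitivity 0.9 and specificity 0.8, and sensitivity 0.8 and specificity 0.9, the serial strategy gives sensitivity 0.72 and specificity 0.98, while the parallel strategy gives the reverse trade-off (0.98 and 0.72).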

In studies of the accuracy of a diagnostic test, the results of the test under study (the index test) are compared with those of the best available test (the reference standard). It is important to be clear when deciding on the question what the exact proposed use of the test is (for example, as an identification tool, an initial 'triage' test or after other tests).

The PICTO (population, index test, comparator, target condition and outcome) framework can be useful when formulating review questions about diagnostic test accuracy (see box 4.4). However, other frameworks (such as PPIRT; population, prior tests, index test, reference standard, target condition) can be used if helpful.

Box 4.4 Features of a well-formulated review question on diagnostic test accuracy using the PICTO framework

Population: To which populations would the test be applicable? How can they be best described? Are there subgroups that need to be considered?

Index test: The test or test strategy being evaluated for accuracy.

Comparator/reference standard: The test with which the index test is being compared, usually the reference standard (the test that is considered to be the best available method for identifying the presence or absence of the condition of interest – this may not be the one that is routinely used in practice).

Target condition: The disease, disease stage or subtype of disease that the index test(s) and the reference standard are being used to identify.

Outcome: The diagnostic accuracy of the test or test strategy for detecting the target condition. This is usually reported as test parameters, such as sensitivity, specificity, predictive values, likelihood ratios, or – when multiple thresholds are used – a receiver operating characteristic (ROC) curve. This should also include issues of importance to people having the test, such as acceptability.
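The test parameters listed under "Outcome" in box 4.4 are all derived from the 2×2 table that cross-classifies index test results against the reference standard. A minimal sketch using the standard definitions follows; the class name and the counts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""

    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def positive_predictive_value(self) -> float:
        # depends on the prevalence of the target condition in the study sample
        return self.tp / (self.tp + self.fp)

    @property
    def negative_predictive_value(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def positive_likelihood_ratio(self) -> float:
        # raises ZeroDivisionError when specificity is exactly 1
        return self.sensitivity / (1 - self.specificity)

    @property
    def negative_likelihood_ratio(self) -> float:
        return (1 - self.sensitivity) / self.specificity
```

For hypothetical counts of 90 true positives, 30 false positives, 10 false negatives and 170 true negatives, this gives sensitivity 0.90, specificity 0.85, positive predictive value 0.75 and a positive likelihood ratio of 6. Unlike sensitivity and specificity, predictive values shift with prevalence, so they transfer poorly between settings.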

A review question about diagnostic test accuracy is usually best answered by a cross-sectional study in which both the index test and the reference standard are performed on the same sample of people. Cohort and case–control studies are also used to assess the accuracy of diagnostic tests, but these types of study design are more prone to bias (and often result in inflated estimates of diagnostic test accuracy). Further advice on the types of study to include in reviews of diagnostic test accuracy can be found in the Cochrane handbook for diagnostic test accuracy reviews.

Box 4.5 Examples of review questions on diagnostic test accuracy

What is the accuracy of imaging (MRI, CT scan, PET scan, X ray, ultrasonography) for diagnosing osteomyelitis compared with invasive bone biopsy?

What is the accuracy of D dimer assay for diagnosing deep vein thrombosis compared with compression ultrasonography?

In people suspected of having coronary artery disease, can multi-slice spiral CT of coronary arteries be used as replacement for conventional invasive coronary angiography?

In patients suspected of cow's milk allergy, should skin prick tests rather than an oral food challenge with cow's milk be used for diagnosis and management?

In adults receiving care in non-specialist settings, should serum or plasma cystatin C rather than serum creatinine concentration be used for diagnosing and managing renal impairment?

Although assessing test accuracy is important for establishing the usefulness of a diagnostic test, the value of a test lies in how useful it is in guiding treatment decisions or the provision of services, and ultimately in improving outcomes. 'Test and treat' studies, for example, compare outcomes for people who have a new diagnostic test (in combination with a management strategy) with outcomes of people who have the usual diagnostic test and management strategy. These types of study are not very common. If there is a trade‑off between costs, benefits and harms of the tests, a decision-analytic model may be useful (see Lord et al. 2006).
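A decision-analytic model of this kind links test accuracy to downstream costs and outcomes. The sketch below is a single-level decision tree under strong simplifying assumptions: everyone who tests positive is treated, each of the 4 test-result branches has a fixed expected QALY value, and all numbers in the usage note are hypothetical. "Treat everyone" is modelled as a perfect-sensitivity, zero-specificity "test" that costs nothing.

```python
from __future__ import annotations


def expected_cost_and_qalys(
    prevalence: float,
    sensitivity: float,
    specificity: float,
    qalys: dict[str, float],  # expected QALYs for branches "tp", "fn", "fp", "tn"
    test_cost: float,
    treatment_cost: float,
) -> tuple[float, float]:
    """Expected cost and QALYs per person for 'test, then treat everyone who tests positive'."""
    p = {
        "tp": prevalence * sensitivity,  # has condition, treated
        "fn": prevalence * (1 - sensitivity),  # has condition, untreated
        "fp": (1 - prevalence) * (1 - specificity),  # no condition, treated unnecessarily
        "tn": (1 - prevalence) * specificity,  # no condition, untreated
    }
    expected_qalys = sum(p[branch] * qalys[branch] for branch in p)
    expected_cost = test_cost + (p["tp"] + p["fp"]) * treatment_cost
    return expected_cost, expected_qalys


def net_monetary_benefit(cost: float, qalys: float, threshold: float) -> float:
    """Value of the QALYs at a cost-per-QALY threshold, minus the cost."""
    return threshold * qalys - cost


if __name__ == "__main__":
    branch_qalys = {"tp": 0.9, "fn": 0.6, "fp": 0.95, "tn": 1.0}  # hypothetical
    test = expected_cost_and_qalys(0.1, 0.9, 0.8, branch_qalys, 50, 1000)
    treat_all = expected_cost_and_qalys(0.1, 1.0, 0.0, branch_qalys, 0, 1000)
    print(test, treat_all)
```

With these invented inputs, testing costs about 320 per person for 0.978 QALYs, against 1,000 for 0.945 QALYs when everyone is treated, so testing would be preferred at any positive threshold. A real model would also capture test harms, time horizons and uncertainty.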

Review questions aimed at establishing the value of a diagnostic test in practice can be structured in a similar way to questions about interventions. The best study design is a test-and-treat RCT. Review questions about the safety of a diagnostic test should be structured in the same way as questions about the safety of interventions.

Review questions about prognosis

Prognosis describes the likelihood of a particular outcome, such as disease progression, the development of higher levels of need, or length of survival after diagnosis or for a person with a particular set of risk markers. A prognosis is based on the characteristics of the person or user of services ('prognostic factors'). These prognostic factors may be disease specific (such as the presence or absence of a particular disease feature) or demographic (such as age or sex). They may also include the likely response to treatment or care and the presence of comorbidities. A prognostic factor does not need to be the cause of the outcome, but should be associated with (in other words, predictive of) that outcome.

Information about prognosis can be used within guidelines to:

  • classify people into risk categories (for example, cardiovascular risk or level of need) so that different interventions or preventative strategies can be applied

  • define subgroups of populations that may respond differently to interventions

  • identify factors that can be used to adjust for case mix (for example, in investigations of heterogeneity)

  • help determine longer-term outcomes not captured within the timeframe of a trial (for example, for use in an economic model).

Review questions about prognosis address the likelihood of an outcome for a person or user of services from a population at risk for that outcome, based on the presence of a proposed prognostic factor.

Review questions about prognosis may be closely related to questions about aetiology (cause of a disease or need) if the outcome is viewed as the development of the disease or need based on a number of risk factors.

Box 4.6 Examples of review questions on prognosis

Are there factors related to the individual (characteristics either of the individual or of the act of self-harm) that may predict outcomes (including suicide, non-fatal repetition, other psychosocial outcomes) from self-harm?

Which people having neoadjuvant chemotherapy or chemoradiotherapy for rectal cancer do not need surgery?

A review question about prognosis is best answered using a prospective cohort study with multivariable analysis. Case–control studies and cross-sectional studies are not usually suitable for answering questions about prognosis because they do not estimate baseline risk; they give only an estimate of the likelihood of the outcome for people with and without the prognostic factor.
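The point about baseline risk can be shown with a small unadjusted example (it ignores the multivariable adjustment a real prognostic study would use, and the counts are hypothetical). A cohort follows everyone, so absolute risks can be read directly. A case–control study samples people with the outcome and only a fraction of those without it; the odds ratio survives this sampling, but the apparent "risks" do not.

```python
from __future__ import annotations


def cohort_risks(a: float, b: float, c: float, d: float) -> tuple[float, float, float]:
    """a, b: prognostic factor present, with / without the outcome.
    c, d: prognostic factor absent, with / without the outcome.
    Returns (risk with factor, baseline risk without factor, risk ratio)."""
    risk_with = a / (a + b)
    baseline_risk = c / (c + d)
    return risk_with, baseline_risk, risk_with / baseline_risk


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    return (a * d) / (b * c)


def case_control_sample(a: float, b: float, c: float, d: float, control_fraction: float) -> tuple[float, float, float, float]:
    """Mimic a case-control design: keep every person with the outcome,
    but only a fraction of the people without it."""
    return a, b * control_fraction, c, d * control_fraction


if __name__ == "__main__":
    cohort = (30, 170, 20, 780)  # hypothetical cohort of 1,000 people
    print(cohort_risks(*cohort))  # baseline risk 0.025, risk ratio 6
    sampled = case_control_sample(*cohort, control_fraction=0.1)
    print(odds_ratio(*cohort), odds_ratio(*sampled))  # identical odds ratios
    print(cohort_risks(*sampled))  # 'baseline risk' now about 0.20, which is meaningless
```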

Review questions about predicting an individual prognosis or identifying an individual diagnosis

In some circumstances, advanced statistical analyses can be used to develop prediction models for a specific diagnosis or prognosis. Multivariable prediction models are developed to help healthcare professionals estimate the probability or risk that a specific disease or condition is present (diagnostic prediction models) or that a specific event will occur in the future (prognostic prediction models), and so inform decision-making. They are usually developed using multivariable modelling methods: a multivariable model is a mathematical equation that relates multiple predictors for a particular person to the probability of, or risk for, the presence (diagnosis) or future occurrence (prognosis) of a particular outcome. Other names for a prediction model include risk prediction model, predictive model, prognostic (or prediction) index or rule, and risk score.
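As an illustration of such an equation, the sketch below applies a logistic model: the predictors are combined into a linear predictor, which is converted to a probability. The intercept, coefficients and predictor names are invented for illustration and do not come from any published model; the risk bands (10% and 20%) are also hypothetical, and show how predicted risk can be used to classify people into risk categories.

```python
from __future__ import annotations

import math

# Hypothetical model for illustration only: the intercept and coefficients are invented
# and do not come from any published prediction model.
INTERCEPT = -5.0
COEFFICIENTS = {"age_per_10_years": 0.5, "current_smoker": 0.7, "diabetes": 0.6}


def predicted_risk(predictors: dict[str, float]) -> float:
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum of coefficient * predictor value)))."""
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in predictors.items()
    )
    return 1 / (1 + math.exp(-linear_predictor))


def risk_category(risk: float, low: float = 0.10, high: float = 0.20) -> str:
    """Assign a risk band using hypothetical thresholds."""
    if risk < low:
        return "low"
    if risk < high:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    person = {"age_per_10_years": 6, "current_smoker": 1, "diabetes": 0}  # a 60-year-old smoker
    risk = predicted_risk(person)
    print(f"{risk:.3f}", risk_category(risk))
```

For the hypothetical 60-year-old smoker without diabetes, the linear predictor is -1.3 and the predicted risk is about 0.21, which falls in the "high" band.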

Diagnostic prediction models can be used to inform who should be referred for further testing, whether treatment should be started directly, or to reassure patients that a serious cause for their symptoms is unlikely. Prognostic prediction models can be used for planning lifestyle or treatment decisions based on the risk for developing a particular outcome or state of health in a given period.

Prediction model studies can be broadly categorised into those that develop models, those that validate models (with or without updating the model) and those that do both. Studies that report model development aim to derive a prediction model by selecting the relevant predictors and combining them statistically into a multivariable model. Logistic regression is most frequently used for short-term outcomes (for example, disease absent versus present, or 30‑day mortality), and Cox regression for long-term, time-to-event outcomes (for example, 10‑year risk). Studies may also focus on quantifying how much value a specific predictor (for example, a new predictor) adds to the model.

Quantifying the predictive ability of a model using the same data from which the model was developed (often referred to as apparent performance) tends to overestimate performance. Studies reporting the development of new prediction models should always include some form of validation to quantify any optimism in the predictive performance (for example, calibration and discrimination). There are 2 types of validation: internal validation and external validation. Internal validation uses only the original study sample with methods such as bootstrapping or cross-validation. External validation evaluates the performance of the model with data not used for model development. The data may be collected by the same investigators or other independent investigators, typically using the same predictor and outcome definitions and measurements, but sampled from a later period (temporal or narrow validation). If validation indicates poor performance, the model can be updated or adjusted on the basis of the validation data set. For more information on validating prediction models, see Steyerberg et al. 2001, 2003, 2009; Moons et al. 2012; Altman et al. 2009; and Justice et al. 1999.
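Discrimination and calibration can both be computed from a model's predicted risks and the observed outcomes. Computed on the development data they give apparent performance; computed on data not used for development they give a validation estimate. The sketch below shows the C-statistic (for a binary outcome, equivalent to the area under the ROC curve) and 2 simple summaries of calibration; a full assessment would also examine the calibration slope or a calibration plot. The data in the usage note are hypothetical.

```python
from __future__ import annotations


def c_statistic(predicted: list[float], observed: list[int]) -> float:
    """Proportion of (event, non-event) pairs in which the person with the event
    has the higher predicted risk; tied predictions count as half."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(predicted: list[float], observed: list[int]) -> float:
    """Observed event rate minus mean predicted risk; 0 means agreement on average."""
    return sum(observed) / len(observed) - sum(predicted) / len(predicted)


def observed_expected_ratio(predicted: list[float], observed: list[int]) -> float:
    """Observed events divided by expected (summed predicted) events; 1 is ideal."""
    return sum(observed) / sum(predicted)


if __name__ == "__main__":
    predicted = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
    observed = [0, 0, 1, 0, 1, 1]
    print(c_statistic(predicted, observed))  # 8 of 9 pairs concordant
    print(calibration_in_the_large(predicted, observed), observed_expected_ratio(predicted, observed))
```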

Well-known prediction models include QCancer, GerdQ, Ottawa Ankle Rules, and the Alvarado Score for diagnosis; and for prognosis, QRISK2, QFracture, FRAX, EuroScore, Nottingham Prognostic Index, the Framingham Risk Score and the Simplified Acute Physiology Score.

For more information, see the TRIPOD statement in the Annals of Internal Medicine and the TRIPOD statement: explanation and elaboration in the Annals of Internal Medicine.

Although assessing predictive accuracy is important for establishing the usefulness of a prediction model, the value of a prediction model lies in how useful it is in guiding treatment or management decisions, or the provision of services, and ultimately in improving outcomes. Review questions aimed at establishing the value of a prediction model in practice can be structured in the same way as questions about interventions. For example, such a question might compare outcomes for people identified using a prediction model (in combination with a management strategy) with outcomes for people identified opportunistically (in combination with a management strategy).

Box 4.7 Examples of review questions on prediction models

Diagnostic prediction models

Which scoring tools for signs and symptoms (including Centor and FeverPAIN) are most accurate in predicting sore throat caused by group A beta-haemolytic streptococcus (GABHS) infection in primary care?

What are the accuracy, clinical utility and cost effectiveness of clinical prediction models/tools (clinical history, cardiovascular risk factors, physical examination) in evaluating stable chest pain of suspected cardiac origin?

Prognostic prediction models

What risk assessment or prediction tool best identifies people with multiple conditions who are at risk of unplanned hospital admission?

What risk tool best identifies people with type 2 diabetes who are at risk of reduced life expectancy?

Which risk assessment tools are the most accurate in predicting the risk of fragility fracture in adults with osteoporosis or previous fragility fracture?

What factors and baseline characteristics are accurate in predicting positive treatment outcomes in people with pancreatic cancer?

Review questions about views and experiences of people using or providing services, family members or carers and the public

In some circumstances, specific questions should be formulated about the views and experience of people using services, family members or carers and the public. The views and experiences of those providing services may also be relevant. These can cover a range of dimensions, including:

  • views and experiences of people using or providing services, family members or carers or the public on the effectiveness and acceptability of interventions

  • preferences of people using services, family members or carers or the public for different treatment or service options, including the option of foregoing treatment or care

  • views and experiences of people using or providing services, family members or carers or the public on what constitutes a desired, appropriate or acceptable outcome.

These questions should address experiences of an intervention or approach that are considered important by people using or providing services, family members or carers or the public. They can cover a range of issues, including:

  • elements of care or a service that are of particular importance to people using or providing services

  • factors that encourage or discourage people from using interventions or services

  • the specific needs of certain groups of people using services, including those sharing the characteristics protected by the Equality Act 2010

  • information and support needs specific to the topic

  • which outcomes reported in studies of interventions are most important to people using services, family members or carers or the public

  • health and care inequalities.

As for other types of review question, questions that are broad and lack focus (for example, 'What is the experience of living with condition X?') should be avoided.

NICE guidelines should not reiterate or re‑phrase recommendations from the NICE guidelines on patient experience in adult NHS services, service user experience in adult mental health, people's experience in adult social care services, babies, children and young people's experience of healthcare, shared decision making, or other NICE guidelines on the experience of people using services. However, whether there are specific aspects of views or experiences that need addressing for a topic should be considered during the scoping of every guideline. Specific aspects identified during scoping should be included in the scope if they are not covered by existing guidelines and are supported as a priority area. These are likely to be topic-specific and should be well-defined and focused. The PICo (Population, Interest, Context) framework and the SPIDER framework in McMaster University's National Collaborating Centre for Methods and Tools registry of tools are examples of frameworks that can be used to structure review questions on the views or experiences of people using or providing services, family members or carers or the public.

Box 4.8 Examples of review questions on the views or experiences of people using or providing services, family members or carers or the public

What elements of care on the general ward are viewed as important by patients following their discharge from critical care areas?

How does culture affect the need for and content of information and support for bottle or breastfeeding?

What are the perceived risks and benefits of immunisation among parents, carers or young people? Is there a difference in perceived benefits and risks between groups whose children are partially immunised and those who have not been immunised?

What are the views and experiences of health, social care and other practitioners about the practicality and implementation of home-based intermediate care?

A review question about the views or experiences of people using or providing services, family members or carers or the public could be answered using qualitative studies or cross-sectional surveys (or both), although information on views and experiences is also becoming increasingly available as part of some intervention studies.

When there is a lack of evidence on issues important to people affected by the guideline (including families and carers, where appropriate), the developer should consider seeking information via a call for evidence (see the section on calls for evidence from stakeholders in the chapter on identifying the evidence: literature searching and evidence submission), or approaching experts who may have access to additional data sources, such as surveys of user views and experiences, to present as expert testimony. For more information, see:

Review questions about service delivery

Guidelines often cover areas of service delivery. These might include how the delivery of services could be improved, what the core components of services are, and how those components could be reconfigured.

Box 4.9 Examples of review questions on service delivery

In people with hip fracture what is the clinical and cost effectiveness of hospital-based multidisciplinary rehabilitation on the following outcomes: functional status, length of stay in secondary care, mortality, place of residence or discharge, hospital readmission and quality of life?

What is the clinical and cost effectiveness of surgeon seniority (consultant or equivalent) in reducing the incidence of mortality, the number of people requiring reoperation, and poor outcome in terms of mobility, length of stay, wound infection and dislocation?

What are the best service models to support the identification of people who may be entering their last year of life?

What types of needle and syringe programmes (including their location and opening times) are effective and cost effective?

What regional or city level commissioning models, service models, systems and service structures are effective in:

  • reducing diagnostic delay for TB

  • improving TB contact tracing

  • improving TB treatment completion?

A review question about the effectiveness of service delivery models is usually best answered by an RCT. However, a wide variety of methodological approaches and study designs have been used in this area, including observational evidence (such as routine healthcare and audit data), experimental evidence and qualitative evidence. Other types of questions on service delivery are also likely to be answered using evidence from study types other than RCTs. For example, to determine whether an intervention will work for a particular subgroup or setting, the committee may need to know how the intervention works, which requires evidence on the relevant underlying mechanisms.

Depending on the type of review question, the PICO framework may be appropriate, but other frameworks can also be used.

When a topic includes review questions on service delivery, approaches described in the chapter on incorporating economic evaluation and the appendix on service delivery – developing review questions, evidence reviews and synthesis may be used. Such methods should be agreed with NICE staff with responsibility for quality assurance and should be clearly documented in the guideline.

Review questions about epidemiology

Epidemiological reviews describe the problem under investigation and can be used to inform other review questions. For example, an epidemiological review of the incidence or prevalence of a condition would provide baseline data for further evidence synthesis; an epidemiological review of accidents would provide information on the most common accidents, morbidity and mortality statistics, and data on inequalities in the impact of accidents.

Box 4.10 Examples of review questions that might benefit from an epidemiological review

What are the patterns of physical activity among children from different populations and of different ages in England?

Which populations of children are least physically active and at which developmental stage are all children least physically active?

What is the incidence of Lyme disease in the UK?

The structure of the question and the type of evidence will depend on the aim of the review.

Another use of epidemiological reviews is to describe relationships between epidemiological factors and outcomes – a review on associations. If an epidemiological review has been carried out, information will have been gathered from observational studies on the nature of the problem. However, further analysis of this information – in the form of a review on associations – may be needed to establish the epidemiological factors associated with any positive or negative behaviours or outcomes.

Box 4.11 Examples of review questions that might benefit from a review on associations

What factors are associated with children's or young people's physical activity and how strong are the associations?

What physiological and aetiological factors are associated with coeliac disease?

What physical, environmental and sociological factors are associated with the higher prevalence of multiple sclerosis in European countries?

4.4 Evidence used to inform recommendations

In order to formulate recommendations, the guideline committee needs to consider a range of evidence about what works, why it works, and what might work (and how) in specific circumstances. The committee needs evidence from multiple sources, extracted for different purposes and by different methods.

Scientific evidence

Scientific evidence should be explicit, transparent and replicable. It can be context-free or context-sensitive. Context-free scientific evidence assumes that evidence can be independent of the observer and context. It can be derived from evidence reviews or meta-analyses of quantitative studies, individual studies or theoretical models. Context-sensitive scientific evidence looks at what works and how well in real‑life situations. It includes information on attitudes, implementation, organisational capacity, forecasting, economics and ethics. It is mainly derived using social science and behavioural research methods, including quantitative and qualitative research studies, surveys, theories, cost-effectiveness analyses and mapping reviews. Sometimes it is derived using the same techniques as context-free scientific evidence. Context-sensitive evidence can be used to complement context-free evidence, and so can provide the basis for more specific and practical recommendations. It can be used to:

  • supplement evidence on effectiveness (for example, to look at how factors such as occupation, educational attainment and income influence effectiveness)

  • inform the development and refinement of conceptual frameworks or logic models (see the section on stages of scope development in the chapter on the scope) and causal pathways (for example, to explain what factors predict teenage parenthood)

  • provide information about the characteristics of the population (including social circumstances and the physical environment) and about the process of implementation

  • describe psychological processes and behaviour change.

Quantitative and qualitative information can also be used to supplement conceptual frameworks or logic models (see the section on stages of scope development in the chapter on the scope). They can also be combined in a single review (mixed methods) when appropriate (for example, to address review questions about factors that help or hinder implementation or to assess why an intervention does or does not work).

Existing systematic reviews

Often reviews of quantitative or qualitative studies (secondary evidence) already exist (for example, those developed by internationally recognised producers of systematic reviews such as Cochrane, the Campbell Collaboration and the Joanna Briggs Institute, among others). Existing reviews may include systematic reviews (with or without a meta-analysis or individual patient data analysis) and non-systematic literature reviews and meta-analyses. Well-conducted systematic reviews may be of particular value as sources of evidence (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles for checklists to assess risk of bias or quality of studies when developing guidelines). Some reviews may be more useful as background information or as additional sources of potentially relevant primary studies. This is because they may:

  • not use inclusion and exclusion criteria relevant to the guideline topic's scope and parameters (for example, comparable research questions, relevant outcomes, settings, population groups or time periods)

  • group together different outcome or study types

  • include data that are difficult or impossible to separate appropriately

  • not provide enough data to develop recommendations (for example, some reviews do not provide sufficient detail on specific interventions, making it necessary to refer to the primary studies).

Conversely, some high-quality systematic reviews may provide enhanced data not available in the primary studies. For example, authors of the review may have contacted the authors of the primary studies or other related bodies in order to include additional relevant data in their review, or may have undertaken additional analyses (such as individual patient data analyses). In addition, if high-quality reviews are in progress (protocol published) at the time of development of the guideline, the developer may choose to contact the authors for permission to access pre‑publication data for inclusion in the guideline (see the appendix on call for evidence and expert witnesses).

Systematic reviews can also be useful when developing the scope and when defining review questions, outcomes and outcome measures for the guideline evidence reviews. The discussion section of a systematic review can also help to identify some of the limitations or difficulties associated with a topic, for example, through a critical appraisal of the limitations of the evidence base. The information specialists may also wish to consider the search strategies of high-quality systematic reviews. These can provide useful search approaches for capturing different key concepts. They can also provide potentially useful search terms and combinations of terms, which have been carefully tailored for a range of databases.

High‑quality systematic reviews that are directly applicable to the guideline review question can be used as a source of data, particularly for complex organisational, behavioural and population level questions.

When considering using results from an existing high-quality review, due account should be taken of the following:

  • Whether the parameters (for example, research question, PICO, inclusion and exclusion criteria) of the review are sufficiently similar to the review protocol of the guideline review question. If they are, a search should be undertaken for primary studies published after the search date covered by the existing review.

  • Whether, if the evidence base for the guideline topic is very large, using existing high-quality reviews will be sufficient to address the guideline review question.

Colloquial evidence

'Colloquial evidence' has been described as informal evidence that can complement scientific evidence or provide context to other forms of evidence in guidance development. It can come from expert testimony (see the section on other attendees at committee meetings in the chapter on decision-making committees, and the appendix on call for evidence and expert witnesses), from members of the committee, from a reference group of people using services (see the section on forming the committee in the chapter on decision-making committees) or from comments from registered stakeholders (see the section on what happens during consultation in the chapter on the validation process for draft guidelines, and dealing with stakeholder comments). Colloquial evidence includes evidence about values (including political judgement), practical considerations (such as resources, professional experience or expertise and habits or traditions, the experience of people using services) and the interests of specific groups (views of lobbyists and pressure groups).

An example of colloquial evidence is expert testimony. Oral or written evidence from outside the committee is sometimes needed to develop recommendations, for example if limited primary research is available or more information on current practice is needed to inform the committee's decision-making. Inclusion criteria for oral or written evidence specify the population and interventions for each review question, so that the evidence submitted to the committee can be filtered and selected.

Other evidence

Depending on the nature of the guideline topic and the review question, other sources of relevant evidence such as reports, audits, service evaluations or other real-world evidence may be included. This should be agreed with NICE staff with responsibility for quality assurance before proceeding. The quality, reliability and applicability of the evidence are assessed according to standard processes (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles).

See also the chapter on linking to other guidance (which also covers using evidence from non-NICE guidance).

4.5 Planning the evidence review

For each guideline evidence review, a review protocol is prepared that outlines the background, the objectives and the planned methods. In addition, the review protocol should make it possible for the review to be repeated by others at a later date. A protocol should also make it clear how equality issues have been considered in planning the review work, if appropriate.

Structure of the review protocol

The protocol should describe any differences from the methods described in this manual (chapters on identifying the evidence: literature searching and evidence submission, reviewing research evidence, and incorporating economic evaluation), rather than duplicating the methodology stated here. It should include the components outlined in the appendix on the review protocol template.

When a guideline is updating a published guideline, the protocol from the published guideline, if available, should be used to outline how the review question will be addressed. Information gathered during surveillance and scoping of the guideline should also be added. This might include new interventions and comparators, and extensions to the population.

Process for developing the review protocol

The review protocol should be drafted by the developer, with input from the guideline committee, after the review question has been agreed and before starting the evidence review. It should then be reviewed and approved by NICE staff with responsibility for quality assurance.

Developers should consider registering all review protocols on the PROSPERO database before data extraction begins, if possible and appropriate. The review protocol and a version of the economic plan (see the section on prioritising questions for further economic analysis in the chapter on incorporating economic evaluation) are published on the NICE website at least 6 weeks before the draft guideline goes out for consultation. Any changes made to a protocol in the course of guideline development should be agreed with NICE staff with responsibility for quality assurance and should be described and updated on the PROSPERO database (if registered).

4.6 References and further reading

Allmark P, Baxter S, Goyder E et al. (2013) Assessing the health benefits of advice services: using research evidence and logic model methods to explore complex pathways. Health and Social Care in the Community 21: 59–68

Altman DG, Vergouwe Y, Royston P et al. (2009) Prognosis and prognostic research: validating a prognostic model. BMJ 338: b605

Cargo M, Harris J, Pantoja T et al. (2017) Cochrane Qualitative and Implementation Methods Group Guidance Series Paper 3: Methods for assessing evidence on intervention implementation. Journal of Clinical Epidemiology doi: 10.1016/j.jclinepi.2017.11.028

Cargo M, Harris J, Pantoja T et al. (2018) Cochrane Qualitative and Implementation Methods Group Guidance Series Paper 4: Methods for integrating qualitative and implementation evidence within intervention effectiveness reviews. Journal of Clinical Epidemiology 97: 59–69

Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. Centre for Reviews and Dissemination, University of York

Cochrane Diagnostic Test Accuracy Working Group (2008) Cochrane handbook for diagnostic test accuracy reviews. (Updated December 2013). The Cochrane Collaboration

Collins GS, Reitsma JB, Altman DG et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Annals of Internal Medicine 162: 55–63

Craig P, Dieppe P, Macintyre S et al. on behalf of the MRC (2008) Developing and evaluating complex interventions: new guidance. London: Medical Research Council

Craig P, Cooper C, Gunnell D et al. on behalf of the MRC (2011) Using natural experiments to evaluate population health interventions: guidance for producers and users of evidence. London: Medical Research Council

Flemming K, Booth A, Hannes K et al. (2018) Cochrane Qualitative and Implementation Methods Group Guidance Series Paper 6: Reporting guidelines for qualitative, implementation and process evaluation evidence syntheses. Journal of Clinical Epidemiology 97: 79–85

Harden A, Garcia J, Oliver S et al. (2004) Applying systematic review methods to studies of people's views: an example from public health research. Journal of Epidemiology and Community Health 58: 794–800

Harris JL, Booth A, Cargo M et al. (2018) Cochrane Qualitative and Implementation Methods Group Guidance Series Paper 2: Methods for question formulation, searching and protocol development for qualitative evidence synthesis. Journal of Clinical Epidemiology 97: 39–48

Higgins JPT, Thomas J, Chandler J et al., editors (2021) Cochrane handbook for systematic reviews of interventions, version 6.2. The Cochrane Collaboration

Justice AC, Covinsky KE, Berlin JA (1999) Assessing the generalizability of prognostic information. Annals of Internal Medicine 130: 515–24

Kelly MP, Swann C, Morgan A et al. (2002) Methodological problems in constructing the evidence base in public health. London: Health Development Agency

Kelly MP, Moore TA (2012) The judgement process in evidence-based medicine and health technology assessment. Social Theory and Health 10: 1–19

Kirkham JJ, Gorst S, Altman DG et al. (2016) Core Outcome Set-STAndards for Reporting: The COS-STAR Statement. PLoS Medicine

Kirkham JJ, Davis K, Altman DG et al. (2017) Core Outcome Set-STAndards for Development: The COS-STAD Recommendations. PLoS Medicine

Lomas J, Culyer T, McCutcheon C et al. (2005) Conceptualizing and combining evidence for health system guidance: final report. Ottawa: Canadian Health Services Research Foundation

Lord SJ, Irwig L, Simes RJ (2006) When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials? Annals of Internal Medicine 144: 850–5

Moons KG, Kengne AP, Grobbee DE et al. (2012) Risk prediction models: II. External validation, model updating, and impact assessment. Heart 98: 691–8

Moons KGM, Altman DG, Reitsma JB et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Annals of Internal Medicine 162: W1–W73

Moore GF, Audrey S, Barker M et al. (2015) Process evaluation of complex interventions: Medical Research Council guidance. BMJ 350: h1258

Muir Gray JA (1996) Evidence-based healthcare. London: Churchill Livingstone

Noyes J, Booth A, Cargo M et al. (2018) Cochrane Qualitative and Implementation Methods Group Guidance Series Paper 1: Introduction. Journal of Clinical Epidemiology 97: 35–38

Ogilvie D, Hamilton V, Egan M et al. (2005) Systematic reviews of health effects of social interventions: 1. Finding the evidence: how far should you go? Journal of Epidemiology and Community Health 59: 804–8

Ogilvie D, Egan M, Hamilton V et al. (2005) Systematic reviews of health effects of social interventions: 2. Best available evidence: how low should you go? Journal of Epidemiology and Community Health 59: 886–92

Petticrew M (2003) Why certain systematic reviews reach uncertain conclusions. British Medical Journal 326: 756–8

Petticrew M, Roberts H (2003) Evidence, hierarchies, and typologies: horses for courses. Journal of Epidemiology and Community Health 57: 527–9

Popay J, Rogers A, Williams G (1998) Rationale and standards for the systematic review of qualitative literature in health services research. Qualitative Health Research 8: 341–51

Richardson WS, Wilson MS, Nishikawa J et al. (1995) The well-built clinical question: a key to evidence-based decisions. American College of Physicians Journal Club 123: A12–3

Rychetnik L, Frommer M, Hawe P et al. (2002) Criteria for evaluating evidence on public health interventions. Journal of Epidemiology and Community Health 56: 119

Steyerberg EW (2009) Clinical prediction models: a practical approach to development, validation, and updating. Springer

Steyerberg EW, Harrell FE, Borsboom GJJM et al. (2001) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54: 774–81

Steyerberg EW, Bleeker SE, Moll HA et al. (2003) Internal and external validation of predictive models: a simulation study of bias and precision in small samples. Journal of Clinical Epidemiology 56: 441–7

Summarized research for Information Retrieval in HTA (SuRe Info) [online; accessed 13 September 2017]

Tannahill A (2008) Beyond evidence – to ethics: a decision making framework for health promotion, public health and health improvement. Health Promotion International 23: 380–90

Victora C, Habicht J, Bryce J (2004) Evidence-based public health: moving beyond randomized trials. American Journal of Public Health 94: 400–5