Process and methods
4 Developing review questions and planning the evidence review
Review questions and review protocols must be clear and focused and build on the draft questions listed in the scope. They provide the framework for the design of the literature searches, inform the planning, methods and process of the evidence review, and act as a guide for the development of recommendations by the committee. Review protocols may also be used to inform surveillance of guidelines, and future updates (see the chapters on ensuring that published guidelines are current and accurate and updating guideline recommendations).
This chapter describes how review questions and review protocols are developed and agreed. It describes the different types of review question and provides examples. It also provides information on the different types of evidence and how to plan the evidence review. The best approach may vary depending on the topic. Options should be considered by the development team, and the chosen approach discussed and agreed with staff with responsibility for quality assurance. The approach should be documented in the review protocol (see the appendix on review protocol templates) and the guideline, together with any reasons for the choice, if the approach is non-standard.
The number of review questions for each guideline depends on the topic and the breadth of the scope. However, it is important that the total number of review questions:
provides sufficient focus for the guideline, and covers all key areas outlined in the scope
focuses on key questions that are likely to generate useful or needed recommendations
can be completed in the time and with the resources available.
Review questions can vary considerably in terms of both the number of included studies and the complexity of the question and analyses. For example, a single review question might involve a complex comparison of several interventions, many studies using different study designs (including systematic reviews), multiple population subgroups, or an intervention that is itself complex (see the section on review questions about complex interventions). At the other extreme, a review question might investigate the effects of a single intervention compared with a single comparator, and there may be few primary studies, or no studies meeting the inclusion criteria. The number of review questions for each guideline, and how much time and resources are needed for them, will therefore vary depending on the topic and its complexity, and the nature of the evidence.
The review questions should cover all key areas specified in the scope but should not introduce new areas. They should build on the draft questions in the scope and usually contain more detail. For details on developing the scope, see the chapter on the scope.
Review questions are usually drafted by the development team. They are then refined and agreed with the committee members. This enables the literature search to be planned efficiently. Sometimes the draft questions from the scope need refining before the development of review protocols, or very occasionally after the evidence has been searched. Any such changes to review questions (with reasons) should be agreed with a member of staff with a quality assurance role. All changes should be clearly recorded in the review protocol and evidence review document, so they are auditable.
NICE guidelines should not reiterate or rephrase recommendations from the NICE guidelines on patient experience in adult NHS services, service user experience in adult mental health, people's experience in adult social care services, babies, children and young people's experience of healthcare, shared decision making, or other NICE guidelines on the experience of people using services. However, during scoping of every guideline it should be considered whether there are specific aspects of views or experiences that need addressing for the topic. Specific aspects identified during scoping should be included in the scope if they are not covered by existing guidelines and are supported as a priority area. These are likely to be topic-specific and should be well defined and focused.
Review questions should be clear and focused. The exact structure of each question depends on what is being asked. The aims of questions will differ, but are likely to cover at least one of the following:
effectiveness of an intervention or interventions
accuracy or effectiveness of diagnostic tests or test strategies
prognosis of outcomes over time, based on the characteristics of the person using services
predicting an individual prognosis or identifying an individual diagnosis
views and experiences of people using services, family members or carers, or of those commissioning and providing services
epidemiology or aetiology of a disease or condition
equality and health inequalities issues.
The nature and type of review questions determines the type of evidence that is most suitable (Petticrew and Roberts 2003). There are examples of different types of review questions and the type of evidence that might best address them throughout this chapter. When developing review questions, it is important to consider if any additional information is needed for any planned economic modelling. This might include information about quality of life, rates of adverse events in particular populations, and use of health and social care services.
When review questions are being developed, it can sometimes be useful to use a conceptual framework or a logic model (or to reuse ones that were developed for the scope). These display the pathways through which actions and interventions are expected to lead to differences in outcomes, and can help identify which interventions are most likely to be effective when targeted at particular points in a pathway. Conceptual frameworks and logic models can be produced by the development team or taken from previously published literature in the topic area.
Conceptual frameworks or logic models are helpful in many topics and can be particularly useful when:
the evidence reviews for the guideline will be considering complex interventions
the context around an area is complex (for example, interactions with policy, multiple bodies involved in delivering an intervention, or where the commissioning or delivery approaches are unclear or complex).
An understanding of the expected mechanisms by which outcomes occur can help guide which interventions are most worthwhile to study in particular settings or questions.
Conceptual frameworks and logic models are used to aid the committee in developing review questions and protocols that are most likely to produce useful results. However, they are not intended to constrain the recommendations the committee may make in the guideline (for example, the committee is not limited to making recommendations that fit within the causal pathways in the framework or model). Examples of conceptual frameworks and logic models are given in figure 1 and figure 2.
A helpful structured approach for developing questions about interventions is the PICO (population, intervention, comparator and outcome) framework (see box 4.1). The setting for the question should also be specified if relevant.
Population: Which population are we interested in? How best can it be described? Are there subgroups that need to be considered?
Intervention: Which intervention, treatment or approach should be examined?
Comparators: What are the alternatives to the intervention being examined (for example, other interventions, usual care, placebo)?
Outcome: Which outcomes should be considered to assess how well the intervention is working (including outcomes on both benefits and harms)? What is important for people using services? Core outcome sets should be used if suitable based on quality and validity; one source is the COMET database. The Core Outcome Set Standards for Development (COS STAD) and Core Outcome Set Standards for Reporting (COS STAR) should be used to assess the suitability of identified core outcome sets.
For each review question, factors that may affect the outcomes and effectiveness of an intervention, including any wider social factors that may affect health and any health inequalities, should be considered (see the section on considering health inequalities when preparing review protocols). Outcomes (on both benefits and harms) and other factors that are important should be specified in the review protocol. In general, a range of 5 to 9 outcomes should be defined. Guidance on prioritising outcomes is provided by the GRADE working group.
When designing review questions and protocols, it is important to consider possible intercurrent events (events that occur after starting treatment and either preclude the observation of the variable, or affect its interpretation, such as death, non-adherence to treatment or stopping treatment) and how these will be dealt with in any analysis in the guideline, if different identified studies analyse the data in different ways. Clinical trials are increasingly following the Estimand framework (European Medicines Agency), which attempts to increase the clarity of the precise treatment effect that an individual study is estimating.
Examples of review questions on the effectiveness of interventions are shown in box 4.2.
What are the most effective blood pressure targets for reducing the risk of future cardiovascular events in adults with diagnosed primary hypertension and established cardiovascular disease?
What approaches are effective in improving access to and/or engagement with health and social care for people experiencing homelessness?
What pharmacological (antimicrobial and non-antimicrobial) and non-pharmacological interventions are effective in managing acute uncomplicated otitis media?
Which of the harm-reduction services offered by needle and syringe programmes (including advice and information on safer injecting, onsite vaccination services, and testing for hepatitis B and C and HIV) are effective in reducing blood-borne viruses and other infections among people who inject drugs?
When escalating from oxygen therapy, which non-invasive modality is most effective in adults in hospital with suspected or confirmed COVID‑19?
What are the most effective combined approaches to identifying, assessing and monitoring the health, social care and education needs (including changing needs) of disabled children and young people with severe complex needs?
What is the effectiveness of SGLT2 inhibitors for children, young people and adults with chronic kidney disease and type 2 diabetes?
What is the effectiveness of selective laser trabeculoplasty as a first-line treatment compared with intraocular pressure-lowering eyedrops in adults with ocular hypertension or chronic open-angle glaucoma?
For children and young people with complex rehabilitation needs after traumatic injury that involves spinal cord injury, what specific rehabilitation programmes and packages are effective?
What interventions are effective in improving access to diagnosis and treatment services for people with suspected or diagnosed depression, and in improving referral from primary to secondary and tertiary levels of care, in populations or groups with low uptake in the UK?
Review questions about pharmacological interventions will usually only include medicines with a UK marketing authorisation, based on regulatory assessment of safety and efficacy. Use of a licensed medicine outside the terms of its marketing authorisation (off-label use) may be considered in some circumstances; for example, if this use is common practice in the UK, if there is good evidence for this use, or if there is no other medicine licensed for the indication. Off-label use is particularly common in pregnant women and in children and young people because these groups have often been excluded from clinical trials during medicine development.
Medicines with no UK marketing authorisation for any indication (unlicensed medicines) will not usually be considered in a guideline because there is no UK assessment of safety and efficacy to support their use. Unlicensed medicines may be included in exceptional circumstances, for example in complex conditions when there are no other treatment options. This should be agreed with the medicines adviser responsible for quality assurance (see also the section on recommendations on medicines, including off-label use of licensed medicines).
The Medicines and Healthcare products Regulatory Agency (MHRA) is responsible for ensuring that medicines meet the required standards of safety, that is, expected benefits outweigh risks of harmful effects. Therefore, we take account of national medicines safety advice and do not assess the safety of licensed medicines unless prior agreement has been reached with the MHRA (these discussions can be initiated via the medicines adviser responsible for quality assurance). In particular, we rarely consider review questions that just cover the clinical risk-benefit balance of a drug versus placebo or no treatment for its licensed indication. National medicines safety advice does not require formal quality assessment but should be reported in the evidence review document and used to inform committee discussions. See the chapter on writing the guideline for more information.
When the effectiveness of a medicine is being considered, it is expected that the evidence review will include medicines safety outcomes, such as adverse events. These outcomes are also likely to be included as part of assessing the overall impact on quality of life in any cost-effectiveness analysis, or as part of supporting shared decision making.
Sources of national medicines safety advice include:
If there is a need for additional information on drug safety for a particular topic (for example, pregnancy or breastfeeding) advice on how to obtain this should be sought from the medicines adviser responsible for quality assurance.
When including an off-label use of a licensed medicine, it is usually possible to extrapolate national medicines safety advice, for example, if the population is similar and the recommended dosage is likely to be within the licensed dose range. The approach should be discussed with the committee as early as possible and agreed with the team responsible for quality assurance.
In some circumstances, primary evidence on medicines safety may be needed, for example, for unlicensed medicines, or off-label use of licensed medicines when it is not expected that extrapolating safety information from the licensed uses is appropriate. This should be identified as early as possible, stated in the review protocol, and agreed with the team responsible for quality assurance. It is important to note that different evidence types will often be needed for assessing safety, because both rare serious events and long-term events are important, and are often not captured in research studies.
Randomised controlled trials (RCTs) are the preferred study design for estimating the effects of interventions. This is because randomisation ensures that any differences in known and unknown baseline characteristics between groups are due to chance; blinding (where applied) prevents knowledge of treatment allocation from influencing behaviours; and standardised protocols ensure consistent data collection.
However, RCTs are not always available or may not be sufficient to address the review question of interest.
In such cases, or in other circumstances where the randomised evidence is insufficient for decision making, non-randomised studies may sometimes be appropriate for estimating the effects of interventions. There are also some published studies that include analysis of large, high-quality primary data sources (such as patient registries). The Medical Research Council (MRC) has produced guidance on evaluating complex interventions (Skivington et al. 2021b) and on using natural experiments to evaluate population health interventions (UK Research and Innovation [UKRI], 2022). More information on the appropriate use of non-randomised evidence to estimate the effects of interventions is given in the NICE real-world evidence framework.
Review questions that consider antimicrobial interventions or interventions that could reduce the use of antimicrobials
Review questions on antimicrobial interventions should take account of the principles of good antimicrobial stewardship – see the NICE guideline on antimicrobial stewardship. Antimicrobial‑sparing interventions (such as self‑care treatments) should be considered for self-limiting conditions (such as sore throat and conjunctivitis) or other conditions where they are relevant.
In line with these principles, the review protocol should include how the following will be considered in the evidence review:
antibiotic choice, duration, dosage and route of administration
reviewing and stepping down treatment, if appropriate (for example, if intravenous or prophylactic antibiotics are included).
Information on antimicrobial resistance can be identified from various sources, for example from:
Any relevant information should be summarised in the evidence review document and does not need quality assessment, see the chapter on writing the guideline for more information.
Studies identified from the literature search may report outcomes that measure resistance or usage (for antimicrobial‑sparing interventions). These outcomes should be routinely included in the review protocol.
In most situations, evidence reviews should aim to identify the most appropriate choice(s) of antibiotics and the optimal duration of treatment, to minimise the risk of antimicrobial resistance (also see the section on wording the recommendations in the chapter on writing the guideline).
Skivington et al. 2021b states: 'An intervention might be considered complex because of properties of the intervention itself, such as the number of components involved; the range of behaviours targeted; expertise and skills required by those delivering and receiving the intervention; the number of groups, settings, or levels targeted; or the permitted level of flexibility of the intervention or its components'.
Review questions about the effectiveness of complex interventions may need additional analyses to understand the complexity of the interventions, or additional types of evidence, such as qualitative research or real-world evidence, to answer different aspects of the question. For example, additional evidence might address the views of people using the service or intervention, barriers to use as reported by practitioners or providers, or the impact on health inequalities. A review of effectiveness may also include evidence of the intervention's mechanism of action, that is, evidence of how the intervention works. Questions like these may be addressed by 2 related review questions (quantitative and qualitative) or by a mixed methods review; the different forms of data can be considered separately, or together (integrated) if needed to understand the underlying problem.
When formulating questions to assess complex interventions, routinely consider whether to factor in the sociocultural acceptability and accessibility of an intervention, as well as contextual factors that impact on intervention feasibility. Qualitative evidence synthesis is one method of exploring these factors (Booth et al. 2019). An extended question framework (PerSPEcTiF) is proposed to recognise these wider issues, while also being particularly suited to qualitative evidence synthesis and complex intervention reviews (see table 21.5a in chapter 21 of the Cochrane Handbook for Systematic Reviews of Interventions). The PerSPEcTiF model of question formulation includes the following elements: Per = Perspective, S = Setting, P = Phenomenon of interest or problem, E = Environment, (c) = Comparison (optional), Ti = Time or timing, F = Finding.
Examples of review questions that include analysis targeting complex interventions are shown in box 4.3.
What intervention components (alone or in combination) and approaches are most effective and acceptable in helping children and young people living with overweight or obesity?
Are psychological interventions with a particular component (or combination of components) effective for people with coronary heart disease in relation to reducing all-cause mortality, cardiac mortality, non-fatal myocardial infarction, total cholesterol, blood pressure, anxiety and depression?
School-based self-management interventions for asthma in children and adolescents: a mixed methods systematic review:
(a) What intervention components and processes are aligned with successful school-based asthma self-management intervention implementation?
(b) What is the effectiveness of school-based interventions for improvement of asthma self-management on children's outcomes?
Community engagement for health via coalitions, collaborations and partnerships (online social media and social networks):
(a) What is the extent of community engagement across design, delivery and evaluation in online social media and online social networking interventions?
(b) What health issues and populations have been studied using online social media and social networking?
(c) How effective are online social networks in improving health and wellbeing and reducing health inequalities?
(d) Do particular programme features (for example, health topic, extent of engagement, population type) account for heterogeneity in effect size estimates across studies?
(e) What processes are aligned with effective interventions?
Review questions about diagnosis are concerned with the performance of a diagnostic test or test strategy to identify the presence of a current condition in people. They begin at the point in the diagnostic process when a professional has already made an initial diagnosis, or diagnoses, based on their clinical judgement. Diagnostic tests to confirm or rule out the initial diagnosis can include history-taking, symptoms, signs, identification tools, laboratory or pathological examination, and imaging tests.
Broadly, there are 2 types of review questions about diagnostic tests:
questions about the diagnostic accuracy (or diagnostic yield) of a test or a number of tests compared individually against a comparator (the reference standard)
questions about the diagnostic accuracy (or diagnostic yield) of a test strategy (such as serial testing) against a comparator (the reference standard).
Questions looking at the accuracy of multivariable diagnostic prediction models are covered in the section on review questions about predicting an individual prognosis or identifying an individual diagnosis.
In studies of the accuracy of a diagnostic test, the results of the test under study (the index test) are compared with those of the best available test (the reference standard). It is important to be clear when deciding on the question what the exact proposed use of the test is (for example, as an identification tool, an initial 'triage' test or after other tests).
The PICTO (population, index test, comparator, target condition and outcome) framework can be useful when formulating review questions about diagnostic test accuracy (see box 4.4). However other frameworks (such as PPIRT; population, prior tests, index test, reference standard, target condition) can be used if helpful.
Population: To which populations would the test be applicable? How can they be best described? Are there subgroups that need to be considered?
Index test: The test or test strategy being evaluated for accuracy.
Comparator or reference standard: The test with which the index test is being compared, usually the reference standard (the test that is considered to be the best available method for identifying the presence or absence of the condition of interest – this may not be the one that is routinely used in practice).
Target condition: The disease, disease stage or subtype of disease that the index test(s) and the reference standard are being used to identify.
Outcome measure: The diagnostic accuracy of the test or test strategy for detecting the target condition. This is usually reported as test parameters, such as sensitivity, specificity, predictive values, likelihood ratios, or – when multiple thresholds are used – a receiver operating characteristic (ROC) curve.
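The accuracy parameters listed above can be computed directly from a 2×2 table of index test results against the reference standard. The sketch below (a Python illustration with hypothetical counts, not drawn from any study) shows the standard calculations:

```python
# Illustrative calculation of common diagnostic accuracy measures from a
# 2x2 table of index test results against the reference standard.
# All counts are hypothetical.

def accuracy_measures(tp, fp, fn, tn):
    """Return standard diagnostic test accuracy parameters from a 2x2 table."""
    sensitivity = tp / (tp + fn)   # proportion of true cases detected
    specificity = tn / (tn + fp)   # proportion of non-cases correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity  # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "LR+": lr_pos,
        "LR-": lr_neg,
    }

# Hypothetical table: 80 true positives, 20 false positives,
# 10 false negatives, 90 true negatives.
measures = accuracy_measures(tp=80, fp=20, fn=10, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that, unlike sensitivity and specificity, the predictive values depend on the prevalence of the target condition in the population tested, which is one reason the proposed place of the test in the pathway matters when framing the question.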
A review question about diagnostic test accuracy is usually best answered by a cross-sectional study in which both the index test and the reference standard are performed on the same sample of people. Cohort and case–control studies are also used to assess the accuracy of diagnostic tests, but these types of study design are more prone to bias (and often result in inflated estimates of diagnostic test accuracy). Further advice on the types of study to include in evidence reviews of diagnostic test accuracy can be found in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy.
Examples of review questions on diagnostic test accuracy are shown in box 4.5.
In people with suspected hypertension, which test is most accurate in identifying whether hypertension is present, as indicated by the reference standard of ambulatory blood pressure measurement?
In adults with diabetes, what are the best clinical predictors or biomarker tests (alone or in combination) to distinguish between diagnosis of type 1 diabetes, type 2 diabetes, and other forms of diabetes?
Which of the following, alone or in combination, constitutes the most accurate pathway for diagnosing prostate cancer: multiparametric MRI; transrectal ultrasonography (TRUS) biopsy; transperineal template biopsy?
In adults, children, and young people from black, Asian and other minority ethnic groups with chronic kidney disease, what is the diagnostic accuracy of estimated glomerular filtration rate (eGFR) calculations?
In people with suspected prostate cancer (with any of the following symptoms – any lower urinary tract symptoms, such as nocturia, urinary frequency, hesitancy, urgency or retention or erectile dysfunction or visible haematuria), what is the diagnostic accuracy of a fixed prostate-specific antigen (PSA) test threshold compared with age-adjusted PSA thresholds?
In people with suspected cow's milk allergy, should skin prick tests rather than an oral food challenge with cow's milk be used for diagnosis and management?
What are the symptoms and signs of urinary tract infection (UTI) in babies, children and young people under 16 years old?
Although assessing test accuracy is important for establishing the usefulness of a diagnostic test, the value of a test usually lies in how useful it is in guiding treatment decisions or the provision of services, or supporting shared decision making, and ultimately in improving outcomes. Review questions aimed at establishing the value of a diagnostic test in practice can be structured in a similar way to questions about interventions. The best study design is a test-and-treat RCT. These compare outcomes for people who have a new diagnostic test (in combination with a management strategy) with outcomes of people who have the usual diagnostic test and management strategy. These types of study are not very common, so evidence about diagnostic test accuracy is usually also needed.
Information about prognosis can be used within guidelines to:
classify people into risk categories (for example, cardiovascular risk or level of need) so that different interventions or preventative strategies can be applied
define subgroups of populations that may respond differently to interventions
identify factors that can be used to adjust for case mix (for example, in investigations of heterogeneity)
help determine longer-term outcomes not captured within the timeframe of a trial (for example, for use in an economic model).
Review questions about prognosis address the likelihood of an outcome for an individual person from a population at risk for that outcome, based on the presence of a proposed prognostic factor or factors (see box 4.6). A helpful structured approach for developing questions about prognosis is the PICOTS (population, index prognostic factor, comparator prognostic factors, outcome, timing, setting) framework.
Review questions about prognosis may be closely related to questions about aetiology (cause of a disease or need) if the outcome is viewed as the development of the disease or need based on a number of risk factors.
Questions looking at the accuracy of multivariable prognostic prediction models are covered in the section on review questions about predicting an individual prognosis or identifying an individual diagnosis.
What is the best combination of measures of kidney function and markers of kidney damage to identify increased risk of progression in adults, children and young people with chronic kidney disease?
What are the factors (for example, mental health problems, substance misuse, medication that may cause impulse control disorders) that may increase the chance of a person participating in harmful gambling?
A review question about prognosis is best answered using a prospective cohort study with multivariable analysis. Case-control studies and cross-sectional studies are not usually suitable for answering questions about prognosis because they do not estimate baseline risk; they give only an estimate of the likelihood of the outcome for people with and without the prognostic factor. When developing a review question on prognosis, it is also important to consider possible confounding factors and whether some prognostic factors are modifiable and others non-modifiable.
Statistical analyses can be used to develop prediction models for a specific diagnosis or prognosis. These models are usually developed using multivariable modelling methods. Multivariable prediction models are developed to help healthcare professionals estimate the probability or risk that a specific disease or condition is present (diagnostic prediction models) or that a specific event will occur in the future (prognostic prediction models). They are usually developed using a multivariable model – a mathematical equation that relates multiple predictors for a particular person to the probability of or risk for the presence (diagnosis) or future occurrence (prognosis) of a particular outcome. Other names for a prediction model include risk prediction model, predictive model, prognostic (or prediction) index or rule, and risk score.
Diagnostic prediction models can be used to inform who should be referred for further testing, whether treatment should be started directly, or to reassure patients that a serious cause for their symptoms is unlikely. Prognostic prediction models can be used for planning lifestyle or treatment decisions based on the risk for developing a particular outcome or state of health in a given period.
Prediction model studies can be broadly categorised into those that develop models, those that validate models (with or without updating the model) and those that do both. Studies that report model development aim to derive a prediction model by selecting the relevant predictors and combining them statistically into a multivariable model. Logistic and Cox regression are most frequently used for short-term (for example, disease absent versus present, 30‑day mortality) and long-term (for example, 10‑year risk) outcomes, respectively. Studies may also focus on quantifying how much value a specific predictor (for example, a new predictor) adds to the model. Outcomes that are important should be agreed by the committee and specified in the review protocol.
Quantifying the predictive ability of a model using the same data from which the model was developed (often referred to as apparent performance) tends to overestimate performance. Studies reporting the development of new prediction models should always include some form of validation to quantify any optimism in the predictive performance (for example, calibration and discrimination). There are 2 types of validation: internal validation and external validation. Internal validation uses only the original study sample with methods such as bootstrapping. External validation evaluates the performance of the model with data not used for model development. The data may be collected by the same investigators or other independent investigators, typically using the same predictor and outcome definitions and measurements, but using a different sample (for example, from a later time period). If validation indicates poor performance, the model can be updated or adjusted on the basis of the validation data set. For more information on validating prediction models, see Steyerberg et al. 2001, 2003, 2009; Moons et al. 2012; Altman et al. 2009; and Justice et al. 1999.
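The optimism-correction idea described above can be illustrated with a short bootstrap sketch. Everything in it is hypothetical: the data are simulated, and a simple least-squares linear score stands in for a real logistic or Cox model. It shows only the shape of the procedure (apparent performance, per-resample optimism, corrected estimate):

```python
# Sketch of bootstrap internal validation (optimism correction) for a
# prediction model's discrimination (c-statistic). Illustrative only:
# simulated data and a least-squares linear score in place of a real model.
import numpy as np

rng = np.random.default_rng(0)

def c_statistic(score, outcome):
    """Concordance: probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count half)."""
    cases = score[outcome == 1]
    non_cases = score[outcome == 0]
    diffs = cases[:, None] - non_cases[None, :]
    return np.mean(diffs > 0) + 0.5 * np.mean(diffs == 0)

def fit_score(X, y):
    """Fit a linear score by least squares (illustrative stand-in model)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Simulated development data: 2 predictors, binary outcome.
n = 300
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

beta = fit_score(X, y)
apparent_c = c_statistic(predict(beta, X), y)  # same data: over-optimistic

# Bootstrap optimism: refit in each resample, then compare performance
# in the resample with performance in the original data.
optimisms = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    b = fit_score(X[idx], y[idx])
    c_boot = c_statistic(predict(b, X[idx]), y[idx])
    c_test = c_statistic(predict(b, X), y)
    optimisms.append(c_boot - c_test)

corrected_c = apparent_c - np.mean(optimisms)
print(f"apparent c = {apparent_c:.3f}, optimism-corrected c = {corrected_c:.3f}")
```

The corrected estimate is lower than the apparent one by the average optimism across resamples; external validation on independent data remains the stronger test of performance.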
For more information on developing review questions about prediction models, see the TRIPOD statement in the Annals of Internal Medicine and the TRIPOD statement: explanation and elaboration in the Annals of Internal Medicine.
Although assessing predictive accuracy is important for establishing the usefulness of a prediction model, the value of a prediction model lies in how useful it is in guiding treatment or management decisions, or the provision of services, and ultimately in improving outcomes. Review questions aimed at establishing the value of a prediction model in practice, for example, to compare outcomes of people who were identified from a prediction model (in combination with a management strategy) with outcomes of people who were identified opportunistically (in combination with a management strategy) can be structured in the same way as questions about interventions.
Diagnostic prediction models
Which scoring tools for signs and symptoms (including Centor and FeverPAIN) are most accurate in predicting sore throat caused by group A beta-haemolytic streptococcus infection in primary care?
What is the accuracy of clinical prediction models and tools (clinical history, cardiovascular risk factors, physical examination) in evaluating stable chest pain of suspected cardiac origin?
Prognostic prediction models
What is the effectiveness of prediction tools for identifying women at risk of pelvic floor dysfunction?
Are kidney failure prediction equations good predictors of progression, kidney failure or end-stage renal disease?
What risk assessment or prediction tool best identifies people with multiple conditions who are at risk of unplanned hospital admission?
What risk tool best identifies people with type 2 diabetes who are at risk of reduced life expectancy?
Which risk assessment tools are the most accurate in predicting the risk of fragility fracture in adults with osteoporosis or previous fragility fracture?
What factors and baseline characteristics are accurate in predicting positive treatment outcomes in people with pancreatic cancer?
In people with localised or locally advanced prostate cancer, which risk stratification models, tools and categorising systems perform better in indicating risk of poor outcomes?
In people with stable chronic obstructive pulmonary disease, does routine assessment using a multidimensional severity assessment index (such as BODE [BMI, airflow obstruction, dyspnoea/breathlessness and exercise capacity]) better predict outcomes than forced expiratory volume in 1 second (FEV1) alone?
In some circumstances, specific questions should be formulated about the views and experience of people using services, family members and carers, and the public. The views and experiences of those commissioning and providing services may also be relevant. Qualitative questions do not have to be linked to an effectiveness question; they can stand alone in a single evidence review. Qualitative questions can cover a range of dimensions, including:
views and experiences of people using or providing services, family members or carers or the public on the effectiveness and acceptability of interventions
preferences of people using services, family members or carers or the public for different treatment or service options, including the option of foregoing treatment or care
views and experiences of people using or providing services, family members or carers or the public on what constitutes a desired, appropriate or acceptable outcome
elements of care or a service that are of particular importance to people using or providing services
factors that encourage or discourage people from using interventions or services
the specific needs of certain groups of people using services, including those sharing the characteristics protected by the Equality Act (2010)
information and support needs specific to the topic
which outcomes reported in studies of interventions are most important to people using services, family members or carers or the public
health and care inequalities.
As for other types of review question, questions that are broad and lack focus (for example, 'What is the experience of living with condition X?') should not be asked. The question should be made more focused by specifying the phenomenon of interest, such as acceptability, accessibility, preferences, information and support needs, feasibility or implementation.
The PICo (Population, phenomena of Interest, Context) framework and the SPIDER framework in McMaster University's National Collaborating Centre for Methods and Tools registry are examples of frameworks that can be used to structure qualitative evidence synthesis (QES) review questions. Examples of QES review questions are shown in box 4.8.
What works well, and what could be improved, about access to, engagement with and delivery of health and social care for people experiencing homelessness?
What is the experience of disabled children and young people with severe complex needs and their families and carers of joint delivery of health, social care and education services?
What are the barriers and facilitators to, and key aspects of (including systems and processes), the successful implementation or delivery of mental wellbeing interventions, programmes, policies or strategies at work?
What factors influence the acceptability of, access to, and uptake of cardiac rehabilitation services?
What are the views and experiences of health, social care and other practitioners about the practicality and implementation of home-based intermediate care?
What are the barriers and facilitators to identifying children and young people at risk of poor social, emotional and mental wellbeing?
While qualitative studies often answer a review question about the views or experiences of people using or providing services, family members or carers or the public, other options include quantitative patient preference studies. Information on views and experiences is also becoming increasingly available as part of some intervention studies, for example, collecting qualitative data from trial participants. When qualitative and quantitative data are generated from the same trial in this way, they are referred to as 'trial siblings'. For more information see chapter 21 of the Cochrane Handbook for Systematic Reviews of Interventions.
When there is a lack of evidence on issues important to people affected by the guideline (including families and carers, where appropriate), the development team should consider seeking additional information outside of that from formal literature searching. This could be through a call for evidence (see the section on calls for evidence from stakeholders in the chapter on identifying the evidence: literature searching and evidence submission). It could also be through additional consultation, commissioned primary research, or by approaching experts who may have access to additional data sources (such as surveys of people's views and experiences) or who may be able to draw on their experience of working in the field.
For further information on mixed methods reviews, see chapter 8 of the JBI Manual for Evidence Synthesis.
Lizarondo et al. (2020) state: 'The core intention is to combine quantitative and qualitative data (from primary studies) or integrate transformed quantitative evidence and qualitative evidence to create a breadth and depth of understanding that can confirm or dispute evidence and ultimately answer the review question/s posed'.
'Dependent on the nature of the review question mixed methods systematic reviews may allow for:
an examination of the degree of agreement between quantitative and qualitative data to validate or triangulate results/findings,
identification of discrepancies within the available evidence,
determination of whether the quantitative and qualitative data address different aspects of a phenomenon of interest, and
one type of data that can explore, contextualize or explain the findings of the other type of data.'
A mixed methods approach is only needed when a question cannot be answered by separate quantitative or qualitative evidence synthesis and when multiple perspectives are needed to understand the underlying problem, for example, if the method of intervention delivery affects whether people engage with it, which in turn affects its effectiveness. Another example is when integrating the 2 types of evidence provides more evidence or explanation than the separate quantitative and qualitative evidence reviews. Reasons to use a mixed methods review may include:
to use qualitative data (such as barriers and facilitators or explanatory factors from peoples' experiences of having or giving an intervention) to:
explain quantitative results (for example if the mechanisms of action behind an intervention are unclear, or very different results are seen in different studies or populations), or
supplement quantitative evidence when it is limited (for example in certain populations).
when results are inconclusive, for example if a systematic review of quantitative data finds no effects, but there is a plausible mechanism of action and it is unclear why the intervention shows no effect. It should be noted that:
inconclusive results may arise because other outcomes may be more important than those assessed,
plausible mechanisms do not always translate into real-world effects or
people with experience of the intervention may be able to offer other explanations for an intervention's lack of impact.
to better contextualise results, for example to understand how to reach certain populations (this may include removing structural barriers, or other things potentially relevant to health inequalities such as lower uptake in some groups than others, including why this occurs and how it can be addressed)
when it is unclear who the intervention is effective for and when it is most effective, and not enough data can be gathered from subgroup analysis or meta-regression (for example, the more sensitive the intervention is to context, the more important it may be to look at the context for where, when and in whom it is most effective)
when the implementation of the intervention needs to be considered, either because:
of previous evidence that implementation is complicated and inconsistent leading to mixed use or differential uptake across the country, or
implementation is likely to be complicated and a recommendation may be needed to provide guidance on what could be useful to achieve this.
Using a mixed methods approach requires integration of evidence. That is:
quantitative and qualitative data are synthesised separately and juxtaposed so that different dimensions of a phenomenon (qualitative) may explain the outcomes of the quantitative synthesis (convergent segregated), or
when both quantitative and qualitative data can answer a single question and so data are transformed, and quantitative and qualitative studies synthesised simultaneously (convergent integrated).
When developing a mixed methods review question, it is helpful to consider several factors that guide the choice of method, such as the sequence of synthesis, the approach to integration and the nature of the question (for more information, see sections 8.2 and 8.3 in chapter 8 of the JBI Manual for Evidence Synthesis).
Examples of mixed methods review questions are shown in box 4.9.
What is the effectiveness of integrated working among registered social workers and other practitioners to support adults with complex needs, and based on their views and experiences what are the barriers to integrated working?
(a) What is the effectiveness of integrated working among registered social workers and other practitioners to support adults with complex needs?
(b) Based on the views and experiences of everyone involved, what are the facilitators and barriers to integrated working between registered social workers and other practitioners to support adults with complex needs?
What is the efficacy of telehealth and mobile health interventions and what are the benefits and challenges of these interventions in patients with inflammatory bowel disease?
(a) Are telehealth and mobile health interventions effective in improving the health-related outcomes of adults with IBD?
(b) What are the perceived benefits and challenges of telehealth and mobile health interventions by adults with IBD?
What are the effects of clinical supervision of healthcare professionals on organisational outcomes?
(a) What are healthcare professionals' experiences, views, and opinions regarding clinical supervision as it relates to organisational processes and outcomes?
(b) What can be inferred from the qualitative synthesis of healthcare professionals' experiences or views that can explain the effects of clinical supervision or inform its appropriateness and acceptability for health professionals?
End-of-life care preferences of older patients with multimorbidity: willingness to receive life-sustaining treatments, place of care, and shared decision-making processes (González-González 2021)
Lifestyle interventions through participatory research: a mixed methods systematic review of alcohol and other breast cancer behavioural risk factors: what works and how? (Thomas 2022)
What is the clinical effectiveness of self-management in adolescents with asthma, and what factors are perceived by them as important to maintain adherence to their self-management plan? (Lizarondo 2021)
Guidelines sometimes cover areas of service delivery. These might include the relative effectiveness of different models of service delivery, how delivery of services could improve, how delivery of services impact on health inequalities, or what the core components of services are and how different components could be re‑configured.
In people with hip fracture what is the effectiveness of hospital-based multidisciplinary rehabilitation on the following outcomes: functional status, length of stay in secondary care, mortality, place of residence or discharge, hospital readmission and quality of life?
What is the effectiveness of surgeon seniority (consultant or equivalent) in reducing the incidence of mortality, the number of people requiring reoperation, and poor outcome in terms of mobility, length of stay, wound infection and dislocation?
What are the best service models to support the identification of people who may be entering their last year of life?
What are the most effective approaches and activities to normalise shared decision making in the healthcare system?
What are the most effective service models for weight management services that would improve uptake in population groups with low uptake?
What types of needle and syringe programmes (including their location and opening times) are effective?
What regional or city-level commissioning models, service models, systems and service structures are effective in:
reducing diagnostic delay for tuberculosis (TB)
improving TB contact tracing
improving TB treatment completion?
A review question about the effectiveness of service delivery models is usually best answered by a pragmatic RCT, if it is feasible to do one. However, a wide variety of methodological approaches and study designs can be used, including non-randomised studies that report observational data (including routine healthcare and audit data), experimental and qualitative evidence. Other types of questions on service delivery are also likely to be answered using evidence from study types other than RCTs. For example, to determine whether an intervention will work for a particular subgroup or setting that does not have specific evidence from an RCT, we might want to know how the intervention works, which will require evidence of the relevant underlying mechanisms.
Depending on the type of review questions, the PICO framework may be appropriate but other frameworks can be used.
When a topic includes review questions on service delivery, approaches described in the chapter on incorporating economic evaluation and the appendix on service delivery – developing review questions, evidence reviews and synthesis may be used. Such methods should be agreed with staff with responsibility for quality assurance and should be clearly documented in the guideline.
Some epidemiological reviews describe the problem under investigation and can be used to inform other review questions. For example, an epidemiological review of incidence or prevalence of a condition would provide baseline data for further evidence synthesis, while an epidemiological review of accidents would provide information on the most common accidents, as well as morbidity and mortality statistics, and data on inequalities in the impact of accidents. These review questions may also be necessary to provide input data for economic modelling.
What are the patterns of physical activity among children from different populations and of different ages in England?
Which populations of children are least physically active and at which developmental stage are all children least physically active?
Which population groups are disproportionately affected by type 2 diabetes mellitus?
What is the incidence of Lyme disease in the UK?
The structure of the question and the type of evidence will depend on the aim of the review.
Other epidemiological reviews describe relationships between epidemiological factors and outcomes – a review on associations. If an epidemiological review has been carried out, information will have been gathered from observational studies on the nature of the problem. However, further analysis of this information – in the form of a review on associations – may be needed to establish the epidemiological factors associated with any positive or negative behaviours or outcomes.
What factors are associated with children's or young people's physical activity and how strong are the associations?
What physiological and aetiological factors are associated with coeliac disease?
What physical, environmental and sociological factors are associated with the higher prevalence of multiple sclerosis in European countries?
What factors are associated with a higher mortality rate from breast cancer in people from the most deprived quintile?
In most NICE guidelines, it is expected that considerations around cost effectiveness will be included for all review questions. Therefore, it is not necessary to explicitly mention cost effectiveness in the review question itself. If some review questions for a guideline will not consider cost effectiveness, it should be specified which questions will and will not include these considerations. For more information on review questions that consider cost effectiveness, see the chapter on incorporating economic evaluation.
For each guideline evidence review, a review protocol is prepared that outlines the background, the objectives and the planned methods. In addition, the review protocol should make it possible for the review to be repeated by others. A protocol should also make it clear how equality and health inequalities issues have been considered in planning the review work (see the section on considering health inequalities when preparing review protocols).
The protocol should describe any differences from the methods described in this manual (see the chapters on identifying the evidence: literature searching and evidence submission, reviewing evidence, and incorporating economic evaluation), rather than duplicating the manual.
Templates for the 4 common review types (intervention, diagnostic accuracy, qualitative and prognosis) are available in the appendix on the review protocol templates. These templates can be amended to match the specifics of an individual review question. For reviews not covered by these templates, a similar level of detail should be provided in the review protocol.
When a guideline is updating an evidence review from a published NICE guideline, the protocol from the published guideline, if available, should be used as a starting point for developing the new review protocol. It should be updated based on any changes since the original protocol was developed (such as new interventions and comparators, and extensions of the population). The level of changes needed is likely to depend on the length of time since the original protocol was developed. No more changes than necessary should be made, as the closer the protocol remains to the original, the easier it will be to reuse data extracted in that original review.
The review protocol should be drafted by the development team, with input from the guideline committee, and then reviewed and approved by staff with responsibility for quality assurance. This should take place after agreeing the review question and before starting the evidence review.
Although an original systematic review of primary studies is the most common method to answer a review question, there are several alternatives that should be considered, such as:
Making use of a previously published systematic review or qualitative evidence synthesis. This review could either be used without further modification, or as a starting point for additional work (for example, to include studies published after the review search date or additional outcomes that may be relevant to the guideline but were not included in the original review). See the section on existing systematic reviews for more details.
Doing a review of reviews. This involves doing a systematic search for published systematic reviews or qualitative evidence syntheses and using these reviews as the included evidence. No further analysis is done on the primary studies in those reviews.
Using formal consensus methods, such as Delphi panels or nominal group technique. These techniques can be used instead of doing formal evidence searches, or as a way to interpret the evidence found from these searches (for example, if a large volume of lower quality evidence is available).
Using informal committee consensus to make recommendations, without searching for evidence first. This is only likely to be suitable in situations where the development team is confident that no evidence is likely to exist that would help inform recommendations.
Adapting recommendations from previously published guidelines, either other NICE guidelines or guidelines from other organisations that are assessed to be sufficiently high quality using the AGREE II instrument.
Doing primary analysis of real-world data (such as routinely collected NHS or registry data). The NICE real-world evidence framework provides advice on situations where this type of analysis may be appropriate and outlines best practices for identifying and assessing data sources and doing the analysis. For questions on the effectiveness of interventions, such analyses are likely to be undertaken when randomised evidence is not available or is not sufficient to address the research question of interest, while for other question types (such as prognostic or epidemiological questions) this may represent the optimal type of evidence.
Using calls for evidence and expert witnesses to obtain evidence that may not be available from standard literature searches (see the section on other attendees at committee meetings in the chapter on decision-making committees, and the appendix on call for evidence and expert witnesses).
More than one of these methods may need to be used for some review questions, if different parts of the question need different approaches. It will often not be possible to describe evidence reviews using any of these methods in the format of a standard review protocol. When this is not possible, a narrative description of the review plan can be produced instead. This should clearly describe the planned approach to the review, and the reason why this approach was preferred over doing an original review of published primary data.
When considering which of these potential approaches to use, it is important to consider the trade-off between the optimal evidence to address a question, and the additional time and resources needed to gather that evidence. In particular, it should be considered whether any additional work is likely to lead to different recommendations being made.
Registration of all review protocols on the PROSPERO database before data extraction commences should be considered, if possible and appropriate. The review questions are published on the NICE website at least 6 weeks before consultation on the draft recommendations. Any changes made to a protocol during guideline development should be agreed with staff who have responsibility for quality assurance and the version on the website updated. Any deviations from the signed-off and published protocol should be clearly stated and justified in the evidence review document. If protocols are published anywhere else (for example, on the PROSPERO database) development teams should ensure the versions of the protocol are consistent.
Often reviews of quantitative or qualitative studies (secondary evidence) already exist (for example, those developed by internationally recognised producers of systematic reviews such as Cochrane, the Campbell Collaboration and the Joanna Briggs Institute among others). Existing reviews may include systematic reviews (with or without a meta-analysis or individual participant data analysis) and non-systematic literature reviews and meta-analyses. Well-conducted systematic reviews may be of particular value as sources of evidence (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles for checklists to assess risk of bias or quality of studies when developing guidelines). Some reviews may be more useful as background information or as additional sources of potentially relevant primary studies. This is because they may:
not match the inclusion and exclusion criteria relevant to the guideline topic's referral and parameters (for example, comparable research questions, relevant outcomes, settings, population groups or time periods)
group together different outcome or study types
include data that are difficult or impossible to separate appropriately
not provide enough data to develop recommendations (for example, some reviews do not provide sufficient detail on specific interventions making it necessary to refer to the primary studies).
Conversely, some high-quality systematic reviews (as assessed using the checklists recommended in the appendix on appraisal checklists, evidence tables, GRADE and economic profiles) may provide enhanced data not available in the primary studies. For example, authors of the review may have contacted the authors of the primary studies or other related bodies in order to include additional relevant data in their review, or may have undertaken additional analyses (such as individual participant data analyses). In addition, if high-quality reviews are in progress (protocol published) at the time of development of the guideline, the development team may choose to contact the authors for permission to access pre‑publication data for inclusion in the guideline (see the appendix on call for evidence and expert witnesses).
Systematic reviews can also be useful when developing the scope and when defining review questions, outcomes and outcome measures for the guideline evidence reviews. The discussion section of a systematic review can also help to identify some of the limitations or difficulties associated with a topic, for example, through a critical appraisal of the limitations of the evidence. The information specialists may also wish to consider the search strategies of high-quality systematic reviews. These can provide useful search approaches for capturing different key concepts. They can also provide potentially useful search terms and combinations of terms, which have been carefully tailored for a range of databases.
High‑quality systematic reviews that are directly applicable to the guideline review question can be used as a source of data instead of doing an original review. In such circumstances it can sometimes be beneficial to contact and collaborate with the authors of the original review, because it can be more efficient to share data rather than extract it from the published study.
When considering using results from an existing high-quality review, an assessment should be made of whether the parameters (for example, research question, PICO, inclusion and exclusion criteria) of the review are sufficiently similar to the review protocol of the guideline review question. If they are, the development teams and the committee should make a judgement on whether it is necessary to do an additional search for primary studies published after the search date covered by the existing review.
When developing review protocols it is important to identify any health inequalities that may be relevant to the review question. This should involve identifying any issues from the equality and health inequalities assessment that are particularly relevant to the question, as well as documenting any new issues identified by the committee.
The committee will need to consider these issues and any gaps in the evidence when interpreting the results of the review. However, in some circumstances it may be advisable to make modifications to the review protocol to ensure health inequalities are appropriately addressed, for example by:
including relevant subgroups
including outcomes that may be correlated to or explain inequalities (for example, including adherence as an outcome if this is a possible mechanism by which health inequalities are generated or exacerbated)
including a wider range of study types if there are reasons to believe some groups are systematically excluded from a particular study design but will be included in others.
Depending on the nature of the guideline topic and the review question, other sources of relevant evidence such as reports, audits or service evaluations from the published or grey literature may be included. Often these will not need identifying from a systematic literature search (for example, if there is a national organisation responsible for producing reports on a particular subject). This should be agreed with staff who have responsibility for quality assurance and documented in the review protocol. When it is necessary to assess the quality, reliability and applicability of this evidence, it should be assessed according to standard processes (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles).
See also the chapter on linking to other guidance (which also covers using evidence from non-NICE guidance).
Altman DG, Vergouwe Y, Royston P et al. (2009) Prognosis and prognostic research: validating a prognostic model. BMJ 338: b605
Booth A, Noyes J, Flemming K, et al. Formulating questions to explore complex interventions within qualitative evidence synthesis. BMJ Glob Health 2019;4:e001107
Collins G, Reitsma J, Altman D et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Annals of Internal Medicine 162: 55–63
Craig P, Dieppe P, McIntyre S et al. on behalf of the MRC (2008) Developing and evaluating complex interventions: the new Medical Research Council guidance. London: Medical Research Council
Craig P, Cooper C, Gunnell D et al. on behalf of the MRC (2011) Using natural experiments to evaluate population health interventions: new Medical Research Council guidance. London: Medical Research Council
Davis S, Ross H (2021) Telehealth and mobile health interventions in adults with inflammatory bowel disease: a mixed-methods systematic review. Research in Nursing & Health 44(1): 155–72
González-González A, Schmucker C, Nothacker J et al. (2021) End-of-Life Care Preferences of Older Patients with Multimorbidity: A Mixed Methods Systematic Review. J. Clin. Med. 10:91
Higgins JPT, Thomas J, Chandler J et al., editors (2022) Cochrane Handbook for Systematic Reviews of Interventions, version 6.2. The Cochrane Collaboration
Justice AC, Covinsky KE, Berlin JA (1999) Assessing the generalizability of prognostic information. Annals of Internal Medicine 130: 515–24
Kirkham JJ, Gorst S, Altman DG et al. (2016) Core Outcome Set–STAndards for Reporting: the COS-STAR statement. PLoS Medicine 13(10): e1002148
Kirkham JJ, Davis K, Altman DG et al. (2017) Core Outcome Set–STAndards for Development: the COS-STAD recommendations. PLoS Medicine 14(11): e1002447
Lizarondo L, Stern C, Carrier J et al. (2020) Chapter 8: Mixed methods systematic reviews. In: Aromataris E, Munn Z (editors) JBI Manual for Evidence Synthesis. JBI (accessed 21 April 2022)
Martin P, Lizarondo L, Kumar S, Snowdon D (2021) Impact of clinical supervision on healthcare organisational outcomes: A mixed methods systematic review. PLoS ONE 16(11): e0260156
Moons KG, Kengne AP, Grobbee DE et al. (2012) Risk prediction models: II. External validation, model updating, and impact assessment. Heart 98: 691–8
Moons KGM, Altman DG, Reitsma JB et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Annals of Internal Medicine 162: W1–W73
Noyes J, Booth A, Cargo M et al. (2022) Chapter 21: Qualitative evidence. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3. Cochrane.
Petticrew M, Roberts H (2003) Evidence, hierarchies, and typologies: horses for courses. Journal of Epidemiology and Community Health 57: 527–9
Riley R, Moons K, Snell K et al. (2019) A guide to systematic review and meta-analysis of prognostic factor studies. BMJ 364: k4597
Skivington K, Matthews L, Simpson S et al. (2021a) Framework for the development and evaluation of complex interventions: gap analysis, workshop and consultation-informed update. Health Technology Assessment 25(57)
Skivington K, Matthews L, Simpson S et al. (2021b) A new framework for developing and evaluating complex interventions: update of Medical Research Council guidance. BMJ 374: n2061
Steyerberg E (2009) Clinical prediction models: a practical approach to development, validation, and updating. Springer
Steyerberg E, Harrell F, Borsboom G et al. (2001) Internal validation of predictive models - efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54: 774–81
Steyerberg EW, Bleeker SE, Moll HA et al. (2003) Internal and external validation of predictive models: A simulation study of bias and precision in small samples. Journal of Clinical Epidemiology 56: 441–7
Thomas J, Miller E, Ward P (2022) Lifestyle interventions through participatory research: a mixed-methods systematic review of alcohol and other breast cancer behavioural risk factors. International Journal of Environmental Research and Public Health 19: 980
Thomas J, Petticrew M, Noyes J et al. (2022) Chapter 17: Intervention complexity. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3. Cochrane.
Viswanathan M, McPheeters ML et al. (2017) AHRQ series on complex intervention systematic reviews – paper 4: selecting analytic approaches. Journal of Clinical Epidemiology 90: 28