NICE process and methods

6 Reviewing evidence

Reviewing evidence is an explicit, systematic and transparent process that can be applied to both quantitative (experimental and observational) and qualitative evidence (see the chapter on developing review questions and planning the evidence review). The key aim of any review is to provide a summary of the relevant evidence to ensure that the committee can make fully informed decisions about its recommendations. This chapter describes how evidence is reviewed in the development of guidelines.

Evidence reviews for NICE guidelines summarise the evidence and its limitations so that the committee can interpret the evidence and make appropriate recommendations, even where there is uncertainty.

Most of the evidence reviews for NICE guidelines will present syntheses of evidence from systematic literature searches for primary research studies. Evidence identified during these literature searches and from other sources (see the chapter on identifying the evidence: literature searching and evidence submission) should be reviewed against the review protocol to identify the most appropriate information to answer the review questions. The evidence review process used to inform guidelines must be explicit and transparent, and involves 8 main steps:

Any substantial deviations from these steps need to be agreed, in advance, with staff with responsibility for quality assurance. Additional considerations for reviews using alternative methods not based primarily on literature reviews of primary studies (such as formal consensus methods, adapting recommendations from other guidelines or primary analyses of real-world data) are discussed in the section on presenting evidence for reviews other than reviews of primary studies.

For all evidence reviews and data synthesis, it is important that the method used to report and evaluate the evidence is easy to follow. It should be written up in clear English and any analytical decisions should be clearly justified.

Updating previous NICE reviews

In many cases, the evidence reviews will be an update of a previous review we've done on the same or a similar topic, to include more recently published evidence. In these cases, a judgement should be made on what elements of the previous review can be reused, and which need to be redone, based on the level of similarity between the original and new review questions, protocols and methods. Examples of elements that can be considered for reuse include:

  • literature searches and literature search results

  • evidence tables for included studies

  • critical appraisal of included studies

  • data extraction and meta-analysis

  • previously identified information on equalities and health inequalities.

6.1 Identifying and selecting relevant evidence

The process of selecting relevant evidence is common to all evidence reviews based on systematic literature searches; the other steps are discussed in relation to the main types of review question. The same rigour should be applied to reviewing all data, whether fully or partially published studies or unpublished data supplied by stakeholders. Care should be taken to identify and remove multiple reports of the same study to prevent double-counting.

Published studies

Titles and abstracts of the retrieved citations should be screened against the inclusion criteria defined in the review protocol, and those that do not meet these should be excluded. A percentage should be screened independently by 2 reviewers (that is, titles and abstracts should be double-screened). The percentage of records to be double-screened for each review should be specified in the review protocol.

If reviewers disagree about a study's relevance, this should be resolved by discussion or by recourse to a third reviewer. If, after discussion, there is still doubt about whether or not the study meets the inclusion criteria, it should be retained. If there are concerns about the level of disagreement between reviewers, the reasons should be explored, and a course of action agreed to ensure a rigorous selection process. A further proportion of studies should then be double-screened to validate this new process until appropriate agreement is achieved.
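Where the level of agreement between reviewers needs to be quantified, a chance-corrected statistic such as Cohen's kappa is commonly used. The sketch below is purely illustrative (this manual does not mandate kappa or any particular statistic) and computes it from paired include/exclude decisions:

```python
def cohens_kappa(decisions_a, decisions_b):
    """Cohen's kappa for 2 reviewers' paired include (1) / exclude (0) decisions."""
    assert len(decisions_a) == len(decisions_b)
    n = len(decisions_a)
    observed = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    # Expected chance agreement, from each reviewer's marginal inclusion rate
    p_a = sum(decisions_a) / n
    p_b = sum(decisions_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Invented screening decisions for 10 records
reviewer_1 = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
reviewer_2 = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

A kappa near 1 indicates near-perfect agreement beyond chance; low values would prompt the exploration of reasons for disagreement described above.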

Once the screening of titles and abstracts is complete, full versions of the selected studies should be obtained for assessment. As with title and abstract screening, a percentage of full studies should be checked independently by 2 reviewers, with any differences being resolved and additional studies being assessed by multiple reviewers if sufficient agreement is not achieved. Studies that fail to meet the inclusion criteria once the full version has been checked should be excluded at this stage.

The study selection process should be clearly documented and include full details of the inclusion and exclusion criteria. A flow chart should be used to summarise the number of papers included and excluded at each stage and this should be presented in the evidence review document (see the PRISMA statement). Each study excluded after checking the full version should be listed, along with the reason for its exclusion. Reasons for study exclusion need to be sufficiently detailed for people to be able to understand the reason without needing to read the original paper (for example, avoid stating only that 'the study population did not meet that specified in the review protocol', but also include why it did not match the protocol population).

Priority screening

Priority screening refers to any technique that uses a machine learning algorithm to enhance the efficiency of screening. Usually, this involves taking information on previously included or excluded papers, and using this to order the unscreened papers from those most likely to be included to those least likely. This can be used to identify a higher proportion of relevant papers earlier in the screening process, or to set a cut‑off for manual screening, beyond which it is unlikely that additional relevant studies will be identified.

There is currently no published guidance on setting thresholds for stopping screening where priority screening has been used. Any methods used should be documented in the review protocol and agreed in advance with the team with responsibility for quality assurance. Any thresholds set should, at minimum, consider the following:

  • the number of references identified so far through the search, and how this identification rate has changed over the review (for example, how many candidate papers were found in each 1,000 screened)

  • the overall number of studies expected, which may be based on a previous version of the guideline (if it is an update), published systematic reviews, or the experience of the guideline committee

  • the ratio of relevant/irrelevant records found at the random sampling stage (if undertaken) before priority screening.

The actual thresholds used for each review question should be clearly documented, either in the guideline methods chapter or in the evidence review documents. Examples of how this has been implemented can be found in NICE's guidelines on autism spectrum disorders in under 19s and prostate cancer.
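To illustrate the general idea behind priority screening, the sketch below orders unscreened records by their similarity to records already included, using a deliberately simple word-frequency score as a stand-in for the machine learning rankers used in practice (all titles are invented):

```python
from collections import Counter

def rank_unscreened(included_titles, excluded_titles, unscreened_titles):
    """Order unscreened records so those most similar to already-included
    records are screened first (a toy stand-in for priority screening)."""
    inc_words = Counter(w for t in included_titles for w in t.lower().split())
    exc_words = Counter(w for t in excluded_titles for w in t.lower().split())

    def score(title):
        # Words seen more often in included than excluded records raise the score
        return sum(inc_words[w] - exc_words[w] for w in title.lower().split())

    return sorted(unscreened_titles, key=score, reverse=True)

included = ["statin therapy randomised trial", "statin trial cholesterol outcomes"]
excluded = ["surgical technique case report", "nursing workforce survey"]
unscreened = ["qualitative interview study", "statin randomised controlled trial"]
ranked = rank_unscreened(included, excluded, unscreened)
```

After each batch is screened, the model is retrained on the new decisions and the remaining records re-ranked, which is what concentrates relevant papers early in the process.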

Ensuring relevant records are not missed

Regardless of the level of double-screening, and whether or not priority screening was used, additional checks should always be made to reduce the risk that relevant studies are not identified. These should include, at minimum:

  • checking reference lists of identified systematic reviews, even if these reviews are not used as a source of primary data

  • checking with the guideline committee that they are not aware of any relevant studies that have been missed

  • looking for published papers associated with any key trial registry entries or published protocols that have been identified.

It may be useful to test the sensitivity of the search by checking that it picks up known studies of relevance.

Conference abstracts

Conference abstracts seldom contain enough information to allow confident judgements about the quality and results of a study. It can be difficult to trace the original studies or additional data, and the information found may not always be useful. Also, good-quality studies will often publish full text papers after the conference abstract, and these will be identified by routine searches. Conference abstracts should therefore not routinely be included in the search strategy and review, unless there are good reasons for doing so. If a decision is made to include conference abstracts for a particular review, the justification for doing so should be clearly documented in the review protocol. If conference abstracts are searched for, the investigators may be contacted if additional information is needed to complete the assessment for inclusion.

National policy, legislation and medicines safety advice

Relevant national policy, legislation or medicines safety advice may be identified in the literature search and used to inform guidelines (such as drug safety updates from the Medicines and Healthcare products Regulatory Agency [MHRA]). This evidence does not need critical appraisal in the same way as other evidence, given the nature of the source. National policy, legislation or medicines safety advice can be quoted verbatim as evidence (for example, the Health and Social Care Act [2012]), where needed, and a summary of any relevant medicines safety advice identified should be included in the evidence review document.

Unpublished data and studies in progress

Any unpublished data should be quality assessed in the same way as published studies (see the section on assessing evidence: critical appraisal, analysis, and certainty in the findings). If additional information is needed to complete the quality assessment, the investigators may be contacted. Similarly, if data from in-progress studies are included, they should be quality assessed in the same way as published studies. Confidential information should be kept to a minimum, and a structured abstract of the study must be made available for public disclosure during consultation on the guideline. Additional considerations for reviews using primary analyses of real-world data are discussed in the section on presenting evidence for reviews other than reviews of primary studies.

Grey literature

Grey literature may be quality assessed in the same way as published literature, although because of its nature, such an assessment may be more difficult. Consideration should therefore be given to the elements of quality that are most likely to be important (for example, elements of the study methodology that are less clearly described than in a published article, because of the lack of need to go through the peer-review process, or conflicts of interest in the study).

6.2 Assessing evidence: critical appraisal, analysis, and certainty in the findings

Introduction

Assessing the quality of the evidence for a review question is critical. It requires a systematic process of assessing both the appropriateness of the study design and methods (critical appraisal) and the certainty of the findings (using an approach such as GRADE).

Options for assessing the quality of the evidence should be considered by the development team. The chosen approach should be discussed and agreed with staff with responsibility for quality assurance, where the approach deviates from the standard (described in critical appraisal of individual studies). The agreed approach should be documented in the review protocol (see the appendix on review protocol templates) together with the reasons for the choice. If additional information is needed to complete the data extraction or quality assessment, study investigators may be contacted, although this is not something that is done routinely.

Critical appraisal of individual studies

Every study should be appraised using a checklist appropriate for the study design (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). If a checklist other than those listed is needed, or the one recommended as the preferred option is not used, the planned approach should be discussed and agreed with staff with responsibility for quality assurance and documented in the review protocol.

The ROBINS-I checklist is currently only validated and recommended for use with non-randomised controlled trials and cohort studies. However, there may be situations where a mix of non-randomised study types is included within a review. It can then be helpful to use this checklist across all included study types to maintain consistency of assessment. If this is done, additional care should be taken to ensure all relevant risks of bias for study designs for which ROBINS-I is not currently validated (such as case–control studies) are assessed.

In some evidence reviews, it may be possible to identify particular risk of bias criteria that are likely to be the most important indicators of biases for the review question (for example, conflicts of interest or study funding, if it is an area where there is known to be concern about the sponsorship of studies). If any such criteria are identified, these should then be used to guide decisions about the overall risk of bias of each individual study.

Sometimes, a decision might be made to exclude certain studies at particularly high risk of bias, or to explore any impact of bias through sensitivity analysis. If so, the approach should be specified in the review protocol and agreed with staff with responsibility for quality assurance.

Criteria relating to key areas of bias may also be useful when summarising and presenting the evidence (see the section on summarising evidence). Topic-specific input (for example, from committee members) may be needed to identify the most appropriate criteria to define subgroup analyses, or to define inclusion in a review, for example, the minimum biopsy protocol for identifying the relevant population in cancer studies.

For each criterion that might be explored in sensitivity analysis, the decision on whether it has been met or not (for example, which population subgroup the study has been categorised as), and the information used to arrive at the decision (for example, the study inclusion criteria, or the actual population recruited into the study), should be recorded in a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles).

Each study included in an evidence review should be critically appraised by 1 reviewer and a proportion of these checked by another reviewer. Any differences in critical appraisal should be resolved by discussion or involving a third reviewer.

Data extraction

Study characteristics should be extracted to a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). Care should be taken to ensure that newly identified studies are cross-checked against existing studies to avoid double-counting. This is particularly important where there may be multiple reports of the same study.

If complex data extraction is done for a review question (for example, situations where a large number of transformations or adjustments are made to the raw data from the included studies), data extraction should be checked by a second reviewer to avoid data errors, which are time-consuming to fix. This may be more common in reviews using more complex analysis methods (for example, network meta-analyses or meta-regressions) but decisions around dual data extraction should be based on the complexity of the extraction, not the complexity of the analysis.

Analysing and presenting results for studies on the effectiveness of interventions

Meta-analysis may be appropriate if treatment estimates of the same outcome from more than 1 study are available. Recognised approaches to meta-analysis should be used, as described in the Cochrane Handbook (Higgins et al. 2021) and in documents developed by the NICE Guidelines Technical Support Unit.

There are several ways of summarising and illustrating the strength and direction of quantitative evidence about the effectiveness of an intervention, even if a meta-analysis is not done. Forest plots can be used to show effect estimates and confidence intervals for each study (when available, or when it is possible to calculate them). They can also be used to provide a graphical representation when it is not appropriate to do a meta-analysis and present a pooled estimate. However, the homogeneity of the outcomes and measures in the studies needs to be carefully considered: a forest plot needs data derived from the same (or justifiably similar) population, interventions, outcomes and measures.
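The pooled estimate shown at the foot of a forest plot is typically an inverse-variance weighted average. The sketch below illustrates the fixed-effect version with invented log relative risks; it shows the arithmetic only and is not a substitute for the recognised meta-analysis methods referred to above:

```python
import math

def inverse_variance_pool(estimates, standard_errors):
    """Fixed-effect inverse-variance pooled estimate and its standard error.
    Estimates must be on an additive scale (for example, log relative risk)."""
    weights = [1 / se**2 for se in standard_errors]       # precision weights
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Invented log relative risks and standard errors from 3 trials
log_rr = [-0.22, -0.35, -0.10]
se = [0.10, 0.15, 0.12]
pooled, pooled_se = inverse_variance_pool(log_rr, se)
ci_low = math.exp(pooled - 1.96 * pooled_se)   # 95% CI back on the RR scale
ci_high = math.exp(pooled + 1.96 * pooled_se)
```

Larger, more precise studies get proportionally more weight, which is why the pooled diamond on a forest plot sits closest to the studies with the narrowest confidence intervals.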

Head-to-head data comparing the effectiveness of interventions are useful for a comparison between 2 active management options. A network meta-analysis (NMA) is a method that can include trials that compare the interventions of interest head to head, as well as trials that allow indirect comparisons via other interventions.
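The simplest building block of such indirect comparison is the adjusted indirect comparison of Bucher et al. (1997): the effect of A versus B is estimated from A-versus-C and B-versus-C trials, with the uncertainties combined. A minimal sketch with invented log odds ratios:

```python
import math

def bucher_indirect(d_ac, se_ac, d_bc, se_bc):
    """Adjusted indirect comparison of A vs B via a common comparator C
    (Bucher et al. 1997). Effects must be on an additive scale,
    for example log odds ratios."""
    d_ab = d_ac - d_bc                       # indirect effect of A vs B
    se_ab = math.sqrt(se_ac**2 + se_bc**2)   # uncertainty accumulates
    return d_ab, se_ab

# Invented log odds ratios: A vs placebo (C) and B vs placebo (C)
d_ab, se_ab = bucher_indirect(d_ac=-0.50, se_ac=0.20, d_bc=-0.30, se_bc=0.25)
```

Note that the indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason direct head-to-head evidence is preferred when available.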

The same principles of good practice for evidence reviews and meta-analyses should be applied when conducting network meta-analyses. The reasons for identifying and selecting the randomised controlled trials (RCTs) should be explained. This includes the reasons for selecting the treatment comparisons, and whether any interventions that are not being considered as options for recommendations will be included within the network to allow for indirect comparisons between interventions of interest. The methods of synthesis should be described clearly either in the methods section of the evidence review document or the guideline methods chapter.

When multiple competing options are being appraised, network meta-analysis is the preferred approach and should be considered in such cases. The data from individual trials should also be documented (usually as an appendix). If there is doubt about the inclusion of particular trials (for example, because of concerns about limitations or applicability), a sensitivity analysis in which these trials are excluded may also be presented. The level of consistency between the direct and indirect evidence on the interventions should be reported, including consideration of model fit and comparison statistics such as the total residual deviance, and the deviance information criterion (DIC). Results of any further inconsistency tests done, such as deviance plots or those based on node-splitting, should also be reported.

In addition to the inconsistency checks described above, which compare the direct and indirect evidence within a network meta-analysis model, results from direct comparisons may also be presented for comparison with the results from a network meta-analysis (thus comparing the direct and overall network meta-analysis results to aid validity checks and interpretation, rather than direct and indirect to check consistency). These may be the results from the direct evidence within the network meta-analysis, or from direct pairwise comparisons done outside the network meta-analysis, depending on which is considered more informative.

When evidence is combined using network meta-analyses, trial randomisation should typically be preserved. If this is not appropriate, the planned approach should be discussed and agreed with staff with responsibility for quality assurance. A comparison of the results from single treatment arms from different RCTs is not acceptable unless the data are treated as observational and analysed as such.

Further information on complex methods for evidence synthesis is provided by the documents developed by the NICE Guidelines Technical Support Unit. The methods described in these documents should be used as the basis for analysis, and any deviations from these methods clearly described and justified, and agreed with staff who have responsibility for quality assurance.

To promote transparency of health research reporting (as endorsed by the EQUATOR network), evidence from a network meta-analysis should usually be reported according to the criteria in the modified PRISMA‑NMA checklist in the appendix on network meta-analysis reporting standards.

Evidence from a network meta-analysis can be presented in a variety of ways. The network should be presented diagrammatically with the available treatment comparisons clearly identified, and show the number of trials in each comparison. Further information on how to present the results of network meta-analyses is provided by the documents developed by the NICE Guidelines Technical Support Unit.

There is no NICE-endorsed approach for assessing the quality or certainty of outputs derived from network meta-analysis. At a minimum, a narrative description of the confidence in the results of the network meta-analysis should be presented, considering all the areas in a standard GRADE profile (risk of bias, indirectness, inconsistency and imprecision). Several other approaches have been suggested in the literature that may be relevant in particular circumstances (Phillippo et al. 2019, Phillippo et al. 2017, Caldwell et al. 2016, Puhan et al. 2014, Salanti et al. 2014). The approach to assessing confidence in results should take into account the particular questions the network meta-analysis is trying to address. For example, the approach to imprecision may be different if a network meta-analysis is trying to identify the single most effective treatment, compared with creating a ranking of all possible treatments.

Dealing with complex interventions

Analysing quantitative evidence on complex interventions may involve considering factors other than effectiveness. This includes:

Different analytical approaches are relevant to different types of complexity and question (see table 1 in Higgins et al. 2019). The appropriate choice of technique will depend on the review question, available evidence, time needed to do the approach and likely impact on guideline recommendations. The approach should be discussed and agreed with staff who have responsibility for quality assurance.

Further information on complex methods for evidence synthesis is provided by the documents developed by the NICE Guidelines Technical Support Unit and NICE's Decision Support Unit.

Additional information is available from:

Analysing and presenting results of studies of diagnostic test accuracy

Information on methods of presenting and synthesising results from studies of diagnostic test accuracy is available in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. When meta-analyses of paired accuracy measures (such as sensitivity and specificity) are done, bivariate analysis should be used where possible, to preserve correlations between outcomes. Univariate analyses can still be used if there are insufficient studies for a bivariate analysis.

Meta-analyses should not normally be done on positive and negative predictive values, unless the analysis takes account of differences in prevalence. Instead, analyses can be done on sensitivity and specificity and these results applied to separate prevalence estimates to obtain positive and negative predictive values, if these are outcomes specified in the review protocol.
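Applying pooled sensitivity and specificity to a separate prevalence estimate is a direct use of Bayes' theorem. The sketch below (with invented accuracy figures) shows the calculation, and why predictive values cannot meaningfully be pooled across studies with different prevalences:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from pooled sensitivity and
    specificity applied to a separate prevalence estimate (Bayes' theorem)."""
    tp = sensitivity * prevalence               # true positives per person tested
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    tn = specificity * (1 - prevalence)         # true negatives
    fn = (1 - sensitivity) * prevalence         # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# The same test gives very different PPVs at different prevalences
ppv_low, npv_low = predictive_values(0.90, 0.95, 0.01)    # screening setting
ppv_high, npv_high = predictive_values(0.90, 0.95, 0.30)  # specialist setting
```

With sensitivity 0.90 and specificity 0.95, the PPV is around 15% at 1% prevalence but around 89% at 30% prevalence, which is why prevalence must be handled explicitly rather than averaged away.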

If meta-analysis is not possible or appropriate (for example, if the differences between populations, reference standards or index test thresholds are too large), there should be a narrative summary of the results that were considered most important for the review question.

Analysing and presenting results of studies of prognosis, or prediction models for a diagnosis or prognosis

There is currently no consensus on approaches for synthesising evidence from studies on prognosis, or prediction models for diagnosis or prognosis. The approach chosen should be based on the types of data included (for example, prognostic accuracy data, prediction models, or associative studies presenting odds ratios or hazard ratios). For prognostic accuracy data, the same approach for synthesis can be taken as with diagnostic accuracy data, with the addition of the need to consider length of follow-up as part of the analysis. When considering meta-analysis, reviewers should consider how similar the prognostic factors or predictors and confounding factors are across all studies reporting the same outcome measure. It is important to explore whether all likely confounding factors have been accounted for, and whether the metrics used to measure exposure (or outcome) are universal. When studies cannot be pooled, results should be presented consistently across studies. For more information on prognostic reviews, see Collins 2015 and Moons 2015.

Analysing, synthesising and presenting results of qualitative evidence

Qualitative evidence occurs in many forms and formats and so different methods may be used for synthesis and presentation (such as those described by the Cochrane Qualitative & Implementation Methods Group).

Qualitative evidence should be synthesised and then summarised using GRADE-CERQual (see GRADE-CERQual Implementation series). If synthesis of the evidence is not appropriate, a narrative summary may be adequate; this should be agreed with staff with responsibility for quality assurance. The approach used may depend on the volume of the evidence. If the qualitative evidence is extensive, then a recognised method of synthesis is preferable (normally aggregative, thematic or framework synthesis type approaches). If the evidence is disparate and sparse, a narrative summary may be appropriate.

The simplest approach to synthesising qualitative data in a meaningful way is to group the findings in the evidence tables (comprising 'first order' participant quotes and participant observations, as well as 'second order' interpretations by study authors), and then to write third-order interpretations based on the reviewers' interpretations of the first- and second-order constructs synthesised across studies. These third-order interpretations become themes and subthemes, or 'review findings'. This synthesis can be carried out if enough data are found, and the papers and research reports cover the same (or similar) context or use similar methods. These should be relevant to the review questions and could, for example, include intervention, age, population or setting.

Synthesis can be carried out in several ways (as noted above), and each may be appropriate depending on the question type, and the evidence identified. Papers reporting on the same findings can be grouped together to compare and contrast themes, focusing not just on consistency but also on any differences. The narrative should be based on these themes.

A more complex but useful approach is 'conceptual mapping' (see Johnson et al. 2000). This involves identifying the key themes and concepts across all the evidence tables and grouping them into first level (major), second level (associated) and third level (subthemes) themes. Results are presented in schematic form as a conceptual diagram and the narrative is based on the structure of the diagram.

Integrating and presenting results of mixed methods reviews

If a mixed methods approach has been identified as needed (see the chapter on developing review questions and planning the evidence review), then the approach to integration needs consideration. Integration refers to either how quantitative and qualitative evidence are combined following separate synthesis (convergent-segregated) or how quantitative and qualitative data that have been transformed are merged (convergent-integrated).

  • The convergent-segregated approach consists of doing separate quantitative and qualitative syntheses (as usual), followed by integration of the results derived from each of the syntheses. Integrating the quantitative and qualitative synthesised findings gives a greater depth of understanding of the phenomena of interest compared with doing 2 separate component syntheses without formally linking the 2 sets of evidence.

  • All qualitative evidence from a convergent-segregated mixed methods review should be synthesised and then summarised using GRADE-CERQual. If appropriate, all quantitative data (for example, for intervention studies) should be presented using GRADE. An overall summary of how the quantitative and qualitative evidence are linked should ideally be presented in either matrices or thematic diagrams. It should also be summarised in the review using the questions in the section on integration of quantitative and qualitative evidence to frame the integration evidence summary (JBI manual for evidence synthesis).

Integration of quantitative and qualitative evidence

The integration section should provide a summary that represents the configured analysis of the quantitative and qualitative evidence. This can include matrices, look-up tables or thematic maps, but as a minimum should include statements that address all of the following questions:

  • Are the results and findings from individual syntheses supportive or contradictory?

  • Does the qualitative evidence explain why the intervention is or is not effective?

  • Does the qualitative evidence explain differences in the direction and size of effect across the included quantitative studies?

  • Which aspects of the quantitative evidence were or were not explored in the qualitative studies?

  • Which aspects of the qualitative evidence were or were not tested in the quantitative studies?

'All of the questions above should be answered, but dependent on the evidence included in the review it is acknowledged that some responses will be more detailed than others' (JBI manual for evidence synthesis).

This should be reported as a summary of the mixed findings after reporting on the effectiveness and qualitative evidence synthesis.

The convergent-integrated approach refers to a process of combining extracted data from quantitative studies (including data from the quantitative component of mixed methods studies) and qualitative studies (including data from the qualitative component of mixed methods studies), and involves data transformation.

The convergent-segregated approach is the standard approach to adopt in most of our mixed methods reviews. If convergent-segregated is not the planned approach, data transformation methods and outcome reporting should be discussed and agreed with staff who have responsibility for quality assurance and documented in the review protocol.

Certainty or confidence in the findings of analysis

Once critical appraisal of the studies and data analysis are complete, the certainty or confidence in the findings should be presented (for individual or synthesised studies) at outcome level using GRADE or GRADE-CERQual. Although GRADE has not been formally validated for all quantitative review types (such as prognostic reviews), GRADE principles can be applied and adapted to other types of questions. Any substantial changes made by the development team to GRADE should be agreed with staff with responsibility for quality assurance before use.

If using GRADE or GRADE-CERQual is not appropriate, the planned approach should be discussed and agreed with staff with responsibility for quality assurance. It should be documented in the review protocol (see the appendix on review protocol templates) together with the reasons for the choice.

Certainty or confidence in the findings by outcome

Before starting an evidence review, the outcomes of interest that are important to people using services and the public for the purpose of decision making should be identified, and the reasons for prioritising them stated in the evidence review document. This should be clearly separated from discussion of the evidence, because selecting outcomes once the results are known has the potential to introduce bias (for example, choosing only outcomes for which there were statistically significant results).

The committee discussion section should also explain how the importance of outcomes was considered when discussing the evidence. For example, the committee may want to categorise prioritised outcomes as 'critical' or 'important'. Alternatively, they may consider all prioritised outcomes equally crucial for decision making, in which case no distinction is made between 'critical' and 'important'. The impact of this on the final recommendations should be clear.

GRADE and GRADE-CERQual assess the certainty or confidence in the review findings by looking at features of the evidence found for each outcome or theme. GRADE is summarised in box 6.1, and GRADE-CERQual in box 6.2.

Box 6.1 GRADE approach to assessing the certainty of evidence for intervention studies

GRADE assesses the following features for the evidence found for each outcome:

  • study limitations (risk of bias) – the internal validity of the evidence

  • inconsistency – the heterogeneity or variability in the estimates of treatment effect across studies

  • indirectness – the extent to which the population, intervention, comparator and outcome of interest in the studies differ from those specified in the review protocol

  • imprecision – the level of certainty in the effect estimate

  • other considerations – publication bias, that is, the degree of selective publication of studies.
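The inconsistency judgement above is often informed by a heterogeneity statistic such as I². A minimal sketch of one common calculation (Cochran's Q and I² under inverse-variance weighting), using entirely hypothetical effect estimates:

```python
import math

# Hypothetical study effect estimates (log odds ratios) and standard errors.
estimates = [(-0.35, 0.12), (-0.05, 0.18), (-0.41, 0.15)]

# Fixed-effect pooled estimate: weight each study by 1/SE^2.
weights = [1 / se**2 for _, se in estimates]
pooled = sum(w * e for (e, _), w in zip(estimates, weights)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate.
q = sum(w * (e - pooled)**2 for (e, _), w in zip(estimates, weights))
df = len(estimates) - 1

# I-squared: the proportion of variability attributable to heterogeneity.
i_squared = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.2f} on {df} df, I² = {i_squared:.0f}%")
```

Higher I² values suggest greater inconsistency between studies and may support downgrading the certainty of evidence for that outcome.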

In a standard GRADE approach, the certainty or confidence of evidence is classified as high, moderate, low or very low. In the context of NICE guidelines, it can be interpreted as follows:

  • High – further research is very unlikely to change our recommendation.

  • Moderate – further research may have an important impact on our confidence in the estimate of effect and may change the strength of our recommendation.

  • Low – further research is likely to have an important impact on our confidence in the estimate of effect and is likely to change the recommendation.

  • Very low – any estimate of effect is very uncertain and further research will probably change the recommendation.

Box 6.2 GRADE-CERQual approach to assessing the confidence of evidence for qualitative studies

GRADE-CERQual assesses the following features for the evidence found for each finding:

  • methodological limitations – the internal validity of the evidence

  • relevance – the extent to which the evidence is applicable to the context in the review question

  • coherence – the extent of the similarities and differences within the evidence

  • adequacy of data – the extent of richness and quantity of the evidence.

In a standard GRADE-CERQual approach, the certainty or confidence of evidence is classified as high, moderate, low or very low. In the context of NICE guidelines, it can be interpreted as follows:

  • High – it is highly likely that the review finding is a reasonable representation of the phenomenon of interest.

  • Moderate – it is likely that the review finding is a reasonable representation of the phenomenon of interest.

  • Low – it is possible that the review finding is a reasonable representation of the phenomenon of interest.

  • Very low – it is unclear whether the review finding is a reasonable representation of the phenomenon of interest.

The approach we take differs from the standard GRADE and GRADE-CERQual system in 2 ways:

GRADE or GRADE-CERQual tables summarise the certainty in the evidence and data for each critical and each important outcome or theme and include a limited description of the certainty in the evidence. GRADE or GRADE-CERQual tables should be available (in an appendix) for each review question.

For mixed methods findings there is no recognised approach to combining the certainty of evidence from GRADE and GRADE-CERQual. The certainty and confidence ratings should be reported for both evidence types within the evidence summary of integrated findings, and their impact on decision making should be described in the relevant section of the review.

Alternative approaches to assessing imprecision in GRADE

The standard GRADE approach can be used for assessing imprecision. If this approach is not used, the alternative approach should be agreed with staff who have responsibility for quality assurance.

6.3 Equality and diversity considerations

Our equality and diversity duties are expressed in a single public sector equality duty ('the equality duty', see the section on key principles for developing NICE guideline recommendations in the introduction chapter). The equality duty supports good decision making by encouraging public bodies to understand how different people will be affected by their activities. As much of our work involves developing advice for others on what to do, this includes thinking about how people will be affected by our recommendations when they are implemented (for example, by health and social care practitioners).

6.4 Health inequalities

In addition to meeting our legal obligations, we are committed to going beyond compliance, particularly in terms of tackling health inequalities. Specifically, we consider that we should also take account of the 4 dimensions of health inequalities – socioeconomic status and deprivation, protected characteristics (defined in the Equality Act 2010), inclusion health groups (such as people experiencing homelessness and young people leaving care), and geography. Wherever possible, our guidance aims to reduce and not increase identified health inequalities.

Ensuring inclusivity of the evidence review criteria

Any equality criteria specified in the review protocol should be included in the evidence tables. At the data extraction stage, reviewers should refer to the health inequalities framework criteria (including age, gender/sex, sexual orientation, gender reassignment, disability, ethnicity, religion, place of residence, occupation, education, socioeconomic position and social capital; Gough et al. 2012) and any other relevant protected characteristics, and record these where reported, if specified in the review protocol. See the section on reducing health inequalities in the introduction chapter. Review inclusion and exclusion criteria should also take the relevant groups into account, as specified in the review protocol.

Equalities and health inequalities should be considered during the drafting of the evidence reviews, including any issues documented in the equality and health inequalities assessment. Equality and health inequality considerations should be included in the data extraction process and should be recorded in the committee discussion section. Equalities and health inequalities are also considered during surveillance and updating. See chapters on ensuring that published guidelines are current and accurate and updating guideline recommendations for more information.

6.5 Summarising evidence

Presenting evidence

The following sections should be included in the evidence review document:

  • an introduction to the evidence review

  • a description of the studies or other evidence identified, in either table or narrative format

  • evidence tables (usually presented in an appendix)

  • full GRADE or GRADE-CERQual profiles (in an appendix)

  • evidence summaries (of the results or conclusions of the evidence)

  • an overall summary of merged quantitative and qualitative evidence (using either matrices or thematic diagrams) and the integration questions for mixed methods reviews

  • results from other analysis of evidence, such as forest plots, area under the curve graphs, network meta-analysis (usually presented in an appendix; see the appendix on network meta-analysis reporting standards).
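As a sketch of the kind of analysis that sits behind a forest plot, the following shows a fixed-effect inverse-variance meta-analysis on the log odds ratio scale. All study names and figures are hypothetical, chosen purely for illustration:

```python
import math

# Hypothetical studies: (name, log odds ratio, standard error).
studies = [
    ("Study A", -0.35, 0.12),
    ("Study B", -0.20, 0.18),
    ("Study C", -0.41, 0.15),
]

# Fixed-effect pooling: weight each study by the inverse of its variance.
weights = [1 / se**2 for _, _, se in studies]
pooled = sum(w * est for (_, est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval on the log scale, converted back to odds ratios.
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled OR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

In practice a random-effects model may be more appropriate when heterogeneity is expected; the choice should follow the analysis methods specified in the review protocol.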

The evidence should usually be presented separately for each review question; however, alternative methods of presentation may be needed for some evidence reviews (for example, where review questions are closely linked and need to be interpreted together).

Any substantial deviations in presentation need to be agreed, in advance, with staff with responsibility for quality assurance.

Describing the included evidence

A description of the evidence identified should be produced. The content of this will depend on the type of question and the type of evidence. It should also identify and describe any gaps in the evidence, and cover at minimum:

  • the volume of information for the review question(s), that is, the number of studies identified, included, and excluded (with a link to a PRISMA selection flowchart, in an appendix)

  • the study types, populations, interventions, settings or outcomes for each study related to a particular review question.

Evidence tables

Evidence tables help to identify the similarities and differences between studies, including the key characteristics of the study population and interventions or outcome measures.

Data from identified studies are extracted to standard templates for inclusion in evidence tables. The type of data and study information that should be included depends on the type of study and review question, and should be concise and consistently reported.

The types of information that could be included for quantitative studies are:

  • bibliography (authors, date)

  • study aim, study design (for example, RCT, case–control study) and setting (for example, country)

  • funding details (if known)

  • population (for example, source and eligibility, and which population subgroup of the protocol the study has been mapped to, if relevant)

  • intervention, if applicable (for example, content, who delivers the intervention, duration, method, dose, mode or timing of delivery, and which intervention subgroup of the protocol the study has been mapped to, if relevant)

  • comparator, if applicable (for example, content, who delivers it, duration, method, dose, mode or timing of delivery)

  • method of allocation to study groups (if applicable)

  • outcomes (for example, primary and secondary and whether measures were objective, subjective or otherwise validated, and the timepoint at which these outcomes were measured)

  • key findings (for example, effect sizes, confidence intervals, for all relevant outcomes, and where appropriate, other information such as numbers needed to treat and considerations of heterogeneity if summarising a systematic review or meta-analysis)

  • inadequately reported data, missing data or if data have been imputed (include method of imputation or if transformation is used)

  • overall comments on quality, based on the critical appraisal and what checklist was used to make this assessment. When study details are inadequately reported, or absent, this should be clearly stated.
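A worked example of the number needed to treat mentioned in the key findings item above. The event rates are invented for illustration:

```python
import math

# Hypothetical absolute risks of the event in each arm.
control_event_rate = 0.20       # 20% of the control group had the event
intervention_event_rate = 0.12  # 12% of the intervention group had the event

# Absolute risk reduction, and its reciprocal rounded up to a whole person.
arr = control_event_rate - intervention_event_rate
nnt = math.ceil(1 / arr)
print(f"ARR = {arr:.2f}; NNT = {nnt}")
```

On these figures, about 13 people would need to receive the intervention to prevent one additional event.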

If data are not being used in any further statistical analysis, or are not reported in GRADE tables, effect sizes (point estimates) with confidence intervals should be reported, or back-calculated from the published evidence where possible. If confidence intervals are not reported, exact p values (whether or not significant), with the test from which they were obtained, should be described. When confidence intervals or p values are inadequately reported or not given, this should be stated. Any descriptive statistics (including any mean values and degree of spread, such as ranges) indicating the direction of the difference between intervention and comparator should be presented. If no further statistical information is available, this should be clearly stated.
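Back-calculating a confidence interval, as described above, can be done from a point estimate and an exact p value by recovering the standard error. A minimal sketch with invented numbers (it assumes the p value came from a normal-approximation test, such as a z test):

```python
from statistics import NormalDist

# Hypothetical published result: a mean difference with only an exact
# two-sided p value reported, and no confidence interval.
estimate = 1.8
p_value = 0.03

# Recover the standard error: the z statistic matching a two-sided p value
# is the normal quantile at 1 - p/2, and SE = |estimate| / z.
z = NormalDist().inv_cdf(1 - p_value / 2)
se = abs(estimate) / z
ci_low = estimate - 1.96 * se
ci_high = estimate + 1.96 * se
print(f"Estimate {estimate} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

If the original test was a t test with small degrees of freedom, the t distribution should be used instead of the normal, otherwise the recovered interval will be too narrow.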

The type of data that could be reported in evidence tables for qualitative studies includes:

  • bibliography (authors, date)

  • study aim, study design and setting (for example, country)

  • funding details (if known)

  • population or participants

  • theoretical perspective adopted (such as grounded theory)

  • key objectives and research questions; methods (including analytical and data collection technique)

  • key themes/findings (including quotes from participants that illustrate these themes or findings, if appropriate)

  • gaps and limitations

  • overall comments on quality, based on the critical appraisal and what checklist was used to make this assessment. When study details are inadequately reported, or absent, this should be clearly stated.

Evidence summaries

Full GRADE or GRADE-CERQual tables that present both the results of the analysis and describe the confidence in the evidence should normally be provided (in an appendix).

Additionally, whether GRADE or GRADE-CERQual are used or not, a summary of the evidence should be included within the evidence review document. This summary can be in any format (narrative, tabular, pictorial) but should contain sufficient detail to explain the key findings of the review without needing to refer to the full results in the appendices.

Evidence summaries are structured and written to help committees formulate recommendations, and to help stakeholders and users of the guidance understand why those recommendations were made. They are separate from the committee's interpretation of the evidence, which should be covered in the committee discussion section. They can help to understand:

  • whether or not there is sufficient evidence (in terms of strength and applicability) to form a judgement

  • whether (on balance) the evidence demonstrates that an intervention, approach or programme is effective or ineffective, or is inconclusive

  • the size of effect and associated measure of uncertainty

  • whether the evidence is applicable to people affected by the guideline and contexts covered by the guideline.

Structure and content of evidence summaries

Evidence summaries do not need to repeat every finding from an evidence review, but should contain sufficient information to understand the key findings of the review, including:

  • Sufficient descriptions of the interventions, tests or factors being reported on to enable interpretation of the results reported.

  • The volume of and confidence in the evidence, as well as the magnitude and direction of effects.

  • Key strengths and limitations of the evidence that may not be obvious from overall confidence ratings (for example, the countries evidence came from, if that is expected to have a meaningful impact on the results).

  • For findings not showing a meaningful benefit or harm between multiple options, it should be clear whether these have been interpreted as demonstrating equivalence, or simply that it is not possible to tell whether there is a difference or not from the available evidence.

  • Any outcomes where evidence was searched for but no or insufficient evidence was found.

These summaries can be presented in a variety of formats (for example, evidence statements, narrative summaries, tables) provided they cover the relevant information. 'Vote counting' (merely reporting the number or proportion of studies showing a particular positive or negative finding) is not an acceptable summary of the evidence.

Context- or topic-specific terms (for example, 'an increase in HIV incidence', 'a reduction in injecting drug use' and 'smoking cessation') may be used. Any such terms should be used consistently in each review and their definitions reported.

6.6 Presenting evidence for reviews other than reviews of primary studies

The principles described above remain relevant when reporting evidence not based on systematic reviews of primary studies done by NICE. A description of some of these alternative approaches and when they may be appropriate is given in the chapter on developing review questions and planning the evidence review. However, additional factors need to be considered in many of these situations and are described in this section. When reviews have used either multiple options described in this section or an option combined with a systematic review of primary studies, the different approaches should be reported separately according to the appropriate reporting approach outlined in this chapter. A description of how these sources of evidence were either combined or interpreted together by the committee should also be given.

Reporting reviews based on a published systematic review or qualitative evidence synthesis

In some cases, evidence reviews may be based on previously published systematic reviews or qualitative evidence syntheses done outside of NICE, rather than an original review. In such cases, where that review is publicly available, presentation of review content in NICE evidence review documents should be limited to those sections where additional material or analysis has been undertaken. If a published and free to access review has been used with no adaptation, it should be cited in the relevant sections and appendices of the NICE evidence review document and a hyperlink to the original review provided, with no reproduction of the review content. If the review used is not free to access, then the relevant content should be summarised within the guideline.

Examples of additions that may be made to published reviews include adding new data to an out-of-date review, including additional outcomes or subgroups, re-analysing data using different statistical strategies, re-evaluating GRADE quality assessments, and combining separate reviews in a network meta-analysis. If we have updated a review to include additional material or analysis, a link should be provided to the relevant original review with a full citation in line with the NICE style guide on referencing and citations. Only the relevant updated sections should be written up in the NICE evidence review document.

An evidence summary should still be provided in the evidence review, making clear which parts of the cited reviews were used as evidence within the guideline and summarising any changes or additional analyses undertaken, if relevant. When considering the confidence we have in the findings of a published review, both the quality of the overall review (as assessed using the checklists recommended in the appendix on appraisal checklists, evidence tables, GRADE and economic profiles) and the quality of the studies within that review should be taken into account.

Reporting reviews based on a published individual participant data meta-analysis

Evidence reviews based on a published individual participant data (IPD) meta-analysis should follow the same principles as reviews based on other published systematic reviews. Reviewers can use the PRISMA-IPD checklist to assess the reporting standards of published IPD analyses, and Wang 2021 includes a checklist that can be used for quality assessment of IPD meta-analyses.

In most cases it is not possible to update an IPD meta-analysis within a guideline, so an approach must be agreed for handling relevant studies not included in the analysis (for example, studies published after the searches in the published review). A number of possible approaches can be followed:

  • Only include the IPD meta-analysis in the review, and exclude any additional studies.

  • Include the IPD meta-analysis review, and additionally report aggregated results for the studies not included in the IPD analysis.

  • Include the IPD meta-analysis review, and additionally report aggregated results for all studies within the review, both those included within the IPD meta-analysis and those not included.

The approach taken should be described and justified within the review. It should take into account the number and proportion of studies not included in the IPD meta-analysis, whether those studies are systematically different from the included studies, and whether they would be likely to lead to different overall conclusions.
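The third option above can be sketched as a two-stage combination: the published IPD meta-analysis contributes one pooled estimate, each study not included contributes its aggregate estimate, and everything is combined by inverse-variance weighting. Every figure here is hypothetical:

```python
import math

# (log hazard ratio, standard error) from the published IPD meta-analysis.
ipd_result = (-0.25, 0.08)

# Aggregate results from hypothetical studies not included in the IPD analysis.
extra_studies = [(-0.10, 0.20), (-0.30, 0.25)]

# Inverse-variance weighted combination of all the estimates.
estimates = [ipd_result] + extra_studies
weights = [1 / se**2 for _, se in estimates]
pooled = sum(w * est for (est, _), w in zip(estimates, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

ci_low = math.exp(pooled - 1.96 * pooled_se)
ci_high = math.exp(pooled + 1.96 * pooled_se)
print(f"Combined HR {math.exp(pooled):.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because its standard error is small, the IPD result dominates the combined estimate here; whatever approach is chosen, the justification should be documented in the review as described above.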

Reporting reviews based on multiple published systematic reviews or qualitative evidence syntheses

Sometimes an evidence review may report the results of multiple systematic reviews, either as a result of a review of reviews being done, or because multiple relevant reviews are otherwise identified. Each review should be reported following the advice in the section on reporting reviews based on a published systematic review or qualitative evidence synthesis.

Additionally, the evidence review should report on any overlaps between the included reviews (for example, where multiple included reviews cover the same intervention or include some of the same studies), or any important differences between the methodologies of the included reviews. How these overlaps or differences were dealt with when assessing evidence and making recommendations should be reported.

Reporting reviews based on formal consensus methods

When formal consensus methods, such as Delphi panels or nominal group technique, are used as a way of generating or interpreting evidence, at minimum the following information should be reported in the evidence review document:

  • How the participants involved in the formal consensus exercise were selected.

  • How the initial evidence or statements presented as part of the formal consensus exercise were derived.

  • The methodology used for the formal consensus exercises, including any thresholds used for retaining or discarding statements.

  • The results of each round or iteration of the formal consensus exercise.

  • How the results of the formal consensus exercise were then used to inform the recommendations made.

Reporting reviews or using recommendations from previously published guidance from other organisations

If systematic reviews or qualitative evidence syntheses done as part of a published non-NICE guideline are used as evidence within a NICE guideline, those reviews should be assessed following the advice in the section above on reporting reviews based on a published systematic review or qualitative evidence synthesis. No assessment of other aspects of the guideline is needed, because only the evidence from the reviews is being used, not any other part of the non-NICE guideline.

If parts of the non-NICE guideline other than evidence reviews are used (for example, if the recommendations made are themselves used as evidence, not just the underlying reviews) then the guideline should be assessed for quality using the AGREE II instrument. There is no cut-off point for accepting or rejecting a guideline, and each committee needs to set its own parameters. These should be documented in the methods of the guideline, and the full results of the assessment included in the evidence review document. In addition to the assessment of the quality of the guideline, the following should also be included in the review at a minimum:

  • A summary of the content from the non-NICE guideline used to inform the NICE guideline (for example, the recommendations considered).

  • A description of the justifications presented in the non-NICE guideline (for example, why those recommendations were made).

  • A description of how the NICE committee interpreted that content, including any concerns about quality and applicability, and how it informed their own discussions and recommendations.

  • A clear link between which parts of the non-NICE guideline informed the final recommendations in the NICE guideline.

Reporting reviews or using recommendations from previously published NICE guidelines

If systematic reviews or qualitative evidence syntheses done as part of published NICE guidelines are considered relevant and appropriate, they can be used as evidence within a different NICE guideline. These reviews can be included as part of the evidence when:

  • the review question in the guideline in development is sufficiently similar to the question addressed in the published guideline

  • the evidence is unlikely to have changed significantly since the publication of the related published NICE evidence review.

When evidence reviews from another guideline are used to develop new recommendations, the decision should be made clear in the methods section of the guideline in development, and the committee's independent interpretation and discussion of the evidence should be documented in the discussion section. The evidence reviews from the published guideline (including review protocol, search strategy, evidence tables and full evidence profiles [if available]) should be included in the guideline in development. They then become part of the evidence for the new guideline and are updated as needed in future updates of the guideline.

If parts of a published NICE guideline (or multiple guidelines) other than evidence reviews are used (for example, if recommendations made are themselves used as evidence, not just the underlying reviews) and new recommendations are formulated, the committee's discussion and decision should be documented clearly in the review. This should include areas of agreement and difference with the committee for the published guideline (for example, in terms of key considerations – balance of benefits and harms or costs, and interpretation of the evidence).

The following should be included in the review at a minimum:

  • A summary of the content from the published NICE guideline used to inform the guideline in development (for example, the recommendations considered).

  • A description of the justifications presented in the published NICE guideline (for example, why those recommendations were made).

  • A description of how the committee interpreted that content, including any concerns about applicability, and how it informed their own discussions and recommendations, including how the recommendations from the published NICE guideline were extrapolated to the guideline in development. It is not routinely necessary to do an assessment of the published NICE guideline using the AGREE II instrument. However, in certain circumstances such an assessment may be useful (for example, if it is an older NICE guideline that used different methods to those currently in use), and if an assessment is undertaken the results should be reported in the review.

  • A clear link between which parts of the published NICE guideline informed the final recommendations in the guideline in development and why new recommendations were needed (including why the original recommendations could not be adopted without change).

Reporting reviews using primary analysis of real-world data

Reviewers should follow the advice in the NICE real-world evidence framework when reporting primary analyses of real-world data done by NICE. At a minimum, the level of detail provided should match that which would be provided in a published research article. It should also be enough to enable an independent researcher with access to the data to reproduce the study, interpret the results, and to fully understand the strengths and limitations of the study.

More information on what is required and links to relevant reporting tools are provided in the NICE real-world evidence framework.

Reporting reviews using calls for evidence or expert witnesses

If evidence for a review has been obtained using either a call for evidence or an expert witness, follow the reporting advice in the appendix on calls for evidence and expert witnesses.

Reporting reviews using additional consultation or commissioned primary research

If evidence for a review has been obtained using either additional consultation or commissioned primary research, follow the reporting advice in the appendix on approaches to additional consultation and commissioned primary research.

6.7 References and further reading

AGREE Collaboration (2003) Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Quality and Safety in Health Care 12: 18–23

Booth A, Lewin S, Glenton C et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings – paper 7: understanding the potential impacts of dissemination bias. Implementation Science 13: 12

Brouwers M, Kho M, Browman G et al. for the AGREE Next Steps Consortium (2010) AGREE II: advancing guideline development, reporting and evaluation in healthcare. Canadian Medical Association Journal 182: E839–42

Caldwell D, Ades A, Dias S et al. (2016) A threshold analysis assessed the credibility of conclusions from network meta-analysis. Journal of Clinical Epidemiology 80: 68–76

Caldwell D, Welton N (2016) Approaches for synthesising complex mental health interventions in meta-analysis. Evidence-Based Mental Health 19: 16

Collins G, Reitsma J, Altman D et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Annals of Internal Medicine 162: 55–63

Colvin C, Garside R, Wainwright M et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings – paper 4: how to assess coherence. Implementation Science 13: 13

Glenton C, Carlsen B, Lewin S et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings – paper 5: how to assess adequacy of data. Implementation Science 13: 14

GRADE working group (2004) Grading quality of evidence and strength of recommendations. British Medical Journal 328: 1490–4

The GRADE series in the Journal of Clinical Epidemiology

Guyatt G, Oxman A, Schünemann H et al. (2011) GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. Journal of Clinical Epidemiology 64: 380–2

Higgins J, Thomas J, Chandler J et al., editors (2021) Cochrane Handbook for Systematic Reviews of Interventions, version 6.2

Higgins J, López-López J, Becker B et al. (2019) Synthesising quantitative evidence in systematic reviews of complex health interventions. BMJ Global Health 4: e000858

Johnsen J, Biegel D, Shafran R (2000) Concept mapping in mental health: uses and adaptations. Evaluation and Programme Planning 23: 67–75

Lewin S, Bohren M, Rashidian A et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings – paper 2: how to make an overall CERQual assessment of confidence and create a Summary of Qualitative Findings table. Implementation Science 13: 10

Lewin S, Booth A, Glenton C et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings: introduction to the series. Implementation Science 13: 2

Lizarondo L, Stern C, Carrier J, et al. Chapter 8: Mixed methods systematic reviews. In: Aromataris E, Munn Z (Editors), JBI Manual for Evidence Synthesis. JBI, 2020.

Moons K, Altman D, Reitsma J et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Annals of Internal Medicine 162: W1–W73

Munthe-Kaas H, Bohren M, Glenton C et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings – paper 3: how to assess methodological limitations. Implementation Science 13: 9

NICE Decision Support Unit (2020) Sources and synthesis of evidence: update to evidence synthesis methods [online; accessed 31 March 2022]

NICE Decision Support Unit Evidence synthesis TSD series [online; accessed 31 August 2018]

Noyes J, Booth A, Lewin S et al. (2018) Applying GRADE-CERQual to qualitative evidence synthesis findings – paper 6: how to assess relevance of the data. Implementation Science 13: 4

Phillippo D, Dias S, Ades A et al. (2017) Sensitivity of treatment recommendations to bias in network meta-analysis. Journal of the Royal Statistical Society: Series A

Phillippo D, Dias S, Welton N et al. (2019) Threshold Analysis as an Alternative to GRADE for Assessing Confidence in Guideline Recommendations Based on Network Meta-analyses. Annals of Internal Medicine 170(8): 538-46

Puhan M, Schünemann H, Murad M et al. (2014) A GRADE working group approach for rating the quality of treatment effect estimates from network meta-analysis. British Medical Journal 349: g5630

Salanti G, Del Giovane C, Chaimani A et al. (2014) Evaluating the quality of evidence from a network meta-analysis. PLoS ONE 9: e99682

Thomas J, O'Mara-Eves A, Brunton G (2014) Using qualitative comparative analysis (QCA) in systematic reviews of complex interventions: a worked example. Systematic Reviews 3: 67

Thomas J, Petticrew M, Noyes J, et al. Chapter 17: Intervention complexity. In: Higgins JPT, Thomas J, Chandler J et al (editors), Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022) [online; accessed 31 March 2022]

Viswanathan M, McPheeters M, Murad M et al. (2017) AHRQ series on complex intervention systematic reviews – paper 4: selecting analytic approaches. Journal of Clinical Epidemiology 90: 28

Wang H, Chen Y, Lin T et al. (2021) The methodological quality of individual participant data meta-analysis on intervention effects: systematic review. BMJ 373: n736

Welton N, Caldwell D, Adamopoulos E et al. (2009) Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. American Journal of Epidemiology 169: 1158

Whiting P, Rutjes A, Westwood M et al. and the QUADAS-2 group (2011) QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine 155: 529