Process and methods
6 Reviewing the evidence
Studies identified during literature searches (see section 5) need to be reviewed to identify the most appropriate data to help address the review questions, and to ensure the guidance recommendations are based on the best available evidence. The systematic review process used should be explicit and transparent. This involves 5 major steps:
writing the review protocol (see section 4.4.1)
selecting relevant studies
assessing their quality
synthesising the results
interpreting the results.
The process of selecting relevant studies is common to all systematic reviews; the other steps are discussed below in relation to the major types of review questions. The same rigour should be applied to reviewing fully and partially published studies, as well as unpublished data supplied by stakeholders and expert testimony, if submitted.
Methods for developing clinical guidelines are relatively well established. However, this may not be the case for social care guidance. During development, it may become apparent that existing methods for considering evidence are not appropriate for social care topics. As part of the development process, the NCCSC should highlight to NICE any methodological development needs, and work with NICE to develop strategies to address them.
Detailed information on methods of reviewing and synthesising evidence can be found in the Cochrane handbook for systematic reviews of interventions (Higgins and Green 2011).
The study selection process for social care studies and economic evaluations should be clearly documented, giving details of the inclusion and exclusion criteria that were applied.
Before acquiring papers for assessment, the information specialist or systematic reviewer should sift the evidence identified in the search and discard irrelevant material. First, the titles of the retrieved citations should be scanned and those that fall outside the topic of the guidance should be excluded. A quick check of the abstracts of the remaining papers should identify those that are clearly not relevant to the review questions and can be excluded.
Next, the remaining abstracts should be scrutinised against the inclusion and exclusion criteria agreed by the Guidance Development Group (GDG). Abstracts that do not meet the inclusion criteria should be excluded. Any doubts about inclusion should be resolved by discussion with the GDG before the results of the study are considered. Once the sifting is complete, full versions of the selected studies can be acquired for assessment. Studies that fail to meet the inclusion criteria once the full version has been checked should be excluded; those that meet the criteria can be assessed. Because there is always a potential for error and bias in selecting the evidence, double sifting (that is, sifting by 2 people) of a random selection of abstracts should be performed periodically (Edwards et al. 2002).
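The double-sifting check described above can be sketched as drawing a random sample of screened abstracts for independent assessment by a second reviewer. This is an illustrative sketch only; the 10% sample fraction and the function names are assumptions, not prescribed by this manual:

```python
import random

def select_for_double_sift(abstract_ids, fraction=0.1, seed=None):
    """Pick a random subset of sifted abstracts for independent
    re-screening by a second reviewer (the fraction is illustrative)."""
    rng = random.Random(seed)
    k = max(1, round(len(abstract_ids) * fraction))
    return sorted(rng.sample(abstract_ids, k))

def disagreements(first_pass, second_pass):
    """Ids on which the two reviewers' include/exclude decisions
    differ; these would be resolved by discussion with the GDG."""
    return sorted(i for i in second_pass if second_pass[i] != first_pass[i])
```

Disagreement rates from such a check give a rough indication of the error and bias inherent in single-reviewer sifting (Edwards et al. 2002).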
Conference abstracts can be a good source of information in systematic reviews. For example, conference abstracts can be important in identifying published trials that may be missed and ongoing trials that are due to be published, or in estimating the amount of not-fully-published evidence (and so guiding calls for evidence and judgements about publication bias). However, the following should be considered when deciding whether to include conference abstracts as a source of evidence:
Conference abstracts may not include sufficient information to allow confident judgements to be made about the quality and results of a study.
It can be time-consuming to trace the original studies or additional data relating to the conference abstracts, and the information found may not always be useful.
If sufficient evidence has been identified from full published studies, it may be reasonable not to trace the original studies or additional data related to conference abstracts.
If there is a lack of or limited evidence identified from full published studies, the systematic reviewer may consider an additional process for tracing the original studies or additional data relating to the conference abstracts, to allow full critical appraisal and to make judgements on their inclusion in or exclusion from the systematic review.
The process for sifting and selecting economic evaluations for assessment is essentially the same as for social care studies. Consultation between the information specialist, the economist and the systematic reviewer is essential when deciding the inclusion criteria; these decisions should be discussed and agreed with the GDG. The review should be targeted to identify the papers that are most relevant to current practice and to GDG decision-making. The review should also usually focus on 'full' economic evaluations that compare both the costs and consequences of the alternative interventions and any services under consideration.
Inclusion criteria for sifting and selection of papers for review by the economist should specify relevant populations and interventions for the review question. They should also specify the following:
An appropriate date range, as older studies may reflect outdated practices.
The country or setting, because studies conducted in other social care systems might not be relevant to the UK. In some cases, it may be appropriate to limit consideration to UK-based or OECD (Organisation for Economic Cooperation and Development) studies.
The type of economic evaluation. This may include cost–utility, cost–benefit, cost-effectiveness, cost-minimisation or cost–consequences analyses. Non-comparative costing studies, 'burden of disease' studies and 'cost of illness' studies should usually be excluded.
This section applies to the assessment of both qualitative and quantitative evidence.
The review team should assess the quality of evidence selected for inclusion in the review using the appropriate quality appraisal checklist (see section 6.2.2). This is a key stage in the guidance development process because the quality rating of studies will be reflected in the evidence statements (see section 6.4). These, in turn, are taken into account in the recommendations (along with other factors and considerations, see section 9.2).
Some of the more commonly used study types and their abbreviations are:
quantitative studies: experimental
non-randomised controlled trial (NRCT)
randomised controlled trial (RCT)
quantitative studies: observational
interrupted time series (ITS)
observation and participant observation
The quality of individual studies should be assessed using an appropriate quality appraisal checklist. This is to make a judgement about both the quality of execution of the study and its fitness-for-purpose in terms of answering the review question(s). Factors that influence judgements on the 'trustworthiness' of the study, such as its relevance to the review questions and how 'convincing' the results are, should be clearly described in the review.
Some studies, particularly those using mixed methods, may report quantitative, qualitative and economic outcomes. In such cases, each aspect of the study should be separately assessed using the appropriate checklist.
Similarly, a study may assess the effectiveness of an intervention using different outcome measures, some of which are more reliable than others (for example, self-reported levels of school attendance compared with a formal measure of attendance from the school). In such cases, the study might be rated differently for each outcome, depending on the reliability of the measures used. For further information on how to integrate evidence from qualitative and quantitative studies, see Dixon-Woods et al. (2004).
Quality assessment is a critical stage of the evidence review process.
The systematic reviewer should use the relevant quality appraisal checklist to assess a study's internal validity: that is, to check whether potential sources of bias have been minimised and to determine whether its conclusions are open to any degree of doubt. The quality of each study should be rated as follows:
++ All or most of the checklist criteria have been fulfilled; where they have not been fulfilled, the conclusions are very unlikely to alter.
+ Some of the checklist criteria have been fulfilled; where they have not been fulfilled, or not adequately described, the conclusions are unlikely to alter.
− Few or no checklist criteria have been fulfilled, and the conclusions are likely or very likely to alter.
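As a rough illustration only (the rating is ultimately a reviewer judgement about whether the conclusions would alter, not a computed score), the scheme above might be sketched as follows; the numeric thresholds are hypothetical:

```python
def quality_rating(criteria_fulfilled, criteria_total):
    """Hypothetical sketch mapping checklist fulfilment to a ++/+/-
    rating; real ratings also weigh whether unmet criteria would
    alter the study's conclusions."""
    fraction = criteria_fulfilled / criteria_total
    if fraction >= 0.9:   # 'all or most' - threshold is an assumption
        return "++"
    if fraction >= 0.5:   # 'some' - threshold is an assumption
        return "+"
    return "-"
```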
If a study is not assigned a '++' quality rating, the review team should record the key reasons why this is the case in the quality appraisal checklist comments column, alongside the overall quality rating. They should also record these reasons in the evidence table and highlight them in the narrative summary.
The systematic reviewer should also use the quality appraisal checklist to assess the external validity of studies: the extent to which the findings for the study participants apply to the whole 'source population' (that is, the population they were chosen from).
This involves assessing the extent to which study participants are representative of the source population. It may also involve an assessment of the extent to which, if the study were replicated in a different setting but with similar population parameters, the results would have been the same or similar. If the study includes an 'intervention', then it should be assessed to see whether it would be feasible in settings other than that initially investigated.
Studies should be given a separate rating for external validity (++, + or −) prefixed with 'EV' (external validity).
Reviewers are not expected to search for unpublished data as a matter of routine. However, if time and resources allow, the systematic reviewer may obtain such papers, particularly from stakeholders and experts in the topic area (see section 5.6). Any unpublished data that the authors intend to publish as peer-reviewed literature should be quality-assessed in the same way as published studies. If additional information is needed to complete the quality appraisal checklist, the authors should be contacted if possible.
This section describes how to present data from quantitative and qualitative evidence and develop related evidence statements for both qualitative and quantitative evidence reviews.
Any expert or value judgements that have been made (including expert advice from third parties) should be reported in the review.
Both qualitative and quantitative evidence reviews should incorporate narrative summaries of, and evidence tables for, all studies. Concise detail should be given (where appropriate) on populations, interventions, settings, outcomes, measures and effects.
This includes identifying any similarities and differences between studies, for example, in terms of the study population, interventions and outcome measures.
The evidence tables can also be used as data extraction templates for included studies.
Evidence tables can help determine whether it is possible to calculate a summary estimate of effect, if applicable (see Conducting and presenting a meta-analysis in section 6.3.4).
Concise details (sometimes in bullet point or another list form) should be given on:
bibliography (authors, date)
study aim and type (for example, randomised controlled trial, case–control)
population (source, eligible and selected)
intervention, if applicable (content, intervener, duration, and method, mode and timing of delivery)
method of allocation to study group (if applicable)
outcomes (primary and secondary, and whether measures were objective, subjective or otherwise validated)
key findings (including effect sizes, confidence intervals and their significance, for all relevant outcomes).
If given, exact p values (whether or not significant) and confidence intervals must be reported, as should the test from which they were obtained. If p values are not given, any descriptive statistics indicating the direction of the difference between intervention and comparator should be presented. If no further statistical information is available, then this should be clearly stated.
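Where a paper reports raw event counts but no interval, the reviewer can reconstruct an approximate confidence interval for presentation. The normal-approximation sketch below is illustrative only and should never replace the statistics reported by the study authors:

```python
import math

def diff_in_proportions_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Approximate 95% confidence interval (normal approximation)
    for the difference between two proportions, for example
    intervention versus comparator groups; z=1.96 gives a
    two-sided 95% interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)
```

An interval that spans zero conveys the direction of the difference without statistical significance, which is exactly the situation the descriptive reporting described above covers.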
The quality ratings of the study's internal and external validity should also be given (see External validity in section 6.2.2). If study details are not reported (or not applicable), this should be clearly stated.
Concise details should be given on:
bibliography (authors, date)
location (for example, UK)
funding details (if known)
population or participants
study design
theoretical perspective
key aims, objectives and research questions
methods (including analytic and data collection technique)
key themes or findings (including quotes from participants that illustrate these themes or findings, if appropriate)
gaps and limitations
conclusions
the study's quality rating.
The narrative summary provides an opportunity to place a study and its findings in context. It should highlight key factors influencing the results observed, and give an interpretation of the results, expanding on the detail presented in the evidence tables (see section 6.3.1).
The narrative summary should conclude with a short discussion, followed by 1 or more evidence statements. These should reflect the key findings, the quantity, quality and consistency of the evidence, and its applicability to the review question (including its applicability to the target population).
Narrative summaries of all studies and interventions should be incorporated in the main findings of the evidence review. They should be organised by review question and could be divided into smaller subcategories, such as outcome measure, setting or subpopulation. The summary should be brief and, where possible, use tables or other methods to summarise and present key elements or features of the evidence.
If appropriate, short summary tables can be included with the main findings (usually preceding an evidence statement) or in the appendices. For example, these might:
summarise the information gleaned for different review questions
summarise the study types, populations, interventions, settings or outcomes for each study related to a particular research question
organise and summarise studies related to different outcomes.
There are various ways to summarise and illustrate the strength and direction of quantitative evidence about the effectiveness of an intervention. Some of the most commonly used methods are described below, although this is not an exhaustive list.
Results from relevant studies (whether statistically significant or not) can be presented graphically.
Forest plots should be used to show effect estimates and confidence intervals for each study (when available, or when it is possible to calculate them). They could be used even when it is not appropriate to do a meta-analysis and present a pooled estimate (see Conducting and presenting a meta-analysis in section 6.3.4). However, the homogeneity of the outcomes and measures in the studies needs to be carefully considered: the forest plot needs data derived from the same (or justifiably similar) outcomes and measures.
If a forest plot is not appropriate, other graphical forms may be used (for example, a harvest plot [Ogilvie et al. 2008]).
When outcome measures vary between studies, it may be appropriate to present separate summary graphs for each outcome. However, if outcomes can be transformed on to a common scale by making further assumptions, an integrated (graphical) summary would be helpful. In such cases, the basis (and assumptions) used should be clearly stated and the results obtained in this way should be clearly marked.
Meta-analysis data may be used to produce a graph if the data (usually from randomised controlled trials) are sufficiently homogeneous and if there are enough relevant and valid data from comparable (or the same) outcome measures. If such data are not available, the synthesis may have to be restricted to a narrative overview of individual studies looking at the same question. In such cases, a forest plot (see Graphical presentation in section 6.3.4) is a useful way of illustrating the results.
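A minimal sketch of one common pooling approach, inverse-variance fixed-effect meta-analysis, is shown below. It assumes effect estimates already on a common scale (for example, log odds ratios), and omits the random-effects models and heterogeneity checks a real synthesis would also need:

```python
import math

def fixed_effect_pool(estimates, std_errors, z=1.96):
    """Inverse-variance fixed-effect pooling: each study is weighted
    by 1/SE^2, the pooled estimate is the weighted mean, and a 95%
    confidence interval is built from the pooled standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled)
```

The per-study estimates, intervals and weights computed this way are also the inputs a forest plot displays.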
A full description of data synthesis, including meta-analysis and extraction methods, is available in Systematic reviews: CRD's guidance for undertaking reviews in health care (Centre for Reviews and Dissemination 2009).
The nature of qualitative evidence is such that it is unhelpful to set a prescriptive method for its synthesis and description. Qualitative evidence occurs in many forms and formats.
In some cases, the evidence may be synthesised and then summarised. In other cases, a narrative description may be adequate. The approach used depends on the volume and consistency of the evidence. If the qualitative literature is extensive, then a synthetic approach is preferable. If the evidence is more disparate and sparse, a descriptive approach may be more appropriate.
An evidence statement is a brief summary of 1 finding from a review of evidence that social care guidance is based on.
This section applies to both qualitative and quantitative reviews. As described in section 6.3.2, each evidence review should include a narrative summary and should conclude with a short discussion and 1 or more supporting evidence statements.
The evidence statements should reflect the strength (quality, quantity and consistency) of the evidence and make a statement about its applicability. They may also highlight a lack of evidence. They should provide an aggregated summary of the evidence (from 1 or more studies) in relation to a key question or issue. In the case of intervention studies, they should also reflect what is plausible, given the evidence available about what has worked in similar circumstances.
Evidence statements are structured and written to help the GDG formulate and prioritise recommendations. They help it decide:
whether or not there is sufficient evidence (in terms of strength and applicability) to form a judgement
whether, on balance, the evidence shows that an intervention or programme can be effective or is inconclusive (where relevant)
the typical size of effect (where relevant)
whether the evidence is applicable to the target groups and contexts covered by the guidance.
Evidence statements that support the recommendations should be included in the final guidance document.
One or more evidence statements are prepared for each review question or its subsidiary questions. (Subsidiary questions may cover a type of intervention, specific population groups, a setting or an outcome.)
Once all the data have been collected, consideration should be given on how to group the evidence. For example, it could be grouped according to the similarity of the populations, interventions and outcomes covered in the studies.
However, the decision will be highly context-specific and will depend on the amount, breadth and depth of evidence. A separate evidence statement for each study should be avoided, as should statements based on so many studies that they become too generic to be meaningful.
Short evidence statements should be presented, by outcome where possible, summarising the key features of the evidence on effectiveness (including harms as appropriate) and cost effectiveness.
The evidence statements should include the number of studies and participants, the quality of the evidence and the direction of estimate of the effect. An evidence statement may be needed even if no evidence is identified for an important outcome. Evidence statements may also note the presence of relevant ongoing research.
There is moderate evidence of mixed quality from 4 retrospective US cohort studies (1 [++], 1 [+], 2 [–]) to suggest that looked-after children and young people who received transition support services (TSSs) were more likely to complete compulsory education with formal qualifications than those who had not received these TSSs; whereas 1 prospective US cohort study (+) reported a non-significant finding in favour of the comparison group.
There is moderate evidence of a mixed effect with regard to the effect of TSSs on employment at case closing. Two US cohort studies, 1 prospective (+) and 1 retrospective (–), reported that those who had received TSSs were more likely to be employed at case closing than those who had not received TSSs, whereas 1 retrospective US cohort study (–) reported that those who had received TSSs were less likely to be employed at case closing than those who had not.
Both of the above examples are taken from Looked-after children and young people (NICE public health guidance 28).
Any relevant published guidance (from NICE and other agencies) should always be identified and considered, as well as relevant NICE guidance in development.
Recommendations taken from published NICE guidance should be quoted verbatim. Published NICE guidance should be fully referenced and the evidence underpinning the recommendations left unchanged, provided it is not out of date.
Relevant published guidance from other organisations may be identified in the search for evidence. If these are not from NICE-accredited sources, they should be assessed for quality using the AGREE II (appraisal of guidelines research and evaluation II) instrument (Brouwers et al. 2010). The aim is to ensure they have sufficient documentation to be considered.
There is no cut-off point for accepting or rejecting a piece of guidance, and each GDG will need to set its own parameters. These should be documented in the methods section of the guidance, along with a summary of the assessment. The results should be presented as an appendix to the guidance.
Reviews of evidence from other guidance that cover questions formulated by the GDG may be considered as evidence if:
they are assessed using the appropriate methodology checklist from this manual and are judged to be of high quality
they are accompanied by an evidence statement and evidence tables
the evidence is updated according to the process for exceptional updates of NICE social care guidance (see section 14.1.2).
The GDG should create its own evidence summaries or statements to include in the standard template. Evidence tables from other guidance should be referenced with a direct link to the source website or a full reference of the published document. The GDG should formulate its own recommendations, taking into consideration the whole body of evidence.
Recommendations from other guidance should not be quoted verbatim, except for recommendations from Department of Health or Department for Education social care policy or legislation.
In the discussion section of the evidence reviews, the following should be considered.
All relevant inequalities data should be included in the reviews. At the data extraction stage, reviewers are prompted to refer to the PROGRESS-Plus criteria (age, sex, sexual orientation, disability, family origin, religion, place of residence, occupation, education, socioeconomic position and social capital; Oliver et al. 2008). Review inclusion and exclusion criteria should also take the relevant groups into account.
Equalities evidence should be considered during the drafting of reviews. It should be included in the data extraction process and should appear in the summary evidence statements.
Brouwers M, Kho ME, Browman GP et al. for the AGREE Next Steps Consortium (2010) AGREE II: advancing guideline development, reporting and evaluation in healthcare. Canadian Medical Association Journal 182: E839–42
Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. University of York: Centre for Reviews and Dissemination
Dixon-Woods M, Agarwal S, Young B et al. (2004) Integrative approaches to qualitative and quantitative evidence. London: Health Development Agency
Drummond MF, O'Brien B, Stoddart GL et al. (1997) Critical assessment of economic evaluation. In: Methods for the economic evaluation of health care programmes, 2nd edition. Oxford: Oxford Medical Publications
Eccles M, Mason J (2001) How to develop cost-conscious guidelines. Health Technology Assessment 5: 1–69
Edwards P, Clarke M, DiGuiseppi C et al. (2002) Identification of randomized trials in systematic reviews: accuracy and reliability of screening records. Statistics in Medicine 21: 1635–40
Evers SMAA, Goossens M, de Vet H et al. (2005) Criteria list for assessment of methodological quality of economic evaluations: Consensus on Health Economic Criteria. International Journal of Technology Assessment in Health Care 21: 240–5
Higgins JPT, Green S, editors (2011) Cochrane handbook for systematic reviews of interventions. Version 5.1.0 [updated March 2011]
Ogilvie D, Fayter D, Petticrew M et al. (2008) The harvest plot: a method for synthesising evidence about the differential effects of interventions. BMC Medical Research Methodology 8: 8
Oxman AD, Guyatt GH (1992) A consumer's guide to subgroup analyses. Annals of Internal Medicine 116: 78–84
Scottish Intercollegiate Guidelines Network (2008) SIGN 50. A guideline developer's handbook. Revised edition January 2008. Edinburgh: Scottish Intercollegiate Guidelines Network
Sutton AJ, Jones DR, Abrams KR et al. (2000) Methods for meta-analysis in medical research. London: John Wiley