5 Reviewing the scientific evidence

5.1 Introduction

This chapter describes how NICE's public health 'evidence reviews' are produced. These are normally conducted by external contractors, but occasionally they may be done by an internal NICE review team. The same methods and quality assurance processes apply to external contractors and NICE review teams.

As outlined in chapter 3, different types of evidence can be used to answer different research questions. For example:

  • experimental studies (such as controlled trials) and observational studies (such as before-and-after studies) can be used to consider the effectiveness of interventions

  • correlation studies look at the relationship between exposure to particular factors and an outcome of interest

  • qualitative research studies (such as interviews or focus groups) can be used to examine the views of the target populations.

Reviewing is an explicit, systematic and transparent process that can be applied to both quantitative (experimental, observational and correlational) and qualitative evidence (see chapter 3). The review team should follow the procedures outlined below to the best of its ability. However, at certain points, evidence reviews may well differ and flexibility may be needed (for example, to determine the strength of the evidence or the way evidence statements should be organised).

Further, as the presentation and interpretation of different types of evidence will always involve some degree of expert judgement, the review process relies on the expertise of the review team and the CPHE project team. (For the appraisal and presentation of economic evidence, see chapter 6.)

Standard systematic review methodologies (for example, those used by Cochrane reviewers) prescribe exhaustive and thorough processes, normally relating to efficacy and effectiveness and often taking years to complete. This approach focuses on the precision and reliability of measurements used in the original science and tends to emphasise the limits of the evidence. However, while it is important to be aware of these limits, the process of interpretation is equally important.

NICE public health evidence reviews need to summarise and interpret evidence, in spite of its limitations, so that the Public Health Advisory Committees (PHACs) can make recommendations in areas of uncertainty. It would rarely be necessary to undertake a full systematic review (and, in any case, time and resource constraints would make this difficult). The key point is that the evidence reviewed has to be good enough for the advisory committees to be able to make decisions about their recommendations.

Evidence reviews are made available during the guidance consultation, and also published on the NICE website alongside the final guidance. They may also be made available through other NICE information resources and guidance products, such as NICE pathways and the Effective Interventions Library. It is important that each evidence review is written and presented in a way that allows it to be read and understood in isolation from any other reviews developed for a guidance topic. It is also important that information about interventions is clearly and accurately presented in both narrative and table sections of the review, and in sufficient detail, to ensure clear and transparent links between recommendations and evidence.

All NICE public health evidence reviews (except mapping reviews) involve the following steps:

1. Select the relevant evidence.

2. Assess its quality.

3. Extract, synthesise and present it.

4. Derive evidence statements.

5. Assess its applicability.

(NICE public health mapping reviews may use purposive sampling to select data and do not necessarily assess its quality.)

5.2 Selecting relevant evidence

This section applies to both qualitative and quantitative evidence reviews.

Identifying and selecting all relevant studies is a critical stage in the evidence review process (see chapter 4 for identifying evidence). Before undertaking screening, the review team should discuss and work through examples of studies meeting the inclusion criteria (as set out in the agreed review protocol) to ensure a high degree of inter-rater reliability. Then studies meeting the inclusion criteria should be selected using the 2-stage screening approach below:

  1. Title or abstract screening: titles or abstracts should normally be screened independently by 2 reviewers (that is, they should be double-screened) using the parameters set out in the review protocol. Where reviewers disagree about a study's relevance, this should be resolved by discussion or by recourse to a third reviewer. If, after discussion, there is still doubt about whether or not the study meets the inclusion criteria, it should be retained.

  2. Full-paper screening: once title or abstract screening is complete, the review team should assess full-paper copies of the selected studies, using a full-paper screening tool developed for this purpose. This should normally be done independently by 2 people (that is, the studies should be double-screened). Any differences should be resolved by discussion between the 2 reviewers or by recourse to a third reviewer.

The study selection process should be clearly documented and include details of the inclusion criteria. A flow chart should be used to summarise the number of papers included and excluded at each stage of the process and this should be presented in the report. Each study excluded at the full-paper screening stage should be listed in the appendix of the review, along with the reason for its exclusion.
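As a rough illustration of the title or abstract screening rule described above, the sketch below reconciles 2 reviewers' decisions and tallies the counts that would feed a flow chart. The function names, the 'include'/'exclude' labels and the record layout are assumptions made for this example and are not part of the manual.

```python
from collections import Counter


def reconcile_title_abstract(decision_a, decision_b, third_reviewer=None):
    """Return 'include' or 'exclude' for one record at title/abstract stage.

    Hypothetical helper: decisions are the strings 'include' or 'exclude'.
    """
    if decision_a == decision_b:
        return decision_a          # both reviewers agree
    if third_reviewer is not None:
        return third_reviewer      # disagreement settled by a third reviewer
    return "include"               # doubt remains, so the study is retained


def screening_flow(records):
    """Tally retained and excluded records for the flow chart."""
    outcomes = Counter(
        reconcile_title_abstract(r["a"], r["b"], r.get("third")) for r in records
    )
    return {
        "screened": len(records),
        "retained": outcomes["include"],
        "excluded": outcomes["exclude"],
    }


example_records = [
    {"a": "include", "b": "include"},
    {"a": "include", "b": "exclude"},                      # unresolved: retained
    {"a": "exclude", "b": "include", "third": "exclude"},  # third reviewer decides
    {"a": "exclude", "b": "exclude"},
]
flow = screening_flow(example_records)
```

The same tally could be repeated at the full-paper stage, with the reason for each exclusion recorded alongside the decision.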

5.3 Assessing the quality of the evidence

5.3.1 Introduction

This section applies to the assessment of both qualitative and quantitative evidence (for the appraisal and presentation of economic evidence, see chapter 6).

The review team should assess the quality of evidence selected for inclusion in the review using the appropriate quality appraisal checklist (see section 5.3.2). This is a key stage in the guidance development process, since the quality rating of studies will be reflected in the evidence statements (see section 5.5). These, in turn, inform the recommendations (along with other factors and considerations, see section 7.2).

Some of the more commonly used study types and their abbreviations are listed below:

Quantitative studies: experimental
  • Before-and-after study.

  • Non-randomised controlled trial (NRCT).

  • Randomised controlled trial (RCT).

Quantitative studies: observational
  • Before-and-after study.

  • Case–control study.

  • Cohort study.

  • Correlation study.

  • Cross-sectional study.

  • Interrupted time series.

Qualitative studies
  • Document analysis.

  • Focus groups.

  • Interview study.

  • Observation and participant observation.

Economic studies
  • Cost–benefit analysis.

  • Cost–consequence analysis.

  • Cost-effectiveness analysis.

  • Cost–utility analysis.

The internal and external validity of quantitative studies (see section 5.3.2) should be assessed using either the quality appraisal checklist for intervention studies (see appendix F) or that for correlation studies (see appendix G). Appendix E includes an algorithm for identifying the quantitative study type and this terminology should replace any provided by the study author.

Qualitative studies should be assessed using the checklist in appendix H. This is important to maintain an audit trail. However, it is acknowledged that the concept of validity in qualitative research is less clearly defined than for quantitative research. As a result, the review team may wish to take account of other factors when judging the 'trustworthiness' of the study, its relevance to the research questions and how 'convincing' the results are. These factors should be clearly described in the review.

Some studies, particularly those using mixed methods, may report quantitative, qualitative and economic outcomes. In such cases, each aspect of the study should be separately assessed using the appropriate checklist. Similarly, a study may assess the effectiveness of an intervention using different outcome measures, some of which will be more reliable than others (for example, self-reported smoking versus a measure of plasma cotinine levels). In such cases, the study might be rated differently for each outcome, depending on the reliability of the measures used. For further information on how to integrate evidence from qualitative and quantitative studies, see Dixon-Woods et al. (2004).

5.3.2 Quality assessment

Quality assessment is a critical stage of the evidence review process. Before undertaking the assessment, the review team should discuss and work through some of the studies to ensure there is a high degree of inter-rater reliability.

Each full paper should be assessed by 1 reviewer and checked for accuracy by another. Periodically throughout the process, a random selection should be considered independently by 2 people (that is, double-assessed). The size of the sample will vary from review to review, but a minimum of 10% of the studies should be double-assessed. Any differences in quality grading should be resolved by discussion or recourse to a third reviewer.
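The double-assessment step and the agreement score reported for it can be sketched as follows. This is a minimal illustration only: it draws a random sample of at least 10% of studies and computes Cohen's kappa for 2 reviewers' quality ratings. The function names, the fixed random seed and the example ratings are assumptions for this sketch, and Cohen's kappa is used here as one common form of the kappa statistic.

```python
import math
import random
from collections import Counter


def double_assessment_sample(study_ids, fraction=0.10, seed=0):
    """Pick a random sample (at least `fraction` of studies) to double-assess."""
    ids = list(study_ids)
    n = max(1, math.ceil(fraction * len(ids)))
    return sorted(random.Random(seed).sample(ids, n))


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two reviewers rating the same studies."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of studies on which the two reviewers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each reviewer's rating frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical category
    return (observed - expected) / (1 - expected)


sample = double_assessment_sample(range(45))   # 5 of 45 studies
reviewer_1 = ["++", "+", "-", "++"]
reviewer_2 = ["++", "+", "-", "+"]
kappa = cohen_kappa(reviewer_1, reviewer_2)    # 7/11, about 0.64
```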

The composite inter-rater reliability scores should be reported, preferably as a kappa statistic, noting whether it is good (between 0.60 and 0.74) or excellent (0.75 or above) (see Cochrane Collaboration 2011). If the inter-rater reliability score is below 0.60, the reasons for the disagreement should be explored and a course of action agreed.

Internal validity

The review team should use the relevant quality appraisal checklist to assess a study's internal validity: that is, to check if potential sources of bias have been minimised and to determine if its conclusions are open to any degree of doubt. Each study should be rated ('++', '+' or '−') to indicate its quality:

Quality rating

++ All or most of the checklist criteria have been fulfilled; where they have not been fulfilled, the conclusions are very unlikely to alter.

+ Some of the checklist criteria have been fulfilled; where they have not been fulfilled, or not adequately described, the conclusions are unlikely to alter.

− Few or no checklist criteria have been fulfilled and the conclusions are likely or very likely to alter.

If a study is not assigned a '++' quality rating, the review team should record the key reasons why this is the case in the quality appraisal checklist comments column, alongside the overall quality rating. They should also record these reasons in the evidence table 'notes' column under 'Limitations identified by review team' (see appendix K) and highlight them in the narrative summary.

External validity

The review team should also use the quality appraisal checklist to assess the external validity of quantitative studies: the extent to which the findings for the study participants are generalisable to the whole 'source population' (that is, the population they were chosen from). This involves assessing the extent to which study participants are representative of the source population. It may also involve an assessment of the extent to which, if the study were replicated in a different setting but with similar population parameters, the results would be the same or similar. If the study includes an 'intervention', then it will also be assessed to see if it would be feasible in settings other than the one initially investigated.

Studies should be given a separate rating for external validity (++, + or −) prefixed with 'EV' (external validity).

Most qualitative studies, by their very nature, will not be generalisable. However, where there is reason to suppose the results would have broader applicability, they should be assessed for external validity.

External validity is different to 'applicability' (see section 5.6).

Unpublished data, studies in progress and grey literature

Reviewers are not expected to search the grey literature or unpublished data as a matter of routine. However, if time and resources allow (or if the grey literature is particularly relevant), the review team may obtain such papers, particularly from stakeholders and experts in the topic area. Any unpublished data that the authors intend to publish as peer-reviewed literature should be quality-assessed in the same way as published studies. Ideally, if additional information is needed to complete the quality appraisal checklist, the authors should be contacted.

Grey literature may be assessed similarly, although this is not always appropriate. Where the grey literature has important insights to convey, these should be reported in a manner to be agreed with the CPHE project team.

5.4 Extracting, synthesising and presenting the evidence

This section describes how to present data and develop related evidence statements for both qualitative and quantitative evidence reviews. Further detail for reviewers on the type of study information, results and data required in evidence tables (including data and information needed for the effective interventions library) is given in appendix K.

Any expert or value judgements that have been made (including expert advice from third parties) should be reported in the review.

Both qualitative and quantitative evidence reviews should incorporate narrative summaries of, and evidence tables for, all studies (see appendix K). Concise detail should be given (where appropriate) on:

  • populations

  • interventions

  • settings

  • outcomes (measures and effects).

This includes identifying any similarities and differences between studies, for example, in terms of the study population, interventions and outcome measures.

The summaries and evidence tables should be produced using the quality appraisal checklists for each study (see section 5.3.2) and original papers or reports (see appendix K).

Review authors should refer to 'Writing for NICE' and the 'NICE style guide' (noting that the Harvard referencing system is preferred). Both are available from the CPHE project team.

5.4.1 Data extraction and evidence tables

The format, reporting content and level of detail required for the evidence tables should be formally agreed with the NICE CPHE team for each topic early in the review process. Evidence tables should contain precise summary information about each study in the review. The tables are useful resources in guidance development, as they provide a quick and accessible digest of key features, results and findings related to the included studies. The information contained in the tables may also be used at a later stage to populate other guidance products, such as the Effective Interventions Library. It is important that information recorded in the tables is as precise, clear and complete as possible, because it provides an important link between recommendations and evidence. Where sufficient data are available, it may be necessary for the review team to calculate additional summary information or data (for example, odds ratios or numbers needed to treat). Where this is needed, CPHE will discuss it with the review team early in the review process.
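To illustrate the kind of additional summary data mentioned above, the following sketch derives an odds ratio and a number needed to treat from the event counts in 2 study arms. The function name, argument names and example counts are hypothetical; the formulas are the standard ones for a 2x2 table, and the example assumes the event is an adverse outcome that the intervention aims to reduce.

```python
import math


def two_by_two_summary(events_int, n_int, events_ctl, n_ctl):
    """Odds ratio and number needed to treat from two-arm event counts."""
    non_events_int = n_int - events_int
    non_events_ctl = n_ctl - events_ctl
    if non_events_int * events_ctl == 0:
        raise ValueError("odds ratio undefined when a relevant cell is zero")
    odds_ratio = (events_int * non_events_ctl) / (non_events_int * events_ctl)
    # Absolute risk reduction: control risk minus intervention risk.
    arr = events_ctl / n_ctl - events_int / n_int
    # NNT is the reciprocal of the absolute risk reduction; a negative value
    # means the intervention arm had the higher risk.
    nnt = 1 / arr if arr != 0 else math.inf
    return {"odds_ratio": odds_ratio, "arr": arr, "nnt": nnt}


# Hypothetical counts: 15/100 events with the intervention, 30/100 without.
summary = two_by_two_summary(15, 100, 30, 100)
nnt_reported = math.ceil(summary["nnt"])  # conventionally rounded up: 7
```

Any figures calculated in this way by the review team, rather than taken from the paper, would need to be identified as such in the evidence table.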

The evidence tables (see appendix K) can also be used as data extraction templates for quantitative (intervention and correlation studies), qualitative and economic studies and review-level material.

Each evidence review should include 1 main evidence table containing summaries of all the studies used in alphabetical order (by first author). If a review includes different types of studies (that is, quantitative, qualitative and economic), these may also be listed in separate evidence tables in the review. The review team should discuss any substantial changes to the structure of these tables with the CPHE project team. The CPHE team checks that the summary information contained in the tables is consistent with the requirements set out in this manual.

Evidence tables can help determine whether it is possible for the review team to calculate a summary estimate of effect, if applicable (see section 5.4.4 and appendix K).

Evidence tables for quantitative studies

This section provides some detail on the reporting content of evidence tables, but it is not exhaustive. As described throughout this manual, the type and quality of available evidence to inform guidance development will vary across topics.

The quantitative evidence table templates in appendix K show examples of the type of data and study information that should be included for quantitative studies (both experimental and observational).

Concise details (sometimes in bullet point or another list form) should be given on:

  • bibliography (authors, date)

  • study aim and type (for example, RCT, case–control)

  • population (source, eligible and selected)

  • intervention, if applicable (content, intervener, duration, method, mode or timing of delivery)

  • method of allocation to study group (if applicable)

  • outcomes (primary and secondary and whether measures were objective, subjective or otherwise validated)

  • key findings (including effect sizes, confidence intervals and their significance, for all relevant outcomes)

  • inadequately reported or missing data.

Where given, exact p values (whether or not significant) and confidence intervals must be reported, as should the test from which they were obtained. Where p values are inadequately reported or not given, this should be stated. Any descriptive statistics (including any mean values) indicating the direction of the difference between intervention and comparator should be presented. If no further statistical information is available, this should be clearly stated.

The quality ratings of the study's internal and external validity should also be given (see section 5.3.2). Where study details are inadequately reported, absent or not applicable, this should be clearly stated.

Evidence tables for qualitative studies

The qualitative evidence table template shows the type of data that should be included for qualitative studies (see appendix K). Concise details should be given on:

  • bibliography (authors, date)

  • location (for example, UK)

  • funding details (if known)

  • population or participants

  • study design

  • theoretical perspective

  • key aims, objectives and research questions

  • methods (including analytic and data collection technique)

  • key themes/findings (including quotes from participants that illustrate these themes/findings, if appropriate)

  • gaps and limitations

  • conclusions

  • the study's quality rating.

5.4.2 Narrative summaries of quantitative or qualitative studies

The narrative summary provides an opportunity to place a study and its findings in context. It should highlight key factors influencing the results observed, an interpretation of the results and more on the detail presented in the evidence tables (see section 5.4.1). Each narrative summary should include:

  • A brief description of the study design, methodology, population, setting and research questions or outcomes (if appropriate) for all relevant studies.

  • A summary of the key findings.

  • A summary of the quality ratings (expanding, where appropriate, on study strengths and weaknesses), applicability issues and any other relevant contextual points.

  • Commentary on the scale and nature of the evidence base may also be useful.

The narrative summary should conclude with a short discussion, followed by 1 or more evidence statements (see section 5.5). These should reflect the key findings, the quantity, quality and consistency of the evidence, and its applicability to the research question (including its applicability to the target population).

Narrative summaries of all studies and interventions should be incorporated in the main findings of the evidence review. They should be organised by research question and could be divided into smaller subcategories, such as outcome measure, setting or subpopulation.

5.4.3 Summary tables

If appropriate, short summary tables can be included with the main findings (usually preceding an evidence statement) or in the appendices. For example, these might:

  • summarise the information gleaned for different research questions

  • summarise the study types, populations, interventions, settings or outcomes for each study related to a particular research question

  • organise and summarise studies related to different outcomes.

5.4.4 Other presentations of quantitative data

There is a range of ways to summarise and illustrate the strength and direction of quantitative evidence about the effectiveness of an intervention. Some of the most commonly used methods are described below, although this is not an exhaustive list: the review team should discuss the form any data presentations might take with the CPHE project team.

Graphical presentation

Results from relevant studies (whether statistically significant or not) can be presented graphically.

Forest plots should be used to show effect estimates and confidence intervals for each study (when available, or when it is possible to calculate them). If possible, they should be used even when it is not appropriate to do a meta-analysis and present a pooled estimate (see 'Conducting and presenting a meta-analysis' in this section). However, the homogeneity of the outcomes and measures in the studies needs to be carefully considered: the forest plot needs data derived from the same (or justifiably similar) outcomes and measures.
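Where a study reports event counts but not an effect estimate, the values plotted on a forest plot may need to be calculated. The sketch below shows one widely used approach for a relative risk, with a 95% confidence interval calculated on the log scale; the function name and the example counts are assumptions for illustration, and other interval methods exist.

```python
import math


def relative_risk_ci(events_int, n_int, events_ctl, n_ctl, z=1.96):
    """Relative risk with a confidence interval computed on the log scale.

    Requires at least one event in each arm; z=1.96 gives a 95% interval.
    """
    if events_int == 0 or events_ctl == 0:
        raise ValueError("log-scale interval needs events in both arms")
    rr = (events_int / n_int) / (events_ctl / n_ctl)
    # Standard error of log(RR) from the four counts.
    se_log = math.sqrt(1 / events_int - 1 / n_int + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical study: 15/100 events with the intervention, 30/100 without.
rr, lower, upper = relative_risk_ci(15, 100, 30, 100)
# rr is 0.5, with an interval of roughly 0.29 to 0.87
```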

If a forest plot is not appropriate, other graphical forms may be used (for example, a harvest plot [Ogilvie et al. 2008]).

When outcome measures vary between studies, it may be appropriate to present separate summary graphs for each outcome. However, if outcomes can be transformed on to a common scale by making further assumptions, an integrated (graphical) summary would be helpful. In such cases, the basis (and assumptions) used should be clearly stated and the results obtained in this way should be clearly indicated.

On any graph, the order of entries, symbols, line types and brief text may all be used to illustrate the study results. Sometimes, more than 1 graph may be needed to avoid undue complexity. If evidence from a meta-analysis is being presented, it is often appropriate to plot the pooled estimate and its confidence interval.

Figure 5.1 was drawn using the 'Stata' statistical package[4] to plot symbols at the relative risk estimates, and lines ('error bars') between the corresponding upper and lower 95% confidence intervals.

Figure 5.1 Graphical presentation of quantitative evidence of effectiveness

RCT results (that can be expressed as relative risks) are displayed in order of decreasing study quality (and, within that, by publication date). This helps identify any links between effect-estimate and study quality. Different symbols are used to distinguish long-term from very long-term outcomes.

In this example, the symbol size has been used to give visual emphasis to the larger studies. Other dimensions of interest (such as standardised versus self-reported measures, or the country where the study was set) could also be represented in this or a supplementary graph.

Conducting and presenting a meta-analysis[5]

Meta-analysis data may be used to produce a graph if the data (usually from RCTs) are sufficiently homogeneous and if there are enough relevant and valid data from comparable (or the same) outcome measures. Where such data are not available, the synthesis may have to be restricted to a narrative overview of individual studies looking at the same question. In such cases, a forest plot (see 'Graphical presentation' in this section) is one useful way of illustrating the results.

The characteristics and limitations of the data in the meta-analysis should be fully reported (for example, in relation to the population, intervention, setting, sample size and validity of the evidence).

Before pooling or combining the results of different studies, the degree of heterogeneity in the data should be assessed to determine how the results have been affected by the circumstances in which studies were carried out. The results of any homogeneity tests should be reported.

Statistical heterogeneity can be addressed using a random (as opposed to fixed) effects model. The impact of known research heterogeneity (for example, population characteristics or the intensity or frequency of an intervention) can be managed using methods such as subgroup analyses and meta-regression.
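The heterogeneity assessment and random effects pooling described above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: it uses the DerSimonian-Laird estimator (one common choice among several), takes per-study effects on a scale where a normal approximation is reasonable (for example, log relative risks) together with their variances, and reports Cochran's Q and the I-squared statistic. The function name and example inputs are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random effects pooling with Q and I-squared."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for 2+ studies")
    weights = [1 / v for v in variances]                 # fixed effect weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % of variation
    re_weights = [1 / (v + tau2) for v in variances]     # random effects weights
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1 / sum_re)
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}


# Two hypothetical studies with very different effects and equal precision.
result = random_effects_pool([0.0, 1.0], [0.01, 0.01])
```

When the studies agree closely, the between-study variance is estimated as zero and the result matches a fixed effect analysis; reporting Q and I-squared alongside the pooled estimate makes the degree of heterogeneity visible to readers.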

For methodological heterogeneity (for example, where different trials of varying quality are involved), sensitivity analyses should be carried out by varying the studies in the meta-analysis. Forest plots should include lines for studies that are believed to contain relevant data, even if details are missing from the published study. An estimate of the proportion of missing eligible data is needed for each analysis (as some studies will not include all relevant outcomes). Sensitivity analysis can be used to investigate the impact of missing data.

Publication bias (studies, particularly small studies, are more likely to be published if they include statistically significant or interesting results) should be critically assessed and reported in the interpretation of the meta-analysis results. It may be helpful to inspect funnel plots for asymmetry to identify any publication bias (see the Cochrane website; also Sutton et al. 2000).

Similarly, the possibility of selective reporting of outcomes (emphasising statistically significant results over others, for example) should be considered. In part, this can be done by examining which outcomes were described as primary and secondary in study reports or protocols.

A full description of data synthesis, including meta-analysis and extraction methods, is available in 'Undertaking systematic reviews of research on effectiveness' (NHS Centre for Reviews and Dissemination 2001).

5.4.5 Other presentations of qualitative data based on analytic and structured techniques

The nature of qualitative evidence is such that it is unhelpful to set a prescriptive method for its synthesis and description. Qualitative evidence occurs in many forms and formats. This section includes some of the methods that may be used to synthesise and present it. As with all data synthesis, the key is transparency. It is important that PHACs and stakeholders can easily follow the method used. It should be written up in clear English and any analytic decisions should be clearly justified.

In some cases, the evidence may be synthesised and then summarised. In other cases, a narrative description may be adequate. The approach used depends on the volume and consistency of the evidence. If the qualitative literature is extensive, then a synthetic approach is preferable. If the evidence is more disparate and sparse, a descriptive approach may be more appropriate.

Reporting sparse, disparate qualitative evidence

In many cases, qualitative reviews will comprise relatively few papers compared to quantitative reviews and often their focus will be inconsistent (for example, they may involve different settings, populations or interventions). If the papers have little in common it would not be appropriate to synthesise them. Instead, the authors of the review should provide a narrative description of the key themes (including illustrative quotes) of each paper. They should also provide a quality appraisal and brief general description of each study (for example, describing the methods used, the participants involved and the underlying rationale).

Both the narrative summary and the evidence table should identify all the main themes reported: only themes that are not relevant to the review at hand should be left out and these omissions should be clearly documented. As in all qualitative research, particular attention should be paid to 'outliers' (other themes) and views that disagree with or contradict the main body of research.

The narrative description should be divided up under headings derived from the research question (for example, the settings of interest) unless good reasons are documented for not doing so. It should be summarised into evidence statements that note areas of agreement and contradiction (see section 5.5).

Synthesising qualitative evidence

The simplest and most rigorous approach to presenting qualitative data in a meaningful way is to analyse the themes (or 'meta' themes) in the evidence tables and write a narrative based on them. This 'second level' thematic analysis can be carried out if ample data are found and the papers and research reports cover the same (or similar) factors. (These should be relevant to the research questions and could, for example, include intervention, age, group or setting.)

It can be carried out in 1 of 2 ways. More simply, papers reporting on the same phenomena can be grouped together to compare and contrast themes, focusing not just on consistency but also on any differences. The narrative should be based on these themes.

A more complex but useful approach is 'conceptual mapping' (see Johnson et al. 2000). This involves identifying the key themes and concepts across all the evidence tables and grouping them into first level (major), second level (associated) and third level (subthemes) themes. Results are presented in schematic form as a conceptual diagram and the narrative is based on the structure of the diagram.

Alternatively, themes can be identified and extracted from the data itself, using a grounded approach (see Glaser and Strauss 1967). Other potential techniques include meta-ethnography (see Noblit and Hare 1988) and meta-synthesis (see Barroso 2000).

Reporting 'bias' or variation

Any review or, in particular, any synthesis of qualitative data must, by its nature, mask some of the variations considered important by qualitative researchers (for example, the way the researcher interacts with research participants when gathering data). Reviewers should, as far as possible, highlight any significant causes of variation noted during data extraction.

5.5 Deriving evidence statements

5.5.1 Introduction

This section applies to both qualitative and quantitative reviews. As described in section 5.4.2, each evidence review should include a narrative summary and should conclude with a short discussion and 1 or more supporting evidence statements.

The evidence statements should provide an aggregated summary of all of the relevant studies, regardless of their findings, for a key question or issue. They should reflect the balance of the evidence, its strength (quality, quantity and consistency) and applicability. The evidence statements can also highlight where there is a lack of evidence (note that this is different to there being evidence for a lack of effect). In the case of intervention studies, they should reflect what is plausible, given the evidence available about what has worked in similar circumstances.

They are structured and written to help the PHACs formulate and prioritise recommendations. They help the committees decide:

  • whether or not there is sufficient evidence (in terms of strength and applicability) to form a judgement

  • where relevant, whether (on balance) the evidence demonstrates that an intervention, approach or programme can be effective or is inconclusive

  • where relevant, the typical size of effect (where there is one)

  • whether the evidence is applicable to the target groups and contexts being covered by the guidance.

Evidence statements that support the recommendations should be included in the final guidance.

5.5.2 Structure and content of evidence statements

One or more evidence statements are prepared for each review research question or its subsidiary questions. (Subsidiary questions may cover a type of intervention, specific population groups, a setting or an outcome.)

Once it has all the data, the review team should discuss with the CPHE project team how it intends to 'group' the evidence. For example, it could be grouped according to the similarity of the populations, interventions or outcomes covered in the studies. However, the decision will be highly context-specific and will depend on the amount, breadth and depth of evidence. The review team should avoid developing a separate evidence statement for each study while, at the same time, not grouping so many studies together that the evidence statements become too generic and therefore meaningless.

The evidence statements could comprise an overarching summary statement supported by various subsidiary statements. They should provide a clear, self-contained summary.

They should refer to the sources of evidence (study type and references) and their quality in brief descriptive terms and not just by acronyms. In addition, each statement should include summary information about the:

  • content of the intervention, if applicable (for example, what, how, where?)

  • population(s) and setting(s) (and country), if applicable

  • strength of evidence (reflecting the appropriateness of the study design to answer the question and the quality, quantity and consistency of evidence)

  • outcome(s), the direction of effect (or correlation) and the size of effect (or correlation) (if applicable)

  • applicability to the question, target population and setting (see section 5.6).

Note that the strength of the evidence is reported separately to the direction and size of the effects or correlations observed (if applicable).
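The summary items listed above can be thought of as fields in a simple record. The following is a minimal sketch only, not a NICE template: the class name `EvidenceStatement`, its field names and the `missing_items` helper are all hypothetical. It keeps the strength of the evidence separate from the direction and size of effect, and lists the optional items that have not yet been filled in so the reviewer can confirm whether each is genuinely not applicable.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvidenceStatement:
    """Hypothetical record of the summary items in an evidence statement."""
    question: str                               # review question or subsidiary question
    sources: List[str]                          # study type, reference and quality rating
    strength: str                               # quality, quantity and consistency
    intervention: Optional[str] = None          # what, how, where (if applicable)
    populations_settings: Optional[str] = None  # including country (if applicable)
    outcomes: Optional[str] = None
    direction: Optional[str] = None             # reported separately from strength
    size: Optional[str] = None                  # reported separately from strength
    applicability: Optional[str] = None

    def missing_items(self) -> List[str]:
        """Return the names of optional items that are still empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Illustrative values only
statement = EvidenceStatement(
    question="Do educational interventions reduce hazardous drinking?",
    sources=["Jelley et al. 2009, RCT (++)", "Lake et al. 2008, RCT (++)"],
    strength="strong",
    direction="positive",
)
print(statement.missing_items())
```

An empty item is not necessarily an omission, because several items apply only 'if applicable'. The list is a prompt for the reviewer rather than a validation rule.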

5.5.3 Evidence statement terminology

Terms that describe the strength of the evidence should be used consistently in each review and their definitions should be reported in its methodology section. A set of standardised terms is given below. However, the evidence base for each review may vary, so the review team should define how these terms have been used.

Strength of evidence

The overall strength (quality, quantity and consistency) of the evidence may be summarised as:

  • No evidence [6] Be clear about the sources and inclusion criteria. For example, state: 'No evidence was found from English-language trials published since 1990…'.

  • Weak evidence For example, 'There was weak evidence from 1 (−) before-and-after study'.

  • Moderate evidence For example, 'There was moderate evidence from 2 (+) case–control studies'.

  • Strong evidence For example, 'There was strong evidence from 2 (++) and 1 (+) randomised controlled trials'.

  • Inconsistent evidence Further commentary may be needed on the variability of findings in different studies (for example, when the results of (++) or (+) quality studies do not agree). In such cases, the review team may qualify an evidence statement with an explanatory sentence or section that gives more detail.

The above terms should not be used to describe other aspects of the evidence, such as applicability or size of effect (see below for suitable terminology).

'Vote counting' (merely reporting on the number of studies yielding significant effects) is not an acceptable summary of the evidence.

Direction and size of effect

If appropriate, the direction of effect (impact) or correlation should be summarised using 1 of the following terms:

  • positive

  • negative

  • mixed

  • none.

However, appropriate context/topic-specific terms (for example, 'an increase in HIV incidence', 'a reduction in injecting drug use' and 'smoking cessation') may be used.

If appropriate, the size of effect (impact) or correlation and, wherever possible, the degree of uncertainty involved, should be reported using the scale applied in the relevant study. For example, an odds ratio (OR) or relative risk (RR) with confidence interval (CI), or a standardised effect size and its standard error, may be quoted. Where an estimate cannot be explained, every effort should be made to relate it to interpretable criteria or conventional public health measures. If it is not possible to provide figures for each study, or if there are too many studies to make this feasible, the size of effect or correlation can be summarised using the following standardised terms:

  • small

  • medium

  • large.

These terms should be used consistently in each review and their definitions should be reported in its methodology section.
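As an illustration of reporting a size of effect together with its uncertainty, the sketch below computes an odds ratio and an approximate 95% confidence interval from a 2×2 table using the standard log (Woolf) method. This is offered as an assumption about how such figures are commonly derived, not as a method prescribed by this manual. The function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with an approximate confidence interval (log method).

    a: exposed with outcome       b: exposed without outcome
    c: unexposed with outcome     d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, for illustration only
or_, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Quoting the interval alongside the point estimate, rather than a standardised term alone, lets the reader see the degree of uncertainty involved.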

5.5.4 Quantitative evidence statements

An example of an evidence statement about the effectiveness of an intervention is presented below. Note that the examples in this section have been adapted from the originals and are for illustrative purposes only. Note also the use of superscript numbers to refer to the author and date references, which are given immediately below the evidence statement:

There is strong evidence from 4 studies (2 UK1,2 and 2 US3,4) to suggest that educational interventions delivered by youth workers may reduce the incidence of hazardous drinking by young people. Two (++) RCTs1,2 and 1 (+) NRCT3 showed reduced risk (95% confidence interval) in the intervention group: 0.75 (0.58–0.94)1; 0.66 (0.57–0.78)2; 0.42 (0.18–0.84)3. Another (+) RCT4 showed reduced risk but was not statistically significant: 0.96 (0.84–1.09). However, 1 (−) NRCT5 found increased risk of binge drinking in the intervention group: 1.40 (1.21–1.74).

1 Jelley et al. 2009 (++)
2 Lake et al. 2008 (++)
3 Wagner et al. 2010 (+)
4 Blake et al. 2007 (+)
5 Jensen et al. 2006 (−)
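The example statement above describes 1 result as 'not statistically significant' because its confidence interval spans 1, the value indicating no difference for a ratio measure. A minimal sketch of that check follows, using the intervals quoted in the example. The function name `excludes_null` and the dictionary keys are hypothetical. The check only helps in reading individual results. It is not a substitute for summarising the balance of evidence, and counting significant results alone would amount to 'vote counting'.

```python
def excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True if the confidence interval does not contain the null value.

    For ratio measures (relative risk, odds ratio) the null value is 1.
    """
    return not (lower <= null_value <= upper)


# 95% confidence intervals quoted in the illustrative evidence statement
reported_intervals = {
    "study 1": (0.58, 0.94),
    "study 2": (0.57, 0.78),
    "study 3": (0.18, 0.84),
    "study 4": (0.84, 1.09),
    "study 5": (1.21, 1.74),
}

significant = {
    name: excludes_null(lower, upper)
    for name, (lower, upper) in reported_intervals.items()
}
print(significant)
```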

The following is an example of an evidence statement from a 'correlates review':

There is moderate evidence from 3 UK cross-sectional studies (2 [+]1,2 and 1 [−]3) about the correlation between young people's communication skills with safer sex and a reduction in the number of teenage pregnancies. The evidence about the strength of this correlation is mixed. One (+) study1 found that discussing condom use with new partners was associated with actual condom use at first sex (OR 2.67 [95% CI 1.55–4.57]). Another (−) study3 found that not talking to a partner about protection before first sexual intercourse was associated with teenage pregnancy (OR 1.67 [1.03–2.72]). However, another (+) study2 found small correlations between condom use, discussions about safer sex (r=0.072, p<0.01) and communication skills (r=0.204, p<0.01).

1 Jensen et al. 2007 (+)
2 Buston et al. 2007 (+)
3 DiLorio et al. 2000 (−)

5.5.5 Qualitative evidence statements

Evidence statements developed from qualitative data do not usually report on the impact an intervention has on behaviour or health outcomes, nor do they report statistical effects or aggregate measures of strength and effect size. They should summarise the evidence, its context and quality, and the consistency of key findings and themes across studies. Areas where there is little (or no) concurrence should also be summarised. For example:

Two UK studies (1 [+]1 and 1 [++]2) and 1 (+) Dutch study3 reported on the views of teenage mothers. In 1 (+) study1 of teenage mothers interviewed in a family planning clinic and 1 (++) study2 of teenage mothers who responded to a questionnaire at their GP surgery, the participants agreed that access to education was the thing that helped them most after they had their child. However, this was not reported as a key theme in the Dutch study3 of health visitor perceptions of teenage mothers' needs.

1 Smith 1999 (+)
2 Jones 2000 (++)
3 Van Dinkleholm 2004 (+)

5.6 Assessing applicability

This section applies to both quantitative and qualitative reviews. It describes how the review team should assess the applicability of the evidence. Health economic data is assessed differently (see chapter 6).

The PHAC needs to judge the extent to which the evidence reported in the reviews is applicable to the areas for which it is developing recommendations. The review team should assess each evidence statement to judge how similar the population(s), setting(s), intervention(s) and outcome(s) of the underpinning studies are to those outlined in the review question(s). The studies should be assessed as a whole – rather than assessing each one individually.

The following characteristics should be considered:

  • Population Age, sex/gender, race/ethnicity, disability, sexual orientation/gender identity, religion/beliefs, socioeconomic status, health status (for example, severity of illness/disease), other characteristics specific to the topic area/review question(s).

  • Setting Country, geographical context (for example, urban/rural), healthcare/delivery system, legislative, policy, cultural, socioeconomic and fiscal context, other characteristics specific to the topic area/review question(s).

  • Intervention Feasibility (for example, in terms of health services/costs/reach), practicalities (for example, experience/training required), acceptability (for example, number of visits/adherence required), accessibility (for example, transport/outreach required), other characteristics specific to the topic area/review question(s).

  • Outcomes Appropriate/relevant, follow-up periods, important health effects.

Following this assessment, the review team should categorise each evidence statement as:

  • directly applicable

  • partially applicable

  • not applicable.

A statement detailing the category it falls into and the reasons why should appear at the end of the evidence statement. It should state: 'This evidence is (directly, partially or not) applicable because ...'. An example[7] of an applicability statement is presented below:

This evidence is only partially applicable to people in the UK who inject drugs. That is because all these studies were conducted in countries where needles are mainly sold by pharmacies (USA, Russia and France), rather than freely distributed, as is the norm in the UK.
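The 3 categories and the standard wording can be sketched as a small helper that keeps applicability statements consistent across a review. This is an illustrative assumption about how a review team might record the categories, not part of the NICE process. The names `Applicability` and `applicability_statement` are hypothetical.

```python
from enum import Enum


class Applicability(Enum):
    """The 3 applicability categories for an evidence statement."""
    DIRECT = "directly"
    PARTIAL = "partially"
    NOT = "not"


def applicability_statement(category: Applicability, reason: str) -> str:
    """Build the closing sentence of an evidence statement.

    Follows the wording 'This evidence is (directly, partially or not)
    applicable because ...'.
    """
    return f"This evidence is {category.value} applicable because {reason}."


print(applicability_statement(
    Applicability.PARTIAL,
    "all the studies were conducted outside the UK",
))
```

Using a fixed set of categories prevents variants such as 'somewhat applicable' from entering the evidence statements, while the reason remains free text written by the reviewer.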

The review team should note that the PHAC needs to judge the extent to which the evidence reported in the reviews is applicable to the areas/topics for which it is developing recommendations and it may ask for additional information on the applicability of the evidence. The review team should also be aware that the PHAC will draw upon a wide range of information in reaching its final judgement.

Although similar issues are considered when assessing the applicability of health economic data, there are some important differences. Details can be found in section 6.2.2.

5.7 Published guidance

The review team should identify relevant published guidance (from NICE and other agencies) in its data search, as well as relevant NICE guidance in development.

5.7.1 NICE guidance

NICE guidance (public health or clinical) should be fully referenced and the evidence underpinning the recommendations left unchanged, provided it is not out of date. If there is new published evidence that would significantly alter the existing recommendations, the review team should bring this to the attention of the CPHE project team.

The CPHE project team, in turn, should pass it to the relevant team in NICE so that it can consider whether or not to update the guidance.

5.7.2 Other guidance

Other relevant published guidance should be assessed for quality using the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument (AGREE Collaboration 2003; Brouwers et al 2010) to ensure it is sufficiently documented. The PHAC should set the cut-off point for accepting or rejecting other guidance and this should be documented in the guidance appendices.

5.8 Equality and diversity

In the discussion section of the evidence reviews, the following questions should be considered.

5.8.1 Are the evidence-review criteria inclusive?

All relevant inequalities data should be included in the reviews. At the data extraction stage, reviewers are prompted to refer to the PROGRESS-Plus criteria (including age, sex, sexual orientation, disability, ethnicity, religion, place of residence, occupation, education, socioeconomic position and social capital) (Gough et al 2012). Review inclusion and exclusion criteria should also take the relevant groups into account.

5.8.2 Has the relevant data been appropriately extracted and presented in the evidence statements?

Equalities evidence should be considered during the drafting of reviews. It should be included in the data extraction process and should appear in the summary evidence statements.

5.8.3 What is the state of the evidence base?

This question aims to identify if there are any gaps in the evidence in relation to inequalities. It also aims to identify if the evidence has uncovered gaps in the scope of the guidance in relation to inequalities.

5.9 References and further reading

AGREE Collaboration (2003) Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Quality and Safety in Health Care 12 (1):18–23

Barroso J (2000) Meta-synthesis of qualitative research on living with HIV infection. Qualitative Health Research 10 (3): 340–53

Bowling A (2002) Research methods in health: Investigating health and health services. Buckingham: Open University Press

Brouwers MC, Kho ME, Browman GP et al. (2010) AGREE II: advancing guideline development, reporting and evaluation in health care. Journal of Clinical Epidemiology 63 (12): 1308–11

Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. Centre for Reviews and Dissemination, University of York

Dixon-Woods M, Agarwal S, Young B et al. (2004) Integrative approaches to qualitative and quantitative evidence. London: Health Development Agency

Drummond MF, O'Brien B, Stoddart GL et al. (1997) Critical assessment of economic evaluation. In: Methods for the economic evaluation of health care programmes. Oxford: Oxford Medical Publications

Eccles M, Mason J (2001) How to develop cost-conscious guidelines. Health Technology Assessment 5: 16

Edwards P, Clarke M, DiGuiseppi C et al. (2002) Identification of randomized trials in systematic reviews: accuracy and reliability of screening records. Statistics in Medicine 21: 1635–40

Egger M, Davey Smith G, Altman DG (2000) Systematic reviews in health care: meta-analysis in context. London: British Medical Journal Books

Glaser BG, Strauss AL (1967) The discovery of grounded theory: strategies for qualitative research. New York: Aldine de Gruyter

Gough D, Oliver S, Thomas J, editors (2012) An introduction to systematic reviews. London: Sage

GRADE Working Group (2004) Grading quality of evidence and strength of recommendations. British Medical Journal 328: 490–4

Harden A, Garcia J, Oliver S et al. (2004) Applying systematic review methods to studies of people's views: an example from public health research. Journal of Epidemiology and Community Health 58: 794–800

Higgins JPT, Green S, editors (2011) Cochrane handbook for systematic reviews of interventions, version 5.1.0 (updated March 2011). The Cochrane Collaboration

Jackson N, Waters E for the Guidelines for systematic reviews of health promotion and public health interventions taskforce (2005) Guidelines for systematic reviews of health promotion and public health interventions. Australia: Deakin University

Johnson JA, Biegel DE, Shafran R (2000) Concept mapping in mental health: uses and adaptations. Evaluation and Programme Planning 23 (1): 67–75

Kelly MP, Swann C, Morgan A et al. (2002) Methodological problems in constructing the evidence base in public health. London: Health Development Agency

Khan KS, Kunz R, Kleijnen J et al. (2003) Systematic reviews to support evidence-based medicine. How to review and apply findings of healthcare research. London: Royal Society of Medicine Press

National Collaborating Centre for Methods and Tools (2011) AMSTAR: assessing methodological quality of systematic reviews. Hamilton, ON: McMaster University

Noblit G, Hare RD (1988) Meta-ethnography: synthesising qualitative studies. London: Sage

Ogilvie D, Hamilton V, Egan M et al. (2005) Systematic reviews of health effects of social interventions: 1. Finding the evidence: how far should you go? Journal of Epidemiology and Community Health 59: 804–8

Ogilvie D, Egan M, Hamilton V et al. (2005) Systematic reviews of health effects of social interventions: 2. Best available evidence: how low should you go? Journal of Epidemiology and Community Health 59: 886–92

Ogilvie D, Fayter D, Petticrew M et al. (2008) The harvest plot: a method for synthesising evidence about the differential effects of interventions. BMC Medical Research Methodology 8: 8

Oxford Centre for Evidence-Based Medicine. Levels of evidence and grades of recommendation.

Petticrew M (2003) Why certain systematic reviews reach uncertain conclusions. British Medical Journal 326: 756–8

Petticrew M, Roberts H (2003) Evidence, hierarchies, and typologies: horses for courses. Journal of Epidemiology and Community Health 57: 527–9

Popay J, editor (2005) Moving beyond effectiveness in evidence synthesis: methodological issues in the synthesis of diverse sources of evidence. London: National Institute for Health and Clinical Excellence

Popay J, Rogers A, Williams G (1998) Rationale and standards for the systematic review of qualitative literature in health services research. Qualitative Health Research 8 (3): 341–51

Ring N, Jepson R, Ritchie K (2011) Methods of synthesizing qualitative research studies for health technology assessment. International Journal of Technology Assessment in Health Care 27: 384–90

Rychetnik L, Frommer M, Hawe P et al. (2002) Criteria for evaluating evidence on public health interventions. Journal of Epidemiology and Community Health 56: 119

Scottish Intercollegiate Guidelines Network (2002) SIGN 50. A guideline developer's handbook. Edinburgh: Scottish Intercollegiate Guidelines Network

Sutton AJ, Jones DR, Abrams KR et al. (2000) Methods for meta-analysis in medical research. London: John Wiley

Swann C, Falce C, Morgan A et al. (2005) HDA evidence base: process and quality standards for evidence briefings. London: Health Development Agency

Tooth L, Ware R, Bain C et al. (2005) Quality of reporting of observational longitudinal research. American Journal of Epidemiology 161 (3): 280–8

Tugwell P, Petticrew M, Kristjansson E et al. (2010) Assessing equity in systematic reviews: realising the recommendations of the Commission on the Social Determinants of Health. British Medical Journal 341: c4739

Turner RM, Spiegelhalter DJ, Smith GC et al. (2009) Bias modelling in evidence synthesis. Journal of the Royal Statistical Society, Series A (Statistics in Society) 172: 21–47

Victora C, Habicht J, Bryce J (2004) Evidence-based public health: moving beyond randomized trials. American Journal of Public Health 94 (3): 400–5

Weightman A, Ellis S, Cullum A et al. (2005) Grading evidence and recommendations for public health interventions: developing and piloting a framework. London: Health Development Agency

[4] Many other statistical and data presentation packages could be used to draw comparable summary graphs.

[5] Note that there is free Cochrane software called Review Manager, used for meta-analyses – see www.cc-ims.net/RevMan.

[6] Note that 'no evidence' is different to 'evidence of no effect'.

[7] Note these have been adapted from the original and are for illustrative purposes only.