NICE process and methods

Appendix G Quality appraisal checklist – quantitative studies reporting correlations and associations

A correlates review (see section 3.3.4) attempts to establish the factors that are associated or correlated with positive or negative health behaviours or outcomes. Evidence for correlates reviews will come both from specifically designed correlation studies and from other study designs that also report on correlations.

This checklist[15] has been developed for assessing the validity of studies reporting correlations. It is based on the appraisal step of the 'Graphical appraisal tool for epidemiological studies (GATE)', developed by Jackson et al. (2006).

This checklist enables a reviewer to appraise a study's internal and external validity after addressing the following key aspects of study design: characteristics of study participants; definition of independent variables; outcomes assessed; and methods of analysis.

Like GATE, this checklist is intended to be used in an electronic (Excel) format that will facilitate both the sharing and storage of data and, through linkage with other documents, the compilation of research reports. Much of the guidance to support the completion of the critical appraisal form that is reproduced below also appears in 'pop-up' windows in the electronic version[16].

There are 5 sections of the revised GATE. Section 1 seeks to assess the key population criteria for determining the study's external validity – that is, the extent to which the findings of a study are generalisable beyond the confines of the study to the study's source population.

Sections 2 to 4 assess the key criteria for determining the study's internal validity – that is, making sure that the study has been carried out carefully, and that the identified associations are valid and are not due to some other (often unidentified) factor.

Checklist items are worded so that 1 of 5 responses is possible:

++

Indicates that for that particular aspect of study design, the study has been designed or conducted in such a way as to minimise the risk of bias.

+

Indicates that either the answer to the checklist question is not clear from the way the study is reported, or that the study may not have addressed all potential sources of bias for that particular aspect of study design.

−

Should be reserved for those aspects of the study design in which significant sources of bias may persist.

Not reported (NR)

Should be reserved for those aspects in which the study under review fails to report how they have (or might have) been considered.

Not applicable (NA)

Should be reserved for those study design aspects that are not applicable given the study design under review (for example, allocation concealment would not be applicable for case–control studies).

In addition, the reviewer is requested to complete in detail the comments section of the quality appraisal form so that the grade awarded for each study aspect is as transparent as possible.

Each study is then awarded an overall study quality grading for internal validity (IV) and a separate one for external validity (EV):

  • ++ All or most of the checklist criteria have been fulfilled; where they have not been fulfilled, the conclusions are very unlikely to alter.

  • + Some of the checklist criteria have been fulfilled; where they have not been fulfilled, or not adequately described, the conclusions are unlikely to alter.

  • – Few or no checklist criteria have been fulfilled and the conclusions are likely or very likely to alter.
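As an illustration only, the responses could be recorded and tallied as in the Python sketch below. It assumes, following the description above, that Section 1 items inform the external validity grade and that Sections 2 to 4 inform the internal validity grade. The names `Response` and `tally_by_validity` are hypothetical and are not part of the electronic checklist. The sketch deliberately stops at a tally: the overall ++, + or – grade remains a reviewer judgement, because the checklist does not define numerical thresholds.

```python
from collections import Counter
from enum import Enum


class Response(Enum):
    """The five possible responses to a checklist item."""
    PLUS_PLUS = "++"
    PLUS = "+"
    MINUS = "-"
    NOT_REPORTED = "NR"
    NOT_APPLICABLE = "NA"


def tally_by_validity(responses):
    """Count responses separately for external (EV) and internal (IV) validity.

    `responses` maps an item number such as "2.4" to a Response.
    """
    tallies = {"EV": Counter(), "IV": Counter()}
    for item, response in responses.items():
        section = int(item.split(".")[0])
        if section == 1:
            tallies["EV"][response] += 1
        elif section in (2, 3, 4):
            tallies["IV"][response] += 1
        # Section 5 holds the overall grades themselves, so it is not tallied.
    return tallies


# Hypothetical responses for a single study (illustrative values only).
example = {
    "1.1": Response.PLUS_PLUS,
    "1.2": Response.PLUS,
    "2.4": Response.MINUS,
    "3.1": Response.PLUS,
    "4.1": Response.NOT_REPORTED,
}
tallies = tally_by_validity(example)
```

A reviewer could read such a tally alongside the comments fields when deciding on the overall grades for internal and external validity.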

Checklist

Study identification: Include full citation details

 

Study design:

  • Refer to the glossary of study designs (appendix D) and the algorithm for classifying experimental and observational study designs (appendix E) to best describe the paper's underpinning study design

Guidance topic:

Assessed by:

Section 1: Population

1.1 Is the source population or source area well described?

  • Were the country (e.g. developed or non-developed, type of health care system), setting (e.g. primary schools, community centres), location (urban, rural) and population demographics adequately described?

++

+

−

NR

NA

Comments:

1.2 Is the eligible population or area representative of the source population or area?

  • Was the recruitment of individuals, clusters or areas well defined (e.g. advertisement, birth register)?

  • Was the eligible population representative of the source? Were important groups underrepresented?

++

+

−

NR

NA

Comments:

1.3 Do the selected participants or areas represent the eligible population or area?

  • Was the method of selection of participants from the eligible population well described?

  • What % of selected individuals or clusters agreed to participate? Were there any sources of bias?

  • Were the inclusion or exclusion criteria explicit and appropriate?

++

+

−

NR

NA

Comments:

Section 2: Method of selection of exposure (or comparison) group

2.1 Selection of exposure (and comparison) group. How was selection bias minimised?

  • How was selection bias minimised?

++

+

−

NR

NA

Comments:

2.2 Was the selection of explanatory variables based on a sound theoretical basis?

  • How sound was the theoretical basis for selecting the explanatory variables?

++

+

−

NR

NA

Comments:

2.3 Was the contamination acceptably low?

  • Did any in the comparison group receive the exposure?

  • If so, was it sufficient to cause important bias?

++

+

−

NR

NA

Comments:

2.4 How well were likely confounding factors identified and controlled?

  • Were there likely to be other confounding factors not considered or appropriately adjusted for?

  • Was this sufficient to cause important bias?

++

+

−

NR

NA

Comments:

2.5 Is the setting applicable to the UK?

  • Did the setting differ significantly from the UK?

++

+

−

NR

NA

Comments:

Section 3: Outcomes

3.1 Were the outcome measures and procedures reliable?

  • Were outcome measures subjective or objective (e.g. biochemically validated nicotine levels ++ vs self-reported smoking −)?

  • How reliable were outcome measures (e.g. inter- or intra-rater reliability scores)?

  • Was there any indication that measures had been validated (e.g. validated against a gold standard measure or assessed for content validity)?

++

+

−

NR

NA

Comments:

3.2 Were the outcome measurements complete?

  • Were all or most of the study participants who met the defined study outcome definitions likely to have been identified?

++

+

−

NR

NA

Comments:

3.3 Were all the important outcomes assessed?

  • Were all the important benefits and harms assessed?

  • Was it possible to determine the overall balance of benefits and harms of the intervention versus comparison?

++

+

−

NR

NA

Comments:

3.4 Was there a similar follow-up time in exposure and comparison groups?

  • If groups are followed for different lengths of time, then more events are likely to occur in the group followed up for longer, distorting the comparison.

  • Analyses can be adjusted to allow for differences in length of follow-up (e.g. using person-years).

++

+

−

NR

NA

Comments:
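To illustrate the person-years adjustment mentioned under item 3.4, the Python sketch below compares two groups by event rate rather than by raw event count. All figures are hypothetical, and the function name `rate_per_1000_person_years` is an assumption made for this example.

```python
def rate_per_1000_person_years(events, person_years):
    """Return the event rate per 1,000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000 * events / person_years


# Hypothetical data: the exposure group was followed for three times as long.
exposed_rate = rate_per_1000_person_years(events=30, person_years=3000)
comparison_rate = rate_per_1000_person_years(events=20, person_years=1000)

# Ratio of the two rates (exposure versus comparison).
rate_ratio = exposed_rate / comparison_rate
```

In this made-up example the exposure group has more events in absolute terms (30 versus 20), but once follow-up time is taken into account its event rate is half that of the comparison group.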

3.5 Was follow-up time meaningful?

  • Was follow-up long enough to assess long-term benefits and harms?

  • Was it too long, e.g. participants lost to follow-up?

++

+

−

NR

NA

Comments:

Section 4: Analyses

4.1 Was the study sufficiently powered to detect an intervention effect (if one exists)?

  • A power of 0.8 (i.e. an effect of a given size, if one exists, would be detected 80% of the time) is the conventionally accepted standard.

  • Is a power calculation presented? If not, what is the expected effect size? Is the sample size adequate?

++

+

−

NR

NA

Comments:
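Where a study presents no power calculation, a reviewer may want a rough check of whether the sample size is adequate. The Python sketch below is one possible approach and is not prescribed by this checklist. It assumes that the effect of interest is a Pearson correlation coefficient, that the test is two-sided, and that the Fisher z approximation is acceptable. The function name `required_sample_size` and the default significance level of 0.05 are assumptions.

```python
import math
from statistics import NormalDist


def required_sample_size(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation of size r.

    Uses the Fisher z approximation for a two-sided test:
    n = ((z_alpha + z_power) / atanh(r)) ** 2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# Sample size needed to detect a correlation of 0.3 with a power of 0.8.
n_needed = required_sample_size(0.3)
```

Under these assumptions, a study with far fewer participants than `n_needed` would be unlikely to detect a correlation of that size, which is relevant when judging the adequacy of the sample.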

4.2 Were multiple explanatory variables considered in the analyses?

  • Were there sufficient explanatory variables considered in the analysis?

++

+

−

NR

NA

Comments:

4.3 Were the analytical methods appropriate?

  • Were important differences in follow-up time and likely confounders adjusted for?

++

+

−

NR

NA

Comments:

4.6 Was the precision of association given or calculable? Is the association meaningful?

  • Were confidence intervals or p values for effect estimates given or possible to calculate?

  • Were CIs wide or were they sufficiently precise to aid decision-making? If precision is lacking, is this because the study is under-powered?

++

+

−

NR

NA

Comments:
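When a study reports a correlation coefficient and a sample size but no confidence interval, the interval may be calculable. The Python sketch below shows one common approach, the Fisher z transformation, as an illustration rather than as a method required by this checklist. It assumes a Pearson correlation from a simple random sample; the function name `correlation_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Back-transform the limits to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical study: r = 0.3, first with 85 and then with 30 participants.
lower_85, upper_85 = correlation_confidence_interval(0.3, 85)
lower_30, upper_30 = correlation_confidence_interval(0.3, 30)
```

With these made-up figures, the interval for 30 participants is wider than the interval for 85 and includes zero, the kind of imprecision that may indicate an under-powered study.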

Section 5: Summary

5.1 Are the study results internally valid (i.e. unbiased)?

  • How well did the study minimise sources of bias (e.g. by adjusting for potential confounders)?

  • Were there significant flaws in the study design?

++

+

−

Comments:

5.2 Are the findings generalisable to the source population (i.e. externally valid)?

  • Are there sufficient details given about the study to determine if the findings are generalisable to the source population?

  • Consider: participants, interventions and comparisons, outcomes, resource and policy implications.

++

+

−

Comments:



[15] Appraisal form derived from: Jackson R, Ameratunga S, Broad J et al. (2006) The GATE frame: critical appraisal with pictures. Evidence Based Medicine 11: 35–8.

[16] Available from CPHE on request.