6 Reviewing the evidence

Studies identified during literature searches (see chapter 5) need to be reviewed to identify the most appropriate data to help address the review questions, and to ensure that the guideline recommendations are based on the best available evidence. The systematic review process used should be explicit and transparent. It involves five major steps:

writing the review protocol (see section 4.4)
selecting relevant studies
assessing their quality
synthesising the results
interpreting the results.

The process of selecting relevant studies is common to all systematic reviews; the other steps are discussed below in relation to the major types of questions. The same rigour should be applied to reviewing fully and partially published studies, as well as unpublished data supplied by stakeholders.

6.1 Selecting relevant studies

The study selection process for clinical studies and economic evaluations should be clearly documented, giving details of the inclusion and exclusion criteria that were applied.

6.1.1 Clinical studies

Before acquiring papers for assessment, the information specialist or systematic reviewer should sift the evidence identified in the search in order to discard irrelevant material. First, the titles of the retrieved citations should be scanned and those that fall outside the topic of the guideline should be excluded. A quick check of the abstracts of the remaining papers should identify those that are clearly not relevant to the review questions and hence can be excluded.

Next, the remaining abstracts should be scrutinised against the inclusion and exclusion criteria agreed by the Guideline Development Group (GDG). Abstracts that do not meet the inclusion criteria should be excluded. Any doubts about inclusion should be resolved by discussion with the GDG before the results of the study are considered. Once the sifting is complete, full versions of the selected studies can be acquired for assessment. Studies that fail to meet the inclusion criteria once the full version has been checked should be excluded; those that meet the criteria can be assessed. Because there is always a potential for error and bias in selecting the evidence, double sifting (that is, sifting by two people) of a random selection of abstracts should be performed periodically (Edwards et al. 2002).
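The periodic double sifting described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the 10% sampling fraction, the fixed seed and the function names `select_for_double_sift` and `sift_disagreements` are assumptions made for the example, since the manual does not specify how large the random selection should be.

```python
import random


def select_for_double_sift(citation_ids, fraction=0.1, seed=2002):
    """Pick a reproducible random subset of abstracts for a second sifter.

    `fraction` and `seed` are illustrative assumptions, not values taken
    from the manual.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    ids = sorted(set(citation_ids))
    if not ids:
        return []
    sample_size = max(1, round(len(ids) * fraction))
    return sorted(random.Random(seed).sample(ids, sample_size))


def sift_disagreements(first_sift, second_sift):
    """Return the IDs on which the two sifters' include/exclude calls differ."""
    return sorted(
        cid for cid, decision in second_sift.items()
        if first_sift.get(cid) != decision
    )
```

Any citation returned by `sift_disagreements` is a candidate for discussion before the study results are considered.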

6.1.2 Conference abstracts

Conference abstracts can be a good source of information in systematic reviews. For example, conference abstracts can be important in pointing to published trials that may be missed, in estimating the amount of not-fully-published evidence (and hence guiding calls for evidence and judgements about publication bias), or in identifying ongoing trials that are due to be published. These sources of information are important in interpreting systematic reviews, and so conference abstracts should not be excluded from the search strategy.

However, the following should be considered when deciding whether to include conference abstracts as a source of evidence:

Conference abstracts on their own seldom have sufficient information to allow confident judgements to be made about the quality and results of a study.
It could be very time-consuming to trace the original studies or additional data relating to the conference abstracts, and the information found may not always be useful.

Therefore:

If sufficient evidence has been identified from full published studies, it may be reasonable not to trace the original studies or additional data related to conference abstracts.
If little or no evidence has been identified from full published studies, the systematic reviewer may consider an additional process for tracing the original studies or additional data relating to the conference abstracts, in order to allow full critical appraisal and to make judgements on their inclusion in or exclusion from the systematic review.

6.1.3 Economic evaluations

The process for sifting and selecting economic evaluations for assessment is essentially the same as for clinical studies. Consultation between the information specialist, the health economist and the systematic reviewer is essential when deciding the inclusion criteria; these decisions should be discussed and agreed with the GDG. The review should be targeted to identify the papers that are most relevant to current NHS practice and hence likely to inform GDG decision-making. The review should also usually focus on 'full' economic evaluations that compare both the costs and health consequences of the alternative interventions and any services under consideration.

Inclusion criteria for filtering and selection of papers for review by the health economist should specify relevant populations and interventions for the review question. They should also specify the following:

An appropriate date range, as older studies may reflect outdated practices.
The country or setting, as studies conducted in other healthcare systems might not be relevant to the NHS. In some cases it may be appropriate to limit consideration to UK-based or OECD (Organisation for Economic Co-operation and Development) studies.
The type of economic evaluation. This may include cost–utility, cost–benefit, cost-effectiveness, cost-minimisation or cost–consequence analyses. Non-comparative costing studies, 'burden of disease' studies and 'cost of illness' studies should usually be excluded.
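The date, setting and study-type criteria listed above could be applied to a set of candidate papers along the following lines. This is a sketch only: the `EconomicStudy` fields and the function name are hypothetical, the year cut-off and country list must come from the agreed review protocol, and the population and intervention criteria are left out for brevity.

```python
from dataclasses import dataclass

# Types of evaluation that the criteria above say may be included.
FULL_EVALUATION_TYPES = {
    "cost-utility",
    "cost-benefit",
    "cost-effectiveness",
    "cost-minimisation",
    "cost-consequence",
}


@dataclass(frozen=True)
class EconomicStudy:
    title: str
    year: int
    country: str
    evaluation_type: str


def meets_inclusion_criteria(study, earliest_year, allowed_countries):
    """Apply date, setting and study-type criteria to one candidate paper."""
    return (
        study.year >= earliest_year
        and study.country in allowed_countries
        and study.evaluation_type in FULL_EVALUATION_TYPES
    )
```

A 'cost of illness' study, for example, fails the last test whatever its date or setting.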

6.2 Questions about interventions

These questions concern the relative effects of an intervention, as described in section 4.3.1. The consideration of cost effectiveness is integral to the process of reviewing evidence and making recommendations about interventions. However, the quality criteria and ways of summarising the data are slightly different from those for clinical effectiveness, so these are discussed in separate subsections.

6.2.1 Assessing study quality for clinical effectiveness

Study quality can be defined as the degree of confidence about the estimate of a treatment effect.

The first stage is to determine the study design so that the appropriate criteria can be applied in the assessment. A study design checklist can be obtained from the Cochrane handbook for systematic reviews of interventions (Higgins and Green 2011). Tables 13.2.a and 13.2.b in the Cochrane handbook are lists of study design features for studies with allocation to interventions at the individual and group levels respectively, and box 13.4.a provides useful notes for completing the checklist.

Once a study has been classified, it should be assessed using the methodology checklist for that type of study (see appendices B–E). To minimise errors and any potential bias in the assessment, two reviewers should independently assess the quality of a random selection of studies. Any differences arising from this should be discussed fully at a GDG meeting.

The quality of a study can vary depending on which of its measured outcomes is being considered. Well-conducted randomised controlled trials (RCTs) are more likely than non-randomised studies to produce similar comparison groups, and are therefore particularly suited to estimating the effects of interventions. However, short-term outcomes may be less susceptible to bias than long-term outcomes because of greater loss to follow-up with the latter. It is therefore important when summarising evidence that quality is considered according to outcome.

6.2.1.1 The GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach to assessing the quality of evidence

The GRADE approach for questions about interventions has been used in the development of NICE clinical guidelines since 2009. For more details about GRADE, see the Journal of Clinical Epidemiology series, appendix K and the GRADE working group website.

GRADE is a system developed by an international working group for rating the quality of evidence across outcomes in systematic reviews and guidelines; it can also be used to grade the strength of recommendations in guidelines. The system is designed for reviews and guidelines that examine alternative management strategies or interventions, and these may include no intervention or current best management. The key difference from other assessment systems is that GRADE rates the quality of evidence for a particular outcome across studies and does not rate the quality of individual studies.

In order to apply GRADE, the evidence must clearly specify the relevant setting, population, intervention, comparator(s) and outcomes.

Before starting an evidence review, the GDG should apply an initial rating to the importance of outcomes, in order to identify which outcomes of interest are both 'critical' to decision-making and 'important' to patients. This rating should be confirmed or, if absolutely necessary, revised after completing the evidence review.

Box 6.1 summarises the GRADE approach to rating the quality of evidence.

Box 6.1 The GRADE approach to assessing the quality of evidence for intervention studies

In the GRADE system, the following features are assessed for the evidence found for each 'critical' and each 'important' outcome from a systematic review:

study limitations (risk of bias): assessing the 'internal validity' of the evidence
inconsistency: assessing heterogeneity or variability in the estimates of treatment effect across studies
indirectness: assessing how far the population, intervention, comparator and outcomes in the available studies differ from those of interest
imprecision (random error): assessing the extent to which confidence in the effect estimate is adequate to support a particular decision
publication bias: assessing the degree of selective publication of studies.

Other considerations (for observational studies only):

effect size
effect of all plausible confounding
evidence of a dose–response relationship.

The quality of evidence is classified as high, moderate, low or very low (see GRADE website for definitions).

The approach taken by NICE differs from the standard GRADE system in two ways:

It also integrates a review of the quality of cost-effectiveness studies.
It has no 'overall summary' labels for the quality of the evidence across all outcomes or for the strength of a recommendation, but uses the wording of recommendations to reflect the strength of the recommendation (see section 9.3.3).
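The rating logic summarised in box 6.1 can be illustrated with a short sketch. Several points are assumptions that box 6.1 does not state: following general GRADE convention, evidence from randomised trials is taken to start at 'high' and observational evidence at 'low', and each of the five features may lower the rating by one or two levels. The function and domain names are hypothetical, and real GRADE judgements are qualitative rather than purely arithmetic.

```python
LEVELS = ("very low", "low", "moderate", "high")
DOWNGRADE_DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_quality(randomised, downgrades=None, upgrades=0):
    """Rate the quality of evidence for ONE outcome across studies.

    Assumptions (general GRADE convention, not stated in box 6.1):
    randomised evidence starts at 'high', observational evidence at 'low',
    and each domain can lower the rating by 0, 1 or 2 levels.
    """
    downgrades = downgrades or {}
    for domain, steps in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        if steps not in (0, 1, 2):
            raise ValueError(f"{domain}: steps must be 0, 1 or 2")
    if upgrades < 0:
        raise ValueError("upgrades must not be negative")

    score = 3 if randomised else 1
    score -= sum(downgrades.values())
    if not randomised:
        # Box 6.1 lists upgrading considerations for observational studies only.
        score += upgrades
    # Clamp to the four available levels.
    return LEVELS[max(0, min(len(LEVELS) - 1, score))]
```

For example, `grade_quality(True, {"imprecision": 1})` gives 'moderate' for an outcome supported by randomised trials with one concern about imprecision.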

6.2.2 Summarising and presenting results for clinical effectiveness

Characteristics of data should be extracted to a standard template for inclusion in an evidence table (see appendix J1). Evidence tables help to identify the similarities and differences between studies, including the key characteristics of the study population and interventions or outcome measures. This provides a basis for comparison.

Meta-analysis may be needed to pool treatment estimates from different studies. Recognised approaches to meta-analysis should be used, as described in the manual from NHS Centre for Reviews and Dissemination (2009) and in Higgins and Green (2011).
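As an illustration of what pooling involves, the sketch below implements one recognised approach, inverse-variance fixed-effect meta-analysis, together with Cochran's Q and the I-squared statistic as simple measures of heterogeneity. The manual does not prescribe this particular method; the function name and return format are assumptions, and a random-effects model may be more appropriate when heterogeneity is present.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of study-level estimates.

    Estimates must be on an additive scale (for example log odds ratios
    or mean differences). Returns the pooled estimate, its standard error,
    a 95% confidence interval, Cochran's Q and the I-squared statistic (%).
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

    # Heterogeneity: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": pooled_se, "ci95": ci, "q": q, "i2": i_squared}
```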

The body of evidence addressing a question should then be presented within the text of the full guideline as an evidence profile as described in the GRADE system (see appendix K). GRADEpro software can be used to prepare these profiles. Evidence profiles contain a 'quality assessment' section that summarises the quality of the evidence and a 'summary of findings' section that presents the outcome data for each critical and each important clinical outcome. The 'summary of findings' section includes a limited description of the quality of the evidence and can be presented alone in the text of the guideline (in which case full GRADE profiles should be presented in an appendix).

Short evidence statements for outcomes should be presented after the GRADE profiles, summarising the key features of the evidence on clinical effectiveness (including adverse events as appropriate) and cost effectiveness. The evidence statements should include the number of studies and participants, the quality of the evidence and the direction of estimate of the effect (see box 6.2 for examples of evidence statements). An evidence statement may be needed even if no evidence is identified for a critical or important outcome. Evidence statements may also note the presence of relevant ongoing research.

Box 6.2 Examples of evidence statements

Prostaglandin analogues versus beta-blockers for glaucoma – from Glaucoma (NICE clinical guideline 85; 2009):

Moderate quality evidence from 12 studies with several thousand patients, showed that prostaglandin analogues are more effective than beta-blockers in reducing IOP from baseline at 6 to 36 months follow up, but the effect size is too small to be clinically effective.

Rehabilitation strategies/programmes after critical illness – from Rehabilitation after critical illness (NICE clinical guideline 83; 2009):

One study with 126 patients presented moderate quality evidence that a 6-week supported self-help rehabilitation manual improved the recovery of patients' physical function 8 weeks and 6 months after ICU discharge.

Delayed versus immediate antibiotic prescribing strategy for acute otitis media – from Respiratory tract infections – antibiotic prescribing (NICE clinical guideline 69; 2008):

Three studies with 773 children, presented high quality evidence that a delayed strategy reduced the consumption of antibiotics by 63% compared with an immediate prescribing strategy.

6.2.3 Indirect treatment comparisons and mixed treatment comparisons

NICE has a preference for data from head-to-head RCTs, and these should be presented in the reference case analysis if available. However, there may be situations when data from head-to-head studies of the options (and/or comparators) of interest are not available. In these circumstances, indirect treatment comparison analyses should be considered.

An 'indirect treatment comparison' refers to the synthesis of data from trials in which the interventions of interest have been compared indirectly using data from a network of trials that compare the interventions with other interventions. A 'mixed treatment comparison' refers to an analysis that includes both trials that compare the interventions of interest head-to-head and trials that compare them indirectly.

The principles of good practice for systematic reviews and meta-analyses should be carefully followed when conducting indirect treatment comparisons or mixed treatment comparisons. The rationale for identifying and selecting the RCTs should be explained, including the rationale for selecting the treatment comparisons that have been included. A clear description of the methods of synthesis is required. The methods and results of the individual trials should be documented. If there is doubt about the relevance of particular trials, a sensitivity analysis in which these trials are excluded should also be presented. The heterogeneity between the results of pairwise comparisons and inconsistencies between the direct and indirect evidence on the interventions should be reported.

There may be circumstances in which data from head-to-head RCTs are less than ideal (for example, the sample size may be small or there may be concerns about the external validity). In such cases, additional evidence from mixed treatment comparisons can be considered. In these cases, mixed treatment comparisons should be presented separately from the reference-case analysis and a rationale for their inclusion provided. Again, the principles of good practice apply.

When multiple options are being appraised, data from RCTs (when available) that compare each of the options head-to-head should be presented in a series of pairwise comparisons. Consideration may be given to presenting an additional analysis using a mixed treatment comparison framework.

When evidence is combined using indirect or mixed treatment comparison frameworks, trial randomisation should be preserved. A comparison of the results from single treatment arms from different randomised trials is not acceptable unless the data are treated as observational and appropriate steps are taken to adjust for possible bias and increased uncertainty.
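One widely used way of preserving randomisation in a simple indirect comparison is the adjusted indirect comparison, often attributed to Bucher and colleagues. The manual does not name a specific method, so the sketch below is offered as an example only. It compares A with B through a common comparator C using the relative effects from each set of trials, rather than single arms, and adds the variances because the two sets of trials are independent. The function name and the 95% interval are assumptions of the example.

```python
import math


def adjusted_indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of A versus B through a common comparator C.

    d_ac and d_bc are relative effects (A vs C, B vs C) on an additive
    scale such as the log odds ratio, each taken from randomised comparisons.
    Returns the estimate, its standard error and a 95% confidence interval.
    """
    d_ab = d_ac - d_bc
    # Uncertainty from both sets of trials carries through to the result.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return d_ab, se_ab, (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)
```

The standard error of the indirect estimate is always larger than that of either direct estimate, which is one reason head-to-head data are preferred when available.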

Analyses using indirect or mixed treatment comparison frameworks may include comparator interventions (including placebo) that have not been defined in the scope of the guideline if they are relevant to the development of the network of evidence. The rationale for the inclusion and exclusion of comparator interventions should be clearly reported. Again, the principles of good practice apply.

Evidence from a mixed treatment comparison can be presented in a variety of ways. The network of evidence can be presented as tables. It may also be presented diagrammatically as long as the direct and indirect treatment comparisons are clearly identified and the number of trials in each comparison is stated.

If sufficient relevant and valid data are not available to include in meta-analyses of head-to-head trials, or mixed or indirect treatment comparisons, the analysis may have to be restricted to a qualitative overview that critically appraises individual studies and presents their results. The results of this type of analysis should be approached with particular caution.

Further information on evidence synthesis is provided by the technical support documents developed by the NICE Decision Support Unit (DSU).

6.2.4 Assessing study quality for cost effectiveness

Estimates of resource use obtained from clinical studies should be treated like other clinical outcomes and reviewed using the processes described above. Reservations about the applicability of these estimates to routine NHS practice should be noted in the economics evidence profile, in the same way as in a GRADE profile (see section 6.2.1.1), and taken into consideration by the GDG.

However, the criteria for appraising other economic estimates – such as costs, cost-effectiveness ratios and net benefits – are rather different, because these estimates are usually obtained using some form of modelling. In addition to formal decision-analytic models, this includes economic evaluations conducted alongside clinical trials. These usually require some external sources of information (for example, unit costs, health-state valuations or long-term prognostic data) and estimation procedures to predict long-term costs and outcomes. These considerations also apply to relatively simple cost calculations based on expert judgement or on observed resource use and unit cost data.

All economic estimates used to inform guideline recommendations should be appraised using the methodology checklist for economic evaluations (appendix G). This should be used to appraise unpublished economic evaluations, such as studies submitted by stakeholders and academic papers that are not yet published, as well as published papers. The same criteria should be applied to any new economic evaluations conducted for the guideline (see chapter 7).

The checklist (appendix G) includes a section on the applicability of the study to the specific question and the context for NICE decision-making (analogous to the GRADE 'directness' criterion). This checklist is designed to determine whether an economic evaluation provides evidence that is useful to inform GDG decision-making, analogous to the assessment of study limitations in GRADE.

The checklist includes an overall judgement on the applicability of the study to the guideline context, as follows:

Directly applicable – the study meets all applicability criteria, or fails to meet one or more applicability criteria but this is unlikely to change the conclusions about cost effectiveness.
Partially applicable – the study fails to meet one or more applicability criteria, and this could change the conclusions about cost effectiveness.
Not applicable – the study fails to meet one or more applicability criteria, and this is likely to change the conclusions about cost effectiveness. Such studies would usually be excluded from further consideration.

The checklist also includes an overall summary judgement on the methodological quality of economic evaluations, as follows:

Minor limitations – the study meets all quality criteria, or fails to meet one or more quality criteria but this is unlikely to change the conclusions about cost effectiveness.
Potentially serious limitations – the study fails to meet one or more quality criteria, and this could change the conclusions about cost effectiveness.
Very serious limitations – the study fails to meet one or more quality criteria, and this is highly likely to change the conclusions about cost effectiveness. Such studies should usually be excluded from further consideration.
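The two summary judgements described above follow the same pattern, which the sketch below makes explicit. It is illustrative only: the function name and the short keys ('none', 'unlikely', 'could', 'likely', 'highly likely') are hypothetical labels for the appraiser's view of how failed criteria might affect the conclusions about cost effectiveness, and the underlying judgement remains a matter for the health economist.

```python
APPLICABILITY = {
    "none": "directly applicable",       # all applicability criteria met
    "unlikely": "directly applicable",
    "could": "partially applicable",
    "likely": "not applicable",
}
LIMITATIONS = {
    "none": "minor limitations",         # all quality criteria met
    "unlikely": "minor limitations",
    "could": "potentially serious limitations",
    "highly likely": "very serious limitations",
}


def summarise_economic_study(applicability_impact, quality_impact):
    """Map the appraiser's judgements to the checklist's summary labels.

    Returns (applicability label, limitations label, usually_excluded).
    """
    applicability = APPLICABILITY[applicability_impact]
    limitations = LIMITATIONS[quality_impact]
    usually_excluded = (
        applicability == "not applicable"
        or limitations == "very serious limitations"
    )
    return applicability, limitations, usually_excluded
```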

The robustness of the study results to methodological limitations may sometimes be apparent from reported sensitivity analyses. If not, judgement will be needed to assess whether a limitation would be likely to change the results and conclusions.

If necessary, the health technology assessment checklist for decision-analytic models (Philips et al. 2004) may also be used to give a more detailed assessment of the methodological quality of modelling studies.

The judgements that the health economist makes using the checklist for economic evaluations (and the health technology assessment modelling checklist, if appropriate) should be recorded and presented in an appendix to the full guideline. The 'comments' column in the checklist should be used to record reasons for these judgements, as well as additional details about the studies where necessary.

6.2.5 Summarising and presenting results for cost effectiveness

Cost, cost effectiveness or net benefit estimates from published or unpublished studies, or from economic analyses conducted for the guideline, should be presented in an 'economic evidence profile' adapted from the GRADE profile (see appendix K). Whenever a GRADE profile is presented in the full version of a NICE clinical guideline, it should be accompanied by relevant economic information (resource use, costs, cost effectiveness and/or net benefit estimates as appropriate). It should be explicitly stated if economic information is not available or if it is not thought to be relevant to the question.

The economic evidence profile includes columns for the overall assessments of study limitations and applicability described above. There is also a comments column where the health economist can note any particular issues that the GDG should consider when assessing the economic evidence. Footnotes should be used to explain the reasons for quality assessments, as in the standard GRADE profile.

The results of the economic evaluations included should be presented in the form of a best-available estimate or range for the incremental cost, the incremental effect and, where relevant, the incremental cost-effectiveness ratio or net benefit estimate. A summary of the extent of uncertainty about the estimates should also be presented in the economic evidence profile. This should reflect the results of deterministic or probabilistic sensitivity analyses or stochastic analyses of trial data, as appropriate.
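The quantities named above are related by simple arithmetic, sketched below for a comparison of two options. The function name and dictionary keys are assumptions, net monetary benefit is used as one common form of net benefit, and the `threshold` value must be supplied by the analyst. The sketch gives point estimates only and does not address the uncertainty analyses that should accompany them.

```python
def incremental_summary(cost_new, effect_new, cost_old, effect_old, threshold):
    """Incremental cost, incremental effect, ICER and net monetary benefit.

    `threshold` is the assumed value placed on one unit of effect (for
    example, cost per QALY gained); the caller must supply it.
    """
    d_cost = cost_new - cost_old
    d_effect = effect_new - effect_old
    # The ratio is undefined when the two options are equally effective.
    icer = d_cost / d_effect if d_effect != 0 else None
    net_benefit = threshold * d_effect - d_cost
    return {
        "d_cost": d_cost,
        "d_effect": d_effect,
        "icer": icer,
        "net_monetary_benefit": net_benefit,
    }
```

An ICER should be interpreted with care when the incremental effect is negative, because the same ratio can then arise from very different situations; the net benefit does not have this ambiguity.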

Each economic evaluation included should usually be presented in a separate row of the economic evidence profile. If large numbers of economic evaluations of sufficiently high quality and applicability are available, a single row could be used to summarise a number of studies based on shared characteristics; this should be explicitly justified in a footnote.

Inconsistency between the results of economic evaluations will be shown by differences between rows of the economic evidence profile (a separate column examining 'consistency' is therefore unnecessary). The GDG should consider the implications of any unexplained differences between model results when assessing the body of clinical and economic evidence and drawing up recommendations. This includes clearly explaining the GDG's preference for certain results when forming recommendations.

If results are available for two or more patient subgroups, these should be presented in separate economic evidence profile tables or as separate rows within a single table.

Costs and cost-effectiveness estimates should be presented only for the appropriate incremental comparisons – where an intervention is compared with the next most expensive non-dominated option (a clinical strategy is said to 'dominate' the alternatives when it is both more effective and less costly; see section 7.3). If comparisons are relevant only for some groups of the population (for example, patients who cannot tolerate one or more of the other options, or for whom one or more of the options is contraindicated), this should be stated in a footnote to the economic evidence profile table.
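The incremental comparisons described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Option` class and function names are hypothetical, an option is treated as dominated if another costs no more and is at least as effective (and is strictly better on at least one of the two), and extended dominance is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    cost: float
    effect: float  # for example, QALYs


def non_dominated(options):
    """Drop dominated options and return the rest in order of cost."""
    keep = []
    for o in options:
        dominated = any(
            p.cost <= o.cost
            and p.effect >= o.effect
            and (p.cost < o.cost or p.effect > o.effect)
            for p in options
        )
        if not dominated:
            keep.append(o)
    return sorted(keep, key=lambda o: o.cost)


def incremental_table(options):
    """Compare each non-dominated option with its neighbour in cost order.

    Returns rows of (option, comparator, incremental cost,
    incremental effect, ICER).
    """
    frontier = non_dominated(options)
    rows = []
    for prev, cur in zip(frontier, frontier[1:]):
        d_cost = cur.cost - prev.cost
        d_effect = cur.effect - prev.effect
        # Identical options give a zero difference and no meaningful ratio.
        icer = d_cost / d_effect if d_effect != 0 else None
        rows.append((cur.name, prev.name, d_cost, d_effect, icer))
    return rows
```

With options costing 1000, 2000, 2500 and 4000 and yielding 1.0, 1.5, 1.4 and 2.0 units of effect respectively, the third option is dominated by the second and is left out of the incremental comparisons.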

A short evidence statement should be presented alongside the GRADE and economic evidence profile tables, summarising the key features of the evidence on clinical and cost effectiveness.

6.3 Questions about diagnosis

Questions about diagnosis are concerned with the performance of a diagnostic test or test strategy (see section 4.3.2). Note that 'test and treat' studies (in which the outcomes of patients who undergo a new diagnostic test in combination with a management strategy are compared with the outcomes of patients who receive the usual diagnostic test and management strategy) should be addressed in the same way as intervention studies (see section 6.2).

6.3.1 Assessing study quality

Studies of diagnostic test accuracy should be assessed using the methodology checklist for QUADAS-2 (Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews) (appendix F). Characteristics of data should be extracted to a standard template for inclusion in an evidence table (see appendix J2). Questions relating to diagnostic test accuracy are usually best answered by cross-sectional studies. Case–control studies can also be used, but these are more prone to bias and often result in inflated estimates of diagnostic test accuracy.

There is currently a lack of empirical evidence about the size and direction of bias contributed by specific aspects of the design and conduct of studies on diagnostic test accuracy. Making judgements about the overall quality of studies can therefore be difficult. Before starting the review, an assessment should be made to determine which quality appraisal criteria (from the QUADAS-2 checklist) are likely to be the most important indicators of quality for the particular question about diagnostic test accuracy being addressed. These criteria will be useful in guiding decisions about the overall quality of individual studies and whether to exclude certain studies, and when summarising and presenting the body of evidence for the question about diagnostic test accuracy as a whole (see section 6.3.2). Clinical input (for example, from a GDG member) may be needed to identify the most appropriate quality criteria.

6.3.2 Summarising and presenting results

No well-designed and validated approach currently exists for summarising a body of evidence for studies on diagnostic test accuracy. In the absence of such a system, a narrative summary of the quality of the evidence should be given, based on the quality appraisal criteria from QUADAS-2 (appendix F) that were considered to be most important for the question being addressed (see section 6.3.1).

Numerical summaries of diagnostic test accuracy may be presented as tables to help summarise the available evidence. Meta-analysis of such estimates from different studies is possible, but is not widely used. If this is attempted, relevant published technical advice should be used to guide reviewers.
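The manual does not list which numerical summaries to report. As an example, the sketch below derives several commonly used measures from the 2x2 table of a single study. The function name and dictionary keys are assumptions; predictive values depend on the prevalence in the study sample and so should be interpreted with care, particularly for case–control designs.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Common accuracy measures from one study's 2x2 table.

    tp/fn: people with the condition testing positive/negative on the
    index test; fp/tn: people without it testing positive/negative.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must not be negative")
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need people both with and without the condition")

    def ratio(numerator, denominator):
        # Return None rather than fail when a measure is undefined.
        return numerator / denominator if denominator else None

    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "lr_positive": ratio(sensitivity, 1 - specificity),
        "lr_negative": ratio(1 - sensitivity, specificity),
    }
```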

Numerical summaries and analyses should be followed by a short evidence statement summarising what the evidence shows.

6.4 Questions about prognosis

These questions are described in section 4.3.3.

6.4.1 Assessing study quality

Studies that are reviewed for questions about prognosis should be assessed using the methodology checklist for prognostic studies (appendix I). There is currently a lack of empirical evidence about the size and direction of bias contributed by specific aspects of the design and conduct of studies on prognosis. Making judgements about the overall quality of studies can therefore be difficult. Before starting the review, an assessment should be made to determine which quality appraisal criteria (from the checklist in appendix I) are likely to be the most important indicators of quality for the particular question about prognosis being addressed. These criteria will be useful in guiding decisions about the overall quality of individual studies and whether to exclude certain studies, and when summarising and presenting the body of evidence for the question about prognosis as a whole (see section 6.4.2). Clinical input (for example, from a GDG member) may be needed to identify the most appropriate quality criteria.

6.4.2 Summarising and presenting results

No well-designed and validated approach currently exists for summarising a body of evidence for studies on prognosis. A narrative summary of the quality of the evidence should therefore be given, based on the quality appraisal criteria from appendix I that were considered to be most important for the question being addressed (see section 6.4.1). Characteristics of data should be extracted to a standard template for inclusion in an evidence table (see appendix J3).

Results from the studies included may be presented as tables to help summarise the available evidence. Reviewers should be wary of using meta-analysis as a tool to summarise large observational studies, because the results obtained may give a spurious sense of confidence in the study results.

The narrative summary should be followed by a short evidence statement summarising what the evidence shows.

6.5 Using patient experience to inform review questions

These questions are described in section 4.3.4.

6.5.1 Assessing study quality

Studies about patient experience are likely to be qualitative studies or cross-sectional surveys. Qualitative studies should be assessed using the methodology checklist for qualitative studies (appendix H). It is important to consider which quality appraisal criteria from this checklist are likely to be the most important indicators of quality for the specific research question being addressed. These criteria may be helpful in guiding decisions about the overall quality of individual studies and whether to exclude certain studies, and when summarising and presenting the body of evidence for the research question about patient experience as a whole.

There is no methodology checklist for the quality appraisal of cross-sectional surveys. Such surveys should be assessed for the rigour of the process used to develop the questions and their relevance to the population under consideration, and for the existence of significant bias (for example, non-response bias).

6.5.2 Summarising and presenting results

A description of the quality of the evidence should be given, based on the quality appraisal criteria from appendix H that were considered to be the most important for the research question being addressed. If appropriate, the quality of the cross-sectional surveys included should also be summarised.

Consider presenting the quality assessment of included studies in tables (see table 1 in appendix H for an example). Methods to synthesise qualitative studies (for example, meta-ethnography) are evolving, but the routine use of such methods in guidelines is not currently recommended.

The narrative summary should be followed by a short evidence statement summarising what the evidence shows. Characteristics of data should be extracted to a standard template for inclusion in an evidence table (see appendix J4).

6.6 Published guidelines

Relevant published guidelines from other organisations may be identified in the search for evidence. These should be assessed for quality using the AGREE II (Appraisal of Guidelines Research and Evaluation II) instrument (Brouwers et al. 2010) to ensure that they have sufficient documentation to be considered. There is no cut-off point for accepting or rejecting a guideline, and each GDG will need to set its own parameters. These should be documented in the methods section of the full guideline, along with a summary of the assessment. The results should be presented as an appendix to the full guideline.

Reviews of evidence from other guidelines that cover questions formulated by the GDG may be considered as evidence if:

they are assessed using the appropriate methodology checklist from this manual and are judged to be of high quality
they are accompanied by an evidence statement and evidence table(s)
the evidence is updated according to the methodology for exceptional updates of NICE clinical guidelines (see section 14.4).

The GDG should create its own evidence summaries or statements. Evidence tables from other guidelines should be referenced with a direct link to the source website or a full reference of the published document. The GDG should formulate its own recommendations, taking into consideration the whole body of evidence.

Recommendations from other guidelines should not be quoted verbatim, except for recommendations from NHS policy or legislation (for example, the Health and Social Care Act 2008).

6.7 Further reading

Altman DG (2001) Systematic reviews of evaluations of prognostic variables. British Medical Journal 323: 224–8

Balshem H, Helfand M, Schünemann HJ et al. (2011) GRADE guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology 64: 401–6

Brouwers M, Kho ME, Browman GP et al. for the AGREE Next Steps Consortium (2010) AGREE II: advancing guideline development, reporting and evaluation in healthcare. Canadian Medical Association Journal 182: E839–42

Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. University of York: Centre for Reviews and Dissemination

Chiou CF, Hay JW, Wallace JF et al. (2003) Development and validation of a grading system for the quality of cost-effectiveness studies. Medical Care 41: 32–44

Drummond MF, O'Brien B, Stoddart GL et al. (1997) Critical assessment of economic evaluation. In: Methods for the economic evaluation of health care programmes, 2nd edition. Oxford: Oxford Medical Publications

Eccles M, Mason J (2001) How to develop cost-conscious guidelines. Health Technology Assessment 5: 1–69

Edwards P, Clarke M, DiGuiseppi C et al. (2002) Identification of randomized trials in systematic reviews: accuracy and reliability of screening records. Statistics in Medicine 21: 1635–40

Evers SMAA, Goossens M, de Vet H et al. (2005) Criteria list for assessment of methodological quality of economic evaluations: Consensus on Health Economic Criteria. International Journal of Technology Assessment in Health Care 21: 240–5

Guyatt GH, Oxman AD, Schünemann HJ et al. (2011) GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. Journal of Clinical Epidemiology 64: 380–2

Guyatt GH, Oxman AD, Akl EA et al. (2011) GRADE guidelines: 1. Introduction – GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology 64: 383–94

Guyatt GH, Oxman AD, Kunz R et al. (2011) GRADE guidelines: 2. Framing the question and deciding on important outcomes. Journal of Clinical Epidemiology 64: 395–400

Guyatt GH, Oxman AD, Vist G et al. (2011) GRADE guidelines: 4. Rating the quality of evidence – study limitations (risk of bias). Journal of Clinical Epidemiology 64: 407–15

Guyatt GH, Oxman AD, Montori V et al. (2011) GRADE guidelines: 5. Rating the quality of evidence – publication bias. Journal of Clinical Epidemiology 64: 1277–82

Guyatt GH, Oxman AD, Kunz R et al. (2011) GRADE guidelines: 6. Rating the quality of evidence – imprecision. Journal of Clinical Epidemiology 64: 1283–93

Guyatt GH, Oxman AD, Kunz R et al. (2011) GRADE guidelines: 7. Rating the quality of evidence – inconsistency. Journal of Clinical Epidemiology 64: 1294–302

Guyatt GH, Oxman AD, Kunz R et al. (2011) GRADE guidelines: 8. Rating the quality of evidence – indirectness. Journal of Clinical Epidemiology 64: 1303–10

Guyatt GH, Oxman AD, Sultan S et al. (2011) GRADE guidelines: 9. Rating up the quality of evidence. Journal of Clinical Epidemiology 64: 1311–6

Harbord RM, Deeks JJ, Egger M et al. (2007) A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8: 239–51

Higgins JPT, Green S, editors (2011) Cochrane handbook for systematic reviews of interventions. Version 5.1.0 (updated March 2011) [online]

Khan KS, Kunz R, Kleijnen J et al. (2003) Systematic reviews to support evidence-based medicine. How to review and apply findings of healthcare research. London: Royal Society of Medicine Press

Oxman AD, Guyatt GH (1992) A consumer's guide to subgroup analyses. Annals of Internal Medicine 116: 78–84

Philips Z, Ginnelly L, Sculpher M et al. (2004) Review of guidelines for good practice in decision-analytic modelling in health technology assessment. Health Technology Assessment 8: iii–iv, ix–xi, 1–158

Schünemann HJ, Best D, Vist G et al. for the GRADE Working Group (2003) Letters, numbers, symbols and words: how to communicate grades of evidence and recommendations. Canadian Medical Association Journal 169: 677–80

Schünemann HJ, Oxman AD, Brozek J et al. for the GRADE Working Group (2008) Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. British Medical Journal 336: 1106–10

Scottish Intercollegiate Guidelines Network (2008) SIGN 50. A guideline developer's handbook, revised edition, January. Edinburgh: Scottish Intercollegiate Guidelines Network

Sharp SJ, Thompson SG (2000) Analysing the relationship between treatment effect and underlying risk in meta-analysis: comparison and development of approaches. Statistics in Medicine 19: 3251–74

Whiting PF, Rutjes AWS, Westwood ME et al. and the QUADAS-2 group (2011) QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine 155: 529–36

^[10] For more details about AGREE II, see the AGREE Enterprise website.