Conduct of quantitative real-world evidence studies

Key messages

Transparent and reproducible generation of real-world evidence is essential to improve trust in the evidence and enable reviewers to critically appraise studies.
The following principles underpin the conduct of real-world evidence studies:
- Ensure data is of good provenance, relevant and of sufficient quality to answer the research question.
- Generate evidence in a transparent way and with integrity from study planning through to study conduct and reporting.
- Use analytical methods that minimise the risk of bias and characterise uncertainty.
The required level of evidence may depend on the application and various contextual factors (see the section on considerations for the quality and acceptability of real-world evidence). Users should refer to relevant NICE manuals for further information on how recommendations are made.

Introduction

Principles for evidence generation

This section describes NICE's preferred approaches for planning, conducting and reporting real-world evidence studies.

The following principles underpin the conduct of all real-world evidence studies:

Ensure data is of good and known provenance, relevant and of sufficient quality to answer the research question.
Generate evidence in a transparent way and with integrity from study planning through to study conduct and reporting.
Use analytical methods that minimise the risk of bias and characterise uncertainty.

The focus here is currently on real-world evidence studies of quantitative data. However, several aspects of planning, conducting and reporting that we describe are also applicable to qualitative studies. For aspects that differ, recognised methods of collecting, analysing, and presenting qualitative evidence should be applied, as outlined in appendix 4.

Patients should be consulted throughout all aspects of study planning and conduct.

Considerations for the quality and acceptability of real-world evidence

All studies should aim for the highest level of transparency and rigour. However, the large number and variety of real-world evidence studies that can inform a single piece of guidance means there may be reasonable trade-offs between the extent of analysis and reporting and the context of use, including:

the contribution of the study to the final recommendation
the impact of the recommendation on health and system outcomes
other contextual factors.

The contribution of a particular type of evidence will vary across applications depending on the key drivers of uncertainty (that is, the evidence gap). For instance, in oncology, assumptions around long-term outcomes such as overall survival and the applicability of global trials to the NHS are often key (Morrell et al. 2018). In cost-effectiveness or cost-comparison models, many different parameters could be important determinants of cost effectiveness, including event incidence, prevalence, natural history of disease, test performance, costs or quality of life.

In general, non-randomised studies of clinical effects will need higher levels of rigour and transparency than simple characterisation studies. Estimates of clinical effectiveness are usually a key driver of recommendations and non-randomised studies can be at risk of bias.

The contextual factors that influence the acceptability of evidence include the level of decision uncertainty, disease prevalence, impact on health inequalities and the possibility of generating high-quality evidence. Users should refer to the relevant NICE manual for further information on how recommendations are made (see the section on NICE guidance).

High-quality real-world evidence may be more difficult to generate in certain contexts. These include rare diseases and some medical devices (including digital health technologies), interventional procedures or other complex interventions. Conducting randomised controlled trials may also be challenging in these contexts (see the section on uses and challenges of randomised controlled trials).

Common challenges in the evaluation of medical devices and interventional procedures using real-world data include:

limited integrated national data collections of medical device use and outcomes
lack of granularity in many routinely collected data sources to identify specific devices (and unique device identifiers) or procedures
difficulty identifying appropriate comparators, and accounting for changes to technologies over time and learning effects.

These challenges are not universal and there are ongoing improvements to the availability of high-quality data collections for medical devices and procedures including registries and electronic health record systems. When possible, the highest quality data should be used.

Common challenges in rare diseases include:

a lack of systematic identification of the target population
small sample sizes or the need to combine multiple sources of data with different data models and data collection processes
a lack of agreed common data elements
substantial variation in natural history of disease
complex treatment pathways.

Study planning

Defining the research question

Evidence developers should clearly specify their research question irrespective of the study design. While the specific elements of the research question will vary, the following are common to many study designs:

conceptual definitions of key study variables including, as relevant, population eligibility criteria, interventions or exposures, outcomes (patient or system outcome) and covariates (including confounders and effect modifiers)
subgroups, including specifying whether the subgroup categories are validated or commonly used in the relevant area of research
the target quantity that is to be estimated, for example, disease prevalence or average effect of adhering to an intervention on overall survival.

Patient outcomes should reflect how a patient feels, functions or how long a patient lives. This includes clinical outcomes such as survival as well as patient-reported outcomes. Outcomes should be reliable and valid for the context of use. Choice of outcomes may be supported by high-quality core outcome sets such as those listed in the Core Outcome Measures in Effectiveness Trials (COMET) database.

The target quantity to be estimated should address the overall research question of interest. For example, prevalence reflects the proportion of a population who might need access to services at a point in time. It is a function of the incidence and duration of the condition, and may be useful for public health planning. Incidence rates capture rates of events across different subgroups or among those with different exposures, but assume a constant rate across defined time intervals. The plausibility of the average rate should therefore be considered.
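
As a worked illustration of the relationship between incidence, duration and prevalence, the Python sketch below computes an incidence rate as events per person-year and derives prevalence from the steady-state relation P / (1 - P) = I x D. That relation assumes a stable population with constant incidence and mean duration; this is an assumption of the example rather than part of this guidance, and all numbers are hypothetical.

```python
def incidence_rate(events: int, person_years: float) -> float:
    """Events per person-year of follow-up (assumes a constant rate over the interval)."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years


def steady_state_prevalence(rate: float, mean_duration_years: float) -> float:
    """Prevalence in a stable population: P / (1 - P) = I * D, so P = I*D / (1 + I*D)."""
    prevalence_odds = rate * mean_duration_years
    return prevalence_odds / (1 + prevalence_odds)


# Hypothetical example: 120 new cases over 40,000 person-years, mean duration 5 years.
rate = incidence_rate(120, 40_000)
prevalence = steady_state_prevalence(rate, 5.0)
print(f"Incidence: {rate * 1000:.1f} per 1,000 person-years")
print(f"Approximate prevalence: {prevalence:.2%}")
```

When prevalence is low, P is close to I x D, which is the simpler approximation often quoted.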

For non-randomised studies of comparative effects, developers should provide clear justification for the study, considering reasons for the absence of randomised evidence, the limitations of existing trials and the ability to produce robust real-world evidence for the research question.

Planning study conduct

Developers should aim to pre-specify as much of the study plan as possible. Protocols should describe the objectives of the study, data identification or collection, data curation, study design and analytical methods for all pre-planned analyses including subgroup and sensitivity analyses. We recognise that the complexity of data curation in many real-world evidence studies means not all analytical decisions can be pre-specified. When decisions will be driven by the data, these should be clearly described and planned approaches justified. The Harmonized protocol template to enhance reproducibility (HARPER) tool provides a protocol structure for supporting transparent and reproducible real-world studies of comparative effects.

Planning studies before conduct improves the quality of studies and can reduce the risk of developers performing multiple analyses and selecting those producing the most favourable results.

Pre-specifying analysis plans is especially important for studies of comparative effects. For such studies, we encourage publishing the study protocol on a publicly accessible platform, with any changes to the protocol registered and justified. We do not recommend a specific platform but options include the ClinicalTrials.gov database, the International Standard Randomised Controlled Trial Number (ISRCTN) registry, the HMA-EMA Catalogues of real-world data sources and studies (replaces European Union electronic Register of Post-Authorisation Studies [EU-PAS]), and the Open Science Framework (OSF). Some of these databases are currently more suited to real-world evidence studies than others.

Further guidance on registration of study protocols is provided by the Real-World Evidence Transparency Initiative. NICE's Advice service provides advice on how technology developers can make best use of real-world data as part of their evidence-generation plans.

When planning the study, developers should consider any equality or diversity issues that should be addressed in design, analysis, or interpretation of the study.

Choosing fit-for-purpose data

Developers should justify the selection of the final data sources, ensuring the data is of good provenance and fit for purpose for the research question (see the section on assessing data suitability).

We encourage developers to identify candidate data sources through a systematic, transparent and reproducible search, including:

pre-specification of search strategy and defined criteria for selection and prioritisation of datasets
expert consultation to inform the search strategy and selection criteria and to highlight known suitable datasets
an online search and systematic literature search, and correspondence with lead authors of relevant publications, when necessary, to gain information on access to and suitability of potential data sources
a direct search of data sources; in the UK, this may be supported by registries of data sources such as the Health Data Research UK Innovation Gateway
a flow diagram outlining the total number of potential data sources identified and the number excluded and reasons why (including for reasons of poor data suitability and feasibility of access).

This approach can be informed by the considerations outlined in the section on assessing data suitability or by following external guidance (Hall et al. 2012, Gatto et al. 2021).
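
One simple way to keep the counts needed for the data source flow diagram is to log every candidate source with its exclusion reason as the search proceeds. The sketch below is hypothetical: the record structure, source names and exclusion reasons are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateSource:
    name: str
    exclusion_reason: Optional[str] = None  # None means the source was retained


def flow_counts(candidates: list) -> dict:
    """Summarise identified, excluded (by reason) and retained data sources."""
    reasons = Counter(c.exclusion_reason for c in candidates if c.exclusion_reason)
    return {
        "identified": len(candidates),
        "excluded": sum(reasons.values()),
        "excluded_by_reason": dict(reasons),
        "retained": [c.name for c in candidates if c.exclusion_reason is None],
    }


# Hypothetical search results; names and reasons are illustrative only.
candidates = [
    CandidateSource("Registry A"),
    CandidateSource("EHR database B", "no linkage to outcome data"),
    CandidateSource("Hospital dataset C", "access not feasible in time"),
    CandidateSource("Survey D", "no linkage to outcome data"),
]
summary = flow_counts(candidates)
print(summary)
```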

The efforts made to identify data sources should be proportional to the overall importance of the study. We also recognise that currently, registries of data sources are not always available or may have limited metadata.

Data should be accessed and used in accordance with local law, governance arrangements, codes of practice and requirements of the data controller. In the UK, the Health Research Authority (HRA) provides guidance around research and use of data in accordance with the UK Policy Framework for Health and Social Care Research.

Making early contact with data controllers and data processors is prudent to ensure data is available when needed. Developers should ensure they have appropriate ethical (or other) approval for the research study if needed. Developers should also create a plan for sharing data with independent researchers and NICE collaborating centres, when appropriate.

Data collection

For some use cases, primary data collection may be needed. Examples include:

a new observational cohort study
additional data collection to complement an existing data source, for example, adding a quality-of-life questionnaire to a patient registry or performing a subsample validation study
a health survey.

When planning primary data collection, consider how to implement this collection in a patient-centred manner while minimising the burden on patients and healthcare professionals. Assess the feasibility of additional data collection before proceeding.

Sampling methods reduce the burden of data collection but can introduce selection bias. Methods such as simple random sampling support external validity but tend to be feasible only when the target population is small and homogeneous. Alternative sampling techniques are available, for example:

Stratified selection divides the target population into subgroups based on important characteristics, such as prognostic factors or treatment effect modifiers, and samples from each stratum to ensure representation of all important subgroups.
Balanced sampling for site selection considers important variation across sites in the target population. Recruitment focuses on sufficient representation of sites within each subgroup. Potential sites are ranked, allowing for quick identification of replacements due to non-participation.
Purposive sampling selects individuals because they are likely to be informative, for example, to investigate heterogeneity across characteristics or settings, rather than to generalise findings to a larger population. This approach is common in qualitative research.
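
Stratified selection can be sketched as below, assuming a list of units and a function that returns each unit's stratum. The equal allocation of 20 units per stratum and the age-band stratification are illustrative choices. Proportional allocation is a common alternative, and estimates for the whole target population would then need sampling weights.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=2024):
    """Draw a simple random sample of up to n_per_stratum units from each stratum."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical population of (patient_id, age_band); age band is the stratification factor.
population = [(i, "under 65" if i % 4 else "65 and over") for i in range(200)]
sample = stratified_sample(population, stratum_of=lambda p: p[1], n_per_stratum=20)
```

Without stratification, a simple random sample of 40 from this population would contain about 10 people aged 65 and over on average; stratifying guarantees 20.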

Data collection should follow a predefined protocol and quality assurance processes should be put in place to ensure the integrity and consistency of data collection. This also applies to the extraction of structured information in retrospective chart reviews or when using data science methods to derive structured data elements from already collected data sources.

Data collection should follow best-practice standards for 'Findable, Accessible, Interoperable, and Reusable (FAIR)' data using open data standards (see the UK Health Data Research Alliance's Data Standards White Paper 2021).

Data should be collected, stored, processed and deleted in accordance with the current data protection laws with appropriate transparency information provided and safeguards implemented. Approvals from the HRA or local organisation review and agreement as appropriate should be in place. When appropriate, consent from participants should be provided.

Please refer to Health Research Authority guidance on governance requirements and data protection regulation for research and non-research use of healthcare data.

Study conduct

Choosing study design and analytical methods

Real-world data can be used to generate several types of evidence including disease prevalence or incidence, healthcare utilisation or costs, treatment pathways, and patient characteristics, outcomes, and experiences (see the section on use cases for real-world data). The appropriate study designs and analytical methods used should be relevant to the research question and reflect the characteristics of the data, including:

the nature and distribution of the outcome variable
sample size
the structure of the data including data hierarchies or clustering (for example, patients may be clustered within hospitals or data may be collected on a patient at multiple timepoints)
heterogeneity in outcomes across population groups
whether data is cross-sectional or longitudinal.
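
Clustering matters because observations within the same hospital tend to be correlated, so they carry less information than the same number of independent observations. One common way to quantify this, assuming equal-sized clusters and a known intraclass correlation coefficient (ICC), is the Kish design effect. The values below are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information as n clustered ones."""
    return n / design_effect(mean_cluster_size, icc)


# Hypothetical: 5,000 patients in 50 hospitals (100 per hospital), ICC = 0.05.
deff = design_effect(100, 0.05)                  # 1 + 99 * 0.05 = 5.95
n_eff = effective_sample_size(5_000, 100, 0.05)  # about 840
```

Even a small ICC can shrink the effective sample size considerably when clusters are large, which is why methods that account for clustering (such as multilevel models or cluster-robust standard errors) are often needed.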

Diagnostic checks should be used to assess the appropriateness of the selected statistical model, if relevant. The appropriate checks will depend on the purpose of the study and methods used.
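
As one example of a diagnostic check, count outcomes analysed with a Poisson model can be checked for overdispersion using the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch assumes fitted values are already available from the model (here, an intercept-only model whose fitted value is the sample mean) and uses made-up counts.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom for a Poisson model."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


# Hypothetical hospital admission counts; an intercept-only Poisson model fits the mean.
counts = [0, 1, 0, 2, 9, 0, 1, 7, 0, 0]
mean_count = sum(counts) / len(counts)
ratio = pearson_dispersion(counts, [mean_count] * len(counts), n_params=1)
# A ratio well above 1 suggests overdispersion, so a negative binomial or
# quasi-Poisson model may be more appropriate than a plain Poisson model.
```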

Further information on the design and analysis of comparative effect studies is provided in the methods section.

Minimising risk of bias

Threats to internal validity from sources of bias should be identified and addressed through data collection and analysis as appropriate. Key threats to internal validity come from selection, information, confounding and other biases depending on the use case (see the section on risk of bias).

The risk of bias from using a particular data source will be informed by the information considered during the data suitability assessment.

More detailed guidance on minimising bias in studies of comparative effects is provided in the methods section.

Assessing robustness of study results

Developers should seek to minimise bias at the study design and analysis stages. However, because of the range of possible biases and the complexity of some real-world data sources and analytical methods, some concerns about residual bias will often remain.

Sensitivity analyses should reflect areas with the greatest concerns about risk of bias, or when data curation or analytical decisions were made despite notable uncertainty. Common considerations include:

varying operational definitions of key study variables
differing time windows to define study variables and follow up
using alternative patient eligibility criteria
addressing missing data and measurement error
alternative model specifications
addressing treatment switching or loss to follow up
adjusting for non-adherence.
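
Sensitivity analyses that vary a definition are easiest to report when the same analysis is rerun over a pre-specified set of values. The sketch below varies the lookback window used to define exposure and recomputes a crude risk ratio for each window. The data, the window lengths and the use of a crude (unadjusted) estimator are hypothetical simplifications.

```python
from typing import NamedTuple, Optional


class Patient(NamedTuple):
    days_since_last_rx: Optional[int]  # days before index date; None if never prescribed
    had_outcome: bool


def is_exposed(patient: Patient, lookback_days: int) -> bool:
    """Exposed if the most recent prescription falls within the lookback window."""
    days = patient.days_since_last_rx
    return days is not None and days <= lookback_days


def risk(group: list) -> float:
    return sum(p.had_outcome for p in group) / len(group)


def crude_risk_ratio(patients: list, lookback_days: int) -> float:
    exposed = [p for p in patients if is_exposed(p, lookback_days)]
    unexposed = [p for p in patients if not is_exposed(p, lookback_days)]
    return risk(exposed) / risk(unexposed)


# Hypothetical cohort.
patients = [
    Patient(30, True), Patient(60, True), Patient(45, False), Patient(80, False),
    Patient(150, True), Patient(170, False), Patient(300, False), Patient(350, False),
    Patient(None, True), Patient(None, False), Patient(None, False), Patient(None, False),
]

# Main analysis uses 180 days (hypothetical); 90 and 365 days are sensitivity analyses.
results = {window: crude_risk_ratio(patients, window) for window in (90, 180, 365)}
for window, rr in results.items():
    print(f"{window}-day lookback: RR = {rr:.2f}")
```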

If concerns about residual bias remain high and impact on the ability to make recommendations, developers could consider using quantitative bias analysis. These methods provide quantitative estimates of the impact of bias on study results (Lash et al. 2014). If external data on bias is incorporated, this should be identified in a transparent and systematic way. For parameters of economic models including relative effects, sensitivity analysis may consider the impact of bias on cost effectiveness as well as the parameter value.
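
A simple form of quantitative bias analysis, of the kind described by Lash and colleagues, adjusts an observed risk ratio for a single unmeasured binary confounder. The sketch assumes the confounder does not modify the effect of exposure, and that its prevalence in each exposure group and its association with the outcome can be specified (for example, from external data). The values used are hypothetical.

```python
def bias_factor(rr_confounder_outcome, prev_exposed, prev_unexposed):
    """Bias from an unmeasured binary confounder (no effect modification assumed)."""
    num = prev_exposed * (rr_confounder_outcome - 1) + 1
    den = prev_unexposed * (rr_confounder_outcome - 1) + 1
    return num / den


def bias_adjusted_rr(rr_observed, rr_confounder_outcome, prev_exposed, prev_unexposed):
    """Observed risk ratio divided by the bias factor."""
    return rr_observed / bias_factor(rr_confounder_outcome, prev_exposed, prev_unexposed)


# Hypothetical: observed RR 1.8; the confounder doubles outcome risk and is
# present in 40% of exposed but 20% of unexposed patients.
adjusted = bias_adjusted_rr(1.8, 2.0, 0.4, 0.2)

# Repeat across plausible bias parameters rather than relying on a single guess.
for rr_cd in (1.5, 2.0, 3.0):
    print(rr_cd, round(bias_adjusted_rr(1.8, rr_cd, 0.4, 0.2), 2))
```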

Using proportionate quality assurance processes

Quality assurance of data management, analytical code and analysis is essential to ensure the integrity of the study and reduce the risk of coding errors. Quality assurance processes should be proportional to the risks of the study.

For further information on quality assurance, see the Office for National Statistics' Quality assurance of code for analysis and research and the UK Government's Aqua Book. Quality assurance may be supported by using validated analytical platforms.

Study reporting

Reporting of studies should be sufficient to enable an independent researcher with access to the data to reproduce the study, interpret the results, and fully understand its strengths and limitations. Several reporting checklists identify key reporting items for:

observational studies (see the EQUATOR network for reporting checklists by study design, and the Strengthening the reporting of observational studies in epidemiology [STROBE] guidelines)
observational studies of routinely collected data (Reporting of studies conducted using observational routinely collected data [RECORD])
studies of comparative effects (the RECORD statement for pharmacoepidemiology [RECORD-PE]; although this tool was initially designed for pharmacoepidemiological studies, the items are relevant to other comparative studies).

Also, the STaRT-RWE tool has been developed to support the presentation of study data, methods and results across use cases.

Below we describe key issues across data sources, data curation, methods and results that are especially important to cover in reporting the study.

Reporting on data sources

Sufficient information should be provided to understand the data source, its provenance, and quality and relevance in relation to the research questions. This should be informed by the considerations described in the data suitability assessment.

Developers should also provide:

details of ethical (or other) approval for the research study, or an explanation of why such approval was not necessary
a statement that the data was accessed and used in accordance with approvals and information governance requirements
a description of how others can access the data (that is, a data sharing statement; for an example, see the BMJ policy on data sharing).

Reporting on data curation and analysis

Many real-world evidence studies, especially those using routinely collected data, need considerable processing (or curation) before analysis. The decisions made in data curation (including linkages, transformations and exclusions) may have substantial effects on study results. Data curation should be well described, so that reviewers can understand what was done and how it may affect results. Wherever possible, this should include any curation performed before the evidence developer accessed the data. When human abstraction, rule-based natural language processing or artificial intelligence tools are used to extract variables from unstructured data, the methods and processes used should also be clearly described and their validity documented (see the section on data accuracy).

For each individual study, developers should provide information on the software used to perform analyses, including the source, version, owner, system and any software dependencies (for example, external packages). Ideally, analytical code should follow best practice in code structure, formatting and commenting, using language-specific style guides where applicable (such as the PEP 8 – Style Guide for Python Code). It should also be publicly available (for example, through a code repository such as GitHub) or made available on request to enable reproduction.
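
In Python, for example, the interpreter version, operating system and installed package versions can be captured with the standard library (sys, platform and importlib.metadata) and saved alongside the results. The package names below are placeholders for whatever the analysis actually imports.

```python
import json
import platform
import sys
from importlib import metadata


def software_environment(packages):
    """Record interpreter, operating system and package versions used for an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Package names are illustrative; list whatever the analysis actually imports.
env = software_environment(["numpy", "pandas", "statsmodels"])
print(json.dumps(env, indent=2))
```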

It may not be feasible to provide fully open code in all situations, for instance, when using proprietary software or identifiable personal information. Developers should provide clear information on the methods used and their validity. They should also seek to provide access to the algorithms necessary to replicate and validate the analyses on request, with necessary intellectual property protections in place.

Trust in the integrity of study conduct can be further improved by providing evidence that the study was done appropriately, for example, by showing an audit trail of the analysis, if this is feasible. This could demonstrate, for instance, that developers prepared analysis and finalised protocols before the relevant results were revealed (MacCoun and Perlmutter 2015).
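
One lightweight way to support an audit trail, sketched below under the assumption that analysis files are held locally, is to record a SHA-256 digest and UTC timestamp of the protocol and analysis code at key milestones, such as when the protocol is finalised and before results are generated. The function and file names are hypothetical. A self-maintained log like this is weaker evidence than timestamps held by a third party, such as a version control host or a protocol registry.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(path):
    """SHA-256 digest of a file, read in chunks so large files are handled."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_event(log_path, event, files):
    """Append a timestamped record of file digests to an append-only log file."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "sha256": {f: fingerprint(f) for f in files},
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


# Hypothetical usage:
# log_event("audit_log.jsonl", "protocol finalised", ["protocol.docx", "analysis.py"])
```

Any later change to a logged file produces a different digest, so reviewers can check that the files used match those recorded at each milestone.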

Reporting on methods

Below we describe key items that should be reported. This information should be presented for all analyses including subgroup and sensitivity analyses. Methods should be consistent with the study protocol, and deviations should be identified and justified.

Study design

Clear operational definitions should be given for all study variables and details of follow up, if relevant. Study variables typically include patient eligibility criteria, interventions or exposures, outcomes and covariates.

For each variable, information should be provided on:

the operational definition of the variable including code lists and algorithms when possible
- how code lists or algorithms have been developed and, when possible, validated
the time period over which information for each variable is sought, defined in relation to an index date (for example, 12 months before starting treatment)
the grace period between observations that are assumed to represent continued use of an intervention, if relevant.
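
The grace period is often applied when building continuous-use episodes from prescription or dispensing records. The sketch below, with hypothetical records, merges dispensings into one episode when a new supply starts within the grace period after the previous supply runs out. When supplies overlap it takes the later end date rather than carrying stockpiled supply forward; that is a design choice that should itself be reported.

```python
from datetime import date, timedelta


def exposure_episodes(dispensings, grace_days):
    """Merge (dispensing_date, days_supplied) records into continuous-use episodes.

    Each episode is (start, end), where end is the day after supply runs out.
    A new dispensing extends the current episode if it starts no more than
    grace_days after that end; otherwise a new episode begins.
    """
    episodes = []
    for start, supply in sorted(dispensings):
        end = start + timedelta(days=supply)
        if episodes and start <= episodes[-1][1] + timedelta(days=grace_days):
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], end))
        else:
            episodes.append((start, end))
    return episodes


# Hypothetical prescriptions, each with a 28-day supply.
rx = [(date(2023, 1, 1), 28), (date(2023, 2, 10), 28), (date(2023, 5, 1), 28)]
print(exposure_episodes(rx, grace_days=14))
```

With a 14-day grace period the first two prescriptions form one episode; with no grace period all three are separate, which is why the grace period should be reported and often varied in sensitivity analyses.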

For studies of comparative effects, the process by which potential confounders were identified should be described alongside assumptions about the causal relationships between study variables.

The following information on follow up should be described when applicable:

the start and end of follow up in relation to the index date
for interventions, assumptions about the minimum time between intervention and outcome occurrence (latency period) and the likely duration of effects (exposure-effect window).
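
Once latency and exposure-effect windows are specified, they can be applied mechanically. The sketch below, with hypothetical dates and window lengths, derives the period in which an outcome is attributed to exposure and checks individual outcome dates against it.

```python
from datetime import date, timedelta
from typing import Optional


def at_risk_window(exposure_start, exposure_end, latency_days, effect_days):
    """Period in which an outcome is attributed to exposure.

    Starts latency_days after exposure starts and ends effect_days after
    exposure stops (the exposure-effect window).
    """
    return (exposure_start + timedelta(days=latency_days),
            exposure_end + timedelta(days=effect_days))


def outcome_attributed(outcome_date: Optional[date], window) -> bool:
    """True if the outcome occurred within the at-risk window (inclusive)."""
    start, end = window
    return outcome_date is not None and start <= outcome_date <= end


# Hypothetical: 30-day latency; effects assumed to persist 60 days after stopping.
window = at_risk_window(date(2023, 1, 1), date(2023, 6, 30), latency_days=30, effect_days=60)
print(window)
print(outcome_attributed(date(2023, 1, 15), window))  # falls within the latency period
print(outcome_attributed(date(2023, 8, 1), window))   # falls within the exposure-effect window
```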

In longitudinal studies, this information can be usefully summarised using a study design diagram (Schneeweiss et al. 2019). The Reproducible evidence: practices to enhance and achieve transparency (REPEAT) initiative's project page hosts the paper and design diagram templates.

Statistical methods

The statistical methods used should be clearly described. Information should be sufficient to:

understand what methods were used and why they were chosen
demonstrate the validity of modelling assumptions
understand how the analysis addresses different risks of bias including selection bias, information bias and, if relevant, confounding (also see the section on quality appraisal).

Reporting results

The following information should be presented in all studies:

flow (or patient attrition) diagrams to report the number of patients at each stage of the study from raw data to the final analytical sample, with reasons for exclusion
patient characteristics (including missing data) and details of follow up including event rates (or other distributional information on outcomes); for comparative studies, these should be presented across groups or levels of exposure and, if relevant, before and after adjustment
differences in patient characteristics between the analytical sample and the target population.
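
The counts for an attrition diagram can be produced by applying eligibility criteria one at a time and recording the number remaining and excluded at each step. The records and criteria below are hypothetical. Because criteria are applied in order, the number excluded at each step depends on that order, which should also be reported.

```python
def apply_exclusions(records, criteria):
    """Apply eligibility criteria in order, recording counts for an attrition diagram.

    criteria is a list of (label, keep_predicate) pairs; records failing a
    predicate are excluded at that step and are not counted again later.
    """
    flow = [("Records in raw data", len(records), 0)]
    remaining = list(records)
    for label, keep in criteria:
        kept = [r for r in remaining if keep(r)]
        flow.append((label, len(kept), len(remaining) - len(kept)))
        remaining = kept
    return remaining, flow


# Hypothetical records and criteria.
records = [
    {"age": 45, "prior_lookback_days": 400, "has_diagnosis": True},
    {"age": 17, "prior_lookback_days": 500, "has_diagnosis": True},
    {"age": 60, "prior_lookback_days": 100, "has_diagnosis": True},
    {"age": 70, "prior_lookback_days": 800, "has_diagnosis": False},
    {"age": 52, "prior_lookback_days": 365, "has_diagnosis": True},
]
criteria = [
    ("Has a qualifying diagnosis", lambda r: r["has_diagnosis"]),
    ("Aged 18 or over at index", lambda r: r["age"] >= 18),
    ("At least 365 days of prior data", lambda r: r["prior_lookback_days"] >= 365),
]
cohort, flow = apply_exclusions(records, criteria)
for label, n_remaining, n_excluded in flow:
    print(f"{label}: {n_remaining} remaining ({n_excluded} excluded)")
```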

Results should include central-point estimates, measures of precision and other relevant distributional information if needed. Results should be presented for the main analysis and all subgroup and sensitivity analyses. It should be clear which of these analyses were pre-specified and which were not. For analyses that use adjustment to deal with confounding, unadjusted results should also be presented.
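
As an illustration of reporting a point estimate with a measure of precision, the sketch below computes an unadjusted risk ratio with a Wald 95% confidence interval on the log scale from a 2 × 2 table. The counts are hypothetical, the formula requires non-zero event counts in both groups, and adjusted estimates would instead come from the chosen analytical model.

```python
import math


def risk_ratio_ci(events_exp, n_exp, events_unexp, n_unexp, z=1.96):
    """Crude risk ratio with a Wald confidence interval computed on the log scale."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    se_log = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 30/200 events in exposed, 20/250 in unexposed.
rr, lower, upper = risk_ratio_ci(30, 200, 20, 250)
print(f"Unadjusted RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```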

Ensure that information in figures and tables cannot inadvertently identify patients. The Office for National Statistics has guidance on maintaining confidentiality when disseminating health statistics.
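
A common safeguard is suppressing small counts before publication. The sketch assumes a threshold of 10, which is illustrative only: the rule to use should come from the data controller's requirements or the ONS guidance. It also does not handle secondary suppression, where a suppressed cell could be recovered by subtraction from row or column totals.

```python
def suppress_small_counts(table, threshold=10):
    """Replace counts from 1 to threshold - 1 with a '<threshold' marker.

    Zero counts are left unchanged here; some disclosure rules treat them
    differently, so check the applicable rule before relying on this.
    """
    return {key: (f"<{threshold}" if 0 < n < threshold else n) for key, n in table.items()}


# Hypothetical counts by region.
counts_by_region = {"North": 154, "Midlands": 7, "South": 0, "London": 62}
print(suppress_small_counts(counts_by_region))
```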

Interpreting the results

Provide information to help interpret what the results mean. Discuss limitations in data sources, study design and analysis.

Communicating real-world evidence studies clearly

Real-world evidence studies can be technically complex. To help readers understand them, studies should be documented clearly by:

following advice on writing understandable scientific material (see Gopen and Swan 1990, Greene 2013)
avoiding jargon; if this is not possible, explain terms in plain English
avoiding abbreviations (see Narod et al. 2016)
labelling tables, graphs, and other non-text content clearly and explaining how to interpret them.