Evidence generation plan for artificial intelligence (AI) technologies for assessing and triaging skin lesions referred to the urgent suspected skin cancer pathway

3 Approach to evidence generation

3.1 Evidence gaps and ongoing studies

Table 1 summarises the evidence gaps and the status of evidence addressing them.

Table 1 Evidence gaps and status of evidence for Deep Ensemble for Recognition of Malignancy (DERM)

  • Resource and care pathway impact: limited evidence

  • Accuracy of DERM in people with black or brown skin: limited evidence

  • Comparative analysis of the accuracy of DERM and teledermatology: good indirect evidence

3.2 Data sources

NICE's real-world evidence framework provides detailed guidance on assessing the suitability of a real-world data source to answer a specific research question.

Some data may be generated through the technology itself, such as the number of referrals that were assessed by the technology and the diagnostic outcomes predicted by it. This can be integrated with other data collected.

Some data may be generated as part of post-market surveillance activities done by the technology manufacturer. This may include data relating to the number of referrals seen, the distribution of outcomes, or performance data for the technology compared with a pre-defined ground truth.

Local or regional data collections such as NHS England's secure data environments and databases like NHS England's National Cancer Registration and Analysis Service (NCRAS) already measure outcomes specified in this plan. They could be used to collect data to address the evidence gaps. Secure data environments are data storage and access platforms that bring together many sources of data, such as from primary and secondary care, to enable research and analysis. The sub-national secure data environments are designed to be agile and can be modified to suit the needs of new projects.

The quality and coverage of real-world data collections are of key importance when used in research. Active monitoring and follow-up through a central coordinating point is an effective and viable approach to ensure good-quality data with broad coverage.

3.3 Evidence collection plan

Two potential methodological approaches are presented in this section. Each has its own strengths and weaknesses and, depending on the circumstances in which evidence is being generated, either may be the better approach.

Data should be collected that reflects the different ways the technology can be implemented:

  • as an autonomous tool, and

  • with a healthcare professional review.

Real-world comparative cohort study

In this type of study, data should be collected from healthcare services where the artificial intelligence (AI) technology is offered and compared with services where it is not. People in both groups should be followed from the point at which they would typically be offered the AI technology.

The comparison group should include teledermatology services with comparable patient populations and standard care pathways but without access to the AI technology. Ideally, the study should be done across multiple centres to reflect the diversity of NHS service provision.

Non-random assignment to interventions introduces a risk of confounding bias. So, appropriate methods such as matching or adjustment (for example, propensity score methods) should be used to minimise selection bias and balance confounding factors between groups. High-quality data on patient characteristics will be essential to support these methods. The identification of key confounders should be informed by expert input during protocol development.
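As a purely illustrative sketch of the matching step described above (not a prescribed method), the following shows 1:1 nearest-neighbour matching within a caliper on a pre-computed propensity score. All record structures, field names and figures are hypothetical; a real analysis would first estimate the score from patient characteristics, for example with logistic regression:

```python
# Illustrative sketch: 1:1 nearest-neighbour matching on a propensity score.
# All data and field names are hypothetical examples, not a mandated method.

def match_on_propensity(treated, controls, caliper=0.05):
    """Greedily pair each treated record with the closest unmatched control
    whose propensity score lies within the caliper."""
    available = list(controls)
    pairs = []
    for t in sorted(treated, key=lambda r: r["score"]):
        best = None
        for c in available:
            d = abs(t["score"] - c["score"])
            if d <= caliper and (best is None or d < abs(t["score"] - best["score"])):
                best = c
        if best is not None:
            available.remove(best)          # each control is used at most once
            pairs.append((t["id"], best["id"]))
    return pairs

treated = [{"id": "T1", "score": 0.62}, {"id": "T2", "score": 0.35}]
controls = [{"id": "C1", "score": 0.60}, {"id": "C2", "score": 0.90},
            {"id": "C3", "score": 0.33}]
print(match_on_propensity(treated, controls))  # → [('T2', 'C3'), ('T1', 'C1')]
```

Treated records with no control inside the caliper are left unmatched, which is one reason high-quality data on patient characteristics matters: sparse or poorly measured covariates shrink the matched sample.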

Real-world before-and-after implementation study

A before-and-after study design allows for comparisons when there is considerable variation between services in the standards and mode of delivery of teledermatology. It also allows assessment of implementation costs, changes in referral rates, and the proportion of cases that are eligible for assessment by the AI technology.

Before the AI technology is implemented in a teledermatology service, data should be collected about the service, for example:

  • total number of referrals to that service

  • number of those referrals that resulted in a face-to-face appointment with a dermatologist

  • number of biopsies

  • number of referrals that resulted in a cancer lesion diagnosis.

The AI technology should then be implemented in the service, and all implementation and training costs should be recorded. After a period long enough to account for learning effects, the outcomes should be collected again. The after-implementation data collection should also record the number of lesions not eligible for assessment by the AI technology, and the reasons why.
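The before-and-after comparison described above amounts to computing the absolute and percentage change in each service-level outcome between the two periods. A minimal sketch, with invented figures and metric names that mirror the bullet list above:

```python
# Illustrative sketch of a before-and-after outcome comparison.
# All figures are invented; metric names are hypothetical examples.

def outcome_changes(before, after):
    """Absolute and percentage change for each metric present in `before`."""
    changes = {}
    for key in before:
        diff = after[key] - before[key]
        pct = 100.0 * diff / before[key] if before[key] else None
        changes[key] = (diff, pct)
    return changes

before = {"referrals": 1200, "face_to_face": 540, "biopsies": 310}
after = {"referrals": 1250, "face_to_face": 430, "biopsies": 280}
for metric, (diff, pct) in outcome_changes(before, after).items():
    print(f"{metric}: {diff:+d} ({pct:+.1f}%)")
```

A real analysis would also need to account for secular trends (for example, seasonal variation in referral volumes) rather than attributing all change to the technology.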

This study could be done at a single centre with an established teledermatology service or, ideally, replicated across multiple centres. This would show how the AI technology can be implemented across a range of services, representative of the variety in the NHS.

3.4 Data to be collected

The following information has been identified for collection:

  • patient demographics: age, sex, ethnicity, Fitzpatrick skin type (or other validated classification scale)

  • lesion characteristics, for example, melanoma or squamous cell carcinoma (SCC)

  • referral volumes to and from teledermatology services

  • impact on workload, for example, numbers of biopsies, appointments and face-to-face appointments, and healthcare professional time

  • number of lesions identified as benign and discharged

  • time to diagnosis or discharge

  • costs associated with implementing and using the technology, for example, set-up costs, staff needed and training

  • diagnostic accuracy, for both DERM and teledermatology alone

  • cases of diagnostic disagreement between DERM and current NHS practice and, ideally, reasons. For example, lesions that DERM was not able to assess or those identified as a cancerous lesion by DERM that were not identified by teledermatology

  • site characteristics and data that can support matching or adjustment analysis

  • ideally, system indicators such as waiting times.
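One way to keep such a collection consistent across sites is a typed record with basic validation at the point of entry. The sketch below is an assumption about how some of the listed data items might be structured, not a NICE-mandated schema; all field names and encodings are hypothetical:

```python
# Hypothetical minimal record for some of the data items listed above.
# Field names and encodings are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class LesionReferralRecord:
    age: int
    sex: str
    ethnicity: str
    fitzpatrick_type: int               # types I-VI encoded as 1-6
    lesion_type: str                    # e.g. "melanoma", "SCC", "benign"
    derm_output: str                    # diagnostic outcome predicted by DERM
    teledermatology_output: str         # outcome from teledermatology review
    days_to_diagnosis_or_discharge: int

    def __post_init__(self):
        # Reject out-of-range values at entry rather than at analysis time.
        if not 1 <= self.fitzpatrick_type <= 6:
            raise ValueError("Fitzpatrick type must be encoded as 1-6")
        if self.days_to_diagnosis_or_discharge < 0:
            raise ValueError("days to diagnosis or discharge must be >= 0")
```

Validating records as they are captured supports the pre-defined protocol and quality assurance processes described below, and makes cases of diagnostic disagreement (differing `derm_output` and `teledermatology_output` values) straightforward to identify.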

Data collection should follow a pre-defined protocol and quality assurance processes should be put in place to ensure the integrity and consistency of data collection. See NICE's real-world evidence framework, which provides guidance on the planning, conduct and reporting of real-world evidence studies.

Information about the technology

Information about how the technology was developed, the update version tested, and how the effect of future updates will be monitored should also be reported. See the NICE evidence standards framework for digital health technologies.

3.5 Evidence generation period

This will be 3 years to allow for setting up and implementing the AI technologies, and for data collection, analysis and reporting.

3.6 Following best practice in study methodology

Following best practice when doing studies is paramount to ensuring the reliability and validity of the research findings. Following rigorous guidelines and established standards is crucial for generating credible evidence that can improve care. The NICE real-world evidence framework details some key considerations.

In the context of evidence generation, it is important that the informed consent process ensures patients (and their carers, as appropriate) understand that data will be collected to address the evidence gaps in section 2. Where applicable, this should take account of NICE's guidance on shared decision making.
