3 Committee discussion

The diagnostics advisory committee considered evidence on Deep Ensemble for Recognition of Malignancy (DERM) to assess and triage skin lesions referred to the urgent suspected skin cancer pathway from several sources, including an early value assessment (EVA) report by the external assessment group (EAG), and an overview of that report. Full details are in the committee papers for this guidance on the NICE website.

Unmet need

3.1

In the UK, dermatology services receive 1.2 million referrals each year from primary care. About 60% are urgent referrals for suspected skin cancer. Of these, only 6% are confirmed to be skin cancer and the remaining 94% are either non-urgent or non-cancer cases. NHS England's July 2024 NHS referral to treatment waiting times report noted that there was a backlog of 441,000 elective referral appointments within dermatology services, with only 63% occurring within the 18‑week target. Additionally, the Getting It Right First Time (GIRFT) dermatology national report (August 2021) stated that there is a national shortage of consultant dermatologists. It reported that only 659 consultant dermatologists were working in the NHS in England, and there were 159 full-time vacancies with at least 10 NHS trusts having no dermatology consultants at all. The report highlighted that the high number of urgent referrals combined with staff shortages has resulted in delays in diagnosis and care for people with non-cancer, non-urgent inflammatory skin conditions that need face-to-face assessment. The committee heard about the effect this can have on the quality of life and health outcomes of people with non-cancer dermatological conditions, such as psoriasis. Depending on the local services, urgent suspected skin cancer lesions are seen either in a face-to-face dermatology appointment or through teledermatology. NHS England's teledermatology roadmap supports local NHS systems to accelerate the roll out of teledermatology to help manage demand and reduce face-to-face appointments. Artificial intelligence (AI) technologies used within a teledermatology service could potentially increase staff capacity in dermatology services to help address the unmet need.
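The referral percentages above can be turned into approximate annual counts. The sketch below is illustrative only: it assumes the rounded figures quoted in this section apply exactly, and the function name is hypothetical.

```python
def urgent_pathway_breakdown(total_referrals: int, urgent_pct: int, cancer_pct: int) -> dict:
    """Split annual referrals into urgent, confirmed-cancer and other counts.

    Integer arithmetic is used so the rounded percentages give whole numbers.
    """
    urgent = total_referrals * urgent_pct // 100
    confirmed_cancer = urgent * cancer_pct // 100
    return {
        "urgent_referrals": urgent,
        "confirmed_cancer": confirmed_cancer,
        "non_urgent_or_non_cancer": urgent - confirmed_cancer,
    }


# Figures quoted in the text: 1.2 million referrals, 60% urgent, 6% confirmed.
breakdown = urgent_pathway_breakdown(1_200_000, 60, 6)
for name, count in breakdown.items():
    print(f"{name}: {count:,}")
```

On these assumptions, most people on the urgent pathway each year would not have a confirmed skin cancer, which is the capacity issue this section describes.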

Patient considerations

3.2

The EAG's report noted that people who were offered an assessment using DERM were generally supportive of AI technologies being used as part of their assessment (for example, as a decision support tool). But many would prefer to also have a face-to-face dermatology appointment. The Unity Insights Skin Analytics evaluation report included interviews with 21 people who were offered assessment with DERM, from across 4 sites that used DERM. The people interviewed expressed high levels of satisfaction with the service, particularly because of how quickly they were assessed. For some people, the reassurance of a healthcare professional review was important for their acceptance of AI. Similar patient views were also noted in Edge Health's report on evaluating AI implementation in the NHS, a study on patient perspectives by Kawsar et al. (2023), and a patient sentiment report by the company. The lay members of the committee expressed their preference for a face-to-face assessment of suspicious lesions because they perceived it to be a more comprehensive assessment. They expressed concern about the early use of AI technologies, particularly if they are used without a healthcare professional review. They were particularly concerned about the potential for misdiagnosis because skin cancer can be life-threatening, meaning there are high risks associated with missed or delayed diagnoses. They were concerned that people with a skin lesion identified as non-cancer by an AI technology alone may not trust the decision and may re-present in primary care. Stakeholders and the clinical experts highlighted that education for people using the service and clear communication are important in building people's trust in AI technologies. Education should provide people using the service with enough information on the waiting times and performance of AI compared with face-to-face and teledermatology assessments. This will allow people to make an informed decision on whether to accept or decline the use of AI technology in their assessment.

Healthcare professional considerations

3.3

The Unity Insights Skin Analytics evaluation report included interviews with 18 healthcare professionals at 4 NHS trusts. The interviews aimed to understand healthcare professionals' views on the use and acceptability of DERM. They broadly accepted DERM and recognised the positive impact it could have on dermatology capacity and waiting times, by discharging non-cancer cases from the urgent suspected skin cancer pathway. They also felt reassured by a healthcare professional reviewing the DERM assessment but were accepting of eventually removing the healthcare professional review, if data matures over time and supports automated use of DERM. Other healthcare professionals using DERM highlighted the benefits of reduced biopsies, increased discharge rates without the need for healthcare professional review and improved overall efficiency in the dermatology service. But, some healthcare professionals were concerned about DERM over-labelling low-risk lesions as high risk, which could result in increased biopsy rates.

3.4

Clinical experts highlighted the benefit of face-to-face dermatologist assessment enabling full body assessment if deemed necessary, which has the potential to identify additional lesions. However, experts acknowledged that capacity issues in the system meant that full body assessments were currently not practical.

DERM diagnostic accuracy

3.5

The committee noted that DERM is designed to identify non-cancer lesions and discharge them from the urgent suspected skin cancer pathway. But, the committee agreed that sensitivity for detecting cancer lesions is as important as sensitivity for detecting non-cancer lesions. This is because a test with a high sensitivity for cancer lesions will have a low number of false-negative results, that is, missed cancer lesions. The company's data from NHS services that are already using DERM (collected from April 2020 to November 2023) showed that automated DERM has a 97% sensitivity for detecting cancer lesions and a 95% sensitivity for detecting melanoma. This data was collected from 85,955 lesions (72,390 people). But only 27,747 of these lesions were assessed in secondary care using a recent version of DERM that included final outcomes that could be used to calculate sensitivity. Thomas et al. (2023) reported data on automated DERM used at 2 NHS trusts, which found sensitivity for detecting cancer lesions ranged between 96% and 100%. DERM‑005 (Marsden et al. 2024) reported a sensitivity of 94.0% (95% confidence interval [CI]: 84.7 to 98.1) for automated DERM in a real-world setting compared with 97.0% (95% CI 88.7 to 99.5) for teledermatology. DERM‑003 (Marsden et al. 2023) reported sensitivities for detecting cancer lesions of 96.0% (95% CI 92.6 to 98.0) for automated DERM compared with 93.8% (95% CI 90.0 to 96.3) for face-to-face dermatologist review. The committee had some concerns around the risk of bias for the reference standard in DERM‑003 because only 1 dermatologist provided the clinical diagnosis used as the ground truth for non-biopsied lesions. Neither DERM‑003 nor DERM‑005 did a test of equivalence, so it is not certain that DERM has equivalent sensitivity to teledermatology and face-to-face dermatologist review for detection of cancer lesions. But, the confidence intervals overlapped in both studies, and the committee agreed that there is no evidence to suggest that DERM is less sensitive than teledermatology or face-to-face dermatologist review for identifying cancer lesions. The EAG reported that if the sensitivity of automated DERM is 95% then using a healthcare professional review could increase this to around 98%, based on data from Edge Health's report on evaluating AI implementation in the NHS. The committee acknowledged that using DERM with a healthcare professional review could reduce the risk of missing skin cancers, but it was uncertain of the impact of this approach on dermatologist capacity (see section 3.11). The committee concluded that further evidence should be generated on the sensitivity to detect cancer lesions of automated DERM used within a well-established teledermatology service compared with the sensitivity of a well-established teledermatology service alone, as well as the impact this has on dermatologist capacity.
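The link between sensitivity and missed cancers, and the overlap of the reported confidence intervals, can be illustrated with a short sketch. It is not part of the committee's or the EAG's analysis: the function names are hypothetical and the figures are those quoted in section 3.5. Overlapping intervals are not a formal test of equivalence, which is why the uncertainty described in that section remains.

```python
def missed_per_1000(sensitivity_pct: float) -> float:
    """Expected false negatives per 1,000 lesions that truly are cancer."""
    return round(1000 * (100 - sensitivity_pct) / 100, 1)


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True if two (lower, upper) confidence intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals for sensitivity (per cent), as quoted in the text.
studies = {
    "DERM-005": {"automated DERM": (84.7, 98.1), "teledermatology": (88.7, 99.5)},
    "DERM-003": {"automated DERM": (92.6, 98.0), "face-to-face review": (90.0, 96.3)},
}

# Sensitivities mentioned in the text: 95% (automated, EAG scenario),
# 97% (company data) and 98% (with healthcare professional review).
for sensitivity in (95.0, 97.0, 98.0):
    print(f"{sensitivity}% sensitivity: about {missed_per_1000(sensitivity)} missed per 1,000 cancers")

for study, arms in studies.items():
    (label_a, ci_a), (label_b, ci_b) = arms.items()
    print(f"{study}: {label_a} vs {label_b}, intervals overlap: {intervals_overlap(ci_a, ci_b)}")
```

A small change in sensitivity therefore corresponds to a noticeable change in the number of missed cancers when the pathway handles large volumes.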

3.6

The committee recalled that the use case of DERM in this EVA was the triage of skin lesions after referral to the urgent suspected skin cancer pathway. It recalled that the potential benefit would be the safe triaging of non-cancer skin lesions away from the pathway and freeing up dermatologist capacity. Higher specificities would result in higher discharge rates of non-cancer skin lesions. Thomas et al. (2023) reported specificities for detection of cancer lesions ranging from 70.1% to 73.4% for automated DERM at 2 NHS trusts. The second-read reviewer overturned 40% to 50% of the cases that DERM had marked as eligible for discharge. Marsden et al. (2024) reported specificities for detection of cancer lesions of 73.3% (95% CI 69.9 to 76.4) for automated DERM in a real-world setting and 71.9% (95% CI 68.4 to 75.1) for teledermatology. Marsden et al. (2023) reported the specificity for cancer detection with automated DERM to be 45.0% (95% CI 39.5 to 50.6), which was considerably lower than the 77.4% (95% CI 72.4 to 81.8) for face-to-face dermatologist review. These results suggest that DERM used in teledermatology services may have similar discharge rates to teledermatology services alone for triaging non-cancer skin lesions referred to the urgent suspected skin cancer pathway. Compared with face-to-face assessment, results suggest that DERM used in teledermatology services would discharge fewer non-cancer lesions from the urgent suspected skin cancer pathway. The committee concluded that further evidence should be generated on the specificity to detect cancer lesions of DERM used within a well-established teledermatology service compared with the specificity of a well-established teledermatology service alone, as well as the capacity impact this has on dermatology pathways.
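The effect of specificity on discharge rates can be sketched as follows. This is an illustration under stated assumptions, not the EAG's calculation: the function name is hypothetical, the specificities are those quoted in section 3.6, and the second-read adjustment assumes that every overturned case was a true non-cancer lesion that DERM had marked for discharge.

```python
def discharged_per_1000(specificity_pct: float, overturn_pct: float = 0.0) -> float:
    """Non-cancer lesions discharged per 1,000 non-cancer lesions assessed.

    overturn_pct is the share of discharge decisions reversed at second read
    (assumption: all overturned cases come from the true negatives).
    """
    true_negatives = 1000 * specificity_pct / 100
    return round(true_negatives * (100 - overturn_pct) / 100, 1)


# Specificities for detecting cancer lesions (per cent), as quoted in the text.
reported = {
    "automated DERM, Marsden et al. (2024)": 73.3,
    "teledermatology, Marsden et al. (2024)": 71.9,
    "automated DERM, Marsden et al. (2023)": 45.0,
    "face-to-face review, Marsden et al. (2023)": 77.4,
}

for label, specificity in reported.items():
    print(f"{label}: about {discharged_per_1000(specificity)} discharged per 1,000")

# Thomas et al. (2023): lower reported specificity with 40% to 50% overturned.
for overturn in (40.0, 50.0):
    print(f"70.1% specificity, {overturn}% overturned: {discharged_per_1000(70.1, overturn)}")
```

Under these assumptions, a second read that overturns a large share of discharge decisions substantially reduces the number of lesions leaving the pathway, which is why the capacity impact needs further evidence.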

Classification of non-cancer lesions

3.7

The clinical experts raised concerns that DERM's ability to classify non-cancer lesions is limited to 6 types of benign lesions and 2 types of pre-cancer lesions. The technology is not indicated to give a diagnosis of other types of benign lesion, but these would typically be identified through teledermatology or face-to-face dermatologist assessment. Non-cancer lesions may have impacts on quality of life and often require diagnosis and treatment from a GP or through a non-urgent dermatology appointment. If automated DERM triages people away from urgent face-to-face dermatology services, people who enter the urgent suspected skin cancer pathway but are found to have non-cancer, non-urgent inflammatory skin conditions may move from the urgent pathway to the non-urgent pathway rather than being given a diagnosis and treatment advice in their first appointment. There was concern that this may result in delays to diagnosis and management of non-cancer lesions, which does not align with the GIRFT national dermatology report (August 2021) aims for accurate and timely first-time diagnoses. The clinical experts also raised concerns about whether DERM would increase dermatologist capacity if many patients move from the urgent pathway to the non-urgent pathway. The committee concluded that more evidence should be generated to understand the impact of using DERM on clinical capacity for both urgent and routine dermatologist services in the pathway, while it is in clinical use.

Diagnostic accuracy in people with black or brown skin

3.8

The committee was concerned about the diagnostic accuracy of using automated AI technologies to detect skin cancer in people with black or brown skin. There is limited data to validate AI technologies for people with black or brown skin. This is because there is a low incidence of skin cancers among people from Black, Black Caribbean, Black African and Asian ethnic groups. Stakeholders noted that less than 0.5% of skin cancer diagnoses in the UK are in people from Black and Asian ethnic groups. This lower incidence makes it challenging to gather data that is comparable to that available for people from White ethnic groups. The committee noted that high-risk cancers (squamous cell carcinomas and melanoma) are 20 to 30 times more likely to occur in people from White ethnic groups. The company stated that one-third of high-risk cancers are incorrectly referred to non-urgent, routine pathways in which people may wait up to 12 months to be seen by a dermatologist. The company highlighted that this is especially concerning because people from Black, Black Caribbean, Black African and Asian ethnic groups are more frequently placed on these non-urgent waiting lists than people from White ethnic groups. The company stated this is because healthcare professionals have lower diagnostic accuracy when assessing skin lesions in people from these ethnic groups. This extended wait time contributes to delayed diagnoses, which may lead to worse prognoses and outcomes. People from Black and Asian ethnic groups are also more likely to have acral lesions (lesions on the palms of hands and soles of feet) which have a higher risk of cancer. Because AI assessment is not suitable for acral lesions, they are typically referred directly for dermatologist assessment.

3.9

Automated DERM has primarily been evaluated in people with white skin (Fitzpatrick skin types 1 to 3) because of the low incidence of skin cancer in people with black or brown skin. Most studies did not report the proportion of people with different Fitzpatrick skin types, but the DERM‑003 study reported that 0% of people had black skin and the DERM‑005 study reported that 1% of people had black skin. The EAG noted that the company's recent data on using automated DERM in people with brown or black skin (Fitzpatrick skin types 5 and 6) showed that no cancer lesions were missed. This suggested that automated DERM is as diagnostically accurate in people with black or brown skin as it is in people with white skin. The committee emphasised that the amount of data remains small. So, more data is needed on the accuracy of automated DERM in people with black or brown skin to be sure that it does not incorrectly detect (false positive) or miss (false negative) skin cancer. It also noted that it is important to use AI technologies with a healthcare professional review in this group until more data is available. The clinical experts also advised that studies should measure skin tone with spectrophotometry rather than using the Fitzpatrick scale. This is because spectrophotometry is a more accurate way of measuring total melanin content in skin.

Eligibility for assessment with DERM

3.10

The committee noted that a large proportion of skin lesions that are not eligible for DERM assessment can be assessed by teledermatology. The EAG reported that the proportion of lesions excluded from studies because of lesion characteristics ranged from 15.6% to 27.4%. These characteristics included lesions being obscured by hair, and lesions that were mucosal, acral or involving nails. The company's data from NHS services that are already using DERM (collected from April 2020 to November 2023) reported that approximately 25% of lesions on the urgent suspected skin cancer pathway were not eligible for DERM assessment. The company's economic model assumed that fewer people were eligible for assessment by automated DERM than teledermatology (81% compared with 90%). The committee concluded that it was appropriate for further evidence to be generated on the proportion of skin lesions that are eligible for assessment by DERM compared with teledermatology alone.

Impact on referral rates and resource use

3.11

The Unity Insights Skin Analytics evaluation report presented data from 3 NHS sites that used DERM in a post-referral setting (that is, after primary care referral to a dermatology service). This suggested that using automated DERM would lower the number of non-cancer lesions flagged for further review compared with using teledermatology alone. An analysis by the EAG suggested that, of eligible lesions, automated use of DERM could approximately halve the number of referrals to a dermatologist within the urgent skin cancer pathway. The company's early modelling suggested that automated use of DERM could result in more lesions being correctly identified as non-cancer without a biopsy compared with teledermatology or face-to-face assessment. So fewer biopsies would be needed, and people would be correctly discharged from the service. The committee noted that a well-established teledermatology service could also reduce the number of referrals to face-to-face dermatologist appointments. It discussed uncertainties about how using DERM in practice, with or without healthcare professional review, would affect capacity in the pathway compared with a well-established teledermatology service. The committee concluded that more evidence should be generated to understand the impact of using DERM with or without healthcare professional review on clinical capacity.

Potential cost effectiveness of automated DERM

3.12

The company's early modelling suggested that using automated DERM for assessing suspicious skin lesions within a well-established teledermatology service has the potential to be cost effective compared with face-to-face assessment. It is less certain if it would be cost effective compared with a well-established teledermatology service alone. The EAG noted that in the company's economic model, the specificity of teledermatology is a key driver in determining cost effectiveness. A low specificity to detect cancer lesions would result in a high number of lesions referred for further assessment and would increase costs. Specificity of teledermatology to detect cancer lesions is uncertain, with estimates ranging from 35% (taken from real-world data from DERM pilot studies) to 84.3% (taken from the Cochrane review on teledermatology for diagnosing skin cancer in adults). The model assumed that automated DERM has a specificity of 42% based on real-world performance data. Specificity of DERM with a healthcare professional review would be lower than the specificity of automated DERM. So, the cost effectiveness of DERM used within a teledermatology service with and without healthcare professional review compared with teledermatology alone is uncertain.
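Why the specificity of teledermatology drives the result can be shown with a short sketch. It is illustrative only and does not reproduce the company's economic model: the function name is hypothetical, costs are not modelled, and the only inputs are the specificities quoted in section 3.12.

```python
def referred_per_1000(specificity_pct: float) -> float:
    """Non-cancer lesions sent for further assessment per 1,000 assessed."""
    return round(1000 * (100 - specificity_pct) / 100, 1)


DERM_SPECIFICITY = 42.0                      # model assumption quoted in the text
TELEDERMATOLOGY_SPECIFICITIES = (35.0, 84.3)  # lowest and highest quoted estimates

derm_referred = referred_per_1000(DERM_SPECIFICITY)
for tele_specificity in TELEDERMATOLOGY_SPECIFICITIES:
    tele_referred = referred_per_1000(tele_specificity)
    difference = round(derm_referred - tele_referred, 1)
    direction = "fewer" if difference < 0 else "more"
    print(
        f"teledermatology specificity {tele_specificity}%: DERM refers "
        f"{abs(difference)} {direction} non-cancer lesions per 1,000"
    )
```

At the lower estimate automated DERM would refer fewer non-cancer lesions than teledermatology, and at the higher estimate it would refer more, so the direction of any cost difference depends on which estimate is closer to practice.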

Infrastructure costs

3.13

For DERM to be used in the post-referral pathway (that is, after a primary care referral), a local teledermatology service is needed. This is because an accurate DERM assessment relies on staff taking high-quality medical photographs of the suspicious lesion. The staff are typically healthcare assistants, nurses or medical photographers who are trained to capture suitable images. There are costs associated with setting up this infrastructure and for training medical photographers. The committee noted that although there are teledermatology services in many areas, there is variation across the UK and many areas still refer all suspected skin cancer lesions for an urgent face-to-face appointment. With the wider roll out of teledermatology services, these infrastructure costs will likely be incurred regardless of whether DERM is adopted or not.

Conceptual model

3.14

The committee thought that the conceptual model proposed by the EAG was appropriate. It captured the costs and long-term health consequences that are associated with the misdiagnosis of basal cell carcinomas. The committee suggested that a comparison of the costs of using DERM (see section 2.2) with the costs incurred by the NHS for outpatient referrals should be included in the EAG's cost-effectiveness modelling. It also noted that it would be important to consider how increases in staff capacity could be captured in the model, to meaningfully quantify the impact of reducing demand on dermatology services.

Equality considerations

3.15

Skin cancer is more difficult to accurately detect in people with black or brown skin, which has led to poorer outcomes associated with later diagnosis. There is less data in people with black or brown skin because of the lower incidence of skin cancer in these groups. The committee recommended that DERM is used with a healthcare professional review for people with black or brown skin while further evidence is generated on the accuracy in these groups (see sections 3.8 and 3.9 and the evidence generation plan). People need to give informed consent before AI technology can be used in their care, including for assessing skin lesions. Some people may need extra support to understand the information given to them and help them make an informed decision.

Managing risks

3.16

The committee considered the potential risk of missing cancers if automated DERM is used. It recalled that the evidence suggested that DERM would not miss more cancers than a teledermatologist or a face-to-face dermatologist review of individual lesions (see section 3.5). The committee heard the company's and NHS England's suggestions on managing risks and agreed that it would be important to mitigate potential risk while DERM is used within clinical practice. This could be done through using a healthcare professional review, safety-net protocols to prevent missed or delayed cancer diagnoses, regular monitoring of DERM's accuracy, and a national governance framework to ensure local oversight of use of DERM.

Outcome from resolution

3.17

Resolution requests were referred to a resolution panel under the claim that there was a breach of NICE's published process in the section on resolution for medical technologies and diagnostic guidance in NICE's manual for health technology evaluations. The panel considered whether the committee adequately considered the following points described in section 3.28 of NICE's early value assessment interim statement:

  • the views and experiences of people using the technology

  • the likelihood and size of impact of adopting the technologies for the NHS while further data is collected, in terms of both potential benefits and risks

  • if steps proposed to mitigate risk could enable DERM to be used safely in clinical practice while further data is collected.

3.18

The panel noted that the committee found:

  • some acceptance by patients and healthcare professionals, as presented in the EAG's assessment report and in consultation comments (see sections 3.2 and 3.3)

  • acceptable evidence of potential benefits:

    • DERM would not miss more cancers than a teledermatologist or a face-to-face dermatologist review of individual lesions (see section 3.5 and section 3.16)

    • automated use of DERM could approximately halve the number of referrals to a dermatologist within the urgent skin cancer pathway (see section 3.11).

  • that potential risks of using DERM included:

    • whether DERM could free up overall capacity within dermatology services (see section 3.7)

    • whether automated DERM may incorrectly detect or miss skin cancer in people with black or brown skin (see section 3.9)

  • that strategies are available to mitigate potential risk if DERM is used in clinical practice (see section 3.16).

The panel decided that the framework for decision making as described in NICE's early value assessment interim statement was not followed, based on the committee discussion. If the framework for decision making was followed, DERM would have been recommended for use in the NHS while further evidence is generated. So, the recommendations were amended to align with this conclusion.