3 Committee discussion | Artificial intelligence (AI)-assisted echocardiography analysis and reporting to support the diagnosis and monitoring of heart failure: early-use assessment | Guidance

The condition

3.1

Heart failure is common. It affects over 1 million people in the UK, with 200,000 new diagnoses annually and 800,000 people with the condition on GP registers (British Heart Foundation Statistics Factsheet – UK, 2026). Heart failure is when the heart cannot pump blood effectively because of structural or functional abnormalities. This may develop gradually (chronic, often linked to hypertension or diabetes) or suddenly (acute, for example after myocardial infarction, arrhythmia, infection, or uncontrolled hypertension). Heart failure significantly impacts quality of life, and can lead to disability and early death. Around 80% of heart failure diagnoses in England happen in hospital, despite 40% of people having symptoms that could have prompted earlier assessment (British Heart Foundation Statistics Factsheet – UK, 2026). Heart failure is classified by left ventricular ejection fraction (LVEF) measured using echocardiography. Heart failure with:

preserved ejection fraction (HFpEF) is defined as an LVEF of 50% or more
reduced ejection fraction (HFrEF) is defined as an LVEF of 40% or less
mildly reduced ejection fraction (HFmrEF) is an intermediate category with an LVEF of 41% to 49%.
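
As an illustration only, the LVEF thresholds above can be written as a short Python sketch. The function name is hypothetical. The treatment of non-integer values that fall between the stated thresholds (for example, 40.5%) is an assumption, because the categories are defined here using whole percentages. The label describes a type of heart failure and is only meaningful once heart failure has been clinically diagnosed.

```python
def classify_lvef(lvef_percent: float) -> str:
    """Return the heart failure category for an LVEF given as a percentage.

    Thresholds follow the definitions in the text:
    HFpEF: 50% or more, HFrEF: 40% or less, HFmrEF: 41% to 49%.
    Assumption: any value strictly between 40 and 50 is treated as HFmrEF.
    """
    if not 0 <= lvef_percent <= 100:
        raise ValueError("LVEF must be a percentage between 0 and 100")
    if lvef_percent >= 50:
        return "HFpEF"   # preserved ejection fraction
    if lvef_percent <= 40:
        return "HFrEF"   # reduced ejection fraction
    return "HFmrEF"      # mildly reduced ejection fraction


for value in (55, 45, 40):
    print(value, classify_lvef(value))
```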

Current practice

3.2

For both acute and chronic onset of heart failure, initial clinical assessment includes a detailed history, clinical examination and a blood test for N‑terminal pro‑B‑type natriuretic peptide (NT‑proBNP). If NT‑proBNP thresholds are exceeded, this is followed by confirmatory diagnosis with transthoracic echocardiography (TTE). TTE is the primary diagnostic tool for heart failure and is used in around 87% of diagnoses (National Institute for Cardiovascular Outcomes Research [NICOR] National Heart Failure Audit, 2025). In the NHS, it is usually done in secondary care by a specialist cardiac physiologist, consultant cardiologist or cardiology specialist registrar. TTE detects abnormalities and defects in the heart's chambers and valves and provides measurements of blood flow and the heart's pumping ability. When TTE detects abnormal ejection fraction, abnormalities in the motion of the heart wall or hypertrophy, this can indicate heart failure. A clinical diagnosis of heart failure is made by a heart failure specialist based on interpretation of the TTE in addition to the clinical assessment. The TTE process typically takes between 45 and 60 minutes. After TTE, people have an appointment with a heart failure specialist for clinical assessment and review of the TTE findings, and confirmation of diagnosis.

Unmet need

3.3

There is a significant backlog for echocardiography in England, with a waiting list of 156,059 people in June 2025 (NHS England monthly diagnostics data 2025 to 2026). NICE quality standards require 90% of people referred with suspected heart failure to be investigated using echocardiography (typically TTE). But only half of hospitals meet this target (NICOR National Heart Failure Audit, 2025). Although people with suspected heart failure should be seen within 6 weeks, only about two thirds of referrals meet this standard (NHS England, 2025; PDF only).

3.4

This can lead to delays in diagnosis and treatment, and sometimes people are unable to get a TTE appointment. Delays to heart failure diagnoses may lead to poorer health outcomes and increased use of healthcare resources.

3.5

AI technologies that aid the interpretation and quantification of echocardiography images, and automate report generation, could potentially reduce TTE procedure times. This could possibly lead to more appointments being available and a reduction in echocardiography waiting lists, which could lead to a reduction in time to diagnosis.

Clinical effectiveness

Patient considerations

3.6

The patient experts explained that symptoms of heart failure can develop gradually and become debilitating. Symptoms may include severe breathlessness and fatigue, and water retention leading to swollen ankles. These symptoms lead to a sedentary lifestyle because they prevent people from performing daily activities and taking part in social activities such as sports. Symptoms may also disrupt education, training, and work activities and opportunities. People with heart failure may develop other health complications over time including chest infections, heart palpitations, loss of appetite and weight loss. The patient experts explained that long waiting times for echocardiography and subsequent treatment lead to symptoms worsening and, potentially, the need for emergency surgery. They highlighted the need for people to move faster through the diagnostic pathway and so have the most appropriate treatment as soon as possible. Reduced waiting times would enable earlier diagnosis and treatment, which would improve health outcomes.

Evidence base

3.7

There were 19 studies included in the clinical-effectiveness review. Most studies assessed Us2.ai (11 studies). There were 3 studies each on EchoConfidence and EchoGo Heart Failure. Two studies assessed Ligence Heart. Most of the evidence was on aspects of diagnostic performance, with limited evidence on procedure time and clinical outcomes. There was no diagnostic accuracy evidence for Ligence Heart.

3.8

The committee noted there was limited UK‑based and real-world data, with most studies being done outside the UK or in unrealistically controlled settings. There were 2 UK studies: the FEATHER study interim analysis for EchoConfidence and Campbell et al. (2025) for Us2.ai. FEATHER was a retrospective study based in a community care setting. Campbell et al. was a prospective study that assessed Us2.ai-assisted handheld TTE. But, in this study, the technology was not used as an adjunct and it did not report on TTE procedure times.

3.9

Some studies were small, based in single centres and had single operators. Most of the evidence was from retrospective studies and some was from unpublished reports. The committee considered other limitations in the evidence base, such as the exclusion of complex cases and poor-quality echocardiogram images. It concluded that the evidence had limited generalisability to NHS clinical practice. It also concluded that there was limited evidence on the impact of the technologies on procedure times, waiting times and patient outcomes.

Diagnostic accuracy

3.10

Diagnostic accuracy was reported for 3 of the technologies (EchoConfidence, EchoGo Heart Failure and Us2.ai) across 5 studies. Two of these studies were done in the UK: the FEATHER study interim analysis for EchoConfidence, and Campbell et al. (2025) for Us2.ai. The committee noted that the AI technologies generally show good performance for detecting abnormalities indicating heart failure and related parameters compared with human measurements or multiparametric clinical scoring tools. The committee also noted that standard TTE and clinical assessment without AI was, in some cases, both the comparator and reference standard. So, it could not be shown that the AI technologies were superior to standard TTE, only equivalent.

3.11

A clinical expert explained that TTE is only 1 component of the overall diagnostic process. Blood tests such as NT‑proBNP, clinical assessment and multiparametric clinical scoring tools are also taken into account. Also, heart failure symptoms may be caused by other conditions such as amyloidosis, valve disease, pulmonary hypertension and pericardial constriction, which some AI technologies may miss. The committee concluded that, overall, the diagnostic performance of the AI technologies was generally good. But it concluded that real-world diagnostic performance when using technologies as intended in UK clinical practice was uncertain.

Procedure time, waiting times and system impact

3.12

The committee considered that a reduction in procedure time, leading to more appointments and a reduction in waiting lists, was a key unmet need in echocardiography clinics. A clinical expert explained that around a quarter of people with suspected heart failure are still waiting to have their echocardiogram at the time of the initially scheduled specialist clinical review appointment. So, these people either reschedule or do not attend the specialist clinical review appointment. Evidence on the impact of AI technologies on TTE procedure time was available for 2 of the technologies, EchoConfidence and Us2.ai. For EchoConfidence, the interim analysis from FEATHER reported that, in a UK community care setting, the AI technology reduced the mean time for analysis of echocardiographic parameters from 553 seconds and 587 seconds for 2 human readers to 3.2 seconds. For Us2.ai, 2 studies based in Japan reported data on procedure time, Hirata et al. (2024) and Sakamoto et al. (2025). The committee noted that Hirata et al. reported an overall time saving of 524 seconds, similar to that reported for EchoConfidence in FEATHER.

3.13

The EAG explained that there was insufficient detail in the studies to be certain about what the time saving related to. The clinical experts thought that, although these time savings were promising, it was unclear whether they would translate into routine clinical practice in the NHS. The committee said that the available evidence on procedure time was limited and not generalisable to NHS clinical practice in secondary care. It concluded that it is uncertain whether reported time savings would translate into additional appointments and reduced waiting times.

3.14

The clinical experts said that, in some instances, using the AI technologies may lead to increases in procedure times. This is because of healthcare professionals needing to check and review the AI findings and potentially intervene. The committee concluded that introducing any further delays could cause harm because of delays in diagnosis and treatment.

Community and primary care settings

3.15

The clinical experts explained that there is an increasing trend for TTE procedures to be done in community settings, such as community diagnostic centres (see the NHS England webpage on community diagnostic centres for more information). The clinical experts highlighted that, in community or primary care, use by healthcare professionals such as GPs, community nurses and pharmacists would affect diagnostic performance. So, they said that the available evidence would not be generalisable to these potential future use cases. The committee noted that community and primary care settings were out of scope. But they were explored by the EAG as a potential area for the use of AI technologies in the future.

3.16

The EAG advised that the external assessment report presented some evidence from studies that may be relevant to use of the AI technologies in community or primary care. This included studies that looked at novice operator performance compared with standard expert-led echocardiography (Huang et al. 2024a). The studies showed some potential for the technologies to be used in community care settings. But the committee noted that the Huang et al. study had a number of limitations, including that it was done in a single centre in Singapore and had only 1 novice operator. The committee noted that there was some evidence to support using the AI technologies in community or primary care settings. But it concluded these use cases were outside the scope of the assessment. It further concluded that more research is needed on diagnostic performance and procedure times when operators with varying levels of experience use the technologies in different care settings (see section 1).

Cost effectiveness

Conceptual model structure and assumptions

3.17

The EAG constructed a conceptual Markov model with a 1-year time horizon to capture the impact of reduced waiting times from shorter TTE durations when using AI technologies. The EAG explained that, because of the limited evidence base, it needed to make multiple assumptions to develop the conceptual model. Because data on TTE appointment time was not available for EchoGo Heart Failure and Ligence Heart, it was not possible to include them in the model.

3.18

AI technologies would be used as an adjunct to standard TTE, and a specialist clinical assessment is needed to diagnose heart failure. So, diagnostic accuracy was assumed to be unaltered when using the AI technologies. The clinical experts agreed that this was a reasonable and safe assumption. Most people (79%) were assumed to enter the model in an acute episode, while the remainder entered from the waiting list. A clinical expert suggested that, in practice, most people would likely enter the care pathway from primary care. The EAG explained that the sensitivity analysis showed that this had no substantial impact on cost effectiveness. But it noted that, if the AI technologies did reduce waiting times, then this could reduce hospitalisation.

3.19

The committee noted that downstream benefits of earlier diagnosis were not modelled because of the lack of evidence in this area and uncertainty around current waiting times. It said that the conceptual model and assumptions were appropriate for assessing the potential cost effectiveness of the AI technologies. But it concluded that the limited evidence base and number of assumptions meant that the economic results were highly uncertain.

TTE appointment time

3.20

The time for a standard TTE appointment in the model was 45 minutes, based on clinical opinion, and the EAG assumed 10 TTE appointments per day. In the base case for EchoConfidence and Us2.ai, the EAG assumed a reduced appointment time of 36 minutes (based on a reduced TTE procedure time). For EchoConfidence, this was based on interim data from FEATHER and, for Us2.ai, it was based on data from Hirata et al. (2024). This resulted in an increase in the number of TTE appointments to 12 per day. The model assumed that, in standard cardiology clinics (that is, not one-stop diagnostic centres), the waiting time from TTE to specialist clinical review remained unchanged. The proportion of patients attending a one-stop diagnostic clinic in the conceptual model was 52% (data from Kwok et al. 2025). The clinical experts emphasised that any procedure time savings from the AI technologies may have different effects in standard cardiology clinics and one-stop diagnostic centres. This is because, in a standard clinic, any time saving would only affect the TTE procedure. But, in a one-stop clinic, the overall time of TTE and clinical specialist review is reduced. The committee recalled the uncertainty in the evidence on procedure time saving and its limited generalisability (see sections 3.12 and 3.13). It concluded that further research is needed on real-world time savings when the AI technologies are used in NHS clinical practice, to better inform any future economic models.
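
The appointment arithmetic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the EAG's model. It assumes that total scanning time per day stays fixed at the standard-care level (45 minutes multiplied by 10 appointments) and that only whole appointments count. The final calculation, which subtracts the 524-second saving reported by Hirata et al. (2024) from a 45-minute appointment, is also an assumption about how a figure close to 36 minutes could arise. The EAG's exact derivation is not described here. All variable and function names are hypothetical.

```python
STANDARD_APPOINTMENT_MIN = 45  # standard TTE appointment, based on clinical opinion
STANDARD_SLOTS_PER_DAY = 10    # EAG assumption for standard care
AI_APPOINTMENT_MIN = 36        # EAG base case for EchoConfidence and Us2.ai


def slots_per_day(appointment_min: int, scanning_min: int) -> int:
    """Number of whole appointments that fit into the available scanning time."""
    return scanning_min // appointment_min


# Assumption: daily scanning time is unchanged when the AI technology is used.
scanning_min_per_day = STANDARD_APPOINTMENT_MIN * STANDARD_SLOTS_PER_DAY

ai_slots = slots_per_day(AI_APPOINTMENT_MIN, scanning_min_per_day)
print(scanning_min_per_day)  # 450 minutes of scanning time per day
print(ai_slots)              # 12 appointments per day

# Assumption: the reported 524-second saving applies to the whole appointment.
hirata_saving_seconds = 524
implied_appointment_min = (STANDARD_APPOINTMENT_MIN * 60 - hirata_saving_seconds) / 60
print(round(implied_appointment_min, 1))  # 36.3 minutes
```

Under these assumptions, a 9-minute reduction per appointment gives 2 extra appointments per day. Whether this would be achieved in practice depends on the uncertainties in procedure time described in sections 3.12 to 3.14.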

Technology costs

3.21

The committee noted that the cost per scan for each technology was made up of a number of separate components, including:

software cost per scan
system set-up and training costs
information technology support costs
staff costs.

The conceptual model base-case analysis included a total cost per scan of £4.26 for EchoConfidence and £7.70 for Us2.ai.

Plausibility of cost effectiveness

3.22

In the EAG's base-case analysis, EchoConfidence was cost saving compared with standard care. The cost difference was -£3.14 and the quality-adjusted life year (QALY) difference was 0.0005. This was mainly because the reduced staff time per TTE offset its cost per use. The base-case analysis for Us2.ai showed that it was more clinically effective but more costly than standard care. The cost difference was £0.92, the QALY difference was 0.0005 and the incremental cost-effectiveness ratio (ICER) was £1,674 per QALY gained. This was because, in the conceptual model, cost savings from the shorter procedure time when using Us2.ai were not sufficient to fully offset the earlier treatment costs incurred when more people had an earlier diagnosis. The committee understood that these results were uncertain because of the limitations in the evidence base for procedure time savings (see sections 3.12 and 3.13).
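
For readers less familiar with the ICER, the following Python sketch shows the calculation, which is the cost difference divided by the QALY difference. The function name is hypothetical. Using the rounded figures quoted above for Us2.ai gives about £1,840 per QALY gained rather than the reported £1,674. A likely explanation, which is an assumption here, is that the reported ICER was calculated from unrounded values: a QALY difference of about 0.00055 would round to 0.0005 and reproduce the reported figure.

```python
def icer(cost_difference: float, qaly_difference: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if qaly_difference == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return cost_difference / qaly_difference


# Rounded values reported for Us2.ai compared with standard care.
us2ai_cost_difference = 0.92    # pounds
us2ai_qaly_difference = 0.0005  # rounded to 4 decimal places
reported_icer = 1674            # pounds per QALY gained

print(round(icer(us2ai_cost_difference, us2ai_qaly_difference)))  # 1840

# QALY difference implied by the reported ICER (assumes the cost is exact).
implied_qaly_difference = us2ai_cost_difference / reported_icer
print(round(implied_qaly_difference, 4))  # 0.0005
```

An ICER is not usually reported for EchoConfidence in this situation because it was both less costly and slightly more effective than standard care in the base case.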

3.23

EchoConfidence remained cost saving in all sensitivity analyses, while Us2.ai had ICERs ranging from £93 to £2,684 per QALY gained. The results for both technologies were most sensitive to the impact of waiting-time reduction and the proportion of people diagnosed in a one-stop diagnostic clinic. The results for Us2.ai were also sensitive to the proportion of inpatients having TTE, and the grade of staff delivering TTE. The committee noted that a key driver of cost effectiveness in the model was reduced procedure time using the AI technologies, which could potentially increase throughput and so reduce waiting lists and time to diagnosis. But it concluded that the available evidence for reduced procedure time lacked robustness and generalisability, so the cost-effectiveness estimates were uncertain.

Equality considerations

3.24

The committee was aware of equality issues around access to echocardiography services in the UK. But there was no evidence that the AI technologies could help address equality issues for different groups. Heart failure is more prevalent in people from some ethnic backgrounds, including South Asian and Black British populations. The committee agreed that it is important to understand how and on what datasets the AI technologies have been trained. A lack of external validation in UK or similar populations may limit suitability of the technologies for UK practice and pose clinical risks. The EAG explained that there is a lack of consistency in how external validation cohorts were reported in the studies. It also noted that there was some UK cohort validation data for EchoConfidence. The committee concluded that more research is needed to understand how the AI technologies are trained and that more transparency is needed around the populations used for validation (see section 1).

Adverse events

3.25

The studies in the clinical review did not report any adverse events. The committee noted that this could reflect the retrospective nature of the evidence base and the lack of real-world evidence. The committee noted that there was no direct evidence to indicate that there is any risk of harm from using the AI technologies. But it concluded that the clinical risk of their use is uncertain, given the current limited evidence base.

Evidence gaps

3.26

The committee considered the evidence gaps highlighted in the external assessment report. For AI technologies being used as an adjunct to standard care, the evidence gaps included:

procedure time, including:
- time taken for automation of echocardiographic measurements
- time taken for automation of a TTE report
- overall procedure time
clinical outcomes, including:
- time to heart failure diagnosis
- time to treatment initiation
- patient-reported health-related quality of life
validation in cohorts representative of the UK population, including:
- diagnostic accuracy, interchangeability, agreement and correlation with human measurements
- diagnostic performance in detecting and classifying heart failure
acceptability of the AI technologies, including:
- ease of use
- confidence in accuracy of automation (need for human review)
- feasibility of implementation in different settings (primary and secondary care) with staff of varying skill levels
adverse events, including:
- inaccurate measurements or incorrect diagnoses
- AI failure rate
care pathway uncertainties needed for future model development, including downstream treatment costs and utilities associated with treated and untreated heart failure.

The committee concluded that further research was needed in these areas (see section 1).

Ongoing studies

3.27

The committee noted that there are a number of ongoing studies on the AI technologies. These include 2 randomised controlled trials (TARTAN‑HF and SYMPHONY‑HF) on Us2.ai, which are investigating the use of AI‑assisted echocardiography as part of screening strategies. For EchoConfidence, the FEATHER study is a double-blind evaluation of AI for heart failure diagnosis and stratification on unselected consecutive patients referred for evaluation to community cardiology services. There are also 2 ongoing studies on Ligence Heart. The committee noted that there are several ongoing studies which may provide further evidence on AI‑assisted echocardiography analysis and reporting to support the diagnosis and monitoring of heart failure in the NHS. But it concluded that they will not address all the identified evidence gaps (see section 3.26).