An artificial intelligence (AI) system was better than radiologists at detecting clinically significant prostate cancer on MRI, a noninferiority, confirmatory study showed.
In a subset of 400 testing cases in which the AI system was compared with the radiologists participating in a reader study, the AI had an area under the receiver operating characteristic curve (AUROC) of 0.91 (95% CI 0.87-0.94) compared with 0.86 (95% CI 0.83-0.89) for the pool of 62 radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1), reported Anindo Saha, MSc, of Radboud University Medical Center in Nijmegen, the Netherlands, and colleagues.
Therefore, the AI system passed the prespecified criteria for noninferiority (with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0.02), and showed superior case-level diagnosis (P<0.0001), they wrote in Lancet Oncology.
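For readers who want the decision rule spelled out, here is a minimal Python sketch of how a lower confidence bound is read in a noninferiority-then-superiority comparison. It is not the authors' analysis code. The margin of 0.05 is an illustrative assumption that this article does not state, and the names `assess` and `ComparisonResult` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComparisonResult:
    noninferior: bool
    superior: bool


def assess(ci_lower: float, margin: float) -> ComparisonResult:
    """Interpret the lower bound of a two-sided CI for AUROC(AI) - AUROC(readers).

    ci_lower: lower bound of the CI for the difference (AI minus readers).
    margin:   noninferiority margin, a positive number (assumed value here).
    """
    # Noninferior if the AI is not worse than the readers by more than the margin.
    noninferior = ci_lower > -margin
    # Superior if the whole interval lies above zero.
    superior = ci_lower > 0.0
    return ComparisonResult(noninferior=noninferior, superior=superior)


# Lower bound reported in the article: 0.02. The margin of 0.05 is an assumption.
result = assess(ci_lower=0.02, margin=0.05)
print(result)
```

Because the reported lower bound is above zero, the noninferiority conclusion in this sketch does not depend on the margin chosen: any positive margin gives the same answer.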
"We provided evidence that AI systems, when adequately trained and validated for a target population with thousands of patient cases, could potentially support the diagnostic pathway of prostate cancer management," Saha and team wrote. "A clinical trial is required to determine if such a system translates to improvements in workflow efficiency, healthcare equity, and patient outcomes."
At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6.8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57.7%), or 50.4% fewer false-positive results and 20% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89.4%).
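Comparing a continuous AI score against a fixed reader operating point means picking the AI threshold that matches the readers' specificity and then reading off the sensitivity. The Python sketch below shows that idea on toy data. The scores and labels are invented for illustration and are not study data, the function name `sensitivity_at_specificity` is hypothetical, and the study's own matching procedure may differ.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Find the lowest threshold whose specificity reaches the target.

    A case is called positive when its score is >= the threshold.
    Returns (threshold, sensitivity, specificity).
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # Specificity never decreases as the threshold rises, so the first
    # threshold that reaches the target keeps sensitivity as high as possible.
    candidates = sorted(set(scores)) + [float("inf")]
    for threshold in candidates:
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if specificity >= target_specificity:
            return threshold, sensitivity, specificity
    raise ValueError("target specificity must be at most 1.0")


# Toy data (assumption): 1 = clinically significant cancer, 0 = no such cancer.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

threshold, sens, spec = sensitivity_at_specificity(toy_scores, toy_labels, 0.5)
print(threshold, sens, spec)
```

The same routine, run with the roles of sensitivity and specificity swapped, would give the matched-sensitivity comparison.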
In all 1,000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, noninferiority was not confirmed, with the AI system showing lower specificity (68.9% vs 69.0%) at the same sensitivity (96.1%) as the PI-RADS 3 or greater operating point.
However, Saha and colleagues pointed out that the observed performance gap was just 0.1% in specificity at the same sensitivity.
"We hypothesize that this difference in performance between the radiologists participating in the reader study and the radiologists reporting in practice was due to those reporting in practice having access to patient history (including previous prostate-specific antigen levels and imaging and biopsy outcomes), peer consultation (or multidisciplinary team meetings), and protocol familiarity," they wrote. "We recommend that future studies investigate multimodal prostate-AI systems that factor in continuous health data across the complete patient pathway to improve performance further."
The authors also noted that the predictive values seen with the AI system were high compared with those of radiologists reading multiparametric MRI in the , as well as results in meta-analyses, but cautioned against cross-trial comparisons due to different populations, comparators, outcomes, and study designs.
In an accompanying commentary, Martin Eklund, PhD, of the Karolinska Institute in Stockholm, said that the size and breadth of the data collected by Saha and colleagues will be "of crucial importance for AI development and evaluation."
However, he also pointed out that the results come with several limitations. For example, he said that the study excluded MRI exams with insufficient quality.
"In a prospective setting, such images would need to be handled either with human involvement or by AI algorithms, illustrating that prospective trials are a necessary next step to test aspects of implementation of AI algorithms for scoring prostate MRI exams that cannot be assessed in retrospective studies," he wrote, adding that the study "clearly advances the field and paves the way for such prospective trials."
In the international study, the authors combined two substudies: in the first, they trained and externally validated an AI system, developed within an international consortium, for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10,207 biparametric MRI examinations from 9,129 patients at four European centers.
Of these examinations, 9,207 cases from three centers in the Netherlands were used for training and tuning, and 1,000 cases from four centers in the Netherlands and Norway were used for testing.
At the same time, a multireader, multicase observer study was conducted and included 62 radiologists from 45 centers in 20 countries, with a median of 7 years of experience reading prostate MRIs, who used PI-RADS 2.1 on 400 paired MRI examinations from the testing cohort.
Of the 10,207 examinations included from January 2012 through December 2021, 2,440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer.
Disclosures
This study was funded by the European Commission (EU Horizon 2020: ProCAncer-I project) and Health~Holland.
Saha had no disclosures. Multiple co-authors reported relationships with industry.
Eklund had no disclosures.
Primary Source
Lancet Oncology
Saha A, et al "Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study" Lancet Oncol 2024; DOI: 10.1016/S1470-2045(24)00220-1.
Secondary Source
Lancet Oncology
Eklund M "Artificial intelligence for scoring prostate MRI: ready for prospective evaluation" Lancet Oncol 2024; DOI: 10.1016/S1470-2045(24)00284-5.