An artificial intelligence (AI)-based pathology tool, AIM-MASH, has been clinically validated for the assessment of metabolic dysfunction-associated steatohepatitis (MASH), demonstrating non-inferior accuracy compared to expert pathologists in scoring critical histological features. This tool aims to improve the consistency and efficiency of MASH diagnosis in clinical trials and practice.
Validation of AIM-MASH
The study, published in Nature, details the validation of AIM-MASH, an algorithm designed to quantify steatosis, lobular inflammation, hepatocellular ballooning, and fibrosis—key components in the diagnosis and staging of MASH. The validation process included analytical, clinical, and overlay analyses to ensure the tool's reliability and accuracy.
In overlay validation, the AI-generated overlays that assist pathologists in reviewing slides showed high true positive success rates for H&E artifact (0.97, 95% CI 0.95–0.99), trichrome artifact (0.99, 95% CI 0.97–1.00), lobular inflammation (0.94, 95% CI 0.92–0.96), steatosis (0.96, 95% CI 0.93–0.98), and fibrosis (0.97, 95% CI 0.95–0.99). Hepatocellular ballooning narrowly missed the acceptance criteria with a success rate of 0.87 (95% CI 0.83–0.91).
Repeatability and Reproducibility
AIM-MASH demonstrated strong interday scanner repeatability, with mean agreement rates of 0.93 (95% CI 0.89–0.96; P < 0.0001) for steatosis, 0.96 (95% CI 0.94–0.99; P < 0.0001) for lobular inflammation, 0.96 (95% CI 0.93–0.98; P < 0.0001) for hepatocellular ballooning, and 0.93 (95% CI 0.89–0.96; P < 0.001) for fibrosis. Inter-site scanner reproducibility also met acceptance criteria for hepatocellular ballooning, with a mean agreement rate of 0.91 (95% CI 0.87–0.95; P = 0.02).
Notably, the repeatability and reproducibility of AIM-MASH across different sites and scanners were higher than the mean pairwise agreement among pathologists, highlighting the potential for AI to reduce variability in MASH assessment.
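To illustrate what an agreement rate with a confidence interval measures, here is a minimal Python sketch. It assumes a simple exact-match agreement between two scans of the same slides and uses a Wilson score interval; the article does not say how the study computed its intervals, so the interval method, the function names, and the sample scores are all assumptions made for illustration.

```python
import math


def count_matches(scores_a, scores_b):
    """Count slides that received the same ordinal score in both runs."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(a == b for a, b in zip(scores_a, scores_b))


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96).

    Assumption: chosen here for simplicity; the study may have used
    a different method, such as bootstrapping.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical fibrosis stages for the same 10 slides scanned on two days.
day1 = [0, 1, 2, 3, 4, 2, 1, 3, 0, 2]
day2 = [0, 1, 2, 3, 4, 2, 1, 2, 0, 2]

matches = count_matches(day1, day2)
rate = matches / len(day1)
low, high = wilson_interval(matches, len(day1))
print(f"agreement {rate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

With only 10 slides the interval is wide; the narrow intervals reported above reflect a much larger number of comparisons.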
Accuracy Assessment
The accuracy of AIM-MASH, both as a standalone tool and as an aid to pathologists, was evaluated in 1,481 cases. When used alone, AIM-MASH was both non-inferior and superior to expert pathologists (IMRs) in assessing hepatocellular ballooning (difference in weighted kappa [WK] 0.15, 95% CI 0.11–0.18, P < 0.0001 for non-inferiority and superiority) and lobular inflammation (difference in WK 0.12, 95% CI 0.08–0.17, P < 0.0001 for non-inferiority and superiority). For steatosis and fibrosis, it met non-inferiority criteria but did not achieve superiority.
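For readers unfamiliar with the statistics above, the following Python sketch shows one way to compute a weighted kappa for ordinal scores and how a confidence interval on a kappa difference maps to non-inferiority and superiority. It is an illustration only: the linear weighting, the non-inferiority margin of 0.1, and all names and sample scores are assumptions, since the article does not state the weighting scheme or margin the study used.

```python
def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Cohen's weighted kappa for ordinal scores in 0..n_categories-1.

    power=1 gives linear weights, power=2 quadratic weights.
    Undefined (division by zero) if expected disagreement is zero,
    e.g. when both raters put every case in the same category.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)
    k = n_categories
    # Joint distribution of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between scores.
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    return 1.0 - observed_disagreement / expected_disagreement


def classify_difference(ci_lower, margin=0.1):
    """Interpret the lower CI bound of (AI kappa - pathologist kappa).

    margin is a hypothetical non-inferiority margin, not the study's.
    Returns (non_inferior, superior).
    """
    return ci_lower > -margin, ci_lower > 0


# Hypothetical ballooning scores (0-2) against a consensus reference.
consensus = [0, 1, 2, 2, 1, 0]
reader = [0, 1, 2, 1, 1, 0]
print(weighted_kappa(consensus, reader, n_categories=3))

# Lower bound of the reported ballooning difference (95% CI 0.11-0.18).
print(classify_difference(0.11))
```

Under these assumptions, a lower bound above zero, as reported for ballooning and lobular inflammation, satisfies both tests, while a lower bound between the negative margin and zero would indicate non-inferiority without superiority.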
When pathologists used AIM-MASH as an assistive tool, accuracy for composite histologic scores, such as fibrosis stage 2 or 3 (F2 and F3) versus other, was higher than that of IMRs alone (WK 0.57 vs 0.53, respectively). Trial-relevant enrollment criteria (MAS ≥ 4 with ≥1 in each score category) also showed significantly higher accuracy with AI assistance (WK 0.63 vs 0.51, respectively, with a difference of 0.11 and a 95% CI of 0.07–0.16).
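The composite criteria mentioned above can be expressed as simple rules over the four component scores. The sketch below assumes the standard scoring ranges (steatosis 0–3, lobular inflammation 0–3, ballooning 0–2, fibrosis 0–4) and treats MAS as the sum of the first three; the class and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistologyScores:
    steatosis: int             # 0-3
    lobular_inflammation: int  # 0-3
    ballooning: int            # 0-2
    fibrosis: int              # stage 0-4


def activity_score(s: HistologyScores) -> int:
    """MAS: sum of steatosis, lobular inflammation and ballooning."""
    return s.steatosis + s.lobular_inflammation + s.ballooning


def meets_mas_criterion(s: HistologyScores) -> bool:
    """MAS >= 4 with at least 1 point in each component."""
    components = (s.steatosis, s.lobular_inflammation, s.ballooning)
    return activity_score(s) >= 4 and min(components) >= 1


def is_f2_or_f3(s: HistologyScores) -> bool:
    """Composite fibrosis category: stage 2 or 3 versus any other stage."""
    return s.fibrosis in (2, 3)


# Hypothetical biopsy: MAS 4 with every component present, fibrosis stage 2.
biopsy = HistologyScores(steatosis=1, lobular_inflammation=2,
                         ballooning=1, fibrosis=2)
print(meets_mas_criterion(biopsy), is_f2_or_f3(biopsy))
```

Because a single-point change in one component can flip either rule, small scoring disagreements between readers translate directly into different enrollment decisions, which is why agreement on these composites matters for trials.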
Implications for Clinical Trials
The enhanced accuracy and reproducibility offered by AIM-MASH could significantly benefit clinical trials for MASH therapies. By providing a more consistent and reliable assessment of histological endpoints, the AI tool may improve the efficiency and validity of trial results. Furthermore, the AI-assisted approach has the potential to standardize diagnostic processes across different laboratories and healthcare settings.