Leveraging Large Language Models to Extract Prognostic Pathology Features in Ewing Sarcoma

Study ID Citation

Huang J, Batool A, Gu Z, Zhao Z, Yao B, Black J, Davis J, Al-Ibraheemi A, DuBois S, Barkauskas D, Ramakrishnan S, Hall D, Grohar P, Xie Y, Xiao G, Leavey PJ. Leveraging Large Language Models to Extract Prognostic Pathology Features in Ewing Sarcoma. bioRxiv [Preprint]. 2026 Feb 21:2026.02.20.707103. doi: 10.64898/2026.02.20.707103. PMID: 41756885; PMCID: PMC12934661.

Abstract

Current risk stratification for Ewing sarcoma relies heavily on clinical factors such as metastatic status, failing to capture histologic heterogeneity as a potential prognostic indicator. Although pathology reports contain rich biological data, this information remains locked in unstructured narrative text, limiting large-scale retrospective analyses. We aimed to validate the utility of Large Language Models (LLMs) for scalable data abstraction and to identify prognostic histologic features from a large multi-institutional cohort.
