Huang J, Batool A, Gu Z, Zhao Z, Yao B, Black J, Davis J, Al-Ibraheemi A, DuBois S, Barkauskas D, Ramakrishnan S, Hall D, Grohar P, Xie Y, Xiao G, Leavey PJ. Leveraging Large Language Models to Extract Prognostic Pathology Features in Ewing Sarcoma. bioRxiv [Preprint]. 2026 Feb 21:2026.02.20.707103. doi: 10.64898/2026.02.20.707103. PMID: 41756885; PMCID: PMC12934661.
Abstract
Current risk stratification for Ewing sarcoma relies heavily on clinical factors such as metastatic status and does not capture histologic heterogeneity as a potential prognostic indicator. Although pathology reports contain rich biological data, this information remains locked in unstructured narrative text, which limits large-scale retrospective analyses. We aimed to validate the utility of large language models (LLMs) for scalable data abstraction and to identify prognostic histologic features in a large multi-institutional cohort.