A. Bansal
Author of
P3.07 - Poster Session with Presenters Present (ID 493)
- Event: WCLC 2016
- Type: Poster Presenters Present
- Track: Regional Aspects/Health Policy/Public Health
- Presentations: 1
- Coordinates: 12/07/2016, 14:30 - 15:45, Hall B (Poster Area)
P3.07-013 - Determining EGFR and ALK Status in a Population-Based Cancer Registry: A Natural Language Processing Validation Study (ID 5061)
14:30 - 14:30 | Author(s): A. Bansal
- Abstract
Background:
Population-based data on Epidermal Growth Factor Receptor (EGFR) and Anaplastic Lymphoma Kinase (ALK) gene test status can shed light on real-world molecular testing practices and their impact on treatment decisions and outcomes. Yet population-based cancer registries have no efficient method for ascertaining molecular testing data for non-squamous non-small cell lung cancer (NS-NSCLC) from electronic pathology (e-path) records. We sought to validate a natural language processing (NLP) system for accurately ascertaining EGFR and ALK test use and results in patients with stage IV NS-NSCLC included in the Fred Hutchinson Cancer Research Center’s Cancer Surveillance System (CSS), a part of the U.S. Surveillance, Epidemiology, and End Results (SEER) program.
Methods:
We identified 4,279 e-path reports available in the CSS corresponding to 1,634 patients diagnosed with stage IV NS-NSCLC between 09/01/2011 and 12/31/2013. Using a random sample of 426 reports (10%; training subsample), we developed and trained an NLP system to detect EGFR mutation and ALK gene rearrangement test use (test result reported vs. not reported) and test results (positive vs. negative among reported tests). Two oncologists reviewed all e-path reports and resolved discrepancies by consensus to determine the gold-standard classification of test use and results. We report preliminary estimates of the NLP system's sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for EGFR and ALK test use, based on a second random sample of 426 reports (testing subsample).
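The sampling step described above can be illustrated with a minimal sketch. The abstract does not say how the two subsamples were drawn or whether they were mutually exclusive; the sketch assumes they were disjoint, uses placeholder integer report IDs, and fixes an arbitrary seed. The function name `split_subsamples` is hypothetical.

```python
import random


def split_subsamples(report_ids, n_train, n_test, seed=0):
    """Draw two disjoint random subsamples of report IDs (assumed design)."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    # Sampling without replacement once, then splitting, guarantees no overlap.
    drawn = rng.sample(report_ids, n_train + n_test)
    return drawn[:n_train], drawn[n_train:]


# Placeholder IDs standing in for the 4,279 e-path reports.
report_ids = list(range(4279))
train_ids, test_ids = split_subsamples(report_ids, n_train=426, n_test=426)
```

Drawing both subsamples in a single pass is one simple way to keep the testing reports unseen during development; other designs (for example, sampling by patient rather than by report) are equally plausible.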
Results:
Among the 1,634 patients, the mean age was 68 years, 815 (50%) were male, 1,424 (87%) were white, and 1,347 (82%) had adenocarcinoma histology. Based on the gold-standard classification, in the training subsample, 126 (30%) and 103 (24%) reports contained information about EGFR and ALK test results, respectively. In the testing subsample, 139 (32%) and 115 (27%) reports contained information about EGFR and ALK test results, respectively. In the testing subsample, the NLP system correctly detected 135 reports that contained EGFR test results and 285 that did not (sensitivity=97%; specificity=99%; PPV=99%; NPV=99%). It correctly detected 113 reports that contained ALK test results and 307 that did not (sensitivity=98%; specificity=99%; PPV=97%; NPV=99%).
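As a check on the arithmetic, the reported metrics can be recomputed from the counts given above. This is a sketch, not the authors' analysis code: the false-negative and false-positive counts are not stated in the abstract and are derived here by assuming each of the 426 testing reports was classified as either containing or not containing a test result. The helper names `counts` and `diagnostic_metrics` are hypothetical.

```python
N_TEST = 426  # size of the testing subsample


def counts(gold_pos, tp, tn, n=N_TEST):
    """Derive the 2x2 table from gold-standard positives and correct calls."""
    gold_neg = n - gold_pos  # reports without a test result (derived)
    return {"tp": tp, "fn": gold_pos - tp, "tn": tn, "fp": gold_neg - tn}


def diagnostic_metrics(tp, fn, tn, fp):
    """Standard definitions of the four reported validity measures."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


egfr = diagnostic_metrics(**counts(gold_pos=139, tp=135, tn=285))
alk = diagnostic_metrics(**counts(gold_pos=115, tp=113, tn=307))

for name, m in (("EGFR", egfr), ("ALK", alk)):
    print(name, {k: f"{v:.0%}" for k, v in m.items()})
```

Under these assumptions the derived tables are 4 false negatives and 2 false positives for EGFR, and 2 false negatives and 4 false positives for ALK. The rounded values match the percentages reported in the abstract.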
Conclusion:
NLP is likely a valid method for capturing EGFR and ALK test use from e-path reports. Ongoing analyses are assessing the validity of NLP for ascertaining test results among reported EGFR and ALK tests, both in this initial dataset and in a separate validation dataset of 3,427 pathology reports; these findings will be reported subsequently.