As the second-leading cause of death in the United States, cancer is a public health crisis that afflicts nearly one in two people during their lifetime. Cancer is also an oppressively complex disease. Hundreds of cancer types affecting more than 70 organs have been recorded in the nation's cancer registries--databases of information about individual cancer cases that provide vital statistics to doctors, researchers, and policymakers.
"Population-level cancer surveillance is critical for monitoring the effectiveness of public health initiatives aimed at preventing, detecting, and treating cancer," said Gina Tourassi, director of the Health Data Sciences Institute and the National Center for Computational Sciences at the Department of Energy's Oak Ridge National Laboratory. "Collaborating with the National Cancer Institute, my team is developing advanced artificial intelligence solutions to modernize the national cancer surveillance program by automating the time-consuming data capture effort and providing near real-time cancer reporting."
Through digital cancer registries, scientists can identify trends in cancer diagnoses and treatment responses, which in turn can help guide research dollars and public resources. However, like the disease they describe, cancer pathology reports are complex. Variations in notation and language must be interpreted by human cancer registrars trained to analyze the reports.
To better leverage cancer data for research, scientists at ORNL are developing an artificial intelligence-based natural language processing tool to improve information extraction from textual pathology reports. The project is part of a DOE-National Cancer Institute collaboration known as the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) that is accelerating research by merging cancer data with advanced data analysis and high-performance computing.
As DOE's largest Office of Science laboratory, ORNL houses unique computing resources to tackle this challenge--including the world's most powerful supercomputer for AI and a secure data environment for processing protected information such as health data. Through its Surveillance, Epidemiology, and End Results (SEER) Program, NCI receives data from cancer registries such as the Louisiana Tumor Registry; these data include diagnosis and pathology information for individual cases of cancerous tumors.
"Manually extracting information is costly, time consuming, and error prone, so we are developing an AI-based tool," said Mohammed Alawad, research scientist in the ORNL Computing and Computational Sciences Directorate and lead author of a paper published in the Journal of the American Medical Informatics Association on the results of the team's AI tool.