Cancer Datasets and Databases

Datasets – Cancer

Cancer Public Use Datasets National Cancer Institute

Public Use Data Sets

The Division of Cancer Control and Population Sciences (DCCPS) has the lead responsibility at NCI for supporting research in surveillance, epidemiology, health services, behavioral science, and cancer survivorship. The division also plays a central role within the federal government as a source of expertise and evidence on issues such as the quality of cancer care, the economic burden of cancer, geographic information systems, statistical methods, communication science, tobacco control, and the translation of research into practice. DCCPS staff members are innovators in creating resources for the public and the research community. Below are brief summaries and links to a number of public use data resources available through DCCPS and our partners.

Surveillance

HD*Calc
The Health Disparities Calculator (HD*Calc) is designed to generate multiple summary measures to evaluate and monitor health disparities. The HD*Calc statistical software can be used either as an extension of SEER*Stat, which allows users to import Surveillance, Epidemiology, and End Results (SEER) data, or with other population-based health data.
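As a rough illustration of what a summary measure of disparity can look like, the sketch below computes an absolute gap and a relative gap between the highest and lowest group rates. It is not HD*Calc's implementation, and this page does not say which measures HD*Calc reports; the function name, group labels, and rates are hypothetical.

```python
from typing import Dict


def disparity_summary(rates: Dict[str, float]) -> Dict[str, float]:
    """Return the absolute and relative gap between the highest and lowest group rates.

    `rates` maps a group label to a rate (for example, cases per 100,000).
    """
    if len(rates) < 2:
        raise ValueError("need rates for at least two groups")
    highest = max(rates.values())
    lowest = min(rates.values())
    if lowest <= 0:
        raise ValueError("rates must be positive to form a ratio")
    return {
        "range_difference": highest - lowest,  # absolute gap
        "range_ratio": highest / lowest,       # relative gap
    }


# Hypothetical rates per 100,000, for illustration only.
example_rates = {"group_a": 50.0, "group_b": 40.0, "group_c": 25.0}
summary = disparity_summary(example_rates)
print(summary)
```

The absolute and relative gaps can move in different directions over time, which is one reason to look at more than one summary measure.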

Small Area Estimates for Cancer Risk Factors & Screening Behaviors
Model-based estimates for states, counties, and health service areas have been developed based on two surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health Interview Survey (NHIS). The two surveys are combined using novel statistical methodology.

Colorectal Cancer Mortality Projections
NCI’s Cancer Intervention and Surveillance Modeling Network (CISNET) developed this Web site to help cancer control planners, program staff, and policy makers consider the impact of risk factor reduction, increased early detection, and increased access to optimal treatment on future colorectal cancer mortality rates.

Cancer Prevalence and Cost of Care Projections
See cost of care or prevalence by cancer site, sex, age, and year under various assumptions. Cancer prevalence was estimated and projected by tumor site through 2020 using incidence and survival data from the Surveillance, Epidemiology, and End Results (SEER) Program and population projections from the U.S. Census Bureau. Annualized net costs of care were estimated using Medicare claims linked to SEER data and adjusted to represent costs in 2010 US dollars.
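Expressing costs in a common base year, such as the 2010 US dollars mentioned above, is typically done by rescaling each cost with a price index. The sketch below shows only that arithmetic. The function name and the index values are hypothetical, and this page does not say which index the projections used.

```python
def to_base_year_dollars(cost: float, index_cost_year: float, index_base_year: float) -> float:
    """Rescale a cost from the year it was incurred to base-year dollars.

    Both index arguments are values of the same price index, taken in the
    year the cost was incurred and in the base year respectively.
    """
    if index_cost_year <= 0 or index_base_year <= 0:
        raise ValueError("price index values must be positive")
    return cost * index_base_year / index_cost_year


# Hypothetical numbers: a 1,000 dollar cost incurred when the index was 80,
# expressed in dollars of a base year in which the index is 100.
adjusted = to_base_year_dollars(1000.0, index_cost_year=80.0, index_base_year=100.0)
print(adjusted)
```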

Finding Cancer Statistics
Recently developed to facilitate the use of cancer data, Finding Cancer Statistics is a plain-language Web site that provides access to recent reports, datasets, and statistical tools for professionals and the general public. It includes definitions of commonly used statistics, descriptions of datasets and tools, and guides to their use.

Surveillance, Epidemiology, and End Results (SEER) Program

  • SEER Web site
    The recently redesigned SEER Web site is the preferred mechanism for distributing most of the SEER Program’s products. Recent additions to the site include the SEER 1975-2009 Cancer Statistics Review, with a search function.
  • SEER*Stat
    SEER*Stat is a statistical system for the analysis of SEER and other population-based cancer databases. The system provides an easy-to-use Microsoft Windows desktop package for viewing individual cancer records and for producing statistics to assess the impact of cancer on populations.
  • SEER*Prep
    The SEER*Prep system allows users to prepare and format their own cancer incidence, mortality, population, and expected survival rate data for use with SEER*Stat.
  • Fast Stats
    Fast Stats uses the Cancer Query System 2.0, CanQues, as an interactive system with a Java interface to allow users access to millions of precalculated cancer statistics.
  • Cancer Stat Fact Sheets
    Cancer Stat Fact Sheets are a collection of statistical summaries for a number of common cancer types. They were developed to provide a quick overview of frequently requested cancer statistics.

National Health Interview Survey

The National Health Interview Survey (NHIS) is an annual nationwide survey of approximately 35,000 households. It is conducted by the National Center for Health Statistics and administered by the U.S. Census Bureau. A Cancer Control Supplement (CCS) has been periodically fielded on the NHIS since 1987, and since 2000 the CCS has been co-sponsored by the NCI and CDC.

California Health Interview Survey

The California Health Interview Survey (CHIS) provides population-based, standardized health-related data from more than 50,000 Californians selected from all 58 counties. CHIS is fielded annually (biennially before 2011) by the UCLA Center for Health Policy Research in collaboration with the California Department of Public Health and the California Department of Health Care Services.  NCI supports cancer control items on CHIS.

National Health and Nutrition Examination Survey

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NCI supports modules in the NHANES that provide data useful for monitoring dietary intake and physical activity.

Tobacco Use Supplement to the Current Population Survey
The Tobacco Use Supplement to the Current Population Survey (TUS-CPS) is an NCI-sponsored survey of tobacco use and policy information that has been administered as part of the U.S. Census Bureau’s Current Population Survey since 1992. The TUS-CPS is a key source of national, state, and sub-state level data on smoking and other tobacco use in U.S. households. These data can be used by researchers to monitor progress in the control of tobacco use, conduct tobacco-related research, and evaluate tobacco control programs.

Causes of Cancer

Genomic Datasets

Investigators can apply for controlled access to datasets from cancer genome-wide association studies (GWAS), sequencing, and other genomic datasets. Additional datasets will be added as they become available.

Cancer Family Registries
The Breast Cancer Family Registry (B-CFR) and Colon Cancer Family Registry (C-CFR) are international research infrastructures for investigators interested in conducting population and clinic-based interdisciplinary studies on the genetic and molecular epidemiology of breast and colon cancers and their behavioral implications. A central goal of the CFRs is the translation of this research to the clinical and prevention setting for the benefit of Registry participants and the general public. The CFRs have information and biospecimens contributed by families across the spectrum of risk for these cancers and from population-based or relative controls.

Cancer Genetics Network
The Cancer Genetics Network (CGN) is a resource for investigators conducting research on the genetic basis of human cancer susceptibility; integration of this information into medical practice; and behavioral, ethical, and public health issues associated with human genetics. The CGN can provide a wide variety of research services and specialized expertise to assist investigators with approved studies. Prospective investigators can freely query the CGN core database to learn more about the aggregate characteristics of participants and discover how the CGN may be used for research purposes.

Geographic Information System for Breast Cancer Studies on Long Island
The Epidemiology and Genetics Research Program has developed a Geographic Information System for Breast Cancer Studies on Long Island (LI GIS). The LI GIS provides researchers a unique tool with which to investigate potential relationships between environmental exposures and risk for breast cancer. It potentially can be used for research on other types of cancer and other diseases.

Quality of Care

SEER-Medicare Data
The SEER-Medicare database results from the linkage of two large population-based data sources: the Surveillance, Epidemiology, and End Results (SEER) cancer registries data and the Medicare enrollment and claims files for beneficiaries. This site contains information on how to request the data.

SEER-Medicare Health Outcomes Survey Linked Database
The SEER-Medicare Health Outcomes Survey (SEER-MHOS) linked database is designed to improve understanding of the health-related quality of life of cancer patients and survivors enrolled in Medicare Advantage health plans.  The database contains clinical, quality-of-life, socioeconomic, demographic, and other information.  SEER-MHOS is sponsored by the NCI and the Centers for Medicare & Medicaid Services (CMS). The SEER-MHOS data files became publicly available to external investigators in December 2010.  This site contains information on how to obtain the data.

Breast Cancer Surveillance Consortium
The Breast Cancer Surveillance Consortium (BCSC) is a research resource for studies designed to assess the delivery and quality of breast cancer screening in the United States, including related patient outcomes. The development of new collaborations to achieve these ends is a key goal of the BCSC. The BCSC data are available to outside investigators for research purposes and this site provides detailed information regarding the specific variables and how collaborations may be developed.

HMO Cancer Research Network
The HMO Cancer Research Network (CRN) is a consortium of 14 nonprofit research centers based in integrated health care delivery organizations. The CRN allows for large, multi-center, multidisciplinary intervention research that addresses the spectrum of cancer control, including studies of prevention, early detection, treatment, survivorship, surveillance, and end-of-life care. The CRN also develops and uses standardized approaches to data collection, data management, and analysis across health systems.

Behavioral Research
Classification of Laws Associated with School Students
Classification of Laws Associated with School Students (C.L.A.S.S.) is a scoring system that monitors and evaluates state-level school physical education and nutrition policies that have been codified into law.

Health Information National Trends Survey
The Health Information National Trends Survey (HINTS) is a nationally representative, biennial telephone survey of 8,000 randomly selected adults. NCI and extramural communication researchers are analyzing data to gain insight into people’s knowledge about cancer, the communication channels through which they obtain health information, and their cancer-related behaviors.

Grid-Enabled Measures
Grid-Enabled Measures (GEM) is a dynamic web-based database that contains behavioral and social science measures organized by theoretical constructs. GEM is designed to enable researchers to use common measures with the goal of exchanging harmonized data. Through the use of these standardized measures and common elements, prospective meta-analyses will be possible.

Food Attitudes and Behavior Survey Project
The purpose of the Food Attitudes and Behaviors (FAB) Survey is to evaluate a variety of factors that may be related to fruit and vegetable (F/V) consumption, including knowledge of F/V recommendations, psychosocial factors, and other variables. Conventional constructs included self-efficacy, barriers, social support, and knowledge of recommendations related to F/V consumption. Novel constructs included shopping patterns, taste preferences, views on vegetarianism, intrinsic/extrinsic motivation, and environmental food offerings.

Cancer Survivor Prevalence Data
To better understand the demographics of the U.S. population of cancer survivors, NCI’s Office of Cancer Survivorship (OCS) and the Surveillance Research Program worked together to develop survivorship prevalence estimates based on the Surveillance, Epidemiology, and End Results (SEER) registry database, which represents five states (Connecticut, Hawaii, Iowa, New Mexico, and Utah) and four standard metropolitan statistical areas (Detroit, Atlanta, San Francisco-Oakland, and Seattle-Puget Sound).

Implementation Science
Cancer Control P.L.A.N.E.T.
Cancer Control P.L.A.N.E.T. (Plan, Link, Act, Network with Evidence-Based Tools) is a Web portal that provides easy access to data and research-based resources that can help state and local cancer control program planners and staff, and cancer prevention and control researchers to design, implement, and evaluate evidence-based cancer control programs.

State Cancer Profiles
A part of Cancer Control P.L.A.N.E.T., State Cancer Profiles is a comprehensive system of interactive maps and graphs enabling the investigation of cancer trends at the national, state, and county level. The goal of the site is to provide a system to characterize the cancer burden in a standardized manner in order to motivate action, integrate surveillance into cancer control planning, characterize areas and demographic groups, and expose health disparities. It is a collaboration between the NCI and the Centers for Disease Control and Prevention (CDC).

Cancer Trends Progress Report
The Cancer Trends Progress Report Update summarizes our nation’s progress against cancer in relation to Healthy People 2020 targets set forth by the Department of Health and Human Services. The report includes key measures of progress along the cancer control continuum and uses national trend data to illustrate where advancements have been made.

Presentations on Secondary Data Analysis
How to and why use secondary data analysis
Secondary data analysis of national and state health survey data