Cancer Datasets and Databases
Cancer Public Use Datasets National Cancer Institute
Public Use Data Sets
The Division of Cancer Control and Population Sciences (DCCPS) has the lead responsibility at NCI for supporting research in surveillance, epidemiology, health services, behavioral science, and cancer survivorship. The division also plays a central role within the federal government as a source of expertise and evidence on issues such as the quality of cancer care, the economic burden of cancer, geographic information systems, statistical methods, communication science, tobacco control, and the translation of research into practice. DCCPS staff members are innovators in creating resources for the public and the research community. Below are brief summaries and links to a number of public use data resources available through DCCPS and our partners.
Health Disparities Calculator
http://seer.cancer.gov/hdcalc/ The Health Disparities Calculator (HD*Calc) is designed to generate multiple summary measures to evaluate and monitor health disparities. The HD*Calc statistical software can be used either as an extension of SEER*Stat, which lets users import Surveillance, Epidemiology, and End Results (SEER) data, or with other population-based health data.
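As a rough illustration of what a summary measure of disparity looks like, the sketch below computes one absolute and one relative measure across groups. It is not HD*Calc's implementation; the group names, the rates, and the function names are hypothetical and chosen only for the example.

```python
# Hypothetical age-adjusted rates per 100,000, for illustration only
# (these are NOT real SEER figures).
rates = {"group_a": 180.0, "group_b": 150.0, "group_c": 120.0}


def range_difference(group_rates):
    """Absolute disparity: highest group rate minus lowest group rate."""
    values = group_rates.values()
    return max(values) - min(values)


def range_ratio(group_rates):
    """Relative disparity: highest group rate divided by lowest group rate."""
    values = group_rates.values()
    return max(values) / min(values)


print(range_difference(rates))  # 60.0
print(range_ratio(rates))       # 1.5
```

An absolute measure and a relative measure can move in different directions over time, which is one reason tools of this kind report several measures rather than a single number.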
Small Area Estimates for Cancer Risk Factors & Screening Behaviors
http://sae.cancer.gov/ Model-based estimates for states, counties, and health service areas have been developed based on two surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health Interview Survey (NHIS). The two surveys are combined using novel statistical methodology.
Colorectal Cancer Mortality Projections
http://cisnet.cancer.gov/projections/colorectal/ NCI’s Cancer Intervention and Surveillance Modeling Network (CISNET) developed this Web site to help cancer control planners, program staff, and policy makers consider the impact of risk factor reduction, increased early detection, and increased access to optimal treatment on future colorectal cancer mortality rates.
Cancer Prevalence and Cost of Care Projections
http://costprojections.cancer.gov/ See cost of care or prevalence by cancer site, sex, age, and year under various assumptions. Cancer prevalence was estimated and projected by tumor site through 2020 using incidence and survival data from the Surveillance, Epidemiology, and End Results (SEER) Program and population projections from the U.S. Census Bureau. Annualized net costs of care were estimated using Medicare claims linked to SEER data and adjusted to represent costs in 2010 US dollars.
Finding Cancer Statistics
http://surveillance.cancer.gov/statistics/ Recently developed to facilitate the use of cancer data, Finding Cancer Statistics is a plain-language Web site that provides access to recent reports, datasets, and statistical tools for professionals and the general public. It includes definitions of commonly used statistics, descriptions of datasets and tools, and guides to their use.
Surveillance, Epidemiology, and End Results (SEER) Program http://seer.cancer.gov
- SEER Web site: The recently redesigned SEER Web site is the preferred mechanism for distributing most of the SEER Program’s products. Recent additions to the site include the SEER 1975-2009 Cancer Statistics Review, with a search function.
- SEER*Stat: SEER*Stat is a statistical system for the analysis of SEER and other population-based cancer databases. The system provides an easy-to-use Microsoft Windows desktop package for viewing individual cancer records and for producing statistics to assess the impact of cancer on populations.
- SEER*Prep: The SEER*Prep system allows users to prepare and format their own cancer incidence, mortality, population, and expected survival rate data for use with SEER*Stat.
- Fast Stats: Fast Stats uses the Cancer Query System 2.0 (CanQues), an interactive system with a Java interface, to give users access to millions of precalculated cancer statistics.
- Cancer Stat Fact Sheets: Cancer Stat Fact Sheets are a collection of statistical summaries for a number of common cancer types. They were developed to provide a quick overview of frequently requested cancer statistics.
National Health Interview Survey
The National Health Interview Survey (NHIS) is an annual nationwide survey of approximately 35,000 households. It is conducted by the National Center for Health Statistics and administered by the U.S. Census Bureau. A Cancer Control Supplement (CCS) has been periodically fielded on the NHIS since 1987, and since 2000 the CCS has been co-sponsored by the NCI and CDC.
California Health Interview Survey
The California Health Interview Survey (CHIS) provides population-based, standardized health-related data from more than 50,000 Californians selected from all 58 counties. CHIS is fielded annually (biennially before 2011) by the UCLA Center for Health Policy Research in collaboration with the California Department of Public Health and the California Department of Health Care Services. NCI supports cancer control items on CHIS.
National Health and Nutrition Examination Survey
The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NCI supports modules in the NHANES that provide data useful for monitoring dietary intake and physical activity.
Tobacco Use Supplement to the Current Population Survey
http://riskfactor.cancer.gov/studies/tus-cps/ The Tobacco Use Supplement to the Current Population Survey (TUS-CPS) is an NCI-sponsored survey of tobacco use and policy information that has been administered as part of the U.S. Census Bureau’s Current Population Survey since 1992. The TUS-CPS is a key source of national, state, and sub-state level data on smoking and other tobacco use in U.S. households. These data can be used by researchers to monitor progress in the control of tobacco use, conduct tobacco-related research, and evaluate tobacco control programs.
Causes of Cancer
Genomic Datasets http://epi.grants.cancer.gov/dac/
Investigators can apply for controlled access to datasets from cancer genome-wide association studies (GWAS), sequencing, and other genomic datasets. Additional datasets will be added as they become available.
Cancer Family Registries
http://epi.grants.cancer.gov/CFR The Breast Cancer Family Registry (B-CFR) and Colon Cancer Family Registry (C-CFR) are international research infrastructures for investigators interested in conducting population and clinic-based interdisciplinary studies on the genetic and molecular epidemiology of breast and colon cancers and their behavioral implications. A central goal of the CFRs is the translation of this research to the clinical and prevention setting for the benefit of Registry participants and the general public. The CFRs have information and biospecimens contributed by families across the spectrum of risk for these cancers and from population-based or relative controls.
Cancer Genetics Network
http://epi.grants.cancer.gov/CGN The Cancer Genetics Network (CGN) is a resource for investigators conducting research on the genetic basis of human cancer susceptibility; integration of this information into medical practice; and behavioral, ethical, and public health issues associated with human genetics. The CGN can provide a wide variety of research services and specialized expertise to assist investigators with approved studies. Prospective investigators can freely query the CGN core database to learn more about the aggregate characteristics of participants and discover how the CGN may be used for research purposes.
Geographic Information System for Breast Cancer Studies on Long Island
http://li-gis.cancer.gov/default.html The Epidemiology and Genetics Research Program has developed a Geographic Information System for Breast Cancer Studies on Long Island (LI GIS). The LI GIS provides researchers a unique tool with which to investigate potential relationships between environmental exposures and risk for breast cancer. It potentially can be used for research on other types of cancer and other diseases.
Quality of Care
SEER-Medicare Data
http://healthservices.cancer.gov/seermedicare/ The SEER-Medicare database results from the linkage of two large population-based data sources: the Surveillance, Epidemiology, and End Results (SEER) cancer registries data and the Medicare enrollment and claims files for beneficiaries. This site contains information on how to request the data.
SEER-Medicare Health Outcomes Survey Linked Database
The SEER-Medicare Health Outcomes Survey (SEER-MHOS) linked database is designed to improve understanding of the health-related quality of life of cancer patients and survivors enrolled in Medicare Advantage health plans. The database contains clinical, quality-of-life, socioeconomic, demographic, and other information. SEER-MHOS is sponsored by the NCI and the Centers for Medicare & Medicaid Services (CMS). The SEER-MHOS data files became publicly available to external investigators in December 2010. This site contains information on how to obtain the data.
Breast Cancer Surveillance Consortium
http://breastscreening.cancer.gov/ The Breast Cancer Surveillance Consortium (BCSC) is a research resource for studies designed to assess the delivery and quality of breast cancer screening in the United States, including related patient outcomes. The development of new collaborations to achieve these ends is a key goal of the BCSC. The BCSC data are available to outside investigators for research purposes and this site provides detailed information regarding the specific variables and how collaborations may be developed.
HMO Cancer Research Network
http://crn.cancer.gov/ The HMO Cancer Research Network (CRN) is a consortium of 14 nonprofit research centers based in integrated health care delivery organizations. The CRN allows for large, multi-center, multidisciplinary intervention research that addresses the spectrum of cancer control, including studies of prevention, early detection, treatment, survivorship, surveillance, and end-of-life care. The CRN also develops and uses standardized approaches to data collection, data management, and analysis across health systems.
Classification of Laws Associated with School Students http://class.cancer.gov/
Classification of Laws Associated with School Students (C.L.A.S.S.) is a scoring system that monitors and evaluates state-level school physical education and nutrition policies that have been codified into law.
Health Information National Trends Survey http://hints.cancer.gov
The Health Information National Trends Survey (HINTS) is a nationally representative, biennial telephone survey of 8,000 randomly selected adults. NCI and extramural communication researchers are analyzing data to gain insight into people’s knowledge about cancer, the communication channels through which they obtain health information, and their cancer-related behaviors.
Grid-Enabled Measures
http://cancercontrol.cancer.gov/brp/gem.html Grid-Enabled Measures (GEM) is a dynamic web-based database that contains behavioral and social science measures organized by theoretical constructs. GEM is designed to enable researchers to use common measures with the goal of exchanging harmonized data. Through the use of these standardized measures and common elements, prospective meta-analyses will be possible.
Food Attitudes and Behaviors Survey Project
http://cancercontrol.cancer.gov/brp/fab/ The purpose of the Food Attitudes and Behaviors (FAB) Survey is to evaluate a variety of factors that may be related to fruit and vegetable (F/V) consumption, including knowledge of F/V recommendations, psychosocial factors, and other variables. Conventional constructs included self-efficacy, barriers, social support, and knowledge of recommendations related to F/V consumption. Novel constructs included shopping patterns, taste preferences, views on vegetarianism, intrinsic/extrinsic motivation, and environmental food offerings.
Cancer Survivor Prevalence Data
http://cancercontrol.cancer.gov/ocs/prevalence/index.html To better understand the demographics of the U.S. population of cancer survivors, NCI’s Office of Cancer Survivorship (OCS) and the Surveillance Research Program worked together to develop survivorship prevalence estimates based on the Surveillance, Epidemiology, and End Results (SEER) registry database, which represents five states (Connecticut, Hawaii, Iowa, New Mexico, and Utah) and four standard metropolitan statistical areas (Detroit, Atlanta, San Francisco-Oakland, and Seattle-Puget Sound).
Cancer Control P.L.A.N.E.T.
http://cancercontrolplanet.cancer.gov Cancer Control P.L.A.N.E.T. (Plan, Link, Act, Network with Evidence-Based Tools) is a Web portal that provides easy access to data and research-based resources that can help state and local cancer control program planners and staff, as well as cancer prevention and control researchers, design, implement, and evaluate evidence-based cancer control programs.
State Cancer Profiles
A part of Cancer Control P.L.A.N.E.T., State Cancer Profiles is a comprehensive system of interactive maps and graphs enabling the investigation of cancer trends at the national, state, and county level. The goal of the site is to provide a system to characterize the cancer burden in a standardized manner in order to motivate action, integrate surveillance into cancer control planning, characterize areas and demographic groups, and expose health disparities. It is a collaboration between the NCI and the Centers for Disease Control and Prevention (CDC).
Cancer Trends Progress Report
http://progressreport.cancer.gov The Cancer Trends Progress Report Update summarizes our nation’s progress against cancer in relation to Healthy People 2020 targets set forth by the Department of Health and Human Services. The report includes key measures of progress along the cancer control continuum and uses national trend data to illustrate where advancements have been made.
Presentations on Secondary Data Analysis
- How and why to use secondary data analysis
- Secondary data analysis of national and state health survey data