
The SDB serves four datasets:
MEDLINE Papers
PubMed, a service of the National Library of Medicine, includes over 19 million biomedical articles. The MEDLINE wiki page gives more detailed information about the dataset, including the table schema and data coverage.
U.S. Patent and Trademark Office Patents
For over 200 years, the United States Patent and Trademark Office (USPTO) has been processing and disseminating patent and trademark applications and information to promote an understanding of intellectual property protection and to facilitate the development and sharing of new technologies worldwide. SDB patent data prior to 1996 was generously made available by Steven A. Morris, Electrical and Computer Engineering, Oklahoma State University. Patent data from 1996 to present was downloaded from ftp://ftp.uspto.gov/pub/patdata/. The USPTO wiki page gives more detailed information about the dataset, including the table schema and data coverage.
National Science Foundation Awards
The National Science Foundation (NSF) funds research and education in science and engineering. It does so through grants, contracts, and cooperative agreements with more than 2,000 colleges, universities, and other research and education institutions throughout the United States. The NSF wiki page gives more detailed information about the dataset, including the table schema and data coverage.
National Institutes of Health Awards
CRISP (Computer Retrieval of Information on Scientific Projects) is a searchable database of federally funded biomedical research projects conducted at universities, hospitals, and other research institutions. The database, maintained by the Office of Extramural Research at the National Institutes of Health, includes projects funded by the National Institutes of Health (NIH), Substance Abuse and Mental Health Services Administration (SAMHSA), Health Resources and Services Administration (HRSA), Food and Drug Administration (FDA), Centers for Disease Control and Prevention (CDCP), Agency for Healthcare Research and Quality (AHRQ), and Office of the Assistant Secretary for Health (OASH). The NIH wiki page gives more detailed information about the dataset, including the table schema and data coverage.
Table 1: Number of records per data set and years covered
Dataset | # Current Records | Years Covered | Regular Update |
MEDLINE Papers | 19,039,860 | 1865-2010 | Yes |
USPTO Patents | 4,178,196 | 1976-2010 | Yes |
NIH Awards* | 1,686,889 | 1972-2010 | Yes |
NSF Awards | 453,687 | 1952-2010 | No |
Total | 25,358,632 | | |
*The number of NIH awards is not aggregated by base project; it includes subprojects. Some projects have up to 3,000 subprojects.
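To illustrate the difference between counting every award record and counting base projects, here is a minimal Python sketch. The record layout (a base project ID paired with an optional subproject ID) and the sample IDs are hypothetical and chosen only for illustration; they are not the SDB schema, which is described on the NIH wiki page.

```python
from collections import defaultdict

# Hypothetical award records: (base_project_id, subproject_id).
# subproject_id is None for the base project record itself.
awards = [
    ("P01-0001", None),
    ("P01-0001", "0001"),
    ("P01-0001", "0002"),
    ("R01-0002", None),
]


def count_by_base_project(records):
    """Return a dict mapping each base project ID to its number of records."""
    counts = defaultdict(int)
    for base_id, _subproject_id in records:
        counts[base_id] += 1
    return dict(counts)


per_project = count_by_base_project(awards)
record_count = len(awards)         # every record counted, subprojects included
project_count = len(per_project)   # records collapsed to their base project
```

In this toy input, four records collapse to two base projects, so a count that includes subprojects can be considerably larger than the number of distinct projects.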
The number of papers, patents, and grants per publication year or grant award year is given in the chart below.
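A per-year tally of this kind can be sketched in a few lines of Python. The `(dataset, year)` record layout and the sample rows below are hypothetical placeholders, not actual SDB data or the method used to produce the chart.

```python
from collections import Counter

# Hypothetical records: (dataset name, publication or award year).
records = [
    ("MEDLINE", 2009),
    ("MEDLINE", 2009),
    ("USPTO", 2009),
    ("NSF", 2008),
]


def records_per_year(rows):
    """Count records for each (dataset, year) pair."""
    counts = Counter()
    for dataset, year in rows:
        counts[(dataset, year)] += 1
    return counts


per_year = records_per_year(records)
```

Keying the counter on the `(dataset, year)` pair keeps one series per dataset, which is the shape a multi-series chart needs.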

