The IRS maintains several important nonprofit databases to track the current population of exempt organizations, their annual 990 filings, and organizations that have closed. This data has been released in formats that are not always easy to use: ASCII text files, JSON files, and XML queries.
In order to make the data accessible to the research community, we have created scripts to download data from IRS websites, clean and process it, and export it into familiar formats (CSV, Stata, SPSS, etc.).
You can post questions on the discussion board.
Also see the Aspen Institute + Johnson Center’s report on accessing 990 Efile Data.
IRS 990 Data
We have documented and posted the following open data assets:
- IRS E-Filer Database: All nonprofit 990 data that is filed electronically, about 60% of nonprofits.
- Index of all E-Filers from 2009 to Present: A list of all organizations that have electronically filed each year.
- Current Exempt Organizations: The current list of all tax-exempt organizations.
- IRS Business Master File: Organizational characteristics of all current exempt organizations.
- 990N Postcard Filers: Data on nonprofits that are small enough to file the abbreviated “postcard” version of the 990 form.
- IRS Automatic Revocations: Database of nonprofits that had their tax-exempt status revoked for failing to file.
- Organizations Granted Tax Exempt Status through 1023-EZ Form: Data filed electronically on the new shorter 1023-EZ application for 501(c) status.
Open NCCS Data
The National Center for Charitable Statistics at the Urban Institute has opened up their data archives!
Historic PDFs of 990s
The Economic Research Institute provides several years of IRS 990 filings in PDF form on their site:
(1) IRS E-Filer 990 Data
The IRS has released all nonprofit 990 tax data that has been e-filed through their online system, approximately 60-65% of all 990-PC and 990-EZ filers. It is available for years 2012 to the present, with a small set of returns available for 2010 and 2011. The data has been posted as XML files on Amazon Web Services (AWS). More details about the data and the push to have it made public are below.
In order to support use of this data, we have converted the XML files into a research database similar to the NCCS Core dataset.
Liberating the 990 Data
For some background on the campaigns to open access to IRS data, see these articles and blogs:
- Liberating 990 Data: Stanford Social Innovation Review
- The Nonprofit Data Project Blog: The Aspen Institute
- IRS Plans to Begin Releasing Electronically Filed Nonprofit Tax Data: Chronicle of Philanthropy
- Mandatory E-Filing: Toward a More Transparent Nonprofit Sector: The Urban Institute
- Recommendations for Improving the Effectiveness of the 990 Form for Reporting: Advisory Committee on Tax-Exempt and Government Entities (ACT) Report
Working With 990 Data
Form 990: A Guide for Newcomers to Nonprofit Research [ LINK ]
A History of the Tax Exempt Sector: An SOI Perspective [ LINK ]
A Guided Tour of the 990 Form by GuideStar [ LINK ]
Revised Form 990: The Evolution of Governance and the Nonprofit World [ LINK ]
Wikipedia: History of the 990 [ LINK ]
Resources for the AWS Data
Charity Navigator has created an open-source 990 Toolkit that allows you to set up an Amazon EC2 instance and clone the full IRS dataset as a relational database. You can read their press release about the project here.
Greg Saxton has put together a Python tutorial for wrangling the AWS data into a MongoDB database.
Similarly, Chad Kruse at SmarterGiving has a script to convert 990-PF XML files into a MongoDB database on GitHub here.
You can find some useful scripts here for running queries directly within the cloud and downloading data as CSV files, for example this GitHub gist.
If you are more comfortable in Python, check out Yash Nanavati’s GitHub repo.
There are some forums on using the E-Filer data, for example this reddit forum.
Some example build scripts.
(2) Index of 990, 990-EZ and 990-PF Electronic Filers from 2009 to Present
We provide an R script that builds the INDEX file (not the full dataset) for all IRS E-Filer open data hosted on Amazon Web Services. The index contains a limited number of variables such as nonprofit name, EIN, tax year, form type, and the URL of the XML version of the 990 return. This index file allows you to see what is available in the open E-Filer database.
    # R script for most recent sample:
    source( "https://raw.githubusercontent.com/Nonprofit-Open-Data-Collective/irs-990-efiler-database/master/BUILD_SCRIPTS/build_efile_database_functions.R" )
    d <- buildIndex()
    table( d$FormType, d$TaxYear )
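Once the index is built, a common next step is to narrow it to one form type and tax year before downloading any XML. The following is a minimal sketch on a toy data frame rather than the real index: `FormType` and `TaxYear` appear in the script above, while the `EIN` and `URL` column names, the form type labels, and the helper `select_returns()` are assumptions made for illustration.

```r
# Toy stand-in for the index returned by buildIndex().
# FormType and TaxYear come from the script above; EIN, URL, and the
# form type labels used here are assumed for illustration only.
index <- data.frame(
  EIN      = c("010000001", "010000002", "010000003", "010000001"),
  TaxYear  = c(2014, 2014, 2015, 2015),
  FormType = c("990", "990EZ", "990PF", "990"),
  URL      = c("https://example.org/a.xml", "https://example.org/b.xml",
               "https://example.org/c.xml", "https://example.org/d.xml"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the index rows for one form type and one tax year.
select_returns <- function(index, form_type, tax_year) {
  keep <- index$FormType == form_type & index$TaxYear == tax_year
  index[keep, , drop = FALSE]
}

full_990_2015 <- select_returns(index, "990", 2015)
full_990_2015$URL   # links to the XML returns you would fetch next
```

Filtering the index first keeps downloads small, since each row points to a separate XML file.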
(3) List of all Current Exempt Organizations (all orgs granted 501(c)(3) status)
The IRS Publication 78 contains a list of all organizations that currently have 501(c)(3) tax exempt status and are in good standing (eligible to receive tax-deductible donations) under IRS code.
(4) Business Master File of All Current Exempt Orgs
The IRS Exempt Organization Business Master File Extract (EO BMF) contains information on all active nonprofits including basic information about nonprofit location, ruling date (when they were granted tax exempt status), and activities. Note that the NTEE codes are noisy and incomplete. It is recommended to use the NCCS codes instead.
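The ruling date can be converted into a ruling year and a rough measure of how long an organization has been exempt. This is a sketch under stated assumptions: it assumes the ruling date is stored as six-character `YYYYMM` text in a column named `RULING`, with `000000` marking a missing value. Check the layout of the extract you download before relying on it. The helper `ruling_year()` and the toy data are hypothetical.

```r
# Toy stand-in for a BMF extract. The RULING column name, its YYYYMM layout,
# and "000000" as the missing-value marker are assumptions.
bmf <- data.frame(
  EIN    = c("010000001", "010000002", "010000003"),
  RULING = c("199805", "201501", "000000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract the four-digit year, treating year 0 as missing.
ruling_year <- function(ruling) {
  yr <- as.integer(substr(ruling, 1, 4))
  yr[is.na(yr) | yr == 0] <- NA_integer_
  yr
}

bmf$RulingYear  <- ruling_year(bmf$RULING)
bmf$YearsExempt <- 2020 - bmf$RulingYear   # reference year chosen for illustration
```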
(5) All 990-N Postcard Filers
Most small tax-exempt organizations whose annual gross receipts are normally $50,000 or less can satisfy their annual reporting requirement by electronically submitting Form 990-N if they choose not to file Form 990 or Form 990-EZ instead. Exceptions to this requirement include:
- Organizations that are included in a group return
- Churches, their integrated auxiliaries, and conventions or associations of churches
- Organizations required to file a different return
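The threshold and exceptions above can be expressed as a simple filter that flags which organizations one would expect to find in the postcard data. This is only an illustrative sketch: every column name and the helper `expected_postcard_filer()` are hypothetical, and it compares a single gross receipts figure to the threshold rather than applying the IRS definition of "normally" $50,000 or less.

```r
# Toy organization list; all column names are assumptions for illustration.
orgs <- data.frame(
  EIN                   = c("010000001", "010000002", "010000003", "010000004"),
  gross_receipts        = c(12000, 48000, 75000, 30000),
  in_group_return       = c(FALSE, FALSE, FALSE, TRUE),
  is_church             = c(FALSE, TRUE, FALSE, FALSE),
  other_return_required = c(FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: at or under the receipts threshold and not covered
# by any of the listed exceptions.
expected_postcard_filer <- function(d, threshold = 50000) {
  excepted <- d$in_group_return | d$is_church | d$other_return_required
  d$gross_receipts <= threshold & !excepted
}

orgs$postcard_expected <- expected_postcard_filer(orgs)
```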
The Postcard Filers dataset contains close to a million cases from the following years:
(6) All Organizations with a Revoked 501(c)(.) Status
Nonprofits that fail to file 990 returns for three consecutive years have their 501(c)(3) tax-exempt status automatically revoked by the IRS. This dataset contains more than 670,000 cases for the following years:
(7) Organizations Granted Tax Exempt Status through 1023-EZ Form
This dataset contains information on nonprofits that have been granted tax-exempt status through the new 1023-EZ form, a more compact and simplified version of the original 1023 form. These data do not include organizations that filed for exempt status through the original 1023 form, nor those that filed via paper forms sent to the IRS through the mail. The forms and criteria for submitting a 1023-EZ can be found here:
Current sample sizes are at:
(8) Foundation Grants
Over 1.4 million grants. Data was created by IBM Watson’s Causebot by extracting fields from the IRS e-files (very little documentation provided).
- granteeein - EIN of the grant recipient. Also known as grantee
- grantee - Name of the grantee
- grantdesc - Brief description of the grant in the tax filing
- cashgrantamt - Cash amount of the grant
- grantor - Name of the grantor
- grantorein - EIN of the grantor
- taxperiod - The tax period to which this grant belongs
- granteecity - City location of the grantee
- granteestate - State location of the grantee
- granteezipcode - Zip code of the grantee
- grantorcity - City location of the grantor
- grantorstate - State location of the grantor
- grantorzipcode - Zip code of the grantor
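With the fields listed above, grants can be summarized by grantor in a few lines of base R. The sketch below uses a small made-up data frame with the documented field names. The values are invented, and the real file may store amounts or EINs in a different type.

```r
# Toy grants table using the documented field names; values are invented.
grants <- data.frame(
  grantorein   = c("111111111", "111111111", "222222222"),
  grantor      = c("Foundation A", "Foundation A", "Foundation B"),
  granteeein   = c("333333333", "444444444", "333333333"),
  cashgrantamt = c(5000, 10000, 2500),
  stringsAsFactors = FALSE
)

# Total cash granted and number of grants per grantor.
totals <- aggregate(cashgrantamt ~ grantorein + grantor, data = grants, FUN = sum)
counts <- aggregate(cashgrantamt ~ grantorein + grantor, data = grants, FUN = length)
names(counts)[3] <- "n_grants"

# Combine the two summaries and sort with the largest grantors first.
by_grantor <- merge(totals, counts, by = c("grantorein", "grantor"))
by_grantor <- by_grantor[order(-by_grantor$cashgrantamt), ]
```

Grouping on `grantorein` as well as `grantor` guards against the same foundation name being spelled differently across filings.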
Additional Open Data Resources of Note
There are some additional interesting sources of nonprofit data that have the potential to be leveraged for future research:
County Level Measures of Individual Incomes and Charitable Donations
- SOI Income Tax Stats
- Use in research: The Politics of Donations: Are Red Counties More Donative Than Blue Counties?
County Level Measures of Social Capital
County Level Measures of Nonprofit Employment
- Johns Hopkins CCS overview of the project [ website ] [ report ]
- BLS site with data downloads [ link ]
Convenient US Longitudinal Census Tract Datasets
- Census datasets are painful to create over time because of changes to tracts, variables, variable names, and many other issues. For a couple of convenient longitudinal sources try:
- Time Series Tables from the National Historical Geographic Information System
- Longitudinal Tract Data Base from the Diversity and Disparities Project
- International Aid Transparency Initiative (IATI) [ database of grants ]
- OECD Stats Creditor Reporting System (CRS) to NGOs [ database ]
- Example Foundations Pages:
- Ford Foundation Grants [ database ]
- Hewlett Foundation Grants [ database ]
Arts Organizations and Economic Impact
Religious Congregation Data
- Measures of Church Numbers and Membership from 1950 to 2010
- Link to Association of Religion Data Archives
- Duke University maintains an updated list of active NGO Directories
Marc Joffe’s Federal Audit Clearinghouse Harvester
Giving (and volunteering) in the Netherlands Panel Study (GINPS)
Notable APIs for Nonprofit Data
- ProPublica Nonprofit Explorer API
- Foundation Center API
- Guidestar APIs
- Dark Money Given to Nonprofits
- An ambitious project by the Open Data Institute to create an open database for over 49 million companies globally.
- Project website here
State of Indiana’s Audit Clearinghouse
- Link to the Portal
- Click on “Report Builder”, then “Entity Annual Report” and then “Entity Annual Report”.
- Log of 2015 Nonprofit Audits
- Example of organizations included:
| Category Group | Number Audited in 2015 |
|---|---|
| BIG BROTHERS/BIG SISTERS | 6 |
| BOYS & GIRLS CLUBS | 31 |
| COUNCIL ON AGING | 43 |
| COUNTY FAIR ORGANIZATION | 21 |
| DAY CARE CENTER | 92 |
| ECONOMIC DEVELOPMENT CORP. | 128 |
Authors and Contributors
If you are interested in submitting resources or building tools to support nonprofit scholarship please contact Jesse Lecy (firstname.lastname@example.org) or Nathan Grasse (email@example.com).
Special thanks to Francisco Santamarina for his meticulous work decoding the IRS XML documents to translate the data into a useful format and creating the Data Dictionary at the heart of this project.
This project was inspired by the R Open Science initiative, which believes in making data accessible and building tools that help a research community make better use of the data. These scripts are written in the R language because it is a freely available open-source platform that can be used by anyone.