Open Data for Nonprofit Research

The IRS maintains several important nonprofit databases to track the current population of exempt organizations, their annual 990 filings, and organizations that have closed. This data has been released in formats that are not always easy to use: ASCII text files, JSON files, and XML queries.

To make the data accessible to the research community, we have created scripts that download data from IRS websites, clean and process it, and export it to familiar formats (CSV, Stata, SPSS, etc.).
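As a rough illustration of the export step, the sketch below writes a small data frame to CSV, RDS, and Stata formats and reads two of them back. The data frame, its column names, and the file names are made up for the example; this is not the project's actual build script.

```r
# Toy data frame standing in for a cleaned IRS extract (all values are made up).
orgs <- data.frame(
  EIN     = c("010203040", "123456789"),
  NAME    = c("EXAMPLE FOOD BANK", "SAMPLE ARTS COUNCIL"),
  REVENUE = c(125000, 48000),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()   # scratch folder, so nothing on disk is overwritten

# CSV: readable by almost any statistics package or spreadsheet
csv_path <- file.path(out_dir, "orgs.csv")
write.csv(orgs, csv_path, row.names = FALSE)

# RDS: native R format that preserves column types
rds_path <- file.path(out_dir, "orgs.rds")
saveRDS(orgs, rds_path)

# Stata: write.dta() comes from the 'foreign' package distributed with R
dta_path <- file.path(out_dir, "orgs.dta")
foreign::write.dta(orgs, dta_path)

# Read the files back; EIN is declared as text when parsing the CSV
orgs_csv <- read.csv(csv_path, colClasses = c(EIN = "character"),
                     stringsAsFactors = FALSE)
orgs_rds <- readRDS(rds_path)
```

The RDS copy is the most convenient for R users because no column types need to be declared on import.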

We also have some legacy files on our Dataverse site and Data World.

You can post questions on the discussion board.

You should also check out the great resources from CitizenAudit and Open990.

Check out the Generosity Commission’s Compendium of Nonprofit and Philanthropic Data.

Also see the Aspen Institute + Johnson Center’s report on accessing 990 Efile Data.

IRS 990 Data

We have documented and posted the following open data assets:

  1. IRS E-Filer Database: All nonprofit 990 data that is filed electronically, about 60% of nonprofits.
  2. Index of all E-Filers from 2009 to Present: A list of all organizations that have electronically filed each year.
  3. Current Exempt Organizations: The current list of all tax-exempt organizations.
  4. IRS Business Master File: Organizational characteristics of all current exempt organizations.
  5. 990N Postcard Filers: Data on nonprofits that are small enough to file the abbreviated “postcard” version of the 990 form.
  6. IRS Automatic Revocations: Database of nonprofits that had their tax exempt status revoked for failing to file.
  7. Organizations Granted Tax Exempt Status through 1023-EZ Form: Data filed electronically on the new shorter 1023-EZ application for 501(c) status.

What’s a 990 form? A charity accounting expert explains

Open NCCS Data

The National Center for Charitable Statistics at the Urban Institute has opened up their data archives!

NCCS Open Data Portal

Historic PDFs of 990s

The Economic Research Institute provides several years of IRS 990 filings in PDF form on their site:

ERI Nonprofit Search

(1) IRS E-Filer 990 Data

Nice Overview of E-File Data

The IRS has released all nonprofit 990 tax data that has been e-filed through its online system, approximately 60-65% of all 990-PC and 990-EZ filers. It is available for 2012 through the current year, with a small set of returns available for 2010 and 2011. The data has been posted as XML files on Amazon Web Services (AWS). More details about the data and the push to have it made public are below.

In order to support use of this data, we have converted the XML files into a research database similar to the NCCS Core dataset.

Form      2009     2010     2011     2012     2013     2014     2015     2016     2017
990     33,360  123,107  159,539  179,674  198,738  218,614  232,975  214,585   25,921
990EZ   15,500   63,253   82,066   93,769  104,538  116,461  124,507  121,530   28,767
990PF    2,352   25,275   34,597   39,936   45,897   53,443   58,724   60,305   20,608

Check out a quick guide to working with XML files in R: [ HTML ] [ PDF ]
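For readers who have not parsed XML in R before, here is a minimal sketch using the xml2 package (assumed to be installed). The XML string is a simplified, hypothetical fragment shaped like an e-filed return; real returns contain many more fields, and element names differ across schema versions, so the XPath expressions below should be treated as placeholders.

```r
library(xml2)   # assumed to be installed: install.packages("xml2")

# Simplified, hypothetical fragment shaped like a 990 e-file return.
# Element names are illustrative and vary by IRS schema version.
raw_xml <- '<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <TaxYr>2014</TaxYr>
    <ReturnTypeCd>990</ReturnTypeCd>
    <Filer>
      <EIN>123456789</EIN>
      <BusinessName>
        <BusinessNameLine1Txt>EXAMPLE FOOD BANK</BusinessNameLine1Txt>
      </BusinessName>
    </Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>125000</CYTotalRevenueAmt>
    </IRS990>
  </ReturnData>
</Return>'

doc <- read_xml(raw_xml)
xml_ns_strip(doc)   # drop the default namespace so XPath expressions stay short

# Helper of our own (not part of xml2): text of the first match, NA if absent
get_field <- function(doc, xpath) {
  xml_text(xml_find_first(doc, xpath))
}

# Flatten the fields of interest into a one-row data frame
ret <- data.frame(
  EIN          = get_field(doc, "//ReturnHeader/Filer/EIN"),
  Name         = get_field(doc, "//Filer/BusinessName/BusinessNameLine1Txt"),
  TaxYear      = as.integer(get_field(doc, "//ReturnHeader/TaxYr")),
  FormType     = get_field(doc, "//ReturnHeader/ReturnTypeCd"),
  TotalRevenue = as.numeric(get_field(doc, "//IRS990/CYTotalRevenueAmt")),
  stringsAsFactors = FALSE
)
```

Returning NA for a missing element, rather than stopping with an error, makes it easier to stack returns filed under different schema versions into one table.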

You can download the data in CSV and RDS formats here: [ Data Dictionary ] Link to Datasets

Liberating the 990 Data

For some background on the campaigns to open access to IRS data, see these articles and blogs:

Working With 990 Data

Example Forms:

Form 990: A Guide for Newcomers to Nonprofit Research [ LINK ]

A History of the Tax Exempt Sector: An SOI Perspective [ LINK ]

A Guided Tour of the 990 Form by GuideStar [ LINK ]

Revised Form 990: The Evolution of Governance and the Nonprofit World [ LINK ]

Wikipedia: History of the 990 [ LINK ]

Resources for the AWS Data

Charity Navigator has created an open-source 990 Toolkit that allows you to set up an Amazon EC2 instance and clone the full IRS dataset as a relational database. You can read their press release about the project here.

Greg Saxton has put together a Python tutorial for wrangling the AWS data into a MongoDB database.

Similarly, Chad Kruse at SmarterGiving has a script to convert 990-PF XML files into a MongoDB database on GitHub here.

You can find some useful scripts here for running queries directly within the cloud and downloading data as CSV files, for example this GitHub gist.

If you are more comfortable in Python, check out Yash Nanavati’s GitHub repo.

There are some forums on using the E-Filer data, for example this reddit forum.

Some example build scripts.

(2) Index of 990, 990-EZ and 990-PF Electronic Filers from 2009 to Present

We provide an R script that builds the INDEX file (not the full dataset) for all IRS E-Filer open data hosted on Amazon Web Services. The index contains a limited number of variables, such as nonprofit name, EIN, tax year, form type, and the URL of the XML version of the 990 return. This index file lets you see what is available in the open E-Filer database.

[Data Dictionary] [ update ] [Link to Dataset]

# R script for most recent sample:
source( "" )
d <- buildIndex()
table( d$FormType, d$TaxYear )
          2009     2010     2011     2012     2013     2014     2015     2016     2017
990     33,360  123,107  159,539  179,674  198,738  218,614  232,975  214,585   25,921
990EZ   15,500   63,253   82,066   93,769  104,538  116,461  124,507  121,530   28,767
990PF    2,352   25,275   34,597   39,936   45,897   53,443   58,724   60,305   20,608
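Once the index is loaded, ordinary data frame operations are enough to explore it. The sketch below uses a tiny made-up data frame in place of the real index; FormType and TaxYear appear in the script above, while EIN, OrganizationName, and URL are assumed column names that may differ in the actual file.

```r
# Toy stand-in for the index returned by buildIndex(); every row is made up.
# EIN, OrganizationName and URL are assumed column names.
d <- data.frame(
  EIN              = c("111111111", "222222222", "333333333", "111111111"),
  OrganizationName = c("EXAMPLE FOOD BANK", "SAMPLE FAMILY FOUNDATION",
                       "SMALL TOWN CHOIR", "EXAMPLE FOOD BANK"),
  TaxYear          = c(2014, 2015, 2015, 2015),
  FormType         = c("990", "990PF", "990EZ", "990"),
  URL              = c("https://example.org/returns/1.xml",
                       "https://example.org/returns/2.xml",
                       "https://example.org/returns/3.xml",
                       "https://example.org/returns/4.xml"),
  stringsAsFactors = FALSE
)

# Cross-tab of form type by tax year
counts <- table(d$FormType, d$TaxYear)

# Private foundation returns (990PF) for tax year 2015
pf_2015 <- d[d$FormType == "990PF" & d$TaxYear == 2015, ]

# URLs of the XML returns to fetch for that subset
pf_urls <- pf_2015$URL

# Number of e-filed returns per organization across all years
n_filings <- table(d$EIN)
```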

(3) List of all Current Exempt Organizations (all orgs granted 501(c)(3) status)

IRS Publication 78 lists all organizations that currently have 501(c)(3) tax exempt status and are in good standing (eligible to receive tax-deductible donations) under the IRS code.

[ Data Dictionary ] [ Link to Dataset ]

(4) Business Master File of All Current Exempt Orgs

The IRS Exempt Organization Business Master File Extract (EO BMF) contains information on all active nonprofits including basic information about nonprofit location, ruling date (when they were granted tax exempt status), and activities. Note that the NTEE codes are noisy and incomplete. It is recommended to use the NCCS codes instead.
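One common use of the BMF is to attach organizational characteristics to other files by EIN. The sketch below is illustrative only: both data frames are made up, and the column names (EIN, STATE, TaxYear) are assumptions. It also shows one way to restore nine-digit EINs when a file has been read as numeric and the leading zeros were lost.

```r
# Toy BMF extract in which EIN was read as a number, dropping a leading zero.
bmf <- data.frame(
  EIN   = c(10203040, 123456789),
  STATE = c("ME", "NY"),
  stringsAsFactors = FALSE
)

# Toy e-filer index with EIN stored as nine-character text.
index <- data.frame(
  EIN     = c("010203040", "123456789", "987654321"),
  TaxYear = c(2015, 2015, 2015),
  stringsAsFactors = FALSE
)

# Restore the nine-digit text form so the keys match
bmf$EIN <- formatC(bmf$EIN, width = 9, format = "d", flag = "0")

# Left join: keep every index row, add BMF fields where an EIN matches
merged <- merge(index, bmf, by = "EIN", all.x = TRUE)
```

Rows with NA in the BMF columns after the merge are organizations in the index that do not appear in the toy BMF extract.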

[ Data Dictionary ] [ Link to Dataset ]

(5) All 990-N Postcard Filers

Most small tax-exempt organizations whose annual gross receipts are normally $50,000 or less can satisfy their annual reporting requirement by electronically submitting Form 990-N if they choose not to file Form 990 or Form 990-EZ instead. Exceptions to this requirement include:

  • Organizations that are included in a group return
  • Churches, their integrated auxiliaries, and conventions or associations of churches
  • Organizations required to file a different return

The Postcard Filers dataset contains close to a million cases from the following years:

   2007     2008     2009     2010     2011     2012     2013     2014     2015     2016
 26,969   28,704   45,846   31,734   36,457   36,779   52,202  120,831  475,084   65,211

[ Data Dictionary ] [ Link to Dataset ]

(6) All Organizations with a Revoked 501(c)(.) Status

Nonprofits that fail to file 990 returns for three consecutive years have their 501(c)(3) tax exempt status automatically revoked by the IRS. This dataset contains more than 670,000 cases for the following years:

   2010     2011     2012     2013     2014     2015     2016
372,717   92,360   47,506   52,111   36,973   36,935   35,046
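The yearly counts above can be checked with a few lines of R. The figures are copied from the table; the variable names are our own.

```r
# Revocation counts by year, copied from the table above
revoked <- c(`2010` = 372717, `2011` = 92360, `2012` = 47506, `2013` = 52111,
             `2014` = 36973, `2015` = 36935, `2016` = 35046)

total <- sum(revoked)                        # 673,648 cases in all
share <- round(100 * revoked / total, 1)     # percent of all revocations by year
```

More than half of the revocations (about 55%) fall in 2010, the first year in the table.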

[ Data Dictionary ] [ Link to Dataset ]

(7) Organizations Granted Tax Exempt Status through 1023-EZ Form

This dataset contains information on nonprofits that have been granted tax-exempt status through the new 1023-EZ form, a more compact and simplified version of the original 1023 form. These data do not include organizations that filed for exempt status through the original 1023 form, nor those that filed via paper forms sent to the IRS through the mail. The forms and criteria for submitting a 1023-EZ can be found here:

[ 1023-EZ Documentation ]
[ 1023 Documentation ]

Current sample sizes are:

  2014    2015    2016
15,160  42,392  47,557

[ Data Dictionary ] [ Link to Dataset ]

(8) Foundation Grants

Over 1.4 million grants. Data was created by IBM Watson’s Causebot by extracting fields from the IRS e-files (very little documentation provided).

   2010     2011     2012     2013     2014     2015
159,435  213,457  246,691  275,551  307,383  213,564

  • granteeein - EIN of the grant recipient (the grantee)
  • grantee - Name of the grantee
  • grantdesc - Brief description of the grant in the tax filing
  • cashgrantamt - Cash amount of the grant
  • grantor - Name of the grantor
  • grantorein - EIN of the grantor
  • taxperiod - Tax period to which the grant belongs
  • granteecity - City location of the grantee
  • granteestate - State location of the grantee
  • granteezipcode - Zip code of the grantee
  • grantorcity - City location of the grantor
  • grantorstate - State location of the grantor
  • grantorzipcode - Zip code of the grantor

[ All Efile Foundation Grants 2010-2015 ]
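The variables listed above support simple summaries such as total giving by grantor or by recipient state. The sketch below uses a made-up data frame with the documented column names; the values, and the assumption that cashgrantamt is stored as a number, are ours.

```r
# Toy grants table using the documented column names; all values are made up.
grants <- data.frame(
  grantorein   = c("111111111", "111111111", "222222222"),
  grantor      = c("EXAMPLE FOUNDATION", "EXAMPLE FOUNDATION", "SAMPLE FUND"),
  granteestate = c("NY", "CA", "NY"),
  cashgrantamt = c(5000, 12000, 3000),
  stringsAsFactors = FALSE
)

# Total cash grants per grantor, largest first
by_grantor <- aggregate(cashgrantamt ~ grantorein + grantor,
                        data = grants, FUN = sum)
by_grantor <- by_grantor[order(-by_grantor$cashgrantamt), ]

# Total cash grants received in each grantee state
by_state <- tapply(grants$cashgrantamt, grants$granteestate, sum)
```

Grouping on grantorein as well as grantor guards against two foundations that share a name being added together.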

Additional Open Data Resources of Note

There are some additional interesting sources of nonprofit data that have the potential to be leveraged for future research:

County Level Measures of Individual Incomes and Charitable Donations

County Level Measures of Social Capital

County Level Measures of Nonprofit Employment

  • Johns Hopkins CCS overview of the project [ website ] [ report ]
  • BLS site with data downloads [ link ]

Convenient US Longitudinal Census Tract Datasets

Foreign Aid

  • International Aid Transparency Initiative (IATI) [ database of grants ]
  • OECD Stats Creditor Reporting System (CRS) to NGOs [ database ]
  • Example Foundations Pages:
  • Ford Foundation Grants [ database ]
  • Hewlett Foundation Grants [ database ]

Arts Organizations and Economic Impact

Religious Congregation Data

NGO Data

Marc Joffe’s Federal Audit Clearinghouse Harvester

Giving (and volunteering) in the Netherlands Panel Study (GINPS)

Notable APIs for Nonprofit Data

OpenCorporates Project

  • An ambitious project by the Open Data Institute to create an open database for over 49 million companies globally.
  • Project website here

State of Indiana’s Audit Clearinghouse

Category Group    Number Audited in 2015
4H-CLUB           62

Authors and Contributors

If you are interested in submitting resources or building tools to support nonprofit scholarship, please contact Jesse Lecy or Nathan Grasse.

Special thanks to Francisco Santamarina for his meticulous work decoding the IRS XML documents to translate the data into a useful format and creating the Data Dictionary at the heart of this project.

Open Science

This project was inspired by the R Open Science initiative, which believes in making data accessible and building tools that help a research community better utilize the data. These scripts are written in the R language because it is a freely-available open-source platform that can be used by anyone.