The Data Science Toolkit
Examples of useful data science tools that can help clean, process, or extend your data.
Build Nonprofit Datasets
EFILE TOOLS
- 990 Data Issue Tracker
- IRS 990 Efile Raw XML Downloads
- IRS 990 Efile Database Builder
- Master Concordance Crosswalk for Efile Data
- Improved AWS Efile Index (deprecated)
From Jacob Fenton:
OTHER NONPROFIT DATASETS
- IRS Pub 78 Exempt Orgs Database
- IRS Nonprofit Business Master File
- IRS 990N Postcard Filers
- IRS Tax Exempt Revocations
- IRS 1023-EZ New Nonprofit Metadata
- 527 Political Organization Disclosures
- Historic Statistics of Income Nonprofit Microdata Samples
- Historic 990 data from 1982-1994 (pre-NCCS Core)
- ProPublica API
Refine Nonprofit Data
- NCCS Core harmonization
- peopleparser 990 Part VII name standardization and gender coding
- titleclassifier 990 Part VII title standardization
- Classifying nonprofit missions
Analyze Nonprofit Data
- fiscal R package for creating a standard set of nonprofit fiscal health metrics
- compensator R package for automated compensation appraisal of nonprofit executives
- npcompete tool to create standardized metrics of market competitiveness
Automated Mission Codes
Predict a nonprofit's activity codes from its name, mission statement, and program description text using machine learning algorithms.
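As a rough illustration of the idea, the sketch below trains a small multinomial naive Bayes text classifier using only the Python standard library. It is not the classifier used by this project. The class name, the training sentences, and the labels ("education", "health", "animals") are invented for illustration and are not real activity codes.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for illustration only.
training_texts = [
    "after school tutoring and literacy programs for children",
    "scholarships and mentoring for high school students",
    "free medical clinic providing primary care",
    "mental health counseling and patient support services",
    "animal shelter and pet adoption services",
    "wildlife rescue and rehabilitation for injured animals",
]
training_labels = ["education", "education", "health", "health", "animals", "animals"]

model = NaiveBayesTextClassifier().fit(training_texts, training_labels)
print(model.predict("tutoring and scholarships for students"))
```

A real application would train on many thousands of labeled filings and evaluate accuracy on held-out data. This toy version only shows how text fields can be turned into a predicted category.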
Interlocking Board Networks
Use approximate matching to link individuals in your data.
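A minimal sketch of approximate name matching, assuming board rosters are available as (organization, person name) pairs. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure. The function names, the 0.9 threshold, and the sample records are assumptions made for illustration and do not come from this project's tools.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def interlocks(board_members, threshold=0.9):
    """Return (org_a, org_b, name_a, name_b) for likely shared board members.

    board_members: list of (organization, person_name) tuples.
    """
    links = []
    for (org_a, name_a), (org_b, name_b) in combinations(board_members, 2):
        # Only pairs from different organizations can form an interlock.
        if org_a != org_b and similarity(name_a, name_b) >= threshold:
            links.append((org_a, org_b, name_a, name_b))
    return links


# Hypothetical records, invented for illustration only.
records = [
    ("Org A", "Katherine O'Neil"),
    ("Org B", "Katherine ONeil"),
    ("Org B", "Robert Alvarez"),
    ("Org C", "Roberta Alvarado"),
]

for link in interlocks(records):
    print(link)
```

Each returned tuple can be treated as an edge between two organizations in a board network. The threshold trades false matches against missed ones, so it is worth checking a sample of matched pairs by hand before building the network.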
For more examples, see this helpful network tutorial.
Open Science
This project was inspired by the R Open Science initiative, which promotes making data accessible and building tools that help a research community make better use of that data. These scripts are written in R because it is a freely available, open-source platform that anyone can use.