Skip to contents

This is a vignette to show you how to download the raw 990 data needed to generate factor scores.

Identify relevant 990 efiler variables

We first list all questions in the 990 series that are needed to generate the governance scores.

See other vignette for how we picked these ones.

See the NCCS website for a comprehensive list of all 900 efiler variable names.

Part/Schedule Variable Name 990 Item Name Question asked on 990
Part 4 F9_04_AFS_IND_X F990-PC-PART-04-LINE-12A Independent audited financial statements?
Part 4 F9_04_AFS_CONSOL_X F990-PC-PART-04-LINE-12B Consolidated audited financial statement?
Part 4 F9_04_BIZ_TRANSAC_DTK_X F990-PC-PART-04-LINE-28A Was the organization a party to a business transaction with one of the following parties: A current former officer, director, trustee, or key employee?
Part 4 F9_04_BIZ_TRANSAC_DTK_FAM_X F990-PC-PART-04-LINE-28B Was the organization a party to a business transaction with one of the following parties: A family member of a current or former officer, director, trustee, or key employee?
Part 4 F9_04_BIZ_TRANSAC_DTK_ENTITY_X F990-PC-PART-04-LINE-28C Was the organization a party to a business transaction with one of the following parties: An entity of which a current or former officer, director, trustee, or key employee (or a family member thereof) was an officer, director, trustee, or direct, or indirect owner?
Part 4 F9_04_CONTR_NONCSH_MT_25K_X F990-PC-PART-04-LINE-29 Did the organization receive more than $25,000 in non-cash contributions?
Part 4 F9_04_CONTR_ART_HIST_X F990-PC-PART-04-LINE-30 Did the organization receive contributions of art, historical treasures, or other similar assets, or qualified conservation contributions?
Part 6 F9_06_GVRN_NUM_VOTING_MEMB F990-PC-PART-01-LINE-03 Number voting members governing body
Part 6 F9_06_GVRN_NUM_VOTING_MEMB_IND F990-PC-PART-01-LINE-04 Number independent voting members
Part 6 F9_06_GVRN_DELEGATE_MGMT_DUTY_X F990-PC-PART-06-SECTION-A-LINE-03 Delegation of management duties?
Part 6 F9_06_GVRN_DOC_GVRN_BODY_X F990-PC-PART-06-SECTION-A-LINE-08A Minutes of governing body?
Part 6 F9_06_POLICY_FORM990_GVRN_BODY_X F990-PC-PART-06-SECTION-B-LINE-11A Form 990 provided to governing body?
Part 6 F9_06_POLICY_COI_X F990-PC-PART-06-SECTION-B-LINE-12A Organization have conflict of interest policy?
Part 6 F9_06_POLICY_COI_DISCLOSURE_X F990-PC-PART-06-SECTION-B-LINE-12B Annual disclosure of interests by covered persons?
Part 6 F9_06_POLICY_COI_MONITOR_X F990-PC-PART-06-SECTION-B-LINE-12C Regular monitoring and enforcement?
Part 6 F9_06_POLICY_WHSTLBLWR_X F990-PC-PART-06-SECTION-B-LINE-13 Whistleblower policy?
Part 6 F9_06_POLICY_DOC_RETENTION_X F990-PC-PART-06-SECTION-B-LINE-14 Organization has written document retention policy?
Part 6 F9_06_POLICY_COMP_PROCESS_CEO_X F990-PC-PART-06-SECTION-B-LINE-15A Compensation process CEO?
Part 6 F9_06_DISCLOSURE_AVBL_OTH_X F990-PC-PART-06-SECTION-C-LINE-18 Form available Other (explain in Schedule O)
Part 6 F9_06_DISCLOSURE_AVBL_OTH_WEB_X F990-PC-PART-06-SECTION-C-LINE-18 Form990 Part VI - Own website
Part 6 F9_06_DISCLOSURE_AVBL_REQUEST_X F990-PC-PART-06-SECTION-C-LINE-18 Form available upon request
Part 6 F9_06_DISCLOSURE_AVBL_OWN_WEB_X F990-PC-PART-06-SECTION-C-LINE-18 Form990 Part VI - Other website
Part 12 F9_12_FINSTAT_METHOD_ACC_OTH F990-PC-PART-12-LINE-01 Indicates method of accounting is other
Part 12 F9_12_FINSTAT_METHOD_ACC_ACCRU_X F990-PC-PART-12-LINE-01 Method of accounting - Accrua
Part 12 F9_12_FINSTAT_METHOD_ACC_CASH_X F990-PC-PART-12-LINE-01; Method of accounting - Cash
Schedule M SM_01_REVIEW_PROCESS_UNUSUAL_X SCHED-M-PART-01-LINE-31 Review process reference unusual noncash gifts?

Downloading the data

There are two methods for downloading the data. The first is downloaded the data sets directly from the NCCS website. This method is fast to do, but downloads much more data than you need to calculate a governance score. The second method is to download the data with the data.table package in R. This method is more complicated to implement, but it only downloads the data needed to calculate a governance score.

Downloading Data from NCCS Website Direcetly

All 990 Efiler data is hosted on the NCCS Website here.

The data here is organized be Part/Schedule, then by year. For our purposes, we only need Part 4, Part 6, Part 12, and Schedule M. Then for each Part/Schedule, you can click the “download” button for the relevant year(s). Once the data is downloaded, you can use the columns ““OBJECTID”, “URL”, “RETURN_VERSION”, and “ORG_EIN” as unique identifiers to bind the data together.

Downloading Data using R

To download the data through R, we first identify which years we are interested in. For this example, say we are interested in years 2013 and 2014.

years <- 2013:2014

We then see from the NCCS Website that the data Part/Schedule download links are structured as

https://nccs-efile.s3.us-east-1.amazonaws.com/parsed/F9-PARTNUMBER-TABLENUMBER-TABLENAME-YEAR.csv

Using this link structure and the variable names we identified in the previous section, we can download the data as follows:

### Part IV ----------------------------------------
#initialize data
dat_4 <-  vector(mode = "list", length = length(years))

#get columns we want 
keep_cols_part4 <- c("OBJECTID", "URL", "RETURN_VERSION", "ORG_EIN", "RETURN_TYPE", 
                     "F9_04_AFS_IND_X", "F9_04_AFS_CONSOL_X",
                     "F9_04_BIZ_TRANSAC_DTK_X", "F9_04_BIZ_TRANSAC_DTK_FAM_X",
                     "F9_04_BIZ_TRANSAC_DTK_ENTITY_X",
                     "F9_04_CONTR_NONCSH_MT_25K_X", 
                     "F9_04_CONTR_ART_HIST_X")

#download the data
for(i in 1:length(years)){
  link <-  paste0("https://nccs-efile.s3.us-east-1.amazonaws.com/parsed/F9-P04-T00-REQUIRED-SCHEDULES-", years[i], ".csv")
  temp <- fread(link, select = keep_cols_part4) 
  dat_4[[i]] <- temp
}

#clean up data
dat_all_4 <- 
  rbindlist(dat_4) %>% 
  mutate( year = as.numeric(substr(RETURN_VERSION, 1, 4)))

### Part VI  ----------------------------------------
#initialize data
dat_6 <-  vector(mode = "list", length = length(years))

#get columns we want 
  keep_cols_part6 <- c("OBJECTID", "URL", "RETURN_VERSION", "ORG_EIN", "RETURN_TYPE", 
                       "F9_06_GVRN_NUM_VOTING_MEMB",
                       "F9_06_GVRN_NUM_VOTING_MEMB_IND", 
                       "F9_06_GVRN_DTK_FAMBIZ_RELATION_X", 
                       "F9_06_GVRN_DELEGATE_MGMT_DUTY_X",
                       "F9_06_GVRN_DOC_GVRN_BODY_X",
                       "F9_06_POLICY_FORM990_GVRN_BODY_X",
                       "F9_06_POLICY_COI_X",
                       "F9_06_POLICY_COI_DISCLOSURE_X",
                       "F9_06_POLICY_COI_MONITOR_X",
                       "F9_06_POLICY_WHSTLBLWR_X",
                       "F9_06_POLICY_DOC_RETENTION_X",
                       "F9_06_POLICY_COMP_PROCESS_CEO_X",
                       "F9_06_DISCLOSURE_AVBL_OTH_X",
                       "F9_06_DISCLOSURE_AVBL_OTH_WEB_X",
                       "F9_06_DISCLOSURE_AVBL_REQUEST_X",
                       "F9_06_DISCLOSURE_AVBL_OWN_WEB_X"
                       )

#download the data
for(i in 1:length(years)){
  link <-  paste0("https://nccs-efile.s3.us-east-1.amazonaws.com/parsed/F9-P06-T00-GOVERNANCE-", years[i], ".csv")
  temp <- fread(link, select = keep_cols_part6) 
  dat_6[[i]] <- temp
}

#clean up data
dat_all_6 <- 
  rbindlist(dat_6) %>% 
  mutate( year = as.numeric(substr(RETURN_VERSION, 1, 4)))



### Part XII ----------------------------------------

#initialize data
dat_12 <-  vector(mode = "list", length = length(years))

#keep the columns we want
keep_cols_part12 <- c("OBJECTID", "URL", "RETURN_VERSION", "ORG_EIN","RETURN_TYPE", 
                      "F9_12_FINSTAT_METHOD_ACC_OTH",
                      "F9_12_FINSTAT_METHOD_ACC_ACCRU_X", 
                      "F9_12_FINSTAT_METHOD_ACC_CASH_X")


#download the data 
for(i in 1:length(years)){
  link <-  paste0("https://nccs-efile.s3.us-east-1.amazonaws.com/parsed/F9-P12-T00-FINANCIAL-REPORTING-", years[i], ".csv")
  temp <- fread(link, select = keep_cols_part12) 
  dat_12[[i]] <- temp
}

#clean up data
dat_all_12 <- 
  rbindlist(dat_12) %>% 
  mutate( year = as.numeric(substr(RETURN_VERSION, 1, 4)))


### Schedule M --------------------------------------
#initialize data
dat_M <-  vector(mode = "list", length = length(years))

#get columns we want 
keep_cols_partM <- c("OBJECTID", "URL", "RETURN_VERSION", "ORG_EIN", "RETURN_TYPE", 
                     "SM_01_REVIEW_PROCESS_UNUSUAL_X")

#download data
for(i in 1:length(years)){
  link <-  paste0("https://nccs-efile.s3.us-east-1.amazonaws.com/parsed/SM-P01-T00-NONCASH-CONTRIBUTIONS-", years[i], ".csv")
  temp <- fread(link, select = keep_cols_partM) 
  dat_M[[i]] <- temp
}

#clean up data
dat_all_M <- 
  rbindlist(dat_M) %>% 
  mutate( year = as.numeric(substr(RETURN_VERSION, 1, 4)))%>% 
  filter(year <= max(years))

Once we have all of the data from each part downloaded, we can merge the parts together. Additionally, we remove an EZ filers since their data will be incomplete for our purposes.

vars.bind <- c("OBJECTID", "URL", "RETURN_VERSION", "ORG_EIN", "year", "RETURN_TYPE")

dat_govern <-
  dat_all_4 %>% 
  merge(dat_all_6, by = vars.bind) %>% 
  merge(dat_all_12, by = vars.bind) %>% 
  merge(dat_all_M, by = vars.bind) %>% 
  filter(RETURN_TYPE != "990EZ")