An R package to assign raw nonprofit executive titles from Form 990 Part VII to a well-structured title taxonomy.
## [1] "PROGRAMS COCHAIR, RESEARCH COCHAIR"
## [2] "FIN SECY / TREASURER PAST"
## [3] "RECORDING SECY PAST"
## [4] "WARDEN PAST"
## [5] "CONDUCTOR PAST"
## [6] "FINANCIAL SECRETARY/TREASU -PAST"
## [7] "VICE PRESIDENT - NEW AND PAST"
## [8] "TREASURER - NEW AND PAST"
## [9] "WARDEN - NEW AND PAST"
## [10] "FINANCIAL SEC'Y - PAST"
## [11] "VICE - PRES - PAST"
## [12] "FIN SEC'Y/ TREASURER - PAST"
## [13] "BOARD CHAIR, FINANCE COMMITTEE"
## [14] "TREASURER, DIRECTOR, FINANCE COMMITTEE"
## [15] "FIN'L SECRETARY/TREASURER - PAST"
## [16] "LOWER DIV COORDINAT"
## [17] "UPPER DIV COORDINATE"
## [18] "SCOREMASTER"
## [19] "SNACK BAR COORDINAT"
## [20] "SPONSOR/FUND COORDIN"
After installation complete without errors load the package.
The package includes some IRS 990 Part VII data for development and demonstration purposes.
Data Dictionary:
The following internal datasets are used by functions within the package for mapping and disambiguation purposes:
Disambiguation Lists:
Status Words:
Titles are often accompanied by some adjective describing a temporary or temporal period in which the person holds the title. These phrases are removed from the titles and flags are created for each.
These are used in the gen_status_codes() step:
https://github.com/Nonprofit-Open-Data-Collective/titleclassifier/blob/main/R/06-gen-status-codes.R
Demo Data:
Sample datasets for development and testing purposes:
The package includes additional data that is used by functions for cleaning and assignment processes.
NOTE: We need to export these:
These data are imported into the package via Google Sheets:
[reference google sheet cases]
There are approximately 100 meaningful titles in a sparse taxonomy of key leadership positions disclosed on the Directors, Trustees, and Key Employees section of the 990 forms. The title field appears as raw text, however, so there are no constraints on what nonprofits can write in the field. As a result, there are over 900,000 unique strings in the database.
# create sample of 100 observations for testing
set.seed( 1234 )
df.sample <- sample_n( titleclassifier::tinypartvii, 100 )
df <-
df.sample %>%
standardize_df() %>%
remove_dates() %>%
standardize_conj() %>% # condition has length > 1
split_titles() %>%
standardize_spelling() %>% # condition has length > 1
gen_status_codes() %>%
standardize_titles() %>%
categorize_titles() %>%
conditional_logic()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
## [1] "standardize spelling step complete"
## [1] "generate status codes step complete"
## [1] "standardize titles step complete"
## [1] "categorize titles step complete"
## [1] "conditional logic step complete"
Why is this one split?
# df[ df$F9_07_COMP_DTK_NAME_PERS == "Pat L Wheeler" , ]
# These need to be unique otherwise duplicates columns
# PRESIDENT --> CEO
# PRESIDENT --> BOARD PRESIDENT
googlesheets4::gs4_deauth()
df.standard <- googlesheets4::read_sheet( "1iYEY2HYDZTV0uvu35UuwdgAUQNKXSyab260pPPutP1M",
sheet="title-standardization", range="A:B",
col_types = "c" ) # c = character
length( df.standard$title.variant )
## [1] 1179
## [1] 1142
## [1] 1179
## [1] 1158
# [1] 696
# PROBLEM CASES
df.standard %>%
group_by( title.variant ) %>%
filter( n() > 1 ) %>%
arrange( title.variant ) %>%
pander()
title.variant | title.standard |
---|---|
ASSISTANT DIRECTOR | ASSISTANT DIRECTOR |
ASSISTANT DIRECTOR | BOARD MEMBER |
ASSOCIATE DIRECTOR | ASSOCIATE DIRECTOR |
ASSOCIATE DIRECTOR | BOARD MEMBER |
CHANCELLOR | BOARD PRESIDENT |
CHANCELLOR | CEO |
CHIEF OF OPERATIONS AND FINANCE | CFO |
CHIEF OF OPERATIONS AND FINANCE | COO |
DIRECTOR OF OPERATIONS | COO |
DIRECTOR OF OPERATIONS | DIRECTOR OF OPERATIONS |
EDUCATION DIRECTOR | BOARD CHAIR |
EDUCATION DIRECTOR | DIRECTOR OF EDUCATION |
FINANCE DIRECTOR | BOARD CHAIR |
FINANCE DIRECTOR | DIRECTOR OF FINANCE |
FINANCE OFFICER | BOARD TREASURER |
FINANCE OFFICER | FINANCE OFFICER |
OPERATIONS DIRECTOR | COO |
OPERATIONS DIRECTOR | DIRECTOR OF OPERATIONS |
OPERATIONS MANAGER | COO |
OPERATIONS MANAGER | OPERATIONS MANAGER |
REGIONAL DIRECTOR | BOARD MEMBER |
REGIONAL DIRECTOR | REGIONAL DIRECTOR |
REGIONAL VICE PRESIDENT | BOARD VICE PRESIDENT |
REGIONAL VICE PRESIDENT | REGIONAL VICE PRESIDENT |
RVP | BOARD VICE PRESIDENT |
RVP | REGIONAL VICE PRESIDENT |
V CHAIR | BOARD CHAIR |
V CHAIR | BOARD VICE PRESIDENT |
VICE PRESIDENT OF OPERATIONS | COO |
VICE PRESIDENT OF OPERATIONS | VICE PRESIDENT OF OPERATIONS |
VICE PRESIDENT OF PROGRAMS | PROGRAMS VICE PRESIDENT |
VICE PRESIDENT OF PROGRAMS | VICE PRESIDENT OF PROGRAMS |
Why are these cases uncoded?
keep <- c("PNAME", "tot.hours", "tot.comp", "title.raw",
"title.standard", "title.v6", "title.v6" )
df[ is.na( df$title.standard) , keep ]
Spacing is problematic - causing merges to break on some cases like PRESIDENT & CEO.
## NULL
# [1] "VICE PRESIDENT OF COMMUNICATIONS LEADERSHIP" "BOARD MEMBER TREASURER AND CFO"
# [3] "VICE PRESIDENT OF SHARED SERVICES" "CHIEF MEDICAL OFFICER"
# [5] " EXECUTIVE" " EXECUTIVE"
# [7] "SHOW ANNOUNCER" "PRESIDENT "
# [9] "ASSISTANT VICE PRESIDENT BASEBALL" "BOARD OF DIRECTOR"
# [11] " PRESIDENT" " CEO"
# [13] " PRESIDENT" "BOARD CHAIR "
# [15] "HH CHAIR" "RECORDER"
# [17] "DIRECTOR " " PRESIDENT"
# [19] "STEWARD"
## [1] "standardize df step complete"
Many titles include dates to signify the start or end of the period that a person holds a position. For example:
remove_dates()
¦
+-- convert_ordinal()
+-- identify_date()
+-- remove_date()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
x |
---|
DIRECTOR (UNTIL 11/18) –> DIRECTOR UNTIL |
SECRETARY –> SECRETARY |
Secretary –> SECRETARY |
BOARD MEMBER –> BOARD MEMBER |
DIRECTOR –> DIRECTOR |
DIRECTOR –> DIRECTOR |
BOARD OF DIRECTOR –> BOARD OF DIRECTOR |
DIRECTOR –> DIRECTOR |
Secretary –> SECRETARY |
Director –> DIRECTOR |
Director –> DIRECTOR |
CHAIRMAN –> CHAIRMAN |
Director –> DIRECTOR |
DIRECTOR –> DIRECTOR |
NATIONAL EXECUTIVE DIRECTOR –> NATIONAL EXECUTIVE DIRECTOR |
PAST PRESIDENT –> PAST PRESIDENT |
DIRECTOR –> DIRECTOR |
Treasurer –> TREASURER |
CHAIRMAN –> CHAIRMAN |
DIRECTOR –> DIRECTOR |
DIRECTOR –> DIRECTOR |
DIRECTOR –> DIRECTOR |
Board Member Treasurer and CFO –> BOARD MEMBER TREASURER AND CFO |
TREASURER/CH –> TREASURER/CH |
ASST. VP - BASEBALL –> ASST VP - BASEBALL |
Director –> DIRECTOR |
BOARD MEMBER –> BOARD MEMBER |
VICE CHAIRMAN/DIRECTOR –> VICE CHAIRMAN/DIRECTOR |
TRUSTEE –> TRUSTEE |
ASST. SEC/TR –> ASST SEC/TR |
DIRECTOR –> DIRECTOR |
TREASURER –> TREASURER |
SECRETARY –> SECRETARY |
DIRECTOR –> DIRECTOR |
Secretary-Treas –> SECRETARY-TREAS |
SECRETARY/TR –> SECRETARY/TR |
VICE PRESIDENT –> VICE PRESIDENT |
PRESIDENT –> PRESIDENT |
Secretary –> SECRETARY |
President –> PRESIDENT |
BOARD MEMBER –> BOARD MEMBER |
DIRECTOR –> DIRECTOR |
TRUSTEE –> TRUSTEE |
Vice President –> VICE PRESIDENT |
Director –> DIRECTOR |
STEWARD –> STEWARD |
VP OF COMMUNITY LEADERSHIP –> VP OF COMMUNITY LEADERSHIP |
FORMER EXEC. –> FORMER EXEC |
SECRETARY –> SECRETARY |
DIRECTOR –> DIRECTOR |
DIRECTOR –> DIRECTOR |
DIRECTOR –> DIRECTOR |
DIRECTOR –> DIRECTOR |
BOARD MEMBER –> BOARD MEMBER |
BOARD MEMBER –> BOARD MEMBER |
EXECUTIVE VICE PRESIDENT –> EXECUTIVE VICE PRESIDENT |
DIRECTOR –> DIRECTOR |
BOARD MEMBER –> BOARD MEMBER |
VP OF SHARED SERVICES –> VP OF SHARED SERVICES |
BOARD CHAIR (through 09/18) –> BOARD CHAIR THROUGH |
IMMEDIATE PAST PRESIDENT –> IMMEDIATE PAST PRESIDENT |
DIRECTOR –> DIRECTOR |
Treasurer –> TREASURER |
Director –> DIRECTOR |
President –> PRESIDENT |
DIRECTOR –> DIRECTOR |
Director –> DIRECTOR |
PRESIDENT –> PRESIDENT |
DIRECTOR –> DIRECTOR |
BOARD CHAIR –> BOARD CHAIR |
VICE PRESIDENT –> VICE PRESIDENT |
CHIEF EXECUTIVE OFFICER/CHIEF MEDICAL OFFICER –> CHIEF EXECUTIVE OFFICER/CHIEF MEDICAL OFFICER |
Vice President –> VICE PRESIDENT |
BOARD MEMBER –> BOARD MEMBER |
BOARD MEMBER –> BOARD MEMBER |
RECORDER –> RECORDER |
Secretary –> SECRETARY |
Show Announcer –> SHOW ANNOUNCER |
Director –> DIRECTOR |
INTERIM EXEC –> INTERIM EXEC |
President –> PRESIDENT |
SGT AT ARMS –> SGT AT ARMS |
PRESIDENT –> PRESIDENT |
DIRECTOR –> DIRECTOR |
SECRETARY –> SECRETARY |
Director –> DIRECTOR |
TREASURER –> TREASURER |
PRESIDENT –> PRESIDENT |
BOARD MEMBER –> BOARD MEMBER |
EXECUTIVE VICE PRESIDENT –> EXECUTIVE VICE PRESIDENT |
Board Chairman –> BOARD CHAIRMAN |
HH DISTRICT CHAIRMAN –> HH DISTRICT CHAIRMAN |
PAST PRESIDE –> PAST PRESIDE |
VICE PRESIDE –> VICE PRESIDE |
CHAIRMAN –> CHAIRMAN |
PRESIDENT ELECT –> PRESIDENT ELECT |
Trustee –> TRUSTEE |
DIRECTOR –> DIRECTOR |
TRUSTEE –> TRUSTEE |
Executive Director –> EXECUTIVE DIRECTOR |
Convert all conjugations within titles to consistent formats.
standardize_conj( title = "TitleTxt2" )
¦
+-- standardize_and()
+-- standardize_to()
+-- standardize_of()
+-- standardize_comma()
+-- standardize_slash()
+-- standardize_and() #repeated bc of possible standardization changes
+-- standardize_separator()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
TITLE_RAW | TitleTxt2 | TitleTxt3 |
---|---|---|
DIRECTOR (UNTIL 11/18) | DIRECTOR UNTIL | DIRECTOR UNTIL |
SECRETARY | SECRETARY | SECRETARY |
Secretary | SECRETARY | SECRETARY |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
DIRECTOR | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
BOARD OF DIRECTOR | BOARD OF DIRECTOR | BOARD OF DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
Secretary | SECRETARY | SECRETARY |
Director | DIRECTOR | DIRECTOR |
Director | DIRECTOR | DIRECTOR |
CHAIRMAN | CHAIRMAN | CHAIRMAN |
Director | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
NATIONAL EXECUTIVE DIRECTOR | NATIONAL EXECUTIVE DIRECTOR | NATIONAL EXECUTIVE DIRECTOR |
PAST PRESIDENT | PAST PRESIDENT | PAST PRESIDENT |
DIRECTOR | DIRECTOR | DIRECTOR |
Treasurer | TREASURER | TREASURER |
CHAIRMAN | CHAIRMAN | CHAIRMAN |
DIRECTOR | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
Board Member Treasurer and CFO | BOARD MEMBER TREASURER AND CFO | BOARD MEMBER TREASURER & CFO |
TREASURER/CH | TREASURER/CH | TREASURER & CH |
ASST. VP - BASEBALL | ASST VP - BASEBALL | ASST VP BASEBALL |
Director | DIRECTOR | DIRECTOR |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
VICE CHAIRMAN/DIRECTOR | VICE CHAIRMAN/DIRECTOR | VICE CHAIRMAN & DIRECTOR |
TRUSTEE | TRUSTEE | TRUSTEE |
ASST. SEC/TR | ASST SEC/TR | ASST SEC & TR |
DIRECTOR | DIRECTOR | DIRECTOR |
TREASURER | TREASURER | TREASURER |
SECRETARY | SECRETARY | SECRETARY |
DIRECTOR | DIRECTOR | DIRECTOR |
Secretary-Treas | SECRETARY-TREAS | SECRETARY-TREAS |
SECRETARY/TR | SECRETARY/TR | SECRETARY & TR |
VICE PRESIDENT | VICE PRESIDENT | VICE PRESIDENT |
PRESIDENT | PRESIDENT | PRESIDENT |
Secretary | SECRETARY | SECRETARY |
President | PRESIDENT | PRESIDENT |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
DIRECTOR | DIRECTOR | DIRECTOR |
TRUSTEE | TRUSTEE | TRUSTEE |
Vice President | VICE PRESIDENT | VICE PRESIDENT |
Director | DIRECTOR | DIRECTOR |
STEWARD | STEWARD | STEWARD |
VP OF COMMUNITY LEADERSHIP | VP OF COMMUNITY LEADERSHIP | VP OF COMMUNITY LEADERSHIP |
FORMER EXEC. | FORMER EXEC | FORMER EXEC |
SECRETARY | SECRETARY | SECRETARY |
DIRECTOR | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
DIRECTOR | DIRECTOR | DIRECTOR |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
EXECUTIVE VICE PRESIDENT | EXECUTIVE VICE PRESIDENT | EXECUTIVE VICE PRESIDENT |
DIRECTOR | DIRECTOR | DIRECTOR |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
VP OF SHARED SERVICES | VP OF SHARED SERVICES | VP OF SHARED SERVICES |
BOARD CHAIR (through 09/18) | BOARD CHAIR THROUGH | BOARD CHAIR THROUGH |
IMMEDIATE PAST PRESIDENT | IMMEDIATE PAST PRESIDENT | IMMEDIATE PAST PRESIDENT |
DIRECTOR | DIRECTOR | DIRECTOR |
Treasurer | TREASURER | TREASURER |
Director | DIRECTOR | DIRECTOR |
President | PRESIDENT | PRESIDENT |
DIRECTOR | DIRECTOR | DIRECTOR |
Director | DIRECTOR | DIRECTOR |
PRESIDENT | PRESIDENT | PRESIDENT |
DIRECTOR | DIRECTOR | DIRECTOR |
BOARD CHAIR | BOARD CHAIR | BOARD CHAIR |
VICE PRESIDENT | VICE PRESIDENT | VICE PRESIDENT |
CHIEF EXECUTIVE OFFICER/CHIEF MEDICAL OFFICER | CHIEF EXECUTIVE OFFICER/CHIEF MEDICAL OFFICER | CHIEF EXECUTIVE OFFICER & CHIEF MEDICAL OFFICER |
Vice President | VICE PRESIDENT | VICE PRESIDENT |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
RECORDER | RECORDER | RECORDER |
Secretary | SECRETARY | SECRETARY |
Show Announcer | SHOW ANNOUNCER | SHOW ANNOUNCER |
Director | DIRECTOR | DIRECTOR |
INTERIM EXEC | INTERIM EXEC | INTERIM EXEC |
President | PRESIDENT | PRESIDENT |
SGT AT ARMS | SGT AT ARMS | SGT AT ARMS |
PRESIDENT | PRESIDENT | PRESIDENT |
DIRECTOR | DIRECTOR | DIRECTOR |
SECRETARY | SECRETARY | SECRETARY |
Director | DIRECTOR | DIRECTOR |
TREASURER | TREASURER | TREASURER |
PRESIDENT | PRESIDENT | PRESIDENT |
BOARD MEMBER | BOARD MEMBER | BOARD MEMBER |
EXECUTIVE VICE PRESIDENT | EXECUTIVE VICE PRESIDENT | EXECUTIVE VICE PRESIDENT |
Board Chairman | BOARD CHAIRMAN | BOARD CHAIRMAN |
HH DISTRICT CHAIRMAN | HH DISTRICT CHAIRMAN | HH DISTRICT CHAIRMAN |
PAST PRESIDE | PAST PRESIDE | PAST PRESIDE |
VICE PRESIDE | VICE PRESIDE | VICE PRESIDE |
CHAIRMAN | CHAIRMAN | CHAIRMAN |
PRESIDENT ELECT | PRESIDENT ELECT | PRESIDENT ELECT |
Trustee | TRUSTEE | TRUSTEE |
DIRECTOR | DIRECTOR | DIRECTOR |
TRUSTEE | TRUSTEE | TRUSTEE |
Executive Director | EXECUTIVE DIRECTOR | EXECUTIVE DIRECTOR |
split_titles()
¦
+-- split_title()
¦
¦ +-- unmingle()
¦
¦ +-- check_hunspell()
¦ +-- standardize_space()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
df[ c("F9_07_COMP_DTK_NAME_PERS","Num.Titles","TITLE_RAW","TitleTxt4","TitleTxt3") ] %>%
dplyr::rename( PNAME = F9_07_COMP_DTK_NAME_PERS ) %>%
group_by( PNAME ) %>%
mutate( N=max(Num.Titles) ) %>%
dplyr::arrange( desc(N), PNAME )
[ change “substitute_” to “fix_” in function names ? ]
standardize_spelling()
¦
+-- fix_spelling()
¦ +-- substitute_vice()
¦ +-- substitute_executive(TitleTxt)
¦ +-- substitute_director(TitleTxt)
¦ +-- substitute_operations(TitleTxt)
¦ +-- substitute_assistant(TitleTxt)
¦ +-- substitute_president(TitleTxt)
¦ +-- substitute_secretary(TitleTxt)
¦ +-- substitute_treasurer(TitleTxt)
¦ +-- substitute_finance(TitleTxt)
¦ +-- substitute_senior(TitleTxt)
¦ +-- substitute_development(TitleTxt)
¦ +-- substitute_chair(TitleTxt)
¦ +-- substitute_officer(TitleTxt)
¦ +-- substitute_admin(TitleTxt)
¦ +-- substitute_coordinator(TitleTxt)
¦ +-- substitute_strategy(TitleTxt) #strategy/strategic
¦ +-- substitute_hr(TitleTxt)
¦ +-- substitute_manage(TitleTxt) #management/managing/manager
¦ +-- substitute_programs(TitleTxt) #programs/programming
¦ +-- substitute_projects(TitleTxt)
¦ +-- substitute_public(TitleTxt)
¦ +-- substitute_business(TitleTxt)
¦ +-- substitute_comm(TitleTxt) #communication/committee
¦ +-- substitute_information(TitleTxt)
¦ +-- substitute_intelligence(TitleTxt)
¦ +-- substitute_technology(TitleTxt)
¦ +-- substitute_institute(TitleTxt) #institute/institutional
¦ +-- substitute_academics(TitleTxt) #academics/academy
¦ +-- substitute_marketing(TitleTxt)
¦ +-- substitute_advancement(TitleTxt)
¦ +-- substitute_philanthropy(TitleTxt)
¦ +-- substitute_systems(TitleTxt)
¦ +-- substitute_general(TitleTxt)
¦ +-- substitute_planning(TitleTxt) #planning/planned
¦ +-- substitute_compliance(TitleTxt)
¦ +-- substitute_enrollment(TitleTxt)
¦ +-- substitute_admissions(TitleTxt)
¦ +-- substitute_deputy(TitleTxt)
¦ +-- substitute_corresponding(TitleTxt) #correspondent/corresponding
¦ +-- substitute_emeritus(TitleTxt)
¦ +-- substitute_relations(TitleTxt)
¦ +-- substitute_representative(TitleTxt)
¦ +-- substitute_board(TitleTxt)
¦ +-- substitute_transportation(TitleTxt)
¦ +-- substitute_exofficio(TitleTxt)
¦ +-- substitute_atlarge(TitleTxt)
¦ +-- substitute_member(TitleTxt)
¦ +-- substitute_governor(TitleTxt)
¦ +-- condense_abbreviations(TitleTxt)
¦ +-- substitute_miscellaneous(TitleTxt)
¦ +-- remove_trailing_conjunctions(TitleTxt)
¦ +-- stand_titles(TitleTxt)
¦ +-- fix_of(TitleTxt)
¦ +-- spellcheck(TitleTxt) #slows things down, but is useful
df <-
df.sample %>%
standardize_df() %>%
remove_dates() %>%
standardize_conj() %>%
split_titles() %>%
standardize_spelling()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
## [1] "standardize spelling step complete"
NOTES:
VICE PRESIDENT MEMBERSHIP –> VICE PRESIDENT OF MEMBERSHPIP? VICE PRESIDENT VOLUNTEERS –> VICE PRESIDENT OF VOLUNTEERS?
Inspect all to see if there are cases where spelling was not corrected and it should be.
TitleTxt4 | TitleTxt5 | |
---|---|---|
1 | DIRECTOR UNTIL | DIRECTOR UNTIL |
2 | SECRETARY | SECRETARY |
3 | SECRETARY | SECRETARY |
4 | BOARD MEMBER | BOARD MEMBER |
5 | DIRECTOR | DIRECTOR |
6 | DIRECTOR | DIRECTOR |
7 | BOARD OF DIRECTOR | BOARD OF DIRECTOR |
8 | DIRECTOR | DIRECTOR |
9 | SECRETARY | SECRETARY |
10 | DIRECTOR | DIRECTOR |
11 | DIRECTOR | DIRECTOR |
12 | CHAIRMAN | CHAIR |
13 | DIRECTOR | DIRECTOR |
14 | DIRECTOR | DIRECTOR |
15 | NATIONAL EXECUTIVE DIRECTOR | NATIONAL CEO |
16 | PAST PRESIDENT | PAST PRESIDENT |
17 | DIRECTOR | DIRECTOR |
18 | TREASURER | TREASURER |
19 | CHAIRMAN | CHAIR |
20 | DIRECTOR | DIRECTOR |
21 | DIRECTOR | DIRECTOR |
22 | DIRECTOR | DIRECTOR |
23 | BOARD MEMBER TREASURER | BOARD MEMBER TREASURER |
23.1 | CFO | CFO |
24 | TREASURER | TREASURER |
24.1 | CH | CHAIR |
25 | ASST VP BASEBALL | ASSISTANT VICE PRESIDENT BASEBALL |
26 | DIRECTOR | DIRECTOR |
27 | BOARD MEMBER | BOARD MEMBER |
28 | VICE CHAIRMAN | VICE CHAIR |
28.1 | DIRECTOR | DIRECTOR |
29 | TRUSTEE | TRUSTEE |
30 | ASST SEC | ASSISTANT SECRETARY |
30.1 | TR | TREASURER |
31 | DIRECTOR | DIRECTOR |
32 | TREASURER | TREASURER |
33 | SECRETARY | SECRETARY |
34 | DIRECTOR | DIRECTOR |
35 | SECRETARY-TREAS | SECRETARY-TREASURER |
36 | SECRETARY | SECRETARY |
36.1 | TR | TREASURER |
37 | VICE PRESIDENT | VICE PRESIDENT |
38 | PRESIDENT | PRESIDENT |
39 | SECRETARY | SECRETARY |
40 | PRESIDENT | PRESIDENT |
41 | BOARD MEMBER | BOARD MEMBER |
42 | DIRECTOR | DIRECTOR |
43 | TRUSTEE | TRUSTEE |
44 | VICE PRESIDENT | VICE PRESIDENT |
45 | DIRECTOR | DIRECTOR |
46 | STEWARD | STEWARD |
47 | VP OF COMMUNITY LEADERSHIP | VICE PRESIDENT OF COMMUNITY LEADERSHIP |
48 | FORMER EXEC | FORMER EXECUTIVE |
49 | SECRETARY | SECRETARY |
50 | DIRECTOR | DIRECTOR |
51 | DIRECTOR | DIRECTOR |
52 | DIRECTOR | DIRECTOR |
53 | DIRECTOR | DIRECTOR |
54 | BOARD MEMBER | BOARD MEMBER |
55 | BOARD MEMBER | BOARD MEMBER |
56 | EXECUTIVE VICE PRESIDENT | EXECUTIVE VICE PRESIDENT |
57 | DIRECTOR | DIRECTOR |
58 | BOARD MEMBER | BOARD MEMBER |
59 | VP OF SHARED SERVICES | VICE PRESIDENT OF SHARED SERVICES |
60 | BOARD CHAIR THROUGH | BOARD CHAIR THROUGH |
61 | IMMEDIATE PAST PRESIDENT | IMMEDIATE PAST PRESIDENT |
62 | DIRECTOR | DIRECTOR |
63 | TREASURER | TREASURER |
64 | DIRECTOR | DIRECTOR |
65 | PRESIDENT | PRESIDENT |
66 | DIRECTOR | DIRECTOR |
67 | DIRECTOR | DIRECTOR |
68 | PRESIDENT | PRESIDENT |
69 | DIRECTOR | DIRECTOR |
70 | BOARD CHAIR | BOARD CHAIR |
71 | VICE PRESIDENT | VICE PRESIDENT |
72 | CHIEF EXECUTIVE OFFICER | CEO |
72.1 | CHIEF MEDICAL OFFICER | CHIEF MEDICAL OFFICER |
73 | VICE PRESIDENT | VICE PRESIDENT |
74 | BOARD MEMBER | BOARD MEMBER |
75 | BOARD MEMBER | BOARD MEMBER |
76 | RECORDER | RECORDER |
77 | SECRETARY | SECRETARY |
78 | SHOW ANNOUNCER | SHOW ANNOUNCER |
79 | DIRECTOR | DIRECTOR |
80 | INTERIM EXEC | INTERIM EXECUTIVE |
81 | PRESIDENT | PRESIDENT |
82 | SGT AT ARMS | SERGEANT AT ARMS |
83 | PRESIDENT | PRESIDENT |
84 | DIRECTOR | DIRECTOR |
85 | SECRETARY | SECRETARY |
86 | DIRECTOR | DIRECTOR |
87 | TREASURER | TREASURER |
88 | PRESIDENT | PRESIDENT |
89 | BOARD MEMBER | BOARD MEMBER |
90 | EXECUTIVE VICE PRESIDENT | EXECUTIVE VICE PRESIDENT |
91 | BOARD CHAIRMAN | BOARD CHAIR |
92 | HH DISTRICT CHAIRMAN | HH DISTRICT CHAIR |
93 | PAST PRESIDE | PAST PRESIDENT |
94 | VICE PRESIDE | VICE PRESIDENT |
95 | CHAIRMAN | CHAIR |
96 | PRESIDENT ELECT | PRESIDENT ELECT |
97 | TRUSTEE | TRUSTEE |
98 | DIRECTOR | DIRECTOR |
99 | TRUSTEE | TRUSTEE |
100 | EXECUTIVE DIRECTOR | CEO |
gen_status_codes()
¦
+-- categorize_miscellaneous()
¦
¦ +-- identify_sched_o()
¦ +-- identify_at_large()
¦ +-- identify_as_needed()
¦ +-- identify_ex_officio
¦ +-- identify_co
¦ +-- filter_regional
¦ +-- remove_numbers
¦
+-- categorize_qualifiers()
¦ +-- standardize_qualifiers()
df <-
df.sample %>%
standardize_df() %>%
remove_dates() %>%
standardize_conj() %>%
split_titles() %>%
standardize_spelling() %>%
gen_status_codes()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
## [1] "standardize spelling step complete"
## [1] "generate status codes step complete"
Status codes:
df$REGIONAL.X <- as.numeric( df$REGIONAL.X)
df$FUTURE.X <- as.numeric( df$FUTURE.X )
df$FORMER.X <- as.numeric( df$FORMER.X )
df$INTERIM.X <- as.numeric( df$INTERIM.X )
df$OUTGOING.X <- as.numeric( df$OUTGOING.X )
keep <-
c("Num.Titles", "TITLE_RAW", "TitleTxt6",
"DATE.X", "SCHED.O.X", "AT.LARGE.X",
"EXOFFICIO.X", "CO.X",
"REGIONAL.X", "FUTURE.X", "FORMER.X", "INTERIM.X", "OUTGOING.X")
df[ keep ]
NOTES:
Make all status code boolean.
Map all of the current titles onto a standardized list.
standardize_titles()
googlesheets4::gs4_deauth()
df.standard <- googlesheets4::read_sheet( "1iYEY2HYDZTV0uvu35UuwdgAUQNKXSyab260pPPutP1M",
sheet="title-standardization", range="A:B",
col_types = "c" ) # c = character
df.standard <- df.standard[ c("title.standard","title.variant") ]
df.standard %>%
unique( ) %>%
arrange( title.standard, title.variant )
df <-
df.sample %>%
standardize_df() %>%
remove_dates() %>%
standardize_conj() %>% # condition has length > 1
split_titles() %>%
standardize_spelling() %>% # condition has length > 1
gen_status_codes() %>%
standardize_titles()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
## [1] "standardize spelling step complete"
## [1] "generate status codes step complete"
## [1] "standardize titles step complete"
Apply the category codes for domain and status hierarchies to the standardized versions of the title.
categorize_titles()
df <-
df.sample %>%
standardize_df() %>%
remove_dates() %>%
standardize_conj() %>% # condition has length > 1
split_titles() %>%
standardize_spelling() %>% # condition has length > 1
gen_status_codes() %>%
standardize_titles() %>%
categorize_titles()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
## [1] "standardize spelling step complete"
## [1] "generate status codes step complete"
## [1] "standardize titles step complete"
## [1] "categorize titles step complete"
Applies conditional logic to further refine the dataset. For example, checks organizations with multiple CEOs if they are actually multiple CEOs or if it’s a coding mistake. Uses suborganization categories (e.g. pay rank, hours worked) to determine if recategorization is necessary.
conditional_logic()
¦
+-- clean_up_ceos()
+-- director_correction()
df <-
df.sample %>%
standardize_df() %>%
remove_dates() %>%
standardize_conj() %>% # condition has length > 1
split_titles() %>%
standardize_spelling() %>% # condition has length > 1
gen_status_codes() %>%
standardize_titles() %>%
categorize_titles() %>%
conditional_logic()
## [1] "standardize df step complete"
## [1] "remove dates step complete"
## [1] "standardize conjunctions step complete"
## [1] "split titles step complete"
## [1] "standardize spelling step complete"
## [1] "generate status codes step complete"
## [1] "standardize titles step complete"
## [1] "categorize titles step complete"
## [1] "conditional logic step complete"
Not fully implemented yet in this package, but this is similarly linked to the Google Sheets data dictionary document.