Introduction

PENDING

  • Fix results files
  • Improve results exploration
  • Include troubleshooting with examples

In this script, we use the Google geocoding service to geocode the addresses that the Census geocoding service could not geocode. The updated data are saved as new versions of the main files, NONPROFITS-2014-2019v4.rds and PEOPLE-2014-2019v4.rds.

STEPS

  1. Subsetting the input files. Starting from NONPROFIT-2014-2019v3.rds and PEOPLE-2014-2019v3.rds, we subset the non-POB addresses that failed Census geocoding into the input files NPOAddresses_google.rds and PPLAddresses_google.rds.
  2. Introduction to the Google geocoding service, with a demo.
  3. Geocoding NPO addresses (NPOAddresses_google.rds). The script yields the raw output file NPOAddresses_googleGEO.rds. The geocoded addresses are then integrated into a new version of the main file, NONPROFIT-2014-2019v4.rds.
  4. Geocoding PPL addresses (PPLAddresses_google.rds). The script yields the raw output file PPLAddresses_googleGEO.rds. The geocoded addresses are then integrated into a new version of the main file, PEOPLE-2014-2019v4.rds.
  5. Troubleshooting
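Step 1 can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the script's own code. It assumes the main files carry `ID`, `input_address` and `lat` columns (as in the demo output) and that a failed Census match shows up as a missing `lat`. When no `pob` column exists, it derives one with a hypothetical regular expression.

```r
# Hypothetical helper: flag PO Box addresses (1 = POB, 0 = not).
# The pattern is an assumption and may miss unusual spellings.
flag_pob <- function(address) {
  as.integer(grepl("\\bP\\.?\\s*O\\.?\\s*BOX\\b", toupper(address), perl = TRUE))
}

# Keep the non-POB rows that the Census geocoder could not match (lat is NA).
subset_for_google <- function(df) {
  if (!"pob" %in% names(df)) df$pob <- flag_pob(df$input_address)
  keep <- is.na(df$lat) & df$pob == 0
  df[keep, c("ID", "input_address", "pob"), drop = FALSE]
}

# Illustrative rows in the assumed shape of the main files.
toy <- data.frame(
  ID = c(177709, 18991, 54520),
  input_address = c("PO BOX 358, ELFRIDA, AZ, 85610",
                    "1217 NW ROLLING ROCK RD, ANKENY, IA, 50023",
                    "2072 CARMEL RD NORTH, NEWBURGH, ME, 04444"),
  lat = c(NA, NA, 44.73),
  stringsAsFactors = FALSE
)
npo_google <- subset_for_google(toy)  # only ID 18991 remains
# saveRDS(npo_google, "NPOAddresses_google.rds")
```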

NOTES

  • Geocoding can take several hours, so some code chunks in this script are not evaluated. Their outputs are instead loaded from stored files to illustrate the results.
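The pattern in this note is to load a stored output rather than recompute it. It can be sketched with a small caching helper. `cached_rds()` is a hypothetical name. The sketch uses only base R (`file.exists()`, `readRDS()`, `saveRDS()`) and a cheap stand-in for the slow geocoding step.

```r
# Hypothetical helper: return the stored result if the .rds file exists,
# otherwise run the slow computation once and store its result.
cached_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Example with a temporary file and a cheap stand-in for geocoding.
path <- tempfile(fileext = ".rds")
calls <- 0
slow_step <- function() {
  calls <<- calls + 1
  data.frame(ID = 1:3)
}
first  <- cached_rds(path, slow_step)  # runs slow_step and saves the file
second <- cached_rds(path, slow_step)  # reads the stored file instead
```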

PACKAGES

2. Demo: The Google Geocoding Service

Sources:

  • https://lucidmanager.org/geocoding-with-ggmap/
  • https://www.wpgmaps.com/documentation/troubleshooting/this-api-project-is-not-authorized-to-use-this-api/
  • https://www.rdocumentation.org/packages/ggmap/versions/3.0.0/topics/geocode

The Google API takes a character vector of street addresses or place names (e.g. “1600 pennsylvania avenue, washington dc” or “Baylor University”). It returns the longitude and latitude (lon and lat) of each one.
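The input is one string per address. Below is a minimal sketch of how such strings might be built from separate fields. It assumes hypothetical street, city, state and ZIP fields. The ZIP is zero-padded to five digits so that a code like 04444 keeps its leading zero even if it was read as a number. ZIP+4 codes are not handled.

```r
# Hypothetical helper: build "STREET, CITY, STATE, ZIP" strings, the format
# used in input_address. Vectorised over all arguments.
build_address <- function(street, city, state, zip) {
  zip5 <- sprintf("%05d", as.integer(zip))  # 4444 or "04444" -> "04444"
  paste(street, city, state, zip5, sep = ", ")
}

build_address("2072 CARMEL RD NORTH", "NEWBURGH", "ME", 4444)
# "2072 CARMEL RD NORTH, NEWBURGH, ME, 04444"
```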

Even though there should be no cost (because we stay within the free quota), you will need to enter your credit card information. At the time of writing, Google allowed 40,000 free calls a month. Current quotas and prices are listed on the Google Maps Platform pricing page. Check it before running large batches, since the policy can change.
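Under a quota of 40,000 free calls a month, the number of monthly batches a file needs is a ceiling division. Applied to the address counts reported in sections 3 and 4, it gives the two and five batches used there. That the quota is the reason for the split is an assumption.

```r
free_calls_per_month <- 40000  # quota quoted above; check current pricing

# Number of batches needed so that no batch exceeds the monthly quota.
n_batches <- function(n_addresses, quota = free_calls_per_month) {
  ceiling(n_addresses / quota)
}

n_batches(48206)   # NPO addresses: 2
n_batches(161067)  # PPL addresses: 5
```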

Steps to set up a google API:

  1. Use your Google Account (or create one) and obtain an API key from Google. Follow the instructions in the link.
  2. In the Google Cloud Platform, create a project; you will then obtain an API key for it. Once the project is created, follow the instructions to create an API key.
  3. In your R project folder, create a text file containing your API key so that you can load it later (e.g. I created a text file called “google.api”).
  4. If you do not have one, create a billing account and associate it with the project. In the Google Cloud Platform Console, click the menu button > Billing and follow the instructions to create a billing account.
  5. Finally, enable the API services that geocoding requires. Follow these instructions.
  6. Using the API Library enable the following three APIs:

    1. Google Maps JavaScript API
    2. Google Maps Geocoding API
    3. Google Maps Places API
  7. Once this is set up, try running the demo code.
  8. Google recommends placing restrictions on your account to limit the possibility of being charged.
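Steps 3 and 7 can be sketched as follows. The code reads the key from the `google.api` text file and registers it with ggmap's `register_google()`. The helper `read_api_key()` is hypothetical. The registration runs only when the key file exists and ggmap is installed, so the sketch does nothing harmful on a machine without a key.

```r
# Hypothetical helper: read the API key from the first line of a text file.
read_api_key <- function(path = "google.api") {
  key <- trimws(readLines(path, n = 1, warn = FALSE))
  if (length(key) == 0 || !nzchar(key)) stop("API key file is empty: ", path)
  key
}

# Register the key with ggmap only if both the key file and the package exist.
if (file.exists("google.api") && requireNamespace("ggmap", quietly = TRUE)) {
  ggmap::register_google(key = read_api_key("google.api"))
}
```

Keeping the key in a separate file rather than in the script makes it less likely to end up in a shared or version-controlled copy of the code.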

Testing the Google service on a small sample…
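Below is a minimal sketch of the small-sample test, assuming the input file has an `input_address` column. `ggmap::geocode()` with `output = "latlona"` returns `lon`, `lat` and the matched `address`, the same columns as the demo output. The helper names are hypothetical. The call needs a registered API key, so the usage lines are commented out.

```r
# Draw a small random sample of rows (drop = FALSE keeps a data frame).
draw_sample <- function(df, n = 5) {
  df[sample(nrow(df), min(n, nrow(df))), , drop = FALSE]
}

# Geocode the sample with ggmap and attach lon, lat and the matched address.
geocode_sample <- function(df, n = 5) {
  rows <- draw_sample(df, n)
  coords <- ggmap::geocode(rows$input_address, output = "latlona", source = "google")
  cbind(rows, coords)
}

# Usage (not run here because it calls the API):
# set.seed(42)
# demo <- geocode_sample(readRDS("NPOAddresses_google.rds"))
```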

The output of the Google geocoding looks like this:

ID      pob  input_address                                  lon      lat    address
54520   0    2072 CARMEL RD NORTH, NEWBURGH, ME, 04444      -68.96   44.73  2072 carmel rd n, newburgh, me 04444, usa
177709  1    PO BOX 358, ELFRIDA, AZ, 85610                 -109.7   31.69  elfrida, az 85610, usa
18991   0    1217 NW ROLLING ROCK RD, ANKENY, IA, 50023     -93.64   41.74  1217 nw rolling rock rd, ankeny, ia 50023, usa
60118   0    2222 HOME PARK CIR W, JACKSONVILLE, FL, 32207  -81.63   30.3   2222 w home park cir, jacksonville, fl 32207, usa
148525  0    8783 MONCOVE LAKE RD, GAP MILLS, WV, 24941     -80.32   37.66  8783 moncove lake rd, gap mills, wv 24941, usa
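In this output, the PO Box address (ID 177709) was matched only to the town, "elfrida, az 85610, usa", rather than to a street address. One rough way to flag such coarse matches is to check whether the returned address starts with a house number. This heuristic is an assumption. It will misclassify some addresses, such as named places without a street number, and it treats a missing address as coarse.

```r
# Heuristic (assumption): a returned address that does not start with a
# house number was probably matched only to a town or ZIP area.
is_coarse_match <- function(returned_address) {
  !grepl("^[0-9]+", trimws(returned_address))
}

returned <- c("2072 carmel rd n, newburgh, me 04444, usa",
              "elfrida, az 85610, usa")
is_coarse_match(returned)
# FALSE TRUE
```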

3. Geocoding NPO Addresses

Loading file

3.1 Preparing the files

We have 48206 addresses to geocode, so they have to be divided into two batches.
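Splitting the input into batches can be sketched with base R's `split()`. The batch size of 40,000 is an assumption tied to the monthly free quota. The toy data frame only reproduces the row count of NPOAddresses_google.rds.

```r
# Split a data frame into consecutive batches of at most batch_size rows.
split_into_batches <- function(df, batch_size = 40000) {
  idx <- seq_len(nrow(df))
  lapply(split(idx, ceiling(idx / batch_size)),
         function(i) df[i, , drop = FALSE])
}

npo <- data.frame(ID = seq_len(48206))  # stand-in with the same row count
batches <- split_into_batches(npo)
vapply(batches, nrow, integer(1))        # 40000 and 8206 rows
```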

3.4 Exploring NPO Geocode Results

Let's take a look at the geolocations of the nonprofits:

Summary of geocoding process

There are 263272 NPOs listed, with 256527 unique addresses.

source            frequency  percent
census               182042    69.15
google                37794    14.36
NA (not geocoded)     43436    16.50
TOTAL                263272   100.00
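The summary tables in this section count rows by geocoding source. Below is a minimal sketch of how such a table could be produced. It assumes a hypothetical `source` vector holding "census", "google" or NA for each row. Sources with zero rows are dropped, as in the POB-only tables.

```r
# Frequency and percent of rows by geocoding source, plus a TOTAL row.
summarise_sources <- function(source) {
  src <- as.character(source)
  src[is.na(src)] <- "NA"                      # missing = not geocoded
  lv <- union(c("census", "google", "NA"), unique(src))
  freq <- table(factor(src, levels = lv))
  freq <- freq[freq > 0]                       # drop empty sources
  counts <- c(as.integer(freq), length(src))
  data.frame(source = c(names(freq), "TOTAL"),
             frequency = counts,
             percent = round(100 * counts / length(src), 2))
}

summarise_sources(c("census", "census", "google", NA))
```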

Share of non-POB and POB addresses (percent):

Non-POB    POB
   87.6   12.4

Summary of geocoding process excluding POBs:

source            frequency  percent
census               180999    78.50
google                37794    16.39
NA (not geocoded)     11783     5.11
TOTAL                230576   100.00

Summary of geocoding process for only POBs:

source            frequency  percent
census                 1043     3.19
NA (not geocoded)     31653    96.81
TOTAL                 32696   100.00
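The POB shares and the per-group tables can be reproduced with a cross-tabulation. The sketch below runs on made-up rows. It assumes a 0/1 `pob` column and a `source` column like the one behind the summary tables. `prop.table()` with `margin = 2` gives the percentages within each group.

```r
# Made-up rows: pob is 0/1, source is "census", "google" or NA.
geo <- data.frame(
  pob = c(0, 0, 0, 1, 1),
  source = c("census", "google", NA, "census", NA),
  stringsAsFactors = FALSE
)
geo$source[is.na(geo$source)] <- "NA"   # treat missing as its own category
group <- factor(geo$pob, levels = c(0, 1), labels = c("Non-POB", "POB"))

# Share of non-POB and POB addresses, in percent.
share <- round(100 * prop.table(table(group)), 1)

# Counts by source within each group, then column percentages.
counts <- table(source = geo$source, group = group)
pct <- round(100 * prop.table(counts, margin = 2), 2)
share
pct
```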

4. Geocoding PPL Addresses

Loading file

4.1 Preparing the files

We have 161067 addresses to geocode, so they have to be divided into five batches.
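With five batches run separately, it helps to write each batch to its own file as soon as it finishes and to skip finished batches on a rerun. Below is a minimal sketch of that approach. `geocode_fun` is a hypothetical function that takes one batch and returns it with coordinates attached, for instance a wrapper around `ggmap::geocode()`. The output file names are also hypothetical.

```r
# Geocode batches one at a time, saving each result immediately and
# skipping batches whose output file already exists.
geocode_batches <- function(batches, out_dir, geocode_fun,
                            prefix = "PPLAddresses_googleGEO_batch") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_along(batches)) {
    out_file <- file.path(out_dir, sprintf("%s%02d.rds", prefix, i))
    if (file.exists(out_file)) {
      message("Skipping batch ", i, ": ", basename(out_file), " already exists")
      next
    }
    result <- geocode_fun(batches[[i]])
    # Save right away so an abort loses at most the batch in progress.
    saveRDS(result, out_file)
  }
  invisible(list.files(out_dir, pattern = paste0("^", prefix),
                       full.names = TRUE))
}

# A possible real geocoder (not run here; needs a registered API key):
# google_fun <- function(b) {
#   cbind(b, ggmap::geocode(b$input_address, output = "latlona", source = "google"))
# }
```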

4.4 Exploring PPL Geocode Results

Summary of geocoding process

There are 946093 PPL records listed, with 729304 unique addresses.
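Many PPL rows share an address, so each distinct address can be geocoded once and the result joined back to every row. This avoids paying for repeated lookups. Below is a sketch using base R's `unique()` and `match()`. `geocode_fun` is a hypothetical function that returns one row of coordinates per address, in input order. Addresses are normalised with `toupper()` and `trimws()` before deduplication, which is an assumption about what counts as the same address.

```r
# Geocode each distinct (normalised) address once, then expand to all rows.
geocode_unique <- function(addresses, geocode_fun) {
  key <- toupper(trimws(addresses))
  uniq <- unique(key)
  coords <- as.data.frame(geocode_fun(uniq))   # one lookup per distinct address
  out <- coords[match(key, uniq), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```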

source            frequency  percent
census               689983    72.93
google               149739    15.83
NA (not geocoded)    106371    11.24
TOTAL                946093   100.00

Share of non-POB and POB addresses (percent):

Non-POB    POB
   94.1    5.9

Summary of geocoding process excluding POBs:

source            frequency  percent
census               688679    77.33
google               149739    16.81
NA (not geocoded)     52152     5.86
TOTAL                890570   100.00

Summary of geocoding process for only POBs:

source            frequency  percent
census                 1304     2.35
NA (not geocoded)     54219    97.65
TOTAL                 55523   100.00

5. Troubleshooting

If a geocoding run is aborted before it finishes, you may need to run it again. The code below helps compile all geocoding results into one dataset.
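Below is a minimal sketch of compiling per-batch results. It assumes each batch was saved as its own `.rds` file in one folder, with identical columns that include `ID`. When a batch was partly rerun after an abort, the same ID can appear twice. This sketch keeps the copy from the later file, reading files in alphabetical order. That is a design choice, not a rule.

```r
# Read every .rds file in a folder, stack them, and keep one row per ID.
compile_results <- function(dir, pattern = "\\.rds$") {
  files <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) stop("No result files found in ", dir)
  combined <- do.call(rbind, lapply(files, readRDS))
  # If an ID appears in several files, keep its last occurrence.
  combined <- combined[!duplicated(combined$ID, fromLast = TRUE), , drop = FALSE]
  rownames(combined) <- NULL
  combined
}
```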

ADD examples…

What happens if the Census geocoding has to be run multiple times, or if the data breaks? How should this be managed? Potential troubleshooting topics:

  • IDs
  • Getting stuck and having to reset
  • Blank files returned
  • What data checks can we do to make sure the step is final?
  • Setting your computer not to sleep or turn off the hard drive
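On the question of when the step is final, a few checks can be automated:

  • the result is not empty;
  • no ID is duplicated and no input ID is missing;
  • coordinates fall within valid latitude and longitude ranges.

Below is a hypothetical `check_geocodes()` that returns a character vector of problems, empty when all checks pass. It assumes the results have `ID`, `lat` and `lon` columns. Rows with missing coordinates are allowed, since not every address can be geocoded.

```r
# Return a character vector describing problems; character(0) means all OK.
check_geocodes <- function(input_ids, results) {
  if (nrow(results) == 0) return("empty result file")
  problems <- character(0)
  if (anyDuplicated(results$ID) > 0) {
    problems <- c(problems, "duplicated IDs")
  }
  missing <- setdiff(input_ids, results$ID)
  if (length(missing) > 0) {
    problems <- c(problems,
                  sprintf("%d input IDs missing from results", length(missing)))
  }
  # Only check rows that actually have coordinates.
  has_xy <- !is.na(results$lat) & !is.na(results$lon)
  bad_xy <- has_xy & (abs(results$lat) > 90 | abs(results$lon) > 180)
  if (any(bad_xy)) {
    problems <- c(problems, sprintf("%d coordinates out of range", sum(bad_xy)))
  }
  problems
}

good <- data.frame(ID = 1:3, lat = c(44.73, NA, 31.69), lon = c(-68.96, NA, -109.7))
check_geocodes(1:3, good)  # character(0): all checks pass
```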