Overview

TABLE OF CONTENTS:

Overview
Module 1 - Motivating the Bootcamp
Module 2 - R fundamentals + Base Graphics
Module 3 - Data, Packages, and APIs
Module 4 - R Shiny & Simulation
Module 5 - Data-Driven Docs, Dashboards, and Reproducibility
Module 6 - Text Analysis

This is a website to archive resources and sessions from a free Bootcamp for R series offered by the Data and Analytics Section at ARNOVA.

It’s easy when you start out programming to get really frustrated and think, “Oh it’s me, I’m really stupid,” or, “I’m not made out to program.” But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.

~ Hadley Wickham, Advice to young and old programmers

Module 1 - Motivating the Bootcamp

[ SLIDES ]

Base R
Data Wrangling
Descriptive Statistics
Stata2R syntax translations

Why Learn R?
- Efficiency
- Public stewardship
- Reproducibility crisis
How should I learn R?
- Where to start?
- R versus Python versus others
- Resources and support

Module 2 - R fundamentals + Base Graphics

DEMO SCRIPT

Cheat Sheets

Basics

Groups

Data Wrangling

Course Notes

Cheat Sheets

Graphics

Example

compensation report

Module 3 - Data, Packages, and APIs

There are over 20,000 packages available in R, which is great when you are a power user and know enough to leverage all of these tools, but it can be overwhelming when you are just getting started.

Here are some resources that give a broad overview of R packages by topic. A good place to start is in the Tidyverse, a set of packages that were written by the same set of authors using the same syntax “grammar” so that they all work well together. Many of these are wrapper packages that provide new names and consistent argument conventions for existing R functions.

You will find notes on data input/output operations in R so that you can load external datasets and save results when you are done with sessions. The dplyr package is one of the key tools you will use for data cleaning, joins, and refinement as you are preparing your data for analysis. APIs are also powerful tools that let you import data from external databases using a few lines of code.
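As a minimal sketch of the cleaning and join steps described above, the following uses dplyr on two small data frames. The data, column names, and organization names are hypothetical and invented only for illustration, and the example assumes dplyr is already installed.

```r
library(dplyr)  # assumes dplyr is installed: install.packages("dplyr")

# Hypothetical example data, invented for illustration
orgs <- data.frame(
  org_id  = c(1, 2, 3, 4),
  name    = c("Food Bank", "Arts Council", "Youth League", "River Trust"),
  revenue = c(250000, NA, 90000, 410000)
)

sectors <- data.frame(
  org_id = c(1, 2, 3),
  sector = c("Human Services", "Arts", "Human Services")
)

clean <- orgs %>%
  filter(!is.na(revenue)) %>%             # cleaning: drop rows with missing revenue
  left_join(sectors, by = "org_id") %>%   # join: add sector, unmatched rows get NA
  mutate(revenue_k = revenue / 1000) %>%  # refinement: revenue in thousands
  arrange(desc(revenue))                  # largest organizations first

print(clean)
```

A left join keeps every row of the first table, so an organization with no match in the second table stays in the result with a missing sector rather than silently disappearing.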

R Packages

Data Wrangling

Data IO

Import
Export
R Data Formats: RDS, RData
Open Formats: CSV, JSON, XML, ASCII
Proprietary Formats (load with a package): Excel .xls, Stata .dta, SPSS .sav
Copy/Paste from Excel with Clipboard
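
The import and export steps listed above can be sketched with base R alone. The data frame below is hypothetical, and the files are written to temporary paths so the example does not touch your working directory.

```r
# Hypothetical data, invented for illustration
scores <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  score = c(88.5, 92, 79.25)
)

# Temporary file paths (replace with your own file names in practice)
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Open format: CSV can be read by almost any program
write.csv(scores, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# R format: RDS stores a single R object with its column types intact
saveRDS(scores, rds_path)
from_rds <- readRDS(rds_path)

str(from_csv)
str(from_rds)
```

CSV is the safer choice for sharing data outside of R, while RDS is convenient between R sessions because it preserves column types exactly. For the proprietary formats, packages such as readxl (Excel) and haven (Stata, SPSS) are common choices.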

APIs

Module 4 - R Shiny & Simulation

DYNAMIC WEB APPS

R SHINY

R Shiny replaces full-stack web development with a single package. Shiny functions convert R objects into HTML and JavaScript objects.
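
A minimal sketch of a Shiny app, assuming the shiny package is installed. The input and output names ("n", "hist") and the app itself are hypothetical, chosen only to show how a user interface and a server function fit together.

```r
library(shiny)  # assumes shiny is installed: install.packages("shiny")

# Shiny tag functions return R objects that render as HTML
heading_html <- as.character(h1("Hello"))
cat(heading_html, "\n")  # prints the HTML markup for the heading

# User interface: a slider and a placeholder for a plot
ui <- fluidPage(
  titlePanel("Histogram demo"),
  sliderInput("n", "Sample size:", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server: redraws the histogram whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n), xlab = "value")
  })
}

app <- shinyApp(ui = ui, server = server)

# Uncomment to launch the app in an interactive session:
# runApp(app)
```

The whole app is ordinary R code: no HTML or JavaScript is written by hand, which is the sense in which a single package stands in for a web development stack.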

SIMULATION WITH LOOPS
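
As one illustration of simulation with a loop, the sketch below draws repeated random samples and stores each sample mean. The population parameters, sample size, and number of repetitions are arbitrary values chosen for the example.

```r
set.seed(123)  # make the random draws reproducible

n_sims      <- 1000  # number of simulated samples
sample_size <- 30    # observations per sample

# Preallocate the results vector instead of growing it inside the loop
means <- numeric(n_sims)

for (i in seq_len(n_sims)) {
  draw     <- rnorm(sample_size, mean = 50, sd = 10)  # one simulated sample
  means[i] <- mean(draw)                              # store its mean
}

# The sample means cluster around the population mean of 50,
# with a spread close to the theoretical 10 / sqrt(30)
mean(means)
sd(means)
hist(means, main = "Sampling distribution of the mean", xlab = "sample mean")
```

Preallocating the vector and using seq_len() are small habits that keep loops fast and safe as simulations grow.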

Module 5 - Data-Driven Docs, Dashboards, and Reproducibility

Markdown is a simple text formatting convention (a few simple rules, not really a language). It was inspired by HTML, the HyperText Markup Language that is used to format content on websites. The tongue-in-cheek name comes from the fact that markdown is a dumbed-down version of markup.

Visit this simple markdown tutorial and you will see that it consists of about a dozen rules for formatting text. Raw text files with these formatting tags are saved as .md files, which can be rendered into HTML, DOC, or PDF files that contain nicely formatted text. Markdown is designed to be simple and parsimonious (it takes about 20 minutes to learn the dozen basic rules), and it is widely used for documenting open source projects and creating tutorials. Platforms like GitHub use markdown extensively: all .md files are rendered into nicely formatted text automatically.

R Markdown is one of many extensions of basic markdown. Regular markdown files often contain examples of code:

# example code
z <- x + y 

Markdown documents will format the code so that it is easy to read, but they will not actually run the code. R Markdown will format your text as well as execute any R code in your file and embed the output (results, tables, graphics) in your document. In this way R Markdown files are true data-driven documents: they contain all of the information needed to produce a report or analysis.
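
To make the difference concrete, the sketch below writes a small R Markdown file from R and renders it if the tools are available. The file name, title, and chunk contents are hypothetical; rendering assumes the rmarkdown package and pandoc are installed, and is skipped otherwise.

```r
# Three backticks, built here so the chunk markers do not clash with this listing
fence <- strrep("`", 3)

# Contents of a minimal .Rmd file: a header, a sentence, and one R code chunk
rmd <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "The sum below is computed when the document is rendered.",
  "",
  paste0(fence, "{r}"),
  "x <- 2",
  "y <- 3",
  "z <- x + y",
  "z",
  fence
)

path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd, path)

# Render only if rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(path, quiet = TRUE)
  message("Rendered to: ", out)
} else {
  message("rmarkdown or pandoc not found; wrote ", path, " without rendering")
}
```

In the rendered HTML the chunk appears as formatted code followed by its computed value, whereas plain markdown would show only the code.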

The scientific community is starting to embrace data-driven documents because they solve important problems regarding reproducibility. Historically, the peer-review process has required authors to explain dataset construction and statistical modeling only at a very high level. Reviewers must trust that the authors know what they are doing and have not made serious mistakes. As the amount of analysis included in the typical project has grown, scripts have increased in size and complexity. Data-driven documents allow authors to document and share details of the process instead of just saying, “trust us, we know what we are doing.” Additionally, they make it easier to build on existing research by borrowing and extending code that exists instead of starting from scratch each time.

R Markdown has made the convention even more powerful by adding output types. Instead of rendering text + code as an HTML or PDF document, you can ask that the analysis be output as slide decks or interactive dashboards. This allows users to package the analysis in the format that is most appropriate for their audience.

To see some of the existing options check out:

In short, R Markdown is a stable publishing platform that is already incredibly powerful and continues to evolve. It is worthwhile investing some time in learning markdown formatting conventions and R Markdown (.Rmd) files that can be run inside of RStudio.

Module 6 - Text Analysis

Text as Data

From: Cut and Paste Legislation

NOTES ON STRING PROCESSING IN R:
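
A small sketch of common string-processing steps using only base R. The sentence is a hypothetical example, and the steps shown (lowercasing, stripping punctuation, splitting into words, counting) are one typical sequence rather than a prescribed workflow.

```r
# Hypothetical sentence, invented for illustration
text <- "Model bills are copied, and copied bills become law."

clean <- tolower(text)                   # normalize case
clean <- gsub("[[:punct:]]", "", clean)  # remove punctuation
words <- strsplit(clean, "\\s+")[[1]]    # split on whitespace

# Word frequencies, most common first
counts <- sort(table(words), decreasing = TRUE)
print(counts)

# Simple pattern matching: does the text mention "law"?
grepl("law", clean)
```

Normalizing case and punctuation before counting matters because "Copied," and "copied" would otherwise be treated as different words.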

TEXT ANALYSIS PACKAGES IN R:

EXAMPLES