Day 4 — Thursday, January 12th, 2023
- 1:30-2:00 Lecture: Diverse Data Sources (APIs [tidycensus, WHO, World Bank, qualtRics], scraping web data, tabulizer, tesseract, datapasta)
- 2:00-2:20 Discussion: What kind of sources are students interested in using in their research or future work?
- 2:20-2:50 Lecture: How to handle factors and date-times
- 2:50-3:00 Break
- 3:00-3:30 Lecture: Working with Regression Model Objects: constructing and analyzing them
- 3:30-4:15 Activity: Working with Regression Models in R
- 4:15-4:45 Lecture: Creating maps in R
- 4:45-5:00 Lecture: Reproducible Examples for Getting Help
- 5:00-5:30 Time to work on final presentation materials together, peruse recommended materials, chat with classmates
Homework:
- Fit and report on a regression model including categorical (factor) variables
- Peer Review for Homework 2
Recommended Materials
Remember! You don’t have to read all of this! Just focus on what’s most useful to you:
- Tidy Data by Hadley Wickham https://vita.had.co.nz/papers/tidy-data.pdf
- Diverse Data Sources:
  - The readme to the datapasta package: https://github.com/MilesMcBain/datapasta
  - Analyzing US Census Data by Kyle Walker, Chapter 2: An introduction to tidycensus: https://walker-data.com/census-r/an-introduction-to-tidycensus.html
  - The readr cheatsheet: https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf
  - Working with Qualtrics Data - Part 1: Importing Data, rOpenSci: https://ropensci.org/blog/2022/08/02/working-with-qualtrics-data-importing/
- Handling factors and date-times in R:
  - Chapter 15: Factors, R for Data Science by Hadley Wickham and Garrett Grolemund https://r4ds.had.co.nz/factors.html
  - Chapter 16: Dates and Times, R for Data Science by Hadley Wickham and Garrett Grolemund https://r4ds.had.co.nz/dates-and-times.html
  - Forcats cheatsheet https://raw.githubusercontent.com/rstudio/cheatsheets/main/factors.pdf
  - Lubridate cheatsheet https://raw.githubusercontent.com/rstudio/cheatsheets/main/lubridate.pdf
- Working with Regression Models:
  - Introduction to broom: https://broom.tidymodels.org/articles/broom.html
  - A nice introduction to linear model diagnostic plots: https://book.stat420.org/model-diagnostics.html
  - Interpretation of R’s lm() output: https://stats.stackexchange.com/questions/5135/interpretation-of-rs-lm-output
- Mapping:
  - Chapter 8: Plotting Spatial Data, Spatial Data Science https://r-spatial.org/book/08-Plotting.html
    - This focuses more on sf, which is the most modern and increasingly popular paradigm for working with spatial data in R
  - Chapter 9: Making Maps with R, Geocomputation with R https://geocompr.robinlovelace.net/adv-map.html
    - This chapter focuses heavily on tmap, a package for creating thematic maps
Video Recording
Resources
Link to daily Google Doc
Link to PDF slides
Lecture 1: Diverse Data Sources
Lecture 2: Factors and Date-times
Link to view slides fullscreen
Link to follow-along code
Link to slide PDFs