library(dplyr)
library(tidyr)
library(broom)
library(stringr)
library(tinyplot)
library(WDI)
tinytheme("ipsum",
          family = "Roboto Condensed",
          palette.qualitative = "Tableau 10",
          palette.sequential = "agSunset")
Chapter 1
Goals
Practice some basic skills before getting into the content.
Set up
Load packages and set your graph theme.
Practice
Let’s get (more or less) the same data the authors are using. You will need the {WDI} package from CRAN. [1] I fetched the data once and saved it locally; the `WDI()` call that did this is included for reference.
[1] Using the WDI data directly gives us many more countries than the authors' dataset.
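The fetch-once-then-save approach described above can be wrapped in a small helper so the download only runs when no local copy exists. This is a minimal sketch, not the author's code: `read_cached()`, `fake_fetch()`, and the temporary cache path are hypothetical names introduced for illustration, and the stand-in fetch function returns made-up values so the example runs offline.

```r
# Hypothetical helper: call `fetch()` only when no cached copy exists at `path`.
read_cached <- function(path, fetch) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    saveRDS(fetch(), file = path)
  }
  readRDS(path)
}

# Stand-in for the real download (made-up values, not WDI data).
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(country = c("A", "B"), intpct = c(50, 80))
}

# A fresh temporary location, so the sketch never touches project files.
cache_path <- file.path(tempfile("cache-demo-"), "WDI.rds")

first  <- read_cached(cache_path, fake_fetch)  # no file yet: fetches and saves
second <- read_cached(cache_path, fake_fetch)  # file exists: reads the saved copy
```

With the real data, `fetch` would be a function wrapping the `WDI()` call from the text and `path` would be `"data/WDI.rds"`.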
rawdata <- WDI(indicator = "IT.NET.USER.ZS",
               start = 2021,
               end = 2021,
               extra = TRUE)
saveRDS(rawdata, file = "data/WDI.rds")

rawdata <- readRDS("data/WDI.rds")
d <- rawdata |>
  filter(region != "Aggregates") |>
  select(country,
         iso = iso3c,
         intpct = IT.NET.USER.ZS,
         income) |>
  drop_na() |>
  mutate(myguess = 70,
         residual = intpct - 70) # unconditional
Let’s make different guesses for high-income and not high-income countries.
d <- d |>
  mutate(highinc = if_else(income == "High income", 1, 0),
         my_cond_guess = if_else(highinc == 1, 90, 70),
         my_cond_resid = intpct - my_cond_guess)
sum(abs(d$residual))
[1] 3681.513
sum(abs(d$my_cond_resid))
[1] 2804.545
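Notice that making separate guesses makes the total error go down, from about 3,682 to about 2,805. That’s an improvement. The error here is the sum of absolute residuals; the closely related sum of squared residuals goes by the names SSR, RSS, or SSE. [2]
[2] See the next chapter for what these terms mean in practice.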
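To check the comparison by hand, here is a minimal sketch on made-up numbers. The `toy` data frame and its values are invented for illustration and are not WDI data. It applies the same two guessing rules (70 for everyone, versus 90 for high-income countries and 70 otherwise) and reports both the sum of absolute residuals, as computed above with `sum(abs(...))`, and the sum of squared residuals.

```r
# Toy data: made-up values for illustration only, not WDI data.
toy <- data.frame(
  intpct  = c(95, 88, 92, 60, 45, 72, 30),
  highinc = c(1, 1, 1, 0, 0, 0, 0)
)

# Unconditional guess: 70 for every country.
resid_uncond <- toy$intpct - 70

# Conditional guess: 90 for high-income countries, 70 for the rest.
guess_cond <- ifelse(toy$highinc == 1, 90, 70)
resid_cond <- toy$intpct - guess_cond

# Two ways of adding up the error for each guessing rule.
errors <- data.frame(
  guess = c("unconditional", "conditional"),
  sae   = c(sum(abs(resid_uncond)), sum(abs(resid_cond))),  # sum of absolute residuals
  ssr   = c(sum(resid_uncond^2), sum(resid_cond^2))         # sum of squared residuals
)
print(errors)
```

In this toy example both measures fall when the guess depends on income group, but the two measures weight large misses differently, so they need not always agree on which set of guesses is better.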
Let’s make some visualizations.
Here’s a histogram.
plt(~ intpct,
    data = d,
    type = type_hist(),
    main = "Internet access by country, 2021",
    sub = "World Development Indicators data",
    xlab = "% of population using the internet")
Here’s a dotplot with the countries sorted by rank.
plt(~ sort(intpct),
    data = d,
    main = "Internet access by country, 2021",
    sub = "World Development Indicators data",
    ylab = "% of population using the internet",
    xaxt = "n",
    xlab = "")
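Sorting `intpct` on its own gives the shape of the dotplot but drops the link to the country names. If you want to know which country sits at each position, one option is to reorder the whole data frame and store the rank as a column. The sketch below uses made-up values in a hypothetical `toy` data frame, not the WDI data.

```r
# Toy data: made-up values for illustration only, not WDI data.
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  intpct  = c(72, 30, 95, 60)
)

# order() returns row positions from lowest to highest intpct,
# so indexing with it keeps each country attached to its value.
ranked <- toy[order(toy$intpct), ]
ranked$rank <- seq_len(nrow(ranked))

# The countries at the two ends of the sorted dotplot.
lowest  <- ranked$country[1]
highest <- ranked$country[nrow(ranked)]
print(ranked)
```

The `rank` column can then serve as the x variable in a plot, with `intpct` on the y axis, to reproduce the sorted dotplot while keeping country names available for labels.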