I’m not going to talk about everything in this section, but I do want to mention one thing that’s going to be important: maximum likelihood.
So far we have been interested in minimizing the sum of squared residuals. That makes sense when we’re dealing with an outcome whose residuals can be modeled with a normal distribution. But, as we’ll see later, that’s not always the case.
It turns out that minimizing the sum of squared residuals is equivalent to maximizing the likelihood of the observations, and maximizing the likelihood is a procedure we can use with basically every kind of model we’ll meet later on.
The relationship between the residual sum of squares (RSS) and the maximized log-likelihood of the observations is:
\[\ell = -\frac{n}{2}\left[1+\log(2\pi)+\log\!\left(\frac{\mathrm{RSS}}{n}\right)\right]\]
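If you’re curious where this comes from: write the normal log-likelihood in terms of RSS, plug in the variance estimate that maximizes it (the MLE, RSS divided by n), and simplify:

\[\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\mathrm{RSS}}{2\sigma^2}, \qquad \hat\sigma^2 = \frac{\mathrm{RSS}}{n}\]

\[\ell = -\frac{n}{2}\log\!\left(2\pi\,\frac{\mathrm{RSS}}{n}\right) - \frac{n}{2} = -\frac{n}{2}\left[1+\log(2\pi)+\log\!\left(\frac{\mathrm{RSS}}{n}\right)\right]\]

For a fixed n, this is a decreasing function of RSS, which is exactly why the coefficients that minimize RSS are also the ones that maximize the likelihood.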
You do NOT have to memorize this. But it’s good to think about.
Here’s a quick comparison in R for you to see the equivalence.
fit <- lm(wt ~ 1, data = mtcars)
deviance(fit)
[1] 29.67875
rss <- deviance(fit)
n <- nobs(fit)
ll_from_rss <- -(n/2) * (1 + log(2*pi) + log(rss/n))
ll_from_rss
[1] -44.20116
logLik(fit) # same!
'log Lik.' -44.20116 (df=2)
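To make the equivalence a little more concrete, here’s a small sketch that evaluates both RSS and the log-likelihood over a grid of candidate means for the same intercept-only model. The grid range and step size are arbitrary choices for illustration.

# Evaluate RSS and the log-likelihood for a grid of candidate means
mu_grid <- seq(2.5, 4, by = 0.01)
rss_for <- sapply(mu_grid, function(mu) sum((mtcars$wt - mu)^2))
ll_for <- sapply(mu_grid, function(mu) {
  s2 <- mean((mtcars$wt - mu)^2)  # MLE variance, given this candidate mean
  sum(dnorm(mtcars$wt, mu, sqrt(s2), log = TRUE))
})
mu_grid[which.min(rss_for)]  # the mean that minimizes RSS...
mu_grid[which.max(ll_for)]   # ...also maximizes the log-likelihood

Both lines pick out the same grid point: the one closest to the sample mean of wt.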
If you really want to understand likelihood, it’s valuable to think about it as a pointwise or rowwise quantity. That is, given a specific model, you can evaluate the likelihood of any individual observation.
library(dplyr)

m_hat <- fit$coefficients[1]  # the fitted mean (the intercept)
s_hat <- sd(fit$residuals)    # note: sd() divides by n - 1

mtcars <- mtcars |>
  mutate(likelihood = dnorm(wt, m_hat, s_hat),
         log_lik = log(likelihood))
mtcars |>
  arrange(wt) |>
  select(wt, likelihood, log_lik)
                    wt likelihood    log_lik
Lotus Europa 1.513 0.08945265 -2.4140458
Honda Civic 1.615 0.10668155 -2.2379071
Toyota Corolla 1.835 0.15031880 -1.8949969
Fiat X1-9 1.935 0.17276190 -1.7558409
Porsche 914-2 2.140 0.22241163 -1.5032254
Fiat 128 2.200 0.23749870 -1.4375931
Datsun 710 2.320 0.26777471 -1.3176093
Toyota Corona 2.465 0.30340200 -1.1926966
Mazda RX4 2.620 0.33842446 -1.0834544
Ferrari Dino 2.770 0.36728051 -1.0016294
Volvo 142E 2.780 0.36898104 -0.9970100
Mazda RX4 Wag 2.875 0.38353077 -0.9583354
Merc 230 3.150 0.40676384 -0.8995225
Ford Pantera L 3.170 0.40725061 -0.8983265
Merc 240D 3.190 0.40756765 -0.8975484
Hornet 4 Drive 3.215 0.40772466 -0.8971632
AMC Javelin 3.435 0.39775323 -0.9219235
Hornet Sportabout 3.440 0.39729596 -0.9230738
Merc 280 3.440 0.39729596 -0.9230738
Merc 280C 3.440 0.39729596 -0.9230738
Valiant 3.460 0.39536891 -0.9279360
Dodge Challenger 3.520 0.38866808 -0.9450296
Duster 360 3.570 0.38207185 -0.9621466
Maserati Bora 3.570 0.38207185 -0.9621466
Merc 450SL 3.730 0.35541503 -1.0344691
Merc 450SLC 3.780 0.34557225 -1.0625535
Camaro Z28 3.840 0.33297034 -1.0997018
Pontiac Firebird 3.845 0.33188483 -1.1029673
Merc 450SE 4.070 0.27888986 -1.2769384
Cadillac Fleetwood 5.250 0.04711454 -3.0551736
Chrysler Imperial 5.345 0.03832721 -3.2615952
Lincoln Continental 5.424 0.03205090 -3.4404301
sum(mtcars$log_lik) # close to logLik(fit), but not an exact match
[1] -44.20914
The small difference here isn’t rounding error: sd() divides by n - 1, while the maximized log-likelihood uses the MLE standard deviation, sqrt(rss/n).
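As a quick check on that explanation, swapping in the MLE standard deviation (using the rss and n objects defined above) recovers logLik(fit) exactly:

s_mle <- sqrt(rss / n)  # MLE standard deviation, dividing by n
sum(dnorm(mtcars$wt, m_hat, s_mle, log = TRUE))
[1] -44.20116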
Because the normal is a continuous distribution, these likelihood values are probability densities, not probabilities. Each observation’s density is its contribution to the overall likelihood: the densities multiply across observations, which is why the log densities sum.
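One practical aside: rather than computing the density and then taking its log, you can ask dnorm() for the log density directly via its log argument, which is more numerically stable for observations far out in the tails. A minimal sketch (the log_lik2 column name is just for this illustration):

mtcars <- mtcars |>
  mutate(log_lik2 = dnorm(wt, m_hat, s_hat, log = TRUE))

all.equal(mtcars$log_lik, mtcars$log_lik2)  # TRUE, up to floating-point error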