Regression notation

Aaargh.   A A R G H.

I like David Freedman’s book Statistical Models.   I like it a lot, and I particularly like the clarity.   It’s very clear that \beta is a population parameter, a fixed but unknown constant (OK, so I prefer to think of these as random variables, but we’ll leave that aside for now).   Whatever your statistical perspective, \hat{\beta} is a random variable and hence has a realisation, a sampling distribution, a variance and so on.   So it’s quite clear that the hat symbol denotes an estimator, which, as a function of random variables, has to be a random variable.

I therefore got rather queasy about using \hat{y} to denote the projection of \boldsymbol{y} onto the column space of \boldsymbol{X}, i.e. \boldsymbol{Xb}, where \boldsymbol{b} is the least squares solution.   Many people spend a little time looking at the least squares solution, if nothing else because the geometry is so cute.   But in this case, neither \boldsymbol{X} nor \boldsymbol{b} = (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y} is a random variable, because we don’t have to regard this as a statistical model.   It’s just a projection of some data.   We may go on to assume that at least \boldsymbol{y} is a realisation of a random variable and make assumptions about its properties which let us use \boldsymbol{b} as an estimator \hat{\boldsymbol{\beta}}, but we don’t have to.   It’s an exercise in geometry, pure and simple.   So why use the \hat{y} notation?   Doesn’t it imply that \hat{y} is a random variable, which adds a conceptual layer to the development of the material that isn’t necessary yet?
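The geometric point can be sketched in a few lines of numpy (the data here are hypothetical, just some numbers): \boldsymbol{b} solves the normal equations and \boldsymbol{Xb} is the projection of \boldsymbol{y} onto the column space of \boldsymbol{X}, with no statistical assumptions in sight.

```python
import numpy as np

# Hypothetical data: just vectors of numbers, not "random variables".
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])  # design matrix with intercept
y = rng.normal(size=5)

# b = (X^T X)^{-1} X^T y, via the normal equations (X^T X) b = X^T y
b = np.linalg.solve(X.T @ X, X.T @ y)
shadow_y = X @ b   # the projection ("shadow") of y onto the columns of X

# Pure geometry: the residual y - Xb is orthogonal to every column of X.
print(np.allclose(X.T @ (y - shadow_y), 0))
```

Nothing here requires a model for \boldsymbol{y}; the orthogonality check is just Pythagoras in n dimensions.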

Why not call it Shadow y, and then call the Hat matrix the Shadow matrix (Projection y would do as well, wouldn’t it)?    The Shadow matrix would become the Hat matrix only when we are making assumptions about \boldsymbol{Y} rather than just looking at projections of \boldsymbol{y}.
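Whatever we call it, the matrix H = \boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T is an orthogonal projection, which we can verify numerically (again on hypothetical data): it is symmetric, idempotent, and its trace equals the number of columns of \boldsymbol{X}.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), rng.normal(size=6)])  # hypothetical design matrix

# H = X (X^T X)^{-1} X^T, computed without forming the inverse explicitly
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H, H.T))                   # symmetric
print(np.allclose(H @ H, H))                 # idempotent: projecting twice changes nothing
print(np.isclose(np.trace(H), X.shape[1]))   # trace = number of columns of X
```

All three properties are facts of linear algebra, true before any distributional assumption is made.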

A large part of my queasiness is about over-emphasis on \hat{\boldsymbol{y}}.   I know it’s a nice examinable exercise to give someone a formula and ask them to compute some value of \hat{y}, but that seems wrong as well.    If we’re fitting a regression model, isn’t the point that we are calculating E[Y_i | \boldsymbol{x}_i] = \mu_i | \boldsymbol{x}_i because we believe Y_i \sim Normal(\mu_i | \boldsymbol{x}_i, \sigma^2)?   Given that P(Y_i = \hat{y}_i) = 0 for a continuous response, it seems a strange thing to get quite so hung up on \hat{y} in a statistical model, whereas Shadow(y) (or whatever else we should call it) seems like a natural thing to consider when we are working with the geometry of linear regression.
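The statistical layer can be sketched too (all parameters here are made up for illustration): once we assume Y_i \sim Normal(\mu_i, \sigma^2) with \mu_i = \boldsymbol{x}_i^T \boldsymbol{\beta}, the same algebraic \boldsymbol{b} becomes an estimator of \boldsymbol{\beta}, and \boldsymbol{Xb} estimates the conditional means E[Y_i | \boldsymbol{x}_i], not values that any Y_i will actually take.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])              # "true" population parameter (made up)
mu = X @ beta                            # conditional means E[Y_i | x_i]
y = mu + rng.normal(scale=0.5, size=n)   # one realisation of Y under the normal model

# The same formula as before, but now read as the estimator beta-hat.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(b, beta, atol=0.1))    # close to beta in a large sample
```

The point of the fit is the estimated mean function, not a prediction that Y_i will equal \hat{y}_i.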

