Regression notation

Aaargh.   A A R G H.

I like David Freedman’s book on Statistical Modelling. I like it a lot, and I particularly like the clarity. It’s very clear that $\beta$ is a population parameter, a fixed but unknown constant (OK, so I prefer to think of these as random variables, but we’ll leave that aside for now). Whatever your statistical perspective, $\hat{\beta}$ is a random variable and hence has a realisation, a sampling distribution (population distribution), a variance and so on. So it’s quite clear that the hat symbol denotes an estimator which, as a function of random variables, has to be a random variable.

I therefore got rather queasy about using $\hat{\boldsymbol{y}}$ to denote the projection of $\boldsymbol{y}$ onto the column space of $\boldsymbol{X}$, that is $\boldsymbol{Xb}$, where $\boldsymbol{b}$ holds the least squares solutions. Many people spend a little time looking at the least squares solutions, if nothing else because the geometry is so cute. But in this case, neither $\boldsymbol{X}$ nor $\boldsymbol{b} = (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y}$ is a random variable, because we don’t have to regard this as a statistical model. It’s just a projection of some data. We may go on to assume that at least $\boldsymbol{y}$ is a realisation of a random variable, and make assumptions about its properties which let us use $\boldsymbol{b}$ as an estimator $\hat{\boldsymbol{\beta}}$, but we don’t have to. It’s an exercise in geometry, pure and simple. So why use the $\hat{\boldsymbol{y}}$ notation? Doesn’t it imply that $\hat{\boldsymbol{y}}$ is a random variable, which adds a conceptual layer to the development of the material that isn’t necessary yet?
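To make the "just geometry" point concrete, here is a minimal NumPy sketch. The data are made up for illustration, and the names `shadow_y` and `S` are hypothetical labels, not anything from Freedman's book. Nothing in it is treated as random: it only computes $\boldsymbol{b}$ and the projection $\boldsymbol{Xb}$.

```python
import numpy as np

# Made-up numbers for illustration only; nothing here is treated as random.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Least squares solution b = (X^T X)^{-1} X^T y,
# obtained by solving the normal equations rather than forming the inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)

# The "shadow" of y: its orthogonal projection onto the column space of X.
shadow_y = X @ b

# The matrix that maps y straight to its shadow, X (X^T X)^{-1} X^T.
S = X @ np.linalg.solve(X.T @ X, X.T)

# What is left over is orthogonal to every column of X.
residual = y - shadow_y
```

The checks that matter here are all geometric: `S @ y` reproduces `shadow_y`, `X.T @ residual` is zero up to rounding, and `S @ S` equals `S`, as it should for a projection. No distributional assumption about `y` is needed for any of them.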

Why not call it Shadow y, and then call the Hat matrix the Shadow matrix (Projection y would do as well, wouldn’t it)? The Hat matrix would then become the Hat matrix only when we are making assumptions about $\boldsymbol{Y}$ instead of looking at projections of $\boldsymbol{y}$.
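For reference, and assuming $\boldsymbol{X}$ has full column rank so that $\boldsymbol{X}^T\boldsymbol{X}$ is invertible, the matrix in question can be written out explicitly. The symbol $\boldsymbol{S}$ is a made-up label for the purposes of this post, not standard notation:

$$\boldsymbol{S} = \boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T, \qquad \boldsymbol{S}\boldsymbol{y} = \boldsymbol{X}\boldsymbol{b}, \qquad \boldsymbol{S}^T = \boldsymbol{S}, \qquad \boldsymbol{S}^2 = \boldsymbol{S}.$$

Symmetry and idempotence are exactly what make this an orthogonal projection, and neither property needs any statement about the distribution of $\boldsymbol{Y}$.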

A large part of my queasiness is about over-emphasis on $\hat{\boldsymbol{y}}$. I know it’s a nice examinable exercise to give someone a formula and ask them to compute some value of $\hat{y}$, but that seems wrong as well. If we’re fitting a regression model, isn’t the point that we are calculating $E[Y_i | \boldsymbol{x}_i] = \mu_i|\boldsymbol{x}_i$ because we believe $Y_i \sim \mathrm{Normal}(\mu_i|\boldsymbol{x}_i, \sigma^2)$? Given that $P(Y_i=\hat{y}_i)=0$, it seems a strange thing to get quite so hung up on $\hat{y}$ in a statistical model, whereas $\mathrm{Shadow}(y)$ (or whatever else we should call it) seems like a natural thing to consider when we are working with the geometry of linear regression.