Aaargh. A A R G H.
I like David Freedman’s book on Statistical Modelling. I like it a lot, and I particularly like the clarity. It’s very clear that $\beta$ is a population parameter, a fixed but unknown constant (OK, so I prefer to think of these as random variables, but we’ll leave that aside for now). Whatever your statistical perspective, $\hat{\beta}$ is a random variable and hence has a realisation, a sampling distribution (population distribution), variance and so on. So it’s quite clear that the hat symbol denotes an estimator which, as a function of random variables, has to be a random variable.
I therefore got rather queasy about using $\hat{y} = X\hat{\beta}$ to denote the projection of $y$ onto the column space of $X$, where $\hat{\beta}$ are the least squares solutions. Many people spend a little time looking at the least squares solutions, if nothing else because the geometry is so cute. But in this case, neither $\hat{y}$ nor $\hat{\beta}$ are random variables, because we don’t have to regard this as a statistical model. It’s just a projection of some data. We may go on to assume that at least $y$ is a realisation of a random variable, and make assumptions about its properties which let us use $\hat{\beta}$ as an estimator, but we don’t have to. It’s an exercise in geometry, pure and simple. So why use the $\hat{y}$ notation? Doesn’t it imply that $\hat{y}$ is a random variable, which adds a conceptual layer to the development of the material that isn’t necessary yet?
Why not call it Shadow y, and then call the Hat matrix the shadow matrix (projection y would do as well, wouldn’t it)? The Hat matrix would become the Hat matrix only when we are making assumptions about $Y$ instead of looking at projections of $y$?
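The purely geometric reading can be checked numerically without mentioning probability at all. What follows is only a minimal sketch, assuming NumPy and some made-up numbers; the names `H` and `shadow_y` are illustrative choices, not notation from Freedman’s book.

```python
import numpy as np

# Made-up data: no probability model is assumed anywhere below.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

# Least squares coefficients: the b minimising the length of y - X b.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# The "shadow" (hat) matrix projects any vector onto the column space of X.
# Forming the inverse explicitly is fine for a toy example like this one.
H = X @ np.linalg.inv(X.T @ X) @ X.T
shadow_y = H @ y

# Geometric facts only: H is a symmetric, idempotent projection,
# the shadow of y equals X b, and the residual is orthogonal to X.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
assert np.allclose(shadow_y, X @ b)
assert np.allclose(X.T @ (y - shadow_y), 0.0)

print(b)         # coefficients of the projection
print(shadow_y)  # the projected vector
```

Every check here is a statement about vectors and subspaces; nothing has to be declared a random variable for any of them to hold.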
A large part of my queasiness is about over-emphasis on $\hat{y}$. I know it’s a nice examinable exercise to give someone a formula and ask them to compute some value of $\hat{y}$, but that seems wrong as well. If we’re fitting a regression model, isn’t the point that we are calculating $E[Y \mid X]$ because we believe $Y \sim N(X\beta, \sigma^2 I)$? Given the probability $P(Y = \hat{y}) = 0$, it seems a strange thing to get quite so hung up on $\hat{y}$ in a statistical model, whereas $\hat{y}$ (or whatever else we should call it) seems like a natural thing to consider when we are working with the geometry of linear regression.