My Mean Square Error Confusions

Mean square error is defined as E[(\hat{\theta}-\theta)^2]. For this to be defined, \theta must be a fixed constant (albeit unknown). In other words, it’s a very frequentist thing, but it’s such a dominant way of evaluating the performance of estimators that it has to be taken very seriously. My confusion arises from the notation; I thought I’d set my thoughts out here to see if it helped with the confusion.

For a continuous random variable X assumed to follow some distribution F(x) the expectation E[X] can be defined as E[X] = \int_{-\infty}^{\infty} x f(x) dx where f(x) is the probability density function of X. Here’s where the fun starts.
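As a quick numerical sanity check of that definition, here’s a minimal Monte Carlo sketch. The exponential distribution is my assumed example, not something from the discussion above; its mean is known in closed form, so we can see the sample average converge to E[X].

```python
# Sanity check of E[X] = ∫ x f(x) dx for a distribution with a known mean.
# Assumed example (not from the post): X ~ Exponential(rate=2), so E[X] = 1/2.
import random

random.seed(0)
n = 200_000
samples = [random.expovariate(2.0) for _ in range(n)]  # draws from f(x) = 2 e^{-2x}
monte_carlo_mean = sum(samples) / n  # Monte Carlo estimate of E[X]

print(monte_carlo_mean)  # ≈ 0.5 = 1/rate
```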

  1. For compact notation: X is a function itself, and more formally would be denoted X(\omega), where \omega denotes an element of some sample space \Omega. But no-one seems to get confused if we simplify the notation and just use X. So far, so good.
  2. In a similar way (albeit using the simpler X notation for the random variable), the expectation should perhaps more carefully be written as E_X[X] = \int_{-\infty}^{\infty} x f(x) dx, telling us the expectation is taken over X. Again, so far so good.
  3. Now for conditional expectation. I now have two random variables, X and Y, and at this stage I don’t care about the relationship between them. I do, however, wish to find the expectation of Y for some unspecified value of the variable X. I denote this Y conditional on X as Y|X. So my expectation (in short form) will be denoted E[Y|X], because it is meant to be clear what we mean. What I do is find

    E[Y|X] = \int_{-\infty}^{\infty} y f(y|x) dy.

    One thing to note here: when X is a random variable, this expectation defines a function of it, g(X), where g(x) = \int_{-\infty}^{\infty} y f(y|x) dy for each value x. In other words, the function defines a random variable, but it is still an expectation of Y. Conversely, if X is fixed at some value x, the same integral gives an ordinary (if unknown) number. So the same notation can denote either a random variable or some unknown constant, depending on whether X is a random variable or not; either way, it is an integration over Y. I therefore think the most formal notation here should be E_{Y|X}[Y|X] = \int_{-\infty}^{\infty} y f(y|x) dy, to denote that we are taking an expectation over Y; it’s the density f(y|x) that carries the conditioning.

  4. It’s not relevant here (yet), but if I now take the expectation of this expectation, i.e. E[E[Y|X]], I am going to get a single number; in fact it collapses to E[Y].
  5. However, what’s troubling me is the notation for mean square error, defined above as E[(\hat{\theta}-\theta)^2]. I have seen this written (by authors who know their stuff) as E_{\theta}[(\hat{\theta}-\theta)^2], which has been confusing me. I think it would be better to write E_{\hat{\theta}|\theta}[(\hat{\theta}-\theta)^2]. What we are trying to find is:

    \int_{-\infty}^{\infty} (\hat{\theta}-\theta)^2 f(\hat{\theta}|\theta) d \hat{\theta}

    because \hat{\theta} is the random variable; it has a pdf f(\hat{\theta}) (sometimes called a sampling distribution) and here a conditional pdf f(\hat{\theta}|\theta). We are also interested in a function of that random variable, given by g(\hat{\theta}) = (\hat{\theta}-\theta)^2. But the result of the integral is not a random variable, because we are integrating over \hat{\theta} and conditioning on the fixed \theta.
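Points 3 and 4 above can be checked numerically. The toy model below is my own assumed example (not from the discussion above): X ~ N(0,1) and Y = 2X + noise, so that E[Y|X] = 2X is itself a random variable, and averaging it recovers E[Y].

```python
# Illustrates points 3 and 4: E[Y|X] is a random variable (a function of X),
# and E[E[Y|X]] collapses to E[Y]. Assumed toy model (not from the post):
# X ~ N(0,1), Y = 2X + eps with eps ~ N(0,1), so E[Y|X] = 2X and E[Y] = 0.
import random

random.seed(1)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]

cond_exp = [2.0 * x for x in xs]  # g(X) = E[Y|X] = 2X: one value per draw of X
tower = sum(cond_exp) / n         # Monte Carlo estimate of E[E[Y|X]]
mean_y = sum(ys) / n              # Monte Carlo estimate of E[Y]

print(tower, mean_y)  # both ≈ 0, as the tower property predicts
```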

So I’m still none the wiser as to the E_{\theta} notation for mean square error \ldots.
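Whatever the subscript, the computation itself is unambiguous: hold \theta fixed and average (\hat{\theta}-\theta)^2 over the sampling distribution of \hat{\theta}. Here’s a sketch under an assumed setup (not from the discussion above) where the answer is known exactly: \hat{\theta} is the mean of n draws from N(\theta, \sigma^2), so its MSE is \sigma^2/n.

```python
# The MSE integral from item 5, evaluated by Monte Carlo: theta is held fixed,
# and the expectation is over the sampling distribution of theta_hat.
# Assumed setup (not from the post): theta_hat is the mean of n = 10 draws
# from N(theta, sigma^2), so MSE = sigma^2 / n exactly (here 4/10 = 0.4).
import random

random.seed(2)
theta, sigma, n = 3.0, 2.0, 10
reps = 100_000

sq_errors = []
for _ in range(reps):
    theta_hat = sum(random.gauss(theta, sigma) for _ in range(n)) / n
    sq_errors.append((theta_hat - theta) ** 2)

mse = sum(sq_errors) / reps
print(mse)  # ≈ sigma^2 / n = 0.4
```

Note that because the sample mean is unbiased here, the MSE equals the variance of \hat{\theta}; for a biased estimator the squared bias would appear as well.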
