## My Mean Square Error Confusions

Mean square error is defined as $E[(\hat{\theta}-\theta)^2]$. For this to be well defined, $\theta$ must be fixed and constant (albeit unknown). In other words, it’s a very frequentist thing, but it’s such a dominant way of evaluating the performance of estimators that it has to be taken very seriously. My confusion arises from the notation; I thought I’d set my thoughts out here to see if writing them down helped.

For a continuous random variable $X$ assumed to follow some distribution $F(x)$, the expectation $E[X]$ can be defined as $E[X] = \int_{-\infty}^{\infty} x f(x) dx$, where $f(x)$ is the probability density function of $X$. Here’s where the fun starts.
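
As a quick numerical sketch (my own illustration, not from the post), we can check the definition by approximating $\int x f(x) dx$ with a Riemann sum for $X \sim N(1.5, 1)$, whose density we write out explicitly, and comparing it with the sample mean of simulated draws:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 1.5                                    # illustrative choice: X ~ N(1.5, 1)
x = np.linspace(-10, 10, 200_001)           # fine grid covering essentially all the mass
dx = x[1] - x[0]
f = np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)   # N(mu, 1) density

integral = np.sum(x * f) * dx               # Riemann-sum approximation of E[X]
sample_mean = (mu + rng.standard_normal(1_000_000)).mean()

print(integral, sample_mean)                # both should be close to 1.5
```

The integral and the Monte Carlo average agree, which is all the definition promises.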

1. The notation is compact: $X$ is itself a function, and more formally would be denoted $X(\omega)$, where $\omega$ denotes an outcome in some sample space $\Omega$. But no-one seems to get confused if we simplify the notation and just use $X$. So far, so good.
2. In a similar way (still using the simpler $X$ notation for the random variable), the expectation should perhaps more carefully be written as $E_X[X] = \int_{-\infty}^{\infty} x f(x) dx$, telling us the expectation is taken over $X$. Also, so far so good.
3. Now for conditional expectation. I now have two random variables, $X$ and $Y$, and at this stage I don’t care about the relationship between them. I do, however, wish to find the expectation of $Y$ for some unspecified value of the variable $X$. I denote this $Y$ conditional on $X$ as $Y|X$, so my expectation (in short form) will be denoted $E[Y|X]$, on the grounds that it is clear what we mean. What I do is find

$E[Y|X] = \int_{-\infty}^{\infty} y f(y|x) dy$.

One thing to note here: if $X$ is a random variable, this expectation is a function of a random variable, $g(X)$ where $g(x) = \int_{-\infty}^{\infty} y f(y|x) dy$. In other words, the function defines a new random variable, but it is still an expectation of $Y$. Conversely, if $X$ is fixed at some value $x$, the same expression is not a random variable, just some (possibly unknown) number. Either way the computation is an integration over $y$; whether the result is random depends entirely on $X$. I therefore think the most formal notation here should be $E_{Y|X}[Y|X] = \int_{-\infty}^{\infty} y f(y|x) dy$, to denote that we are taking an expectation over $Y$; it’s the density $f(y|x)$ that carries the conditioning.
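
A simulation sketch of this (my own construction, with a made-up dependence): take $Y = 2X + \varepsilon$, so that $E[Y|X] = 2X$. Plugging in a fixed value $x$ gives a plain number, while leaving $X$ random gives the random variable $g(X) = 2X$:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1_000_000
X = rng.standard_normal(n)
Y = 2 * X + rng.standard_normal(n)          # E[Y|X] = 2X by construction

# E[Y | X ≈ 1]: average Y over draws where X falls near 1 — a fixed number.
near_1 = np.abs(X - 1.0) < 0.05
cond_mean_at_1 = Y[near_1].mean()           # should be close to g(1) = 2

# E[Y|X] as a random variable: the function g evaluated at the random X,
# one realisation per draw of X.
g_of_X = 2 * X

print(cond_mean_at_1)
```

The same object $E[Y|X]$ appears once as the number `cond_mean_at_1` (conditioning on a fixed value) and once as the array `g_of_X` (a random variable), which is exactly the notational tension described above.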

4. It’s not relevant here (yet), but if I now take the expectation of this expectation, i.e. $E[E[Y|X]]$, I am going to get a single number; in fact it collapses to $E[Y]$ (the so-called tower property).
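
A quick check of that collapse by simulation (again my own made-up example): with $Y = X^2 + \varepsilon$, the inner expectation $E[Y|X]$ is the random variable $X^2$, and averaging it over $X$ should match the plain mean of $Y$:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 1_000_000
X = rng.standard_normal(n)
Y = X**2 + rng.standard_normal(n)

inner = X**2               # E[Y|X] is the random variable X**2
lhs = inner.mean()         # Monte Carlo estimate of E[E[Y|X]]
rhs = Y.mean()             # Monte Carlo estimate of E[Y]

print(lhs, rhs)            # both should be close to E[X**2] = 1
```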
5. However, what’s troubling me is the notation for mean square error, defined above as $E[(\hat{\theta}-\theta)^2]$. I have seen this written (by authors who know their stuff) as $E_{\theta}[(\hat{\theta}-\theta)^2]$, and this has been confusing me. I think it would be better to write $E_{\hat{\theta}|\theta}[(\hat{\theta}-\theta)^2]$. What we are trying to find is:

$\int_{-\infty}^{\infty} (\hat{\theta}-\theta)^2 f(\hat{\theta}|\theta) d \hat{\theta}$

because $\hat{\theta}$ is the random variable here: it has a pdf $f(\hat{\theta})$ (sometimes called its sampling distribution) and, conditional on $\theta$, the pdf $f(\hat{\theta}|\theta)$. We are evaluating the expectation of a function of that random variable, $g(\hat{\theta}) = (\hat{\theta}-\theta)^2$. But the result of this integral is not a random variable, because we have integrated over $\hat{\theta}$ and conditioned on the fixed $\theta$: it is just a number (depending on $\theta$).
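
To make that concrete, here is a sketch under assumptions of my own choosing (not from any source): $\theta$ is a fixed constant, and $\hat{\theta}$ is the sample mean of $n$ draws from $N(\theta, 1)$, so its sampling distribution $f(\hat{\theta}|\theta)$ is $N(\theta, 1/n)$ and the MSE is exactly $1/n$. Averaging $(\hat{\theta}-\theta)^2$ over many realisations of the sampling distribution returns that single number:

```python
import numpy as np

rng = np.random.default_rng(3)

theta = 2.0          # fixed (in-principle unknown) parameter
n = 25               # sample size behind each estimate
reps = 200_000       # Monte Carlo repetitions over the sampling distribution

# Each row is one sample of size n; each row's mean is one draw of theta_hat.
theta_hat = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

# The expectation is over f(theta_hat | theta); theta never varies.
mse = np.mean((theta_hat - theta) ** 2)

print(mse)           # should be close to 1/n = 0.04 — a number, not a random variable
```

Nothing in the computation ever treats $\theta$ as random, which is why the $E_{\hat{\theta}|\theta}$ subscript feels like the honest one.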

So I’m still none the wiser as to the $E_{\theta}$ notation for mean square error $\ldots$.