Mean square error is defined as $\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$. For this to be defined, $\theta$ must be fixed: a constant, albeit an unknown one. In other words, it’s a very frequentist thing, but it’s such a dominant way of evaluating the performance of estimators that it has to be taken very seriously. My confusion arises from the notation, so I thought I’d set my thoughts out here to see if it helps.
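The definition can be read operationally: hold $\theta$ fixed, draw many data sets, compute the estimate from each, and average the squared errors. Below is a minimal Monte Carlo sketch. It assumes, purely for illustration, $n = 10$ observations from $N(\theta, 1)$ with $\theta = 2$ and the sample mean as the estimator, for which the exact MSE is $\sigma^2/n = 0.1$. The helper name `monte_carlo_mse` is hypothetical.

```python
import random
import statistics


def monte_carlo_mse(estimator, theta, sigma, n, reps, seed=0):
    """Approximate MSE = E[(theta_hat - theta)^2] by simulation.

    theta is held fixed for every replication, which is the frequentist
    setting in which the MSE is defined. We know theta here only because
    we are simulating; in practice it is unknown.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(reps):
        # One data set of size n drawn from N(theta, sigma^2)
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        theta_hat = estimator(sample)
        squared_errors.append((theta_hat - theta) ** 2)
    # The average squared error over repeated samples approximates the expectation
    return statistics.fmean(squared_errors)


mse = monte_carlo_mse(statistics.fmean, theta=2.0, sigma=1.0, n=10, reps=20_000)
print(f"simulated MSE: {mse:.4f}  (exact sigma^2/n = 0.1)")
```

Everything random here is the data, and hence $\hat\theta$; $\theta$ never varies across replications.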

For a continuous random variable $X$ assumed to follow some distribution $f(x)$, the expectation can be defined as $E[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx$, where $f(x)$ is the probability density function of $X$. Here’s where the fun starts.
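To make the integral concrete, here is a minimal numerical sketch using the midpoint rule. It assumes, for illustration only, an exponential density $f(x) = \lambda e^{-\lambda x}$ with $\lambda = 2$, so that $E[X] = 1/\lambda = 0.5$. The hypothetical helper `expectation` accepts any function `g`, so it computes the more general $E[g(X)] = \int g(x) f(x)\,dx$, which is the form an expression like $E[(\hat\theta - \theta)^2]$ needs.

```python
import math


def expectation(g, pdf, lower, upper, steps=100_000):
    """Approximate E[g(X)] = integral of g(x) * pdf(x) dx over [lower, upper]
    using the midpoint rule. [lower, upper] must hold essentially all the mass."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += g(x) * pdf(x)
    return total * h


RATE = 2.0  # assumed rate parameter lambda


def exp_pdf(x):
    return RATE * math.exp(-RATE * x)


mean = expectation(lambda x: x, exp_pdf, 0.0, 40.0)               # E[X] = 1/lambda = 0.5
second_moment = expectation(lambda x: x**2, exp_pdf, 0.0, 40.0)   # E[X^2] = 2/lambda^2 = 0.5
print(mean, second_moment - mean**2)  # mean and variance (variance = 1/lambda^2 = 0.25)
```

The upper limit of 40 is an assumption that works for this density (the mass beyond it is about $e^{-80}$); a heavier-tailed density would need a wider range.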

- For compact notation: although $X$ is itself a function, and would more formally be denoted $X(\omega)$, where $\omega$ denotes an element (an outcome) of some sample space $\Omega$, no-one seems to get confused if we simplify the notation and just use $X$. So far, so good.
- In a similar way (albeit using the simpler notation for the random variable), expectation should perhaps more carefully be written as $E_X[X]$, telling us the expectation is taken over $X$. Also, so far so good.
- Now for conditional expectation. I now have two random variables, $X$ and $Y$, and I don’t at this stage care about the relationship between them. I do, however, wish to find the expectation of $X$ for some unspecified value $y$ of the variable $Y$. I denote the density of $X$ conditional on $Y = y$ as $f(x \mid y)$. So my expectation (in short form) will be denoted $E[X \mid Y]$, because it is meant to be clear what we mean. What I do is find
  $$E[X \mid Y = y] = \int x\, f(x \mid y)\,dx.$$
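Both the "random variable as a function on a sample space" view and the conditional expectation can be checked exactly in a discrete analogue, where the integrals become sums. A minimal sketch, assuming two fair dice: each outcome $\omega = (d_1, d_2)$ has probability $1/36$, $X(\omega) = d_1 + d_2$ and $Y(\omega) = d_1$. Then $E[X] = 7$ and $E[X \mid Y = y] = y + 3.5$. All names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space Omega: every outcome omega = (d1, d2) of two fair dice
OMEGA = list(product(range(1, 7), repeat=2))
PROB = {omega: Fraction(1, 36) for omega in OMEGA}


def X(omega):
    # A random variable is a function on the sample space: the total shown
    return omega[0] + omega[1]


def Y(omega):
    # A second random variable: the value of the first die
    return omega[0]


def expectation(rv):
    """E[rv]: sum of rv(omega) * P(omega) over the whole sample space."""
    return sum(rv(w) * PROB[w] for w in OMEGA)


def cond_expectation(rv, cond_rv, y):
    """E[rv | cond_rv = y]: sum of rv(omega) * P(omega | cond_rv = y).
    The sum runs over values of rv, restricted to outcomes with cond_rv(omega) == y."""
    event = [w for w in OMEGA if cond_rv(w) == y]
    p_event = sum(PROB[w] for w in event)
    return sum(rv(w) * PROB[w] / p_event for w in event)


print(expectation(X))                                    # 7
print([cond_expectation(X, Y, y) for y in range(1, 7)])  # y + 7/2 for y = 1..6
```

Exact `Fraction` arithmetic is used so the results can be compared to the closed forms without rounding.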

One thing to note here is that, as $Y$ is a random variable, this expectation is a function of a random variable: for each value $y$ it returns a number, so $E[X \mid Y]$ defines a new random variable, $g(Y)$ with $g(y) = E[X \mid Y = y]$. But it is still an expectation of $X$. Conversely, if $y$ is fixed, $E[X \mid Y = y]$ is not a random variable, just some unknown value, yet computing it still requires an integration over $X$. So the same notation can denote either a random variable or a fixed value, depending on whether we condition on the random variable $Y$ or on a particular value $y$; in both cases it is an integration over $X$. I therefore think the most formal notation here should be $E_X[X \mid Y]$, to denote that we are taking an expectation over $X$; it’s the value of $Y$ that describes the conditioning.
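The distinction can be seen numerically. A minimal sketch, assuming for illustration that $(X, Y)$ is bivariate normal with unit variances, correlation $\rho = 0.6$, $E[X] = 1$ and $E[Y] = 0$, so that $X \mid Y = y \sim N(1 + \rho y,\, 1 - \rho^2)$. The hypothetical function `g(y)` computes $E[X \mid Y = y] = \int x f(x \mid y)\,dx$ by integrating over $x$ only. Given a fixed $y$ it returns one number; given random draws of $Y$ it returns random values, i.e. realisations of $g(Y) = E[X \mid Y]$. Their average should also sit near $E[X] = 1$, which is the law of total expectation, $E_Y[E_X[X \mid Y]] = E[X]$.

```python
import math
import random
import statistics

RHO = 0.6   # assumed correlation between X and Y
MU_X = 1.0  # assumed mean of X; Y is assumed standard normal


def cond_pdf(x, y):
    """f(x | y): the N(MU_X + RHO*y, 1 - RHO^2) density of X given Y = y."""
    var = 1.0 - RHO**2
    m = MU_X + RHO * y
    return math.exp(-((x - m) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def g(y, steps=2_000):
    """g(y) = E[X | Y = y] = integral of x * f(x | y) dx (midpoint rule over x)."""
    centre = MU_X + RHO * y
    lower, upper = centre - 10.0, centre + 10.0  # about +/- 12.5 conditional sds
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += x * cond_pdf(x, y)
    return total * h


# Conditioning on a fixed value y = 0.5: the result is just a number (1 + 0.6 * 0.5 = 1.3)
fixed_value = g(0.5)

# Conditioning on the random variable Y: each draw of Y gives a different value of g(Y)
rng = random.Random(1)
g_of_Y = [g(rng.gauss(0.0, 1.0)) for _ in range(1_000)]

print(fixed_value)
print(statistics.fmean(g_of_Y), statistics.stdev(g_of_Y))  # roughly E[X] = 1 and RHO = 0.6
```

The same function `g` serves both cases; only what is fed into it (a constant or a random draw) decides whether the output is a number or a random variable.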

- It’s not relevant here (yet), but if I now take the expectation of this expectation, i.e. $E_Y\big[E_X[X \mid Y]\big]$, I am going to get a single number; in fact it collapses to $E[X]$.
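- However, what’s troubling me is the notation of mean square error, defined above as $E[(\hat\theta - \theta)^2]$. I have seen this written (by authors who know their stuff) as $E_\theta[(\hat\theta - \theta)^2]$. This has been confusing me. I think it would be better to write $E_{\hat\theta}[(\hat\theta - \theta)^2 \mid \theta]$. What we are trying to find is
  $$E_{\hat\theta}\big[(\hat\theta - \theta)^2 \mid \theta\big] = \int (\hat\theta - \theta)^2\, f(\hat\theta \mid \theta)\, d\hat\theta,$$
  because $\hat\theta$ is the random variable; it has a pdf (sometimes called a sampling distribution), and here a conditional pdf $f(\hat\theta \mid \theta)$. We are also interested in evaluating a function of that random variable, given by $(\hat\theta - \theta)^2$. But the result is not a random variable, because we are integrating over $\hat\theta$ and conditioning on a fixed $\theta$.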
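The integral $\int (\hat\theta - \theta)^2 f(\hat\theta \mid \theta)\,d\hat\theta$ can be evaluated directly once the sampling distribution is known. A minimal sketch, assuming $n$ observations from $N(\theta, \sigma^2)$ and a hypothetical family of estimators $\hat\theta = c\bar{X}$, whose sampling density is $N(c\theta, c^2\sigma^2/n)$; $c = 1$ is the sample mean and $c \neq 1$ shrinks it. The integration runs over $\hat\theta$ only, with $\theta$ held fixed, and the answer should match variance plus squared bias, $c^2\sigma^2/n + (c-1)^2\theta^2$. For each fixed $\theta$ the result is a number, not a random variable, and for $c \neq 1$ that number changes with $\theta$.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mse_by_integration(theta, c, sigma, n, steps=20_000):
    """Integral of (t - theta)^2 * f(t | theta) dt, where f(t | theta) is the
    assumed N(c*theta, c^2 sigma^2 / n) sampling density of theta_hat = c * xbar.
    theta is fixed; only theta_hat (called t here) is integrated over. Requires c != 0."""
    mean, sd = c * theta, abs(c) * sigma / math.sqrt(n)
    lower, upper = mean - 12 * sd, mean + 12 * sd
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        t = lower + (i + 0.5) * h  # midpoint value of theta_hat
        total += (t - theta) ** 2 * normal_pdf(t, mean, sd)
    return total * h


def mse_closed_form(theta, c, sigma, n):
    # Variance of theta_hat plus squared bias
    return c**2 * sigma**2 / n + ((c - 1) * theta) ** 2


for theta in (0.0, 2.0, 5.0):
    # c = 1 (sample mean) and c = 0.9 (shrunk mean), with sigma = 1 and n = 10
    print(theta, mse_by_integration(theta, 1.0, 1.0, 10), mse_by_integration(theta, 0.9, 1.0, 10))
```

With $c = 1$ every row gives $0.1$, agreeing with the simulated value for the sample mean; with $c = 0.9$ the MSE is $0.081 + 0.01\theta^2$, so the value depends on which fixed $\theta$ we condition on.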

So I’m still none the wiser as to the notation for mean square error, $E_\theta[(\hat\theta - \theta)^2]$.