In an ideal world, everything has a reason. Every question has an unambiguous answer. The data is sufficient to explain its behaviour, such as the class it belongs to.

In the non-ideal world, however, there is always something missing that stops us from knowing the entire truth. $g$ is beyond our reach. In such cases we resort to probability.

It simply tells us how probable it is that the data belongs to a class ($y=+1$) given the observation $x$; this conditional probability is written $\eta(x) = P[y = +1 \mid x]$.

If we build a classifier on this data, how good can it possibly be? This is the question the Bayes error answers.

Bayes Error

Let's say I've built a classifier $h$ to predict the class of the data. $h(x)=\hat{y}$ is the predicted class and $y$ is the true class. Even ambiguous data needs to come from somewhere, so we assume $D$ is the joint distribution of $x$ and $y$.

Using an old trick to convert probability to expectation, $P[A] = E[1(A)]$, we have
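$$\text{err}_D[h] \;=\; P_{(x,y)\sim D}\big[h(x) \neq y\big] \;=\; E_{(x,y)\sim D}\big[1\big(h(x) \neq y\big)\big] \;=\; E_{x}\Big[E_{y \mid x}\big[1\big(h(x) \neq y\big)\big]\Big]$$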

The inner expectation is easier to solve when expanded.
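$$E_{y \mid x}\big[1\big(h(x) \neq y\big)\big] \;=\; \eta(x)\,1\big(h(x) = -1\big) \;+\; \big(1 - \eta(x)\big)\,1\big(h(x) = +1\big)$$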

This gives the final error to be
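$$\text{err}_D[h] \;=\; E_{x}\Big[\eta(x)\,1\big(h(x) = -1\big) \;+\; \big(1 - \eta(x)\big)\,1\big(h(x) = +1\big)\Big]$$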

The last equation means that if the classifier predicts $-1$ for a point, that point contributes $\eta(x)$ to the error (the probability that its true label was actually $+1$). On the other hand, if it predicts $+1$, the contribution is $1-\eta(x)$.

The best classifier would predict $+1$ when $\eta(x)$ is large and $-1$ when $\eta(x)$ is small, so that each point contributes the smaller of $\eta(x)$ and $1-\eta(x)$. The minimum achievable error is then
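$$\text{err}^{*} \;=\; E_{x}\Big[\min\big(\eta(x),\, 1 - \eta(x)\big)\Big]$$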

This error is called the Bayes error.
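To make this concrete, here is a minimal sketch in Python with NumPy (the toy distribution and the particular $\eta(x)$ below are my own illustrative choices, not from the lectures). It estimates the Bayes error $E_x[\min(\eta(x), 1-\eta(x))]$ by Monte Carlo and compares it to the empirical error of the classifier that predicts $+1$ whenever $\eta(x) \ge 1/2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution D over (x, y):
#   x ~ Uniform(0, 1)
#   eta(x) = P(y = +1 | x), chosen here as a logistic ramp around x = 0.5.
def eta(x):
    return 1.0 / (1.0 + np.exp(-8.0 * (x - 0.5)))

# Sample n points (x, y) from D.
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
y = np.where(rng.uniform(size=n) < eta(x), +1, -1)

# Classifier that predicts +1 exactly when eta(x) >= 1/2.
y_hat = np.where(eta(x) >= 0.5, +1, -1)

# Empirical error of that classifier on the sample.
empirical_error = np.mean(y_hat != y)

# Bayes error: E_x[ min(eta(x), 1 - eta(x)) ], estimated by Monte Carlo over x.
bayes_error = np.mean(np.minimum(eta(x), 1.0 - eta(x)))

print(f"Estimated Bayes error:              {bayes_error:.4f}")
print(f"Empirical error of the best rule:   {empirical_error:.4f}")
```

The two numbers should agree up to sampling noise, and no classifier, however clever, can do better than the first one on this distribution.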

References

Shivani Agarwal’s lectures