Frequentists vs Bayesians

XKCD – Frequentists vs. Bayesians

In the “data science” community there seems to be a big split in how people approach statistics. I didn’t notice it at first, but it is almost a matter of belief, like a “religion” of statistics.

While XKCD doesn’t paint the frequentists in a very flattering light, their approach may actually lead to the same outcome (by a different path).

Let’s assume that we want to train a supervised model on a dataset. The goal is to find the best value for the parameter \(\theta\).

The frequentists see \(\theta\) as a truth that we aim to discover: its value is fixed, but we don’t know it yet. Our estimate \(\hat{\theta}\) is a random variable and a function of the dataset (which is seen as random). The more events there are in the dataset, the closer our estimate \(\hat{\theta}\) will be to \(\theta\).
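
To make this concrete, here is a minimal sketch assuming a simple coin-flip (Bernoulli) model: the true \(\theta\) is fixed (we only know it here to simulate data), and the estimate \(\hat{\theta}\) is computed from the random dataset (the sample mean, which is the maximum-likelihood estimate for this model).

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7  # the fixed, unknown truth (known here only to simulate data)

# theta_hat is a function of the dataset: the sample mean, which is the
# maximum-likelihood estimate under a Bernoulli model.
for m in [10, 100, 1_000, 10_000]:
    data = rng.binomial(1, theta, size=m)  # the dataset is the random object
    theta_hat = data.mean()
    print(f"m={m:>6}  theta_hat={theta_hat:.3f}")

# theta_hat varies from one dataset to another, but it concentrates
# around the fixed theta as m grows.
```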

On the other hand, the Bayesians see the dataset as something we observe and therefore know (not random). The true parameter \(\theta\) is unknown and treated as random. Their approach consists of using the knowledge gained from the events in the dataset to estimate the probability distribution of the random variable \(\theta\). After each event we update \(p(\theta | x_1, \dots, x_m)\) using Bayes’ rule:

\(p(\theta | x_1, \dots, x_m) = \frac{p(x_1, \dots, x_m | \theta)\, p(\theta)}{p(x_1, \dots, x_m)}\)

The idea is to start with a broad prior distribution (uniform or Gaussian) and refine our knowledge after each event.
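
Here is a minimal sketch of that updating process, under the same Bernoulli model and assuming a Beta prior (the uniform Beta(1, 1) plays the role of the “broad” starting distribution). For this conjugate pair, applying Bayes’ rule after each event reduces to incrementing two counts.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7                       # unknown to the model; used only to simulate events
data = rng.binomial(1, theta, size=1_000)

# Broad prior over theta: Beta(1, 1) is uniform on [0, 1].
alpha, beta = 1.0, 1.0

# Bayes' rule after each event: for the Beta-Bernoulli pair the posterior
# p(theta | x_1, ..., x_m) stays a Beta distribution, so the update is
# just a count increment.
for i, x in enumerate(data, start=1):
    alpha += x
    beta += 1 - x
    if i in (1, 10, 100, 1_000):
        mean = alpha / (alpha + beta)
        print(f"after {i:>4} events: posterior mean = {mean:.3f}")

# The posterior narrows around theta as more events are observed.
```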

To me it seems that, while there is a difference in the philosophy of the two approaches (probably the reason for the schism in the community), the Bayesian approach is more general and may lead to the same results as the frequentist one (under certain assumptions, and through a more complicated path).
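
One simple case where the two paths can meet: with a flat prior, \(p(\theta)\) is constant, so maximizing the posterior is the same as maximizing the likelihood, and the MAP estimate coincides with the maximum-likelihood estimate:

\(\hat{\theta}_{MAP} = \arg\max_\theta p(\theta | x_1, \dots, x_m) = \arg\max_\theta \frac{p(x_1, \dots, x_m | \theta)\, p(\theta)}{p(x_1, \dots, x_m)} = \arg\max_\theta p(x_1, \dots, x_m | \theta) = \hat{\theta}_{MLE}\)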