**Gaussian Naive Bayes**

A simple algorithm based on Bayes' rule. The "naive" aspect of this classifier is that it assumes every pair of features is conditionally independent given the class (hardly ever true in practice).

In the formulas below, $y$ is the class variable and $x_1, \dots, x_n$ are the features. By Bayes' rule,

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\,P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

where $P(y)$ is the probability of observing class $y$ in the training set and $P(x_1, \dots, x_n \mid y)$ is the probability of observing that specific feature vector given class $y$. Factoring the likelihood into a product over the individual conditionals is only possible because of our naive independence assumption:

$$P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$

Since the denominator is the same for every class, the classifier predicts the class that maximizes the numerator:

$$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$$
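The decision rule above can be sketched from scratch in a few lines. This is a minimal illustration with made-up toy data, not a production implementation: each feature is modeled as a per-class Gaussian (the "Gaussian" in Gaussian Naive Bayes), and prediction takes the arg max over log-scores to avoid numerical underflow.

```python
import math

# Toy training data: two numeric features per sample (hypothetical values).
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],   # class 0
     [3.0, 4.2], [3.1, 3.8], [2.9, 4.0]]   # class 1
y = [0, 0, 0, 1, 1, 1]

def fit(X, y):
    """Estimate the prior P(y) and per-feature Gaussian (mean, variance) for each class."""
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        means = [sum(col) / len(col) for col in zip(*rows)]
        # Maximum-likelihood variance; a tiny epsilon guards against zero variance.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(zip(*rows), means)]
        params[c] = (prior, means, variances)
    return params

def log_gaussian(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict(params, x):
    """Return argmax_y [ log P(y) + sum_i log P(x_i | y) ]."""
    best_class, best_score = None, float("-inf")
    for c, (prior, means, variances) in params.items():
        score = math.log(prior) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

params = fit(X, y)
print(predict(params, [1.1, 2.0]))  # lies in the class-0 cluster -> 0
print(predict(params, [3.0, 4.0]))  # lies in the class-1 cluster -> 1
```

Working in log space turns the product over conditionals into a sum, which is the standard trick for keeping many multiplied probabilities from underflowing to zero.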

**Advantages->** (1) works pretty well in practice despite the naive assumption, (2) can estimate the necessary parameters from a relatively small amount of training data, (3) “can be extremely fast compared to more sophisticated methods” (source)

**Disadvantages->** (1) well-trained, more sophisticated models that are better suited to the data can outperform NB models, (2) “known to be a bad estimator, so the probability outputs are not taken too seriously” (source), (3) can be particularly ineffective compared to more sophisticated models when there are strong dependencies between pairs of features.

**Example Real-World Applications->** (1) document classification, (2) spam filtering
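A spam-filtering-flavored sketch using scikit-learn's `GaussianNB`: the feature vectors here are made up for illustration (e.g. a count of spammy words and a count of exclamation marks per message), just to show the fit/predict workflow on a task like the ones listed above.

```python
from sklearn.naive_bayes import GaussianNB

# Hypothetical features per message: [spammy_word_count, exclamation_count].
X = [[0, 0], [1, 0], [0, 1],        # ham
     [5, 3], [6, 2], [4, 4]]        # spam
y = ["ham", "ham", "ham", "spam", "spam", "spam"]

clf = GaussianNB().fit(X, y)
print(clf.predict([[0, 1], [5, 3]]))   # -> ['ham' 'spam']
print(clf.predict_proba([[5, 3]]))     # per-class probabilities (see caveat above)
```

Note that, per disadvantage (2) above, the `predict_proba` outputs should not be taken too seriously; the predicted class labels are the useful output.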
