Creating a News Classifier with Multinomial Naive Bayes
Using Python, you can build this simple project in an hour.
Recently, while reading some Data Science content, I came across this quick project, where the goal is to create a news headline classifier. You give the model a headline and it returns the topic that headline is about.
The algorithm used to create the model is Multinomial Naïve Bayes, from the sklearn package in Python. The idea of this post is, therefore, to go through the steps to create the dataset and the model.
Naïve Bayes
Many algorithms are based on Thomas Bayes’s theorem, which says that the probability of an event should be updated in light of related events that have already occurred. For example, imagine we estimate the probability that it will rain today. If we then learn that it rained yesterday, that information changes the probability of rain today.
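To make the rain example concrete, here is a minimal sketch that estimates both probabilities by simple counting. The list of observations is invented purely for illustration (it is not real weather data), and the variable names are my own.

```python
# Hypothetical daily observations (1 = rain, 0 = no rain), invented for illustration.
rained = [1, 1, 0, 0, 1, 1, 1, 0, 0, 0]

# Pair each day with the day before it: (yesterday, today).
pairs = list(zip(rained[:-1], rained[1:]))

# Unconditional estimate: share of "today" observations with rain.
p_rain = sum(today for _, today in pairs) / len(pairs)

# Conditional estimate: the same share, restricted to days that followed a rainy day.
after_rain = [today for yesterday, today in pairs if yesterday == 1]
p_rain_given_rain = sum(after_rain) / len(after_rain)

print(f"P(rain today) = {p_rain:.2f}")
print(f"P(rain today | rain yesterday) = {p_rain_given_rain:.2f}")
```

With this toy data the first estimate is 0.44 and the second is 0.60, so knowing that it rained yesterday raises the estimated chance of rain today.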
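The formula that represents the theorem, for two events A and B, is P(A | B) = P(B | A) · P(A) / P(B). Here P(A | B) is the probability of A given that B has occurred, P(B | A) is the probability of B given A, and P(A) and P(B) are the probabilities of each event on its own.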
OK, but we still don’t know how this formula helps with our problem: finding the category of a news headline. So, let’s work on that.