Random Forest: An Algorithm for All Times
A tutorial on the random forest algorithm.
Source: fast.ai course link
- Random forest is a kind of universal machine learning technique
- It can be used for both regression (target is a continuous variable) and classification (target is a categorical variable) problems
- It also works with columns of any kind, such as pixel values, zip codes, revenue, etc.
- In general, random forest does not overfit (it’s very easy to stop it from overfitting)
- You do not need a separate validation set in general. It can tell you how well it generalizes (via the out-of-bag error) even if you only have one dataset
- It has few (if any) statistical assumptions (it doesn’t assume that data is normally distributed, data is linear, or that you need to specify the interactions)
- It requires very little feature engineering, so it’s a great place to start. In many situations, you do not need to take the log of the data or create interaction terms
- Most machine learning models (including random forest) cannot directly use categorical columns; they must be encoded as numbers first
- scikit-learn provides two random forest implementations: RandomForestRegressor and RandomForestClassifier
```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn import metrics
```
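A minimal sketch of fitting a RandomForestRegressor, using the out-of-bag score mentioned above as a built-in generalization estimate. The toy data and parameter choices here are illustrative, not from the course:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: 200 samples, 5 features, a noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=200)

# oob_score=True scores each tree on the bootstrap samples it did NOT see,
# giving a validation estimate without a separate hold-out set.
model = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
model.fit(X, y)
print(model.oob_score_)  # out-of-bag R^2
```

Because each tree only trains on a bootstrap sample, the left-out rows act as a free validation set, which is why a single dataset is often enough.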
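To illustrate the encoding point above, here is a hedged sketch of turning a categorical column into integer codes before fitting a RandomForestClassifier. The zip-code data and the target are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical categorical column: zip codes stored as strings.
zips = np.array(["94103", "10001", "94103", "60601", "10001", "60601"] * 20)

# Map each distinct category to an integer code (0, 1, 2, ...).
codes = np.unique(zips, return_inverse=True)[1]

# Stack the encoded column with a second numeric feature.
X = np.column_stack([codes, np.arange(len(zips)) % 7])
y = (codes == 0).astype(int)  # toy target that depends on the category

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```

Integer codes are enough for tree-based models, since trees split on thresholds rather than assuming the numbers have a linear meaning.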