Introduction

In this notebook we will go through the main concepts of the pandas library. First, let's import pandas under its common alias, pd.

Source: fast.ai course link

  • Random forest is a kind of universal machine learning technique
  • It can be used for both regression (the target is a continuous variable) and classification (the target is a categorical variable) problems
  • It also works with columns of any kind, such as pixel values, zip codes, revenue, etc.
  • In general, a random forest is resistant to overfitting, and when it does overfit it is easy to rein in (for example, by limiting how far each tree grows)
  • You do not generally need a separate validation set. Its out-of-bag (OOB) score tells you how well it generalizes even if you only have one dataset
  • It has few (if any) statistical assumptions: it doesn’t assume that the data is normally distributed or that relationships are linear, and you do not need to specify the interactions
  • It requires very little feature engineering, so it’s a great place to start. In many situations you do not have to take the log of the data or multiply features together to create interaction terms
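  • Most machine learning models (including random forest) cannot directly use categorical columns, so these must be converted to numbers first
  • scikit-learn provides two estimators: RandomForestRegressor for regression and RandomForestClassifier for classification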
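The point about categorical columns can be sketched with pandas. This is a minimal illustration, not the course's own preprocessing: the toy data and the column names `zip_code` and `revenue` are made up for the example. It converts a string column to the pandas categorical dtype and then replaces it with integer codes, one common way to make such a column usable by a tree-based model.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "zip_code": ["94110", "10001", "94110", "60614"],
    "revenue": [120.0, 80.5, 99.9, 150.25],
})

# Convert the string column to the categorical dtype. pandas sorts the
# distinct values and assigns each one an integer code.
df["zip_code"] = df["zip_code"].astype("category")
categories = list(df["zip_code"].cat.categories)

# Replace the column with its integer codes so a model can split on it.
df["zip_code"] = df["zip_code"].cat.codes

print(categories)
print(df)
```

The codes follow the sorted order of the category labels, so the numbers carry no real meaning beyond identifying each category. Tree-based models can still split on them usefully.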
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn import metrics
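As a rough sketch of several points above, the following fits both estimators and reads their out-of-bag score, which estimates generalization without a separate validation set. The synthetic data, the target formula, and the hyperparameter values (`n_estimators=100`, `min_samples_leaf=3`) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn import metrics

# Synthetic data (an assumption for this sketch): 300 rows, 4 features.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(300, 4))
y_reg = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
y_clf = (y_reg > np.median(y_reg)).astype(int)  # binary target for classification

# Regression: oob_score=True scores each row using only the trees that did
# not see it during bootstrapping. min_samples_leaf > 1 is one simple way
# to reduce overfitting.
reg = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=3, oob_score=True, random_state=0
)
reg.fit(X, y_reg)
train_r2 = metrics.r2_score(y_reg, reg.predict(X))
oob_r2 = reg.oob_score_  # R^2 on out-of-bag rows

# Classification: same idea, with accuracy as the out-of-bag metric.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y_clf)
train_acc = metrics.accuracy_score(y_clf, clf.predict(X))
oob_acc = clf.oob_score_  # accuracy on out-of-bag rows

print(f"regression:     train R^2 = {train_r2:.3f}, OOB R^2 = {oob_r2:.3f}")
print(f"classification: train acc = {train_acc:.3f}, OOB acc = {oob_acc:.3f}")
```

The training score is typically higher than the out-of-bag score. The gap between the two gives a quick indication of how much the forest is overfitting.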