Missing Values in Pandas: A Comprehensive Guide
A tutorial on how to approach the missing-value problem in any machine learning project.
import pandas as pd
from termcolor import colored, cprint
List columns in the dataframe which contain missing values:
print(colored('List of columns with missing values:', 'blue'))
print(df.columns[df.isnull().any()].to_list())
List of columns with missing values:
['sex', 'age_approx', 'anatom_site_general_challenge']
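The snippet above relies on a dataframe `df` that is already loaded. As a minimal, self-contained sketch of the same idea, the example below builds a small made-up frame (the values are hypothetical, only two column names are borrowed from the output above) and wraps the lookup in a helper. The name `columns_with_missing` is an assumption for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "sex": ["male", None, "female", "female"],
    "age_approx": [45.0, np.nan, 60.0, 35.0],
})

def columns_with_missing(frame):
    """Return the names of columns that contain at least one missing value."""
    mask = frame.isnull().any()  # boolean Series indexed by column name
    return frame.columns[mask].to_list()

missing_cols = columns_with_missing(toy)
print(missing_cols)  # ['sex', 'age_approx']
```

`isnull().any()` reduces each column to a single boolean, so indexing `frame.columns` with it keeps only the affected columns in their original order.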
Compute the percentage of missing values per column, convert the result to a dataframe, sort it in descending order and show only the non-zero values:
print(colored('Percentage of missing values:', 'blue'))
print(
    (df.isnull().sum() / len(df) * 100)
    .to_frame('NA_PERC')
    .sort_values(by='NA_PERC', ascending=False)
    .query('NA_PERC > 0')
)
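The percentage calculation can also be sketched on a toy frame so the result is easy to check by hand. The data below is made up, and `missing_percentages` is a hypothetical helper name. It uses `isnull().mean()`, which gives the same fraction as `isnull().sum() / len(df)` because the mean of a boolean column is the share of `True` values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "sex": ["male", None, "female", "female"],
    "age_approx": [45.0, np.nan, np.nan, 35.0],
})

def missing_percentages(frame):
    """Return a one-column dataframe with the percentage of missing values
    per column, non-zero rows only, sorted from highest to lowest."""
    na_perc = frame.isnull().mean() * 100  # fraction of missing values, as a percentage
    return (
        na_perc.to_frame("NA_PERC")
        .query("NA_PERC > 0")
        .sort_values(by="NA_PERC", ascending=False)
    )

summary = missing_percentages(toy)
print(summary)
```

Here `age_approx` has two missing values out of four rows (50%) and `sex` has one (25%), while `patient_id` is complete and is filtered out.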