Introduction

In this notebook we will go through the main concepts of the pandas library. First, let's import pandas under its common alias, pd. We also import colored from termcolor to print colored headings.

import pandas as pd
from termcolor import colored, cprint

List the columns in the dataframe df that contain missing values:

print(colored('List of columns with missing values:', 'blue'))
print(df.columns[df.isnull().any()].to_list())
List of columns with missing values:
['sex', 'age_approx', 'anatom_site_general_challenge']
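
To see why this expression works, here is a minimal sketch on a small toy dataframe. The name toy_df and all of its values are invented for illustration (only two column names are borrowed from the output above); it is not the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: the values are made up for illustration only.
toy_df = pd.DataFrame({
    "image_name": ["a", "b", "c", "d"],
    "sex": ["male", None, "female", "male"],
    "age_approx": [45.0, 50.0, None, None],
})

# isnull() flags every missing cell; any() reduces that to one boolean per column.
has_missing = toy_df.isnull().any()

# Indexing the column Index with the boolean mask keeps only the flagged columns.
missing_columns = toy_df.columns[has_missing].to_list()
print(missing_columns)
```

Here image_name has no missing entries, so only the other two columns are returned.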

Compute the percentage of missing values per column, convert the result to a dataframe, sort it in descending order, and show only the non-zero values:

print(colored('Percentage of missing values:', 'blue'))
print((df.isnull().sum() / len(df) * 100).to_frame('NA_PERC')
          .sort_values(by='NA_PERC', ascending=False)
          .query('NA_PERC > 0'))
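
The same chain can be tried on a small toy dataframe to check the arithmetic by hand. This is a sketch under stated assumptions: toy_df and its values are hypothetical, and it uses isnull().mean() as a shorthand, since the mean of a boolean column equals the count of missing values divided by the number of rows.

```python
import pandas as pd

# Hypothetical toy data: 4 rows, with 1 missing value in sex and 2 in age_approx.
toy_df = pd.DataFrame({
    "image_name": ["a", "b", "c", "d"],
    "sex": ["male", None, "female", "male"],
    "age_approx": [45.0, 50.0, None, None],
})

na_perc = (
    toy_df.isnull().mean().mul(100)              # fraction of missing cells per column, as a percentage
    .to_frame("NA_PERC")                         # Series -> one-column dataframe
    .sort_values(by="NA_PERC", ascending=False)  # largest share of missing values first
    .query("NA_PERC > 0")                        # drop columns without missing values
)
print(na_perc)
```

With 2 of 4 values missing, age_approx comes out at 50 percent and is listed first; sex follows at 25 percent, and image_name is filtered out.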