Creating Interactive Python Visualizations With Plotly and Pandas

A Notebook Example of Canadian COVID-19 Cases

Jupyter notebooks make it incredibly easy to rapidly prototype visualizations and work with data. In this example I want to show how you can also make interactive data visualizations with the Plotly Python library. For the data sourcing and transformations I will use the pandas library.

I was absolutely blown away by how little code it took to set up more advanced interactive features such as selectable date ranges and themes. Plotly visuals come with tooltips turned on by default, so these charts are not your average static images!

I will then embed the notebook on my website (www.zachrenwick.com). If you want to see a full screen version of the markdown notebook, the link is available in my website directory here: www.zachrenwick.com/customposts/

I've tried to self-document my code with comments, so hopefully it is clear how to accomplish each step. If you have any questions, feel free to reach out.

Steps

  1. Import confirmed COVID-19 case data for Canada from a web source

  2. Wrangle the data into the required format

  3. Visualize new Canadian COVID-19 cases

In [1]:
# Step 1 Import Data

import pandas as pd

# Get the global COVID-19 time series data
df = pd.read_csv('https://raw.githubusercontent.com/datasets/covid-19/master/data/time-series-19-covid-combined.csv')
# Keep only the columns needed for this example
df = df[['Country/Region', 'Province/State', 'Confirmed', 'Date']]

df.head()
Out[1]:
Country/Region Province/State Confirmed Date
0 Afghanistan NaN 0.0 2020-01-22
1 Afghanistan NaN 0.0 2020-01-23
2 Afghanistan NaN 0.0 2020-01-24
3 Afghanistan NaN 0.0 2020-01-25
4 Afghanistan NaN 0.0 2020-01-26
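As a side note, the column selection can also happen at load time. The sketch below is an assumption-laden illustration, not the code used in this notebook: it reads a small made-up CSV string (standing in for the real file, so it runs offline) and uses the `usecols` and `parse_dates` options of `pd.read_csv`. The sample rows and the extra `Deaths` column are invented for the demonstration.

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV (assumption: similar column names)
sample_csv = io.StringIO(
    "Date,Country/Region,Province/State,Confirmed,Deaths\n"
    "2020-01-22,Afghanistan,,0,0\n"
    "2020-01-22,Canada,Ontario,0,0\n"
    "2020-01-26,Canada,Ontario,1,0\n"
)

wanted = ["Country/Region", "Province/State", "Confirmed", "Date"]

# usecols drops unneeded columns while reading; parse_dates converts Date to datetime64
sample = pd.read_csv(sample_csv, usecols=wanted, parse_dates=["Date"])

print(sample.dtypes)
```

Parsing the dates up front is optional here, since ISO-formatted date strings such as `2020-01-22` also compare correctly as plain text.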
In [2]:
# Step 2 Transform data. We need to group by date (as there are multiple provinces in the dataset)
#        and create a new metric for new cases, since the data currently shows cumulative cases

# Filter for Canada Data only
# Create variable with TRUE if Country is Canada
Canadian = df['Country/Region'] == "Canada"

# Remove recovered cases (the "Recovery aggregated" rows)
new_cases_only = df['Province/State'] != "Recovery aggregated"

# Create variable for dates 
selected_dates = df['Date'] >= '2020-01-10'

# Select all cases where country is Canada and date is in specified range
canada_data = df[Canadian & selected_dates & new_cases_only]

# Rename columns for Province and Country
canada_data = canada_data.rename(columns= { "Province/State" : "Province" , "Country/Region" : "Country"} , errors="raise")

# Since there are multiple provinces in current df, we will now group them by day
canada_df =  canada_data.groupby(["Date"], as_index=False)["Confirmed"].sum()

# Calculate the difference from previous row in dataset (convert from cumulative figures to new cases only)
daily_covid = canada_df['Confirmed'].diff()
#daily_covid.head()

canada_df.head()
Out[2]:
Date Confirmed
0 2020-01-22 0.0
1 2020-01-23 0.0
2 2020-01-24 0.0
3 2020-01-25 0.0
4 2020-01-26 1.0
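To make the group-then-diff step concrete, here is a minimal sketch on a toy table. The province names and counts are made up for illustration; only the `groupby(...).sum()` and `.diff()` pattern mirrors the cell above.

```python
import pandas as pd

# Hypothetical cumulative counts for two provinces over three days
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01",
             "2020-03-02", "2020-03-02",
             "2020-03-03", "2020-03-03"],
    "Province": ["Ontario", "Quebec"] * 3,
    "Confirmed": [10, 5, 15, 8, 25, 8],
})

# Sum the provinces into one national cumulative total per day
national = toy.groupby("Date", as_index=False)["Confirmed"].sum()

# Difference from the previous day turns cumulative totals into new cases
national["New"] = national["Confirmed"].diff()

print(national)
```

The first row of `New` is `NaN` because there is no previous day to subtract from. Plotly simply leaves a gap for missing values, so this is harmless for the chart.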
In [3]:
# Step 3 Visualize daily new Canadian COVID-19 cases
import plotly.graph_objects as go

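# x-axis: dates, y-axis: new cases per day computed in Step 2
x = canada_df['Date']
y = daily_covid

# A Scatter trace is drawn as a line chart by default for this kind of data
fig = go.Figure(data=go.Scatter(x=x, y=y))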
fig.show()