Top Olympic Medal Earning Countries

Preamble

import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from plotapi import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon type. In this section, we're going to use a dataset of Olympic athlete events to visualise the relationship between the top medal-earning countries and the medals (Gold, Silver, and Bronze) they were awarded.

The Dataset

Each row of the dataset describes one athlete's entry in one Olympic event. There are 15 variables per row, including the athlete's team, their National Olympic Committee (NOC) code, and the medal awarded, if any.

Let's download the mirrored dataset and have a look for ourselves.

data_url = 'https://datacrayon.com/datasets/athlete_events.csv'
raw_data = pd.read_csv(data_url)
raw_data.head()
   ID  Name                      Sex  Age   Height  Weight  Team            NOC  Games        Year  Season  City       Sport          Event                             Medal
0  1   A Dijiang                 M    24.0  180.0   80.0    China           CHN  1992 Summer  1992  Summer  Barcelona  Basketball     Basketball Men's Basketball       NaN
1  2   A Lamusi                  M    23.0  170.0   60.0    China           CHN  2012 Summer  2012  Summer  London     Judo           Judo Men's Extra-Lightweight      NaN
2  3   Gunnar Nielsen Aaby       M    24.0  NaN     NaN     Denmark         DEN  1920 Summer  1920  Summer  Antwerpen  Football       Football Men's Football           NaN
3  4   Edgar Lindenau Aabye      M    34.0  NaN     NaN     Denmark/Sweden  DEN  1900 Summer  1900  Summer  Paris      Tug-Of-War     Tug-Of-War Men's Tug-Of-War       Gold
4  5   Christine Jacoba Aaftink  F    21.0  185.0   82.0    Netherlands     NED  1988 Winter  1988  Winter  Calgary    Speed Skating  Speed Skating Women's 500 metres  NaN
data = raw_data[raw_data.Medal.notna()]
data.head()
    ID  Name                      Sex  Age   Height  Weight  Team            NOC  Games        Year  Season  City       Sport       Event                                   Medal
3   4   Edgar Lindenau Aabye      M    34.0  NaN     NaN     Denmark/Sweden  DEN  1900 Summer  1900  Summer  Paris      Tug-Of-War  Tug-Of-War Men's Tug-Of-War             Gold
37  15  Arvo Ossian Aaltonen      M    30.0  NaN     NaN     Finland         FIN  1920 Summer  1920  Summer  Antwerpen  Swimming    Swimming Men's 200 metres Breaststroke  Bronze
38  15  Arvo Ossian Aaltonen      M    30.0  NaN     NaN     Finland         FIN  1920 Summer  1920  Summer  Antwerpen  Swimming    Swimming Men's 400 metres Breaststroke  Bronze
40  16  Juhamatti Tapio Aaltonen  M    28.0  184.0   85.0    Finland         FIN  2014 Winter  2014  Winter  Sochi      Ice Hockey  Ice Hockey Men's Ice Hockey             Bronze
41  17  Paavo Johannes Aaltonen   M    28.0  175.0   64.0    Finland         FIN  1948 Summer  1948  Summer  London     Gymnastics  Gymnastics Men's Individual All-Around  Bronze

Here we have kept only the entries where a medal was awarded, dropping every row whose Medal value is NaN.
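
As a small illustration of this filtering step, the sketch below applies the same notna() mask to a hypothetical three-row frame (the rows are made up for the example, not taken from the dataset). It also shows why the filtered output above keeps index labels such as 3, 37, and 38: the mask drops rows but leaves the surviving labels untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; NaN in Medal means no medal was awarded.
toy = pd.DataFrame({
    "NOC": ["CHN", "DEN", "FIN"],
    "Medal": [np.nan, "Gold", "Bronze"],
})

# Keep only rows with a medal; the original index labels are preserved.
medal_rows = toy[toy["Medal"].notna()]
print(medal_rows)
```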

It looks good so far, but let's check how many medal-winning entries and variables we are left with.

data.shape
(39783, 15)
# keep only the 20 NOCs with the most medal-winning entries
top_nocs = data['NOC'].value_counts().index[:20]
data = data[data['NOC'].isin(top_nocs)]

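That leaves 39783 medal-winning entries with 15 variables each. From these, we keep only the entries belonging to the 20 NOCs with the most medals.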
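
The top-N selection can be tried on a small scale first. This is a minimal sketch on hypothetical data (the seven entries and the choice of top_n = 2 are assumptions for illustration). It relies on value_counts() returning counts in descending order, so the first N index labels are the N most frequent NOCs.

```python
import pandas as pd

# Hypothetical medal entries for four NOCs.
toy = pd.DataFrame({"NOC": ["USA", "USA", "USA", "GER", "GER", "FIN", "KOR"]})

# value_counts() sorts by frequency (descending), so slicing the index
# gives the most frequent NOC codes.
top_n = 2
top_nocs = toy["NOC"].value_counts().index[:top_n]

# isin() builds a boolean mask that keeps rows for those NOCs only.
filtered = toy[toy["NOC"].isin(top_nocs)]
print(sorted(filtered["NOC"].unique()))
```

Note that if two NOCs tie at the cut-off, which one is kept depends on the order value_counts() returns them in.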

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the country codes and medal types are held in the columns NOC and Medal.

So let's select just these two columns and work with a list containing only them as we move forward.

species_personality = pd.DataFrame(data[['NOC', 'Medal']].values).dropna().astype(str)
species_personality
0 1
0 FIN Bronze
1 FIN Bronze
2 FIN Bronze
3 FIN Bronze
4 FIN Gold
... ... ...
30152 URS Gold
30153 URS Silver
30154 URS Bronze
30155 RUS Bronze
30156 RUS Silver

30157 rows × 2 columns

species_personality = species_personality.dropna()

Now for the names of our medals and countries.

# medal types, listed from Gold down to Bronze
left = ["Gold", "Silver", "Bronze"]

pd.DataFrame(left)
0
0 Gold
1 Silver
2 Bronze
# NOC codes, ordered from most to fewest medal-winning entries
right = list(data['NOC'].value_counts().index)
pd.DataFrame(right)
0
0 USA
1 URS
2 GER
3 GBR
4 FRA
5 ITA
6 SWE
7 CAN
8 AUS
9 RUS
10 HUN
11 NED
12 NOR
13 GDR
14 CHN
15 JPN
16 FIN
17 SUI
18 ROU
19 KOR

Which we can now use to create the matrix.

# one row and one column per medal type and per NOC, initialised to zero
features = left + right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. For every (NOC, Medal) pair in our list, we increment the cell for that pairing in both its original and reversed form, which keeps the matrix symmetric.

species_personality.values
array([['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ...,
       ['URS', 'Bronze'],
       ['RUS', 'Bronze'],
       ['RUS', 'Silver']], dtype=object)
for x in species_personality.values:
    d.at[x[0], x[1]] += 1
    d.at[x[1], x[0]] += 1
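
The loop above visits every row one at a time. As an alternative sketch (not the approach used in this section), pd.crosstab can count all pairs in one call, and the resulting block can then be mirrored into a symmetric matrix. The five pairs and the two-country list below are hypothetical stand-ins for the real data. Both constructions are built here so they can be compared.

```python
import pandas as pd

# Hypothetical (NOC, Medal) pairs standing in for the two-column frame above.
pairs = pd.DataFrame([
    ["FIN", "Bronze"], ["FIN", "Gold"], ["USA", "Gold"],
    ["USA", "Gold"], ["USA", "Silver"],
])
left = ["Gold", "Silver", "Bronze"]
right = ["USA", "FIN"]
features = left + right

# Loop-based construction, mirroring the approach in the text.
d_loop = pd.DataFrame(0, index=features, columns=features)
for noc, medal in pairs.values:
    d_loop.at[noc, medal] += 1
    d_loop.at[medal, noc] += 1

# Vectorised alternative: count each pairing once (medals as rows, NOCs as
# columns), then write the block and its transpose into the full matrix.
block = pd.crosstab(pairs[1], pairs[0]).reindex(
    index=left, columns=right, fill_value=0
)
d_fast = pd.DataFrame(0, index=features, columns=features)
d_fast.loc[left, right] = block.values
d_fast.loc[right, left] = block.values.T

print(d_fast)
```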

Chord Diagram

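Time to visualise the co-occurrence of medals and countries using a chord diagram. We are going to use a list of custom colours: the first three represent the Gold, Silver, and Bronze medals, and the remaining twenty are assigned to the countries.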

colors = [
    # medals: Gold, Silver, Bronze
    "#FFD700", "#C0C0C0", "#A57164",
    # one colour per NOC
    '#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231',
    '#911eb4', '#46f0f0', '#f032e6', '#bcf60c', '#fabebe',
    '#008080', '#e6beff', '#9a6324', '#fffac8', '#800000',
    '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080',
]
names = left + right
len(names)
23
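
Before plotting, it can be worth checking that the inputs agree with each other. The helper below is a hypothetical sketch, not part of PlotAPI. It assumes one colour per name is intended (as with the 23 colours and 23 names here) and that the matrix should be square and symmetric, which is what the construction above produces.

```python
def check_chord_inputs(matrix, names, colors):
    """Hypothetical helper: raise ValueError if the inputs are inconsistent."""
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square with one row/column per name")
    if len(colors) != n:
        raise ValueError("expected one colour per name")
    if any(matrix[i][j] != matrix[j][i] for i in range(n) for j in range(i)):
        raise ValueError("matrix must be symmetric")
    return True


# Toy inputs: one medal type and two countries.
toy_names = ["Gold", "USA", "FIN"]
toy_colors = ["#FFD700", "#e6194b", "#3cb44b"]
toy_matrix = [[0, 2, 1],
              [2, 0, 0],
              [1, 0, 0]]
print(check_chord_inputs(toy_matrix, toy_names, toy_colors))
```

With the real data, the equivalent call would take d.values.tolist(), names, and colors.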

Finally, we can put it all together.

# first variant: reverse_gradients=True
Chord(d.values.tolist(), names, credit=True, colors=colors, curved_labels=False,
      margin=40, font_size_large=7, noun="medals", conjunction="awarded", verb="",
      details_separator="", bipartite=True, bipartite_idx=len(left),
      bipartite_size=.2, reverse_gradients=True).show()

# second variant: reverse_gradients=False
Chord(d.values.tolist(), names, credit=True, colors=colors, curved_labels=False,
      margin=40, font_size_large=7, noun="medals", conjunction="awarded", verb="",
      details_separator="", bipartite=True, bipartite_idx=len(left),
      bipartite_size=.2, reverse_gradients=False).show()
import json

data = {"matrix": d.values.tolist(),
        "names": names,
        "colors": colors,
        "bipartite_idx": len(left)}

with open("olympic_medals.json", "w") as fp:
    json.dump(data, fp)
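
To confirm that an export like this survives a round trip, the file can be read back and compared with what was written. The sketch below uses a small hypothetical payload with the same keys and writes to a temporary directory, so it does not touch the real olympic_medals.json.

```python
import json
import os
import tempfile

# Hypothetical payload with the same keys as the export above.
payload = {
    "matrix": [[0, 2], [2, 0]],
    "names": ["Gold", "USA"],
    "colors": ["#FFD700", "#e6194b"],
    "bipartite_idx": 1,
}

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "olympic_medals.json")
    with open(path, "w") as fp:
        json.dump(payload, fp)
    with open(path) as fp:
        restored = json.load(fp)

print(restored == payload)
```

The call to d.values.tolist() in the export matters here: it converts the NumPy integers in the matrix into plain Python integers, which the json module can serialise.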

Chord(
    d.values.tolist(),
    names,
    colors=colors,
    curved_labels=False,
    margin=40,
    font_size_large=7,
    noun="medals",
    conjunction="awarded",
    verb="",
    details_separator="",
    bipartite=True,
    bipartite_idx=len(left),
    bipartite_size=0.2,
    reverse_gradients=False,
).show()