Top Olympic Medal Earning Countries

Preamble

import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from plotapi import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of Olympic athletes and events to visualise the relationship between the top medal-earning countries and the medals (Gold, Silver, and Bronze) they were awarded.

The Dataset

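The dataset contains one row per athlete per event, with 15 variables: details of the athlete (ID, Name, Sex, Age, Height, Weight), their Team and National Olympic Committee code (NOC), the Games (Games, Year, Season, City), the Sport and Event, and the Medal won, if any.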

Let's download the mirrored dataset and have a look for ourselves.

data_url = 'https://datacrayon.com/datasets/athlete_events.csv'
raw_data = pd.read_csv(data_url)
raw_data.head()

	ID	Name	Sex	Age	Height	Weight	Team	NOC	Games	Year	Season	City	Sport	Event	Medal
0	1	A Dijiang	M	24.0	180.0	80.0	China	CHN	1992 Summer	1992	Summer	Barcelona	Basketball	Basketball Men's Basketball	NaN
1	2	A Lamusi	M	23.0	170.0	60.0	China	CHN	2012 Summer	2012	Summer	London	Judo	Judo Men's Extra-Lightweight	NaN
2	3	Gunnar Nielsen Aaby	M	24.0	NaN	NaN	Denmark	DEN	1920 Summer	1920	Summer	Antwerpen	Football	Football Men's Football	NaN
3	4	Edgar Lindenau Aabye	M	34.0	NaN	NaN	Denmark/Sweden	DEN	1900 Summer	1900	Summer	Paris	Tug-Of-War	Tug-Of-War Men's Tug-Of-War	Gold
4	5	Christine Jacoba Aaftink	F	21.0	185.0	82.0	Netherlands	NED	1988 Winter	1988	Winter	Calgary	Speed Skating	Speed Skating Women's 500 metres	NaN

data = raw_data[raw_data.Medal.notna()]
data.head()

	ID	Name	Sex	Age	Height	Weight	Team	NOC	Games	Year	Season	City	Sport	Event	Medal
3	4	Edgar Lindenau Aabye	M	34.0	NaN	NaN	Denmark/Sweden	DEN	1900 Summer	1900	Summer	Paris	Tug-Of-War	Tug-Of-War Men's Tug-Of-War	Gold
37	15	Arvo Ossian Aaltonen	M	30.0	NaN	NaN	Finland	FIN	1920 Summer	1920	Summer	Antwerpen	Swimming	Swimming Men's 200 metres Breaststroke	Bronze
38	15	Arvo Ossian Aaltonen	M	30.0	NaN	NaN	Finland	FIN	1920 Summer	1920	Summer	Antwerpen	Swimming	Swimming Men's 400 metres Breaststroke	Bronze
40	16	Juhamatti Tapio Aaltonen	M	28.0	184.0	85.0	Finland	FIN	2014 Winter	2014	Winter	Sochi	Ice Hockey	Ice Hockey Men's Ice Hockey	Bronze
41	17	Paavo Johannes Aaltonen	M	28.0	175.0	64.0	Finland	FIN	1948 Summer	1948	Summer	London	Gymnastics	Gymnastics Men's Individual All-Around	Bronze

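Most rows describe athletes who didn't win a medal, so the filter above keeps only the rows where the Medal column has a value. Note that the original row index is preserved.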
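To see what the `notna()` filter does in isolation, here is a minimal sketch on a small hand-made frame. The athletes and rows are hypothetical, not taken from the dataset: rows whose Medal is missing are dropped, and the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

# A tiny, made-up stand-in for the athlete_events table
toy = pd.DataFrame({
    "Name": ["Athlete A", "Athlete B", "Athlete C", "Athlete D"],
    "NOC": ["FIN", "CHN", "FIN", "USA"],
    "Medal": [np.nan, "Gold", "Bronze", np.nan],
})

# Keep only the rows where a medal was awarded
medals_only = toy[toy.Medal.notna()]
print(medals_only)
print(medals_only.shape)  # (2, 3)
```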

It looks good so far, but let's check how many medal rows and variables we're left with.

data.shape

(39783, 15)

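That's 39,783 medal rows across the 15 variables. Each athlete in a team event has their own row, so a team medal is counted once per team member. To keep the chord diagram readable, we'll keep only the 20 National Olympic Committees (NOCs) with the most medal rows.

top_nocs = data['NOC'].value_counts().index[:20]   # the 20 most frequent NOCs
data = data[data['NOC'].isin(top_nocs)]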
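`value_counts()` returns counts sorted in descending order, so slicing its index gives the most frequent NOCs, and `isin` then keeps only their rows. The sketch below shows this on hypothetical rows, with `top_n = 2` standing in for the 20 used above. If two NOCs tie at the cut-off, which one makes it in depends on how `value_counts` orders ties, so it may be worth checking the boundary.

```python
import pandas as pd

# Hypothetical medal rows: USA has 3, GBR has 2, FIN and KOR have 1 each
toy = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "GBR", "GBR", "FIN", "KOR"],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Gold", "Bronze", "Silver"],
})

top_n = 2                            # the notebook uses 20
counts = toy["NOC"].value_counts()   # medal rows per NOC, most frequent first
top_nocs = counts.index[:top_n]      # the two most frequent NOCs: USA and GBR
filtered = toy[toy["NOC"].isin(top_nocs)]

print(counts)
print(filtered)
```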

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. The two variables we're interested in are the country, given by the NOC column, and the medal, given by the Medal column.

So let's select just these two columns and work with them as we move forward.

species_personality = pd.DataFrame(data[['NOC', 'Medal']].values).dropna().astype(str)
species_personality

	0	1
0	FIN	Bronze
1	FIN	Bronze
2	FIN	Bronze
3	FIN	Bronze
4	FIN	Gold
...	...	...
30152	URS	Gold
30153	URS	Silver
30154	URS	Bronze
30155	RUS	Bronze
30156	RUS	Silver

30157 rows × 2 columns
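The selection above calls `dropna()` before `astype(str)`, and the order matters. In pandas 2.x, `astype(str)` turns a missing value into the literal string `'nan'`, which `dropna()` no longer treats as missing. This minimal sketch uses two hypothetical pairs to compare both orders. Newer pandas versions with a dedicated string dtype may keep the value missing instead, so only the drop-first result is relied on here.

```python
import numpy as np
import pandas as pd

# Hypothetical (country, medal) pairs, one with a missing medal
pairs = pd.DataFrame([["FIN", "Bronze"], ["USA", np.nan]])

# Converting first: in pandas 2.x the NaN becomes the string 'nan',
# so dropna() keeps the incomplete row (behaviour may differ in newer pandas)
convert_then_drop = pairs.astype(str).dropna()

# Dropping first removes the incomplete row before conversion
drop_then_convert = pairs.dropna().astype(str)

print(len(convert_then_drop))
print(len(drop_then_convert))  # 1
```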

species_personality = species_personality.dropna()

Now for the names of our segments, starting with the medals on one side of the diagram.

# Medal segments, ordered from most to least prestigious
left = ["Gold", "Silver", "Bronze"]

pd.DataFrame(left)

	0
0	Gold
1	Silver
2	Bronze

Then the countries for the other side.

# Country segments, ordered by number of medal rows (descending)
right = list(data['NOC'].value_counts().index)
pd.DataFrame(right)

	0
0	USA
1	URS
2	GER
3	GBR
4	FRA
5	ITA
6	SWE
7	CAN
8	AUS
9	RUS
10	HUN
11	NED
12	NOR
13	GDR
14	CHN
15	JPN
16	FIN
17	SUI
18	ROU
19	KOR

We can now use these to create an empty square matrix with one row and one column for every medal and country.

features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can fill the co-occurrence matrix by iterating over every (country, medal) pair and incrementing the matrix for that pairing in both its original and reversed form, which keeps the matrix symmetric.

species_personality.values

array([['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ...,
       ['URS', 'Bronze'],
       ['RUS', 'Bronze'],
       ['RUS', 'Silver']], dtype=object)

for x in species_personality.values:
    d.at[x[0], x[1]] += 1
    d.at[x[1], x[0]] += 1
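The loop above updates one cell at a time with `.at`, which is fine for roughly 30,000 rows. A vectorised alternative, swapped in here and not the approach the notebook uses, is to count each (country, medal) pair once with `pd.crosstab` and mirror the counts into the two off-diagonal blocks. The sketch uses a handful of hypothetical pairs and checks that both approaches give the same matrix. `d_loop` and `d_vec` are illustrative names.

```python
import pandas as pd

# Hypothetical (country, medal) pairs in the same shape as species_personality
pairs = pd.DataFrame([["FIN", "Bronze"], ["FIN", "Bronze"], ["USA", "Gold"],
                      ["USA", "Silver"], ["FIN", "Gold"]])
left = ["Gold", "Silver", "Bronze"]
right = ["USA", "FIN"]
features = left + right

# Loop version, mirroring the notebook
d_loop = pd.DataFrame(0, index=features, columns=features)
for country, medal in pairs.values:
    d_loop.at[country, medal] += 1
    d_loop.at[medal, country] += 1

# Vectorised version: count each (country, medal) pair once...
counts = pd.crosstab(pairs[0], pairs[1])
d_vec = pd.DataFrame(0, index=features, columns=features)
# ...write the counts into the country-row / medal-column block...
d_vec.loc[right, left] = counts.reindex(index=right, columns=left, fill_value=0).values
# ...and mirror them into the medal-row / country-column block
d_vec.loc[left, right] = d_vec.loc[right, left].T.values

print(d_vec)
print(d_loop.equals(d_vec))  # True
```

Medal-medal and country-country cells stay zero in both versions, which is what a bipartite chord diagram expects.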

Chord Diagram

Time to visualise the co-occurrence of countries and medals using a chord diagram. We are going to use a list of custom colours: gold, silver, and bronze for the three medals, followed by a distinct colour for each of the 20 countries.

colors = ["#FFD700", "#C0C0C0", "#A57164",   # gold, silver, bronze
          '#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231',
          '#911eb4', '#46f0f0', '#f032e6', '#bcf60c', '#fabebe',
          '#008080', '#e6beff', '#9a6324', '#fffac8', '#800000',
          '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080']

names = left + right
len(names)
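Here, `names` has 3 medals plus 20 countries, so `len(names)` should be 23, matching the 23 colours above. The following is a quick sanity check, assuming `Chord` pairs colours with names by position. It uses a truncated, hypothetical list of countries for brevity.

```python
# Hypothetical, truncated inputs in the same layout as the notebook
left = ["Gold", "Silver", "Bronze"]
right = ["USA", "URS", "GER"]
names = left + right
colors = ["#FFD700", "#C0C0C0", "#A57164", "#e6194b", "#3cb44b", "#ffe119"]

# One colour per segment, assumed to be matched by position
assert len(colors) == len(names), "expected exactly one colour per segment"
for name, colour in zip(names, colors):
    print(f"{name:>6} -> {colour}")
```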

Finally, we can put it all together.

Chord(d.values.tolist(), names, credit=True, colors=colors, curved_labels=False,
      margin=40, font_size_large=7, noun="medals", conjunction="awarded", verb="",
      details_separator="", bipartite=True, bipartite_idx=len(left), bipartite_size=.2,
      reverse_gradients=True).show()

Chord(d.values.tolist(), names, credit=True, colors=colors, curved_labels=False,
      margin=40, font_size_large=7, noun="medals", conjunction="awarded", verb="",
      details_separator="", bipartite=True, bipartite_idx=len(left), bipartite_size=.2,
      reverse_gradients=False).show()

PlotAPI - Chord Diagram

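import json

# Export the matrix, labels, and colours so the diagram can be recreated elsewhere.
# A new name is used so the medal DataFrame in `data` isn't overwritten.
chord_data = {"matrix": d.values.tolist(),
              "names": names,
              "colors": colors,
              "bipartite_idx": len(left)}

with open("olympic_medals.json", "w") as fp:
    json.dump(chord_data, fp)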
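Before reusing the exported file, it can be worth reading it back and checking that the matrix is square and symmetric, since every pair was counted in both directions. This is a minimal sketch on a hypothetical two-medal, one-country export written to a temporary directory rather than the real `olympic_medals.json`.

```python
import json
import os
import tempfile

# Hypothetical export in the same layout as the notebook's
chord_data = {
    "matrix": [[0, 0, 3], [0, 0, 1], [3, 1, 0]],
    "names": ["Gold", "Silver", "USA"],
    "colors": ["#FFD700", "#C0C0C0", "#e6194b"],
    "bipartite_idx": 2,
}

path = os.path.join(tempfile.mkdtemp(), "olympic_medals.json")
with open(path, "w") as fp:
    json.dump(chord_data, fp)

with open(path) as fp:
    loaded = json.load(fp)

matrix = loaded["matrix"]
n = len(loaded["names"])
# Square: one row and one column per name
assert len(matrix) == n and all(len(row) == n for row in matrix)
# Symmetric: each (country, medal) count appears in both orientations
assert all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))
print(loaded == chord_data)  # True
```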
    
    
    

Chord(
    d.values.tolist(),
    names,
    colors=colors,
    curved_labels=False,
    margin=40,
    font_size_large=7,
    noun="medals",
    conjunction="awarded",
    verb="",
    details_separator="",
    bipartite=True,
    bipartite_idx=len(left),
    bipartite_size=0.2,
    reverse_gradients=False,
).show()

PlotAPI - Chord Diagram
