
Video Game Publishers and Genres with Split Chord

Video Game Titles - Publishers and Genres


Preamble

import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from plotapi import SplitChord, Chord

Introduction

The Dataset

The dataset contains sales figures for video game titles, along with the platform, release year, genre, and publisher of each title.

Let's download the mirrored dataset and have a look for ourselves.

data_url = 'https://datacrayon.com/datasets/vgsales.csv'
data = pd.read_csv(data_url)
data.head()
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37

Next, we capitalise the publisher and genre of each title so that the labels are consistent.

data['Publisher'] = data['Publisher'].str.capitalize()
data['Genre'] = data['Genre'].str.capitalize()
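It may help to see what `str.capitalize` does to a multi-word name, since it explains labels such as "Electronic arts" and "Thq" later on. The following is a minimal sketch using a few toy publisher names, which are assumptions for illustration and are not read from the dataset.

```python
import pandas as pd

# Toy publisher names, assumed for illustration only.
publishers = pd.Series(["Electronic Arts", "THQ", "Take-Two Interactive"])

# str.capitalize upper-cases the first character and lower-cases the rest.
capitalised = publishers.str.capitalize()
print(capitalised.tolist())
# ['Electronic arts', 'Thq', 'Take-two interactive']
```

This is why some labels are shortened or re-cased by hand before plotting.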

It looks good so far, but let's check the shape of the data. Given the columns shown above, we expect 11 variables, with one row per title.

data.shape
(16598, 11)

That's what we were expecting: 16,598 titles with 11 variables each.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two variables we are interested in are stored in the columns Publisher and Genre.

pd.DataFrame(data.columns.values.tolist())
0
0 Rank
1 Name
2 Platform
3 Year
4 Genre
5 Publisher
6 NA_Sales
7 EU_Sales
8 JP_Sales
9 Other_Sales
10 Global_Sales

So let's select just these two columns and work with a list containing only them as we move forward.

species_personality = pd.DataFrame(data[['Publisher', 'Genre']].values).dropna().astype(str)
species_personality
0 1
0 Nintendo Sports
1 Nintendo Platform
2 Nintendo Racing
3 Nintendo Sports
4 Nintendo Role-playing
... ... ...
16593 Kemco Platform
16594 Infogrames Shooter
16595 Activision Racing
16596 7g//ames Puzzle
16597 Wanadoo Platform

16540 rows × 2 columns
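The row count drops from 16598 to 16540 because `dropna` discards any pair with a missing value. A minimal sketch of the same selection on a toy frame (the rows are made up for illustration) shows the effect, and why `dropna` is called before `astype(str)`.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration: the second title has no publisher.
toy = pd.DataFrame({
    "Publisher": ["Nintendo", np.nan, "Sega"],
    "Genre": ["Sports", "Puzzle", "Racing"],
})

# Same pattern as above: select two columns, drop incomplete pairs, cast to str.
pairs = pd.DataFrame(toy[["Publisher", "Genre"]].values).dropna().astype(str)
print(len(toy), len(pairs))  # 3 2

# Casting first would turn the missing value into the string 'nan',
# which dropna would no longer remove.
cast_first = pd.DataFrame(toy[["Publisher", "Genre"]].values).astype(str).dropna()
print(len(cast_first))  # 3
```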

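Now for the names of our segments. On the left we take the 14 publishers with the most titles, and on the right we take every genre.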

#left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
left = list(data.Publisher.value_counts()[:14].index)
pd.DataFrame(left)
0
0 Electronic arts
1 Activision
2 Namco bandai games
3 Ubisoft
4 Konami digital entertainment
5 Thq
6 Nintendo
7 Sony computer entertainment
8 Sega
9 Take-two interactive
10 Capcom
11 Atari
12 Tecmo koei
13 Square enix
right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
pd.DataFrame(right)
0
0 Action
1 Adventure
2 Fighting
3 Misc
4 Platform
5 Puzzle
6 Racing
7 Role-playing
8 Shooter
9 Simulation
10 Sports
11 Strategy
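The two selections above can be sketched on a small toy frame (the rows are assumptions for illustration). `value_counts` sorts by frequency, so taking the first few index entries gives the most common publishers, while `np.unique` returns the genres sorted alphabetically. `head(2)` is used here in place of the slice.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only.
toy = pd.DataFrame({
    "Publisher": ["Nintendo", "Sega", "Nintendo", "Atari", "Sega", "Nintendo"],
    "Genre": ["Sports", "Racing", "Platform", "Puzzle", "Sports", "Racing"],
})

# The two publishers with the most titles, most frequent first.
top_publishers = list(toy["Publisher"].value_counts().head(2).index)

# Every genre, de-duplicated and sorted.
genres = np.unique(toy["Genre"]).tolist()

print(top_publishers)  # ['Nintendo', 'Sega']
print(genres)          # ['Platform', 'Puzzle', 'Racing', 'Sports']
```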

We can now use these names to create an empty matrix.

features = left + right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every publisher and genre pairing in its original and reversed form. Then we count each pairing that involves one of our selected publishers.

species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
for x in species_personality:
    if(x[0] in left or x[1] in left):
        d.at[x[0], x[1]] += 1
d
Electronic arts Activision Namco bandai games Ubisoft Konami digital entertainment Thq Nintendo Sony computer entertainment Sega Take-two interactive ... Fighting Misc Platform Puzzle Racing Role-playing Shooter Simulation Sports Strategy
Electronic arts 0 0 0 0 0 0 0 0 0 0 ... 39 46 16 7 159 35 139 116 561 37
Activision 0 0 0 0 0 0 0 0 0 0 ... 7 103 60 7 74 41 159 23 144 22
Namco bandai games 0 0 0 0 0 0 0 0 0 0 ... 134 97 19 20 27 151 37 29 51 61
Ubisoft 0 0 0 0 0 0 0 0 0 0 ... 19 151 70 24 52 41 92 119 72 29
Konami digital entertainment 0 0 0 0 0 0 0 0 0 0 ... 20 77 40 10 13 37 40 86 280 28
Thq 0 0 0 0 0 0 0 0 0 0 ... 71 66 85 17 101 8 36 27 31 32
Nintendo 0 0 0 0 0 0 0 0 0 0 ... 18 100 112 74 37 106 26 29 55 32
Sony computer entertainment 0 0 0 0 0 0 0 0 0 0 ... 30 128 66 12 65 49 51 15 124 12
Sega 0 0 0 0 0 0 0 0 0 0 ... 37 62 52 22 48 64 40 12 135 35
Take-two interactive 0 0 0 0 0 0 0 0 0 0 ... 1 27 11 1 20 6 65 4 151 22
Capcom 0 0 0 0 0 0 0 0 0 0 ... 58 11 46 6 13 38 25 2 3 3
Atari 0 0 0 0 0 0 0 0 0 0 ... 37 26 21 22 36 28 40 9 56 17
Tecmo koei 0 0 0 0 0 0 0 0 0 0 ... 12 14 1 0 5 47 3 13 39 50
Square enix 0 0 0 0 0 0 0 0 0 0 ... 3 6 0 4 0 129 16 4 0 9
Action 183 310 248 193 148 194 79 90 101 93 ... 0 0 0 0 0 0 0 0 0 0
Adventure 13 25 58 59 53 47 35 41 31 12 ... 0 0 0 0 0 0 0 0 0 0
Fighting 39 7 134 19 20 71 18 30 37 1 ... 0 0 0 0 0 0 0 0 0 0
Misc 46 103 97 151 77 66 100 128 62 27 ... 0 0 0 0 0 0 0 0 0 0
Platform 16 60 19 70 40 85 112 66 52 11 ... 0 0 0 0 0 0 0 0 0 0
Puzzle 7 7 20 24 10 17 74 12 22 1 ... 0 0 0 0 0 0 0 0 0 0
Racing 159 74 27 52 13 101 37 65 48 20 ... 0 0 0 0 0 0 0 0 0 0
Role-playing 35 41 151 41 37 8 106 49 64 6 ... 0 0 0 0 0 0 0 0 0 0
Shooter 139 159 37 92 40 36 26 51 40 65 ... 0 0 0 0 0 0 0 0 0 0
Simulation 116 23 29 119 86 27 29 15 12 4 ... 0 0 0 0 0 0 0 0 0 0
Sports 561 144 51 72 280 31 55 124 135 151 ... 0 0 0 0 0 0 0 0 0 0
Strategy 37 22 61 29 28 32 32 12 35 22 ... 0 0 0 0 0 0 0 0 0 0

26 rows × 26 columns
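The loop above fills two mirrored blocks of the matrix: publishers against genres, and genres against publishers. As an alternative sketch (not the approach used in this notebook), the same structure can be built with `pd.crosstab`. The toy pairs and the short `left` and `right` lists below are assumptions for illustration.

```python
import pandas as pd

# Toy pairs, assumed for illustration only.
pairs = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Sega", "Sega", "Nintendo"],
    "Genre": ["Sports", "Racing", "Sports", "Sports", "Sports"],
})

left = ["Nintendo", "Sega"]   # publishers
right = ["Racing", "Sports"]  # genres
features = left + right

# Count titles per (publisher, genre), in the order of left and right.
counts = pd.crosstab(pairs["Publisher"], pairs["Genre"])
counts = counts.reindex(index=left, columns=right, fill_value=0)

# Place the counts in the off-diagonal blocks so the matrix is symmetric.
d = pd.DataFrame(0, index=features, columns=features)
d.loc[left, right] = counts.values
d.loc[right, left] = counts.values.T
print(d)
```

The publisher-to-publisher and genre-to-genre blocks stay at zero, which matches the bipartite layout of the matrix shown above.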

Chord Diagram

Time to visualise the co-occurrence of publishers and genres using a chord diagram. First we shorten some of the longer names so the labels fit, then we define a list of custom colours for the segments.

left[5] = 'THQ'
left[0] = 'EA'
left[2] = 'Namco'
left[4] = 'Konami'
left[7] = 'Sony'
left[9] = 'Take-Two'
left[-1] = "Square"
left[-2]= 'Tecmo'
left
['EA',
 'Activision',
 'Namco',
 'Ubisoft',
 'Konami',
 'THQ',
 'Nintendo',
 'Sony',
 'Sega',
 'Take-Two',
 'Capcom',
 'Atari',
 'Tecmo',
 'Square']
right[7] = 'RPG'
right
['Action',
 'Adventure',
 'Fighting',
 'Misc',
 'Platform',
 'Puzzle',
 'Racing',
 'RPG',
 'Shooter',
 'Simulation',
 'Sports',
 'Strategy']
colors1 = ["#ff4400", "#ffcc00", "#5c6633", "#00e63d", "#00d6e6", "#566d73",
           "#3d85f2", "#00fff2", "#0000e6", "#290066", "#ff80e5", "#731d28"]

colors2 = ["#312f85", "#f4e301", "#f75802", "#3e4682", "#ad0332", "#666769", "#e80113",
           "#f78700", "#0100f4", "#1272c3", "#f7cd01", "#dd1a22", "#00407b", "#f70000"]

colors1.reverse()
colors2.reverse()
colors = colors1 + colors2
names = left + right

Finally, we can put it all together.

Chord(d.values.tolist(), names, credit=True, colors=colors, wrap_labels=False,
      font_size_large=7, noun="titles", details_separator="",
      bipartite=True, bipartite_idx=len(left), bipartite_size=.2, width=850).show()
PlotAPI - Chord Diagram
df_links = pd.DataFrame(data[['Publisher', 'Genre']].values, columns=['Publisher', 'Genre']).dropna().astype(str)

# count the titles for each publisher and genre pairing
df_links = df_links.value_counts().reset_index()

# shorten the labels to match the names in left and right
df_links = df_links.replace({
    "Electronic arts": "EA",
    "Thq": "THQ",
    "Namco bandai games": "Namco",
    "Konami digital entertainment": "Konami",
    "Sony computer entertainment": "Sony",
    "Take-two interactive": "Take-Two",
    "Tecmo koei": "Tecmo",
    "Square enix": "Square",
    "Role-playing": "RPG",
})

# keep only the publishers selected in left
df_links = df_links[df_links['Publisher'].isin(left)]

df_links
Publisher Genre 0
0 EA Sports 561
1 Activision Action 310
2 Konami Sports 280
3 Namco Action 248
4 THQ Action 194
... ... ... ...
984 Capcom Simulation 2
1114 Take-Two Puzzle 1
1118 Take-Two Fighting 1
1133 Square Racing 1
1340 Tecmo Platform 1

166 rows × 3 columns
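The count column above is labelled 0, but the label produced by `value_counts().reset_index()` can differ between pandas versions. A minimal sketch that names the column explicitly avoids relying on that label. The toy pairs and the column name `value` are assumptions for illustration.

```python
import pandas as pd

# Toy pairs, assumed for illustration only.
pairs = pd.DataFrame({
    "Publisher": ["EA", "EA", "Sega", "EA"],
    "Genre": ["Sports", "Sports", "Racing", "Racing"],
})

# Count each (Publisher, Genre) pairing and give the count column a fixed name.
link_counts = pairs.value_counts().reset_index(name="value")

# Build one link per pairing; int() keeps the values plain Python integers.
links = [
    {"source": row.Publisher, "target": row.Genre, "value": int(row.value)}
    for row in link_counts.itertuples(index=False)
]
print(links[0])  # {'source': 'EA', 'target': 'Sports', 'value': 2}
```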


# publishers go in the left group, genres in the right group
nodes = [{"name": node, "group": "left"} for node in left]
nodes += [{"name": node, "group": "right"} for node in right]

nodes
[{'name': 'EA', 'group': 'left'},
 {'name': 'Activision', 'group': 'left'},
 {'name': 'Namco', 'group': 'left'},
 {'name': 'Ubisoft', 'group': 'left'},
 {'name': 'Konami', 'group': 'left'},
 {'name': 'THQ', 'group': 'left'},
 {'name': 'Nintendo', 'group': 'left'},
 {'name': 'Sony', 'group': 'left'},
 {'name': 'Sega', 'group': 'left'},
 {'name': 'Take-Two', 'group': 'left'},
 {'name': 'Capcom', 'group': 'left'},
 {'name': 'Atari', 'group': 'left'},
 {'name': 'Tecmo', 'group': 'left'},
 {'name': 'Square', 'group': 'left'},
 {'name': 'Action', 'group': 'right'},
 {'name': 'Adventure', 'group': 'right'},
 {'name': 'Fighting', 'group': 'right'},
 {'name': 'Misc', 'group': 'right'},
 {'name': 'Platform', 'group': 'right'},
 {'name': 'Puzzle', 'group': 'right'},
 {'name': 'Racing', 'group': 'right'},
 {'name': 'RPG', 'group': 'right'},
 {'name': 'Shooter', 'group': 'right'},
 {'name': 'Simulation', 'group': 'right'},
 {'name': 'Sports', 'group': 'right'},
 {'name': 'Strategy', 'group': 'right'}]
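links = []

# the count is the third column; its label (0 or 'count') may depend on the pandas version
for index, link in df_links.iterrows():
    links.append({"source": link['Publisher'],
                  "target": link['Genre'],
                  "value": int(link.iloc[2])})

links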

SplitChord(links, nodes, colors=colors, margin=140).show()
PlotAPI - Chord Diagram
import json

data = {"links": links,
        "nodes": nodes,
        "colors": colors}

with open("video_games.json", "w") as fp:
    json.dump(data, fp)
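Before reusing the exported file elsewhere, it can be worth reading it back and checking that every link refers to a known node. This is a minimal sketch using a small subset of the nodes, links, and colours shown above, written to a temporary directory (an assumption made so the example does not overwrite the real file).

```python
import json
import os
import tempfile

# A small subset of the structures built above.
nodes = [{"name": "EA", "group": "left"},
         {"name": "Activision", "group": "left"},
         {"name": "Action", "group": "right"},
         {"name": "Sports", "group": "right"}]
links = [{"source": "EA", "target": "Sports", "value": 561},
         {"source": "Activision", "target": "Action", "value": 310}]
colors = ["#312f85", "#f4e301", "#ff4400", "#ffcc00"]

# Write to a temporary location rather than the working directory.
path = os.path.join(tempfile.mkdtemp(), "video_games.json")
with open(path, "w") as fp:
    json.dump({"links": links, "nodes": nodes, "colors": colors}, fp)

with open(path) as fp:
    loaded = json.load(fp)

# Every link should point at names that exist in the node list.
node_names = {node["name"] for node in loaded["nodes"]}
dangling = [link for link in loaded["links"]
            if link["source"] not in node_names or link["target"] not in node_names]
print(len(loaded["links"]), len(dangling))  # 2 0
```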
    
    
    
