Video Game Publishers and Genres with Split Chord
Video Game Titles - Publishers and Genres
Preamble
import numpy as np # for multi-dimensional containers
import pandas as pd # for DataFrames
import itertools
from plotapi import SplitChord, Chord
Introduction
The Dataset
The dataset describes video game titles and their sales, with columns covering each title's platform, release year, genre, publisher, and regional sales.
Let's download the mirrored dataset and have a look for ourselves.
data_url = 'https://datacrayon.com/datasets/vgsales.csv'
data = pd.read_csv(data_url)
data.head()
 | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
Let's capitalise the publisher and genre of each title so that the labels are consistent.
data['Publisher'] = data['Publisher'].str.capitalize()
data['Genre'] = data['Genre'].str.capitalize()
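To see what this step does to the labels, here is a minimal sketch on a few toy publisher names (the Series below is made up for illustration and is not read from the dataset). It shows that `str.capitalize` upper-cases only the first character and lower-cases everything else, which is why abbreviations such as THQ show up as Thq further down.

```python
import pandas as pd

# Toy publisher names for illustration only (not read from the dataset).
publishers = pd.Series(["Electronic Arts", "THQ", "Take-Two Interactive"])

# str.capitalize upper-cases the first character and lower-cases the rest.
capitalised = publishers.str.capitalize()

print(capitalised.tolist())
# ['Electronic arts', 'Thq', 'Take-two interactive']
```

If the original casing matters, `str.strip()` on its own would tidy the whitespace without touching the letters, but that is a different choice from the one made in this notebook.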
It looks good so far, but let's check how many titles and variables we have.
data.shape
(16598, 11)
That gives us 16598 titles, each described by 11 variables.
Data Wrangling
We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two variables we are interested in are Publisher and Genre.
pd.DataFrame(data.columns.values.tolist())
 | 0 |
---|---|
0 | Rank |
1 | Name |
2 | Platform |
3 | Year |
4 | Genre |
5 | Publisher |
6 | NA_Sales |
7 | EU_Sales |
8 | JP_Sales |
9 | Other_Sales |
10 | Global_Sales |
So let's select just these two columns and work with a list containing only them as we move forward.
species_personality = pd.DataFrame(data[['Publisher', 'Genre']].values).dropna().astype(str)
species_personality
 | 0 | 1 |
---|---|---|
0 | Nintendo | Sports |
1 | Nintendo | Platform |
2 | Nintendo | Racing |
3 | Nintendo | Sports |
4 | Nintendo | Role-playing |
... | ... | ... |
16593 | Kemco | Platform |
16594 | Infogrames | Shooter |
16595 | Activision | Racing |
16596 | 7g//ames | Puzzle |
16597 | Wanadoo | Platform |
16540 rows × 2 columns
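The row count is lower than the 16598 reported by `data.shape` because `dropna` removes every row with a missing publisher or genre. A minimal sketch of how we might check this, using a made-up toy frame rather than the real data:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only: one row has no publisher.
toy = pd.DataFrame({
    "Publisher": ["Nintendo", np.nan, "Sega", "Atari"],
    "Genre": ["Sports", "Puzzle", "Racing", "Shooter"],
})

# Count the missing values in each column before dropping anything.
missing_per_column = toy.isna().sum()

# Drop incomplete rows first, then convert what is left to strings.
pairs = toy.dropna().astype(str)

print(missing_per_column.to_dict())
print(len(toy), "rows before,", len(pairs), "rows after")
```

Running the same `isna().sum()` on the real `data[['Publisher', 'Genre']]` would show which of the two columns accounts for the dropped rows.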
Now for the names of our publishers and genres. We'll keep only the 14 publishers with the most titles, along with every genre.
#left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
left = list(data.Publisher.value_counts()[:14].index)
pd.DataFrame(left)
 | 0 |
---|---|
0 | Electronic arts |
1 | Activision |
2 | Namco bandai games |
3 | Ubisoft |
4 | Konami digital entertainment |
5 | Thq |
6 | Nintendo |
7 | Sony computer entertainment |
8 | Sega |
9 | Take-two interactive |
10 | Capcom |
11 | Atari |
12 | Tecmo koei |
13 | Square enix |
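The selection above relies on `value_counts` returning its counts in descending order, so taking the first entries gives the most frequent publishers. A small sketch on made-up values (the names and counts are illustrative only):

```python
import pandas as pd

# Toy publisher column for illustration only.
toy_publishers = pd.Series(["Sega", "Atari", "Sega", "Capcom", "Sega", "Atari"])

# value_counts sorts from the most to the least frequent value by default.
counts = toy_publishers.value_counts()

# Keep the labels of the two most frequent publishers.
top_two = list(counts.head(2).index)

print(top_two)
# ['Sega', 'Atari']
```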
right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
pd.DataFrame(right)
 | 0 |
---|---|
0 | Action |
1 | Adventure |
2 | Fighting |
3 | Misc |
4 | Platform |
5 | Puzzle |
6 | Racing |
7 | Role-playing |
8 | Shooter |
9 | Simulation |
10 | Sports |
11 | Strategy |
We can now use these names to create an empty matrix, with one row and one column for every publisher and genre.
features = left + right
d = pd.DataFrame(0, index=features, columns=features)
Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.
We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every publisher and genre pairing in its original and reversed form, and then count each pairing that involves one of our selected publishers.
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
for x in species_personality:
    if x[0] in left or x[1] in left:
        d.at[x[0], x[1]] += 1
d
 | Electronic arts | Activision | Namco bandai games | Ubisoft | Konami digital entertainment | Thq | Nintendo | Sony computer entertainment | Sega | Take-two interactive | ... | Fighting | Misc | Platform | Puzzle | Racing | Role-playing | Shooter | Simulation | Sports | Strategy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Electronic arts | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 39 | 46 | 16 | 7 | 159 | 35 | 139 | 116 | 561 | 37 |
Activision | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 7 | 103 | 60 | 7 | 74 | 41 | 159 | 23 | 144 | 22 |
Namco bandai games | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 134 | 97 | 19 | 20 | 27 | 151 | 37 | 29 | 51 | 61 |
Ubisoft | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 19 | 151 | 70 | 24 | 52 | 41 | 92 | 119 | 72 | 29 |
Konami digital entertainment | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 20 | 77 | 40 | 10 | 13 | 37 | 40 | 86 | 280 | 28 |
Thq | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 71 | 66 | 85 | 17 | 101 | 8 | 36 | 27 | 31 | 32 |
Nintendo | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 18 | 100 | 112 | 74 | 37 | 106 | 26 | 29 | 55 | 32 |
Sony computer entertainment | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 30 | 128 | 66 | 12 | 65 | 49 | 51 | 15 | 124 | 12 |
Sega | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 37 | 62 | 52 | 22 | 48 | 64 | 40 | 12 | 135 | 35 |
Take-two interactive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 27 | 11 | 1 | 20 | 6 | 65 | 4 | 151 | 22 |
Capcom | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 58 | 11 | 46 | 6 | 13 | 38 | 25 | 2 | 3 | 3 |
Atari | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 37 | 26 | 21 | 22 | 36 | 28 | 40 | 9 | 56 | 17 |
Tecmo koei | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 12 | 14 | 1 | 0 | 5 | 47 | 3 | 13 | 39 | 50 |
Square enix | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 6 | 0 | 4 | 0 | 129 | 16 | 4 | 0 | 9 |
Action | 183 | 310 | 248 | 193 | 148 | 194 | 79 | 90 | 101 | 93 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Adventure | 13 | 25 | 58 | 59 | 53 | 47 | 35 | 41 | 31 | 12 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Fighting | 39 | 7 | 134 | 19 | 20 | 71 | 18 | 30 | 37 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Misc | 46 | 103 | 97 | 151 | 77 | 66 | 100 | 128 | 62 | 27 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Platform | 16 | 60 | 19 | 70 | 40 | 85 | 112 | 66 | 52 | 11 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Puzzle | 7 | 7 | 20 | 24 | 10 | 17 | 74 | 12 | 22 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Racing | 159 | 74 | 27 | 52 | 13 | 101 | 37 | 65 | 48 | 20 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Role-playing | 35 | 41 | 151 | 41 | 37 | 8 | 106 | 49 | 64 | 6 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Shooter | 139 | 159 | 37 | 92 | 40 | 36 | 26 | 51 | 40 | 65 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Simulation | 116 | 23 | 29 | 119 | 86 | 27 | 29 | 15 | 12 | 4 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Sports | 561 | 144 | 51 | 72 | 280 | 31 | 55 | 124 | 135 | 151 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Strategy | 37 | 22 | 61 | 29 | 28 | 32 | 32 | 12 | 35 | 22 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
26 rows × 26 columns
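The loop above visits every pairing one at a time. As an alternative, here is a hedged sketch that builds the same kind of symmetric matrix with `pd.crosstab`. It uses a made-up toy frame, and the names `pairs`, `kept`, `counts`, and `matrix` are introduced here for illustration; it is not the approach used in this notebook.

```python
import pandas as pd

# Toy pairings for illustration only; "Indie" is not a selected publisher.
pairs = pd.DataFrame({
    "Publisher": ["Sega", "Sega", "Atari", "Sega", "Indie"],
    "Genre": ["Racing", "Racing", "Puzzle", "Puzzle", "Racing"],
})
left = ["Sega", "Atari"]      # selected publishers
right = ["Puzzle", "Racing"]  # genres
features = left + right

# Count titles per (publisher, genre), keeping only the selected publishers.
kept = pairs[pairs["Publisher"].isin(left)]
counts = pd.crosstab(kept["Publisher"], kept["Genre"]).reindex(
    index=left, columns=right, fill_value=0
)

# Place the counts in the publisher-by-genre block and mirror them
# into the genre-by-publisher block so the matrix is symmetric.
matrix = pd.DataFrame(0, index=features, columns=features)
matrix.loc[left, right] = counts.values
matrix.loc[right, left] = counts.values.T

print(matrix)
```

The publisher-by-publisher and genre-by-genre blocks stay at zero, just as in the table above, because a title only ever links a publisher to a genre.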
Chord Diagram
Time to visualise the co-occurrence of publishers and genres using a chord diagram. We'll first shorten some of the longer labels, and then define a list of custom colours for the publishers and genres.
left[5] = 'THQ'
left[0] = 'EA'
left[2] = 'Namco'
left[4] = 'Konami'
left[7] = 'Sony'
left[9] = 'Take-Two'
left[-1] = 'Square'
left[-2] = 'Tecmo'
left
['EA', 'Activision', 'Namco', 'Ubisoft', 'Konami', 'THQ', 'Nintendo', 'Sony', 'Sega', 'Take-Two', 'Capcom', 'Atari', 'Tecmo', 'Square']
right[7] = 'RPG'
right
['Action', 'Adventure', 'Fighting', 'Misc', 'Platform', 'Puzzle', 'Racing', 'RPG', 'Shooter', 'Simulation', 'Sports', 'Strategy']
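Renaming by position works here, but it depends on the order of the lists staying the same. A minimal sketch of an alternative that renames by value with a dictionary; `short_names`, `left_labels`, and `right_labels` are hypothetical names introduced for this example, and the lists are shortened for illustration.

```python
# Mapping from the capitalised names to shorter labels (a subset, for illustration).
short_names = {
    "Electronic arts": "EA",
    "Thq": "THQ",
    "Role-playing": "RPG",
}

# Shortened versions of the publisher and genre lists.
left = ["Electronic arts", "Activision", "Thq"]
right = ["Action", "Role-playing"]

# Names without an entry in the mapping are kept unchanged.
left_labels = [short_names.get(name, name) for name in left]
right_labels = [short_names.get(name, name) for name in right]

print(left_labels)   # ['EA', 'Activision', 'THQ']
print(right_labels)  # ['Action', 'RPG']
```

The same dictionary could then be reused wherever the short labels are needed, so the two sets of labels cannot drift apart.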
# 12 colours, matching the number of genres
colors1 = [
    "#ff4400", "#ffcc00", "#5c6633", "#00e63d", "#00d6e6", "#566d73",
    "#3d85f2", "#00fff2", "#0000e6", "#290066", "#ff80e5", "#731d28",
]
# 14 colours, matching the number of publishers
colors2 = [
    "#312f85", "#f4e301", "#f75802", "#3e4682", "#ad0332", "#666769", "#e80113",
    "#f78700", "#0100f4", "#1272c3", "#f7cd01", "#dd1a22", "#00407b", "#f70000",
]
colors1.reverse()
colors2.reverse()
colors = colors1 + colors2
names = left + right
Finally, we can put it all together.
Chord(d.values.tolist(), names, credit=True, colors=colors, wrap_labels=False,
      font_size_large=7, noun="titles", details_separator="",
      bipartite=True, bipartite_idx=len(left), bipartite_size=.2, width=850).show()
The same data can also be drawn as a split chord diagram, which takes a list of links and a list of nodes rather than a matrix. Let's start by counting the titles for each publisher and genre pairing.
df_links = pd.DataFrame(data[['Publisher', 'Genre']].values, columns=['Publisher', 'Genre']).dropna().astype(str)
df_links = df_links.value_counts().reset_index()
df_links = df_links.replace("Electronic arts", "EA")
df_links = df_links.replace("Thq", "THQ")
df_links = df_links.replace("Namco bandai games", "Namco")
df_links = df_links.replace("Konami digital entertainment", "Konami")
df_links = df_links.replace("Sony computer entertainment", "Sony")
df_links = df_links.replace("Take-two interactive", "Take-Two")
df_links = df_links.replace("Tecmo koei", "Tecmo")
df_links = df_links.replace("Square enix", "Square")
df_links = df_links.replace("Role-playing", "RPG")
df_links = df_links[df_links['Publisher'].isin(left)]
df_links
 | Publisher | Genre | 0 |
---|---|---|---|
0 | EA | Sports | 561 |
1 | Activision | Action | 310 |
2 | Konami | Sports | 280 |
3 | Namco | Action | 248 |
4 | THQ | Action | 194 |
... | ... | ... | ... |
984 | Capcom | Simulation | 2 |
1114 | Take-Two | Puzzle | 1 |
1118 | Take-Two | Fighting | 1 |
1133 | Square | Racing | 1 |
1340 | Tecmo | Platform | 1 |
166 rows × 3 columns
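The chain of `replace` calls can also be written with a single dictionary. The sketch below is one possible way to do this on a made-up toy frame; `short_names` and `link_counts` are hypothetical names, and the count column is named `value` explicitly because the label that `value_counts().reset_index()` gives it (0 in the output above) may differ between pandas versions.

```python
import pandas as pd

# Toy pairings for illustration only; "Indie" is not a selected publisher.
pairs = pd.DataFrame({
    "Publisher": ["Electronic arts", "Electronic arts", "Thq", "Indie"],
    "Genre": ["Sports", "Sports", "Role-playing", "Puzzle"],
})
short_names = {"Electronic arts": "EA", "Thq": "THQ", "Role-playing": "RPG"}
left = ["EA", "THQ"]  # selected publishers, already shortened

# Count titles per pairing and give the count column an explicit name.
link_counts = pairs.groupby(["Publisher", "Genre"]).size().reset_index(name="value")

# Apply every relabelling in one call, on the two text columns only.
text_columns = ["Publisher", "Genre"]
link_counts[text_columns] = link_counts[text_columns].replace(short_names)

# Keep the selected publishers and sort by the number of titles.
link_counts = (
    link_counts[link_counts["Publisher"].isin(left)]
    .sort_values("value", ascending=False)
    .reset_index(drop=True)
)

print(link_counts)
```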
nodes = []
for node in left:
    nodes.append({"name": node, "group": "left"})
for node in right:
    nodes.append({"name": node, "group": "right"})
nodes
[{'name': 'EA', 'group': 'left'}, {'name': 'Activision', 'group': 'left'}, {'name': 'Namco', 'group': 'left'}, {'name': 'Ubisoft', 'group': 'left'}, {'name': 'Konami', 'group': 'left'}, {'name': 'THQ', 'group': 'left'}, {'name': 'Nintendo', 'group': 'left'}, {'name': 'Sony', 'group': 'left'}, {'name': 'Sega', 'group': 'left'}, {'name': 'Take-Two', 'group': 'left'}, {'name': 'Capcom', 'group': 'left'}, {'name': 'Atari', 'group': 'left'}, {'name': 'Tecmo', 'group': 'left'}, {'name': 'Square', 'group': 'left'}, {'name': 'Action', 'group': 'right'}, {'name': 'Adventure', 'group': 'right'}, {'name': 'Fighting', 'group': 'right'}, {'name': 'Misc', 'group': 'right'}, {'name': 'Platform', 'group': 'right'}, {'name': 'Puzzle', 'group': 'right'}, {'name': 'Racing', 'group': 'right'}, {'name': 'RPG', 'group': 'right'}, {'name': 'Shooter', 'group': 'right'}, {'name': 'Simulation', 'group': 'right'}, {'name': 'Sports', 'group': 'right'}, {'name': 'Strategy', 'group': 'right'}]
links = []
for index, link in df_links.iterrows():
    links.append({"source": link['Publisher'],
                  "target": link['Genre'],
                  "value": link[0]})
links
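Before plotting, it can be worth checking that every link points at a node that exists. The sketch below builds nodes and links in the same shape as above from a made-up toy frame, then looks for links whose source or target is missing; the `value` column name and the `dangling` list are assumptions introduced for this example.

```python
import pandas as pd

# Toy inputs for illustration only.
left = ["EA", "THQ"]
right = ["Sports", "RPG"]
df_links = pd.DataFrame({
    "Publisher": ["EA", "THQ"],
    "Genre": ["Sports", "RPG"],
    "value": [2, 1],
})

# One node per label, tagged with the side of the diagram it belongs to.
nodes = [{"name": name, "group": "left"} for name in left]
nodes += [{"name": name, "group": "right"} for name in right]

# One link per row, using the same keys as the loop above.
links = df_links.rename(
    columns={"Publisher": "source", "Genre": "target"}
).to_dict("records")

# Any link that refers to a name without a node would show up here.
node_names = {node["name"] for node in nodes}
dangling = [
    link for link in links
    if link["source"] not in node_names or link["target"] not in node_names
]

print(links)
print("dangling links:", dangling)
```

An empty `dangling` list means the relabelled publishers and genres in the links agree with the labels used for the nodes.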
SplitChord(links, nodes, colors=colors, margin=140).show()
import json
data = {"links": links,
"nodes": nodes,
"colors": colors}
with open("video_games.json", "w") as fp:
    json.dump(data, fp)
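To confirm that the saved file can be read back unchanged, here is a minimal round-trip sketch. It writes a small made-up payload with the same three keys to a temporary directory, so it does not touch the real `video_games.json`; the payload contents are illustrative only.

```python
import json
import os
import tempfile

# Small payload with the same keys as above (values are illustrative only).
payload = {
    "links": [{"source": "EA", "target": "Sports", "value": 2}],
    "nodes": [
        {"name": "EA", "group": "left"},
        {"name": "Sports", "group": "right"},
    ],
    "colors": ["#312f85", "#ff4400"],
}

# Write to a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "video_games.json")
with open(path, "w") as fp:
    json.dump(payload, fp)

# Read the file back and compare it with what we wrote.
with open(path) as fp:
    restored = json.load(fp)

round_trip_ok = restored == payload
print("round trip ok:", round_trip_ok)
```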