Visualizations Showcase
Preamble
import numpy as np # for multi-dimensional containers
import pandas as pd # for DataFrames
import itertools
from plotapi import Chord
import json
Introduction
In previous sections, we visualised co-occurrences of Pokémon type. Whilst it was interesting to look at, the dataset only contained Pokémon from the first six generations. In this section, we're going to use the TidyTuesday Animal Crossing villagers dataset to visualise the relationship between Species and Personality.
The Dataset
Each row of the dataset describes one Animal Crossing villager, with variables including their name, gender, species, birthday, personality, and catchphrase.
Let's download the mirrored dataset and have a look for ourselves.
data_url = 'ac.csv'
data = pd.read_csv(data_url)
data["url"] = "https://datacrayon.com/datasets/ac_img/"+data["image_name"]+".png"
data.head()
| | Unnamed: 0 | birthday | gender | image_name | image_url | name | personality | phrase | species | url |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 11-Aug | Male | Ace | https://dodo.ac/np/images/9/91/Ace_amiibo.png | Ace | Jock | ace | Bird | https://datacrayon.com/datasets/ac_img/Ace.png |
1 | 1 | 27-Jan | Male | Admiral | https://dodo.ac/np/images/thumb/e/ed/Admiral_N... | Admiral | Cranky | aye aye | Bird | https://datacrayon.com/datasets/ac_img/Admiral... |
2 | 2 | 02-Jul | Female | AgentS | https://dodo.ac/np/images/thumb/a/a7/Agent_S_N... | Agent S | Peppy | sidekick | Squirrel | https://datacrayon.com/datasets/ac_img/AgentS.png |
3 | 3 | 21-Apr | Female | Agnes | https://dodo.ac/np/images/thumb/4/4e/Agnes_NH_... | Agnes | Big sister | snuffle | Pig | https://datacrayon.com/datasets/ac_img/Agnes.png |
4 | 4 | 18-Oct | Male | Al | https://dodo.ac/np/images/thumb/c/c4/Al_NH.png... | Al | Lazy | hoo hoo ha | Gorilla | https://datacrayon.com/datasets/ac_img/Al.png |
Let's capitalise the name, personality, and species of each villager.
data['name'] = data['name'].str.title()
data['personality'] = data['personality'].str.title()
data['species'] = data['species'].str.title()
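As a quick illustration of what `.str.title()` does, here is a minimal sketch on a few hand-written values (the sample strings are made up for the example, not read from the dataset). It also contrasts the result with `.str.capitalize()`, which only upper-cases the first character of the whole string.

```python
import pandas as pd

# Hypothetical sample values, written by hand for illustration
sample = pd.Series(["big sister", "bear cub", "jock"])

titled = sample.str.title()            # upper-cases the first letter of every word
capitalized = sample.str.capitalize()  # upper-cases only the first letter of the string

print(titled.tolist())
print(capitalized.tolist())
```

Title case is the better fit here because values such as "Big Sister" and "Bear Cub" contain more than one word.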
It looks good so far, but let's check how many villagers and variables we actually have.
data.shape
(413, 10)
That's 413 villagers and 10 columns, including the index column Unnamed: 0 and the url column we added above.
Data Wrangling
We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two variables we're interested in are stored in the columns species and personality.
pd.DataFrame(data.columns.values.tolist())
| | 0 |
---|---|
0 | Unnamed: 0 |
1 | birthday |
2 | gender |
3 | image_name |
4 | image_url |
5 | name |
6 | personality |
7 | phrase |
8 | species |
9 | url |
So let's select just these two columns and work with a DataFrame containing only them as we move forward.
species_personality = pd.DataFrame(data[['species', 'personality']].values)
species_personality
| | 0 | 1 |
---|---|---|
0 | Bird | Jock |
1 | Bird | Cranky |
2 | Squirrel | Peppy |
3 | Pig | Big Sister |
4 | Gorilla | Lazy |
... | ... | ... |
408 | Wolf | Cranky |
409 | Koala | Snooty |
410 | Deer | Smug |
411 | Anteater | Normal |
412 | Octopus | Lazy |
413 rows × 2 columns
Now for the unique names of our species and personalities.
left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
pd.DataFrame(left)
| | 0 |
---|---|
0 | Alligator |
1 | Anteater |
2 | Bear |
3 | Bear Cub |
4 | Bird |
5 | Bull |
6 | Cat |
7 | Chicken |
8 | Cow |
9 | Deer |
10 | Dog |
11 | Duck |
12 | Eagle |
13 | Elephant |
14 | Frog |
15 | Goat |
16 | Gorilla |
17 | Hamster |
18 | Hippo |
19 | Horse |
20 | Kangaroo |
21 | Koala |
22 | Lion |
23 | Monkey |
24 | Mouse |
25 | Octopus |
26 | Ostrich |
27 | Penguin |
28 | Pig |
29 | Rabbit |
30 | Rhinoceros |
31 | Sheep |
32 | Squirrel |
33 | Tiger |
34 | Wolf |
right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
pd.DataFrame(right)
| | 0 |
---|---|
0 | Big Sister |
1 | Cranky |
2 | Jock |
3 | Lazy |
4 | Normal |
5 | Peppy |
6 | Smug |
7 | Snooty |
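Note that `np.unique` returns the unique values in sorted order, which is why both lists above are alphabetical. A minimal sketch with made-up input values shows this, alongside an equivalent using only built-ins:

```python
import numpy as np

# Hypothetical personality values with repeats, in arbitrary order
values = ["Peppy", "Cranky", "Jock", "Cranky", "Peppy"]

# np.unique removes duplicates and sorts the result
unique_sorted = np.unique(values).tolist()

# the same result using only built-ins
builtin_sorted = sorted(set(values))

print(unique_sorted)
print(builtin_sorted)
```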
We can now use these names to create the matrix, initialised with zeros.
features= left+right
d = pd.DataFrame(0, index=features, columns=features)
Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.
We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every species and personality pairing in its original and reversed form, and then count each pairing in the matrix.
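# each (species, personality) pair followed by its reverse, so the matrix ends up symmetric
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
# count every pairing in the matrix
for x in species_personality:
    d.at[x[0], x[1]] += 1
d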
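To make the counting step concrete, here is a minimal sketch of the same approach on a tiny hand-written table (the rows are hypothetical, not the real dataset). As a sanity check, it compares the species × personality block of the matrix with `pd.crosstab`, which counts the same pairs directly.

```python
import itertools
import pandas as pd

# Toy stand-in for the villagers data (hypothetical rows)
toy = pd.DataFrame({
    "species": ["Bird", "Bird", "Pig", "Bird"],
    "personality": ["Jock", "Cranky", "Jock", "Jock"],
})

left = sorted(toy["species"].unique())
right = sorted(toy["personality"].unique())
features = left + right

# square matrix of zeros, labelled by every species and personality
matrix = pd.DataFrame(0, index=features, columns=features)

# each pair and its reverse, so that matrix[a][b] == matrix[b][a]
pairs = itertools.chain.from_iterable(
    (tuple(p), tuple(p[::-1])) for p in toy[["species", "personality"]].values
)
for a, b in pairs:
    matrix.at[a, b] += 1

# cross-check against a direct count of (species, personality) pairs
counts = pd.crosstab(toy["species"], toy["personality"])
print(matrix)
print(counts)
```

Because species and personality names never overlap, the species × species and personality × personality blocks stay at zero and only the two off-diagonal blocks are filled.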
Chord Diagram
Time to visualise the co-occurrence of species and personalities using a chord diagram. We are going to use a list of custom colours, one for each species and personality.
colors =["#ff2200","#ffcc00","#ace6cb","#0057d9","#633366","#73341d","#665f00","#00ffcc","#001433","#e6acda","#ffa280","#eeff00","#336663","#001f73","#ff00aa","#ffd9bf","#f2ffbf","#36ced9","#737399","#73003d","#ff8800","#44ff00","#00a2f2","#6600ff","#ff0044","#99754d","#416633","#004d73","#5e008c","#bf606c","#332200","#60bf60","#acd2e6","#e680ff","#66333a","#3d005c","#6e0060","#99005d","#bd0055","#db2f48","#f05738","#fc7e23","#ffa600"]
names = left + right
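The colour list above has 43 entries, one for each of the 35 species and 8 personalities. A minimal sketch of a sanity check follows; `check_palette` is a hypothetical helper written for this example, and whether the plotting library itself requires exactly one colour per name is an assumption here.

```python
import re

def check_palette(names, colors):
    """Return True if there is one valid six-digit hex colour per name (hypothetical helper)."""
    hex_colour = re.compile(r"^#[0-9a-fA-F]{6}$")
    return len(names) == len(colors) and all(hex_colour.match(c) for c in colors)

# Toy inputs for illustration only
toy_names = ["Bird", "Pig", "Jock"]
print(check_palette(toy_names, ["#ff2200", "#ffcc00", "#ace6cb"]))  # lengths match
print(check_palette(toy_names, ["#ff2200", "#ffcc00"]))             # one colour missing
```

In the notebook, the same check could be run as `check_palette(names, colors)` before drawing the diagram.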
Finally, we can put it all together.
Chord(d.values.tolist(), names,colors=colors, wrap_labels=False, margin=40, font_size_large=10).show()
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
margin=40, font_size_large=7,noun="villagers",
details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
Chord Diagram with Names
It would be nice to show a list of villager names when hovering over a co-occurring species and personality. To do this, we can make use of the optional details parameter.
Next, we'll create two empty multi-dimensional arrays with the same shape as our matrix: one for the names and one for the thumbnail URLs.
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)
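The `dtype=object` argument matters here: an object array starts out filled with `None` and can hold a separate Python list in every cell. The sketch below illustrates this with a hypothetical size of 3, and contrasts it with nested list multiplication, which silently shares one inner list between all cells.

```python
import numpy as np

n = 3  # hypothetical size; the section uses len(names)

grid = np.empty((n, n), dtype=object)
all_none = all(cell is None for cell in grid.flat)  # object arrays start as None

# give every cell its own fresh list
for i in range(n):
    for j in range(n):
        grid[i][j] = []

grid[0][0].append("Ace")  # only this one cell changes

# A common pitfall: list multiplication repeats references to the same inner list
aliased = [[[]] * n] * n
aliased[0][0].append("Ace")  # every cell now appears to contain "Ace"

print(grid[0][0], grid[0][1])
print(aliased[2][2])
```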
Now we can populate the details array with lists of villager names in the correct positions, and the details_thumbs array with the matching image URLs.
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        # villagers whose species and personality match this pair, in either order
        matches = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]
        # cells without any matching villagers get an empty list
        details[count_x][count_y] = matches['name'].to_list()
        details_thumbs[count_x][count_y] = matches['url'].to_list()
details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
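The nested loop above filters the whole DataFrame once per cell. As an alternative, here is a minimal sketch that fills the same kind of structure in a single pass with `groupby`. The first four rows of the toy table are taken from the head of the dataset shown earlier; the villager "Zed" is made up so that one pair has two names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the villagers data ("Zed" is a made-up row)
toy = pd.DataFrame({
    "name": ["Ace", "Admiral", "Agnes", "Al", "Zed"],
    "species": ["Bird", "Bird", "Pig", "Gorilla", "Bird"],
    "personality": ["Jock", "Cranky", "Big Sister", "Lazy", "Jock"],
})

names = sorted(toy["species"].unique()) + sorted(toy["personality"].unique())
position = {label: idx for idx, label in enumerate(names)}

# start with an empty list in every cell
details = np.empty((len(names), len(names)), dtype=object)
for i in range(len(names)):
    for j in range(len(names)):
        details[i][j] = []

# one pass over the (species, personality) groups instead of one filter per cell
for (species, personality), group in toy.groupby(["species", "personality"]):
    villagers = group["name"].to_list()
    i, j = position[species], position[personality]
    details[i][j] = villagers
    details[j][i] = list(villagers)  # mirrored cell gets its own copy

print(details[position["Bird"]][position["Jock"]])
```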
len(right)
8
Finally, we can put it all together, but this time with the details and details_thumbs matrices passed in.
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
padding=0, font_size_large=10,details=details,details_thumbs=details_thumbs,noun="villagers",
details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=800).show()
np.empty(shape=(6,1)).tolist()
[[2.291755454583e-312], [2.22809558106e-312], [2.143215749443e-312], [2.37663528627e-312], [2.29175545472e-312], [0.0]]
data
| | Unnamed: 0 | birthday | gender | image_name | image_url | name | personality | phrase | species | url |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 11-Aug | Male | Ace | https://dodo.ac/np/images/9/91/Ace_amiibo.png | Ace | Jock | ace | Bird | https://datacrayon.com/datasets/ac_img/Ace.png |
1 | 1 | 27-Jan | Male | Admiral | https://dodo.ac/np/images/thumb/e/ed/Admiral_N... | Admiral | Cranky | aye aye | Bird | https://datacrayon.com/datasets/ac_img/Admiral... |
2 | 2 | 02-Jul | Female | AgentS | https://dodo.ac/np/images/thumb/a/a7/Agent_S_N... | Agent S | Peppy | sidekick | Squirrel | https://datacrayon.com/datasets/ac_img/AgentS.png |
3 | 3 | 21-Apr | Female | Agnes | https://dodo.ac/np/images/thumb/4/4e/Agnes_NH_... | Agnes | Big Sister | snuffle | Pig | https://datacrayon.com/datasets/ac_img/Agnes.png |
4 | 4 | 18-Oct | Male | Al | https://dodo.ac/np/images/thumb/c/c4/Al_NH.png... | Al | Lazy | hoo hoo ha | Gorilla | https://datacrayon.com/datasets/ac_img/Al.png |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
408 | 408 | 25-Nov | Male | Wolfgang | https://dodo.ac/np/images/thumb/a/aa/Wolfgang_... | Wolfgang | Cranky | snarrrl | Wolf | https://datacrayon.com/datasets/ac_img/Wolfgan... |
409 | 409 | 20-Jul | Female | Yuka | https://dodo.ac/np/images/thumb/c/ca/Yuka_NH.p... | Yuka | Snooty | tsk tsk | Koala | https://datacrayon.com/datasets/ac_img/Yuka.png |
410 | 410 | 07-Jun | Male | Zell | https://dodo.ac/np/images/thumb/c/c0/Zell_NH.p... | Zell | Smug | pronk | Deer | https://datacrayon.com/datasets/ac_img/Zell.png |
411 | 411 | 10-Feb | Female | Zoe | https://dodo.ac/np/images/0/0b/Zoe_amiibo.png | Zoe | Normal | whiiifff | Anteater | https://datacrayon.com/datasets/ac_img/Zoe.png |
412 | 412 | 08-Mar | Male | Zucker | https://dodo.ac/np/images/thumb/7/7f/Zucker_NH... | Zucker | Lazy | bloop | Octopus | https://datacrayon.com/datasets/ac_img/Zucker.png |
413 rows × 10 columns
data['gender'] = data['gender'].str.capitalize()
data['gender'] = data['gender'].replace("Male",'<span class="gender_male">♂</span>')
data['gender'] = data['gender'].replace("Female",'<span class="gender_female">♀</span>')
# take an explicit copy so the new columns below are not assigned to a slice of data
data_table = data[["personality", 'species', 'image_url', "gender", "birthday"]].copy()
data_table["phrase"] = '<i>“'+data['phrase'] +'”</i>'
data_table['name'] = data['name'].str.title()
data_table["URL"] = '<img src="'+data['url'] +'">'
# personality and species combined (this reuses the image_url column)
data_table["image_url"] = '<span class="'+data_table["personality"].str.replace(" ","")+'">'+data_table["personality"]+'</span> <span class="'+data_table["species"].str.replace(" ","")+'">'+data_table["species"]+'</span>'
data_table.columns = ['Personality', 'Species', 'Personality × Species', ' ', 'Bday', 'Phrase', 'Name', '']
data_table = data_table.to_csv(index=False)
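Adding columns to a column subset of a DataFrame can trigger pandas' SettingWithCopyWarning, because pandas cannot tell whether the subset is a view or a copy. A minimal sketch of the usual remedy, taking an explicit `.copy()`, is shown below on a toy table (two rows taken from the head of the dataset shown earlier).

```python
import pandas as pd

# Toy stand-in for the villagers data
toy = pd.DataFrame({
    "name": ["Ace", "Agnes"],
    "species": ["Bird", "Pig"],
    "phrase": ["ace", "snuffle"],
})

# .copy() makes the subset an independent DataFrame, so adding columns is unambiguous
table = toy[["name", "species"]].copy()
table["phrase"] = "<i>“" + toy["phrase"] + "”</i>"

print(table.columns.tolist())
print(toy["phrase"].tolist())  # the original column is left untouched
```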
import json
dataa = {"matrix": d.values.tolist(),
"names": names,
"details_thumbs": details_thumbs,
"details": details,
"bipartite_idx": len(left),
"colors" :["#ff2200","#ffcc00","#ace6cb","#0057d9","#633366","#73341d","#665f00","#00ffcc","#001433","#e6acda","#ffa280","#D5E500","#336663","#001f73","#ff00aa","#ffd9bf","#f2ffbf","#36ced9","#737399","#73003d","#ff8800","#44ff00","#00a2f2","#6600ff","#ff0044","#99754d","#416633","#004d73","#5e008c","#bf606c","#332200","#60bf60","#acd2e6","#e680ff","#66333a","#3d005c","#6e0060","#99005d","#bd0055","#db2f48","#f05738","#fc7e23","#ffa600"]
,"data_table":data_table
}
with open("ac_species_personality.json", "w") as fp:
    json.dump(dataa, fp)
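The dictionary can be written with `json.dump` because everything in it is made of plain Python lists, strings, and integers; that is one reason for the `.tolist()` calls, since NumPy integers are not JSON serialisable. The sketch below round-trips a small made-up payload with the same kind of nested lists to confirm nothing is lost. The file name is hypothetical and the file is written to a temporary directory.

```python
import json
import os
import tempfile

# Toy payload with the same kind of nested lists as the real one (values are made up)
payload = {
    "matrix": [[0, 2], [2, 0]],
    "names": ["Bird", "Jock"],
    "details": [[[], ["Ace"]], [["Ace"], []]],
    "bipartite_idx": 1,
}

path = os.path.join(tempfile.mkdtemp(), "toy_chord.json")  # hypothetical file name
with open(path, "w") as fp:
    json.dump(payload, fp)

# read the file back and compare it with what we wrote
with open(path) as fp:
    restored = json.load(fp)

print(restored == payload)
```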
dataa['colors']
['#ff2200', '#ffcc00', '#ace6cb', '#0057d9', '#633366', '#73341d', '#665f00', '#00ffcc', '#001433', '#e6acda', '#ffa280', '#D5E500', '#336663', '#001f73', '#ff00aa', '#ffd9bf', '#f2ffbf', '#36ced9', '#737399', '#73003d', '#ff8800', '#44ff00', '#00a2f2', '#6600ff', '#ff0044', '#99754d', '#416633', '#004d73', '#5e008c', '#bf606c', '#332200', '#60bf60', '#acd2e6', '#e680ff', '#66333a', '#3d005c', '#6e0060', '#99005d', '#bd0055', '#db2f48', '#f05738', '#fc7e23', '#ffa600']
# print one CSS rule per name; spaces are removed so the class names match those used in the data table
for name, color in zip(dataa['names'], dataa['colors']):
    print("."+name.replace(" ","")+"{background-color:"+color+"}")
.Alligator{background-color:#ff2200} .Anteater{background-color:#ffcc00} .Bear{background-color:#ace6cb} .BearCub{background-color:#0057d9} .Bird{background-color:#633366} .Bull{background-color:#73341d} .Cat{background-color:#665f00} .Chicken{background-color:#00ffcc} .Cow{background-color:#001433} .Deer{background-color:#e6acda} .Dog{background-color:#ffa280} .Duck{background-color:#D5E500} .Eagle{background-color:#336663} .Elephant{background-color:#001f73} .Frog{background-color:#ff00aa} .Goat{background-color:#ffd9bf} .Gorilla{background-color:#f2ffbf} .Hamster{background-color:#36ced9} .Hippo{background-color:#737399} .Horse{background-color:#73003d} .Kangaroo{background-color:#ff8800} .Koala{background-color:#44ff00} .Lion{background-color:#00a2f2} .Monkey{background-color:#6600ff} .Mouse{background-color:#ff0044} .Octopus{background-color:#99754d} .Ostrich{background-color:#416633} .Penguin{background-color:#004d73} .Pig{background-color:#5e008c} .Rabbit{background-color:#bf606c} .Rhinoceros{background-color:#332200} .Sheep{background-color:#60bf60} .Squirrel{background-color:#acd2e6} .Tiger{background-color:#e680ff} .Wolf{background-color:#66333a} .BigSister{background-color:#3d005c} .Cranky{background-color:#6e0060} .Jock{background-color:#99005d} .Lazy{background-color:#bd0055} .Normal{background-color:#db2f48} .Peppy{background-color:#f05738} .Smug{background-color:#fc7e23} .Snooty{background-color:#ffa600}
Chord(
dataa["matrix"],
dataa["names"],
colors=dataa["colors"],
details=dataa["details"],
details_thumbs=dataa["details_thumbs"],
noun="villagers!",
thumbs_width=50,
curved_labels=True,
popup_width=600,
bipartite=True,
bipartite_idx=dataa["bipartite_idx"],
bipartite_size=0.4,
padding=0.0,
width=800,
font_size_large="15px",
data_table=dataa["data_table"],
data_table_show_indices=False
).show()
Conclusion
In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!