Preamble

import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from plotapi import Chord
import json

Introduction

In previous sections, we visualised co-occurrences of Pokémon type. Whilst it was interesting to look at, the dataset only contained Pokémon from the first six generations. In this section, we're going to use the TidyTuesday Animal Crossing villagers dataset to visualise the relationship between Species and Personality.

The Dataset

The dataset contains one row per Animal Crossing villager, with variables such as name, gender, birthday, species, personality, and phrase.

Let's download the mirrored dataset and have a look for ourselves.

data_url = 'ac.csv'
data = pd.read_csv(data_url)


data["url"] = "https://datacrayon.com/datasets/ac_img/"+data["image_name"]+".png"




data.head()

	Unnamed: 0	birthday	gender	image_name	image_url	name	personality	phrase	species	url
0	0	11-Aug	Male	Ace	https://dodo.ac/np/images/9/91/Ace_amiibo.png	Ace	Jock	ace	Bird	https://datacrayon.com/datasets/ac_img/Ace.png
1	1	27-Jan	Male	Admiral	https://dodo.ac/np/images/thumb/e/ed/Admiral_N...	Admiral	Cranky	aye aye	Bird	https://datacrayon.com/datasets/ac_img/Admiral...
2	2	02-Jul	Female	AgentS	https://dodo.ac/np/images/thumb/a/a7/Agent_S_N...	Agent S	Peppy	sidekick	Squirrel	https://datacrayon.com/datasets/ac_img/AgentS.png
3	3	21-Apr	Female	Agnes	https://dodo.ac/np/images/thumb/4/4e/Agnes_NH_...	Agnes	Big sister	snuffle	Pig	https://datacrayon.com/datasets/ac_img/Agnes.png
4	4	18-Oct	Male	Al	https://dodo.ac/np/images/thumb/c/c4/Al_NH.png...	Al	Lazy	hoo hoo ha	Gorilla	https://datacrayon.com/datasets/ac_img/Al.png

Next, we'll capitalise the name, personality, and species of each villager.

data['name'] = data['name'].str.title()
data['personality'] = data['personality'].str.title()
data['species'] = data['species'].str.title()
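
As a quick illustration of what str.title() does to multi-word values such as "Big sister", here is a minimal sketch on a hypothetical toy Series (the values are chosen for illustration and are not read from the dataset file):

```python
import pandas as pd

# Hypothetical toy values for illustration only
toy = pd.Series(["big sister", "bear cub", "jock"])

# str.title() upper-cases the first letter of every word
titled = toy.str.title().tolist()
print(titled)
```

This is why "Big sister" in the raw data appears as "Big Sister" in the later outputs.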

It looks good so far, but let's check how many villagers and variables we have.

data.shape

(413, 10)

We have 413 villagers and 10 columns, which includes the Unnamed: 0 index column and the url column we added above.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two variables we're interested in are stored in the columns species and personality.

pd.DataFrame(data.columns.values.tolist())

	0
0	Unnamed: 0
1	birthday
2	gender
3	image_name
4	image_url
5	name
6	personality
7	phrase
8	species
9	url

So let's select just these two columns and work with a list containing only them as we move forward.

species_personality = pd.DataFrame(data[['species', 'personality']].values)
species_personality

	0	1
0	Bird	Jock
1	Bird	Cranky
2	Squirrel	Peppy
3	Pig	Big Sister
4	Gorilla	Lazy
...	...	...
408	Wolf	Cranky
409	Koala	Snooty
410	Deer	Smug
411	Anteater	Normal
412	Octopus	Lazy

413 rows × 2 columns

Now for the names of our species and personalities.

left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
pd.DataFrame(left)

	0
0	Alligator
1	Anteater
2	Bear
3	Bear Cub
4	Bird
5	Bull
6	Cat
7	Chicken
8	Cow
9	Deer
10	Dog
11	Duck
12	Eagle
13	Elephant
14	Frog
15	Goat
16	Gorilla
17	Hamster
18	Hippo
19	Horse
20	Kangaroo
21	Koala
22	Lion
23	Monkey
24	Mouse
25	Octopus
26	Ostrich
27	Penguin
28	Pig
29	Rabbit
30	Rhinoceros
31	Sheep
32	Squirrel
33	Tiger
34	Wolf

right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
pd.DataFrame(right)

	0
0	Big Sister
1	Cranky
2	Jock
3	Lazy
4	Normal
5	Peppy
6	Smug
7	Snooty

Which we can now use to create the matrix.

features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every species and personality pairing in its original and reversed form, and then count each pairing in the matrix.

species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))

for x in species_personality:
    d.at[x[0], x[1]] += 1
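
As an alternative to looping over every pairing, the same kind of symmetric matrix can be assembled from a cross-tabulation. The following is only a sketch on hypothetical toy data (the toy frame and the variable name matrix are assumptions made for illustration); it is not the approach used in this section.

```python
import pandas as pd

# Hypothetical toy data standing in for the villagers dataset
toy = pd.DataFrame({
    "species": ["Bird", "Bird", "Pig", "Bird"],
    "personality": ["Jock", "Cranky", "Pig" if False else "Jock", "Jock"],
})

left = sorted(toy["species"].unique())       # species labels
right = sorted(toy["personality"].unique())  # personality labels
features = left + right

# Count each species/personality pair once
counts = pd.crosstab(toy["species"], toy["personality"])

# Place the counts in the top-right block and mirror them into the bottom-left
matrix = pd.DataFrame(0, index=features, columns=features)
matrix.loc[left, right] = counts.loc[left, right].values
matrix.loc[right, left] = counts.loc[left, right].values.T

print(matrix)
```

Because species and personality labels never overlap, the species-species and personality-personality blocks stay at zero, which matches what the loop above produces.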

Chord Diagram

Time to visualise the co-occurrence of species and personalities using a chord diagram. We are going to use a list of custom colours, one for each species and personality.

colors =["#ff2200","#ffcc00","#ace6cb","#0057d9","#633366","#73341d","#665f00","#00ffcc","#001433","#e6acda","#ffa280","#eeff00","#336663","#001f73","#ff00aa","#ffd9bf","#f2ffbf","#36ced9","#737399","#73003d","#ff8800","#44ff00","#00a2f2","#6600ff","#ff0044","#99754d","#416633","#004d73","#5e008c","#bf606c","#332200","#60bf60","#acd2e6","#e680ff","#66333a","#3d005c","#6e0060","#99005d","#bd0055","#db2f48","#f05738","#fc7e23","#ffa600"]

names = left + right

Finally, we can put it all together.

Chord(d.values.tolist(), names,colors=colors, wrap_labels=False, margin=40, font_size_large=10).show()

PlotAPI - Chord Diagram

Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,noun="villagers",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()

PlotAPI - Chord Diagram

Chord Diagram with Names

It would be nice to show a list of villager names when hovering over a co-occurring species and personality. To do this, we can make use of the optional details parameter.

First, we'll create two empty multi-dimensional arrays with the same shape as our matrix: one for the villager names and one for their thumbnail URLs.

details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details arrays with lists of villager names and thumbnail URLs in the correct positions.

for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        # Villagers whose species and personality match this pair, in either order
        matches = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]

        # An empty selection yields empty lists, so no special case is needed
        details[count_x][count_y] = matches['name'].to_list()
        details_thumbs[count_x][count_y] = matches['url'].to_list()

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
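
The nested loop above filters the whole DataFrame once per cell. A lighter alternative is to visit each villager once and append the name to the two mirrored cells. The following is a minimal sketch on hypothetical toy data (the toy frame and the position lookup are assumptions for illustration); it should give the same lists provided species and personality labels never overlap.

```python
import pandas as pd

# Hypothetical toy data standing in for the villagers dataset
toy = pd.DataFrame({
    "name": ["Ace", "Admiral", "Agnes"],
    "species": ["Bird", "Bird", "Pig"],
    "personality": ["Jock", "Cranky", "Big Sister"],
})

names = sorted(toy["species"].unique()) + sorted(toy["personality"].unique())
position = {label: i for i, label in enumerate(names)}

# Start with an empty list in every cell
details = [[[] for _ in names] for _ in names]

# Visit each villager once and record the name in both mirrored cells
for row in toy.itertuples(index=False):
    i, j = position[row.species], position[row.personality]
    details[i][j].append(row.name)
    details[j][i].append(row.name)

print(details[position["Bird"]][position["Jock"]])
```

The same pattern could fill a second nested list with thumbnail URLs.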

Finally, we can put it all together but this time with the details matrix passed in.

Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      padding=0, font_size_large=10,details=details,details_thumbs=details_thumbs,noun="villagers",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=800).show()

PlotAPI - Chord Diagram

data

	Unnamed: 0	birthday	gender	image_name	image_url	name	personality	phrase	species	url
0	0	11-Aug	Male	Ace	https://dodo.ac/np/images/9/91/Ace_amiibo.png	Ace	Jock	ace	Bird	https://datacrayon.com/datasets/ac_img/Ace.png
1	1	27-Jan	Male	Admiral	https://dodo.ac/np/images/thumb/e/ed/Admiral_N...	Admiral	Cranky	aye aye	Bird	https://datacrayon.com/datasets/ac_img/Admiral...
2	2	02-Jul	Female	AgentS	https://dodo.ac/np/images/thumb/a/a7/Agent_S_N...	Agent S	Peppy	sidekick	Squirrel	https://datacrayon.com/datasets/ac_img/AgentS.png
3	3	21-Apr	Female	Agnes	https://dodo.ac/np/images/thumb/4/4e/Agnes_NH_...	Agnes	Big Sister	snuffle	Pig	https://datacrayon.com/datasets/ac_img/Agnes.png
4	4	18-Oct	Male	Al	https://dodo.ac/np/images/thumb/c/c4/Al_NH.png...	Al	Lazy	hoo hoo ha	Gorilla	https://datacrayon.com/datasets/ac_img/Al.png
...	...	...	...	...	...	...	...	...	...	...
408	408	25-Nov	Male	Wolfgang	https://dodo.ac/np/images/thumb/a/aa/Wolfgang_...	Wolfgang	Cranky	snarrrl	Wolf	https://datacrayon.com/datasets/ac_img/Wolfgan...
409	409	20-Jul	Female	Yuka	https://dodo.ac/np/images/thumb/c/ca/Yuka_NH.p...	Yuka	Snooty	tsk tsk	Koala	https://datacrayon.com/datasets/ac_img/Yuka.png
410	410	07-Jun	Male	Zell	https://dodo.ac/np/images/thumb/c/c0/Zell_NH.p...	Zell	Smug	pronk	Deer	https://datacrayon.com/datasets/ac_img/Zell.png
411	411	10-Feb	Female	Zoe	https://dodo.ac/np/images/0/0b/Zoe_amiibo.png	Zoe	Normal	whiiifff	Anteater	https://datacrayon.com/datasets/ac_img/Zoe.png
412	412	08-Mar	Male	Zucker	https://dodo.ac/np/images/thumb/7/7f/Zucker_NH...	Zucker	Lazy	bloop	Octopus	https://datacrayon.com/datasets/ac_img/Zucker.png

413 rows × 10 columns

We can also prepare a data table to display alongside the diagram.

# Replace the gender labels with styled symbols
data['gender'] = data['gender'].str.capitalize()
data['gender'] = data['gender'].replace("Male",'<span class="gender_male">♂</span>')
data['gender'] = data['gender'].replace("Female",'<span class="gender_female">♀</span>')

# Take an explicit copy so the assignments below do not trigger SettingWithCopyWarning
data_table = data[["personality", 'species', 'image_url', "gender", "birthday"]].copy()

data_table["phrase"] = '<i>“'+data['phrase'] +'”</i>'
data_table['name'] = data['name'].str.title()
data_table["URL"] = '<img src="'+data['url'] +'">'

# Overwrite image_url with the combined personality and species labels
data_table["image_url"] = '<span class="'+data_table["personality"].str.replace(" ","")+'">'+data_table["personality"]+'</span> <span class="'+data_table["species"].str.replace(" ","")+'">'+data_table["species"]+'</span>'

data_table.columns = ['Personality', 'Species', 'Personality × Species', ' ', 'Bday', 'Phrase', 'Name', '']

data_table = data_table.to_csv(index=False)

dataa = {
    "matrix": d.values.tolist(),
    "names": names,
    "details_thumbs": details_thumbs,
    "details": details,
    "bipartite_idx": len(left),
    "colors": ["#ff2200","#ffcc00","#ace6cb","#0057d9","#633366","#73341d","#665f00","#00ffcc","#001433","#e6acda","#ffa280","#D5E500","#336663","#001f73","#ff00aa","#ffd9bf","#f2ffbf","#36ced9","#737399","#73003d","#ff8800","#44ff00","#00a2f2","#6600ff","#ff0044","#99754d","#416633","#004d73","#5e008c","#bf606c","#332200","#60bf60","#acd2e6","#e680ff","#66333a","#3d005c","#6e0060","#99005d","#bd0055","#db2f48","#f05738","#fc7e23","#ffa600"],
    "data_table": data_table,
}

with open("ac_species_personality.json", "w") as fp:
    json.dump(dataa, fp)
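
json.dump only accepts plain Python types, which is why the matrix and details arrays were converted with tolist() beforehand. As a sanity check, a file written this way can be read back and compared with the original dictionary. The sketch below uses a hypothetical toy payload and a temporary file (both are assumptions for illustration) rather than the real ac_species_personality.json.

```python
import json
import os
import tempfile

# Hypothetical payload mirroring the structure of the dictionary above
payload = {
    "matrix": [[0, 2], [2, 0]],
    "names": ["Bird", "Jock"],
    "details": [[[], ["Villager A", "Villager B"]],
                [["Villager A", "Villager B"], []]],
    "bipartite_idx": 1,
}

# Write to a temporary location so no real file is overwritten
path = os.path.join(tempfile.mkdtemp(), "toy_chord.json")
with open(path, "w") as fp:
    json.dump(payload, fp)

# Read the file back and compare with what was written
with open(path) as fp:
    restored = json.load(fp)

print(restored == payload)
```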

dataa['colors']

['#ff2200',
 '#ffcc00',
 '#ace6cb',
 '#0057d9',
 '#633366',
 '#73341d',
 '#665f00',
 '#00ffcc',
 '#001433',
 '#e6acda',
 '#ffa280',
 '#D5E500',
 '#336663',
 '#001f73',
 '#ff00aa',
 '#ffd9bf',
 '#f2ffbf',
 '#36ced9',
 '#737399',
 '#73003d',
 '#ff8800',
 '#44ff00',
 '#00a2f2',
 '#6600ff',
 '#ff0044',
 '#99754d',
 '#416633',
 '#004d73',
 '#5e008c',
 '#bf606c',
 '#332200',
 '#60bf60',
 '#acd2e6',
 '#e680ff',
 '#66333a',
 '#3d005c',
 '#6e0060',
 '#99005d',
 '#bd0055',
 '#db2f48',
 '#f05738',
 '#fc7e23',
 '#ffa600']

for idx, n in enumerate(dataa['names']):
    # Strip spaces so each selector matches the class names used in the data table
    print("."+n.replace(" ", "")+"{background-color:"+dataa['colors'][idx]+"}")

.Alligator{background-color:#ff2200}
.Anteater{background-color:#ffcc00}
.Bear{background-color:#ace6cb}
.BearCub{background-color:#0057d9}
.Bird{background-color:#633366}
.Bull{background-color:#73341d}
.Cat{background-color:#665f00}
.Chicken{background-color:#00ffcc}
.Cow{background-color:#001433}
.Deer{background-color:#e6acda}
.Dog{background-color:#ffa280}
.Duck{background-color:#D5E500}
.Eagle{background-color:#336663}
.Elephant{background-color:#001f73}
.Frog{background-color:#ff00aa}
.Goat{background-color:#ffd9bf}
.Gorilla{background-color:#f2ffbf}
.Hamster{background-color:#36ced9}
.Hippo{background-color:#737399}
.Horse{background-color:#73003d}
.Kangaroo{background-color:#ff8800}
.Koala{background-color:#44ff00}
.Lion{background-color:#00a2f2}
.Monkey{background-color:#6600ff}
.Mouse{background-color:#ff0044}
.Octopus{background-color:#99754d}
.Ostrich{background-color:#416633}
.Penguin{background-color:#004d73}
.Pig{background-color:#5e008c}
.Rabbit{background-color:#bf606c}
.Rhinoceros{background-color:#332200}
.Sheep{background-color:#60bf60}
.Squirrel{background-color:#acd2e6}
.Tiger{background-color:#e680ff}
.Wolf{background-color:#66333a}
.BigSister{background-color:#3d005c}
.Cranky{background-color:#6e0060}
.Jock{background-color:#99005d}
.Lazy{background-color:#bd0055}
.Normal{background-color:#db2f48}
.Peppy{background-color:#f05738}
.Smug{background-color:#fc7e23}
.Snooty{background-color:#ffa600}

Chord(
    dataa["matrix"],
    dataa["names"],
    colors=dataa["colors"],
    details=dataa["details"],
    details_thumbs=dataa["details_thumbs"],
    noun="villagers!",
    thumbs_width=50,
    curved_labels=True,
    popup_width=600,
    bipartite=True,
    bipartite_idx=dataa["bipartite_idx"],
    bipartite_size=0.4,
    padding=0.0,
    width=800,
    font_size_large="15px",
    data_table=dataa["data_table"],
    data_table_show_indices=False
).show()

PlotAPI - Chord Diagram

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!
