Published on July 19

Taiwan mangoes interactive visualization write-up

By Julia Janicki

Here is the GitHub repo if you want to refer to the code.

Part 1. Background research & ideation

1.1 Goal & Target audience

The goal is to create a visualization of the most common mango varieties in Taiwan that shows off their diversity. The user can sort the mangoes by sweetness or size, and click on a mango to see more details about it.

1.2 What are some initial ideas

Arrange all mangoes in a circle, ordered by a specific attribute starting from the top of the circle.

1.3 Direction after doing some research

Some considerations regarding the type of story & data viz:

  • Exploration vs presentation → Exploration

  • Static vs interactive → Interactive

  • 2D vs 3D → 2D but with 3D rendered mangoes

  • Charts vs maps vs graphics → A mix of chart plus graphics

1.4 What would be our tools?

→ JavaScript, more specifically D3, for the interactive visualization (By Julia Janicki)

→ Cinema4D for the 3D mangoes (By Jane Guan)

→ Adobe XD for the interface design (By Daisy Chung)

Part 2. Data cleaning / exploration

2.1 Data collection / data sources

Normally I would start by looking for an existing dataset.  But since no curated dataset exists for Taiwanese mangoes, I had to create my own.  The compiled dataset below is based on over ten different sources, most of which were in Chinese.

2.2 Compile data & do some data cleaning
2.2.1 Description of dataset:

We set up a Google Sheet to store the mango data with the following fields (columns), with each row corresponding to a mango variety:

  • name, name_en

  • size_cm

  • sweetness_brix

  • color (didn’t end up using)

  • origin_en, origin

  • feature_en, feature

  • region, region_en

  • year

Here is the link to the CSV & JSON datasets on GitHub.

2.2.2 Add attributes:

There are a couple of ways to arrange the mangoes in a circle:

  • Option 1 is to calculate their positions manually using sine and cosine, which I will not go into in detail here.

  • Option 2, which is what I did in this case, is to use D3’s cluster layout.
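As a rough illustration of Option 1 (which is not the approach used in this project), positions on a circle can be computed from an index with sine and cosine. The function name circlePositions and the parameters radius, cx and cy are made up for this sketch.

```javascript
// Place `count` items evenly on a circle, starting at the top and moving clockwise.
// `radius`, `cx` and `cy` are illustrative parameters, not values from the project.
function circlePositions(count, radius, cx = 0, cy = 0) {
  const step = (2 * Math.PI) / count; // angle between neighbors, in radians
  return Array.from({ length: count }, (_, i) => {
    // Subtract a quarter turn so that index 0 sits at 12 o'clock.
    // In screen/SVG coordinates the y axis points down, so -90° is "up".
    const angle = i * step - Math.PI / 2;
    return {
      x: cx + radius * Math.cos(angle),
      y: cy + radius * Math.sin(angle),
    };
  });
}

const positions = circlePositions(4, 100);
```

With four items and a radius of 100, the first item lands at the top of the circle and the second on the right-hand side, up to floating-point rounding.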

d3.cluster creates node-link diagrams that place the leaf nodes of the tree at the same depth. If you pass a root to the cluster layout, it adds x & y values to each descendant, which in our case means all the mangoes. The x and y attributes can then be treated as an angle and a radius to produce a radial layout. Since we only have one level of children nodes, these attributes can be used to arrange the mangoes in a circle.

Here are my steps:

(1) I manipulated the data by adding a parent data point. To do so, I added a “parent” column as well as a new row (first data row above) representing the parent of all the mango varieties.

(2) I converted the CSV file into a JSON file, and stored the array of objects in a JavaScript variable (mangoes) in its own file.

(3) Next I needed to convert the data from tabular to hierarchical data. I did this part directly in JavaScript using D3. I transformed the data from a flat list into a hierarchy with d3.stratify, so that there is one root, which is the parent node, and each mango is a leaf node. Some layouts in D3, such as d3.tree or in our case d3.cluster, require data in a hierarchical format.
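To make the shape of this transformation concrete, here is a dependency-free sketch of roughly what happens when flat rows with a parent column are turned into a tree. It is not D3's implementation: in the project this job is done by d3.stratify, which exposes id and parentId accessors for the same purpose. The helper toHierarchy, the placeholder row names, and the choice of name_en as the id column are all assumptions made for the example.

```javascript
// Hypothetical rows: the first row is the added parent, the rest are mangoes.
// The names are placeholders, not entries from the real dataset.
const rows = [
  { name_en: "mangoes", parent: "" },
  { name_en: "Mango A", parent: "mangoes" },
  { name_en: "Mango B", parent: "mangoes" },
  { name_en: "Mango C", parent: "mangoes" },
];

// Build a tree from flat rows by linking each row to the row named in `parent`.
function toHierarchy(rows, id = (d) => d.name_en, parentId = (d) => d.parent) {
  const nodes = new Map(rows.map((d) => [id(d), { data: d, children: [] }]));
  let root = null;
  for (const node of nodes.values()) {
    const pid = parentId(node.data);
    if (!pid) {
      // A row without a parent is the root; there must be exactly one.
      if (root) throw new Error("multiple roots");
      root = node;
    } else {
      const parent = nodes.get(pid);
      if (!parent) throw new Error(`missing parent: ${pid}`);
      node.parent = parent;
      parent.children.push(node);
    }
  }
  if (!root) throw new Error("no root");
  return root;
}

const root = toHierarchy(rows);
// With a single level of children, every child of the root is a leaf.
const leaves = root.children.filter((n) => n.children.length === 0);
```

Here root wraps the parent row, and leaves holds one node per mango, each with a parent reference back to the root.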

(4) D3’s cluster layout is normally used to produce dendrograms. To produce a radial layout instead, we can pass [360, radius] as the size, which corresponds to a breadth of 360° and a depth of radius. In this case I passed in null for the radius, since the x & y positions are calculated in the next step. We then pass the root from the previous step to the cluster layout and keep only the leaf nodes (we only have the one parent node plus one level of children nodes), which gives us the mango data we will work with.

(5) Next I added further attributes to each node:

The centerAdjustment should be subtracted from the x position so that the first mango starts at the top center. The distance between two mangoes can then be calculated from the difference in that attribute between the first two mangoes (or any two neighboring mangoes).

I wrote a function to calculate the other attributes we need for each node, since this has to be done every time the data updates, that is, whenever a user sorts the mangoes or clicks on one.
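The sketch below shows one way such a function could look. It is not the project's code. It assumes the mangoes are evenly spaced, that each one sits in the middle of its angular slot (so the adjustment is half a step), and that positions are recomputed from the array order after sorting. The function name layoutOnCircle, the output fields angle, cx and cy, and the sample values are all hypothetical.

```javascript
// Recompute the angle and screen position of each mango from its index in `nodes`.
function layoutOnCircle(nodes, radius) {
  const step = 360 / nodes.length;   // degrees between neighboring mangoes
  const centerAdjustment = step / 2; // assumption: each mango sits mid-slot
  return nodes.map((node, i) => {
    const slotAngle = (i + 0.5) * step;               // evenly spaced, in degrees
    const angle = slotAngle - centerAdjustment;       // first mango at 0° (top center)
    const radians = ((angle - 90) * Math.PI) / 180;   // rotate so 0° points up
    return {
      ...node,
      angle,
      cx: radius * Math.cos(radians),
      cy: radius * Math.sin(radians),
    };
  });
}

// Hypothetical data: names and sweetness values are made up for the example.
const mangoes = [
  { name_en: "Mango A", sweetness_brix: 12 },
  { name_en: "Mango B", sweetness_brix: 18 },
  { name_en: "Mango C", sweetness_brix: 15 },
  { name_en: "Mango D", sweetness_brix: 10 },
];

// Sort a copy (sweetest first), then lay the sorted mangoes out again.
const bySweetness = (a, b) => b.sweetness_brix - a.sweetness_brix;
const laidOut = layoutOnCircle([...mangoes].sort(bySweetness), 200);
```

Because the layout depends only on array order, the same function can be called again after sorting by a different field such as size_cm.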