Creating a Sankey Diagram in R

4 min read · Jun 5, 2024

Use R to quickly create an interactive Sankey Chart.

The Sankey Diagram. Image by the author.

What is it?

Have you come across the Sankey diagram?

Sankey diagrams are a powerful visualization tool for showing flow, movement, and change from one state to another.

Its components make the graphic easy to read:

Each node is a category.
Colors help us tell the categories apart.
Each node's size and each band's width are proportional to the amount flowing through them.

In other words, it is a great visualization tool that helps determine which portion of a whole went to each category.
Let’s see an example.

Code

Let’s say you want to show the components of a company’s Total Sales in a given year.
If you plot it in a Sankey Diagram, it’s easy to see because the width of each band is proportional to the amount it carries.

First, load the libraries needed to plot the graphic.

# Libraries
library(networkD3)
library(dplyr)

Next, let’s create a data frame. And here is the catch: you should always think about how the nodes connect to each other when creating your data frame.

# A connection data frame is a list of flows with the intensity of each flow
links <- data.frame(
  source = c("TOTAL SALES", "TOTAL SALES",
             "Products", "Products",
             "Services", "Services"),
  target = c("Products", "Services",
             "Product A", "Product B",
             "Maintenance", "Upgrade"),
  value = c(22, 5, 10, 12, 2, 3)
)

Here, we start the flow from TOTAL SALES. TOTAL SALES is our source, and Products and Services are our targets, so we already need two rows in the data frame: TOTAL SALES | Products and TOTAL SALES | Services.

Next, Products breaks down into Product A and Product B. Now Products is the source, and Product A and Product B are the targets. Thus, two more rows.

The same applies to Services. Then we add the amount of each flow in the value column. This is what the data frame should look like so far.

Data with sources and targets. Image by the author.
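Since every node has to connect sensibly to the others, it can be worth checking that the flows balance: whatever enters an intermediate node such as Products should equal what leaves it. The sketch below is one possible check in base R and is not part of the original workflow; the names inflow, outflow, intermediate, and balance are made up for illustration.

```r
# Rebuild the links data frame so this block runs on its own
links <- data.frame(
  source = c("TOTAL SALES", "TOTAL SALES", "Products", "Products", "Services", "Services"),
  target = c("Products", "Services", "Product A", "Product B", "Maintenance", "Upgrade"),
  value  = c(22, 5, 10, 12, 2, 3),
  stringsAsFactors = FALSE
)

# Total value entering each node (as a target) and leaving each node (as a source)
inflow  <- tapply(links$value, links$target, sum)
outflow <- tapply(links$value, links$source, sum)

# Intermediate nodes appear both as a target and as a source
intermediate <- intersect(names(inflow), names(outflow))

# Compare what comes in with what goes out for those nodes
balance <- data.frame(
  node    = intermediate,
  inflow  = as.numeric(inflow[intermediate]),
  outflow = as.numeric(outflow[intermediate]),
  stringsAsFactors = FALSE
)
balance$balanced <- balance$inflow == balance$outflow
print(balance)
```

With the numbers used here, Products receives 22 and passes on 10 + 12, and Services receives 5 and passes on 2 + 3, so both rows should come out as balanced.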

Now, to finalize the visual, the networkD3 library needs the sources and targets encoded as zero-based numeric IDs that point to rows of a nodes data frame. The next lines build that nodes data frame and encode the sources and targets.

# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(
  name = c(as.character(links$source),
           as.character(links$target)) %>% unique()
)

# networkD3 expects connections as zero-based IDs, not as the names used in the links data frame,
# so we look up each name's position in nodes and subtract 1
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

Encoded data with sources and targets. Image by the author.
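The subtraction of 1 is there because match() returns 1-based positions, while networkD3 expects IDs that start at 0. One way to convince yourself the encoding is right is to map the IDs back to names, as in this small sketch. It uses base R only (unique() is called directly instead of through the pipe), and the variable names ids, source_ok, and target_ok are illustrative.

```r
# Rebuild the links and nodes so this block runs on its own
links <- data.frame(
  source = c("TOTAL SALES", "TOTAL SALES", "Products", "Products", "Services", "Services"),
  target = c("Products", "Services", "Product A", "Product B", "Maintenance", "Upgrade"),
  value  = c(22, 5, 10, 12, 2, 3),
  stringsAsFactors = FALSE
)
nodes <- data.frame(
  name = unique(c(links$source, links$target)),
  stringsAsFactors = FALSE
)
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

# Zero-based IDs should start at 0 and stay below the number of nodes
ids <- c(links$IDsource, links$IDtarget)
range(ids)
nrow(nodes)

# Adding 1 back should recover the original names
source_ok <- all(nodes$name[links$IDsource + 1] == links$source)
target_ok <- all(nodes$name[links$IDtarget + 1] == links$target)
c(source_ok, target_ok)
```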

And finally, let’s build the visual.

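# Make the network
# Source, Target, Value and NodeID are the column names to read from links and nodes
# sinksRight = FALSE means the last nodes are not forced to the right border of the plot
p <- sankeyNetwork(Links = links, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "value", NodeID = "name",
                   sinksRight = FALSE,
                   fontSize = 12, fontFamily = "Arial Black")
# View
p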
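Since the chart is an interactive HTML widget, you may also want to keep it as a file you can open in a browser. The sketch below is one way to do that, assuming the htmlwidgets package (which is installed together with networkD3). The unit label "M USD" and the output file name are assumptions for illustration only; the example data does not say what the numbers measure.

```r
library(networkD3)
library(htmlwidgets)

# Same flows and zero-based encoding as above, rebuilt so the block runs on its own
links <- data.frame(
  source = c("TOTAL SALES", "TOTAL SALES", "Products", "Products", "Services", "Services"),
  target = c("Products", "Services", "Product A", "Product B", "Maintenance", "Upgrade"),
  value  = c(22, 5, 10, 12, 2, 3),
  stringsAsFactors = FALSE
)
nodes <- data.frame(
  name = unique(c(links$source, links$target)),
  stringsAsFactors = FALSE
)
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

# units is shown next to the value in the hover tooltip ("M USD" is an assumed label)
p <- sankeyNetwork(Links = links, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "value", NodeID = "name",
                   units = "M USD", sinksRight = FALSE,
                   fontSize = 12)

# Write the widget to an HTML file in a temporary folder (hypothetical file name)
# selfcontained = FALSE keeps the JavaScript in a side folder and does not need pandoc
out_file <- file.path(tempdir(), "sankey.html")
saveWidget(p, file = out_file, selfcontained = FALSE)
out_file
```

Opening the saved file in a browser should give the same chart, with hover tooltips on the bands.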

Before You Go

Now you have another visual tool in your hands. Make good use of it.

Sankey Charts are a great resource for displaying parts of a whole or for showing how a resource was distributed across categories or groups. Observe how intuitive it is to see how much of the Total Sales went to Products and Services.
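To put numbers on what the band widths show, the shares can be computed directly from the two first-level flows used in this tutorial. This is just a quick arithmetic sketch; the names flows and shares are illustrative.

```r
# First-level flows out of TOTAL SALES, taken from the links data frame above
flows <- c(Products = 22, Services = 5)

# Share of the total carried by each flow, as a percentage rounded to one decimal
shares <- round(100 * flows / sum(flows), 1)
shares
# Products: 81.5, Services: 18.5
```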

Of course, like any other chart, it won’t be the best option every time. When there are too many variables and categories, it can get messy pretty quickly. So keep that in mind.

If you liked this quick tutorial, follow me for more content.

Gustavo Santos - Medium: gustavorsantos.medium.com

Find me on LinkedIn as well.

Reference

Sankey diagram - Wikipedia. en.wikipedia.org

Sankey Plot | the R Graph Gallery. r-graph-gallery.com