An alluvial diagram (similar to a Sankey diagram) visualizes data flowing from one set of values to another. The things being connected are called nodes, and the connections between them are called links. The height of a node is proportional to the amount of data in that category or stage, and the thickness of a link is proportional to the amount of data that flows between the two nodes it connects.
Use an alluvial diagram when you want to show how the same data moves across several categorical dimensions. In other words, it shows how proportions of the same quantity change between various characteristics. For example, the same group of people could be broken down by different dimensions, like pet preference, age, country of origin, and so on. The changes between dimensions could also happen over time.
An alluvial diagram can help to highlight patterns and trends that might not be immediately obvious from raw data. This diagram also makes it easier to understand how different factors are influencing the overall outcome.
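To make the node and link terminology concrete, here is a minimal sketch of how raw records could be turned into node totals and link counts. The records, the dimension names, and the function names are hypothetical and only serve as an illustration; this isn't the API of any particular charting library.

```python
from collections import Counter

# Hypothetical records: each row is one person described along three dimensions.
records = [
    {"pet": "cat", "age": "18-30", "country": "AT"},
    {"pet": "cat", "age": "31-50", "country": "AT"},
    {"pet": "dog", "age": "18-30", "country": "DE"},
    {"pet": "dog", "age": "18-30", "country": "AT"},
    {"pet": "cat", "age": "18-30", "country": "DE"},
]
dimensions = ["pet", "age", "country"]


def node_totals(records, dimensions):
    """Count the records in each (dimension, value) node."""
    totals = Counter()
    for row in records:
        for dim in dimensions:
            totals[(dim, row[dim])] += 1
    return totals


def build_links(records, dimensions):
    """Count the records that flow between nodes of adjacent dimensions."""
    links = Counter()
    for row in records:
        for left, right in zip(dimensions, dimensions[1:]):
            links[((left, row[left]), (right, row[right]))] += 1
    return links


totals = node_totals(records, dimensions)
links = build_links(records, dimensions)
```

Because every record appears exactly once in each dimension, the node totals of every dimension add up to the same number. That's the property that lets the diagram show proportions of the same quantity.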
Use alluvial diagrams for visualizing the flow of data or information between different categories or stages.
Label each category or stage clearly and consistently.
Don't overcomplicate your diagram with too many categories or stages. Don't use more than five stages, and don't use more than five options per stage.
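If you prepare the data programmatically, you could check it against these limits before rendering. The following is a rough sketch; the `check_complexity` function and the shape of its input are assumptions made for this example.

```python
MAX_STAGES = 5             # guideline: no more than five stages
MAX_OPTIONS_PER_STAGE = 5  # guideline: no more than five options per stage


def check_complexity(stages):
    """Return a list of warnings for a dict of stage name -> list of options."""
    warnings = []
    if len(stages) > MAX_STAGES:
        warnings.append(f"{len(stages)} stages exceed the limit of {MAX_STAGES}")
    for name, options in stages.items():
        if len(options) > MAX_OPTIONS_PER_STAGE:
            warnings.append(
                f"stage '{name}' has {len(options)} options, "
                f"limit is {MAX_OPTIONS_PER_STAGE}"
            )
    return warnings
```

An empty list means the data stays within the recommended limits.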
Spacing and positioning
Use 16px for the thickness of the node, and use 8px for the distance between the node and its label. Center-align each label with the node it belongs to. If a node value is very small, keep its height to a minimum of 2px. Use 16px between nodes when the node height is greater than the height of its label. Otherwise, make sure there is 8px distance between the text labels of the closest nodes.
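The vertical spacing rules can be sketched as a small layout routine. This is one possible reading of the rules, not a reference implementation: the label height of 20px, the `px_per_unit` scale, and the function names are assumptions made for this example.

```python
MIN_NODE_HEIGHT_PX = 2   # very small values still get a visible node
NODE_GAP_PX = 16         # gap between nodes that are taller than their labels
LABEL_GAP_PX = 8         # minimum gap when a label is taller than its node
LABEL_HEIGHT_PX = 20     # assumption: height of one label, not from the guideline


def node_height(value, px_per_unit):
    """Scale a value to pixels, but never go below the minimum height."""
    return max(MIN_NODE_HEIGHT_PX, value * px_per_unit)


def overhang(height):
    """How far a centered label sticks out above or below its node."""
    return max(0.0, (LABEL_HEIGHT_PX - height) / 2)


def layout_column(values, px_per_unit):
    """Return a (top, height) tuple for each node, stacked top to bottom.

    Zero values are expected to be filtered out by the caller.
    """
    positions = []
    y = 0.0
    prev_height = None
    for value in values:
        height = node_height(value, px_per_unit)
        if prev_height is not None:
            if prev_height > LABEL_HEIGHT_PX and height > LABEL_HEIGHT_PX:
                y += NODE_GAP_PX
            else:
                # Keep 8px clear of any label that extends past its node.
                y += LABEL_GAP_PX + overhang(prev_height) + overhang(height)
        positions.append((y, height))
        y += height
        prev_height = height
    return positions
```

For example, `layout_column([100, 50, 1], 1.0)` places the first two nodes 16px apart, and pushes the third, 2px-high node further down so that its centered label doesn't crowd the node above it.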
By default, we place all labels to the right of their nodes, except for those of the first category, which go to the left of their nodes. This way, the diagram is easier to read. However, you may place them all on the right if that works better for your use case.
For very complex alluvial diagrams, the default way of linking nodes causes them to overlap. We solve this by changing the way links are formed, in particular how they are curved. This solution reduces clutter, and brings clarity when interpreting an alluvial diagram with many nodes.
When a diagram has a lot of nodes, change the way the links are curved to avoid overlapping.
It's much harder to read the diagram when links overlap and don't curve.
Node labels order
By default, the labels are positioned in the following order: the name of the stage is first, its corresponding value is second. This way, when end users are reading the diagram for the first time, they start by scanning the name to get the context of the stage. Only afterwards do they look at its corresponding value.
We offer the option to switch the order of node labels. In this case, the value is shown first, and the corresponding stage second. We advise doing this when end users view the alluvial diagram on a regular basis. In this case, we assume that they're familiar with the categories, so it makes sense to emphasize the number by displaying it first.
By default, we display the name of the stage first. This way, end users first scan the name to understand the context of the diagram.
It's possible to display the value first. Use this option when end users regularly interact with the diagram.
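The two label orders can be illustrated with a small helper. The function name and the `value_first` flag are hypothetical and chosen only for this sketch.

```python
def format_node_label(name, value, value_first=False):
    """Build a node label with the name first (default) or the value first."""
    parts = [str(value), name] if value_first else [name, str(value)]
    return " ".join(parts)


default_label = format_node_label("Cats", 400)                    # name first
frequent_label = format_node_label("Cats", 400, value_first=True)  # value first
```

Keeping the order behind a single flag makes it easy to switch the whole diagram once end users are familiar with the categories.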
Color should be used with purpose. If your main message is to show the different proportions at different stages, just use one color throughout the alluvial diagram. If your main message is to highlight a specific flow or category, use a different color to draw attention to it. If the meaning behind a category has a negative or positive connotation, and you want to highlight that, then consider using a "storytelling" color. This could be green, orange, or red. For color details, see our page on colors.
Only add color if it adds meaning to the visualization.
Adding multiple colors for different nodes is misleading, as it suggests additional meaning.
To tell a better story with colors, we recommend using a gradient for the links. This way, the color emphasizes the change of a state, and is connected visually to the beginning and the end of the respective nodes.
A gradient emphasizes the flow of the nodes, and also makes it easier to show that the label of the node is a summary of all scenarios.
Using a gradient isn't mandatory. However, be aware that having two nodes of the same color might cause confusion about the label associated with each node.
The "one node, one color" rule is the default way of visualizing one stage in a flow. However, in some scenarios there could be an exception. In that case, we recommend combining colors into one node, which resembles a stacked bar chart. Only do this for nodes that are either at the beginning or at the end of the diagram.
It's possible to add a multi-colored node, to add an extra dimension or to emphasize a certain message.
Don't add multi-colored nodes in the middle of the flow, as this will make it more difficult to interpret.
When hovering over a node, the closest neighbor nodes are highlighted. When hovering over a link, the two nodes around it are highlighted. The numbers corresponding to the highlighted nodes are also updated according to the highlighted relations. See some examples in the images below.
It's possible to highlight the full flow of the node or link that's being hovered. In that case, it's essential to supply the diagram with extra data, so all possible relationship combinations are calculated.
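To illustrate why the extra data is needed: counts between adjacent stages alone can't tell you which items continue from one stage to the next, so the full path of every item has to be known. The sketch below assumes the raw records are available; the records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records: each row is one person described along three dimensions.
records = [
    {"pet": "cat", "age": "18-30", "country": "AT"},
    {"pet": "cat", "age": "31-50", "country": "AT"},
    {"pet": "dog", "age": "18-30", "country": "DE"},
    {"pet": "dog", "age": "18-30", "country": "AT"},
    {"pet": "cat", "age": "18-30", "country": "DE"},
]
dimensions = ["pet", "age", "country"]


def path_counts(records, dimensions):
    """Count how many records follow each complete path through all stages."""
    return Counter(tuple(row[dim] for dim in dimensions) for row in records)


def highlight_full_flow(paths, dimensions, hovered_dim, hovered_value):
    """Return the highlighted value of every node reached from the hovered node."""
    index = dimensions.index(hovered_dim)
    highlighted = Counter()
    for path, count in paths.items():
        if path[index] == hovered_value:
            for dim, value in zip(dimensions, path):
                highlighted[(dim, value)] += count
    return highlighted


paths = path_counts(records, dimensions)
dog_flow = highlight_full_flow(paths, dimensions, "pet", "dog")
```

Here, hovering over the hypothetical "dog" node highlights only the part of each later node that dog owners contribute to.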
No data behavior
For an alluvial diagram, it's critical that there are no missing data points. If one or more data points are missing from the data set, then they aren't included in the total. This can result in a misleading diagram. Because of that, we advise against n/a data fields. If you can't avoid missing data, then you can manually add an extra node for the unidentified/undefined category.
If missing data is unavoidable, add a dedicated node for it to tell the whole story.
Without an extra category for the missing data, the diagram becomes misleading. The sum of the items (400+254+150+41 = 845) doesn't equal 901.
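Using the numbers from this example, the size of the extra node is the difference between the expected total and the sum of the known nodes. The sketch below shows one way to compute it; the node names, the "Undefined" label, and the function name are assumptions made for this example.

```python
def with_missing_node(node_values, expected_total, label="Undefined"):
    """Return a copy of node_values with an extra node for unaccounted data."""
    result = dict(node_values)
    missing = expected_total - sum(node_values.values())
    if missing < 0:
        raise ValueError("node values add up to more than the expected total")
    if missing > 0:
        result[label] = missing
    return result


# Hypothetical node names, with the values from the example above.
stage = {"A": 400, "B": 254, "C": 150, "D": 41}
complete_stage = with_missing_node(stage, expected_total=901)
```

With the extra node of 56, the stage adds up to 901 again, so it matches the totals of the other stages.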
Zero value behavior
Typically, if a value is zero it's not shown. However, if your use case requires the explicit display of a zero value, then we recommend using a height of 2px and making the color gray.
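The zero-value rule could be expressed as follows. This is only a sketch: the `show_zero` flag, the default color, and the returned dictionary are hypothetical and don't refer to an existing component API.

```python
MIN_NODE_HEIGHT_PX = 2


def node_style(value, px_per_unit, show_zero=False, color="blue"):
    """Return the height and color for a node, or None if it's hidden."""
    if value == 0:
        if not show_zero:
            return None  # default: zero values aren't shown
        return {"height": MIN_NODE_HEIGHT_PX, "color": "gray"}
    return {"height": max(MIN_NODE_HEIGHT_PX, value * px_per_unit), "color": color}
```

Returning `None` for hidden nodes lets the calling code skip them when it lays out a stage.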