A parallel coordinates plot is one of the tools for visualizing multivariate data. Every observation in a dataset is represented by a polyline that crosses a set of parallel axes, one for each variable in the dataset. You can create such plots in R with the function `parcoord` from package MASS. For example, here is such a plot for the built-in dataset mtcars:

```
library(MASS)        # parcoord()
library(colorRamps)  # blue2red()

data(mtcars)
k <- blue2red(100)          # 100 colors from blue to red
x <- cut(mtcars$mpg, 100)   # bin mpg into 100 intervals

op <- par(mar = c(3, rep(.1, 3)))
parcoord(mtcars, col = k[as.numeric(x)])
par(op)
```

This produces the plot below. The lines are colored with a blue-to-red color ramp according to the miles-per-gallon variable (lowest values in blue, highest in red).
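If package colorRamps is not installed, a similar ramp can be built with base R's `colorRampPalette()`. This is a swapped-in palette function, so the exact shades may differ slightly from `blue2red()`; the rest mirrors the code above, with `bins` as a hypothetical name for the mpg bin index.

```r
library(MASS)  # parcoord()

data(mtcars)
# 100 colors interpolated between blue and red (base grDevices)
k <- colorRampPalette(c("blue", "red"))(100)
# cut() splits the range of mpg into 100 equal-width bins;
# as.integer() turns the factor into the bin index 1..100
bins <- as.integer(cut(mtcars$mpg, 100))

op <- par(mar = c(3, rep(0.1, 3)))
parcoord(mtcars, col = k[bins])
par(op)
```

The car with the lowest mpg lands in bin 1 (pure blue) and the one with the highest mpg in bin 100 (pure red).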

What to do if some of the variables are categorical? One approach is to use polylines of different widths. Another is to add some random noise (jitter) to the values. The Titanic data is a cross-classification of Titanic passengers according to class, gender, age, and survival status (survived or not), so all variables are categorical. Let’s try the jittering approach. After converting the cross-classification (an R `table`) to a data frame, we “blow it up” by repeating each row according to its frequency in the table.

```
library(MASS)          # parcoord()
library(RColorBrewer)  # brewer.pal()

data(Titanic)
# convert to data frame of numeric variables
titdf <- as.data.frame(lapply(as.data.frame(Titanic), as.numeric))
# repeat obs. according to their frequency
titdf2 <- titdf[rep(1:nrow(titdf), titdf$Freq), ]
# new columns with jittered values
titdf2[, 6:9] <- lapply(titdf2[, 1:4], jitter)
# colors according to survival status, with some transparency
k <- adjustcolor(brewer.pal(3, "Set1")[titdf2$Survived], alpha = .2)
op <- par(mar = c(3, 1, 1, 1))
parcoord(titdf2[, 6:9], col = k)
par(op)
```

This produces the following (red lines are for passengers who did not survive):

It is not so easy to read, is it? Did the majority of 1st class passengers (bottom category on the leftmost axis) survive or not? Most women from that class definitely did, but what about in aggregate?
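The aggregate answer can be read off the table directly, without any plot: collapse Titanic over Sex and Age and take row proportions. A minimal sketch using only base R's `margin.table()` and `prop.table()`; `class_surv` is just an illustrative name.

```r
data(Titanic)
# Titanic is a 4-way table: Class x Sex x Age x Survived.
# Keep dimensions 1 (Class) and 4 (Survived), summing over the rest.
class_surv <- margin.table(Titanic, c(1, 4))
class_surv
# Row proportions: share of each class that did / did not survive
round(prop.table(class_surv, 1), 3)
```

With the standard R Titanic table, 203 of the 325 1st class passengers (about 62%) survived.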

At this point it would be nice to draw segments for different groups of passengers instead of a bunch of lines. Later I learned that such a plot exists and even has a name: alluvial diagram. Alluvial diagrams seem to be related to the Sankey diagrams blogged about on R-bloggers recently, e.g. here. What is more, I was not alone in wondering how to create such a thing with R, see for example here. Later I found that what I needed is a “parallel set” plot, as it was called, and implemented, on CrossValidated here. That looks terrific to me; nevertheless, I would still like:

• The axes to be vertical. If the variables correspond to measurements at different points in time, then we should get nice flows from left to right.
• The segments to be smooth curves, e.g. splines or Bezier curves.

And so I wrote a prototype function `alluvial` (tadaaa!), now in the package alluvial on GitHub. I relied strongly on code by Aaron from his answer on CrossValidated (hat tip).

See the following examples of using `alluvial` on Titanic data:

First, using just two variables, Class and Survived, with the stripes drawn as simple polygons.

This was produced with the code below.

```
# load packages and prepare data
library(alluvial)
tit <- as.data.frame(Titanic)

# only two variables: class and survival status
tit2d <- aggregate(Freq ~ Class + Survived, data = tit, sum)

alluvial(tit2d[, 1:2], freq = tit2d$Freq, xw = 0.0, alpha = 0.8,
         gap.width = 0.1, col = "steelblue", border = "white",
         layer = tit2d$Survived != "Yes")
```

The function accepts data as a (collection of) vectors or data frames. The `xw` argument specifies the position of the knots of the xspline relative to the axes. If positive, the knot is further away from the axis, which makes the stripes run horizontally for longer before turning towards the other axis. The `gap.width` argument specifies the distance between categories on the axes.
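To get a feel for what moving the knots does, here is a sketch using base graphics' `xspline()` directly. It is not the package's internal code: `stripe_edge()` is a hypothetical helper, and placing the two inner control points exactly `xw` away from each axis is an assumption based on the description above.

```r
# Hypothetical helper: draw one edge of a stripe from an axis at x = 1
# to an axis at x = 2, with inner control points `xw` away from each axis.
stripe_edge <- function(y_from, y_to, xw = 0.1, draw = TRUE, ...) {
  # Inner control points share the heights of the end points, so a
  # larger `xw` keeps the curve flat for longer near each axis.
  xs <- c(1, 1 + xw, 2 - xw, 2)
  ys <- c(y_from, y_from, y_to, y_to)
  # shape = 1 gives a smooth, approximating X-spline
  xspline(xs, ys, shape = 1, open = TRUE, repEnds = TRUE, draw = draw, ...)
}

plot(NA, xlim = c(1, 2), ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "")
abline(v = c(1, 2), col = "gray")  # the two axes
xws  <- c(0, 0.2, 0.4)
cols <- c("black", "steelblue", "red")
for (i in seq_along(xws)) {
  stripe_edge(0.1, 0.9, xw = xws[i], border = cols[i])
}
```

With `draw = FALSE`, `xspline()` returns the computed curve points instead of drawing them, which is handy for inspecting the shape.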

Another example shows the whole Titanic data, with red stripes for those who did not survive.

Now it’s possible to see that, e.g.:

• About 62% of 1st class passengers survived
• Women who did not survive come almost exclusively from 3rd class
• etc.
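The second observation can be checked against the table in the same way. This base R sketch counts women who did not survive by class, summing over Age; `f_died` is an illustrative name.

```r
data(Titanic)
# Female non-survivors: a Class x Age matrix, then sum over Age
f_died <- margin.table(Titanic[, "Female", , "No"], 1)
f_died
# Share of female non-survivors coming from each class
round(prop.table(f_died), 2)
```

3rd class accounts for 106 of the 126 women who did not survive, about 84%.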

The plot was produced with:

```
alluvial(tit[, 1:4], freq = tit$Freq, border = NA,
         hide = tit$Freq < quantile(tit$Freq, .50),
         col = ifelse(tit$Survived == "No", "red", "gray"))
```

In this variant the stripes have no borders, color transparency is at 0.5, and for the purpose of the example the plot shows only “thickest” 50% of the stripes (argument `hide`).
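To see what the `hide` condition actually removes, it can be evaluated on its own. A small base R sketch that rebuilds `tit` as above and counts hidden groups and the passengers in them:

```r
data(Titanic)
tit <- as.data.frame(Titanic)  # 32 rows, one per Class/Sex/Age/Survived cell
# Same condition as passed to `hide` above: TRUE means "do not draw"
hide <- tit$Freq < quantile(tit$Freq, 0.50)
c(shown = sum(!hide), hidden = sum(hide))
# Number of passengers in the hidden groups
sum(tit$Freq[hide])
```

For this table the median of `Freq` is 13.5, so 16 of the 32 groups are hidden, yet together they contain only 63 of the 2201 passengers.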

Compared to the parallel set solution mentioned earlier, the main differences are:

• Axes are vertical instead of horizontal
• I used `xspline` to draw the “stripes”
• With argument `hide` you can skip plotting selected groups of cases

If you have suggestions or ideas for extensions/modifications, let me know on Github!

Stay tuned for more examples from panel data.

20 Responses
June 9, 2014

LOVE your alluvial package! I am very interested in making a graph like this: http://www.nature.com/srep/2012/120801/srep00551/fig_tab/srep00551_F7.html. This graph has a series of categories for each year, and these categories may change in abundance across years with some frequency. I can’t seem to arrange my data by year and abundance in such a way that your package will work. Any suggestions?

Here is what my data looks like.
Class, 1991, 2003, 2005, 2009, 2011, 2013
1, 818, 604, 601, 570, 563, 556
2, 183, 147, 145, 143, 142, 150
3, 40, 55, 55, 55, 55, 50
4, 48, 70, 81, 76, 85, 99
5, 126, 140, 142, 148, 155, 153
6, 396, 566, 568, 566, 525, 508
7, 158, 189, 153, 98, 118, 123
8, 244, 206, 238, 296, 269, 247
9, 83, 91, 85, 90, 86, 92
10, 76, 88, 88, 89, 89, 91
11, 28, 30, 30, 30, 30, 30
12, 0, 14, 14, 39, 83, 101

Apologies if this is not the right platform for such a question, but with your package in development I wasn’t sure where else to go with my questions. THANKS!

• August 7, 2014

I think your data does not really fit an alluvial diagram because it is cross-sectional. We do not know all the flows between the classes. For example, we do not know in which classes the 818 - 604 = 214 people who left class 1 between 1991 and 2003 ended up…
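A toy illustration of why yearly totals alone are not enough: two hypothetical transition tables (made-up numbers, not from the data above) with identical row and column totals but very different flows.

```r
# Rows: class at time 1, columns: class at time 2 (hypothetical counts)
a <- matrix(c(10,  0,
               0, 10), nrow = 2, byrow = TRUE,
            dimnames = list(t1 = c("A", "B"), t2 = c("A", "B")))
b <- matrix(c(5, 5,
              5, 5), nrow = 2, byrow = TRUE, dimnames = dimnames(a))
# Identical totals per class in each year (all a cross-sectional table records)
rowSums(a); rowSums(b)
colSums(a); colSums(b)
# ...but nobody changes class in `a`, while half of each class moves in `b`
a; b
```

An alluvial diagram draws the cells of such a transition table, so it needs data that identifies them.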

August 14, 2014

Michal,

Thanks for the replies. There may have been some confusion in my earlier post. Because I was looking at vegetation at fixed points across time (n=3700 points), I know how each class transitioned. I ended up using your tool to create transitions between each pair of consecutive sample years, then I merged the images in Adobe Illustrator. With a little bit of Illustrator magic I created a nice graph. Again, thanks!! I do have a question about how you would like it referenced. I currently have it like this:

“These inter-annual transition tables were developed into an alluvial diagram to visually examine the type and intensity of disturbance and recovery pathways of habitat shift between the series of sampling events. The alluvial tool is in development in R (https://github.com/mbojan/alluvial).”

Do you have something better for a reference?
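For point-level data like this, the pairwise plots need not be glued together by hand: `alluvial()` draws one axis per column of the data frame it receives (as with the four Titanic variables in the post), so all years can be cross-tabulated at once. A sketch on simulated data; the data frame `veg`, its column names, and the classes are hypothetical stand-ins for real survey data.

```r
# Hypothetical point-level data: one row per sampling point, one column
# per survey year holding the class observed there (simulated here)
set.seed(1)
classes <- c("A", "B", "C")
veg <- data.frame(
  y1991 = sample(classes, 200, replace = TRUE),
  y2003 = sample(classes, 200, replace = TRUE),
  y2005 = sample(classes, 200, replace = TRUE)
)
# Cross-tabulate all years at once; keep only paths that actually occur
flows <- as.data.frame(table(veg))
flows <- flows[flows$Freq > 0, ]
# One axis per year; stripe width = number of points following that path
if (requireNamespace("alluvial", quietly = TRUE)) {
  alluvial::alluvial(flows[, c("y1991", "y2003", "y2005")], freq = flows$Freq)
}
```

With many classes and years the plot may get crowded; the `hide` argument shown in the post can skip rare paths.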

• November 24, 2014

Hi Bill

The package has just been updated with a function I’ve written to handle this kind of data. Is this what you were after?

Robin

December 19, 2014

That is great! (I just got this update via email last week.) I will use your new bit of code in the future.

I ended up gluing the individual transitions between years in Illustrator to make my publication figure. The paper is currently in review, but I want to make sure I reference you properly before it goes final. This is all I have currently: “The alluvial graphical tool is currently in development in R (https://github.com/mbojan/alluvial).” Would you like something better?

Keep up the great work!

Bill

June 11, 2014

Sorry for the typos in my last post; it was a late night. I played with the apps at http://www.mapequation.org/. They add one year at a time and build the alluvial charts by adding together each section, but not with frequency-type data like in my example in my last post or in the Titanic data set. I ended up doing the same thing with your code and my annual frequency data set, by creating multiple graphs and attaching them in Photoshop. You have a great tool here, and I am sure the folks at igraph (http://igraph.org/redirect.html) would include it in their package. It would be nice if we could make multi-year graphs, though. Maybe you were trying to get there in the POLPAN example, but I could not get that to work. The POLPAN data was not included, and when I found it on the GESIS site, I couldn’t make it work then either. THANKS for your efforts!!

June 13, 2014

Hi, Michał!

That’s awesome work (as usual)!

The third picture reminded me of a similar task I need to solve.

I need to visualize the results of my project on regrouping students. In particular, I need to show the connections between “old” and “new” groups to demonstrate how many people from each of the “old” groups moved to each of the new groups. The only difference from your graph is that I want to show the numbers of people in the groups (squares) and the numbers on the “ties”.

How easy is it to modify your graph for that purpose? Or should I rather use an SNA visualization package/software like Gephi or igraph?

July 7, 2014

Hi,
I wanted to test your package alluvial in order to make some graphs, but as I have no real competence with R, I had problems installing it.
I can do the install with “Installer depuis le fichier .zip” (install from .zip file) and it appears in the win-library, but when I call library(alluvial) or library(), it says “‘alluvial’ n’est pas un nom correct de package installé” (‘alluvial’ is not a valid installed package). It seems to be present, otherwise it would say “aucun package trouvé” (no package found).
Could the problem come from the fact that I use a French version of R?
I tried to find an answer, but I did not find anything understandable.
I tried renaming ‘alluvial-master’ to ‘alluvial’, but even with this change it doesn’t work.
Best regards
A.C. Bronner

• August 7, 2014

Install package ‘devtools’ and then run:

library(devtools)
install_github("mbojan/alluvial")

November 30, 2015

Hi, great example
I tried to download the package and run it, but it is not working with R 3.2.2. Any chance of an update?
Uriel

• November 30, 2015

Hi,

I will need more details from you, as it works with R 3.2.2 for me.

Can you please file an issue on GitHub here https://github.com/mbojan/alluvial/issues, and give more details on how you are installing the package and what errors you are getting?

Thanks

November 30, 2015

Hi Michal, Thanks for the fast reply.
I added an issue on GitHub.
BR

Uriel

December 4, 2015

Just stopped by to say great work! Many thanks.

• December 16, 2015

Thanks! 🙂

June 22, 2016

Awesome job, thank you! Just what I was looking for…

• June 23, 2016

Thanks. If your graph gets published somewhere (paper/web etc.), can you send me a link? I’m curious about what types of problems and data people use `alluvial()` for.