The journal Basic and Applied Social Psychology (BASP) has banned the use of statistical hypothesis testing.
The BASP editorial by Trafimow and Marks is available here.
The story has also been covered by:
And discussed in/by, among others:
Where this will go, I wonder…
Three days ago Nature published a note commenting on the recent heated social media discussion about whether MS Word is better than LaTeX for writing scientific papers. The note refers to a PLOS article by Knauf & Nejasmic reporting a study on word-processor use. The overall result of that study is that participants who used Word took less time and made fewer mistakes in reproducing the probe text than people who used LaTeX.
I find it rather funny that Nature picked up the topic. Such discussions have always seemed rather futile to me (de gustibus non disputandum est, and the fact that some solution A is better or more “efficient” than B does not necessarily lead to A becoming accepted, as in the case of QWERTY vs Dvorak keyboard layouts) and far removed from anything scientific.
As for myself, I like neither Word nor its Linux counterparts (LibreOffice, AbiWord etc.); let’s call them WYSIWYGs. First and foremost, I believe they are very poor text editors (compared to Vim or Emacs): it is cumbersome to navigate and search longer texts. The fact that it is convenient to read a piece of text in, say, Times New Roman does not mean that it is convenient to write in it. Second, when writing in WYSIWYGs I always have the impression that I am handcrafting something: formatting, styles and so on. It is like sculpting: if you don’t like the result, you need to get another piece of wood and start from the beginning. All that seems to counter the main purpose for which computers were developed in the first place: taking over “mechanistic” tasks and leaving “creative” ones to the user.
I like that the Nature note referred to Markdown as an emerging technology for writing [scientific] texts. If you do not know it, Markdown is a lightweight plain-text format, not unlike Wikipedia markup. Texts written in Markdown can be processed to PDF, HTML, MS Word and so on. More and more people are using it for writing articles or even books. It is simple (plain text) and lets you focus on writing.
Last, the note still repeats the popular misconception that one of the downsides of LaTeX is the lack of a spell checker…
ICM, as every year, is organizing internships for students. This year I am looking for a person who would be interested in working on an application for interactive visualization of network data.
We offer work in a young and dynamic group of network researchers, as well as the opportunity to establish contacts with a foreign research team.
Requirements (the first one is a necessary condition; the others will be additional assets):
- Programming in R
- Building Shiny applications
- Familiarity with the D3.js library
- Familiarity with Social Network Analysis (SNA) methods
If you are interested, fill in the form on the ICM website! My topic is number 22.
Parallel coordinates plot is one of the tools for visualizing multivariate data. Every observation in a dataset is represented with a polyline that crosses a set of parallel axes corresponding to variables in the dataset. You can create such plots in R using the function parcoord from package MASS. For example, we can create such a plot for the built-in dataset mtcars:
library(MASS)
library(colorRamps)
data(mtcars)
k <- blue2red(100)
x <- cut(mtcars$mpg, 100)
op <- par(mar=c(3, rep(.1, 3)))
parcoord(mtcars, col=k[as.numeric(x)])
par(op)
This produces the plot below. The lines are colored using a blue-to-red color ramp according to the miles-per-gallon variable.
What to do if some of the variables are categorical? One approach is to use polylines of different widths. Another is to add some random noise (jitter) to the values. The Titanic data is a cross-classification of Titanic passengers according to class, gender, age, and survival status (survived or not), so all the variables are categorical. Let’s try the jittering approach. After converting the cross-classification (an R table) to a data frame, we “blow it up” by repeating observations according to their frequency in the table.
library(RColorBrewer)
data(Titanic)
# convert to data frame of numeric variables
titdf <- as.data.frame(lapply(as.data.frame(Titanic), as.numeric))
# repeat obs. according to their frequency
titdf2 <- titdf[rep(1:nrow(titdf), titdf$Freq), ]
# new columns with jittered values
titdf2[, 6:9] <- lapply(titdf2[, 1:4], jitter)
# colors according to survival status, with some transparency
k <- adjustcolor(brewer.pal(3, "Set1")[titdf2$Survived], alpha=.2)
op <- par(mar=c(3, 1, 1, 1))
parcoord(titdf2[, 6:9], col=k)
par(op)
This produces the following (red lines are for passengers who did not survive):
It is not so easy to read, is it? Did the majority of 1st class passengers (bottom category on the leftmost axis) survive or not? Definitely most of the women from that class did, but in aggregate?
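Incidentally, the aggregate question can be settled directly from the table with base R, summing out the sex and age dimensions for 1st class only:

```r
data(Titanic)
# Titanic dimensions: Class x Sex x Age x Survived;
# take the 1st class slice and sum over sex and age
s <- margin.table(Titanic["1st", , , ], margin = 3)
s
# No: 122, Yes: 203 -- so in aggregate the majority did survive
```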
At this point it would be nice, instead of drawing a bunch of lines, to draw segments for different groups of passengers. Later I learned that such a plot exists and even has a name: an alluvial diagram. It seems related to Sankey diagrams, blogged about on R-bloggers recently, e.g. here. What is more, I was not alone in wondering how to create such a thing with R, see for example here. Later I found that what I need is a “parallel set” plot, as it was called, and implemented, on CrossValidated here. That looks terrific to me; nevertheless, I would still prefer:
- The axes to be vertical. If the variables correspond to measurements at different points in time, then we get nice flows from left to right.
- If only the segments could be smooth curves, e.g. splines or Bezier curves…
See the following examples of using alluvial on Titanic data:
First, just using two variables, Class and Survived, with the stripes being simple polygons.
This was produced with the code below.
# load packages and prepare data
library(alluvial)
tit <- as.data.frame(Titanic)
# only two variables: class and survival status
tit2d <- aggregate(Freq ~ Class + Survived, data=tit, sum)
alluvial(tit2d[, 1:2], freq=tit2d$Freq, xw=0.0, alpha=0.8,
         gap.width=0.1, col="steelblue", border="white",
         layer=tit2d$Survived != "Yes")
The function accepts data as a (collection of) vectors or data frames. The xw argument specifies the position of the knots of xspline relative to the axes. If positive, the knot is further away from the axis, which makes the stripes run horizontally longer before turning towards the other axis. The gap.width argument specifies the distances between categories on the axes.
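To see the effect of these two arguments, here is a small sketch reusing the class-by-survival aggregation from above; the particular values of xw and gap.width are arbitrary, chosen only to exaggerate the effect:

```r
library(alluvial)
tit <- as.data.frame(Titanic)
tit2d <- aggregate(Freq ~ Class + Survived, data = tit, sum)
# xw = 0.2 pushes the xspline knots away from the axes, so stripes run
# horizontally longer before bending; gap.width = 0.3 widens the gaps
# between the categories on each axis
alluvial(tit2d[, 1:2], freq = tit2d$Freq,
         xw = 0.2, gap.width = 0.3,
         col = "steelblue", border = "white")
```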
Another example shows the whole Titanic data, with red stripes for those who did not survive.
Now it is possible to see, for example, that:
- A bit more than 50% of 1st class passengers survived
- Women who did not survive came almost exclusively from the 3rd class
The plot was produced with:
alluvial(tit[, 1:4], freq=tit$Freq, border=NA,
         hide=tit$Freq < quantile(tit$Freq, .50),
         col=ifelse(tit$Survived == "No", "red", "gray"))
In this variant the stripes have no borders, color transparency is at 0.5, and for the purpose of the example the plot shows only the “thickest” 50% of the stripes (argument hide).
As compared to the parallel set solution mentioned earlier, the main differences are:
- Axes are vertical instead of horizontal
- I used xspline to draw the “stripes”
- With the argument hide you can skip plotting selected groups of cases
If you have suggestions or ideas for extensions/modifications, let me know on Github!
Stay tuned for more examples from panel data.
These are slides from the very first SER meeting – an R user group in Warsaw – that took place on February 27, 2014. I talked about various “lifehacking” tricks for R and focused on how to use R with GNU make effectively. I will post some detailed examples in forthcoming posts.
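For a first flavor of the R-plus-make workflow, here is a minimal hypothetical Makefile (the file names are made up for illustration): each target is rebuilt only when one of its inputs changes.

```make
# rebuild the report only when the source document or results changed
# (recipe lines must be indented with a tab)
report.pdf: report.Rnw results.RData
	Rscript -e "knitr::knit2pdf('report.Rnw')"

# rerun the analysis only when the script or the data changed
results.RData: analysis.R data.csv
	Rscript analysis.R
```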
And so I wrote a post on the Future of the ___ PhD yesterday. Today I learned about this shocking story of a political science PhD seeking employment as an assistant professor at the University of Wrocław and facing the shady realities of (parts of) Polish higher education… Share and beware.
Fill in the blank in the title of this post with the name of a scientific discipline of your choice. The Nov 1 issue of the NYT features a piece, “The Repurposed Ph.D. Finding Life After Academia — and Not Feeling Bad About It”. The gloomy state of affairs described in the article mostly applies to the humanities and social sciences, at least in the U.S., but I’m sure it applies to other countries as well, Poland included. More and more people are entering the job market with a PhD (at least in Poland, as evidence shows). At the same time, available positions are scarce and the pay is low. It is somewhat heart-warming to know that people are self-organizing into groups like “Versatile Ph.D.” to support each other in such a difficult situation.
The article links to several interesting pieces, including “The Future of the Humanities Ph.D. at Stanford”, which discusses ways of modifying humanities PhD programs so that humanities training remains relevant in today’s society and economy. Definitely a worthy read for higher education administrators and decision makers in Poland.
Google Reader was one of my main ways of reading the Internet. It was great for following news and updates from many websites. For example, I had my own “R bloggers” folder within Google Reader long before Tal Galili created R-bloggers.com. Unfortunately, Google is killing the Reader on July 1. There are several alternatives; just search for “google reader alternative”. Meanwhile, I have switched to Feedly. It’s pretty cool, although a couple of things annoy me a lot: too many content (feed/item) recommendations, and keyboard shortcuts that differ from Google Reader’s. The mobile app (I use Android) is also great, although a bit heavy for my Samsung Ace. Nice features include being able to (1) push feed items to Instapaper or Evernote, and (2) save selected items for later reading.
And so, I just browsed my Feedly “Saved for later” folder; here are a couple of interesting items from the last 30 days:
- Nice R illustrations of multicollinearity.
- Almost like self-analytics in the spirit of Stephen Wolfram, here is a great analysis of infant feeding schedule.
- If you recently read The Signal and the Noise by Nate Silver, have a look at the TED talk by Didier Sornette about predicting financial crises.
- Computerworld published a nice short introduction to R, although it is probably not ideal if you have never programmed before.
- If you own a computer with an Intel processor and are willing to buy Intel MKL, Flavio Barros shows how to integrate it with R. MKL can speed up many elementary operations, such as matrix multiplication, by a factor of 3!
A recent issue of Science brings a very cool paper by Luís M. A. Bettencourt explaining the scaling properties of cities: how things like GDP, crime, traffic congestion etc. depend on city size. Descriptively, the relationships seem to follow a simple power law (see this presentation by Geoffrey West). However, as the paper shows, explaining it is not that simple and involves considering many types of interactions and interdependencies.
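The descriptive part, a power-law relation of the form Y = Y0 * N^beta, is easy to illustrate in R. On simulated data (all numbers below are made up for illustration, with an assumed superlinear exponent of 1.15), the exponent is recovered by a log-log regression:

```r
set.seed(1)
n <- 10^runif(200, 4, 7)                     # simulated city sizes
beta <- 1.15                                 # assumed superlinear exponent
y <- 2 * n^beta * exp(rnorm(200, sd = 0.1))  # e.g. GDP, with noise
# on the log scale the power law is linear: log(y) = log(Y0) + beta*log(n)
fit <- lm(log(y) ~ log(n))
coef(fit)[2]  # estimated exponent, close to the assumed 1.15
```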
To finish on a somewhat less geeky note, the Warsaw National Museum has a temporary exhibition of Mark Rothko featuring his works from the National Gallery of Art in Washington DC, which is the first Polish exhibition of Rothko’s works ever. Accompanying the exhibition, there is a lovely children’s guide by Zosia Dzierżawska.
Yesterday I submitted a new version (marked 2.0-0) of package ‘intergraph’ to CRAN. There are some major changes and bug fixes. Here is a summary:
- The package supports only “igraph” objects created with ‘igraph’ version 0.6-0 or newer (vertex indexing starting from 1, not 0)!
- Main functions for converting network data between object classes “igraph” and “network” are now called asIgraph and asNetwork
- There is a generic function asDF that converts a network object to a list of two data frames containing (1) an edge list with edge attributes and (2) a vertex database with vertex attributes
- Functions asIgraph and asNetwork also allow for creating network objects from data frames (edge lists with edge attributes and vertex databases with vertex attributes)
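A minimal sketch of the new interface on a toy three-node network (assuming packages ‘network’ and ‘igraph’ are installed alongside ‘intergraph’):

```r
library(network)
library(intergraph)

# a toy directed network: 1 -> 2 -> 3
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                0, 0, 0), 3, 3, byrow = TRUE)
net <- network(adj)

g <- asIgraph(net)     # "network" -> "igraph"
net2 <- asNetwork(g)   # and back

# decompose into a list of two data frames:
# the edge list and the vertex database
d <- asDF(net)
length(d)  # 2
```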
Usage experiences and bug reports are more than welcome.