Suppose (to simplify) I have a table containing some control vs. treatment data:
Which, Color, Response, Count
Control, Red, 2, 10
Control, Blue, 3, 20
Treatment
The cast function from the reshape package (not to be confused with the reshape function in base R) can do this and many other things. See here: http://had.co.nz/reshape/
To add to the options (many years later)....
The typical approach in base R would involve the reshape function (which is generally unpopular because of the multitude of arguments that take time to master). It's a pretty efficient function for smaller datasets, but doesn't always scale well.
reshape(mydf, direction = "wide", idvar = "Color", timevar = "Which")
#   Color Response.Control Count.Control Response.Treatment Count.Treatment
# 1   Red                2            10                  1              14
# 2  Blue                3            20                  4              21
Already covered are cast/dcast from the "reshape" and "reshape2" packages (and now dcast.data.table from "data.table", which is especially useful when you have large datasets). Also from the Hadleyverse, there's "tidyr", which works nicely with the "dplyr" package:
library(tidyr)
library(dplyr)
mydf %>%
  gather(var, val, Response:Count) %>%  ## make a long dataframe
  unite(RN, var, Which) %>%             ## combine the var and Which columns
  spread(RN, val)                       ## make the results wide
#   Color Count_Control Count_Treatment Response_Control Response_Treatment
# 1  Blue            20              21                3                  4
# 2   Red            10              14                2                  1
Also worth noting: in a forthcoming version of "data.table", the dcast.data.table function should be able to handle this without you having to melt your data first.
The data.table implementation of dcast lets you convert multiple columns to wide format without melting the data first, as follows:
library(data.table)
dcast(as.data.table(mydf), Color ~ Which, value.var = c("Response", "Count"))
#    Color Response_Control Response_Treatment Count_Control Count_Treatment
# 1:  Blue                3                  4            20              21
# 2:   Red                2                  1            10              14
Reshape does indeed work for pivoting a skinny data frame (e.g., from a simple SQL query) to a wide matrix, and is very flexible, but it's slow. For large amounts of data, very very slow. Fortunately, if you only want to pivot to a fixed shape, it's fairly easy to write a little C function to do the pivot fast.
In my case, pivoting a skinny data frame with 3 columns and 672,338 rows took 34 seconds with reshape, 25 seconds with my R code, and 2.3 seconds with C. Ironically, the C implementation was probably easier to write than my (tuned for speed) R implementation.
Here's the core C code for pivoting floating point numbers. Note that it assumes that you have already allocated a correctly sized result matrix in R before calling the C code, which causes the R-devel folks to shudder in horror:
#include <R.h>
#include <Rinternals.h>

/*
 * This mutates the result matrix in place.
 */
SEXP
dtk_pivot_skinny_to_wide(SEXP n_row, SEXP vi_1, SEXP vi_2, SEXP v_3, SEXP result)
{
    int ii, max_i;
    unsigned int pos;
    int nr = *INTEGER(n_row);
    int *aa = INTEGER(vi_1);
    int *bb = INTEGER(vi_2);
    double *cc = REAL(v_3);
    double *rr = REAL(result);

    max_i = length(vi_2);

    /*
     * R stores matrices by column. Do ugly pointer-like arithmetic to
     * map the matrix to a flat vector. We are translating this R code:
     *   for (ii in 1:length(vi.2))
     *     result[((n.row * (vi.2[ii] -1)) + vi.1[ii])] <- v.3[ii]
     */
    for (ii = 0; ii < max_i; ++ii) {
        pos = ((nr * (bb[ii] - 1)) + aa[ii] - 1);
        rr[pos] = cc[ii];
        /* printf("ii: %d \t value: %g \t result index: %d \t new value: %g\n", ii, cc[ii], pos, rr[pos]); */
    }

    return(result);
}
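If you want to check the index arithmetic without going through R at all, here is a minimal standalone sketch of the same loop in plain C. It is not part of my original code: the function name pivot_skinny_to_wide, the n_col parameter, and the bounds check are all additions for illustration. The bounds check simply skips any triple whose 1-based row or column index falls outside the matrix, which the version above does not guard against.

```c
#include <stddef.h>

/*
 * Hypothetical standalone version of the pivot loop (no R headers).
 *
 * result must already hold n_row * n_col doubles, laid out by column
 * (column-major), the way R stores a matrix.
 * row_idx and col_idx are 1-based, as they would be coming from R.
 *
 * Returns the number of cells written. Triples whose indices fall
 * outside the matrix are skipped; that check is an addition, not
 * something the R-facing version does.
 */
size_t
pivot_skinny_to_wide(size_t n_row, size_t n_col,
                     const int *row_idx, const int *col_idx,
                     const double *values, size_t n,
                     double *result)
{
    size_t written = 0;

    for (size_t i = 0; i < n; ++i) {
        int r = row_idx[i];
        int c = col_idx[i];

        /* Skip anything that would land outside the result matrix. */
        if (r < 1 || c < 1 || (size_t)r > n_row || (size_t)c > n_col)
            continue;

        /* Column-major offset: whole columns before c, then rows before r. */
        size_t pos = n_row * (size_t)(c - 1) + (size_t)(r - 1);
        result[pos] = values[i];
        ++written;
    }

    return written;
}
```

As a quick sanity check with the Count values from this thread, assuming rows are coded Red = 1, Blue = 2 and columns Control = 1, Treatment = 2: the triples (1, 1, 10), (2, 1, 20), (1, 2, 14), (2, 2, 21) fill a 2 x 2 result whose flat storage is 10, 20, 14, 21, i.e. the Control column followed by the Treatment column.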
Using the reshape package.
First, melt your data.frame:
x <- melt(df)
Then cast:
dcast(x, Color ~ Which + variable)
Depending on which version of the package you're working with, the function is cast() (reshape) or dcast() (reshape2).
Voila.