Suppose (to simplify) I have a table containing some control vs. treatment data:
Which, Color, Response, Count
Control, Red, 2, 10
Control, Blue, 3, 20
Treatment
Reshape does indeed work for pivoting a skinny data frame (e.g., from a simple SQL query) to a wide matrix, and it is very flexible, but it is slow, and on large amounts of data it is very slow. Fortunately, if you only need to pivot to a fixed shape, it's fairly easy to write a small C function that does the pivot fast.
In my case, pivoting a skinny data frame with 3 columns and 672,338 rows took 34 seconds with reshape, 25 seconds with my R code, and 2.3 seconds with C. Ironically, the C implementation was probably easier to write than my (tuned for speed) R implementation.
Here's the core C code for pivoting floating point numbers. Note that it assumes that you have already allocated a correctly sized result matrix in R before calling the C code, which causes the R-devel folks to shudder in horror:
#include <R.h>
#include <Rinternals.h>
/*
 * This mutates the result matrix in place.
 */
SEXP
dtk_pivot_skinny_to_wide(SEXP n_row, SEXP vi_1, SEXP vi_2, SEXP v_3, SEXP result)
{
    int ii, max_i;
    unsigned int pos;
    int nr = *INTEGER(n_row);
    int *aa = INTEGER(vi_1);
    int *bb = INTEGER(vi_2);
    double *cc = REAL(v_3);
    double *rr = REAL(result);
    max_i = length(vi_2);
    /*
     * R stores matrices by column. Do ugly pointer-like arithmetic to
     * map the matrix to a flat vector. We are translating this R code:
     *   for (ii in 1:length(vi.2))
     *     result[((n.row * (vi.2[ii] -1)) + vi.1[ii])] <- v.3[ii]
     */
    for (ii = 0; ii < max_i; ++ii) {
        pos = ((nr * (bb[ii] - 1)) + aa[ii] - 1);
        rr[pos] = cc[ii];
        /* printf("ii: %d \t value: %g \t result index: %d \t new value: %g\n", ii, cc[ii], pos, rr[pos]); */
    }
    return(result);
}
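If you want to check the index arithmetic without building against R, the same column-major mapping can be sketched in plain C. This is only an illustration, not the code I timed above: the function name pivot_skinny_to_wide, its parameter names, the n_col argument, and the bounds check are all assumptions added for the example. The indices are still 1-based, as they would be coming from R, and the caller still has to allocate the result.

```c
#include <stddef.h>

/*
 * Hypothetical stand-alone version of the pivot (no R headers needed).
 * row_idx and col_idx are 1-based, as they would be coming from R.
 * result must already hold n_row * n_col doubles, stored by column.
 * Returns the number of triples skipped because an index was out of range.
 */
size_t
pivot_skinny_to_wide(size_t n_row, size_t n_col, size_t n,
                     const int *row_idx, const int *col_idx,
                     const double *value, double *result)
{
    size_t ii, pos, skipped = 0;

    for (ii = 0; ii < n; ++ii) {
        /* Assumed safety check: the R-facing code above does not do this. */
        if (row_idx[ii] < 1 || (size_t) row_idx[ii] > n_row ||
            col_idx[ii] < 1 || (size_t) col_idx[ii] > n_col) {
            ++skipped;
            continue;
        }
        /* Column-major offset: all the full columns before this one, then the row. */
        pos = n_row * (size_t) (col_idx[ii] - 1) + (size_t) (row_idx[ii] - 1);
        result[pos] = value[ii];
    }
    return skipped;
}
```

For a 3 x 2 result, the triple (row 1, column 2) lands at offset 3 and (row 3, column 2) lands at offset 5, which is the layout R expects for a matrix. Doing the arithmetic in size_t rather than int also avoids overflow when n_row times the column index gets large.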