I would like to split a data frame with thousands of columns. The data frame looks like this:
# sample data of four columns
sample <-read.table(stdin(),header
Here's an alternative solution that, instead of generating multiple rows, builds a bit mask for each cell: bit k of the mask is set when the value k ("0", "1", "2", etc.) is present in that cell.
> sample <-read.table(stdin(),header=TRUE,sep="",
row.names=1,colClasses="character")
0: POS v1 v2 v3 v4
1: 152 0 0/1 0/2 0/1/2
2: 73 1 0 0/1 0/1
3: 185 0 1 0/3 0
4:
> library(bitops) # provides bitOr and bitAnd
> library(plyr) # provides colwise
> # transform the strings into bit masks
> B<-function(X)lapply(strsplit(X,"/"),
function(n)Reduce(bitOr,bitwShiftL(1,as.numeric(n)),0))
> B("0/1")
[[1]]
[1] 3
> # apply it everywhere
> s<-colwise(B)(sample)
> rownames(s)<-rownames(sample)
> s
v1 v2 v3 v4
152 1 3 5 7
73 2 1 3 3
185 1 2 9 1
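For readers without plyr and bitops installed, the same encoding can be sketched in base R alone. This is a minimal sketch, not the code from the session above: `encode_mask` is a hypothetical helper name, the data frame is retyped by hand from the sample, and it assumes every cell holds non-negative integers below 31 separated by "/".

```r
# Hypothetical helper: encode strings such as "0/1/2" as integer bit masks,
# using only base R (bitwShiftL and bitwOr).
encode_mask <- function(x) {
  vapply(strsplit(x, "/", fixed = TRUE), function(parts) {
    # Value k sets bit k: "0" -> 1, "1" -> 2, "2" -> 4, "3" -> 8
    Reduce(bitwOr, bitwShiftL(1L, as.integer(parts)), 0L)
  }, integer(1))
}

# The sample data from the question, retyped as a data frame of strings
sample <- data.frame(
  v1 = c("0", "1", "0"),
  v2 = c("0/1", "0", "1"),
  v3 = c("0/2", "0/1", "0/3"),
  v4 = c("0/1/2", "0/1", "0"),
  row.names = c("152", "73", "185"),
  stringsAsFactors = FALSE
)

# Encode every column in place; s[] keeps the row and column names
s <- sample
s[] <- lapply(sample, encode_mask)
print(s)
```

Unlike `B`, which returns a list, this helper returns a plain integer vector, so each column of `s` is an ordinary integer column.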
While it's not what you asked for, if the set of enum values is small (e.g. 0, 1, 2), this is far more efficient in storage space and is easy to query.
For example, which rows have "0" in v1 and at least one of "0" or "1" in v4?
> subset(s, bitAnd(v1,B("0")) & bitAnd(v4,B("0/1")))
v1 v2 v3 v4
152 1 3 5 7
185 1 2 9 1
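Note that a non-zero result from a bitwise AND means "at least one of the requested bits is set". To require all of the requested bits, compare the result with the mask itself. The following is a minimal base R sketch of both tests; the masks are retyped by hand from the output above, and `m0`, `m01`, `any_rows` and `all_rows` are illustrative names only.

```r
# Masks as printed in the session above (bit k set means value k is present)
s <- data.frame(
  v1 = c(1L, 2L, 1L),
  v2 = c(3L, 1L, 2L),
  v3 = c(5L, 3L, 9L),
  v4 = c(7L, 3L, 1L),
  row.names = c("152", "73", "185")
)

m0  <- 1L  # mask for "0"
m01 <- 3L  # mask for "0/1"

# "Any" semantics: v1 contains "0", and v4 contains "0" or "1"
any_rows <- s[bitwAnd(s$v1, m0) != 0L & bitwAnd(s$v4, m01) != 0L, ]

# "All" semantics: v1 contains "0", and v4 contains both "0" and "1"
all_rows <- s[bitwAnd(s$v1, m0) == m0 & bitwAnd(s$v4, m01) == m01, ]

print(any_rows)
print(all_rows)
```

With the sample data, the "any" test keeps rows 152 and 185, while the "all" test keeps only row 152, because row 185 has just "0" in v4.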