Question
The data set I'm working with is in Excel. It shows sales of products in both unit and revenue terms for the first 26 weeks of availability.
Each row of data represents a product. Let's say there are 50 of them.
The 2nd header row could basically be reconstructed with rep(c("Units","Revenue"),26).
Above each of those ("Units","Revenue") pairs in the 1st header row is a merged pair of cells containing, in sequence, "Week 1", "Week 2", ..., "Week 26".
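The two header rows can be mocked up in R to test a reshaping approach. This is a sketch that assumes each merged "Week n" cell arrives as its label followed by an empty string, which is how a CSV export of merged cells usually looks (the exact result depends on how the sheet is saved and read); `n_weeks`, `header1` and `header2` are illustrative names:

```r
n_weeks <- 26

# Row 2: "Units", "Revenue" repeated once per week
header2 <- rep(c("Units", "Revenue"), n_weeks)

# Row 1: each merged "Week n" cell assumed to come out as the label plus a blank
header1 <- as.vector(rbind(paste("Week", seq_len(n_weeks)), ""))

head(header1, 4)  # "Week 1" ""  "Week 2" ""
head(header2, 4)  # "Units" "Revenue" "Units" "Revenue"
```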
I basically want to convert the dataset from 50 rows to 50*26 = 1300 rows with 4 columns (Product, Week, Units, Sales).
I've seen how to handle two-row headers and how to reshape data with the melt function, but I haven't seen anything that indicates a best practice for combining the two, especially in cases like this where both header rows contain key information needed to reshape the data.
Answer 1:
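It is somewhat ambiguous what sort of CSV file would result from merged cells, but assuming each merged cell comes out as its label followed by an empty field (so there are twice as many cells as labels), you would first need to read in the first two lines with readLines and split them on "," (e.g. with strsplit). Then, with the week cells of the first header row in a character vector row1: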
gsub(" ", "", paste(rep(row1[row1 > ""], each = 2), c("Units", "Revenue"), sep = "_"))  # e.g. "Week1_Units", "Week1_Revenue", ...
To any red-hot moderator: yes, I know code-only answers are discouraged, but I think they should be acceptable for answering code- and data-deficient questions.
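To show how the combined names could feed into the actual reshape, here is a sketch on a tiny made-up file (2 products, 3 weeks). It assumes the export layout described above (merged cells arrive as a label plus an empty field) and uses base R's `reshape()` in place of `melt`; the file contents and the names `csv_file`, `weeks`, `body` and `long` are illustrative only:

```r
# Toy file in the assumed export layout: merged "Week n" cells become
# the label followed by an empty field
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Product,Week 1,,Week 2,,Week 3,",
  ",Units,Revenue,Units,Revenue,Units,Revenue",
  "A,1,10,2,20,3,30",
  "B,4,40,5,50,6,60"
), csv_file)

# First header row without the Product cell; blanks are merged-cell halves
row1 <- strsplit(readLines(csv_file, n = 1), ",")[[1]][-1]
weeks <- row1[row1 > ""]

# Combined names, built as in the answer: "Week1_Units", "Week1_Revenue", ...
value_names <- gsub(" ", "", paste(rep(weeks, each = 2), c("Units", "Revenue"), sep = "_"))

# Read only the data rows and attach the combined names
body <- read.csv(csv_file, header = FALSE, skip = 2,
                 col.names = c("Product", value_names))

# Wide -> long: one output row per product and week
long <- reshape(
  body,
  direction = "long",
  varying = list(grep("_Units$", names(body), value = TRUE),
                 grep("_Revenue$", names(body), value = TRUE)),
  v.names = c("Units", "Revenue"),
  times = as.integer(gsub("\\D", "", weeks)),
  timevar = "Week",
  idvar = "Product"
)
long <- long[order(long$Product, long$Week), ]
rownames(long) <- NULL
```

Passing `varying` as a list of Units columns and Revenue columns keeps the pairing explicit, so the "Week1_Units" naming order does not need to match what `reshape()` would guess from a separator. On the question's data the same steps should give 50 * 26 = 1300 rows.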
Answer 2:
I have run into the same problem many times and have used melt from reshape2 in the past. But here is a function that handles multiple header rows as well as multiple label columns:
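PivReady <- function(data, label_rows, label_columns) {
  # data: read with header = FALSE, so the first label_rows rows hold headers
  # and the first label_columns columns hold row labels (e.g. Product)
  n_body <- nrow(data) - label_rows      # number of data rows
  n_value <- ncol(data) - label_columns  # number of value columns
  body_rows <- label_rows + seq_len(n_body)
  value_cols <- label_columns + seq_len(n_value)
  pivRdata <- data.frame(matrix(ncol = label_columns + label_rows + 1,
                                nrow = n_body * n_value))
  # Row labels: repeat each row's labels once per value column
  for (i in seq_len(label_columns)) {
    pivRdata[, i] <- rep(as.character(data[body_rows, i]), each = n_value)
  }
  # Header labels: one row per value column, stacked once per data row
  trowlabels <- t(as.matrix(data[seq_len(label_rows), value_cols, drop = FALSE]))
  pivRdata[, label_columns + seq_len(label_rows)] <- as.data.frame(
    do.call(rbind, replicate(n_body, trowlabels, simplify = FALSE)),
    stringsAsFactors = FALSE)
  # Values: transpose before flattening so the body is read row by row,
  # matching the order of the label columns built above
  datatrans <- as.vector(t(as.matrix(data[body_rows, value_cols, drop = FALSE])))
  pivRdata[, label_columns + label_rows + 1] <- type.convert(datatrans, as.is = TRUE)
  # Names: label headers from the last header row, then Category1..n, Value
  names(pivRdata) <- c(
    as.character(as.matrix(data[label_rows, seq_len(label_columns), drop = FALSE])),
    paste0("Category", seq_len(label_rows)),
    "Value")
  pivRdata
}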
Yes, I know this code is not very beautiful, but if you import your data with header = FALSE and then tell the function that the data has, e.g., 3 rows of headers and 2 columns of labels (the leftmost columns), it works quite nicely. The arguments are the data, the number of header rows, then the number of label columns.
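One caveat for the question's layout: PivReady takes the header rows as they are, so with merged "Week n" cells the Revenue columns would get a blank week label. A small fill-forward step on the first header row before calling the function may be needed. This is a sketch assuming blanks (or NA) mark the second half of each merged cell; `fill_right` and the toy `sheet` data frame are hypothetical:

```r
# Hypothetical helper: copy the last non-blank label rightwards, so a merged
# header read as "Week 1", "", "Week 2", "" becomes
# "Week 1", "Week 1", "Week 2", "Week 2"
fill_right <- function(labels) {
  labels <- as.character(labels)
  labels[is.na(labels)] <- ""
  for (j in seq_along(labels)[-1]) {
    if (labels[j] == "") labels[j] <- labels[j - 1]
  }
  labels
}

# Toy sheet as read with header = FALSE: 2 header rows, 1 label column
sheet <- data.frame(
  V1 = c("", "Product", "A"),
  V2 = c("Week 1", "Units", "1"),
  V3 = c("", "Revenue", "10"),
  V4 = c("Week 2", "Units", "2"),
  V5 = c("", "Revenue", "20"),
  stringsAsFactors = FALSE
)

# Fill the first header row across the value columns only
sheet[1, -1] <- fill_right(unlist(sheet[1, -1]))
```

After this, PivReady(sheet, 2, 1) should see a week label above every Units and Revenue column.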
For example:
long_data <- PivReady(wide_data,3,2)
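PivReady returns one row per product, week and measure (week labels in Category1, Units/Revenue in Category2), whereas the question asks for Units and Revenue side by side. Assuming output in that shape (the toy `long_data` below is made up to mimic it for 2 header rows and 1 label column), base R's `reshape()` can spread Category2 back into columns; `tidy` is an illustrative name:

```r
# Made-up data in the assumed PivReady output shape
long_data <- data.frame(
  Product   = rep(c("A", "B"), each = 4),
  Category1 = rep(c("Week 1", "Week 1", "Week 2", "Week 2"), times = 2),
  Category2 = rep(c("Units", "Revenue"), times = 4),
  Value     = c(1, 10, 2, 20, 4, 40, 5, 50),
  stringsAsFactors = FALSE
)

# Spread Category2 into one column per measure (Value.Units, Value.Revenue)
tidy <- reshape(
  long_data,
  direction = "wide",
  idvar = c("Product", "Category1"),
  timevar = "Category2",
  v.names = "Value"
)

# Rename to the columns the question asks for
names(tidy)[names(tidy) == "Category1"] <- "Week"
names(tidy) <- sub("^Value\\.", "", names(tidy))
rownames(tidy) <- NULL
```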
Source: https://stackoverflow.com/questions/23233606/how-to-best-reshape-a-data-set-in-r-that-has-a-two-row-header