Question
I am trying to write code in R that identifies the length of a repeating time sequence (in seconds) and subsets each sequence into its own data frame for curve fitting and analysis. Each sequence is a time series of sensor voltage output and has to be analyzed separately.
My code seems clunky, but it works as written here. I am trying to figure out whether there is a package or an easy step I am missing that would do this more elegantly. The times are decimal seconds, and the data could be numeric or integer; it doesn't matter for this example. This is not the actual sensor output, but it has the same format.
set.seed(1)
all_data = data.frame( sec = rep(1.8:4,9), data = sample(1:27), data2 = sample(5:7))
#identify time step length in seconds
lowest = min(all_data$sec)
highest = max(all_data$sec)
#combine into a vector
time_step = c(lowest,highest)
#find index of first time period
matches = match(time_step,all_data[,1])
#subset first time period
total_measures = nrow(all_data)/matches[2]
all_data = all_data[matches[1]:nrow(all_data),]
# test_frame = data.frame(c(1,2))
n = matches[2]
#counter for number of measures in file
count = c(1:(nrow(all_data)/n))
count2 = c(0:(nrow(all_data)/n-1))
# subset to break each measure into its own workable file
eq = paste("subd",count," = all_data[((",count2,"*n)+1):(",count,"*n),]",sep = "")
eval(parse(text = eq))
Thank you!
Answer 1:
I would use data.table to give each row the id of the subset it belongs to.
require(data.table)
dt <- data.table(all_data)
dt[which.min(sec):nrow(dt), id:=1:.N, by=sec]
Then you can continue to split as you did:
count <- 1:dt[, max(id, na.rm=TRUE)]
eq = paste("subd", count," = data.frame(dt[id==", count, ",list(sec, data, data2)])", sep = "")
eval(parse(text = eq))
Alternatively, and more commonly in R, you can use split to split the data into subsets. This returns a list of data.frames, which is very useful because you can then use lapply to apply a function (curve fitting, etc.) to every data.frame in one call.
split(data.frame(dt[, list(sec, data, data2)]), dt$id)
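The split-then-lapply pattern can also be sketched in base R without data.table. This is only an illustration under stated assumptions: the run_id labelling (a new sequence starts wherever sec drops) is an alternative to the id column above, not the method from this answer, and the straight-line lm fit is a hypothetical stand-in for whatever curve fit the real sensor data needs.

```r
# Same example data as in the question
set.seed(1)
all_data <- data.frame(sec = rep(1.8:4, 9), data = sample(1:27), data2 = sample(5:7))

# Assumption: time increases within a sequence, so a drop in sec
# marks the first row of the next sequence.
run_id <- cumsum(c(TRUE, diff(all_data$sec) < 0))

# One data.frame per sequence, collected in a named list
runs <- split(all_data, run_id)

# Hypothetical placeholder analysis: a straight-line fit per sequence
fits <- lapply(runs, function(d) lm(data ~ sec, data = d))

# One row of coefficients (intercept, slope) per sequence
coefs <- t(sapply(fits, coef))
```

Keeping the pieces in a list (runs[["1"]], runs[["2"]], and so on) avoids creating numbered variables with eval(parse()), and the results stay aligned with the sequences they came from.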
Answer 2:
I think a more idiomatic way would be to set up a label for each measure:
labl <- rep(count, each=n)
And then create a list containing your subd's
subds <- by(all_data, labl, I)
This breaks up all_data by the label. The I function is the identity; if you want to process the individual measures in some way, you can replace I with the function you need.
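As a minimal sketch of replacing I with real processing, the example below computes one summary value per measure. It assumes the number of rows per measure (n) is known and constant, as in the question's example data; the mean of the data column is a hypothetical placeholder for the actual analysis.

```r
# Same example data as in the question
set.seed(1)
all_data <- data.frame(sec = rep(1.8:4, 9), data = sample(1:27), data2 = sample(5:7))

n <- 3  # rows per measure (assumed known and constant)
labl <- rep(seq_len(nrow(all_data) / n), each = n)

# Each measure arrives in the function as its own data.frame d;
# here the per-measure result is the mean of the data column.
means <- by(all_data, labl, function(d) mean(d$data))
print(means)
```

Any function that takes a data.frame can go in the same slot, so a curve-fitting call can be dropped in without changing the labelling step.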
Source: https://stackoverflow.com/questions/23341123/identify-time-sequence-in-data-and-subset-by-that-sequence-r