Question
I have the following character vector:
my_list <- c("Jan-01--Dec-31|00:00--24:00", "Jan-01--Jun-30|12:00--18:00",
"Jul-06--Dec-31|09:00--19:00")
What is the shortest code which results in:
x1 x2 x3
1 Jan-01 Jan-01 Jul-06
2 Dec-31 Jun-30 Dec-31
and
x1 x2 x3
1 00:00 12:00 09:00
2 24:00 18:00 19:00
At the moment I have the (not very nice) code
df <- as.data.frame(strsplit(my_list, split = "|", fixed = T),
stringsAsFactors = F)
date_list <- strsplit(as.character(df[1, ]), split = "--", fixed = T)
date_df <- as.data.frame(date_list, col.names = c(1:length(date_list)),
stringsAsFactors = F)
time_list <- strsplit(as.character(df[2, ]), split = "--", fixed = T)
time_df <- as.data.frame(time_list, col.names = c(1:length(date_list)),
stringsAsFactors = F)
The best thing I have up to now is
date_list <- sapply(strsplit(my_list, split = "|", fixed = T), "[", 1)
date_df <- t(data.frame(x1=sapply(strsplit(date_list, split = "--", fixed = T), "[", 1),
x2=sapply(strsplit(date_list, split = "--", fixed = T), "[", 2),
stringsAsFactors = F))
# and similarly for time_list and time_df.
Is there something more elegant?
Answer 1:
tstrsplit from the data.table package and str_split_fixed from stringr are useful for getting correctly shaped data when splitting vectors of strings. The former returns the transpose of the split strings, which lets you extract the dates and times separately without an apply function; the latter splits strings into a matrix with a specified number of columns:
library(data.table); library(stringr)
lapply(tstrsplit(my_list, "\\|"), function(s) t(str_split_fixed(s, "--", 2)))
#[[1]]
# [,1] [,2] [,3]
#[1,] "Jan-01" "Jan-01" "Jul-06"
#[2,] "Dec-31" "Jun-30" "Dec-31"
#[[2]]
# [,1] [,2] [,3]
#[1,] "00:00" "12:00" "09:00"
#[2,] "24:00" "18:00" "19:00"
Answer 2:
my_results <- sapply(strsplit(my_list,"|",fixed=T),function(x) strsplit(x,"--",fixed=T))
my_dates <- t(Reduce("rbind",my_results[1,]))
my_times <- t(Reduce("rbind",my_results[2,]))
Answer 3:
strsplit accepts a regular-expression pattern that can do the split in one pass. You can then use lapply (or sapply) and finish up with setNames.
setNames( data.frame(lapply( strsplit( my_list, split="\\-\\-|\\|"), "[", 1:2) ), paste0("x",1:3) )
x1 x2 x3
1 Jan-01 Jan-01 Jul-06
2 Dec-31 Jun-30 Dec-31
Obviously the times could be handled by substituting 3:4 for 1:2 in the code above.
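A minimal sketch of that substitution, assuming the my_list vector from the question. The helper name split_cols is hypothetical and not part of the original answer; it only wraps the same lapply/setNames call so the index can be swapped:

```r
my_list <- c("Jan-01--Dec-31|00:00--24:00", "Jan-01--Jun-30|12:00--18:00",
             "Jul-06--Dec-31|09:00--19:00")

# Split each string on either "--" or "|" in one pass; every element becomes
# a character vector of length 4: start date, end date, start time, end time.
parts <- strsplit(my_list, split = "--|\\|")

# Hypothetical helper: pick the given positions from every element and
# return them as a data frame with columns x1, x2, ...
split_cols <- function(parts, idx) {
  setNames(data.frame(lapply(parts, "[", idx), stringsAsFactors = FALSE),
           paste0("x", seq_along(parts)))
}

date_df <- split_cols(parts, 1:2)  # positions 1:2 hold the dates
time_df <- split_cols(parts, 3:4)  # positions 3:4 hold the times
```

Using seq_along(parts) instead of a hard-coded 1:3 keeps the column names in step with the number of input strings.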
Answer 4:
One more alternative using stringr:
library(stringr)
a <- t(str_split_fixed(my_list, "\\||--", 4))
# [,1] [,2] [,3]
#[1,] "Jan-01" "Jan-01" "Jul-06"
#[2,] "Dec-31" "Jun-30" "Dec-31"
#[3,] "00:00" "12:00" "09:00"
#[4,] "24:00" "18:00" "19:00"
To get the final output, use data.frame(a[1:2,]) and data.frame(a[3:4,]).
Update
my_list <- "Jan-01--Dec-31|00:00--24:00"
a <- t(str_split_fixed(my_list, "\\||--", 4))
[,1]
[1,] "Jan-01"
[2,] "Dec-31"
[3,] "00:00"
[4,] "24:00"
data.frame(a[1:2,])
a.1.2...
1 Jan-01
2 Dec-31
data.frame(a[3:4,])
a.3.4...
1 00:00
2 24:00
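The odd column names in the single-string case (a.1.2... and a.3.4...) appear because a[1:2,] drops the one-column matrix to a plain vector. A hedged sketch of one way around this is to subset with drop = FALSE. It uses base R strsplit as a stand-in for str_split_fixed so that it runs without stringr; that substitution is an assumption for illustration, not part of the original answer:

```r
my_list <- "Jan-01--Dec-31|00:00--24:00"

# Base R stand-in for t(str_split_fixed(my_list, "\\||--", 4)):
# one row per piece, one column per input string.
a <- t(do.call(rbind, strsplit(my_list, split = "--|\\|")))

# drop = FALSE keeps the result a 2 x 1 matrix instead of a vector.
dates <- a[1:2, , drop = FALSE]
times <- a[3:4, , drop = FALSE]

# as.data.frame() on an unnamed matrix names the columns V1, V2, ...
date_df <- as.data.frame(dates, stringsAsFactors = FALSE)
time_df <- as.data.frame(times, stringsAsFactors = FALSE)
```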
Answer 5:
Here is a base R option:
lst <- strsplit(scan(text=my_list, sep="|", what ="", quiet=TRUE), "--")
do.call(cbind, lst[c(TRUE, FALSE)])
# [,1] [,2] [,3]
#[1,] "Jan-01" "Jan-01" "Jul-06"
#[2,] "Dec-31" "Jun-30" "Dec-31"
do.call(cbind, lst[c(FALSE, TRUE)])
# [,1] [,2] [,3]
#[1,] "00:00" "12:00" "09:00"
#[2,] "24:00" "18:00" "19:00"
Or as a single-expression base R option:
lapply(split(scan(text=my_list, sep="|", what ="", quiet=TRUE), 1:2),
function(x) do.call(cbind, strsplit(x, "--")))
#$`1`
# [,1] [,2] [,3]
#[1,] "Jan-01" "Jan-01" "Jul-06"
#[2,] "Dec-31" "Jun-30" "Dec-31"
#$`2`
# [,1] [,2] [,3]
#[1,] "00:00" "12:00" "09:00"
#[2,] "24:00" "18:00" "19:00"
Source: https://stackoverflow.com/questions/38797026/how-can-i-transform-an-array-of-characters-with-a-few-lines-of-code-to-a-data-fr