Question
I have a dataframe with multiple columns whose names vary in length and structure (so I'm not sure how to capture them with a regex). Each column name ends in either .t1 or .t3. I want to combine the columns that share a name once the .t1/.t3 suffix is removed, and add a Time column based on that suffix.
So, for example, a dataframe such as:
df<-data.frame("Subject"= c(1:10),
"intercept.freq.acc.t1" = c(1:10),
"intercept.freq.acc.t3" = c(1:10),
"freq.rt.t1" = c(1:10),
"freq.rt.t3" = c(1:10),
"vowel.con.acc.t1" = c(1:10),
"vowel.con.acc.t3" = c(1:10))
I want to turn it into
df<-data.frame("Subject"= rep(1:10,2),
"Time" = rep(c('t1','t3'), each = 10),
"intercept.freq.acc" = rep(1:10, 2),
"freq.rt" = rep(1:10,2),
"vowel.con.acc" = rep(1:10, 2))
How do I go about doing this?
Answer 1:
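You can use pivot_longer from tidyr. The special '.value' entry in names_to tells it to use the first captured group as the output column name, while the second group goes into Time: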
tidyr::pivot_longer(df,
cols = -Subject,
names_to = c('.value', 'Time'),
names_pattern = '(.*)\\.(t\\d+)')
# Subject Time intercept.freq.acc freq.rt vowel.con.acc
# <int> <chr> <int> <int> <int>
# 1 1 t1 1 1 1
# 2 1 t3 1 1 1
# 3 2 t1 2 2 2
# 4 2 t3 2 2 2
# 5 3 t1 3 3 3
# 6 3 t3 3 3 3
# 7 4 t1 4 4 4
# 8 4 t3 4 4 4
# 9 5 t1 5 5 5
#10 5 t3 5 5 5
#11 6 t1 6 6 6
#12 6 t3 6 6 6
#13 7 t1 7 7 7
#14 7 t3 7 7 7
#15 8 t1 8 8 8
#16 8 t3 8 8 8
#17 9 t1 9 9 9
#18 9 t3 9 9 9
#19 10 t1 10 10 10
#20 10 t3 10 10 10
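To see what the names_pattern regex extracts, the two capture groups can be inspected with base R's sub(). This is only an illustrative sketch and not part of the original answer; the names cols, value_part and time_part are made up here.

```r
# Column names from the example data frame (everything except Subject).
cols <- c("intercept.freq.acc.t1", "intercept.freq.acc.t3",
          "freq.rt.t1", "freq.rt.t3",
          "vowel.con.acc.t1", "vowel.con.acc.t3")

# Same pattern as in the pivot_longer call above.
pattern <- "(.*)\\.(t\\d+)"

# First capture group: the part that `.value` turns into an output column name.
value_part <- sub(pattern, "\\1", cols)

# Second capture group: the suffix that ends up in the Time column.
time_part <- sub(pattern, "\\2", cols)

# Side-by-side view of how each input column is split.
data.frame(column = cols, value = value_part, Time = time_part)
```

Because (.*) is greedy, it consumes everything up to the last ".t<digits>" suffix, which is why names with a varying number of dots are split correctly.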
Answer 2:
You could make use of the pivot_longer_spec function. It takes a data frame template in which you specify the input columns and the output columns they map to, and you then pass this template to pivot_longer_spec.
This is usually very helpful when your columns have no nice and easy split pattern. Personally, I find such a template easier to use than figuring out the regex for splitting up the columns (in this case, the regex is still OK, though):
library(tidyverse)
template <- data.frame(.name = colnames(df)[-1],
.value = c("intercept.freq.acc", "intercept.freq.acc", "freq.rt", "freq.rt", "vowel.con.acc", "vowel.con.acc"),
Time = c("t1", "t3", "t1", "t3", "t1", "t3"))
The template looks as follows:
.name .value Time
1 intercept.freq.acc.t1 intercept.freq.acc t1
2 intercept.freq.acc.t3 intercept.freq.acc t3
3 freq.rt.t1 freq.rt t1
4 freq.rt.t3 freq.rt t3
5 vowel.con.acc.t1 vowel.con.acc t1
6 vowel.con.acc.t3 vowel.con.acc t3
Then the pivot is a single call to pivot_longer_spec:
dat_long <- df %>%
pivot_longer_spec(template)
which gives:
# A tibble: 20 x 5
Subject Time intercept.freq.acc freq.rt vowel.con.acc
<int> <chr> <int> <int> <int>
1 1 t1 1 1 1
2 1 t3 1 1 1
3 2 t1 2 2 2
4 2 t3 2 2 2
5 3 t1 3 3 3
6 3 t3 3 3 3
7 4 t1 4 4 4
8 4 t3 4 4 4
9 5 t1 5 5 5
10 5 t3 5 5 5
11 6 t1 6 6 6
12 6 t3 6 6 6
13 7 t1 7 7 7
14 7 t3 7 7 7
15 8 t1 8 8 8
16 8 t3 8 8 8
17 9 t1 9 9 9
18 9 t3 9 9 9
19 10 t1 10 10 10
20 10 t3 10 10 10
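Typing the template by hand gets tedious with many columns. As a rough sketch, the same three columns could be derived from the column names with base R, assuming every name has the form <measure>.<time> with the time label after the final dot. The name template_auto is hypothetical and not from the original answer.

```r
# Example data from the question.
df <- data.frame(Subject = 1:10,
                 intercept.freq.acc.t1 = 1:10,
                 intercept.freq.acc.t3 = 1:10,
                 freq.rt.t1 = 1:10,
                 freq.rt.t3 = 1:10,
                 vowel.con.acc.t1 = 1:10,
                 vowel.con.acc.t3 = 1:10)

# Every column except the id column is to be pivoted.
cols <- setdiff(colnames(df), "Subject")

# Assumption: the time label is whatever follows the last dot in the name.
template_auto <- data.frame(
  .name  = cols,                          # input column
  .value = sub("\\.[^.]+$", "", cols),    # name with the final ".xx" removed
  Time   = sub("^.*\\.", "", cols),       # text after the last dot
  stringsAsFactors = FALSE
)

template_auto
```

template_auto has the same contents as the hand-written template and can be passed to pivot_longer_spec in its place.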
Answer 3:
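We can use melt from data.table. The patterns are anchored with ^ so that 'freq' does not also match the intercept.freq.acc columns. Note that the resulting variable column holds the position (1, 2) of each column within its pattern rather than the t1/t3 labels:
library(data.table)
melt(setDT(df), id.vars = 'Subject',
     measure.vars = patterns('^intercept', '^freq', '^vowel'),
     value.name = c('intercept.freq.acc', 'freq.rt', 'vowel.con.acc'))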
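The melt call does not by itself produce a Time column with t1/t3 labels. A minimal sketch of one way to add it follows, assuming that within each pattern the columns appear in the order t1, t3. The names dt, long and time_labels are made up for illustration, and the patterns are anchored with ^ so that each one matches exactly two columns.

```r
library(data.table)

# Example data from the question, built directly as a data.table.
dt <- data.table(Subject = 1:10,
                 intercept.freq.acc.t1 = 1:10,
                 intercept.freq.acc.t3 = 1:10,
                 freq.rt.t1 = 1:10,
                 freq.rt.t3 = 1:10,
                 vowel.con.acc.t1 = 1:10,
                 vowel.con.acc.t3 = 1:10)

# One pattern per output column; each matches its t1 and t3 column.
long <- melt(dt, id.vars = "Subject",
             measure.vars = patterns("^intercept", "^freq", "^vowel"),
             value.name = c("intercept.freq.acc", "freq.rt", "vowel.con.acc"))

# Assumption: position 1 within each pattern is t1 and position 2 is t3.
time_labels <- c("t1", "t3")
long[, Time := time_labels[as.integer(variable)]]
long[, variable := NULL]

# Move the id and time columns to the front.
setcolorder(long, c("Subject", "Time"))
long
```

If the t1/t3 columns are not in a consistent order in the input, this positional mapping would mislabel rows, so the regex-based pivot_longer approach is the safer choice in that case.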
Source: https://stackoverflow.com/questions/65158050/r-pivot-longer-combining-columns-based-on-the-end-of-column-names