问题
I would like to use something similar to the OFFSET and MATCH functions of Excel, here is an example data set: data=
Which Test?|Test1 |Test2 |Test3 |RESULT
Test1 |TRUE |80% |0 |
Test2 |FALSE |25% |0 |
Test1 |TRUE |16% |0 |
Test3 |FALSE |12% |1 |
Result column should read:
Which Test?|Test1 |Test2 |Test3 |RESULT
Test1 |TRUE |80% |0 |TRUE
Test2 |FALSE |25% |0 |25%
Test1 |TRUE |16% |0 |TRUE
Test3 |FALSE |12% |1 |1
In the final RESULT column I would like the test result of searching the Which test? column. In this example the RESULT column could return, for example, numbers or strings. In the Excel formula would be:
=OFFSET($A$1, ROW()-1,MATCH(A2,$B$1:$D$1,0))
I have tried to list the Tests using sapply so far and return this to another function such as which(colnames... and this is where I am stuck.
Many thanks in advance!
回答1:
I'll go with sapply:
data <- read.table(text="Which Test?|Test1 |Test2 |Test3 |RESULT
Test1 |TRUE |80% |0 |
Test2 |FALSE |25% |0 |
Test1 |TRUE |16% |0 |
Test3 |FALSE |12% |1 |",
header=T,
sep="|",
stringsAsFactors=F,
strip.white=T)
data$RESULT <- sapply( 1:nrow(data), function(x) { data[x,data[x,1]] })
For each row, get the target column data[x,1]
(the inner access), and for this column get the row value data[x,...]
.
Output:
> data
Which.Test. Test1 Test2 Test3 RESULT Result
1 Test1 TRUE 80% 0 NA TRUE
2 Test2 FALSE 25% 0 NA 25%
3 Test1 TRUE 16% 0 NA TRUE
4 Test3 FALSE 12% 1 NA 1
With two vars the function in the sapply
would be:
function(x) {
tcol <- data[x,1] # First column value of row x
data[x,tcol]) # Get the value at row x and column tcol
}
An approach using Map/mapply
would be to provide the 'i' (seq(nrow(data))
), 'j' (match(data$Which.Test., names(data))
) row/column index and use [
to extract the elements from the 'data'. We wrap with list
so that the 'data' remains as a single data.frame and will recycle through the lengths of 'i', 'j'.
mapply(`[`, list(data), seq(nrow(data)), match(data$Which.Test., names(data) ) )
#[1] "TRUE" "25%" "TRUE" "1"
Though, a possible vectorized approach would be just
data[cbind(1:nrow(data), match(data$Which.Test., names(data)))]
## [1] " TRUE" "25%" " TRUE" "1"
This is matching the values in Which.Test.
against the column names of data
and returning the index of the matched column. Then, we subset these columns per each row by combining it with 1:nrow(data)
using cbind
.
More detailed explanation of @DavidArenburg solution above (as I had to spend some time to understand it fully):
The subset operator accepts a matrix so we do:
1:nrow(data)
easy it gives a vector[1] 1 2 3 4
corresponding to the number of rows in our datasetmatch(data$Which.Test., names(data)))
giving the index of each matching test[1] 1 2 3 4
cbind(..,..)
bind our two preceding point to build a matrix:[,1] [,2] [1,] 1 2 [2,] 2 3 [3,] 3 2 [4,] 4 4
We see this matrix match for each row the column we wish to take the value of. So when giving this matrix as the selector of our dataset we get the correct results. Then we can assign it to a new variable or to a new column of the df.
回答2:
Try
library(tidyr)
out = subset(gather(data, key, RESULT, 2:4), Which_Test == key)
req = unique(merge(data, out[,-2], by = "Which_Test"))
#>req
# Which_Test Test1 Test2 Test3 RESULT
#1 Test1 TRUE 80% 0 TRUE
#3 Test1 TRUE 16% 0 TRUE
#5 Test2 FALSE 25% 0 25%
#6 Test3 FALSE 12% 1 1
Here using Which_test
as constant, Test
columns are gathered together into a single column called RESULT
(using gather
from tidyr
package) simultaneously the names of Test
columns are listed under key
which is then used to filter the data based on the match against Which_test
using base R's subset
data
data = structure(list(Which_Test = structure(c(1L, 2L, 1L, 3L),
.Label = c("Test1","Test2", "Test3"), class = "factor"),
Test1 = c(TRUE, FALSE, TRUE, FALSE),
Test2 = structure(c(4L, 3L, 2L, 1L),
.Label = c("12%", "16%", "25%", "80%"), class = "factor"),
Test3 = c(0L, 0L, 0L, 1L)),
.Names = c("Which_Test", "Test1", "Test2", "Test3"),
class = "data.frame", row.names = c(NA, -4L))
来源:https://stackoverflow.com/questions/32252167/r-how-to-offset-and-match-within-a-dataframe