R: How to offset and match within a dataframe?

安稳与你 提交于 2019-12-23 12:46:37

问题


I would like to use something similar to the OFFSET and MATCH functions of Excel, here is an example data set: data=

Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |
Test2      |FALSE   |25%    |0      |
Test1      |TRUE    |16%    |0      |
Test3      |FALSE   |12%    |1      |

Result column should read:

Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |TRUE
Test2      |FALSE   |25%    |0      |25%
Test1      |TRUE    |16%    |0      |TRUE
Test3      |FALSE   |12%    |1      |1

In the final RESULT column I would like the test result of searching the Which test? column. In this example the RESULT column could return, for example, numbers or strings. In the Excel formula would be:

=OFFSET($A$1, ROW()-1,MATCH(A2,$B$1:$D$1,0))

I have tried to list the Tests using sapply so far and return this to another function such as which(colnames... and this is where I am stuck.

Many thanks in advance!


回答1:


I'll go with sapply:

data <- read.table(text="Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |
Test2      |FALSE   |25%    |0      |
Test1      |TRUE    |16%    |0      |
Test3      |FALSE   |12%    |1      |", 
 header=T, 
 sep="|",
 stringsAsFactors=F,
 strip.white=T)

data$RESULT <- sapply( 1:nrow(data), function(x) { data[x,data[x,1]] })

For each row, get the target column data[x,1] (the inner access), and for this column get the row value data[x,...].

Output:

> data
  Which.Test. Test1 Test2 Test3 RESULT Result
1       Test1  TRUE   80%     0     NA   TRUE
2       Test2 FALSE   25%     0     NA    25%
3       Test1  TRUE   16%     0     NA   TRUE
4       Test3 FALSE   12%     1     NA      1

With two vars the function in the sapply would be:

function(x) {
 tcol <- data[x,1] # First column value of row x
 data[x,tcol]) # Get the value at row x and column tcol
}

An approach using Map/mapply would be to provide the 'i' (seq(nrow(data))), 'j' (match(data$Which.Test., names(data))) row/column index and use [ to extract the elements from the 'data'. We wrap with list so that the 'data' remains as a single data.frame and will recycle through the lengths of 'i', 'j'.

 mapply(`[`, list(data), seq(nrow(data)), match(data$Which.Test., names(data) ) )
 #[1] "TRUE" "25%"  "TRUE" "1"   

Though, a possible vectorized approach would be just

data[cbind(1:nrow(data), match(data$Which.Test., names(data)))]
## [1] " TRUE" "25%"   " TRUE" "1"  

This is matching the values in Which.Test. against the column names of data and returning the index of the matched column. Then, we subset these columns per each row by combining it with 1:nrow(data) using cbind.

More detailed explanation of @DavidArenburg solution above (as I had to spend some time to understand it fully):

The subset operator accepts a matrix so we do:

  1. 1:nrow(data) easy it gives a vector [1] 1 2 3 4 corresponding to the number of rows in our dataset
  2. match(data$Which.Test., names(data))) giving the index of each matching test [1] 1 2 3 4
  3. cbind(..,..)bind our two preceding point to build a matrix:

         [,1] [,2]
    [1,]    1    2
    [2,]    2    3
    [3,]    3    2
    [4,]    4    4
    

    We see this matrix match for each row the column we wish to take the value of. So when giving this matrix as the selector of our dataset we get the correct results. Then we can assign it to a new variable or to a new column of the df.




回答2:


Try

library(tidyr)
out = subset(gather(data, key, RESULT, 2:4), Which_Test == key)
req = unique(merge(data, out[,-2], by = "Which_Test"))

#>req
#  Which_Test Test1 Test2 Test3 RESULT
#1      Test1  TRUE   80%     0   TRUE
#3      Test1  TRUE   16%     0   TRUE
#5      Test2 FALSE   25%     0    25%
#6      Test3 FALSE   12%     1      1

Here using Which_test as constant, Test columns are gathered together into a single column called RESULT (using gather from tidyr package) simultaneously the names of Test columns are listed under key which is then used to filter the data based on the match against Which_test using base R's subset

data

data = structure(list(Which_Test = structure(c(1L, 2L, 1L, 3L), 
      .Label = c("Test1","Test2", "Test3"), class = "factor"), 
       Test1 = c(TRUE, FALSE, TRUE, FALSE), 
       Test2 = structure(c(4L, 3L, 2L, 1L), 
      .Label = c("12%", "16%", "25%", "80%"), class = "factor"), 
       Test3 = c(0L, 0L, 0L, 1L)), 
      .Names = c("Which_Test", "Test1", "Test2", "Test3"), 
       class = "data.frame", row.names = c(NA, -4L))


来源:https://stackoverflow.com/questions/32252167/r-how-to-offset-and-match-within-a-dataframe

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!