Normalisation of two-column data using min and max values

旧巷少年郎 2021-01-22 11:18

I am trying to find R code that normalises my values using the min and max of each column in a two-column matrix.

My matrix looks like this: Column one (C1) and C2 I.D not

2 answers
  •  时光取名叫无心
    2021-01-22 11:27

    Given some example data along the lines you describe:

    set.seed(1)
    d <- data.frame(C1 = LETTERS[1:4], C2 = letters[1:4],
                    C3 = runif(4, min = 0, max = 10),
                    C4 = runif(4, min = 0, max = 10))
    d
    

    we can write a simple function that does the min-max normalisation you describe, mapping each value x to (x - min) / (max - min):

    normalise <- function(x, na.rm = TRUE) {
        ranx <- range(x, na.rm = na.rm)
        (x - ranx[1]) / diff(ranx)
    }
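
    One edge case worth keeping in mind: if a column is constant, diff(ranx) is 0 and the function above returns NaN for every value. Below is a minimal sketch of a guarded variant. The name normalise_safe and the decision to return 0 for constant columns are assumptions made for illustration, not part of the answer above.

    ```r
    # Hypothetical variant of normalise() that guards against a zero range.
    normalise_safe <- function(x, na.rm = TRUE) {
        ranx <- range(x, na.rm = na.rm)
        span <- diff(ranx)
        # A missing span (NA present with na.rm = FALSE) propagates as NA,
        # just as it does in normalise().
        if (is.na(span)) return(rep(NA_real_, length(x)))
        # Constant column: avoid 0/0 and map every non-NA value to 0
        # (an arbitrary choice made for this sketch).
        if (span == 0) return(ifelse(is.na(x), NA_real_, 0))
        (x - ranx[1]) / span
    }

    normalise_safe(c(5, 5, 5))   # constant input
    normalise_safe(c(1, NA, 3))  # NA is kept, the other values are scaled
    ```

    Whether a constant column should map to 0, 0.5 or NA depends on what the normalised values are used for.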
    

    This can be applied to the data in a number of ways, but here I use apply():

    apply(d[, 3:4], 2, normalise)
    

    which gives

    R> apply(d[, 3:4], 2, normalise)
                C3        C4
    [1,] 0.0000000 0.0000000
    [2,] 0.1658867 0.9377039
    [3,] 0.4782093 1.0000000
    [4,] 1.0000000 0.6179273
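
    apply() converts its input to a matrix before calling the function, which is why the result above is a matrix. Another common pattern is lapply(), which works column by column and can be turned back into a data frame. A sketch, with the example data and normalise() repeated so the block runs on its own; the name norm_df is illustrative:

    ```r
    set.seed(1)
    d <- data.frame(C1 = LETTERS[1:4], C2 = letters[1:4],
                    C3 = runif(4, min = 0, max = 10),
                    C4 = runif(4, min = 0, max = 10))

    normalise <- function(x, na.rm = TRUE) {
        ranx <- range(x, na.rm = na.rm)
        (x - ranx[1]) / diff(ranx)
    }

    # lapply() returns a list with one element per column;
    # as.data.frame() turns that list back into a data frame.
    norm_df <- as.data.frame(lapply(d[, 3:4], normalise))
    norm_df
    ```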
    

    To add these to the existing data, we could do:

    d2 <- data.frame(d, apply(d[, 3:4], 2, normalise))
    d2
    

    Which gives:

    R> d2
      C1 C2       C3       C4      C3.1      C4.1
    1  A  a 2.655087 2.016819 0.0000000 0.0000000
    2  B  b 3.721239 8.983897 0.1658867 0.9377039
    3  C  c 5.728534 9.446753 0.4782093 1.0000000
    4  D  d 9.082078 6.607978 1.0000000 0.6179273
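
    The C3.1 and C4.1 names appear because data.frame() makes duplicated column names unique. If more descriptive names are preferred, one possible approach is to rename the normalised columns before combining them. The _norm suffix is just an illustrative choice:

    ```r
    set.seed(1)
    d <- data.frame(C1 = LETTERS[1:4], C2 = letters[1:4],
                    C3 = runif(4, min = 0, max = 10),
                    C4 = runif(4, min = 0, max = 10))

    normalise <- function(x, na.rm = TRUE) {
        ranx <- range(x, na.rm = na.rm)
        (x - ranx[1]) / diff(ranx)
    }

    norm <- apply(d[, 3:4], 2, normalise)
    # Rename the matrix columns before binding them to the original data.
    colnames(norm) <- paste0(colnames(norm), "_norm")
    d2 <- data.frame(d, norm)
    names(d2)
    ```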
    

    You mentioned that your data include NA values, so these need to be handled. You may have noticed that the na.rm argument of normalise() defaults to TRUE. This means it will work even in the presence of NA:

    d3 <- d
    d3[c(1,3), c(3,4)] <- NA ## set some NA
    d3
    
    
    R> d3
      C1 C2       C3       C4
    1  A  a       NA       NA
    2  B  b 3.721239 8.983897
    3  C  c       NA       NA
    4  D  d 9.082078 6.607978
    

    With normalise() we still get some output that is of use, using only the non-NA data:

    R> apply(d3[, 3:4], 2, normalise)
         C3 C4
    [1,] NA NA
    [2,]  0  1
    [3,] NA NA
    [4,]  1  0
    

    If normalise() had not been written this way, the output would look something like this (na.rm = FALSE is the default for range() and other similar functions!):

    R> apply(d3[, 3:4], 2, normalise, na.rm = FALSE)
         C3 C4
    [1,] NA NA
    [2,] NA NA
    [3,] NA NA
    [4,] NA NA
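
    The all-NA result follows directly from how range() treats missing values, which can be checked on a small vector (x here is made up for illustration):

    ```r
    x <- c(1, NA, 3)
    range(x)                 # both limits are NA, so every normalised value is NA
    range(x, na.rm = TRUE)   # limits computed from the non-missing values only
    ```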
    
