How to split a number into digits in R

花落未央 2021-02-01 17:38

I have a data frame with a numerical ID variable which identifies the Primary, Secondary and Ultimate Sampling Units from a multistage sampling scheme. I want to split the origina

7 answers
  • 2021-02-01 18:23

    You could, for example, use substring:

    df <- data.frame(ID = c(501901, 501902))
    
    splitted <- t(sapply(df$ID, function(x) substring(x, first=c(1,2,4), last=c(1,3,6))))
    cbind(df, splitted)
    #      ID 1  2   3
    #1 501901 5 01 901
    #2 501902 5 01 902
    
  • 2021-02-01 18:23

    This should work:

    # paste() converts the ID column to text before gsub() inserts the separating spaces
    df <- cbind(do.call(rbind, strsplit(gsub('(.)(..)(...)', '\\1 \\2 \\3', paste(df[,1])), ' ')),
                df[,-1])
    

    Or with substr():

    df <- cbind(ID1=substr(df[, 1],1,1), ID2=substr(df[, 1],2,3), ID3=substr(df[, 1],4,6), df[, -1])
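    As a self-contained variation on the regex idea, here is a minimal sketch using strcapture() from the utils package (R 3.4.0 or later), which is a different function from the gsub()/strsplit() combination above. The vector `ids` and the column names are made up for illustration. sprintf() is used for the conversion to text because as.character() can switch to scientific notation for round numbers such as 500000.

    ```r
    # Illustrative IDs (an assumption, not the asker's data)
    ids <- c(501901, 501902)

    # Convert to fixed-width text; "%06.0f" pads to 6 digits and avoids "5e+05"-style output
    ids_chr <- sprintf("%06.0f", ids)

    # Each capture group becomes one column, named and typed after `proto`
    parts <- strcapture(
      pattern = "^(.)(..)(...)$",
      x       = ids_chr,
      proto   = data.frame(ID1 = character(), ID2 = character(), ID3 = character(),
                           stringsAsFactors = FALSE)
    )
    parts
    ```

    The columns come back as character, so a leading zero in the middle group is kept.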
    
  • 2021-02-01 18:27

    Since they are numbers, you will have to do some math to extract the digits you want. A number represented in radix-10 can be written as:

    d0*10^0 + d1*10^1 + d2*10^2 ... etc. where d0..dn are the digits of the number.
    

    Thus, to extract the most significant digit from a 6-digit number, start from its mathematical representation:

    number = d5*10^5 + d4*10^4 + d3*10^3 + d2*10^2 + d1*10^1 + d0*10^0
    

    As you can see, dividing this number by 10^5 will get you:

    number / 10^5 = d5*10^0 + d4*10^(-1) + d3*10^(-2) + d2*10^(-3) + d1*10^(-4) + d0*10^(-5)
    

    Voilà! Taking the integer part of the result gives you the most significant digit, because every other digit now has a negative exponent and together they contribute less than 1. You can extract the other digits in a similar way. For digits in the least significant positions, use the modulo operation instead of division.

    Examples:

    501901 %/% 10^5             # 5, first digit (%/% is integer division)
    501901 %% 10^1              # 1, last digit (%% is modulo)
    (501901 %/% 10^4) %% 10^1   # 0, second digit
    (501901 %/% 10^2) %% 10^2   # 19, third and fourth digits
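
    The same arithmetic can be wrapped in a small function. This is only a sketch: `extract_digits` and its arguments are hypothetical names, and it assumes every ID is a non-negative whole number with exactly `n_digits` digits.

    ```r
    # Hypothetical helper: return the digits of x in positions from..to,
    # counted from the left, for numbers that are n_digits long.
    extract_digits <- function(x, from, to, n_digits = 6) {
      drop_right <- n_digits - to    # how many digits to discard on the right
      keep       <- to - from + 1    # how many digits to keep
      (x %/% 10^drop_right) %% 10^keep
    }

    ids <- c(501901, 501902)
    id1 <- extract_digits(ids, 1, 1)   # first digit
    id2 <- extract_digits(ids, 2, 3)   # second and third digits
    id3 <- extract_digits(ids, 4, 6)   # last three digits

    # The results are numeric, so "01" comes back as 1; pad it if the zero matters
    id2_padded <- sprintf("%02d", id2)
    ```

    Because the pieces stay numeric, any leading zero inside a group is lost unless it is padded back afterwards, as in the last line.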
    
  • 2021-02-01 18:32

    Yet another alternative is to re-read the first column using read.fwf and specify the widths:

    cbind(read.fwf(file = textConnection(as.character(df[, 1])), 
                   widths = c(1, 2, 3), colClasses = "character", 
                   col.names = c("ID1", "ID2", "ID3")), 
          df[-1])
    #   ID1 ID2 ID3 var1 var2 var3 var4  var5
    # 1   5  01 901    9 SP.1    1    W 12.10
    # 2   5  01 901    9 SP.1    2    W 17.68
    

    One advantage here is being able to set the resulting column names in a convenient manner, and ensure that the columns are characters, thus retaining any leading zeroes that might be present.
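
    To see the leading-zero point in isolation, here is a minimal sketch with made-up IDs (the vector `ids` is an assumption, not the asker's data). The IDs are already stored as text so that the first one can start with a zero.

    ```r
    # Made-up IDs; the first one begins with 0 in every group
    ids <- c("001002", "501901")

    parts <- read.fwf(file = textConnection(ids),
                      widths = c(1, 2, 3),
                      colClasses = "character",
                      col.names = c("ID1", "ID2", "ID3"))
    parts
    ```

    Without colClasses = "character", read.fwf would guess numeric columns and the zeroes in "0", "01" and "002" would not be preserved as written.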

  • 2021-02-01 18:32

    If you don't want to convert to character for some reason, the following is one way to achieve what you want:

    DF <- data.frame(ID = c(501901, 501902), var1 = c("a", "b"), var2 = c("c", "d"))
    
    result <- t(sapply(DF$ID, function(y) {
        c(y%/%1e+05, (y - y%/%1e+05 * 1e+05)%/%1000, y - y%/%1000 * 1000)
    }))
    
    
    DF <- cbind(result, DF[, -1])
    
    names(DF)[1:3] <- c("ID1", "ID2", "ID3")
    
    DF
    ##   ID1 ID2 ID3 var1 var2
    ## 1   5   1 901    a    c
    ## 2   5   1 902    b    d
    
  • 2021-02-01 18:35

    With so many answers it felt like I needed to come up with something :)

    library(qdap)
    x <- colSplit(dat$ID_Var, col.sep="")
    data.frame(ID1=x[, 1], ID2=paste2(x[, 2:3], sep=""), 
        ID3=paste2(x[, 4:6],sep=""), dat[, -1])
    
    ##   ID1 ID2 ID3 var1 var2 var3 var4  var5
    ## 1   5  01 901    9 SP.1    1    W 12.10
    ## 2   5  01 901    9 SP.1    2    W 17.68
    