How to split a number into digits in R

后端未结


I have a data frame with a numerical ID variable which identifies the Primary, Secondary and Ultimate Sampling Units from a multistage sampling scheme. I want to split the origina


7 answers

遥遥无期

2021-02-01 18:23

You could, for example, use substring:

df <- data.frame(ID = c(501901, 501902))

splitted <- t(sapply(df$ID, function(x) substring(x, first=c(1,2,4), last=c(1,3,6))))
cbind(df, splitted)
#      ID 1  2   3
#1 501901 5 01 901
#2 501902 5 01 902
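
A variation on the same idea, offered only as a sketch: `substr()` is vectorised over its first argument, so the `sapply()` loop can be dropped. The example data (including the round ID `500000`) and the column names `ID1`, `ID2`, `ID3` are assumptions for illustration. The conversion uses `sprintf("%.0f", ...)` because round numbers may otherwise be turned into scientific notation when coerced to text.

```r
# Hypothetical example data; the second ID is a round number on purpose.
df <- data.frame(ID = c(501901, 500000))

# sprintf("%.0f", ...) yields plain digits, never scientific notation.
id_chr <- sprintf("%.0f", df$ID)

# substr() is vectorised over its first argument, so no sapply() is needed.
df$ID1 <- substr(id_chr, 1, 1)
df$ID2 <- substr(id_chr, 2, 3)
df$ID3 <- substr(id_chr, 4, 6)
df
```

The new columns are character vectors, so leading zeros such as "01" are kept.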


孤街浪徒

2021-02-01 18:23

This should work:

# paste() converts the numeric ID column to text before gsub() is applied.
df <- cbind(do.call(rbind, strsplit(gsub('(.)(..)(...)', '\\1 \\2 \\3', paste(df[,1])),' ')), df[,-1])

Or with substr():

df <- cbind(ID1=substr(df[, 1],1,1), ID2=substr(df[, 1],2,3), ID3=substr(df[, 1],4,6), df[, -1])
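
As a possible alternative to the gsub()/strsplit() combination above, `utils::strcapture()` (available since R 3.4.0) can turn regex capture groups directly into named columns. This is a different technique from the one in the answer, shown here as a minimal sketch; the example data and the names `id_chr` and `parts` are assumptions.

```r
# Hypothetical example data with one extra column.
df <- data.frame(ID = c(501901, 501902), var1 = c("a", "b"))

# Convert the IDs to plain digit strings first.
id_chr <- sprintf("%.0f", df$ID)

# Each capture group becomes one column; `proto` supplies names and types.
parts <- strcapture(
  "(.)(..)(...)", id_chr,
  proto = data.frame(ID1 = character(), ID2 = character(), ID3 = character())
)

# Put the split columns in front of the remaining variables.
result <- cbind(parts, df[-1])
result
```

Because the prototype columns are character, a value like "01" keeps its leading zero.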


[愿得一人]

2021-02-01 18:27
Since they are numbers, you will have to do some math to extract the digits you want. A number represented in radix-10 can be written as:
```
d0*10^0 + d1*10^1 + d2*10^2 ... etc. where d0..dn are the digits of the number.
```
Thus, suppose you want to extract the most significant digit from a 6-digit number, which is mathematically represented as:
```
number = d5*10^5 + d4*10^4 + d3*10^3 + d2*10^2 + d1*10^1 + d0*10^0
```
As you can see, dividing this number by 10^5 will get you:
```
number / 10^5 = d5*10^0 + d4*10^(-1) + d3*10^(-2) + d2*10^(-3) + d1*10^(-4) + d0*10^(-5)
```
Voilà! If you interpret the result as an integer (that is, drop the fractional part), you have extracted the most significant digit, because every other digit now has a negative exponent and so together they contribute less than 1. You can extract the other digits in a similar way. For digits in the least significant positions, use the modulo operation instead of division.

Examples:
```
# In R, %/% is integer division and %% is modulo.
501901 %/% 10^5            # 5:  first digit
501901 %% 10^1             # 1:  last digit
(501901 %/% 10^4) %% 10^1  # 0:  second digit
(501901 %/% 10^2) %% 10^2  # 19: third and fourth digits
```
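
The divide-then-modulo recipe above can be wrapped in a small vectorised function. This is only a sketch: `digit_slice` and its arguments are hypothetical names, and it assumes non-negative whole numbers that all have the same number of digits (`n_digits`).

```r
# Hypothetical helper: returns the digits in positions from..to, counted
# from the left, for whole numbers that all have `n_digits` digits.
digit_slice <- function(x, from, to, n_digits = 6) {
  # Integer division drops the digits to the right of `to`;
  # the modulo then drops the digits to the left of `from`.
  (x %/% 10^(n_digits - to)) %% 10^(to - from + 1)
}

ids <- c(501901, 501902)
id1 <- digit_slice(ids, 1, 1)  # first digit
id2 <- digit_slice(ids, 2, 3)  # second and third digits
id3 <- digit_slice(ids, 4, 6)  # last three digits
data.frame(ID1 = id1, ID2 = id2, ID3 = id3)
```

Since the results are numbers, a group such as "01" comes back as 1; keep the IDs as text if leading zeros matter.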

青春惊慌失措

2021-02-01 18:32
Yet another alternative is to re-read the first column using read.fwf and specify the widths:
```
cbind(read.fwf(file = textConnection(as.character(df[, 1])), 
               widths = c(1, 2, 3), colClasses = "character", 
               col.names = c("ID1", "ID2", "ID3")), 
      df[-1])
#   ID1 ID2 ID3 var1 var2 var3 var4  var5
# 1   5  01 901    9 SP.1    1    W 12.10
# 2   5  01 901    9 SP.1    2    W 17.68
```
One advantage here is being able to set the resulting column names in a convenient manner, and ensure that the columns are characters, thus retaining any leading zeroes that might be present.

孤独总比滥情好

2021-02-01 18:32

If you don't want to convert to character for some reason, the following is one way to achieve what you want:

DF <- data.frame(ID = c(501901, 501902), var1 = c("a", "b"), var2 = c("c", "d"))

result <- t(sapply(DF$ID, function(y) {
    c(y%/%1e+05, (y - y%/%1e+05 * 1e+05)%/%1000, y - y%/%1000 * 1000)
}))


DF <- cbind(result, DF[, -1])

names(DF)[1:3] <- c("ID1", "ID2", "ID3")

DF
##   ID1 ID2 ID3 var1 var2
## 1   5   1 901    a    c
## 2   5   1 902    b    d


Happy的楠姐

2021-02-01 18:35

With so many answers it felt like I needed to come up with something :)

library(qdap)
x <- colSplit(dat$ID_Var, col.sep="")
data.frame(ID1=x[, 1], ID2=paste2(x[, 2:3], sep=""), 
    ID3=paste2(x[, 4:6],sep=""), dat[, -1])

##   ID1 ID2 ID3 var1 var2 var3 var4  var5
## 1   5  01 901    9 SP.1    1    W 12.10
## 2   5  01 901    9 SP.1    2    W 17.68

