R: Converting data frame of percentages from factor to numeric

失恋的感觉 2021-01-22 15:50

I'm running into issues converting a data frame in R.

I have a bunch of columns that were read as factors and have % symbols with them.

I

4 Answers
  • 2021-01-22 15:54

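    parse_number from the readr package will remove the % symbols. It expects character input, so factor columns need as.character() first. For your given data set, try:

    library(dplyr)
    library(readr)
    
    # across() needs dplyr >= 1.0.0; it replaces the deprecated mutate_all(funs(...))
    # convert every column except Year, which is preserved as-is
    res <- df %>%
      mutate(across(-Year, ~ parse_number(as.character(.x))))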
    
    > res
        Year v1 v2 v3 v4
    1 12-Oct  0  0 39 14
    2 12-Nov  0  6 59  4
    3 12-Dec 22  0 37 26
    4 13-Jan 45  0 66 19
    5 13-Feb 28 39 74 13
    

    If you don't need to preserve your first column, this is enough:

    df %>% select(-Year) %>% mutate(across(everything(), ~ parse_number(as.character(.x))))
    
  • 2021-01-22 16:02

    Here is an option using set() from data.table, which is faster for big datasets because it avoids the overhead of [.data.table:

    library(stringi)
    library(data.table)
    
    setDT(df)
    for(j in 2:ncol(df)){
         set(df, i=NULL, j=j, value= as.numeric(stri_extract(df[[j]], regex='\\d+')))
    }
    
    df
    #     Year v1 v2 v3 v4
    #1: 12-Oct  0  0 39 14
    #2: 12-Nov  0  6 59  4
    #3: 12-Dec 22  0 37 26
    #4: 13-Jan 45  0 66 19
    #5: 13-Feb 28 39 74 13
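
    One caveat with the pattern '\\d+': it keeps only the first run of digits, so a value such as 12.5% would come back as 12 and a minus sign would be dropped. The sample data has only whole, non-negative percentages, so it makes no difference there. Below is a minimal sketch in base R, assuming decimals or negative values might appear; the helper name extract_pct is hypothetical and not part of any package.

    ```r
    # Hypothetical helper: pull the first signed decimal number out of each string.
    extract_pct <- function(x) {
      x <- as.character(x)                              # factor -> character
      m <- regexpr("-?\\d+(\\.\\d+)?", x, perl = TRUE)  # -1 where nothing matches
      out <- rep(NA_real_, length(x))                   # unmatched entries stay NA
      out[m != -1] <- as.numeric(regmatches(x, m))      # regmatches() returns matches only
      out
    }

    extract_pct(factor(c("39%", "12.5%", "-4%", "n/a")))
    ```

    The same regular expression could be passed to stri_extract in the loop above in place of '\\d+'.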
    
  • 2021-01-22 16:04

    Try this approach using functions from base:

    # dummy data:
    df<-data.frame(v1=c("78%", "65%", "32%"), v2=c("43%", "56%", "23%"))
    
    # function
    df2<-data.frame(lapply(df, function(x) as.numeric(sub("%", "", x))) )
    

    As per the comments, this first strips the percentage signs and then converts the columns from factor to numeric. I've changed the original answer from apply to lapply following @thelatemail's suggestion.
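
    The same idea extends to a frame like the one in the question, where a label column such as Year should be left alone. The sketch below is only an illustration: the data frame is rebuilt by hand from values shown in the other answers, and detecting percentage columns by a trailing % is my assumption, not something from the original answer. It also shows why the % has to be stripped first: calling as.numeric() directly on a factor returns the internal level codes rather than the printed values.

    ```r
    # Sample data rebuilt by hand; stringsAsFactors = TRUE mimics the pre-4.0 default.
    df <- data.frame(
      Year = c("12-Oct", "12-Nov", "12-Dec"),
      v1   = c("0%", "0%", "22%"),
      v2   = c("0%", "6%", "0%"),
      stringsAsFactors = TRUE
    )

    # Pitfall: this yields factor level codes, not percentages.
    as.numeric(df$v1)

    # Assumption: a column is a percentage column if every value ends in "%".
    is_pct <- vapply(df, function(x) all(grepl("%$", as.character(x))), logical(1))

    # Strip the sign and convert only those columns; Year is untouched.
    df[is_pct] <- lapply(df[is_pct], function(x) as.numeric(sub("%", "", x, fixed = TRUE)))

    str(df)
    ```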

  • 2021-01-22 16:05

    Here is a one-line solution that assumes the data is in fixed-width columns. I needed to remove the first row of names, since not all of the columns had names. The column widths are specified as integers, with a negative value meaning "skip that many characters". The columns are also read as numeric directly, so no later conversion is needed.

    your data
    
    1 12-Oct        0%      0%      39%      14%
    2 12-Nov        0%      6%      59%       4%
    3 12-Dec       22%      0%      37%      26%
    4 13-Jan       45%      0%      66%      19%
    5 13-Feb       28%     39%      74%      13%
    
    the R one-line script
    
    adf <- read.fwf(file="a.dat",widths=c(-8,9,-1,7,-1,8,-1,8),colClasses=rep("numeric",4))
    
    output result (first col provided by R to count the rows)
    
      V1 V2 V3 V4
    1  0  0 39 14
    2  0  6 59  4
    3 22  0 37 26
    4 45  0 66 19
    5 28 39 74 13
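
    To try the width specification without creating a.dat by hand, the lines can be written to a temporary file first. This is only a sketch: the layout below (an 8-character label, then two 4-character numbers each followed by a % sign) is my own, chosen so the widths are easy to check, and is not the spacing of the original file.

    ```r
    # Three sample rows in a fixed layout: 8-character label, then two
    # 4-character numbers, each followed by a "%" sign.
    rows <- sprintf("%-8s%4d%%%4d%%",
                    c("12-Oct", "12-Nov", "12-Dec"),
                    c(0L, 0L, 22L),
                    c(0L, 6L, 0L))

    tmp <- tempfile(fileext = ".dat")
    writeLines(rows, tmp)

    # Negative widths skip characters: the label (8) and each "%" (1).
    adf <- read.fwf(tmp, widths = c(-8, 4, -1, 4, -1),
                    colClasses = rep("numeric", 2))
    unlink(tmp)

    adf
    ```

    Because the % characters fall in skipped positions, they never reach the numeric parser, which is why colClasses can be "numeric" here.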
    