I'm running into issues converting a data frame in R. I have a bunch of columns that were read as factors and have % symbols in them.
parse_number from the readr package will remove the % symbols. For your given data set, try:
library(dplyr)
library(readr)
res <- cbind(df %>% select(Year), # preserve the Year column as-is
             # parse_number expects character input, so convert the factors first
             df %>% select(-Year) %>% mutate_all(~ parse_number(as.character(.x)))
)
> res
    Year v1 v2 v3 v4
1 12-Oct  0  0 39 14
2 12-Nov  0  6 59  4
3 12-Dec 22  0 37 26
4 13-Jan 45  0 66 19
5 13-Feb 28 39 74 13
If you don't need to preserve your first column, you only need the excerpt:
df %>% select(-Year) %>% mutate_all(~ parse_number(as.character(.x)))
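For a reproducible check, here is a minimal sketch of the same idea on a small made-up data frame. The sample values and the use of across() (which assumes dplyr 1.0 or later) are my own additions, not part of the original answer. The as.character() step is there because recent readr versions expect character input.

```r
library(dplyr)
library(readr)

# Hypothetical sample mirroring the question: percentage columns stored as factors
df <- data.frame(
  Year = c("12-Oct", "12-Nov", "12-Dec"),
  v1 = factor(c("0%", "0%", "22%")),
  v2 = factor(c("0%", "6%", "0%"))
)

# Convert every column except Year: factor -> character -> number
res <- df %>%
  mutate(across(-Year, ~ parse_number(as.character(.x))))

str(res)
```

This keeps Year in place without the cbind() step, which may be easier to read when there are many columns.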
Here is an option using set from data.table, which would be faster for big datasets as the overhead of [.data.table is avoided:
library(stringi)
library(data.table)
setDT(df)
# Overwrite every column except the first (Year) in place
for (j in 2:ncol(df)) {
  set(df, i = NULL, j = j, value = as.numeric(stri_extract(df[[j]], regex = '\\d+')))
}
df
#     Year v1 v2 v3 v4
#1: 12-Oct  0  0 39 14
#2: 12-Nov  0  6 59  4
#3: 12-Dec 22  0 37 26
#4: 13-Jan 45  0 66 19
#5: 13-Feb 28 39 74 13
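If the percentage columns are not guaranteed to sit in positions 2 onward, a variant that picks them by name may be safer. This is only a sketch on made-up data: the table dt and the vector pct_cols are hypothetical names, and base sub() is swapped in for stri_extract to avoid the extra dependency.

```r
library(data.table)

# Hypothetical sample: percentage columns stored as factors
dt <- data.table(
  Year = c("12-Oct", "12-Nov"),
  v1 = factor(c("0%", "22%")),
  v2 = factor(c("39%", "6%"))
)

# Every column except Year is treated as a percentage column (an assumption)
pct_cols <- setdiff(names(dt), "Year")

for (col in pct_cols) {
  # set() replaces the whole column by reference; sub() coerces the factor to character
  set(dt, j = col, value = as.numeric(sub("%", "", dt[[col]], fixed = TRUE)))
}

sapply(dt, class)
```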
Try this approach using functions from base:
# dummy data:
df <- data.frame(v1 = c("78%", "65%", "32%"), v2 = c("43%", "56%", "23%"))
# strip the % sign from each column, then convert the result to numeric
df2 <- data.frame(lapply(df, function(x) as.numeric(sub("%", "", x))))
As per the comments provided, this first strips away the percentage signs and then converts the columns from factors to numeric. I've changed the original answer from apply to lapply following @thelatemail's suggestions.
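The order of those two steps matters. As a small sketch on a made-up factor (the names f, codes and values are just for illustration): calling as.numeric directly on a factor returns its internal level codes, while going through sub() first coerces it to character and gives the actual values.

```r
# Hypothetical factor column, as produced when strings are read in as factors
f <- factor(c("78%", "65%", "32%"))

# Direct conversion returns the level codes, not the percentages
codes <- as.numeric(f)

# sub() coerces the factor to character, so the digits survive the conversion
values <- as.numeric(sub("%", "", f, fixed = TRUE))

codes   # 3 2 1
values  # 78 65 32
```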
Here is a one-line solution that assumes the data is in fixed-width columns. I needed to remove the first row of names, since not all of the columns had names. The column widths are specified as integers, with a negative value meaning skip that many characters. It also sets the column classes to numeric during the read.
Your data:
1 12-Oct 0% 0% 39% 14%
2 12-Nov 0% 6% 59% 4%
3 12-Dec 22% 0% 37% 26%
4 13-Jan 45% 0% 66% 19%
5 13-Feb 28% 39% 74% 13%
The R one-line script:
adf <- read.fwf(file="a.dat",widths=c(-8,9,-1,7,-1,8,-1,8),colClasses=rep("numeric",4))
Output (the first column is the row index that R adds when printing):
  V1 V2 V3 V4
1  0  0 39 14
2  0  6 59  4
3 22  0 37 26
4 45  0 66 19
5 28 39 74 13
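Since the widths depend entirely on how the file is laid out, here is a self-contained sketch that writes a small fixed-width file first, so the widths are known exactly. The layout (2-character row number, 7-character date, then 3 digits plus a % per value) and the names dates, lines and tmp are assumptions for illustration, not the layout of the original a.dat.

```r
# Hypothetical file contents, built with sprintf so every column width is known
dates <- c("12-Oct", "12-Nov", "12-Dec")
lines <- sprintf("%-2d%-7s%3d%%%3d%%%3d%%%3d%%",
                 1:3, dates,
                 c(0L, 0L, 22L), c(0L, 6L, 0L), c(39L, 59L, 37L), c(14L, 4L, 26L))
tmp <- tempfile(fileext = ".dat")
writeLines(lines, tmp)

# Skip the 9-character row/date prefix, then read 3 characters and skip each "%"
adf <- read.fwf(tmp, widths = c(-9, 3, -1, 3, -1, 3, -1, 3),
                colClasses = rep("numeric", 4))

str(adf)
unlink(tmp)
```

Each negative width drops the % character, which is why the columns can be read straight in as numeric.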