R - Converting Fractions in Text to Numeric

一整个雨季 2021-02-20 11:23

I'm trying to convert, for example, '9¼"' to '9.25' but cannot seem to read the fraction correctly.

Here's the data I'm working with:

library(XM         


        
2 Answers
  • 2021-02-20 11:51

    I don't think this is clever or efficient compared to the alternatives, but it works: use gsub to remove the " symbol and replace each fraction character with its decimal form, then convert the result to numeric:

    #data (I've not downloaded XML for this, so maybe the encoding will make a difference?)
    combine = data.frame(Hands = c('1"','1⅛"','1¼"','1⅜"','1½"','1⅝"','1¾"','1⅞"'))
    
    #remove the "
    combine$Hands = gsub('"', '', combine$Hands)
    
    #replace each fraction with its decimal form
    combine$Hands = gsub("⅛", ".125", combine$Hands)
    combine$Hands = gsub("¼", ".25", combine$Hands)
    combine$Hands = gsub("⅜", ".375", combine$Hands)
    combine$Hands = gsub("½", ".5", combine$Hands)
    combine$Hands = gsub("⅝", ".625", combine$Hands)
    combine$Hands = gsub("¾", ".75", combine$Hands)
    combine$Hands = gsub("⅞", ".875", combine$Hands)
    
    
    combine$Hands <- as.numeric(combine$Hands)
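
The repeated gsub calls above can also be driven from a lookup table. The following is a minimal sketch rather than a definitive implementation: `frac_map` and `fraction_to_numeric` are hypothetical names introduced here, and the fraction characters are written as Unicode escapes on the assumption that this sidesteps source-file encoding problems.

```r
# Lookup table (hypothetical helper): vulgar-fraction character -> decimal suffix.
# Escapes: \u215b = 1/8, \u00bc = 1/4, \u215c = 3/8, \u00bd = 1/2,
#          \u215d = 5/8, \u00be = 3/4, \u215e = 7/8
frac_map <- setNames(
  c(".125", ".25", ".375", ".5", ".625", ".75", ".875"),
  c("\u215b", "\u00bc", "\u215c", "\u00bd", "\u215d", "\u00be", "\u215e")
)

fraction_to_numeric <- function(x) {
  # Drop the inch mark
  x <- gsub('"', "", x, fixed = TRUE)
  # Replace every fraction character with its decimal suffix
  for (ch in names(frac_map)) {
    x <- gsub(ch, frac_map[[ch]], x, fixed = TRUE)
  }
  as.numeric(x)
}

hands <- fraction_to_numeric(c('1"', '9\u00bc"', '1\u215e"'))
print(hands)
```

Supporting another fraction character then only needs one more entry in the table.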
    
  • 2021-02-20 12:00

    You can try transliterating the Unicode characters to ASCII directly while reading the table, by passing a custom element function (elFun) to readHTMLTable:

    library(stringi)
    readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                  elFun = function(node) {
                      stri_trans_general(xmlValue(node), "latin-ascii")
                  })
    

    You can then use @Metrics' suggestion to convert it to numbers.

    For example, using @G. Grothendieck's calc function from this post, you could clean up the Arms data:

    library(XML)
    library(stringi)
    library(gsubfn)
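    # calc is by @G. Grothendieck: it takes the digit groups of a mixed number
    # (whole, numerator, denominator) and returns the numeric value
    calc <- function(s) {
            # prepend 0 when only a fraction is given; the trailing 0:1 supplies 0/1 for a bare whole number
            x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
            x[1] + x[2] / x[3]
    }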
    
    url <- "http://mockdraftable.com/players/2014/"
    
    combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                             elFun = function(node) {
                                 stri_trans_general(xmlValue(node), "latin-ascii")
                             })
    
    names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                        "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                        "Cone3", "ShortShuttle20")
    
    # drop the inch mark, extract the runs of digits from each value, then evaluate them with calc
    sapply(strapplyc(gsub('"', "", combine$Arms), "\\d+"), calc)
    
    #[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875
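
To see how calc handles the different shapes of input, here is a small self-contained sketch that uses base R's regmatches and gregexpr in place of gsubfn's strapplyc. The sample strings in `arms` are made up for illustration, and they assume the fractions have already been transliterated to the form "31 7/8".

```r
# calc as given above (by @G. Grothendieck)
calc <- function(s) {
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}

# Made-up sample values: whole number, mixed numbers, bare fraction
arms <- c('30"', '31 1/2"', '31 7/8"', '7/8"')

# Extract the runs of digits from each string (base R stand-in for strapplyc)
digits <- regmatches(arms, gregexpr("[0-9]+", arms))

# One digit group  -> whole + 0/1
# Two digit groups -> 0 + numerator/denominator
# Three groups     -> whole + numerator/denominator
result <- sapply(digits, calc)
print(result)
```

The padding inside calc is what lets a single expression, `x[1] + x[2] / x[3]`, cover all three cases.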
    

    There might be some encoding issues depending on your machine (see the comments).
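
One way to check whether a fraction character survived the import is to inspect the code points of a value. This is only a sketch with a hand-typed sample string (`hand` is a hypothetical value, not taken from the scraped table); ¼ is U+00BC, which is 188 in decimal.

```r
# Hypothetical sample value: 9, the one-quarter character, and an inch mark
hand <- "9\u00bc\""

# Convert to UTF-8 and list the Unicode code points
code_points <- utf8ToInt(enc2utf8(hand))
print(code_points)

# TRUE when the one-quarter character (U+00BC = 188) is present
has_quarter <- 188L %in% code_points
print(has_quarter)
```

If 188 is missing for a value that should contain ¼, the text was probably decoded with the wrong encoding when it was read.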
