Convert from billion to million and vice versa

后端 未结 6 515
旧巷少年郎
旧巷少年郎 2020-11-30 15:27

Suppose I have the following data frame named DF. I would like to convert all the values in the Revenue column to the same unit.

         


        
相关标签:
6条回答
  • 2020-11-30 15:33

    We can also do this with gsubfn. Replace the 'bn', 'Mn' with * 1, * 1/1000, evaluate the string and paste with 'bn'.

    library(gsubfn)
    sprintf("%.2f bn", sapply(gsubfn("([[:alpha:]]+)", list(Mn = "* 1/1000", 
            bn = "* 1"), df1$Revenue), function(x) eval(parse(text=x))))
    #[1] "50.10 bn" "41.20 bn" "0.03 bn"  "15.10 bn"
    
    0 讨论(0)
  • 2020-11-30 15:33

    You should consider whether you really want to store revenue as text in your data. That will make it harder to do any sort of computations on revenue. You may find it better to store revenue as a float and write a custom format to display it with a suffix. The unit of measurement could be ones, billions, or anything else.

    Depending on what you decide, here are two approaches using the popular "tidyverse" packages. In both approaches, use tidyr's separate(..., sep='\\s',convert=TRUE) to split your revenue text into a number and a unit.

    Keeping revenue stored as text (not best practice)

    library(tidyr)
    library(dplyr)
    DF %>% 
      separate(Revenue, into=c('Rev.Amt','Rev.Denom'), sep='\\s', convert=TRUE) %>%
      mutate( Rev.Amt = Rev.Amt/ifelse(Rev.Denom=='Mn',1000,1), # other conversions as needed
              Rev.Denom = 'bn' ) %>%
      unite(Revenue, Rev.Amt, Rev.Denom, sep=' ')
    # Brands   Revenue
    # 1      A   50.1 bn
    # 2      B   41.2 bn
    # 3      C 0.0325 bn
    # 4      D   15.1 bn
    

    Store revenue as numeric (better practice)

    Here we'll store revenue as units without a multiple, but you could also store it as billions and avoid the step of division upon display.

    DF %<>% 
      separate(Revenue, into=c('Rev.Amt','Rev.Denom'), sep='\\s', convert=TRUE) %>%
      mutate( Rev.Amt = Rev.Amt*ifelse(Rev.Denom=='Mn', 1e6, 1e9)) %>% # other conv as needed
      select(-Rev.Denom)
    
    # To display numeric revenue in billions
    showInBn <- function(x) paste(x/1e9,'bn')
    DF %>% mutate(Rev.Expr = showInBn(Rev.Amt)) %>% select(-Rev.Amt)
    #   Brands  Rev.Expr
    # 1      A   50.1 bn
    # 2      B   41.2 bn
    # 3      C 0.0325 bn
    # 4      D   15.1 bn
    
    0 讨论(0)
  • 2020-11-30 15:37

    One idea,

    new <- ifelse(gsub('.*\\s+', '', DF$Revenue) == 'bn',
                  as.numeric(gsub('[A-Za-z]', '', DF$Revenue))*1000, DF$Revenue)
    
    new[!grepl('Mn', new)] <- paste(new[!grepl('Mn', new)], 'Mn', sep = ' ')
    DF$Revenue <- new
    
    DF
    #  Brands  Revenue
    #1      A 50100 Mn
    #2      B 41200 Mn
    #3      C  32.5 Mn
    #4      D 15100 Mn
    

    To do the opposite then,

    new <- ifelse(gsub('.*\\s+', '', DF$Revenue) == 'Mn',
                   as.numeric(gsub('[A-Za-z]', '', DF$Revenue))/1000, DF$Revenue)
    
     new[!grepl('bn', new)] <- paste(new[!grepl('bn', new)], 'bn', sep = ' ')
     DF$Revenue <- new
     DF
    #  Brands   Revenue
    #1      A   50.1 bn
    #2      B   41.2 bn
    #3      C 0.0325 bn
    #4      D   15.1 bn
    
    0 讨论(0)
  • 2020-11-30 15:45

    This is a solution that replaces the "units" by appropriate factors and evaluates the resulting calculations.

    The first step is to replace "bn" and "Mn" by a factor:

    conversion <- c(Mn = 1/1000, bn = 1)
    for (unit in names(conversion)) {
      df$Revenue <- gsub(unit, paste0("*", conversion[unit]), df$Revenue)
    }
    df
    ##   CBrands             Revenue
    ## 1       A             50.1 *1
    ## 2       B             41.2 *1
    ## 3       C         32.5 *0.001
    ## 4       D             15.1 *1
    

    Then evaluate the expressions in Revenue and "bn" again:

    df$Revenue <- sapply(df$Revenue, function(x) eval(parse(text = x)))
    df$Revenue <- paste(df$Revenue, "bn")
    df
    ##   CBrands   Revenue
    ## 1       A   50.1 bn
    ## 2       B   41.2 bn
    ## 3       C 0.0325 bn
    ## 4       D   15.1 bn
    
    0 讨论(0)
  • 2020-11-30 15:52

    Another method: separate the monetary value from the text using split:

    # split value and "level" in a list
    temp <- split(df$Revenue, split=" ")
    # add separately to data.frame
    df$Revenue <- sapply(temp, function(i) as.numeric(i[[1]]))
    df$level <- sapply(temp, function(i) "[", 2)
    
    df
      Brands Revenue level
    1      A 50100.0    bn
    2      B 41200.0    bn
    3      C    32.5    bn
    4      D 15100.0    bn
    

    Now, convert to millions subsetting on the levels with "bn":

    df$Revenue[df$level == "bn"] <- df$Revenue[df$level == "bn"] * 1000
    df$level <- "Mn"
    

    This results in

    df
      Brands Revenue level
    1      A  0.0501    Mn
    2      B  0.0412    Mn
    3      C 32.5000    Mn
    4      D  0.0151    Mn
    

    Instead convert to billions (a similar procedure)

    df$Revenue[df$level == "Mn"] <- df$Revenue[df$level == "Mn"] / 1000
    df$level <- "bn"
    

    This results in

    df
      Brands Revenue level
    1      A  0.0501    bn
    2      B  0.0412    bn
    3      C 32.5000    bn
    4      D  0.0151    bn
    
    0 讨论(0)
  • 2020-11-30 15:58

    To maybe simplify the parsing procedure compared to the previous solutions. I am using the awesome library stringr:

    library(stringr)
    
    dd$units <- word(dd$Revenue, 2, sep = " ")
    dd$amounts <- word(dd$Revenue, 1, sep = " ")
    
    
    # The following lines create an extra column in the dataframe,
    # You can overwrite the original column if you so wish.
    
    # Convert to billions
    dd$convert_to_bn <- paste(as.numeric(dd$amounts) * ifelse(dd$units == "bn", 1 , 0.001), "bn")
    
    # Convert to millions
    dd$convert_to_mn <- paste(as.numeric(dd$amounts) * ifelse(dd$units == "Mn", 1 , 1000), "Mn")
    
    0 讨论(0)
提交回复
热议问题