Suppose I have the following data frame named DF
. I would like to convert all the values in the Revenue
column to the same unit.
We can also do this with gsubfn
. Replace the 'bn', 'Mn' with * 1
, * 1/1000
, evaluate the string and paste with 'bn'.
library(gsubfn)
sprintf("%.2f bn", sapply(gsubfn("([[:alpha:]]+)", list(Mn = "* 1/1000",
bn = "* 1"), df1$Revenue), function(x) eval(parse(text=x))))
#[1] "50.10 bn" "41.20 bn" "0.03 bn" "15.10 bn"
You should consider whether you really want to store revenue as text in your data. That will make it harder to do any sort of computations on revenue. You may find it better to store revenue as a float and write a custom format to display it with a suffix. The unit of measurement could be ones, billions, or anything else.
Depending on what you decide, here are two approaches using the popular "tidyverse" packages. In both approaches, use tidyr's separate(..., sep='\\s',convert=TRUE)
to split your revenue text into a number and a unit.
library(tidyr)
library(dplyr)
DF %>%
separate(Revenue, into=c('Rev.Amt','Rev.Denom'), sep='\\s', convert=TRUE) %>%
mutate( Rev.Amt = Rev.Amt/ifelse(Rev.Denom=='Mn',1000,1), # other conversions as needed
Rev.Denom = 'bn' ) %>%
unite(Revenue, Rev.Amt, Rev.Denom, sep=' ')
# Brands Revenue
# 1 A 50.1 bn
# 2 B 41.2 bn
# 3 C 0.0325 bn
# 4 D 15.1 bn
Here we'll store revenue as units without a multiple, but you could also store it as billions and avoid the step of division upon display.
DF %<>%
separate(Revenue, into=c('Rev.Amt','Rev.Denom'), sep='\\s', convert=TRUE) %>%
mutate( Rev.Amt = Rev.Amt*ifelse(Rev.Denom=='Mn', 1e6, 1e9)) %>% # other conv as needed
select(-Rev.Denom)
# To display numeric revenue in billions
showInBn <- function(x) paste(x/1e9,'bn')
DF %>% mutate(Rev.Expr = showInBn(Rev.Amt)) %>% select(-Rev.Amt)
# Brands Rev.Expr
# 1 A 50.1 bn
# 2 B 41.2 bn
# 3 C 0.0325 bn
# 4 D 15.1 bn
One idea,
new <- ifelse(gsub('.*\\s+', '', DF$Revenue) == 'bn',
as.numeric(gsub('[A-Za-z]', '', DF$Revenue))*1000, DF$Revenue)
new[!grepl('Mn', new)] <- paste(new[!grepl('Mn', new)], 'Mn', sep = ' ')
DF$Revenue <- new
DF
# Brands Revenue
#1 A 50100 Mn
#2 B 41200 Mn
#3 C 32.5 Mn
#4 D 15100 Mn
To do the opposite then,
new <- ifelse(gsub('.*\\s+', '', DF$Revenue) == 'Mn',
as.numeric(gsub('[A-Za-z]', '', DF$Revenue))/1000, DF$Revenue)
new[!grepl('bn', new)] <- paste(new[!grepl('bn', new)], 'bn', sep = ' ')
DF$Revenue <- new
DF
# Brands Revenue
#1 A 50.1 bn
#2 B 41.2 bn
#3 C 0.0325 bn
#4 D 15.1 bn
This is a solution that replaces the "units" by appropriate factors and evaluates the resulting calculations.
The first step is to replace "bn" and "Mn" by a factor:
conversion <- c(Mn = 1/1000, bn = 1)
for (unit in names(conversion)) {
df$Revenue <- gsub(unit, paste0("*", conversion[unit]), df$Revenue)
}
df
## CBrands Revenue
## 1 A 50.1 *1
## 2 B 41.2 *1
## 3 C 32.5 *0.001
## 4 D 15.1 *1
Then evaluate the expressions in Revenue
and "bn" again:
df$Revenue <- sapply(df$Revenue, function(x) eval(parse(text = x)))
df$Revenue <- paste(df$Revenue, "bn")
df
## CBrands Revenue
## 1 A 50.1 bn
## 2 B 41.2 bn
## 3 C 0.0325 bn
## 4 D 15.1 bn
Another method: separate the monetary value from the text using split
:
# split value and "level" in a list
temp <- split(df$Revenue, split=" ")
# add separately to data.frame
df$Revenue <- sapply(temp, function(i) as.numeric(i[[1]]))
df$level <- sapply(temp, function(i) "[", 2)
df
Brands Revenue level
1 A 50100.0 bn
2 B 41200.0 bn
3 C 32.5 bn
4 D 15100.0 bn
Now, convert to millions subsetting on the levels with "bn":
df$Revenue[df$level == "bn"] <- df$Revenue[df$level == "bn"] * 1000
df$level <- "Mn"
This results in
df
Brands Revenue level
1 A 0.0501 Mn
2 B 0.0412 Mn
3 C 32.5000 Mn
4 D 0.0151 Mn
Instead convert to billions (a similar procedure)
df$Revenue[df$level == "Mn"] <- df$Revenue[df$level == "Mn"] / 1000
df$level <- "bn"
This results in
df
Brands Revenue level
1 A 0.0501 bn
2 B 0.0412 bn
3 C 32.5000 bn
4 D 0.0151 bn
To maybe simplify the parsing procedure compared to the previous solutions. I am using the awesome library stringr:
library(stringr)
dd$units <- word(dd$Revenue, 2, sep = " ")
dd$amounts <- word(dd$Revenue, 1, sep = " ")
# The following lines create an extra column in the dataframe,
# You can overwrite the original column if you so wish.
# Convert to billions
dd$convert_to_bn <- paste(as.numeric(dd$amounts) * ifelse(dd$units == "bn", 1 , 0.001), "bn")
# Convert to millions
dd$convert_to_mn <- paste(as.numeric(dd$amounts) * ifelse(dd$units == "Mn", 1 , 1000), "Mn")