Let's imagine you have a string:
strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"
Extract the numbers, keeping any surrounding parentheses, with stringr:
library(stringr)
x <- str_extract_all(strLine,"\\(?[0-9,.]+\\)?")[[1]]
x
[1] "0" "3,000" "(500)" "0" "2.25" "(1,200)"
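One thing to watch: because the character class `[0-9,.]` treats digits, commas and periods alike, the pattern also matches a lone comma or period, such as punctuation right after a word or at the end of a sentence. Here is a base R sketch (using `regmatches()`/`gregexpr()` so it runs without extra packages) on a made-up string, comparing it with a stricter pattern that requires at least one digit. The stricter pattern is my own assumption, not part of the approach above.

```r
# Made-up example string with punctuation next to a word and at the end
s <- "Totals, in dollars: 0 3,000 (500) 0 2.25 (1,200)."

# Original pattern: a bare "," or "." also counts as a match
loose <- regmatches(s, gregexpr("\\(?[0-9,.]+\\)?", s))[[1]]
loose

# Stricter (assumed) pattern: optional "(", digits/commas, an optional
# decimal point, then at least one digit, then an optional ")"
strict <- regmatches(s, gregexpr("\\(?[0-9,]*\\.?[0-9]+\\)?", s))[[1]]
strict
```

On this string the original pattern also returns a bare "," and a bare ".", both of which would end up as `NA` after `as.numeric()`; the stricter one returns only the six amounts, and the remaining steps work on it unchanged.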
Change the parens to negatives:
x <- gsub("\\((.+)\\)","-\\1",x)
x
[1] "0" "3,000" "-500" "0" "2.25" "-1,200"
Then finish up with as.numeric() (after stripping the commas) or taRifx::destring. The next version of destring will support negatives by default, so the keep option won't be necessary then:
library(taRifx)
destring(x, keep = "0-9.-")
[1] 0.00 3000.00 -500.00 0.00 2.25 -1200.00
OR:
as.numeric(gsub(",","",x))
[1] 0.00 3000.00 -500.00 0.00 2.25 -1200.00
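The comma-stripping step matters: `as.numeric()` does not understand thousands separators, so "3,000" and "-1,200" would come back as `NA` (with a coercion warning). A quick base R check on the vector produced by the paren-to-minus step:

```r
x <- c("0", "3,000", "-500", "0", "2.25", "-1,200")

# Without removing the commas, the two comma-formatted values become NA
naive <- suppressWarnings(as.numeric(x))
naive

# Stripping the commas first (fixed = TRUE: match "," literally, not as a regex)
clean <- as.numeric(gsub(",", "", x, fixed = TRUE))
clean
```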
Since this came up in another question, here is an uncrutched stringi solution (versus the stringr crutch; stringr is built on top of stringi):
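as.numeric(
  stringi::stri_replace_first_fixed(
    # Drop a trailing ")" and all thousands separators
    stringi::stri_replace_all_regex(
      # Pull out every number, with any surrounding parentheses
      unlist(stringi::stri_match_all_regex(
        "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)",
        "\\(?[0-9,.]+\\)?"
      )), "\\)$|,", ""
    ),
    # A leading "(" marks a negative amount
    "(", "-"
  )
)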
When working with strings stored in a data frame (one string per row, all in the same column), the following worked well for me:
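library(taRifx)
# Each cell should hold a single number: destring() drops every character
# not listed in `keep`, so several numbers in one cell would run together
DataFrame$Numbers <- as.character(destring(DataFrame$Strings, keep = "0-9.-"))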
The results end up in a new column of the same data frame.
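If you'd rather not depend on taRifx, here is a base R sketch of the same one-number-per-cell idea. It assumes a hypothetical `DataFrame` whose `Strings` column holds exactly one amount per cell and no other digits, periods or hyphens. Unlike `keep = "0-9.-"`, which as I understand it simply drops the parentheses, it first rewrites "(500)" as "-500", and it stores the result as numeric rather than character.

```r
# Hypothetical example data: one amount per cell
DataFrame <- data.frame(
  Strings = c("Balance: 3,000", "Fee (500)", "Rate 2.25"),
  stringsAsFactors = FALSE
)

# Rewrite an accounting-style "(n)" as "-n" before cleaning
s <- sub("\\(([0-9,.]+)\\)", "-\\1", DataFrame$Strings)

# Keep only digits, the decimal point and the minus sign, then convert
DataFrame$Numbers <- as.numeric(gsub("[^0-9.-]", "", s))
DataFrame
```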
Here's the base R way, for the sake of completeness...
x <- unlist(regmatches(strLine, gregexpr('\\(?[0-9,.]+', strLine)))
x <- as.numeric(gsub('\\(', '-', gsub(',', '', x)))
x
[1] 0.00 3000.00 -500.00 0.00 2.25 -1200.00
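To run the base R approach over many strings at once, the steps can be wrapped in a small function. This is a sketch: `parse_amounts()` is a hypothetical name, and it returns a list with one numeric vector per input string (empty when a string contains no numbers).

```r
# Hypothetical helper: extract accounting-style amounts from each string,
# treating "(n)" as -n and ignoring thousands separators
parse_amounts <- function(strings) {
  tokens <- regmatches(strings, gregexpr("\\(?[0-9,.]+\\)?", strings))
  lapply(tokens, function(tok) {
    neg <- startsWith(tok, "(")               # parenthesized means negative
    num <- as.numeric(gsub("[(),]", "", tok)) # drop parens and commas
    num[neg] <- -num[neg]
    num
  })
}

txt <- c(
  "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)",
  "No amounts here",
  "Refund (75.50) and fee 4"
)
res <- parse_amounts(txt)
res
```

Flagging negatives with a logical index, instead of rewriting "(" as "-" inside the string, keeps the sign handling separate from the number parsing.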