How do I determine the number of significant figures in data in R?

Submitted by 一笑奈何 on 2019-12-10 13:24:30

Question


I have a large dataset that I'm analyzing in R, and I'm interested in one column (vector) of it. The entries in this vector have varying numbers of significant figures (from 1 to 5), and I want to subset the vector so that I'm not seeing values with only one significant figure. What test or function can I use to get R to report the number of significant figures in each entry? I've looked at the signif() function, but it rounds data to a specified number of significant digits rather than reporting how many significant figures a value has.

Example: Suppose I have this vector:
4
28.382
120
82.3
100
30.0003

I want to remove the entries that have only one significant figure: entry 1 (value 4) and entry 5 (value 100). I know how to subset data in R, but I don't know how to get R to find all the values with only one significant figure.


Answer 1:


x <- c(4, 28.382, 120, 82.3, 100, 30.0003)
# compare each value with its rounding to 1 significant figure;
# values that change when rounded have more than one significant figure
# (a tolerance is needed because of floating-point precision)
keep <- abs(signif(x, 1) - x) > .Machine$double.eps
x[keep]
#[1]  28.3820 120.0000  82.3000  30.0003
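The absolute tolerance .Machine$double.eps is only meaningful for values whose magnitude is near 1. For very small values the test breaks down: for 1.5e-20 the difference from its one-significant-figure rounding is about 5e-21, which is below .Machine$double.eps, so the value is dropped even though it has two significant figures. Below is a minimal sketch of the same comparison with a tolerance scaled by |x|. The helper name more_sig_than and the default tolerance sqrt(.Machine$double.eps) (the default tolerance of all.equal()) are assumptions for illustration, not part of the answer.

```r
# Hypothetical helper (not from the answer): TRUE where x changes when
# rounded to `digits` significant figures, i.e. x has more than `digits`.
# The tolerance is multiplied by |x|, so the threshold scales with magnitude.
more_sig_than <- function(x, digits = 1, tol = sqrt(.Machine$double.eps)) {
  abs(signif(x, digits) - x) > tol * abs(x)
}

x <- c(4, 28.382, 120, 82.3, 100, 30.0003, 1.5e-20)

# absolute tolerance: 1.5e-20 is wrongly treated as having 1 significant figure
abs(signif(x, 1) - x) > .Machine$double.eps
# relative tolerance: 1.5e-20 is kept
more_sig_than(x)
x[more_sig_than(x)]
```

The trade-off is that a value differing from its rounding by less than tol relative to its size (for example 30.0000000001) would still be treated as having one significant figure.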



Answer 2:


I think this should be equivalent to Roland's solution.

x <- c(4, 4.0, 4.00, 28.382, 120,
       82.3, 100, 100.0, 30.0003)
x
ifelse(x == signif(x, 1), NA, x)
ifelse(x == signif(x, 2), NA, x)
ifelse(x == signif(x, 3), NA, x)

In any case, it has the same problem: it gives the wrong number of significant digits for values such as "4.00" and "100.0".
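Numeric approaches cannot tell these cases apart because trailing zeros disappear as soon as R parses the literals. A quick check (a small demonstration, not part of the original answer):

```r
# 4, 4.0 and 4.00 all parse to the identical double value
identical(4, 4.0)
identical(4, 4.00)
# converting back to text shows no trailing zeros
format(4.00)
as.character(100.0)
```

Once the data are stored as numbers, the information about how many zeros were written is gone.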

Part of the solution, as pointed out above, is to treat the numbers as character strings. It isn't enough to convert the numbers to characters after the fact; they have to be read in as strings, which takes a bit of care. The colClasses argument of the read.table family of functions can come in handy here.
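A minimal sketch of reading a column as character with colClasses. The inline CSV text and the column name value are made up for illustration; with a real file you would pass the file name instead of text = csv.

```r
# Made-up CSV content with a single column called "value"
csv <- "value\n4\n4.0\n4.00\n28.382\n120\n82.3\n100\n100.0\n30.0003"

# default: the column is converted to numeric, so 4.0 and 4.00 become 4
as_num <- read.csv(text = csv)
# colClasses = "character": the column keeps the strings exactly as written
as_chr <- read.csv(text = csv, colClasses = "character")

as_num$value
as_chr$value
```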

xc <- c("4", "4.0", "4.00", "28.382", "120",
        "82.3", "100", "100.0", "30.0003")
xc
# "4"  "4.0" "4.00" "28.382" "120" "82.3" "100" "100.0" "30.0003"
ifelse(xc == signif(as.numeric(xc), 1), NA, xc)
# NA "4.0" "4.00" "28.382" "120" "82.3" NA "100.0" "30.0003"

Only "4" and "100" are removed. That looks promising, but if we go a bit further we see that not everything is quite as it ought to be.

ifelse(xc == signif(as.numeric(xc), 2), NA, xc)
# NA "4.0" "4.00" "28.382" NA "82.3" NA "100.0" "30.0003"
ifelse(xc == signif(as.numeric(xc), 3), NA, xc)
# NA "4.0" "4.00" "28.382" NA NA NA "100.0" "30.0003"

The reason can be demonstrated like this

2 == "2"
# TRUE: the number 2 is converted to the string "2" before comparing
2.0 == "2"; 02 == "2"
# TRUE
# TRUE: 2.0 and 02 are both parsed as the number 2, which becomes "2"
2 == "2.0"
# FALSE: the string "2.0" is not modified, so "2" is compared with "2.0"
2 == as.numeric("2.0")
# TRUE: unless you explicitly convert it to a number
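These results follow from R's coercion rule for ==: when one side is character, the other side is converted with as.character() first. A quick way to see what the numeric side turns into:

```r
# what the numeric side becomes before the string comparison
as.character(2.0)
as.character(signif(120, 2))
# comparing with a number is the same as comparing with its string form
identical(2 == "2.0", as.character(2) == "2.0")
```

This is also why "120" is matched by signif(as.numeric(xc), 2): the rounded value converts to the string "120", which is identical to the original entry.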

It's also worth keeping in mind that strings are compared by their collation order (which depends on the locale), even when they could easily be interpreted as numbers.

2 < "2.0"
# TRUE
2 > "2.0"
# FALSE
"2.0" < "2.00"
# TRUE
sort(xc)
# "100" "100.0" "120" "28.382" "30.0003" "4" "4.0" "4.00" "82.3" 
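If you need the strings ordered by their numeric value while keeping their original form, one option (not from the answer) is to order on the converted values:

```r
xc <- c("4", "4.0", "4.00", "28.382", "120",
        "82.3", "100", "100.0", "30.0003")
# order() leaves ties in their original order, so "4", "4.0", "4.00" stay as given
xc[order(as.numeric(xc))]
```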

So far the only complete fix I've found for this problem is a little hacky. It consists of picking out the strings that contain a decimal separator (".") and replacing the last character of each with "1" (or any non-zero digit). This turns "4.0" into "4.1" but leaves "100" as it is. Because the last digit is now non-zero, it survives the conversion to a number, so the trailing zeros after the decimal point are counted. This new vector is then used as the basis for comparison.

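# copy of xc in which the last character of every string containing "."
# is replaced by "1" (".$" is a regular expression for the last character)
xc.1 <- xc
decimal <- grep(".", xc, fixed=TRUE)
xc.1[decimal] <- gsub(".$", "1", xc[decimal])
xc.1 <- as.numeric(xc.1)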

xc
# "4"  "4.0" "4.00" "28.382" "120" "82.3" "100" "100.0" "30.0003"
ifelse(xc.1 == signif(xc.1, 1), NA, xc)
# NA "4.0" "4.00" "28.382" "120" "82.3" NA "100.0" "30.0003"
ifelse(xc.1 == signif(xc.1, 2), NA, xc)
# NA NA "4.00" "28.382" NA "82.3" NA "100.0" "30.0003"
ifelse(xc.1 == signif(xc.1, 3), NA, xc)
# NA NA NA "28.382" NA NA NA "100.0" "30.0003"
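To see what the substitution produces, the intermediate vector can be printed. The sketch below repeats the steps above but uses sub(), which replaces only the first match; that is enough here because ".$" can match only once per string.

```r
xc <- c("4", "4.0", "4.00", "28.382", "120",
        "82.3", "100", "100.0", "30.0003")
xc.1 <- xc
decimal <- grep(".", xc, fixed = TRUE)
# ".$" is a regular expression matching the last character of each string
xc.1[decimal] <- sub(".$", "1", xc[decimal])
xc.1
as.numeric(xc.1)
```

Note that "28.382" becomes "28.381": the value changes slightly, but its number of digits does not, which is all the comparison relies on.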

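If you want to actually count the number of significant digits, you can do that with a small loop that tries each number of digits from n down to 1 and keeps the smallest one that reproduces the value.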

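n <- 7  # largest number of significant digits to test

# true counts, from the modified string-based vector xc.1
# (entries with more than n significant digits stay 0)
xc.count <- integer(length(xc.1))
for (i in n:1) xc.count[xc.1 == signif(xc.1, i)] <- i
xc.count
# 1 2 3 5 2 3 1 4 6

# simple counts, from the plain numeric vector x (trailing zeros are lost)
x.count <- integer(length(x))
for (i in n:1) x.count[x == signif(x, i)] <- i
x.count
# 1 1 1 5 2 3 1 1 6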
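A different technique, not from either answer, is to count the digits directly in the strings instead of comparing against signif(). The hypothetical helper count_sig_figs() below is a sketch that follows the same convention as the loop above (trailing zeros count only when the string contains a decimal point) and assumes plain decimal strings without exponents such as "1e5".

```r
# Hypothetical helper: count significant figures from the character form
count_sig_figs <- function(s) {
  s <- sub("^[+-]", "", s)                  # drop a leading sign
  has_point <- grepl(".", s, fixed = TRUE)  # is there a decimal point?
  digits <- gsub(".", "", s, fixed = TRUE)  # remove the decimal point
  digits <- sub("^0+", "", digits)          # leading zeros never count
  # without a decimal point, trailing zeros are treated as placeholders
  digits[!has_point] <- sub("0+$", "", digits[!has_point])
  nchar(digits)
}

xc <- c("4", "4.0", "4.00", "28.382", "120",
        "82.3", "100", "100.0", "30.0003")
count_sig_figs(xc)
count_sig_figs(c("0.0040", "-12.50"))
# subset as asked in the question
xc[count_sig_figs(xc) > 1]
```

With this convention a zero such as "0" or "0.0" gets a count of 0, so decide separately how such entries should be handled.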


Source: https://stackoverflow.com/questions/27767841/how-do-i-determine-the-number-of-significant-figures-in-data-in-r
