What is the most useful R trick? [closed]


佛祖请我去吃肉

30 Answers

眼角桃花

2020-11-28 00:38

As a total noob to R and a novice at stats, I love unclass() for printing all the elements of a data frame as an ordinary list.

It's pretty handy for looking at a complete data set in one go, to quickly eyeball any potential issues.
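
A minimal sketch of what this looks like, assuming a small made-up data frame (the column names `id` and `score` are purely illustrative):

```r
# Hypothetical toy data; any data frame would do.
df <- data.frame(id = 1:3, score = c(2.5, NA, 4.1))

# unclass() strips the "data.frame" class, leaving the underlying list of columns.
as_list <- unclass(df)

# Printing now shows every column in full, plus the row.names attribute.
print(as_list)
```

Each column prints as its own list element, so a stray NA or an unexpected type is easy to spot.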

自闭症患者

2020-11-28 00:39

The traceback() function is a must when you have an error somewhere and do not understand it readily. It will print a trace of the stack, very helpful as R is not very verbose by default.

Then setting options(error=recover) will allow you to "enter" into the function raising the error and try and understand what happens exactly, as if you had full control over it and could put a browser() in it.

These three tools, traceback(), recover() and browser(), can really help when debugging your code.
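
Here is a rough sketch of how that workflow might look. The functions `inner_fn` and `outer_fn` are hypothetical and exist only to produce an error a couple of frames deep. The interactive steps are left as comments, because traceback() and recover() are meant to be used at the console.

```r
# Hypothetical functions, used only to trigger a nested error.
inner_fn <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}
outer_fn <- function(x) inner_fn(x) + 1

# At an interactive prompt you would typically do:
#   outer_fn("a")             # raises the error
#   traceback()               # shows the chain of calls that led to the error
#   options(error = recover)  # from now on, errors offer a menu of frames to inspect
#   outer_fn("a")             # pick a frame and look at its local variables
#   options(error = NULL)     # restore the default behaviour

# Non-interactive check that the error really originates in inner_fn():
result <- tryCatch(outer_fn("a"), error = function(e) conditionMessage(e))
print(result)
```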


心在旅途

2020-11-28 00:39

I'm really surprised no one has posted about apply, tapply, lapply, and sapply. A general rule I use when doing stuff in R is that if I have a for loop that is doing data processing or simulations, I try to factor it out and replace it with an *apply. Some people shy away from the *apply functions because they think only single-parameter functions can be passed in. Nothing could be further from the truth! Just as you pass functions around as first-class objects in JavaScript, you can do this in R with anonymous functions. For example:

```
> sapply(rnorm(100, 0, 1), round)
  [1]  1  1  0  1  1 -1 -2  0  2  2 -2 -1  0  1 -1  0  1 -1  0 -1  0  0  0  0  0
 [26]  2  0 -1 -2  0  0  1 -1  1  5  1 -1  0  1  1  1  2  0 -1  1 -1  1  0 -1  1
 [51]  2  1  1 -2 -1  0 -1  2 -1  1 -1  1 -1  0 -1 -2  1  1  0 -1 -1  1  1  2  0
 [76]  0  0  0 -2 -1  1  1 -2  1 -1  1  1  1  0  0  0 -1 -3  0 -1  0  0  0  1  1

> sapply(rnorm(100, 0, 1), round(x, 2)) # How can we pass a parameter?
Error in match.fun(FUN) : object 'x' not found

# Wrap your function call in an anonymous function to use parameters
> sapply(rnorm(100, 0, 1), function(x) {round(x, 2)})
  [1] -0.05 -1.74 -0.09 -1.23  0.69 -1.43  0.76  0.55  0.96 -0.47 -0.81 -0.47
 [13]  0.27  0.32  0.47 -1.28 -1.44 -1.93  0.51 -0.82 -0.06 -1.41  1.23 -0.26
 [25]  0.22 -0.04 -2.17  0.60 -0.10 -0.92  0.13  2.62  1.03 -1.33 -1.73 -0.08
 [37]  0.45 -0.93  0.40  0.05  1.09 -1.23 -0.35  0.62  0.01 -1.08  1.70 -1.27
 [49]  0.55  0.60 -1.46  1.08 -1.88 -0.15  0.21  0.06  0.53 -1.16 -2.13 -0.03
 [61]  0.33 -1.07  0.98  0.62 -0.01 -0.53 -1.17 -0.28 -0.95  0.71 -0.58 -0.03
 [73] -1.47 -0.75 -0.54  0.42 -1.63  0.05 -1.90  0.40 -0.01  0.14 -1.58  1.37
 [85] -1.00 -0.90  1.69 -0.11 -2.19 -0.74  1.34 -0.75 -0.51 -0.99 -0.36 -1.63
 [97] -0.98  0.61  1.01  0.55
```

```
# Note that anonymous functions aren't being called, but being passed.
> function() {print('hello #rstats')}()
function() {print('hello #rstats')}()
> a = function() {print('hello #rstats')}
> a
function() {print('hello #rstats')}
> a()
[1] "hello #rstats"
```
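
As a small complement to the anonymous-function approach, here is a sketch showing that the *apply functions also forward extra arguments to the function through `...`, and how a couple of the other family members behave. The variable names and toy data are made up for illustration.

```r
set.seed(1)
x <- rnorm(5)

# Three equivalent ways to round each element to two decimals.
via_anonymous <- sapply(x, function(v) round(v, 2))
via_dots      <- sapply(x, round, digits = 2)  # extra arguments are passed on to round()
vectorized    <- round(x, 2)                   # round() is already vectorized

# apply() works over the margins of a matrix: 1 = rows, 2 = columns.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)

# tapply() applies a function within the groups defined by a second vector.
values <- c(1, 2, 3, 4)
groups <- c("a", "b", "a", "b")
group_means <- tapply(values, groups, mean)

print(via_dots)
print(row_sums)
print(group_means)
```

When a function is already vectorized, as round() is, calling it directly is usually the simplest option; the *apply forms earn their keep when the function only handles one element at a time.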

(For those that follow #rstats, I also posted this there).

Remember, use apply, sapply, lapply, tapply, and do.call! Take advantage of R's vectorization. You should never walk up to a bunch of R code and see:

```
N = 10000
l = numeric()
for (i in seq_len(N)) {
    sim <- rnorm(1, 0, 1)
    l <- rbind(l, sim)  # anti-pattern: l is reallocated and copied on every iteration
}
```

Not only is this not vectorized, but the array structure in R is not grown the way it is in Python (which over-allocates, doubling the size when space runs out, IIRC). So each rbind step must first allocate a new l large enough to hold the result of rbind(), then copy all of the previous l's contents into it. For fun, try the above in R. Notice how long it takes (you won't even need Rprof or any timing function). Then try

```
N <- 10000
l <- rnorm(N, 0, 1)
```

The following is better than the first version too:

```
N = 10000
l = numeric(N)  # preallocate the full vector once
for (i in seq_len(N)) {
    sim <- rnorm(1, 0, 1)
    l[i] <- sim  # fill in place, no reallocation
}
```
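
To compare the three variants yourself, something like the following sketch could be used. The helper names `grow` and `prealloc` are made up for this example, and the timings will differ from machine to machine, so no particular numbers are claimed here.

```r
N <- 10000

# Variant 1: grow the result with rbind() on every iteration.
grow <- function(n) {
  out <- numeric()
  for (i in seq_len(n)) out <- rbind(out, rnorm(1))
  out
}

# Variant 2: preallocate, then fill in place.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- rnorm(1)
  out
}

t_grow <- system.time(a <- grow(N))
t_pre  <- system.time(b <- prealloc(N))
t_vec  <- system.time(v <- rnorm(N))  # Variant 3: fully vectorized

# Elapsed wall-clock seconds for each approach.
elapsed <- c(grow = t_grow[["elapsed"]],
             prealloc = t_pre[["elapsed"]],
             vectorized = t_vec[["elapsed"]])
print(elapsed)
```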


没有蜡笔的小新

2020-11-28 00:41
You can assign the value returned by an if-else expression.

Instead of, e.g.
```
condition <- runif(1) > 0.5
if(condition) x <- 1 else x <- 2
```
you can do
```
x <- if(condition) 1 else 2
```
It looks like deep magic, but it works because if is itself an expression in R: it returns the value of whichever branch gets evaluated.
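
A short sketch of a few consequences of `if` returning a value; the variable names are only illustrative:

```r
condition <- runif(1) > 0.5

# The value of the evaluated branch becomes the value of the whole expression.
x <- if (condition) 1 else 2

# With no else branch and a FALSE condition, the result is NULL.
y <- if (FALSE) "never assigned"

# Chains of else-if work the same way.
label <- if (x == 1) "one" else if (x == 2) "two" else "other"

print(x)
print(is.null(y))
print(label)
```

Note that `if` only looks at a single condition; for element-wise choices over a whole vector, ifelse() is the usual tool.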
梦谈多话

2020-11-28 00:41
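Here is an annoying workaround for converting a factor into a numeric vector (the same idea applies to other data types). Calling as.numeric() directly on a factor returns its internal integer codes rather than the numbers stored in its labels, so you have to go through the levels: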
```
old.var <- as.numeric(levels(old.var))[as.numeric(old.var)]
```
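
A small sketch of the pitfall and the workaround, using a made-up factor `f`:

```r
# Hypothetical factor whose labels happen to be numbers.
f <- factor(c("10", "2.5", "10", "7"))

# Pitfall: this gives the internal integer codes, not the labelled values.
codes <- as.numeric(f)

# Workaround from above: convert the levels once, then index by the factor.
vals <- as.numeric(levels(f))[f]

# Simpler to read, but it converts every element instead of every level.
via_char <- as.numeric(as.character(f))

print(codes)
print(vals)
```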
太阳男子

2020-11-28 00:42
I find I am using with() and within() more and more. No more $ littering my code and one doesn't need to start attaching objects to the search path. More seriously, I find with() etc make the intention of my data analysis scripts much clearer.
```
> df <- data.frame(A = runif(10), B = rnorm(10))
> A <- 1:10 ## something else hanging around...
> with(df, A + B) ## I know this will use A in df!
 [1]  0.04334784 -0.40444686  1.99368816  0.13871605 -1.17734837
 [6]  0.42473812  2.33014226  1.61690799  1.41901860  0.8699079
```
with() sets up an environment within which the R expression is evaluated. within() does the same thing but allows you to modify the data object used to create the environment.
```
> df <- within(df, C <- rpois(10, lambda = 2))
> head(df)
           A          B C
1 0.62635571 -0.5830079 1
2 0.04810539 -0.4525522 1
3 0.39706979  1.5966184 3
4 0.95802501 -0.8193090 2
5 0.76772541 -1.9450738 2
6 0.21335006  0.2113881 4
```
Something I didn't realise when I first used within() is that you have to do an assignment as part of the expression evaluated and assign the returned object (as above) to get the desired effect.
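
To make that last point concrete, here is a minimal sketch with toy data (the column names are arbitrary). It shows that with() prefers the data frame's columns over globals, and that within() has no effect unless its result is assigned.

```r
df <- data.frame(A = 1:5, B = c(2, 4, 6, 8, 10))
A <- rep(100, 5)  # a global A that should NOT be picked up

# with() looks up A and B in df first, so the global A is ignored.
s <- with(df, A + B)

# within() returns a modified copy; without assignment, df itself is unchanged.
invisible(within(df, C <- A * B))
cols_before <- names(df)

# Assign the result to keep the new columns; braces allow several assignments.
df2 <- within(df, {
  C <- A * B
  D <- C - A
})

print(s)
print(cols_before)
print(df2)
```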