Lazy sequences in R

冷暖自知 提交于 2019-12-21 03:32:21

问题


In Clojure, it's easy to create infinite sequences using the lazy sequence constructor. For example,

(def N (iterate inc 0))

returns a data object N which is equivalent to the infinite sequence

(0 1 2 3 ...)

Evaluating the value N results in an infinite loop. Evaluating (take 20 N) returns the top 20 numbers. Since the sequence is lazy, the inc function is only iterated when you ask it to. Since Clojure is homoiconic, the lazy sequence is stored recursively.

In R, is it possible to do something similar? Can you present some sample R code which produces a data object N that is equivalent to the full infinite sequence of natural numbers? Evaluating the full object N should result in a loop, but something like head(N) should return just the leading numbers.

Note: I am really more interested in lazy sequences rather than the natural numbers per se.

Edit: Here is the Clojure source for lazy-seq:

(defmacro lazy-seq
"Takes a body of expressions that returns an ISeq or nil, and yields
a Seqable object that will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls. See also - realized?"
{:added "1.0"}
[& body]
(list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))    

I'm looking for a macro with the same functionality in R.


回答1:


Alternate Implementation

Introduction

I've had occasion to be working in R more frequently since this post, so I offer an alternate base R implementation. Again, I suspect you could get much better performance by dropping down to the C extension level. Original answer follows after the break.

The first challenge in base R is the lack of a true cons (exposed at the R level, that is). R uses c for a hybrid cons/concat operation, but this does not create a linked list, but rather a new vector populated with the elements of both arguments. In particular, the length of both arguments has to be known, which is not the case with lazy sequences. Additionally, successive c operations exhibit quadratic performance rather than constant time. Now, you could use "lists" (which are really vectors, not linked-lists) of length two to emulate cons cells, but...

The second challenge is the forcing of promises in data structures. R has some lazy evaluation semantics using implicit promises, but these are second class citizens. The function for returning an explicit promise, delay, has been deprecated in favor implicit delayedAssign, which is only executed for its side effects -- "Unevaluated promises should never be visible." Function arguments are implicit promises, so you can get your hands on them, but you can't place a promise into a data structure without it being forced.

CS 101

It turns out these two challenges can be solved by thinking back to Computer Science 101. Data structures can be implemented via closures.

cons <- function(h,t) function(x) if(x) h else t

first <- function(kons) kons(TRUE)
rest <- function(kons) kons(FALSE)  

Now due to R's lazy function argument semantics, our cons already capable of lazy sequences.

fibonacci <- function(a,b) cons(a, fibonacci(b, a+b))
fibs <- fibonacci(1,1)

In order to be useful though we need a suite of lazy sequence processing functions. In Clojure, the sequence processing functions that are part of the core language work naturally with lazy sequences as well. R's sequence functions, on the other hand, would not be immediately compatible. Many rely on knowing ahead of time the (finite) sequence length. Let's define a few capable of working on lazy sequences instead.

filterz <- function(pred, s) {
  if(is.null(s)) return(NULL)
  f <- first(s)
  r <- rest(s)
  if(pred(f)) cons(f, filterz(pred, r)) else filterz(pred, r) }

take_whilez <- function(pred, s) {
   if(is.null(s) || !pred(first(s))) return(NULL)
   cons(first(s), take_whilez(pred, rest(s))) }

reduce <- function(f, init, s) {
  r <- init
  while(!is.null(s)) {
    r <- f(r, first(s))
    s <- rest(s) }
  return(r) }

Let's use what we've created to sum all of the even Fibonacci numbers less than 4 million (Euler Project #2):

reduce(`+`, 0, filterz(function(x) x %% 2 == 0, take_whilez(function(x) x < 4e6, fibs)))
# [1] 4613732

Original Answer

I'm very rusty with R, but since (1) since I'm familiar with Clojure, and (2) I don't think you are getting your point across to the R users, I'll attempt a sketch based on my illustration of how Clojure lazy-sequences work. This is for example purposes only, and not tuned for performance in any way. It would probably be better implemented as an C extension (if one does not exist already).

A lazy sequence has the rest of the sequence generating calculation in a thunk. It is not immediately called. As each element (or chunk of elements as the case may be) is requested, a call to the next thunk is made to retrieve the value(s). That thunk may create another thunk to represent the tail of the sequence if it continues. The magic is that (1) these special thunks implement a sequence interface and can transparently be used as such and (2) each thunk is only called once -- its value is cached -- so the realized portion is a sequence of values.

Standard examples first

Natural numbers

numbers <- function(x) as.LazySeq(c(x, numbers(x+1)))
nums <- numbers(1)

take(10,nums) 
#=> [1]  1  2  3  4  5  6  7  8  9 10

#Slow, but does not overflow the stack (level stack consumption)
sum(take(100000,nums))
#=> [1] 5000050000

Fibonacci sequence

fibonacci <- function(a,b) { 
 as.LazySeq(c(a, fibonacci(b, a+b)))}

fibs <- fibonacci(1,1)

take(10, fibs)
#=> [1]  1  1  2  3  5  8 13 21 34 55

nth(fibs, 20)
#=> [1] 6765

Followed by naive R implementation

Lazy sequence class

is.LazySeq <- function(x) inherits(x, "LazySeq")

as.LazySeq <- function(s) {
  cache <- NULL
  value <- function() {
    if (is.null(cache)) {
      cache <<- force(s)
      while (is.LazySeq(cache)) cache <<- cache()}
    cache}
  structure(value, class="LazySeq")}

Some generic sequence methods with implementations for LazySeq

first <- function(x) UseMethod("first", x)
rest <- function(x) UseMethod("rest", x)

first.default <- function(s) s[1]

rest.default <- function(s) s[-1]

first.LazySeq <- function(s) s()[[1]]

rest.LazySeq <- function(s) s()[[-1]]

nth <- function(s, n) {
  while (n > 1) {
    n <- n - 1
    s <- rest(s) }
  first(s) }

#Note: Clojure's take is also lazy, this one is "eager"
take <- function(n, s) {
  r <- NULL
  while (n > 0) {
    n <- n - 1
    r <- c(r, first(s))
    s <- rest(s) }
  r}



回答2:


The iterators library might be able to achieve what you're looking for:

library(iterators)
i <- icount()
nextElem(i)
# 1
nextElem(i)
# 2

You can keep calling nextElem forever.




回答3:


Since R works best on vectors it seems like one often wants an iterator that return a vector, for instance

library(iterators)
ichunk <- function(n, ..., chunkSize) {
    it <- icount(n)              # FIXME: scale by chunkSize
    chunk <- seq_len(chunkSize)
    structure(list(nextElem=function() {
        (nextElem(it) - 1L) * chunkSize + chunk
    }), class=c("abstractiter", "iter"))
}

with typical use being with chunkSize in the millions range. At least this speeds up the approach to forever.




回答4:


You didn't state your intended goal, so I'll point out that, in any language,

j <- 1
while(TRUE) { x[j+1] <- x[j]+1  ; j<-j+1}

will give you an infinite sequence. What do you actually want to do with an iterator?



来源:https://stackoverflow.com/questions/23509381/lazy-sequences-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!