Cartesian product data frame

臣服心动 2020-11-29 18:47

I have three or more independent variables represented as R vectors, like so:

A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(0.1,0.5)
7 Answers
  • 2020-11-29 19:26

    Here's a way to do both, using Ramnath's suggestion of expand.grid:

    f <- function(x,y,z) paste(x,y,z,sep="+")
    d <- expand.grid(x=A, y=B, z=C)
    d$D <- do.call(f, d)
    

    Note that do.call works on d "as-is" because a data.frame is a list. But do.call expects the column names of d to match the argument names of f.
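    As a rough sketch of that last point (the function g and its argument names a, b, c are hypothetical, made up for illustration): if the argument names do not match the column names, one option is to drop the names and let do.call match the columns by position.

    ```r
    A <- c(1, 2, 3)
    B <- factor(c("x", "y"))
    C <- c(0.1, 0.5)

    d <- expand.grid(x = A, y = B, z = C)

    # Hypothetical function whose argument names differ from the columns of d
    g <- function(a, b, c) paste(a, b, c, sep = "+")

    # do.call(g, d) would fail here with an "unused arguments" error,
    # so strip the names and pass the columns positionally instead
    d$D <- do.call(g, unname(as.list(d)))
    head(d, 3)
    ```

    This relies on the columns being in the same order as the arguments, so renaming the columns to match is probably the safer choice when that is possible.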

  • 2020-11-29 19:29

    With the tidyr package you can use tidyr::crossing (the row order matches the one in the question):

    library(tidyr)
    crossing(A,B,C)
    # A tibble: 12 x 3
    #        A B         C
    #    <dbl> <fct> <dbl>
    #  1     1 x       0.1
    #  2     1 x       0.5
    #  3     1 y       0.1
    #  4     1 y       0.5
    #  5     2 x       0.1
    #  6     2 x       0.5
    #  7     2 y       0.1
    #  8     2 y       0.5
    #  9     3 x       0.1
    # 10     3 x       0.5
    # 11     3 y       0.1
    # 12     3 y       0.5 
    

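    For the next step, the tidyverse offers the purrr::pmap* family of functions. Note that in the output shown here the factor B ends up in D as its integer code (1, 2) rather than its label: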

    library(tidyverse)
    crossing(A,B,C) %>% mutate(D = pmap_chr(.,paste,sep="_"))
    # A tibble: 12 x 4
    #        A B         C D      
    #    <dbl> <fct> <dbl> <chr>  
    #  1     1 x       0.1 1_1_0.1
    #  2     1 x       0.5 1_1_0.5
    #  3     1 y       0.1 1_2_0.1
    #  4     1 y       0.5 1_2_0.5
    #  5     2 x       0.1 2_1_0.1
    #  6     2 x       0.5 2_1_0.5
    #  7     2 y       0.1 2_2_0.1
    #  8     2 y       0.5 2_2_0.5
    #  9     3 x       0.1 3_1_0.1
    # 10     3 x       0.5 3_1_0.5
    # 11     3 y       0.1 3_2_0.1
    # 12     3 y       0.5 3_2_0.5
    
  • 2020-11-29 19:33

    The base R function merge, which operates on data frames, is helpful in this case.

    It can produce various kinds of join (in SQL terminology), and the Cartesian product is a special case.

    You have to convert the variables to data frames first, because merge takes data frames as its arguments.

    So something like this will do:

    A.B <- merge(data.frame(A = A), data.frame(B = B), by = NULL)
    A.B.C <- merge(A.B, data.frame(C = C), by = NULL)
    

    The only caveat is that the rows are not sorted in the order you showed. You can sort them afterwards as needed.

    For reference, the usage of merge as given in its documentation:

    merge(x, y, by = intersect(names(x), names(y)),
          by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
          sort = TRUE, suffixes = c(".x",".y"),
          incomparables = NULL, ...)
    

    "If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y"

    See the documentation for details: http://stat.ethz.ch/R-manual/R-patched/library/base/html/merge.html
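    For the sorting mentioned above, a minimal sketch using base R's order (the name sorted is just an illustrative choice, and sorting by A, then B, then C is an assumption about the order wanted):

    ```r
    A <- c(1, 2, 3)
    B <- factor(c("x", "y"))
    C <- c(0.1, 0.5)

    # Cartesian product via two merges with by = NULL
    A.B   <- merge(data.frame(A = A), data.frame(B = B), by = NULL)
    A.B.C <- merge(A.B, data.frame(C = C), by = NULL)

    # Sort by A, then B, then C, so that C varies fastest
    sorted <- A.B.C[order(A.B.C$A, A.B.C$B, A.B.C$C), ]
    rownames(sorted) <- NULL  # renumber the rows after reordering
    head(sorted, 4)
    ```

    The factor B is ordered by its levels here, so x sorts before y.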

  • 2020-11-29 19:40

    Using cross join in sqldf:

    library(sqldf)
    
    A <- data.frame(c1 = c(1,2,3))
    B <- data.frame(c2 = factor(c('x','y')))
    C <- data.frame(c3 = c(0.1,0.5))
    
    result <- sqldf('SELECT * FROM (A CROSS JOIN B) CROSS JOIN C') 
    
  • 2020-11-29 19:44

    You can use expand.grid(A, B, C).

    EDIT: An alternative to do.call for the second part is the function mdply from the plyr package. Here is the code:

    library(plyr)
    # f is the function to apply; its arguments must be named x, y and z
    d <- expand.grid(x = A, y = B, z = C)
    d <- mdply(d, f)
    

    To illustrate its usage with a trivial function such as paste, you can try:

    d = mdply(d, 'paste', sep = '+');
    
  • 2020-11-29 19:45

    Consider using the wonderful data.table library for expressiveness and speed. It handles many plyr use-cases (relational group by), along with transform, subset and relational join using a fairly simple uniform syntax.

    library(data.table)
    d <- CJ(x=A, y=B, z=C)  # Cross join
    d[, w:=f(x,y,z)]  # Mutates the data.table
    

    Or, in one line:

    d <- CJ(x=A, y=B, z=C)[, w:=f(x,y,z)]
    