cartesian product with dplyr R

清酒与你 2020-12-03 07:06

I'm trying to find the dplyr function for a Cartesian product. I have two simple data.frames with no common variable:

x <- data.frame(x=c("a","b","c"))


        
6 Answers
  • 2020-12-03 07:18

    This is a continuation of dsz's comment. Idea came from: http://jarrettmeyer.com/2018/07/10/cross-join-dplyr.

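    library(dplyr)

    # Add a constant key to both tables so that every row matches every row
    tbl_1$fake <- 1
    tbl_2$fake <- 1
    my_cross_join <- full_join(tbl_1, tbl_2, by = "fake") %>%
                     select(-fake)  # drop the helper column afterwards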
    

    I tested this on four columns of data ranging in size from 4 to 640 observations, and it took about 1.08 seconds.

  • 2020-12-03 07:20

    Apologies to all: the example below does not appear to work with data.frames or data.tables.

    When x and y are database tbls (tbl_dbi / tbl_sql) you can now also do:

    full_join(x, y, by = character())

    This was added to dplyr at the end of 2017 and is translated to a CROSS JOIN on the database side. It saves the nastiness of having to introduce fake variables.
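
    For in-memory data frames, newer dplyr releases offer a dedicated verb. A minimal sketch, assuming dplyr 1.1.0 or later is installed (the release that introduced cross_join()); x and y are small made-up example frames:

```r
library(dplyr)

# Illustrative inputs with no common variable
x <- data.frame(x = c("a", "b", "c"))
y <- data.frame(y = c(1, 2, 3))

# cross_join() pairs every row of x with every row of y
# (assumes dplyr >= 1.1.0; it is not available in older versions)
xy <- cross_join(x, y)

nrow(xy)   # 3 * 3 = 9 rows
names(xy)  # columns of x followed by columns of y
```

    On older dplyr versions this function is not available, and the fake-key join remains the fallback.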

  • 2020-12-03 07:29

    Use crossing from the tidyr package:

    library(tidyr)

    x <- data.frame(x=c("a","b","c"))
    y <- data.frame(y=c(1,2,3))

    crossing(x, y)
    

    Result:

       x y
     1 a 1
     2 a 2
     3 a 3
     4 b 1
     5 b 2
     6 b 3
     7 c 1
     8 c 2
     9 c 3
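
    One caveat worth checking before relying on crossing(): it de-duplicates and sorts its inputs, so it returns unique combinations rather than a strict row-by-row Cartesian product. A small sketch with made-up vectors (x_vals and y_vals are illustrative names), compared against base expand.grid():

```r
library(tidyr)

# Illustrative inputs; note the repeated "a"
x_vals <- c("b", "a", "a")
y_vals <- c(1, 2)

# crossing() keeps only unique values of each input
deduped <- crossing(x = x_vals, y = y_vals)

# expand.grid() keeps every element, duplicates included
full <- expand.grid(x = x_vals, y = y_vals, stringsAsFactors = FALSE)

nrow(deduped)  # 2 unique x values * 2 y values = 4
nrow(full)     # 3 x values * 2 y values = 6
```

    If the inputs contain no duplicates, as in the example above, the two approaches yield the same set of combinations.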
    
  • 2020-12-03 07:33

    When faced with this problem, I tend to do something like this:

    library(dplyr)

    x <- data.frame(x=c("a","b","c"))
    y <- data.frame(y=c(1,2,3))
    # Join on a constant helper column, then drop it
    x %>% mutate(temp=1) %>%
      inner_join(y %>% mutate(temp=1), by="temp") %>%
      dplyr::select(-temp)
    

    If x and y are multi-column data frames and I want every combination of a row of x with a row of y, then this is neater than any expand.grid() option that I can come up with.
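
    The multi-column case might look like the following sketch. The data frames people and items, and their columns, are hypothetical and only serve to illustrate the pattern:

```r
library(dplyr)

# Hypothetical multi-column inputs
people <- data.frame(id = 1:2, name = c("Ann", "Bob"))
items  <- data.frame(sku = c("p1", "p2", "p3"), price = c(5, 7, 9))

# Same helper-column trick: every row of people meets every row of items.
# Recent dplyr versions may warn about a many-to-many match here;
# for a deliberate cross join that is expected.
combos <- people %>%
  mutate(temp = 1) %>%
  inner_join(items %>% mutate(temp = 1), by = "temp") %>%
  select(-temp)

dim(combos)  # 2 * 3 = 6 rows, 2 + 2 = 4 columns
```

    All columns from both inputs are carried through, which is what makes this convenient compared with listing each column in expand.grid().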

  • 2020-12-03 07:36

    If we need tidyverse output, we can use expand from tidyr:

    library(tidyverse)
    y %>% 
       expand(y, x= x$x) %>%
       select(x,y)
    # A tibble: 9 × 2
    #       x     y
    #  <fctr> <dbl>
    #1      a     1
    #2      b     1
    #3      c     1
    #4      a     2
    #5      b     2
    #6      c     2
    #7      a     3
    #8      b     3
    #9      c     3
    
  • 2020-12-03 07:39

    expand.grid(x=c("a","b","c"),y=c(1,2,3))
    

    Edit: Consider also the following elegant solution from "Y T" for n more complex data.frames:

    https://stackoverflow.com/a/21911221/5350791

    In short:

    expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))
    expand.grid.df(df1, df2, df3)
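
    To see what the helper does, here is a self-contained sketch in base R. The contents of df1, df2 and df3 are made up for illustration; merge() with by = NULL returns the Cartesian product of its two inputs, and Reduce() folds that over the list:

```r
# Helper as quoted above: fold merge(by = NULL) over all inputs
expand.grid.df <- function(...) {
  Reduce(function(...) merge(..., by = NULL), list(...))
}

# Hypothetical example inputs
df1 <- data.frame(a = 1:2, b = c("u", "v"))
df2 <- data.frame(flag = c(TRUE, FALSE))
df3 <- data.frame(d = c(10, 20, 30))

out <- expand.grid.df(df1, df2, df3)

dim(out)    # 2 * 2 * 3 = 12 rows, 2 + 1 + 1 = 4 columns
names(out)  # "a" "b" "flag" "d"
```

    No extra packages are needed for this variant, since merge() and Reduce() are both part of base R.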
    