How to LEFT JOIN on ANY of the matching clauses in R?

Submitted by 我是研究僧i on 2021-01-28 11:30:45

Question


Could you please help me out with this:

I have a dataframe (df1) that holds an index of all articles published in the website's CMS. It has a column for the current URL and a column for the original URL in case it was changed after publication (column name Origin):

URL                           Origin                        ArticleID  Author      Category  Cost
https://example.com/article1  https://example.com/article   001        AuthorName  Politics  120 USD
https://example.com/article2  https://example.com/article2  002        AuthorName  Finance   68 USD

Next I have a huge dataframe (df2) with a web analytics export for a timeframe. It has a date, just one column for the URL, and the number of pageviews.

PageviewDate  URL                           Pageviews
2019-01-01    https://example.com/article   224544
2019-01-01    https://example.com/article1  656565

How do I left join this with the first dataframe, matching on either URL = URL OR Origin = URL?

So that end result would look like this:

PageviewDate  Pageviews  ArticleID  Author      Category
2019-01-01    881109     001        AuthorName  Politics

i.e. 881109 is the result of adding up 224544 and 656565, which both relate to the same article.

I guess what I'm looking for is the equivalent of SQL syntax like:

LEFT JOIN ...
ON URL = URL
OR Origin = URL

Answer 1:


You could reshape dataframe 1 (df1) into long format so that both Origin and URL end up in the same column, and then join it with the second dataframe (df2) on that single column.

library(dplyr)
library(tidyr)

df1 %>%
  # stack URL and Origin into a single URL column (one row per known address)
  pivot_longer(cols = c(URL, Origin), values_to = 'URL') %>%
  # keep only the addresses that appear in the analytics export
  inner_join(df2, by = 'URL') %>%
  # drop the helper column that records which source column each URL came from
  select(-name)

#  ArticleID Author     Category URL                          PageviewDate Pageviews
#      <int> <chr>      <chr>    <chr>                        <chr>            <int>
#1         1 AuthorName Politics https://example.com/article1 2019-01-01      656565
#2         1 AuthorName Politics https://example.com/article  2019-01-01      224544
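
The join above still returns one row per matching URL, whereas the question asks for a single total per article and date. A minimal sketch of that last step follows, assuming the same sample data as this answer. The names lookup and totals are only illustrative, and distinct() is an added precaution so that an article whose URL and Origin are identical (like article2) is not counted twice.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  URL = c("https://example.com/article1", "https://example.com/article2"),
  Origin = c("https://example.com/article", "https://example.com/article2"),
  ArticleID = 1:2,
  Author = c("AuthorName", "AuthorName"),
  Category = c("Politics", "Finance")
)
df2 <- data.frame(
  PageviewDate = c("2019-01-01", "2019-01-01"),
  URL = c("https://example.com/article", "https://example.com/article1"),
  Pageviews = c(224544L, 656565L)
)

# One row per (article, known address); distinct() removes the duplicate
# that appears when URL and Origin hold the same value.
lookup <- df1 %>%
  pivot_longer(cols = c(URL, Origin), values_to = "URL") %>%
  select(-name) %>%
  distinct()

# Left join from the analytics side so every pageview row is kept,
# then add up the pageviews per date and article.
totals <- df2 %>%
  left_join(lookup, by = "URL") %>%
  group_by(PageviewDate, ArticleID, Author, Category) %>%
  summarise(Pageviews = sum(Pageviews), .groups = "drop")

totals
```

With the sample data this should give a single row for ArticleID 1 with 881109 pageviews. Rows of df2 whose URL is not known to df1 would end up in a group with NA article fields, so you may want to filter those out or inspect them separately.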

data

df1 <- structure(list(URL = c("https://example.com/article1", "https://example.com/article2"
), Origin = c("https://example.com/article", "https://example.com/article2"
), ArticleID = 1:2, Author = c("AuthorName", "AuthorName"), 
Category = c("Politics", "Finance")), class = "data.frame",row.names =c(NA, -2L))


df2 <- structure(list(PageviewDate = c("2019-01-01", "2019-01-01"), 
    URL = c("https://example.com/article", "https://example.com/article1"), 
Pageviews = c(224544L, 656565L)), class = "data.frame", row.names = c(NA, -2L))
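
If you would rather not reshape df1, the OR condition can also be approximated in base R with two match() lookups: first against the current URL, then against Origin for the rows that are still unmatched. This is only a sketch under the assumption that each address belongs to at most one article; idx, joined and totals are illustrative names, not part of the original answer.

```r
df1 <- data.frame(
  URL = c("https://example.com/article1", "https://example.com/article2"),
  Origin = c("https://example.com/article", "https://example.com/article2"),
  ArticleID = 1:2,
  Author = c("AuthorName", "AuthorName"),
  Category = c("Politics", "Finance")
)
df2 <- data.frame(
  PageviewDate = c("2019-01-01", "2019-01-01"),
  URL = c("https://example.com/article", "https://example.com/article1"),
  Pageviews = c(224544L, 656565L)
)

# Row of df1 whose current URL matches each analytics row (NA if none)
idx <- match(df2$URL, df1$URL)

# For rows still unmatched, fall back to the original URL
unmatched <- is.na(idx)
idx[unmatched] <- match(df2$URL[unmatched], df1$Origin)

# Attach the article fields; rows with no match at all get NA values
joined <- cbind(df2, df1[idx, c("ArticleID", "Author", "Category")])
rownames(joined) <- NULL

# Sum pageviews per date and article (the formula interface drops NA rows)
totals <- aggregate(Pageviews ~ PageviewDate + ArticleID + Author + Category,
                    data = joined, FUN = sum)
totals
```

Note that a match on the current URL takes priority over a match on Origin here, and that aggregate() silently drops analytics rows that matched no article.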


Source: https://stackoverflow.com/questions/65638384/how-to-left-join-on-any-of-the-matching-clauses-in-r
