问题
I'm trying to figure out how to a query to generate an ARRAY given a sliding window over a character column with Postgre.
For example, if I have this:
pid
<chr>
1 WP_096038768.1
2 WP_013465871.1
3 WP_058155244.1
4 WP_011329269.1
5 WP_058374608.1
6 WP_089368983.1
7 WP_096739105.1
8 WP_089346667.1
9 WP_096041177.1
10 WP_010553306.1
...
I want a sliding window of size 1 before and after the row. The result is this:
pid g
<chr> <chr>
1 WP_013465871.1 WP_096038768.1,WP_013465871.1,WP_058155244.1
2 WP_058155244.1 WP_013465871.1,WP_058155244.1,WP_011329269.1
3 WP_011329269.1 WP_058155244.1,WP_011329269.1,WP_058374608.1
4 WP_058374608.1 WP_011329269.1,WP_058374608.1,WP_089368983.1
5 WP_089368983.1 WP_058374608.1,WP_089368983.1,WP_096739105.1
6 WP_096739105.1 WP_089368983.1,WP_096739105.1,WP_089346667.1
7 WP_089346667.1 WP_096739105.1,WP_089346667.1,WP_096041177.1
8 WP_096041177.1 WP_089346667.1,WP_096041177.1,WP_010553306.1
9 WP_010553306.1 WP_096041177.1,WP_010553306.1,WP_007376542.1
10 WP_007376542.1 WP_010553306.1,WP_007376542.1,WP_039038284.1
...
Any hint is appreciated.
This example I did with R:
library(tidyverse)
library(dbplyr)
library(RPostgreSQL)
library(DBI)
st2tm %>%
mutate(
p1 = lag(pid),
p2 = lead(pid)
) %>%
group_by(pid) %>%
mutate(g = paste(na.omit(c(p1,pid,p2)), sep = ",")) %>%
ungroup() %>%
select(-c(p1, p2)) %>%
filter(str_count(g,",")==2)
But when applied to a Postgres table through a DBI connection it fails with
Error in vapply(x, escape, character(1), con = con) :
values must be length 1,
but FUN(X[[1]]) result is length 3
at paste
and Error: str_count() is not available in this SQL variant
at the filter
.
Also, I think some smarter strategy.
Thanks in advance
回答1:
This is most likely due to dbplyr not having translations defined for converting na.omit
or str_count
into postgresql (a translation for paste
is most likely defined).
You can replace str_count
and na.omit
by checking earlier for missing values.
st2tm %>%
mutate(
p1 = lag(pid),
p2 = lead(pid)
) %>%
filter(!is.na(p1),
!is.na(p2)) %>%
mutate(g = paste(p1, ",", pid, ",", p2)) %>%
select(-c(p1, p2)) %>%
And if paste
is the issue you could replace it with postgresql's inbuilt CONCAT
function.
st2tm %>%
mutate(
p1 = lag(pid),
p2 = lead(pid)
) %>%
filter(!is.na(p1),
!is.na(p2)) %>%
mutate(g = CONCAT(p1, ",", pid, ",", p2)) %>%
select(-c(p1, p2)) %>%
Because CONCAT
is not an R function, dbplyr will pass it as written to postgresql rather than attempting to translate it.
来源:https://stackoverflow.com/questions/64124013/postgres-generate-array-using-slide-window