Find entities whose ref-to-many attribute contains all elements of input

走远了吗. 提交于 2019-12-01 05:32:06

问题


Suppose I have entity entry with ref-to-many attribute :entry/groups. How should I build a query to find entities whose :entry/groups attribute contains all of my input foreign ids?

Next pseudocode will illustrate my question better:

[2 3] ; having this as input foreign ids

;; and having these entry entities in db
[{:entry/id "A" :entry/groups  [2 3 4]}  
 {:entry/id "B" :entry/groups  [2]}     
 {:entry/id "C" :entry/groups  [2 3]}  
 {:entry/id "D" :entry/groups  [1 2 3]}
 {:entry/id "E" :entry/groups  [2 4]}] 

;; only A, C, D should be pulled

Being new in Datomic/Datalog, I exhausted all options, so any help is appreciated. Thanks!


回答1:


TL;DR

You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.

3 strategies here:

  1. Write a dynamic Datalog query which uses 2 negations and 1 disjunction or a recursive rule (see below)
  2. Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e you don't benefit from query plan caching.
  3. Use the indexes directly (EAVT or AVET).

Dynamic Datalog query

Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e (For all ?g in ?Gs p(?e,?g)) <=> NOT(Exists ?g in ?Gs, such that NOT(p(?e, ?g)))

In your case, this could be expressed as:

[:find [?entry ...] :in $ ?groups :where
  ;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
  ;; NOTE: this imposes ?groups cannot be empty!
  [(first ?groups) ?group0]
  [?entry :entry/groups ?group0]
  ;; here comes the double negation
  (not-join [?entry ?groups]
    [(identity ?groups) [?group ...]]
    (not-join [?entry ?group]
      [?entry :entry/groups ?group]))]

Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):

[(matches-all ?e ?a ?vs)
 [(first ?vs) ?v0]
 [?e ?a ?v0]
 (not-join [?e ?a ?vs]
   [(seq ?vs) [?v ...]]
   (not-join [?e ?a ?v]
     [?e ?a ?v]))]

... which means your query can now be expressed as:

[:find [?entry ...] :in % $ ?groups :where
 (matches-all ?entry :entry/groups ?groups)]

NOTE: there's an alternate implementation using a recursive rule:

[[(matches-all ?e ?a ?vs)
  [(seq ?vs)]
  [(first ?vs) ?v]
  [?e ?a ?v]
  [(rest ?vs) ?vs2]
  (matches-all ?e ?a ?vs2)]
 [(matches-all ?e ?a ?vs)
  [(empty? ?vs)]]]

This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).

Generating the query code

The advantage of generating the query code is that it's relatively simple in this case, and it can probably make the query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e depending only on the number of v values)

(defn q-find-having-all-vs 
  [n-vs]
  (let [v-syms (for [i (range n-vs)]
                 (symbol (str "?v" i)))]
    {:find '[[?e ...]]
     :in (into '[$ ?a] v-syms)
     :where 
     (for [?v v-syms]
       ['?e '?a ?v])}))

;; examples    
(q-find-having-all-vs 1)
=> {:find [[?e ...]], 
    :in [$ ?a ?v0],
    :where 
    ([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1], 
    :where
    ([?e ?a ?v0] 
     [?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]], 
    :in [$ ?a ?v0 ?v1 ?v2], 
    :where 
    ([?e ?a ?v0] 
     [?e ?a ?v1]
     [?e ?a ?v2])}


;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
  db :entry/group groups)

Use the indexes directly

I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.

Here's an example in Clojure using the AVET index:

(defn find-having-all-vs
  "Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
  returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
  [db a vs]
  ;; DISCLAIMER: a LOT can be done to improve the efficiency of this code! 
  (apply clojure.set/intersection 
    (for [v vs]
      (into #{} 
        (map :e)
        (d/datoms db :avet a v)))))



回答2:


You can see an example of this in the James Bond example from the Tupelo-Datomic library. You just specify 2 clauses, one for each desired value in the set:

; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set   (td/find :let    [$ (live-db)]
                           :find   [?name]
                           :where  {:person/name ?name :weapon/type :weapon/guile }
                                   {:person/name ?name :weapon/type :weapon/gun } ) ]
  (is (= #{["Dr No"] ["M"]} tuple-set )))

In pure Datomic it will look similar, but using something like the Entity ID:

[?eid :entry/groups 2]
[?eid :entry/groups 3]

and Datomic will perform an implicit AND operation (i.e. both clauses must match; any surplus entries are ignored). This is logically a "join" operation, even though it is the same entity being queried for both values. You can find more info in the Datomic docs.



来源:https://stackoverflow.com/questions/43784258/find-entities-whose-ref-to-many-attribute-contains-all-elements-of-input

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!