INTERSECTION of (n) arrays in ArangoDB AQL

爷,独闯天下 submitted on 2019-12-05 18:35:05

ArangoDB 3.0 introduced special array comparison operators (ANY, ALL, NONE). ALL IN can be used to test whether every element of the left-hand side array is also contained in the right-hand side array:

[ "red", "green", "blue" ] ALL IN [ "purple", "red", "blue", "green" ]
// true
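To make the semantics of the three operators concrete, here is a small plain-JavaScript sketch that mimics them on in-memory arrays. The functions `anyIn`, `allIn` and `noneIn` are hypothetical helpers for illustration only, not part of ArangoDB; in AQL you simply write the operators.

```javascript
// Hypothetical helpers mimicking AQL's array comparison operators on plain
// JavaScript arrays (illustration only, not an ArangoDB API).
// Membership uses a Set, which is adequate for string tags like these.

// left ANY IN right: at least one element of left occurs in right
function anyIn(left, right) {
  const set = new Set(right);
  return left.some((x) => set.has(x));
}

// left ALL IN right: every element of left occurs in right
function allIn(left, right) {
  const set = new Set(right);
  return left.every((x) => set.has(x));
}

// left NONE IN right: no element of left occurs in right
function noneIn(left, right) {
  const set = new Set(right);
  return !left.some((x) => set.has(x));
}

const wanted = ["red", "green", "blue"];
const available = ["purple", "red", "blue", "green"];
console.log(allIn(wanted, available));              // true
console.log(anyIn(["red", "black"], available));    // true
console.log(noneIn(["black", "white"], available)); // true
```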

Note that these operators cannot use indexes yet. If your data model embeds the tags directly in the documents, a workaround is to use an array index to find all documents that contain one of the tags (e.g. the first element, ["red","green","blue"][0]), which reduces the result set without a full collection scan, and then post-filter with ALL IN to check that the other tags are also in the list:

LET tagsToSearchFor = [ "red", "green", "blue" ]
FOR doc IN coll
  FILTER tagsToSearchFor[0] IN doc.tags[*] // array index
  FILTER tagsToSearchFor ALL IN doc.tags
  RETURN doc
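The two-stage idea can be sketched in plain JavaScript. A hypothetical in-memory map from tag to documents stands in for the array index on `tags[*]`: it narrows the candidates using the first tag, and only those candidates get the full ALL IN check. The sample documents and the names `buildTagIndex` and `findByAllTags` are made up for illustration; the real index is maintained by the ArangoDB server.

```javascript
// Made-up sample documents with embedded tags.
const coll = [
  { _key: "a", tags: ["red", "green", "blue", "purple"] },
  { _key: "b", tags: ["red", "green"] },
  { _key: "c", tags: ["blue", "green", "red"] },
  { _key: "d", tags: ["yellow"] },
];

// Hypothetical stand-in for an array index on tags[*]:
// maps each tag value to the documents that contain it.
function buildTagIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const tag of new Set(doc.tags)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc);
    }
  }
  return index;
}

// Stage 1: index lookup on the first tag (like tagsToSearchFor[0] IN doc.tags[*]).
// Stage 2: post-filter the candidates (like tagsToSearchFor ALL IN doc.tags).
function findByAllTags(index, tagsToSearchFor) {
  const candidates = index.get(tagsToSearchFor[0]) || [];
  return candidates.filter((doc) =>
    tagsToSearchFor.every((tag) => doc.tags.includes(tag))
  );
}

const tagIndex = buildTagIndex(coll);
const found = findByAllTags(tagIndex, ["red", "green", "blue"]);
console.log(found.map((doc) => doc._key)); // [ 'a', 'c' ]
```

Only the three documents tagged "red" are examined in the second stage; document "d" is never touched, which is the point of the index pre-filter.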

ALL IN can also be used with your data model, which keeps the tags in a separate collection, but you will not be able to use an index as above. For instance:

FOR doc IN documents
    LET tags = (
        FOR v IN INBOUND doc contains
            RETURN v._key
    )
    FILTER ["red", "green", "blue"] ALL IN tags
    RETURN MERGE(doc, {tags})
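What this query does per document can be mimicked in plain JavaScript, assuming (as the INBOUND traversal implies) that `contains` edges point from a tag vertex to a document vertex. The sample documents, edges and helper names below are made up for illustration.

```javascript
// Made-up sample data: documents, plus "contains" edges pointing
// from a tag vertex (tags/<name>) to a document vertex.
const documents = [
  { _id: "documents/1", _key: "1", title: "first" },
  { _id: "documents/2", _key: "2", title: "second" },
];
const contains = [
  { _from: "tags/red", _to: "documents/1" },
  { _from: "tags/green", _to: "documents/1" },
  { _from: "tags/blue", _to: "documents/1" },
  { _from: "tags/red", _to: "documents/2" },
];

// Mimics FOR v IN INBOUND doc contains RETURN v._key:
// follow edges whose _to is the document and take the key part of _from.
function inboundTagKeys(doc, edges) {
  return edges
    .filter((e) => e._to === doc._id)
    .map((e) => e._from.split("/")[1]);
}

// Mimics the whole query: every document is visited (no index helps here),
// its tags are collected, and it is kept only if all wanted tags are present.
// The tag list is merged into the result, like MERGE(doc, {tags}).
function documentsWithAllTags(docs, edges, wanted) {
  const out = [];
  for (const doc of docs) {
    const tags = inboundTagKeys(doc, edges);
    if (wanted.every((t) => tags.includes(t))) {
      out.push({ ...doc, tags });
    }
  }
  return out;
}

console.log(documentsWithAllTags(documents, contains, ["red", "green", "blue"]));
```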

Or if you want to start the traversal with the tags and use an intersection-based approach:

LET startTags = ["red", "green", "blue"] // must exist
LET ids = (
    FOR startTag IN DOCUMENT("tags", startTags)
        RETURN (
            FOR v IN OUTBOUND startTag contains
                RETURN v._id
        )
)
LET docs = APPLY("INTERSECTION", ids)

FOR doc IN DOCUMENT(docs)
    RETURN MERGE(doc, {
        tags: (FOR tag IN INBOUND doc contains RETURN tag._key)
    })
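The key step is APPLY("INTERSECTION", ids): `ids` holds one array of document _ids per start tag, and APPLY passes each of those arrays to INTERSECTION as a separate argument. A plain-JavaScript sketch of that step follows; `intersection` is a hypothetical helper and the _id values are made up.

```javascript
// Hypothetical helper mimicking AQL's INTERSECTION(array1, array2, ...):
// keeps the values that occur in every input array, without duplicates.
// AQL does not guarantee the result order; this sketch keeps the order
// of the first array and simply returns [] when given no arrays.
function intersection(...arrays) {
  if (arrays.length === 0) return [];
  const [first, ...rest] = arrays;
  const restSets = rest.map((a) => new Set(a));
  return [...new Set(first)].filter((x) => restSets.every((s) => s.has(x)));
}

// One array of reachable document _ids per start tag (made-up values).
const ids = [
  ["documents/1", "documents/2", "documents/3"], // from tags/red
  ["documents/1", "documents/3"],                // from tags/green
  ["documents/3", "documents/1", "documents/4"], // from tags/blue
];

// Spreading ids plays the role of APPLY: each inner array becomes one argument.
const docs = intersection(...ids);
console.log(docs); // [ 'documents/1', 'documents/3' ]
```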

I would consider storing your tags as attributes on your items. ArangoDB 2.8 introduced array indexes, which are aimed at exactly this scenario. From their blog post:

{ 
  text: "Here's what I want to retrieve...",
  tags: [ "graphdb", "ArangoDB", "multi-model" ]   
}

FOR doc IN documents
  FILTER "graphdb" IN doc.tags[*]
  RETURN doc

This should be both more performant and eliminate the need for the AQL above, simplifying your app.

You can ensure that you don't get the same document twice in the result of an AQL query by using the DISTINCT keyword.

Let's demonstrate this with a graph query on the knows_graph example graph:

var examples = require("org/arangodb/graph-examples/example-graph.js");

var g = examples.loadGraph("knows_graph");
db._query("FOR oneGuy IN persons " +
  "FOR v IN 1..1 OUTBOUND oneGuy GRAPH 'knows_graph' RETURN v.name").toArray()
[ 
  "Charlie", 
  "Dave", 
  "Alice", 
  "Bob", 
  "Bob" 
]

This reproduces your situation: Bob is returned twice. Now let's add the DISTINCT keyword:

db._query("FOR oneGuy IN persons " +
  "FOR v IN 1..1 OUTBOUND oneGuy GRAPH 'knows_graph' RETURN DISTINCT v.name"
  ).toArray()
[ 
  "Bob", 
  "Alice", 
  "Dave", 
  "Charlie" 
]
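Why Bob appears twice, and what RETURN DISTINCT changes, can be mimicked in plain JavaScript. The edge list below is an assumption meant to mirror the knows_graph example (it is consistent with the five results shown above); check it against the example graph in your installation.

```javascript
// Assumed vertices and "knows" edges of the knows_graph example.
const persons = ["alice", "bob", "charlie", "dave", "eve"];
const names = { alice: "Alice", bob: "Bob", charlie: "Charlie", dave: "Dave", eve: "Eve" };
const knows = [
  ["alice", "bob"],
  ["bob", "charlie"],
  ["bob", "dave"],
  ["eve", "alice"],
  ["eve", "bob"],
];

// FOR oneGuy IN persons FOR v IN 1..1 OUTBOUND oneGuy ... RETURN v.name
// Bob is reached from both Alice and Eve, so he is emitted twice.
function outboundNames() {
  const result = [];
  for (const oneGuy of persons) {
    for (const [from, to] of knows) {
      if (from === oneGuy) result.push(names[to]);
    }
  }
  return result;
}

// RETURN DISTINCT: drop repeated values. A Set keeps first occurrences;
// AQL itself does not guarantee the order of DISTINCT results.
function outboundNamesDistinct() {
  return [...new Set(outboundNames())];
}

console.log(outboundNames());         // [ 'Bob', 'Charlie', 'Dave', 'Alice', 'Bob' ]
console.log(outboundNamesDistinct()); // [ 'Bob', 'Charlie', 'Dave', 'Alice' ]
```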