INTERSECTION of (n) arrays in ArangoDB AQL

Question

The scenario is this: I have an ArangoDB collection containing items, and another collection containing tags. I am using a graph, and I have an edge collection called "Contains" connecting the items and tags. An item has multiple tags.

Now I am trying to do a search for items containing multiple tags. E.g. items containing the tags "photography", "portrait" and "faces".

My general approach is to start a graph traversal from each of the tag vertices and find the items that relate to that tag. That part works fine. I get a list of items.

But the last part of my task is to make an intersection of all the lists in order to find the items that contain ALL the tags specified. And I cannot work out how to do this.

What I wanted to do was something like this:

let tagnames = SPLIT(@tagnames,',')
let tagcollections = (
    FOR tagname IN tagnames
    LET atag = (FOR t IN tags FILTER LOWER(t.text)==LOWER(tagname) RETURN t)
    let collections = (FOR v IN 1..1 INBOUND atag[0] Contains RETURN v)

    RETURN { tag: atag, collections: collections }
)

RETURN INTERSECTION(tagcollections)

However, it doesn't work: The INTERSECTION function does not work on a single list, but on multiple items, like this: INTERSECTION(listA, listB, listC...).

How can I make an intersection of the lists found in the FOR .. RETURN block?

Answer 1:

ArangoDB 3.0 introduced special array comparison operators (ANY, ALL, NONE). ALL IN can be used to test whether every element of the left-hand side array is also in the right-hand side array:

[ "red", "green", "blue" ] ALL IN [ "purple", "red", "blue", "green" ]
// true
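
To spell out the semantics, here is a minimal JavaScript sketch (the language of arangosh) of what ALL IN checks on plain arrays. The helper name `allIn` is hypothetical and not part of ArangoDB; it only mirrors the comparison shown above.

```javascript
// Hypothetical helper (not an ArangoDB API):
// true if every element of `needles` also occurs in `haystack`.
function allIn(needles, haystack) {
  const seen = new Set(haystack);
  return needles.every((x) => seen.has(x));
}

console.log(allIn(["red", "green", "blue"], ["purple", "red", "blue", "green"])); // true
console.log(allIn(["red", "black"], ["purple", "red", "blue", "green"])); // false
```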

Note that these operators cannot use indexes yet. Given a data model that embeds the tags directly into the documents, a workaround is to use an array index to find all documents that contain one of the tags (e.g. the first element, ["red","green","blue"][0]), which reduces the result set without a full collection scan, and then to post-filter with ALL IN to check that the other tags are also in the list:

LET tagsToSearchFor = [ "red", "green", "blue" ]
FOR doc IN coll
  FILTER tagsToSearchFor[0] IN doc.tags[*] // can use an array index on tags[*]
  FILTER tagsToSearchFor ALL IN doc.tags   // post-filter, no index
  RETURN doc

ALL IN can also be used for your data model with a separate collection for tags, but you will not be able to make use of an index like above. For instance:

FOR doc IN documents
    LET tags = (
        FOR v IN INBOUND doc contains
            RETURN v._key
    )
    FILTER ["red", "green", "blue"] ALL IN tags
    RETURN MERGE(doc, {tags})

Or if you want to start the traversal with the tags and use an intersection-based approach:

LET startTags = ["red", "green", "blue"] // must exist
LET ids = (
    FOR startTag IN DOCUMENT("tags", startTags)
        RETURN (
            FOR v IN OUTBOUND startTag contains
                RETURN v._id
        )
)
LET docs = APPLY("INTERSECTION", ids)

FOR doc IN DOCUMENT(docs)
    RETURN MERGE(doc, {
        tags: (FOR tag IN INBOUND doc contains RETURN tag._key)
    })
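
The key step in the query above is APPLY, which calls INTERSECTION with each inner array as a separate argument. As a rough illustration of that idea only, here is a JavaScript sketch; `intersection` is a hypothetical stand-in for the AQL function, and the item ids are made-up sample data.

```javascript
// Hypothetical stand-in for AQL's variadic INTERSECTION(listA, listB, ...).
function intersection(...lists) {
  if (lists.length === 0) return [];
  const [first, ...rest] = lists;
  const restSets = rest.map((list) => new Set(list));
  // Keep each distinct value of the first list that appears in every other list.
  return [...new Set(first)].filter((x) => restSets.every((s) => s.has(x)));
}

// Made-up sample: one array of item ids per tag, like `ids` in the query above.
const ids = [
  ["items/1", "items/2", "items/3"],
  ["items/2", "items/3"],
  ["items/3", "items/2", "items/4"],
];

// Like APPLY("INTERSECTION", ids): spread the outer array into separate arguments.
const docs = intersection.apply(null, ids);
console.log(docs); // [ 'items/2', 'items/3' ]
```

This works for any number of tags, because the number of arguments is determined by the length of the outer array at run time.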

Answer 2:

I would consider storing your tags as attributes on your items. ArangoDB 2.8 includes array indexes, which are aimed at exactly your scenario. From their blog post:

{ 
  text: "Here's what I want to retrieve...",
  tags: [ "graphdb", "ArangoDB", "multi-model" ]   
}

FOR doc IN documents 
  FILTER "graphdb" IN doc.tags[*] 
    RETURN doc

This should both perform better and eliminate the need for the AQL above, simplifying your app.

Answer 3:

You can ensure that you don't get the same document twice in the result of an AQL query by using the DISTINCT keyword.

Let's demonstrate this in a graph query using the knows graph example:

var examples = require("org/arangodb/graph-examples/example-graph.js");

var g = examples.loadGraph("knows_graph");
db._query("FOR oneGuy IN persons " +
  "FOR v IN 1..1 OUTBOUND oneGuy GRAPH 'knows_graph' RETURN v.name").toArray()
[ 
  "Charlie", 
  "Dave", 
  "Alice", 
  "Bob", 
  "Bob" 
]

This reproduces your situation: Bob is returned twice. Now let's add the DISTINCT keyword:

db._query("FOR oneGuy IN persons " +
  "FOR v IN 1..1 OUTBOUND oneGuy GRAPH 'knows_graph' RETURN DISTINCT v.name"
  ).toArray()
[ 
  "Bob", 
  "Alice", 
  "Dave", 
  "Charlie" 
]

Source: https://stackoverflow.com/questions/35661231/intersection-of-n-arrays-in-arangodb-aql

Tags

arangodb

aql