I have tens of millions of documents like the following.
{
  id: "<some unit test id>",
  groupName: "<some group name>",
  result: [
    1, 0, 1, 1, ... 1
  ]
}
The result field is an array of 200 numbers, each 0 or 1.
Given a groupName, say "group17", and a few positions, say 3, 8 and 27, I need to find all documents in that group whose result array elements are all equal to 1, disregarding the values at positions 3, 8 and 27.
I would appreciate it if someone could point out a quick way to search for this.
One way to achieve what you want is to add another field that contains the equivalent integer value of the bitset contained in the result
array and then use a bitwise AND operation.
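The extra field has to be computed on the client side before indexing. Here is a minimal sketch in plain Java of how that could be done, assuming the first array element is treated as the most significant bit. The class and method names (ResultEncoder, encode) are hypothetical and only serve as illustration.

```java
import java.math.BigInteger;

final class ResultEncoder {
    // Hypothetical helper: reads the array as a binary number,
    // with the first element as the most significant bit.
    static BigInteger encode(int[] result) {
        StringBuilder bits = new StringBuilder(result.length);
        for (int bit : result) {
            if (bit != 0 && bit != 1) {
                throw new IllegalArgumentException("result may only contain 0 or 1");
            }
            bits.append(bit);
        }
        // BigInteger parses the bit string in base 2, so 200-bit arrays work too.
        return new BigInteger(bits.toString(), 2);
    }

    public static void main(String[] args) {
        int[] result = {1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0};
        System.out.println(encode(result)); // prints 1470
    }
}
```

BigInteger is used instead of long because a 200-element array needs 200 bits, which is more than a 64-bit long can hold.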
For instance, let's say that the result array is
result: [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
Reading the first element as the most significant bit, the integer value represented by those bits is 1470, so I store the following document:
PUT test/doc/1
{
  "groupName": "group12",
  "result": [
    1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0
  ],
  "resultLong": "1470"
}
Now, the query would look like this:
POST test/_search
{
  "query": {
    "script": {
      "script": {
        "source": """
          // 1. create a BigInteger out of the resultLong value we just computed
          def value = new BigInteger(doc['resultLong'].value.toString());
          // 2. create a bitset filled with 1's except for those positions specified in the ignore parameters array
          def comp = IntStream.range(1, 12).mapToObj(i -> params.ignore.contains(i - 1) ? "0" : "1").collect(Collectors.joining());
          // 3. create a BigInteger out of the bit string we've just created
          def compare = new BigInteger(comp, 2);
          // 4. compare both using a bitwise AND operation
          return value.and(compare).equals(compare);
        """,
        "params": {
          "ignore": [1, 4, 10]
        }
      }
    }
  }
}
Step 2 first creates a string of length 11 filled with 1's, except for 0's at the (0-based) indices listed in the params.ignore array. We end up with the string "10110111110".
Step 3 then creates a BigInteger out of that string (in base 2).
Step 4 combines both numbers with a bitwise AND and checks that the result equals compare, i.e. the document will only be returned if its value has a 1 at every position where compare has a 1. The ignored positions can hold either 0 or 1.
Note: for arrays of length 200, you need to use IntStream.range(1, 201) in step 2.
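To check the mask logic outside Elasticsearch, steps 2 to 4 can be reproduced in plain Java. This is only a sketch: the class and method names (IgnoreMaskCheck, buildMask, matches) are hypothetical, the array length is passed as a parameter instead of being hard-coded, and the ignored positions are held in a Set rather than the params.ignore list.

```java
import java.math.BigInteger;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IgnoreMaskCheck {
    // Steps 2 and 3: a mask of `length` bits, 1 everywhere except
    // 0 at the ignored (0-based) positions, parsed in base 2.
    static BigInteger buildMask(int length, Set<Integer> ignore) {
        String bits = IntStream.range(0, length)
                .mapToObj(i -> ignore.contains(i) ? "0" : "1")
                .collect(Collectors.joining());
        return new BigInteger(bits, 2);
    }

    // Step 4: true when value has a 1 at every position where mask has a 1.
    static boolean matches(BigInteger value, BigInteger mask) {
        return value.and(mask).equals(mask);
    }

    public static void main(String[] args) {
        BigInteger value = new BigInteger("1470"); // bits 10110111110
        BigInteger mask = buildMask(11, Set.of(1, 4, 10));
        System.out.println(mask.toString(2));      // prints 10110111110
        System.out.println(matches(value, mask));  // prints true
        // Position 10 is no longer ignored, and it holds a 0 in the value.
        System.out.println(matches(value, buildMask(11, Set.of(1, 4)))); // prints false
    }
}
```

For the 200-element arrays from the question, the same call would be buildMask(200, Set.of(3, 8, 27)).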