How to implement this array algorithm in a more efficient way?

Question

Assume I have n = 3 lists of the same length, for example:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

I need to find the first index t that satisfies the following n = 3 conditions over a period p = 2. Edit: the period p is the number of consecutive "boxes" (positions) that must pass.

R1[t] >= 5, R1[t+1] >= 5. Here t + p - 1 = t + 1, so we only need to check the two boxes t and t+1. If p were equal to 3, we would need to check t, t+1 and t+2. Note that the threshold is the same for every box of a given list: for R1 we always test whether the value is at least 5, whatever the index.
R2[t] >= 2, R2[t+1] >= 2
R3[t] >= 9, R3[t+1] >= 9

In total there are 3 * p conditions.

Here the t I am looking for is 5 (indexing starts from 0).

The basic way to do this is to loop over all the indexes with a for loop. If the conditions hold at some index t, we store it in a local variable temp and check that the conditions still hold for every index between t+1 and t+p-1. If, while checking, we find an index that does not satisfy a condition, we discard temp and keep going.
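For reference, that loop could be sketched in plain Python as a single pass that counts consecutive passing positions instead of storing temp. The name first_window_index and its parameters are assumptions made for illustration; they do not come from the question.

```python
def first_window_index(rows, thresholds, p):
    """Return the first t such that every row meets its threshold at
    t, t+1, ..., t+p-1, or None if no such t exists (assumes p >= 1)."""
    run = 0  # number of consecutive positions that currently pass
    for i in range(len(rows[0])):
        if all(row[i] >= th for row, th in zip(rows, thresholds)):
            run += 1
            if run == p:
                return i - p + 1  # start of the passing window
        else:
            run = 0  # a failing position breaks the streak
    return None


R1 = [7, 5, 8, 6, 0, 6, 7]
R2 = [8, 0, 2, 2, 0, 2, 2]
R3 = [1, 7, 5, 9, 0, 9, 9]
print(first_window_index([R1, R2, R3], [5, 2, 9], 2))  # prints 5
```

Each position is examined once, so the cost grows with n times the list length and does not depend on p.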

What is the most efficient way to do this in Python if I have large lists (like of 10000 elements)? Is there a more efficient way than the for loop?

Answer 1:

Since all your conditions use the same comparison (>=), we can leverage this.

This solution will work for any number of conditions and any size of analysis window, and no for loop is used.

You have an array:

>>> import numpy as np
>>> R = np.array([R1, R2, R3]).T
>>> R
array([[7, 8, 1],
       [5, 0, 7],
       [8, 2, 5],
       [6, 2, 9],
       [0, 0, 0],
       [6, 2, 9],
       [7, 2, 9]])

and you have thresholds:

>>> thresholds = [5, 2, 9]

So you can check where the conditions are met:

>>> R >= thresholds
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])

And where they are all met at the same time:

>>> R_cond = np.all(R >= thresholds, axis=1)
>>> R_cond
array([False, False, False,  True, False,  True,  True])

From there, you want the conditions to be met for a given window.

We'll use the fact that booleans can be summed (True counts as 1), and a convolution with a window of ones to count the passing positions in each window:

>>> win_size = 2
>>> R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
>>> R_conv
array([0., 0., 1., 1., 1., 2.])

The resulting array will have values equal to win_size at the indices where all conditions are met on the window range.

So let's retrieve the first of those indices:

>>> index = np.where(R_conv == win_size)[0][0]
>>> index
5

If no such index exists, this raises an IndexError; handling that case is left to you.
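One possible way to handle the missing-index case is to collect the matching positions first and return None when there are none. This is only a sketch: the helper name first_index_or_none is an assumption, and it casts to integers (a choice made here, not in the steps above) so that the comparison with win_size is exact.

```python
import numpy as np


def first_index_or_none(R_cond, win_size):
    """First start index of win_size consecutive True values, or None."""
    # Integer inputs keep the window counts as exact integers.
    counts = np.convolve(
        np.asarray(R_cond, dtype=int),
        np.ones(win_size, dtype=int),
        mode="valid",
    )
    hits = np.flatnonzero(counts == win_size)  # all window starts that pass
    return int(hits[0]) if hits.size else None


R_cond = np.array([False, False, False, True, False, True, True])
print(first_index_or_none(R_cond, 2))  # prints 5
print(first_index_or_none(R_cond, 3))  # prints None
```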

So, as a one-liner function, it gives:

def idx_conditions(arr, thresholds, win_size, condition):
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid"
        )
        == win_size
    )[0][0]

I added the condition as an argument to the function, to be more general.

>>> from operator import ge
>>> idx_conditions(R, thresholds, win_size, ge)
5

Answer 2:

This could be a way:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

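# Walk over consecutive index pairs (i, i+1) and print every i where all
# three lists meet their threshold at both positions.
for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
    if (R1[i]>=5 and R1[inext]>=5) and (R2[i]>=2 and R2[inext]>=2) and (R3[i]>=9 and R3[inext]>=9):
        print(i)

Output:

5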

Edit: Generalization could be:

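def foo(ls,conditions):
    # Scan consecutive index pairs (i, i+1) and return the first i where
    # every list meets its threshold at both positions.
    n=len(ls[0])  # all lists are assumed to have the same length
    for i,inext in zip(range(n),range(n)[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j])  for j in range(len(ls))):
            return i
    return None  # no pair of consecutive positions satisfies all conditions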


R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

R4 = [1,7,5,9,0,1,1]

R5 = [1,7,5,9,0,3,3]



conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]

print(foo(ls,conditions))

Output:

5

And if the arrays can match the conditions more than once, you could return a list of all the matching indexes:

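def foo(ls,conditions):
    # Collect every index i where all lists meet their threshold at i and i+1.
    index=[]
    n=len(ls[0])  # all lists are assumed to have the same length
    for i,inext in zip(range(n),range(n)[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j])  for j in range(len(ls))):
            index.append(i)
    return index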


R1 = [6,7,8,6,0,6,7]

R2 = [2,2,2,2,0,2,2]

R3 = [9,9,5,9,0,9,9]

R4 = [1,1,5,9,0,1,1]

R5 = [3,3,5,9,0,3,3]

conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]

print(foo(ls,conditions))

Output:

[0, 5]

Answer 3:

Here is a solution using NumPy, without for loops:

import numpy as np
R1 = np.array([7,5,8,6,0,6,7])
R2 = np.array([8,0,2,2,0,2,2])
R3 = np.array([1,7,5,9,0,9,9])
a = np.logical_and(np.logical_and(R1>=5,R2>=2),R3>=9)
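# [0][0] takes the first match; a bare .item() on the whole result
# raises an error unless exactly one index matches.
np.where(np.logical_and(a[:-1],a[1:]))[0][0].item()

Output:

5

Edit: Generalization

Say you have a list of lists R and a list of conditions c: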

R = [[7,5,8,6,0,6,7],
     [8,0,2,2,0,2,2],
     [1,7,5,9,0,9,9]]
c = [5,2,9]

First we convert them to NumPy arrays. The reshape(-1,1) turns c into a column vector so that NumPy's broadcasting compares every row of R against its own threshold in the >= operator:

R = np.array(R)
c = np.array(c).reshape(-1,1)
R>=c

output:
array([[ True,  True,  True,  True, False,  True,  True],
       [ True, False,  True,  True, False,  True,  True],
       [False, False, False,  True, False,  True,  True]])

Then we AND all the rows together using np.logical_and.reduce:

a = np.logical_and.reduce(R>=c)
a
output:
array([False, False, False,  True, False,  True,  True])

Next we create two arrays by dropping the last and the first element of a, respectively, and AND them together. The result shows where two consecutive elements satisfy the conditions in all lists:

np.logical_and(a[:-1],a[1:])
output:
array([False, False, False, False, False,  True])

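Now np.where gives the indices of the True elements, and we take the first one (a bare .item() on the whole result would fail when more than one index matches):

np.where(np.logical_and(a[:-1],a[1:]))[0][0].item()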
output:
5
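The steps above are written for two consecutive elements. A minimal sketch of how they might extend to a window of p elements is to AND together p shifted slices of a. The function name first_run_start is an assumption, and the short loop runs over the window length p, not over the data.

```python
import numpy as np


def first_run_start(R, c, p):
    """R: list of rows, c: one threshold per row, p: window length (>= 1).
    Returns the first index where all rows pass for p consecutive
    elements, or None if there is no such index."""
    R = np.asarray(R)
    c = np.asarray(c).reshape(-1, 1)
    a = np.logical_and.reduce(R >= c)  # True where every row passes
    n = a.size - p + 1                 # number of candidate start indices
    if n <= 0:
        return None
    ok = np.ones(n, dtype=bool)
    for k in range(p):                 # p iterations, independent of len(a)
        ok &= a[k:k + n]               # position t+k must pass for start t
    hits = np.flatnonzero(ok)
    return int(hits[0]) if hits.size else None


R = [[7, 5, 8, 6, 0, 6, 7],
     [8, 0, 2, 2, 0, 2, 2],
     [1, 7, 5, 9, 0, 9, 9]]
print(first_run_start(R, [5, 2, 9], 2))  # prints 5
```

For p = 2 this reduces to the a[:-1] and a[1:] comparison shown above.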

Source: https://stackoverflow.com/questions/62746907/how-to-implement-this-array-algorithm-in-a-more-efficient-way

Tags

python

performance

numpy