Question
I have two Python lists of different lengths. One may assume that one of the lists is several times larger than the other one.
Both lists contain the same physical data but captured with different sample rates.
My goal is to downsample the larger signal such that it has exactly as many data points as the smaller one.
I came up with the following code which basically does the job but is neither very Pythonic nor capable of handling very large lists in a performant way:
import math
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,4.5,6.9]
if len(a) > len(b):
    div = int(math.floor(len(a)/len(b)))
    a = a[::div]
    diff = len(a)-len(b)
    a = a[:-diff]
else:
    div = int(math.floor(len(b)/len(a)))
    b = b[::div]
    diff = len(b)-len(a)
    b = b[:-diff]
print a
print b
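For the sample lists above, this keeps every third element of a and then trims the one leftover point, so it prints:

[1, 4, 7]
[1, 4.5, 6.9]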
I would appreciate it if more experienced Python users could elaborate on alternative ways to solve this task.
Any answer or comment is highly appreciated.
Answer 1:
Here's a shortened version of the code (not necessarily better performance):
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,4.5,6.9]
order = 0  # Remember whether a and b were swapped.
if len(b) > len(a):
    a, b = b, a  # swap the values so that 'a' is always the larger list.
    order = 1
div = len(a) / len(b)  # In Python2, this already gives the floor.
a = a[::div][:len(b)]
if order:
    print b
    print a
else:
    print a
    print b
Since you're ultimately discarding some of the latter elements of the larger list, an explicit for
loop may increase performance, as then you don't have to "jump" to the values which will be discarded:
new_a = []
step = len(a) // len(b)  # floor of the length ratio, i.e. the sampling step
index = 0
for i in range(len(b)):  # keep exactly len(b) elements
    new_a.append(a[index])
    index += step
a = new_a  # [1, 4, 7] for the sample lists (same result as the slicing version)
Answer 2:
First off, for performance you should be using numpy. The question has been tagged with numpy, so maybe you already are and just didn't show it, but in any case the lists can be converted to numpy arrays with
import numpy as np
a = np.array(a)
b = np.array(b)
Indexing is the same. It's possible to use len on arrays, but array.shape is more general, giving the following (very similar) code.
a[::a.shape[0] // b.shape[0]]
Performance-wise, this should give a large boost in speed for most data. Testing with much larger a and b arrays (10e6 and 1e6 elements, respectively) shows that numpy can give a large increase in performance.
a = np.ones(10000000)
b = np.ones(1000000)
%timeit a[::a.shape[0] // b.shape[0]] # Numpy arrays
1000000 loops, best of 3: 348 ns per loop
a = list(a);
b = list(b);
%timeit a[::len(a) // len(b)] # Plain old python lists
1000000 loops, best of 3: 29.5 ms per loop
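One caveat (not part of the original answer): when the length ratio is not an integer, the step slice can still leave a few extra points, just as in the question's code. A minimal sketch of one way to always get exactly b.shape[0] samples is to build an evenly spaced index array, assuming plain index selection is acceptable; note that unlike the step slice, this also includes the last sample of a.

import numpy as np

a = np.ones(10000000)
b = np.ones(1000000)

# Pick b.shape[0] evenly spaced integer indices into a, then select them.
idx = np.round(np.linspace(0, a.shape[0] - 1, b.shape[0])).astype(int)
a_down = a[idx]  # a_down.shape[0] == b.shape[0], whatever the length ratio is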
Answer 3:
If you're iterating over the list, you could use a generator so you don't have to copy the whole thing to memory.
from __future__ import division
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,4.5,6.9]

def zip_downsample(a, b):
    if len(a) > len(b):
        b, a = a, b  # make b the longer list
    for i in xrange(len(a)):
        yield a[i], b[i * len(b) // len(a)]

for z in zip_downsample(a, b):
    print z
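If you eventually need two equal-length sequences rather than an iterator of pairs, the generator output can be unpacked with zip. A small usage sketch, not from the original answer (the commented values are for the sample lists above):

pairs = list(zip_downsample(a, b))
shorter, longer_down = zip(*pairs)  # two equal-length tuples
print shorter      # (1, 4.5, 6.9)
print longer_down  # (1, 4, 7)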
Answer 4:
# a = [1,2,3,4,5,6,7,8,9,10]
# b = [1,4.5,6.9]
a, b = zip(*zip(a, b))
# a = (1, 2, 3)
# b = (1, 4.5, 6.9)
The inner zip combines the lists into pairs, discarding the excess items from the larger list, returning something like [(1, 1), (2, 4.5), (3, 6.9)]. The outer zip then performs the inverse of this (since we unpack with the * operator), but since we have discarded the excess with the first zip, the lists should be the same size. This returns as [a, b], so we then unpack into the respective variables (a, b = ...).
See https://www.programiz.com/python-programming/methods/built-in/zip for more info on zip and using it as its own inverse.
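Note that on its own this only truncates the larger list to the first few elements. To actually downsample it first, one possibility (a sketch combining this with the step slicing from the earlier answers, not part of the original answer) is:

# Starting again from the original lists:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,4.5,6.9]
a, b = zip(*zip(a[::len(a) // len(b)], b))
# a = (1, 4, 7)
# b = (1, 4.5, 6.9)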
Source: https://stackoverflow.com/questions/45965444/match-length-of-two-python-lists