Question:
I am finding that a lot of the time spent in my MATLAB function goes to this line:
intersect(freq_bins, our_bins);
Both can be rather large vectors, and both contain only integers. I just need to know which integers appear in both. This is exactly the basic purpose of intersect(), so I suspect the answer is that it doesn't get any better. But maybe someone has some suggestions.
Answer 1:
intersect calls ismember. In your case, you don't need all the complicated checks that intersect does, so you can save some overhead and call ismember directly (note: I made sure to call both functions before timing them):
    a = randi(1000,100,1);
    b = randi(1000,100,1);

    >> tic,intersect(a,b),toc
    ans =
        76
       338
       490
       548
       550
       801
       914
       930
    Elapsed time is 0.027104 seconds.

    >> tic,a(ismember(a,b)),toc
    ans =
       914
       801
       490
       548
       930
       550
        76
       338
    Elapsed time is 0.000613 seconds.
You can make this even faster by directly calling ismembc, the function that does the actual testing. Note that ismembc requires sorted arrays (so you can drop the sort if your input is sorted already!):
    tic,a=sort(a);b=sort(b);a(ismembc(a,b)),toc
    ans =
        76
       338
       490
       548
       550
       801
       914
       930
    Elapsed time is 0.000473 seconds.
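One caveat with the ismember shortcut above: it keeps the order and any duplicates found in a, whereas intersect returns sorted, unique values. The following is a minimal sketch on made-up vectors (the values and variable names are illustrative, not from the answer), with unique() added for cases where the exact intersect output is needed:

```matlab
% Illustrative inputs (assumed for this sketch): unsorted, with duplicates.
a = [9 3 3 5 1];
b = [3 9 9 7];

% ismember shortcut: preserves the order and duplicates of a.
c_fast = a(ismember(a, b));   % [9 3 3]

% Reference result: sorted and unique.
c_ref = intersect(a, b);      % [3 9]

% Sorting and de-duplicating the shortcut result recovers the reference.
c_same = unique(c_fast);      % [3 9]
```

If a is already sorted and free of duplicates, the extra unique() call should not be needed.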
Answer 2:
If you can assume that your inputs contain sorted lists of unique integers, then you can do this in linear time with a very simple algorithm:
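    function c = intersect_sorted(a,b)
        % Intersection of two sorted vectors of unique values.
        % Walks both vectors once, advancing whichever side is smaller.
        ia = 1; na = length(a);
        ib = 1; nb = length(b);
        ic = 0; cn = min(na,nb);
        c = zeros(1,cn);   % preallocate for the largest possible result
        while (ia <= na && ib <= nb)
            if (a(ia) > b(ib))
                ib = ib + 1;
            elseif a(ia) < b(ib)
                ia = ia + 1;
            else % a(ia) == b(ib)
                ic = ic + 1;
                c(ic) = a(ia);
                ib = ib + 1;
                ia = ia + 1;
            end
        end
        c = c(1:ic);       % trim the unused preallocated tail
    end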
The worst-case runtime for lists of lengths n and m is O(n+m).
    >> a = unique(randi(1000,100,1));
    >> b = unique(randi(1000,100,1));
    >> tic;for i = 1:10000, intersect(a,b); end,toc
    Elapsed time is 1.224514 seconds.
    >> tic;for i = 1:10000, intersect_sorted(a,b); end,toc
    Elapsed time is 0.289075 seconds.
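Since the merge-style approach in this answer assumes sorted, unique inputs, it may be worth checking that precondition before relying on it. Here is a minimal sketch, where is_strictly_increasing is a hypothetical helper name introduced for illustration (it is not part of either answer or of MATLAB):

```matlab
% Hypothetical helper (an assumption of this sketch): true when v is
% strictly increasing, which implies it is both sorted and duplicate-free.
is_strictly_increasing = @(v) all(diff(v) > 0);

a   = unique(randi(1000, 100, 1));  % unique() returns sorted, unique values
raw = [4 2 2 8];                    % unsorted, with a duplicate

ok_a     = is_strictly_increasing(a);            % input prepared with unique()
ok_raw   = is_strictly_increasing(raw);          % violates the precondition
ok_fixed = is_strictly_increasing(unique(raw));  % repaired with unique()
```

The check is a single linear pass, so it adds little next to the O(n+m) merge. When it fails, calling unique() on the input first is one way to restore the precondition, at the cost of a sort.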