How to accumulate data-sets?

Question

I have a vector with values between 1 and N (N > 1). Some values may occur multiple times consecutively. I want to build a second column that counts the consecutive occurrences, and to remove the repeated consecutive entries, e.g.:

A = [1 2 1 1 3 2 4 4 1 1 1 2]'

would lead to:

B = [1 1; 2 1; 1 2; 3 1; 2 1; 4 2; 1 3; 2 1]

(As you can see, the second column contains the number of consecutive entries.) I came across accumarray() in MATLAB recently, but I can't find a solution to this task with it, since it always considers the whole vector rather than only consecutive entries.

Any idea?

Answer 1:

This probably isn't the most readable or elegant way of doing it, but if you have large vectors and speed is an issue, this vectorisation may help...

A = [1 2 1 1 3 2 4 4 1 1 1 2];

First, pad A with a leading and a trailing zero to capture the first and final transitions (this is safe here because all values in A are at least 1):

>>  A = [0, A, 0];

The transition locations can be found where the difference between neighbouring values is not equal to zero:

>> locations = find(diff(A)~=0);
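To make the following steps easier to follow, here is a small sketch of what locations holds for the padded example vector. Each entry is an index i in the padded A where A(i+1) differs from A(i), i.e. the last index of a run.

```matlab
A = [0, 1 2 1 1 3 2 4 4 1 1 1 2, 0];   % the example vector, already padded
locations = find(diff(A) ~= 0);        % indices where the next value differs
disp(locations)                        % 1 2 3 5 6 7 9 12 13
```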

But because we padded the start of A with a zero, the first transition is nonsensical (it points at the padding itself), so we only take the locations from 2:end. The values of A at these locations are the values of the individual runs:

>> first_column = A(locations(2:end))

first_column =

     1     2     1     3     2     4     1     2

That's the first column. Now to find the count of each number: this is given by the difference between successive locations, which is where padding A at both ends becomes important:

>> second_column = diff(locations)

second_column =

 1     1     2     1     1     2     3     1

Finally combining:

B = [first_column', second_column']

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
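The zero padding above relies on A never containing 0. As a hedged variant (not part of the original answer), the same run ends can be marked with a logical mask instead, which should also work if zeros are allowed. The variable names is_last, ends, values and lengths are only illustrative.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';   % column vector from the question

% true at the last element of every run; the final element always ends a run
is_last = [diff(A) ~= 0; true];
ends = find(is_last);             % indices at which runs end

values  = A(ends);                % value of each run
lengths = diff([0; ends]);        % run lengths = distances between run ends

B = [values, lengths];
```

Prepending 0 to ends plays the same role as the leading pad in the walkthrough: it makes the first run length come out as the index of the first run end.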

This can all be combined into one less-readable line:

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = [A(find(diff([A; 0]) ~= 0)), diff(find(diff([0; A; 0])))]

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
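Since the question asks about accumarray(), here is a minimal sketch of how it could be applied once every run has its own label. The idea is to mark where each run starts, turn those marks into run numbers with cumsum, and let accumarray count the elements per run number. The names starts, run_id and counts are illustrative assumptions.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';   % column vector from the question

starts = [true; diff(A) ~= 0];    % true at the first element of each run
run_id = cumsum(starts);          % run label per element: 1 2 3 3 4 5 6 6 7 7 7 8
counts = accumarray(run_id, 1);   % number of elements carrying each label

B = [A(starts), counts];
```

This sidesteps the problem described in the question: accumarray still sees the whole vector, but because run_id differs between separate runs of the same value, they are counted separately.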

Answer 2:

I don't see another way than looping through the data set, but it is rather straightforward. Maybe this is not the most elegant solution, but as far as I can see, it works fine.

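function B = accum_data_set(A)
    % Collapse consecutive repeats in A into rows of [value, run length].
    % Assumes A is non-empty.
    prev = A(1);      % value of the current run
    count = 1;        % length of the current run
    B = [];
    for i = 2:length(A)
        if prev == A(i)
            count = count + 1;      % same value: extend the current run
        else
            B = [B; prev count];    % value changed: store the finished run
            count = 1;
        end
        prev = A(i);
    end
    B = [B; prev count];            % store the final run
end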

Output:

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = accum_data_set(A)

B =

     1     1
     2     1
     1     2
     3     1
     2     1
     4     2
     1     3
     2     1
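One way to sanity-check any of these results is to expand B back into the original vector. The sketch below assumes repelem is available (it was introduced in R2015a, so it is newer than this thread); A_rebuilt is an illustrative name.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';
B = [1 1; 2 1; 1 2; 3 1; 2 1; 4 2; 1 3; 2 1];   % result shown above

% Repeat each run value by its run length to undo the compression
A_rebuilt = repelem(B(:,1), B(:,2));

assert(isequal(A_rebuilt, A));   % the round trip should reproduce A
```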

Source: https://stackoverflow.com/questions/8941582/how-to-accumulate-data-sets

Tags

matlab

vector

count

accumulate