I have an array of floats like this:
[1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]
Now, I want to partition the array into groups/clusters, so that values close to each other end up in the same group.
Clustering usually assumes multidimensional data.
If you have one-dimensional data, sort it, and then use either kernel density estimation, or just scan for the largest gaps.
In 1 dimension, the problem gets substantially easier, because the data can be sorted. If you use a clustering algorithm, it will unfortunately not exploit this, so use a 1-dimensional method instead!
Consider finding the largest gap in 1-dimensional data. It's trivial: sort (n log n, but in practice about as fast as it gets), then scan adjacent pairs for the largest difference (sketched below).
Now try defining "largest gap" in 2 dimensions, and an efficient algorithm to locate it...
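To make that concrete, here is a minimal sketch of the largest-gap scan (my own illustration, not code from this answer); it only finds the single biggest gap, and a real split rule would likely cut at every gap above some threshold:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> data{1.91, 2.87, 3.61, 10.91, 11.91, 12.82,
                             100.73, 100.71, 101.89, 200};
    // sorting puts neighbouring values next to each other
    std::sort(data.begin(), data.end());

    // scan adjacent pairs for the largest difference
    std::size_t split = 1;
    double largest_gap = 0.0;
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        double gap = data[i + 1] - data[i];
        if (gap > largest_gap) {
            largest_gap = gap;
            split = i + 1;   // first element of the second group
        }
    }

    std::cout << "largest gap " << largest_gap
              << " splits the data before " << data[split] << "\n";
}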
I think I'd sort the data (if it's not already), then take adjacent differences. Divide each difference by one of the two values it separates (the demo below uses the current, larger one) to turn it into a relative change. Set a threshold, and when the change exceeds it, start a new "cluster".
Edit: Quick demo code in C++:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <numeric>
#include <functional>

int main() {
    std::vector<double> data{
        1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200
    };

    // sort the input data
    std::sort(data.begin(), data.end());

    // find the difference between each number and its predecessor
    // (std::adjacent_difference copies the first element unchanged)
    std::vector<double> diffs;
    std::adjacent_difference(data.begin(), data.end(), std::back_inserter(diffs));

    // convert each difference to a relative change by dividing it by the
    // current value
    std::transform(diffs.begin(), diffs.end(), data.begin(), diffs.begin(),
                   std::divides<double>());

    // print out the results
    for (std::size_t i = 0; i < data.size(); i++) {
        // if a relative change exceeds 40%, start a new group
        // (skip i == 0: diffs[0] is just data[0] / data[0])
        if (i > 0 && diffs[i] > 0.4)
            std::cout << "\n";
        // print out an item:
        std::cout << data[i] << "\t";
    }
    return 0;
}
Result:
1.91 2.87 3.61
10.91 11.91 12.82
100.71 100.73 101.89
200
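If you want the groups in a container rather than printed, the same threshold test can build them directly (a variation of my own on the demo above, using the same assumed 40% cut-off):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> data{1.91, 2.87, 3.61, 10.91, 11.91, 12.82,
                             100.73, 100.71, 101.89, 200};
    std::sort(data.begin(), data.end());

    std::vector<std::vector<double>> groups;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // start a new group at the first element, or when the relative
        // change from the previous element exceeds 40%
        if (i == 0 || (data[i] - data[i - 1]) / data[i] > 0.4)
            groups.emplace_back();
        groups.back().push_back(data[i]);
    }

    std::cout << "found " << groups.size() << " groups\n";   // 4 for this data
}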