Partitioning a float array into similar segments (clustering)

盖世英雄少女心 2021-01-04 18:13

I have an array of floats like this:

[1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]

Now, I want to partition the arra

2 Answers
  • 2021-01-04 18:58

    Clustering usually assumes multidimensional data.

    If you have one-dimensional data, sort it, and then either use kernel density estimation or just scan for the largest gaps.

    In one dimension, the problem gets substantially easier because the data can be sorted. A general-purpose clustering algorithm will unfortunately not exploit this, so use a one-dimensional method instead!
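
    The kernel density estimation option could look roughly like the sketch below: evaluate a Gaussian kernel density on an evenly spaced grid over the sorted data and treat local minima of the density as cut points between clusters. The function names `density` and `kde_cut_points`, the `bandwidth` parameter, and the grid size of 512 are all assumptions made for illustration; none of them come from the answer, and choosing a sensible bandwidth is left to the reader.

    ```cpp
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Unnormalized Gaussian kernel density at x. Only the shape of the curve
    // matters for locating minima, so the normalizing constant is omitted.
    double density(const std::vector<double>& data, double x, double bandwidth) {
        double sum = 0.0;
        for (double v : data) {
            const double u = (x - v) / bandwidth;
            sum += std::exp(-0.5 * u * u);
        }
        return sum;
    }

    // Returns the grid positions where the estimated density has a local
    // minimum. Values on either side of a cut point belong to different clusters.
    // `bandwidth` and `grid_size` are illustrative tuning parameters.
    std::vector<double> kde_cut_points(std::vector<double> data, double bandwidth,
                                       std::size_t grid_size = 512) {
        std::vector<double> cuts;
        if (data.size() < 2 || grid_size < 3 || bandwidth <= 0.0) return cuts;

        std::sort(data.begin(), data.end());
        const double lo = data.front();
        const double hi = data.back();
        if (hi == lo) return cuts;  // all values identical: nothing to split

        // Sample the density on an evenly spaced grid from min to max.
        const double step = (hi - lo) / static_cast<double>(grid_size - 1);
        std::vector<double> d(grid_size);
        for (std::size_t i = 0; i < grid_size; ++i)
            d[i] = density(data, lo + static_cast<double>(i) * step, bandwidth);

        // An interior grid point lower than its left neighbour and not higher
        // than its right neighbour is recorded as a cut point.
        for (std::size_t i = 1; i + 1 < grid_size; ++i)
            if (d[i] < d[i - 1] && d[i] <= d[i + 1])
                cuts.push_back(lo + static_cast<double>(i) * step);
        return cuts;
    }
    ```

    For two well-separated groups such as `{1.0, 1.2, 1.4, 9.0, 9.2, 9.4}` with a bandwidth of `1.0`, this yields a single cut point roughly halfway between the groups. A fixed bandwidth fits poorly when clusters live at very different scales, which is one reason the gap-based methods are often simpler.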

    Consider finding the largest gap in one-dimensional data. It's trivial: sort (O(n log n), but in practice about as fast as it gets), then scan adjacent pairs of values for the largest difference.

    Now try defining "largest gap" in 2 dimensions, and an efficient algorithm to locate it...
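
    As a minimal sketch of the largest-gap idea, assuming the number of groups `k` is known in advance (the answer does not say how to pick it), one can sort the values and cut at the `k - 1` widest gaps between neighbours. The function name `split_at_largest_gaps` is made up for this illustration.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Splits `data` into `k` groups by cutting at the k-1 largest gaps between
    // sorted neighbours. `k` is an assumed input, clamped to the data size.
    std::vector<std::vector<double>> split_at_largest_gaps(std::vector<double> data,
                                                           std::size_t k) {
        std::vector<std::vector<double>> groups;
        if (data.empty() || k == 0) return groups;

        std::sort(data.begin(), data.end());
        k = std::min(k, data.size());

        // cut[j] = i means "cut just before data[i]"; candidates are 1..n-1.
        std::vector<std::size_t> cut(data.size() - 1);
        std::iota(cut.begin(), cut.end(), std::size_t{1});

        // Move the k-1 positions with the widest gaps to the front.
        auto gap = [&](std::size_t i) { return data[i] - data[i - 1]; };
        std::partial_sort(cut.begin(), cut.begin() + (k - 1), cut.end(),
                          [&](std::size_t a, std::size_t b) { return gap(a) > gap(b); });
        cut.resize(k - 1);
        std::sort(cut.begin(), cut.end());  // restore left-to-right order

        // Copy each run between consecutive cuts into its own group.
        std::size_t start = 0;
        for (std::size_t c : cut) {
            groups.emplace_back(data.begin() + start, data.begin() + c);
            start = c;
        }
        groups.emplace_back(data.begin() + start, data.end());
        return groups;
    }
    ```

    Calling it on the array from the question with `k = 4` cuts at the gaps of 98.11, 87.89 and 7.3, giving groups of sizes 3, 3, 3 and 1.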

  • 2021-01-04 18:59

    I think I'd sort the data (if it's not already sorted), then take adjacent differences. Divide each difference by the smaller of the two numbers it separates to get a percentage change. Set a threshold, and whenever the change exceeds that threshold, start a new "cluster".

    Edit: Quick demo code in C++:

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <iterator>
    #include <numeric>
    #include <functional>
    
    int main() {
        std::vector<double> data{ 
            1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200 
        };
    
        // sort the input data
        std::sort(data.begin(), data.end());
    
        // find the difference between each number and its predecessor
        std::vector<double> diffs;
        std::adjacent_difference(data.begin(), data.end(), std::back_inserter(diffs));
    
        // convert differences to relative changes; note that this divides each
        // difference by data[i], the larger of the two neighbours
        std::transform(diffs.begin(), diffs.end(), data.begin(), diffs.begin(),
            std::divides<double>());
    
        // print out the results
        for (std::size_t i = 0; i < data.size(); i++) {
    
            // if a difference exceeds 40%, start a new group; skip i == 0,
            // where adjacent_difference just copied the first element
            if (i > 0 && diffs[i] > 0.4)
                std::cout << "\n";
    
            // print out an item:
            std::cout << data[i] << "\t";
        }
    
        return 0;
    }
    

    Result:

    1.91    2.87    3.61
    10.91   11.91   12.82
    100.71  100.73  101.89
    200
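
    If the groups are needed as data rather than printed output, the same threshold idea can be wrapped in a function. This is only a sketch: the name `group_by_relative_gap` is hypothetical, it divides by the larger neighbour just as the demo code does, and it assumes all values are positive so that the division is meaningful.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Groups sorted values, starting a new group whenever the relative jump
    // from the previous value exceeds `threshold`.
    // Assumption: all values are positive (the divisor is data[i]).
    std::vector<std::vector<double>> group_by_relative_gap(std::vector<double> data,
                                                           double threshold) {
        std::vector<std::vector<double>> groups;
        std::sort(data.begin(), data.end());
        for (std::size_t i = 0; i < data.size(); ++i) {
            // The first value always opens a group; later values open one only
            // when the change relative to data[i] is above the threshold.
            const bool new_group =
                i == 0 || (data[i] - data[i - 1]) / data[i] > threshold;
            if (new_group) groups.emplace_back();
            groups.back().push_back(data[i]);
        }
        return groups;
    }
    ```

    With the question's array and a threshold of `0.4` this reproduces the four groups shown above. The result is sensitive to the threshold: at `0.5` the jump from 101.89 to 200 (about 49%) no longer counts, so 200 joins the previous group.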
    