Better (non-linear) binning

予麋鹿 2021-01-15 10:18

The last question I asked concerned how to bin data by x co-ordinate. The solution was simple and elegant, and I'm ashamed I didn't see it. This question may be harder (

2 answers
  •  天涯浪人
    2021-01-15 11:05

    It sounds like you want to use bins that vary in size depending on the density of x values. I think you can still use the function HISTC like in the answer to your previous post, but you would just have to give it a different set of edges.
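
    As a small illustration of that point, here is a sketch with made-up numbers (the data and edge values are hypothetical, not taken from the question). HISTC accepts any non-decreasing edge vector, so the bins do not have to be equally wide:

    ```matlab
    % Hypothetical data: three values close together, then a sparse tail
    x = [1 2 3 10 20 50];

    % Unequal edges: bin k holds values with edges(k) <= x < edges(k+1)
    edges = [0 5 25 100];

    % counts has one entry per edge; the last entry only counts x == edges(end)
    [counts, whichBin] = histc(x, edges);
    ```

    With these numbers, counts should come out as [3 2 1 0] and whichBin as [1 1 1 2 2 3]. Newer MATLAB releases document HISTC as not recommended and point to histcounts and discretize instead; those handle the last edge slightly differently, so the edges may need adjusting if you switch.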

    I don't know if this is exactly what you want, but here's one suggestion: instead of splitting the x axis into 70 equally spaced groups, split the sorted x data into 70 groups of equal size and use the x values at the group boundaries as the bin edges. I think this code should work:

    % Start by assuming x and y are vectors of data:
    
    nBins = 70;
    x = x(:).';  y = y(:).';  % Force row vectors so the concatenation below works
    nValues = length(x);
    [xsort,index] = sort(x);  % Sort x in ascending order
    ysort = y(index);         % Sort y the same way as x
    binEdges = [xsort(1:ceil(nValues/nBins):nValues) xsort(nValues)+1];
    nBins = numel(binEdges)-1;  % Actual bin count (ceil can make it smaller than requested)
    
    % Bin the data and get the averages as in previous post (using ysort instead of y):
    
    [h,whichBin] = histc(xsort,binEdges);
    
    binMean = zeros(1,nBins);  % Preallocate the result
    for i = 1:nBins
        flagBinMembers = (whichBin == i);
        binMembers = ysort(flagBinMembers);
        binMean(i) = mean(binMembers);
    end
    

    This should give you bins that vary in size with the data density.
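
    To make the effect concrete, here is a worked sketch on a tiny made-up data set (the x and y values and nBins = 4 are assumptions chosen for illustration). It builds the edges the same way as the code above, but swaps the loop for accumarray, which is just one possible alternative:

    ```matlab
    % Hypothetical data: dense near zero, sparse further out
    x = [0.1 0.2 0.3 0.4 0.5 0.6 1 2 4 8 16 32];
    y = 2*x;                       % any y of the same length would do
    nBins = 4;

    nValues = numel(x);
    [xsort, index] = sort(x);      % sort x in ascending order
    ysort = y(index);              % reorder y to match
    step = ceil(nValues/nBins);    % points per bin (3 here)
    binEdges = [xsort(1:step:nValues) xsort(nValues)+1];

    % Assign each sorted x value to a bin
    [~, whichBin] = histc(xsort, binEdges);

    % Per-bin point counts and y means, without an explicit loop
    binCount = accumarray(whichBin(:), 1).';
    binMean  = accumarray(whichBin(:), ysort(:), [], @mean).';
    ```

    Here binEdges works out to [0.1 0.4 1 8 33]: the first bin is only 0.3 wide while the last is 25 wide, yet every bin holds three points.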


    UPDATE: Another version...

    Here's another idea I came up with after a few comments. With this code, you set a threshold (maxDelta) for the gap between neighboring data points in the sorted x. Wherever a value exceeds its smaller neighbor by maxDelta or more, a new bin edge is placed at that value, so isolated points end up in bins of their own (all by their lonesome). You still choose a value for nBins, but the final number of bins is recomputed from the edges and grows when spread-out points are relegated to their own bins.

    % Start by assuming x and y are vectors of data:
    
    maxDelta = 10; % Or whatever suits your data set!
    nBins = 70;
    nValues = length(x);
    x = x(:).';  y = y(:).';  % Force row vectors so the concatenations below work
    [xsort,index] = sort(x);  % Sort x in ascending order
    ysort = y(index);         % Sort y the same way as x
    
    % Create bin edges:
    
    edgeIndex = false(1,nValues);
    edgeIndex(1:ceil(nValues/nBins):nValues) = true;        % Equal-count edges
    edgeIndex = edgeIndex | ([0 diff(xsort)] >= maxDelta);  % Extra edges after large gaps
    nBins = sum(edgeIndex);                                 % Actual number of bins
    binEdges = [xsort(edgeIndex) xsort(nValues)+1];
    
    % Bin the data and get the y averages:
    
    [h,whichBin] = histc(xsort,binEdges);
    
    binMean = zeros(1,nBins);  % Preallocate the result
    for i = 1:nBins
        flagBinMembers = (whichBin == i);
        binMembers = ysort(flagBinMembers);
        binMean(i) = mean(binMembers);
    end
    

    I tested this on a few small sample sets of data and it seems to do what it's supposed to. Hopefully it will work for your data set too, whatever it contains! =)
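
    For reference, here is the kind of small check I mean, as a sketch with made-up values (the data, maxDelta = 10 and nBins = 2 are assumptions for illustration). Eight closely spaced points are followed by two stragglers, and accumarray stands in for the loop:

    ```matlab
    % Hypothetical data: a dense run from 1 to 8, then two isolated points
    x = [1 2 3 4 5 6 7 8 30 60];
    y = 10*x;
    maxDelta = 10;
    nBins = 2;

    nValues = numel(x);
    [xsort, index] = sort(x);
    ysort = y(index);

    % Equal-count edges (every 5th point) plus an edge after each large gap
    edgeIndex = false(1, nValues);
    edgeIndex(1:ceil(nValues/nBins):nValues) = true;
    edgeIndex = edgeIndex | ([0 diff(xsort)] >= maxDelta);
    nBins = sum(edgeIndex);        % 4 bins instead of the requested 2
    binEdges = [xsort(edgeIndex) xsort(nValues)+1];

    % Bin the data and average y per bin
    [~, whichBin] = histc(xsort, binEdges);
    binMean = accumarray(whichBin(:), ysort(:), [], @mean).';
    ```

    The edges come out as [1 6 30 60 61], so the points at 30 and 60 each get a bin to themselves and binMean is [30 70 300 600].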
