MATLAB: Find abbreviated version of matrix that minimises sum of matrix elements

こ雲淡風輕ζ 提交于 2019-12-01 03:26:25

This is an approximate solution using a genetic algorithm.

I started with your test case:

data_points = 10; % How many data points will be generated for each person, in order to create the correlation matrix.
num_people = 25; % Number of people initially.
to_keep = 13; % Number of people to be kept in the correlation matrix.
to_drop = num_people - to_keep; % Number of people to drop from the correlation matrix.
num_comparisons = 100; % Number of times to compare the iterative and optimization techniques.
for j = 1:data_points
    rand_dat(j,:) = 1 + 2.*randn(num_people,1); % Generate random data.
end
A = corr(rand_dat);

then I defined the functions you need to evolve the genetic algorithm:

function individuals = user1205901individuals(nvars, FitnessFcn, gaoptions, num_people)

individuals = zeros(num_people,gaoptions.PopulationSize);
for cnt=1:gaoptions.PopulationSize
    individuals(:,cnt)=randperm(num_people);
end

individuals = individuals(1:nvars,:)';

is the individual generation function.

function fitness = user1205901fitness(ind, A)

fitness = sum(sum(A(ind,ind)));

is the fitness evaluation function

function offspring = user1205901mutations(parents, options, nvars, FitnessFcn, state, thisScore, thisPopulation, num_people)

offspring=zeros(length(parents),nvars);
for cnt=1:length(parents)
    original = thisPopulation(parents(cnt),:);
    extraneus = setdiff(1:num_people, original);
    original(fix(rand()*nvars)+1) = extraneus(fix(rand()*(num_people-nvars))+1);
    offspring(cnt,:)=original;
end

is the function to mutate an individual

function children = user1205901crossover(parents, options, nvars, FitnessFcn, unused, thisPopulation)

children=zeros(length(parents)/2,nvars);
cnt = 1;
for cnt1=1:2:length(parents)
    cnt2=cnt1+1;
        male = thisPopulation(parents(cnt1),:);
        female = thisPopulation(parents(cnt2),:);
        child = union(male, female);
        child = child(randperm(length(child)));
        child = child(1:nvars);
        children(cnt,:)=child;
        cnt = cnt + 1;

end

is the function to generate a new individual coupling two parents.

At this point you can define your problem:

gaproblem2.fitnessfcn=@(idx)user1205901fitness(idx,A)
gaproblem2.nvars = to_keep
gaproblem2.options = gaoptions()
gaproblem2.options.PopulationSize=40
gaproblem2.options.EliteCount=10
gaproblem2.options.CrossoverFraction=0.1
gaproblem2.options.StallGenLimit=inf
gaproblem2.options.CreationFcn= @(nvars,FitnessFcn,gaoptions)user1205901individuals(nvars,FitnessFcn,gaoptions,num_people)
gaproblem2.options.CrossoverFcn= @(parents,options,nvars,FitnessFcn,unused,thisPopulation)user1205901crossover(parents,options,nvars,FitnessFcn,unused,thisPopulation)
gaproblem2.options.MutationFcn=@(parents, options, nvars, FitnessFcn, state, thisScore, thisPopulation) user1205901mutations(parents, options, nvars, FitnessFcn, state, thisScore, thisPopulation, num_people)
gaproblem2.options.Vectorized='off'

open the genetic algorithm tool

gatool

from the File menu select Import Problem... and choose gaproblem2 in the window that opens.

Now, run the tool and wait for the iterations to stop.

The gatool enables you to change hundreds of parameters, so you can trade speed for precision in the selected output.

The resulting vector is the list of indices that you have to keep in the original matrix so A(garesults.x,garesults.x) is the matrix with only the desired persons.

Jimmy

If I have understood you problem statement, you have a N x N matrix M (which happens to be a correlation matrix), and you wish to find for integer n where 2 <= n < N, a n x n matrix m which minimises the sum over all elements of m which I denote f(m)?

In Matlab it is fairly easy and fast to obtain a sub-matrix of a matrix (see for example Removing rows and columns from matrix in Matlab), and the function f is relatively inexpensive to evaluate for n = 151. So why can't you implement an algorithm that solves this backwards dynamically in a program as below where I have sketched out the pseudocode:

function reduceM(M, n){

    m = M

    for (ii = N to n+1) {

        for (jj = 1 to ii) {

            val(jj) = f(m) where mhas column and row jj removed, f(X) being summation over all elements of X

        }

        JJ(ii) = jj s.t. val(jj) is smallest

        m = m updated by removing column and row JJ(ii)   
    }
}

In the end you end up with an m of dimension n which is the solution to your problem and a vector JJ which contains the indices removed at each iteration (you should easily be able to convert these back to indices applicable to the full matrix M)

There are several approaches to finding an approximate solution (eg. quadratic programming on relaxed problem or greedy search), but finding the exact solution is an NP-hard problem.

Disclaimer: I'm not an expert on binary quadratic programming, and you may want to consult the academic literature for more sophisticated algorithms.

Mathematically equivalent formulation:

Your problem is equivalent to:

For some symmetric, positive semi-definite matrix S

minimize (over vector x)  x'*S*x

             subject to     0 <= x(i) <= 1        for all i
                            sum(x)==n
                            x(i) is either 1 or 0 for all i

This is a quadratic programming problem where the vector x is restricted to taking only binary values. Quadratic programming where the domain is restricted to a set of discrete values is called mixed integer quadratic programming (MIQP). The binary version is sometimes called Binary Quadratic Programming (BQP). The last restriction, that x is binary, makes the problem substantially more difficult; it destroys the problem's convexity!

Quick and dirty approach to finding an approximate answer:

If you don't need a precise solution, something to play around with might be a relaxed version of the problem: drop the binary constraint. If you drop the constraint that x(i) is either 1 or 0 for all i, then the problem becomes a trivial convex optimization problem and can be solved nearly instantaneously (eg. by Matlab's quadprog). You could try removing entries that, on the relaxed problem, quadprog assigns the lowest values in the x vector, but this does not truly solve the original problem!

Note also that the relaxed problem gives you a lower bound on the optimal value of the original problem. If your discretized version of the solution to the relaxed problem leads to a value for the objective function close to the lower bound, there may be a sense in which this ad-hoc solution can't be that far off from the true solution.


To solve the relaxed problem, you might try something like:

% k is number of observations to drop
n = size(S, 1);
Aeq = ones(1,n)
beq = n-k;
[x_relax, f_relax] = quadprog(S, zeros(n, 1), [], [], Aeq, beq, zeros(n, 1), ones(n, 1));
f_relax = f_relax * 2;  % Quadprog solves .5 * x' * S * x... so mult by 2
temp = sort(x_relax);
cutoff = temp(k);
x_approx = ones(n, 1);
x_approx(x_relax <= cutoff) = 0;
f_approx = x_approx' * S * x_approx;

I'm curious how good x_approx is? This doesn't solve your problem, but it might not be horrible! Note that f_relax is a lower bound on the solution to the original problem.


Software to solve your exact problem

You should check out this link and go down to the section on Mixed Integer Quadratic Programming (MIQP). It looks to me that Gurobi can solve problems of your type. Another list of solvers is here.

Working on a suggestion from Matthew Gunn and also some advice at the Gurobi forums, I came up with the following function. It seems to work pretty well.

I will award it the answer, but if someone can come up with code that works better I'll remove the tick from this answer and place it on their answer instead.

function [ values ] = the_optimal_method( CM , num_to_keep)
%the_iterative_method Takes correlation matrix CM and number to keep, returns list of people who should be kicked out 

N = size(CM,1);  

clear model;  
names = strseq('x',[1:N]);  
model.varnames = names;  
model.Q = sparse(CM); % Gurobi needs a sparse matrix as input  
model.A = sparse(ones(1,N));  
model.obj = zeros(1,N);  
model.rhs = num_to_keep;  
model.sense = '=';  
model.vtype = 'B';

gurobi_write(model, 'qp.mps');

results = gurobi(model);

values = results.x;

end
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!