Is it possible to shuffle a 2D matrix while preserving row AND column frequencies?

孤城傲影 2020-12-19 12:40

Suppose I have a 2D array like the following:

GACTG
AGATA
TCCGA

Each array element is taken from a small finite set (in my case, DNA nucleo

4 Answers
  • 2020-12-19 13:14

    The answer to question 2 is no. Consider the following 2 matrices:

    A B C   C A B
    C A B   B C A
    B C A   A B C
    

    They clearly have the same row and column frequencies. Yet, there is no 2x2 submatrix with common corners.
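    The claim above can be checked mechanically. The sketch below assumes that "common corners" means a 2x2 submatrix of the form [[a, b], [b, a]] with a != b, i.e. the only kind whose corners can be exchanged without changing any row or column frequency. The helper names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def row_col_counts(m):
    """Return the per-row and per-column symbol counts of a matrix."""
    rows = [Counter(r) for r in m]
    cols = [Counter(c) for c in zip(*m)]
    return rows, cols

def swappable_2x2(m):
    """List every (r1, r2, c1, c2) whose corners look like [[a, b], [b, a]], a != b."""
    found = []
    for r1, r2 in combinations(range(len(m)), 2):
        for c1, c2 in combinations(range(len(m[0])), 2):
            a, b = m[r1][c1], m[r1][c2]
            if a != b and m[r2][c1] == b and m[r2][c2] == a:
                found.append((r1, r2, c1, c2))
    return found

first = ["ABC", "CAB", "BCA"]
second = ["CAB", "BCA", "ABC"]

same_frequencies = row_col_counts(first) == row_col_counts(second)
moves_from_first = swappable_2x2(first)
moves_from_second = swappable_2x2(second)
```

    Under that reading, `same_frequencies` is true while both move lists are empty, so a chain built only from 2x2 swaps can never leave either matrix.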

  • 2020-12-19 13:23

    It turns out that for 0-1 matrices, 2x2 swaps are sufficient to get from one matrix to any other with the same row and column sums. This was proved by H. J. Ryser as Theorem 3.1 in a paper called "Combinatorial Properties of Matrices of Zeros and Ones": http://cms.math.ca/cjm/v9/cjm1957v09.0371-0377.pdf . People have been trying for a while to prove that the Markov chain based on 2x2 swaps mixes rapidly; this paper http://arxiv.org/pdf/1004.2612v3 seems to come the closest.

    If one could prove the generalization of Ryser's theorem to your case (maybe with up to 4x4 "swaps"), then on account of the symmetry of the swaps, it wouldn't be too hard to get a chain whose steady state distribution is uniform on the matrices of interest. I don't think there's any hope at the moment of proving that it mixes rapidly for all possible row/column distributions, but perhaps you know something about the distributions that we don't...
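    To make the symmetry argument concrete, here is a minimal sketch of such a chain using only 2x2 swaps. It assumes the matrix is given as equal-length strings of single-character symbols; `swap_step`, `shuffle_matrix` and `frequencies` are illustrative names, not from any library. Each step proposes two rows and two columns uniformly at random, and the same proposal undoes a swap, so the transition probabilities are symmetric. That gives a uniform steady state only over the matrices reachable from the start, and says nothing about how many steps are enough.

```python
import random
from collections import Counter

def frequencies(rows):
    """Per-row and per-column symbol counts, used to check the invariant."""
    return [Counter(r) for r in rows], [Counter(c) for c in zip(*rows)]

def swap_step(m, rng):
    """Propose a random 2x2 submatrix; swap its corners if it is [[a, b], [b, a]]."""
    r1, r2 = rng.sample(range(len(m)), 2)
    c1, c2 = rng.sample(range(len(m[0])), 2)
    a, b = m[r1][c1], m[r1][c2]
    if a != b and m[r2][c1] == b and m[r2][c2] == a:
        m[r1][c1], m[r1][c2] = b, a
        m[r2][c1], m[r2][c2] = a, b
        return True
    return False  # rejected proposal: the chain stays where it is

def shuffle_matrix(rows, steps=10_000, seed=None):
    """Run the swap chain for a fixed number of steps (the step count is a guess)."""
    rng = random.Random(seed)
    m = [list(r) for r in rows]
    for _ in range(steps):
        swap_step(m, rng)
    return ["".join(r) for r in m]

original = ["GACTG", "AGATA", "TCCGA"]
shuffled = shuffle_matrix(original, steps=2_000, seed=1)
```

    Rejected proposals count as steps on purpose: skipping them would make the chain favour matrices with fewer available swaps.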

  • 2020-12-19 13:27

    Edit: oops missed the last paragraph of OP's question, let me rephrase.

    To digress briefly, the question you linked to had quite a hilarious discussion about the "level" of randomness of the selected solution. Allow me to paraphrase:

    "...I really require matrices that are as random as possible..."

    "...The algorithm, as implemented in the code, is quite random..."

    "...if you choose this method, a different way to improve the randomness is to repeat the randomization process several times (a random number of times)..."

    None of these comments makes any sort of sense: there is no such thing as "more" random. This is all exactly like this lovely Daily WTF entry. That said, the last quote is almost onto something. It's well known that if you simulate a Markov chain, like that random swapping algorithm, for long enough, you will eventually start generating samples from the steady state distribution. Just exactly what that distribution looks like, who knows...

    Anyway, depending on your objectives you may not really care what this distribution looks like as long as it contains enough elements. So some sort of swapping algorithm might be useful, but I really would not expect this to be easy since the problem is NP-Complete (more general than Sudoku).

    With that in mind, you could consider solving your problem with any approach that works for solving Sudoku. If you are in academia, I would suggest getting a copy of IBM CPLEX 12, which is free for academic use. You can code up a Sudoku-like solver in their CP language (OPL) and ask the integer linear program solver to generate solutions for you. I think they even have example code for solving Sudoku that you can borrow from.

    Here's the only truly random and unbiased way I can think of to sample from such matrices: First get CPLEX to find all N solutions to the given Sudoku problem. After you have this set of N solutions, draw a random number between 1 and N and use that solution. If you want another one, draw another number. Since generating all solutions might be a bit slow, you could approximate something like this by telling the solver to stop after a certain number of solutions or after a certain amount of time, and only sample from that set.
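    For small matrices the enumerate-then-sample idea can be tried without a solver. The sketch below swaps in plain Python backtracking for CPLEX/OPL, so it is not the setup described above, and it assumes single-character symbols; `enumerate_matrices` and `sample_uniform` are hypothetical names. It fills the grid cell by cell, using a symbol only while both the row and the column still have one left, and then picks one of the complete grids uniformly.

```python
import random
from collections import Counter

def enumerate_matrices(row_counts, col_counts):
    """Yield every matrix (tuple of strings) with the given row/column symbol counts."""
    n_rows, n_cols = len(row_counts), len(col_counts)
    rows_left = [Counter(c) for c in row_counts]   # copies, mutated during search
    cols_left = [Counter(c) for c in col_counts]
    grid = [[None] * n_cols for _ in range(n_rows)]

    def fill(pos):
        if pos == n_rows * n_cols:
            yield tuple("".join(r) for r in grid)
            return
        r, c = divmod(pos, n_cols)
        for sym in sorted(rows_left[r]):
            if rows_left[r][sym] > 0 and cols_left[c][sym] > 0:
                rows_left[r][sym] -= 1
                cols_left[c][sym] -= 1
                grid[r][c] = sym
                yield from fill(pos + 1)
                rows_left[r][sym] += 1             # undo the choice and try the next symbol
                cols_left[c][sym] += 1

    yield from fill(0)

def sample_uniform(rows, rng=random):
    """Enumerate all matrices sharing the frequencies of `rows`, then pick one uniformly."""
    row_counts = [Counter(r) for r in rows]
    col_counts = [Counter(c) for c in zip(*rows)]
    solutions = list(enumerate_matrices(row_counts, col_counts))
    return rng.choice(solutions), len(solutions)

chosen, total = sample_uniform(["ABC", "CAB", "BCA"], random.Random(0))
```

    The number of solutions grows very quickly with matrix size, so this is only practical for toy inputs. Stopping the enumeration early, as suggested above, trades exact uniformity for speed.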

  • 2020-12-19 13:39

    No clue, but what you are talking about is basically a generalized sudoku solver. Try http://scholar.google.com/scholar?q=sudoku
