Maximize row-sameness given a binary MxN matrix and the ability to toggle columns?

Anonymous (unverified), submitted 2019-12-03 08:30:34

Question:

If you have a binary matrix of 1s and 0s, and you are able to toggle columns (change all 1s to 0s in the column, and all 0s to 1s), how do you find the max number of "pure" rows for all possible combinations of column toggles? "pure" meaning the row is all 0s, or all 1s.

Ex:

    1 0
    1 0
    1 1

You can toggle either column to get 2 rows that are "pure", which is the best you can do (toggling both is not better), so you return 2 (the max number of "pure" rows).

I can't seem to figure out an efficient way to do this. The only way I've gotten so far is with a bunch of loops and brute force and checking for sameness by checking if the sum of a row is either 0 (all 0s) or N (the number of elements in a row).
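For reference, the brute-force approach described above might look roughly like the following. This is only a sketch of the baseline, not a recommended solution: the function name `max_pure_rows_brute_force` and the list-of-lists input format are assumptions, and the running time is exponential in the number of columns.

```python
from itertools import product

def max_pure_rows_brute_force(matrix):
    """Try every combination of column toggles and keep the best pure-row count."""
    if not matrix:
        return 0
    n = len(matrix[0])
    best = 0
    # Each mask has one entry per column: 1 means "toggle this column".
    for mask in product((0, 1), repeat=n):
        pure = 0
        for row in matrix:
            # Sum of the row after toggling: 0 means all 0s, n means all 1s.
            toggled_sum = sum(bit ^ flip for bit, flip in zip(row, mask))
            if toggled_sum == 0 or toggled_sum == n:
                pure += 1
        best = max(best, pure)
    return best

print(max_pure_rows_brute_force([[1, 0], [1, 0], [1, 1]]))  # 2
```

With N columns this examines 2^N masks, which is why it only works for small matrices.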

Answer 1:

Update

After clarification from the OP, the max-pure row problem is to find the max number of rows that become either 00...0 or 11...1 after toggling. I have updated my solution accordingly.

Note that we have the following facts:

  1. If two rows ri and rj reduce to the same pure row after toggling, then we must have ri = rj to start with.

  2. If ri ≠ rj and ri overlaps rj (i.e. some of their corresponding entries are equal), then they cannot both map to a pure row.

Both of the facts above come directly from the following observation:

The rows that map to the same pure row (00...0 or 11...1) must be identical in the original matrix.


Proof

We claim that all the rows of a max-pure solution that map to the same pure row must be identical in the matrix M.

Suppose we are given an m-by-n matrix M, and we have found a solution of the max-pure row problem. Let ri and rj be two arbitrary rows that are reduced to the same pure row after toggling.

Observe that after all the necessary toggling operations on the columns (denoted by σ1, σ2, ..., σk), ri and rj are both "pure" rows, i.e. we have the following:

σ1(σ2(...(σk(ri))...)) = σ1(σ2(...(σk(rj))...)) = 00...0

or

σ1(σ2(...(σk(ri))...)) = σ1(σ2(...(σk(rj))...)) = 11...1

So after applying all these toggling operations, ri and rj equal each other. If we undo the last toggle applied, σ1 (i.e. we toggle the same column entry of both rows back), it is clear that ri and rj still map to the same output, i.e. we have the following:

σ2(σ3(...(σk(ri))...)) = σ2(σ3(...(σk(rj))...))

If we continue undoing the toggling operations one by one, we can conclude that ri = rj. In other words, any rows you pick from a solution of the max-pure problem that end up as the same pure row must have been identical in the beginning.
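The undo argument can be checked on a small example by modelling a set of column toggles as an XOR with a 0/1 mask. The helper name `toggle` and the sample rows are assumptions made for illustration only.

```python
def toggle(row, mask):
    """Flip the entries of row in every column where mask is 1."""
    return tuple(bit ^ flip for bit, flip in zip(row, mask))

r_i = (1, 0, 1)
r_j = (1, 0, 1)   # identical to r_i
r_k = (0, 0, 1)   # differs from r_i in column 0
mask = (1, 0, 1)  # toggle columns 0 and 2

# Applying the same toggles twice restores the original row (each toggle is its own inverse).
assert toggle(toggle(r_i, mask), mask) == r_i

# Identical rows land on the same pure row.
assert toggle(r_i, mask) == toggle(r_j, mask) == (0, 0, 0)

# A different row cannot land on that same pure row under the same toggles.
assert toggle(r_k, mask) != toggle(r_i, mask)
```

Because every toggle is its own inverse, two rows that are equal after toggling must have been equal before it, which is the step the proof relies on.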


Idea

Given a row ri, if it can be reduced to a pure row, say 00...0, then we know that another row rj cannot be reduced to 11...1 if ri overlaps rj (from fact 2 above). Only a row rk that does not overlap ri in any column, i.e. the entry-wise complement of ri, can be reduced to 11...1 by the same toggles.
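A small sketch of the overlap test may make this concrete. The names `overlaps` and `toggle` and the sample rows are illustrative assumptions; "overlap" is taken to mean that the two rows agree in at least one column.

```python
def overlaps(row_a, row_b):
    """True if the two rows hold the same value in at least one column."""
    return any(a == b for a, b in zip(row_a, row_b))

def toggle(row, mask):
    """Flip the entries of row in every column where mask is 1."""
    return tuple(bit ^ flip for bit, flip in zip(row, mask))

r_i = (1, 0, 1)
r_j = (1, 1, 0)  # agrees with r_i in column 0
r_k = (0, 1, 0)  # entry-wise complement of r_i

# Toggling exactly the columns where r_i holds a 1 sends r_i to 00...0.
mask = r_i
print(overlaps(r_i, r_j), toggle(r_j, mask))  # True (0, 1, 1)
print(overlaps(r_i, r_k), toggle(r_k, mask))  # False (1, 1, 1)
```

Only the non-overlapping row ends up pure, and it lands on the opposite pure row.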


Algorithm

From the preceding idea, we can have the following simple algorithm to solve the max-pure row problem.

We first scan over the rows of matrix M and find all the unique rows of the matrix (denoted by s1, s2, ..., sk). Let count(si) denote the number of times si appears in M. We then loop over all the pairs (si, sj) to determine the max-pure row number as below:

    int maxCount = 0;
    for each unique row si:
        // si on its own: every copy of si can be made pure
        maxCount = max(maxCount, count(si));
        for each unique row sj ≠ si:
            if (sj overlaps si)
                continue;
            // si and sj are complements, so the same toggles make both pure
            maxCount = max(maxCount, count(si) + count(sj));
    return maxCount;

We do O(n) work in the inner for loop (an entry-wise check of whether two rows overlap), and the loops run over O(m^2) pairs of rows in the worst case, so the running time of the algorithm is O(nm^2).
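The pairwise algorithm could be written in Python roughly as follows. This is a sketch under a few assumptions: the matrix is a list of equal-length lists of 0/1 values, the names `overlaps` and `max_pure_rows_pairwise` are made up for the example, and the maximum is seeded with the largest single group of identical rows so that a matrix with no complementary pair still gets a sensible answer.

```python
from collections import Counter

def overlaps(row_a, row_b):
    """True if the two rows hold the same value in at least one column."""
    return any(a == b for a, b in zip(row_a, row_b))

def max_pure_rows_pairwise(matrix):
    # count(s_i): how many times each distinct row appears in the matrix.
    counts = Counter(tuple(row) for row in matrix)
    unique_rows = list(counts)

    # A group of identical rows can always be made pure on its own.
    best = max(counts.values(), default=0)

    # Check every unordered pair of distinct rows for the complement case.
    for i, s_i in enumerate(unique_rows):
        for s_j in unique_rows[i + 1:]:
            if overlaps(s_i, s_j):
                continue  # they agree somewhere, so they cannot both be pure
            best = max(best, counts[s_i] + counts[s_j])
    return best

print(max_pure_rows_pairwise([[1, 0], [1, 0], [1, 1]]))  # 2
```

Since a row has exactly one complement, the inner loop could be replaced by a single dictionary lookup of the complemented tuple; the nested loops are kept here to mirror the pseudocode.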



Answer 2:

Maybe I'm missing something, but a quick run down the rows should answer your question.

Start with the top row, and flip each column as needed until the top row is all T (all 1s). Count the number of pure rows. Repeat for every other row, checking whether its count is greater than that of any previous row.

You don't need to repeat the process with each row turned to all F (all 0s): inverting the whole matrix leaves the count unchanged.

As described, counting the pure rows costs O(nm) for each of the m starting rows, so the worst-case running time is O(nm^2).
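Taken literally, the row-by-row procedure might be sketched as below. The function name `max_pure_rows_by_reference` and the list-of-lists input are assumptions; the code recounts the whole matrix for every reference row, so it does not try to share work between identical rows.

```python
def max_pure_rows_by_reference(matrix):
    """For each row, toggle columns so that row becomes all 1s, then count pure rows."""
    best = 0
    for reference in matrix:
        n = len(reference)
        # Toggle exactly the columns where the reference row holds a 0.
        mask = [1 - bit for bit in reference]
        pure = 0
        for row in matrix:
            total = sum(bit ^ flip for bit, flip in zip(row, mask))
            # total == n: all 1s (same as reference); total == 0: all 0s (its complement).
            if total == 0 or total == n:
                pure += 1
        best = max(best, pure)
    return best

print(max_pure_rows_by_reference([[1, 0], [1, 0], [1, 1]]))  # 2
```

Every optimal set of toggles makes at least one row pure, and the only other toggle set that makes that row pure is the full inversion, which gives the same count. That is why trying one mask per row is enough.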


